Wednesday, November 25, 2009

Scaling up is out, scaling out is in

One of the more interesting, if less visible, trends of the past half-decade is that clock speeds on modern CPUs have stagnated. I'm writing this post on my MacBook, which turns one year old next week. It's equipped with a 2GHz processor and 2GB of RAM. It's the first computer I've bought since 2002, when I built a ~1.2GHz Athlon system with 1GB of RAM. By the historical trend, that gap should have meant something like a tenfold clock-speed increase; instead it's a factor of less than two, and I don't think we're going to see much more in terms of clock speed in the future. Check out the graph below:

[Graph: CPU clock speed by year, flattening out around 2002]
Since around 2002, clock speeds have held steady at about 2GHz. The primary constraint has been thermal: as processors moved into the multi-GHz range they started to dissipate up to 100W of heat, which becomes impractical to cool (ever had your legs burned by your laptop?). "Scaling up" clock speeds had hit a wall, so hardware engineers had to find other ways of making things faster. They did increasingly clever things: superscalar execution (dispatching multiple instructions per clock cycle), new specialized instructions (SSE, etc.), hyperthreading (a single core appearing to the OS as two logical processors), and finally the logical conclusion, multi-core (multiple processor cores in a single package). Performance now comes from "scaling out" to multiple cores, and if you're running a service, to multiple machines.

The consequence of this shift from faster clocks to more cores is that, after decades of sitting on their asses waiting for the next doubling of clock speed to make up for their lazy coding, software engineers actually have to write code differently to make it run fast. That could mean traditional optimization: rewriting existing code to run faster without fundamentally changing the approach to the problem. But increasingly it means embracing the way hardware is evolving and splitting a problem into independent pieces that can execute simultaneously on multiple cores.
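To make that concrete, here's a minimal sketch in Python of the kind of restructuring involved; the `score_document` workload is invented, but the pattern (split the work into independent pieces, fan them out to a pool of workers) is the general one:

```python
# Sketch: splitting an independent, CPU-bound job across cores with a
# worker pool. The score_document workload here is invented.
from multiprocessing import Pool

def score_document(doc):
    # Stand-in for some CPU-bound work on one independent piece.
    return sum(ord(c) for c in doc) % 997

if __name__ == "__main__":
    docs = ["doc-%d" % i for i in range(10000)]

    # Serial: one core does all the work, however many you have.
    serial = [score_document(d) for d in docs]

    # Parallel: the same work fanned out to one worker per core.
    # This only helps because each piece is independent of the rest.
    with Pool() as pool:
        parallel = pool.map(score_document, docs)

    assert parallel == serial
```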

To some degree the service we're building at Kikini can naturally take advantage of multiple cores, since we're serving many simultaneous requests. However, the transactional nature of databases limits how much performance you can get by simply adding cores: write operations take locks that force other transactions to wait (or deadlock and abort), so even with infinite cores you'd still be constrained by how you design your database.
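This is Amdahl's law wearing database clothing: whatever fraction of the work is serialized (here, the locked writes) puts a hard cap on speedup, no matter how many cores you add. A quick sketch with made-up numbers:

```python
# Amdahl's law: if a fraction s of the work is serialized (e.g. writes
# holding a lock), the best speedup on n cores is 1 / (s + (1 - s) / n),
# which approaches 1/s as n grows.
def max_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Made-up example: 10% of each request is spent in locked writes.
for cores in (1, 2, 8, 64, 10**9):
    print(cores, round(max_speedup(0.10, cores), 2))
# Even with effectively infinite cores, speedup never passes 10x.
```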

All this points to three main ways to achieve high performance:
  1. Optimize individual queries
  2. Design queries and the database schema to minimize locking to take advantage of multiple cores
  3. Partition data in clever ways to spread the load across multiple servers (see the sketch after this list)
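For the third point, here's a minimal sketch of one common scheme, hash partitioning; the shard hostnames and the choice of user ID as the routing key are hypothetical, not a description of what Kikini actually does:

```python
# Sketch of hash partitioning: route each user's rows to one of several
# database servers. Shard hostnames and the routing key are invented.
import hashlib

SHARDS = ["db-0.internal", "db-1.internal",
          "db-2.internal", "db-3.internal"]

def shard_for(user_id):
    # Use a stable hash so the same user always maps to the same server.
    # (Python's built-in hash() isn't stable across runs, so avoid it.)
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))  # every query for user 42 goes to this one server
```

One caveat with naive modulo routing: adding or removing a shard remaps almost every key, which is why schemes like consistent hashing exist.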
Fundamental to all three is measurement, which is why it's important to have an automated system like the one I described earlier this month, so that engineers can stay focused on the important stuff.
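As a toy example of what automated measurement can look like (the schema, query, and 1ms budget below are all invented), a check like this can run on every build and fail loudly when a query regresses:

```python
# Sketch: time a single query so a slow regression shows up in an
# automated run rather than in production. Schema, query, and the
# 1 ms budget are invented for illustration.
import sqlite3
import time

def avg_query_time(conn, sql, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql).fetchall()
    return (time.perf_counter() - start) / repeats

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("user-%d" % i,) for i in range(1000)])

avg = avg_query_time(conn, "SELECT * FROM users WHERE id = 500")
assert avg < 0.001, "query regressed past its 1 ms budget"
print("avg: %.6f s" % avg)
```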
