Since around 2002 clock speeds have held steady at about 2GHz. The primary constraint has been thermal. As processors moved into the multi-GHz range they started to dissipate up to 100W of heat, which becomes impractical to cool (ever had your legs burned by your laptop?). "Scaling up" clock speeds had hit a wall. So hardware engineers had to focus on other ways of making things faster. They did some increasingly clever things like superscalar execution (dispatching multiple instructions per clock cycle), new specialized instructions (SSE, etc), hyperthreading (a single processor appearing as two processors to the OS), then on to the logical conclusion of multi-core (multiple CPU dies in a single package). Performance now comes from "scaling out" to multiple cores, and if you're running a service, multiple machines.
The consequence of this shift from faster clock cycles to more processors has been that after decades of sitting on their asses and waiting for the next doubling of clock speeds to make up for their lazy coding, software engineers have to actually write code differently to get it to run fast. This could mean traditional optimization, re-writing existing code to run faster without fundamentally changing the approach to the problem. But increasingly it means taking advantage of the way hardware is evolving by writing code to take advantage of multiple cores by splitting the problem into independent pieces that can be executed simultaneously.
To some degree the service we're building at Kikini can naturally take advantage of multiple cores, since we're serving many simultaneous requests. However, due to the transactional nature of databases, there is a limit to how much performance you can get by simply adding more cores. Write operations require locks which cause other transactions to fail, so even if you had infinite cores you'd still be constrained by how your design your database.
All this points to three main ways to achieve high performance:
- Optimize individual queries
- Design queries and the database schema to minimize locking to take advantage of multiple cores
- Partition data in clever ways to spread the load across multiple servers