One of the great things about our culture is constant innovation. I can honestly say that new products and services have made my life better in some way in 2009, and I'd like to call those out as a way of congratulating the people and companies who created them. None of the following actually came about in 2009, but 2009 was the year that I started to use them.
Amazon Kindle
The first time I saw a Kindle was in the Microsoft 117 Cafe, when Jerry Lin joined us for lunch and shared the latest gadget he'd received from Amazon. Jerry is known to be a prolific Amazon customer, notorious for receiving daily deliveries to his office (actually, he was rarely at work before the typical 11am-2pm delivery window, relying on the patience of neighboring officemates to sign for his packages). So it was no surprise to learn that his latest toy was a Kindle (version 1), which he demoed for us. The E Ink screen provides a reading experience much closer to paper than a computer display, making it far easier on the eyes. But this atypical display, combined with the curious vertical silver reading-position indicator and scrollwheel, makes the device look like something envisioned in the 1960s: a bizarre amalgamation of analog and digital.
Nevertheless, the merits of the device were quite clear: newspapers, magazines, and books delivered wirelessly to a device with up to two weeks of battery life and a size and weight more compact than a single book. As a frequent (often international) traveler, I could immediately see the value of this. One of the less expected things about my time living in China was that I began to really miss the simple pleasure of reading. With something like the Kindle I could have an entire bookstore at my fingertips, anywhere in the world. Jerry did admit a downside: the available selection, while large at hundreds of thousands of titles, doesn't always contain the book you're looking for. On the other hand, it is large enough that you'll never run out of books you want to read. Jerry told us he had returned all the books he had bought from Amazon in recent years (which Amazon admirably allowed) and repurchased them on the Kindle.
I promptly ordered a Kindle, which at the time (late 2008) was on "back order." In fact, it was no longer in production, and all pending orders were upgraded to the superior version 2. In Feb 2009 I finally received my Kindle 2.
The Kindle has changed my reading habits. The form factor makes the reading experience more pleasant, especially compared to reading large hardcover books. Wireless delivery of The Economist, which arrives Friday morning like clockwork, is a thousand times more reliable than receiving the same issue in the mail. By mail The Economist would arrive sometimes on Friday, often on Saturday, and disappointingly often on Monday, which left me without enough time to read a full issue before the next one arrived. The availability of an iPhone client, and the ability of the Kindle and the iPhone to sync last-read positions, makes it possible to read on the go without missing a beat. The overall result is that with the Kindle, I find myself reading more.
I'd like to take a moment to emphasize this last point, and note that the same has been true every time a medium has evolved, despite criticism from those who oppose or fear change. When the phonograph record gave way to the CD, many viewed it as a step backward for recorded music. They moaned that the digital CD could never capture the nuances of an analog record, and that the smaller packaging made album art less relevant. The truth is that the CD is capable of storing and playing back audio with a fidelity that comfortably exceeds most humans' ability to perceive it. The loss of a few square inches of medium for album art is regrettable, but it was never something important to the experience of music, much less central to it. Besides affording listeners higher fidelity, the slimmer, smaller CD enabled listening to music not just in the home, but in the car and on the sidewalk and in the subway in a way that records or tapes never could. CD recorders let people make flawless copies of their collections for their cars or public transit commutes. Listening to music became a ubiquitous feature of life. I don't need to point out how this became even more true with the advent of MP3 encoding and devices like the iPod. It is far, far easier to count the people not wearing earbuds on the subway or bus than those who are. All this in the face of the exact same tired criticism from the same old critics.
As it was with music and CDs/MP3s, so it is and will be with books and eBooks. Yes, eBooks as they exist today have lower fidelity compared to paper. Devices like the Kindle 2 support only 16 shades of gray, and dealing with images and photographs is clunky. If anyone doubts that these problems will be solved in the next couple of years along with the inevitable march of technological progress, I'm prepared to back up my confident words with a wager. However, I doubt any readers would dare bet against this. And look at what even today's primitive readers allow: reading essentially your whole library at any time and place. No longer do I have to choose which single book to take with me on a trip, nor need I attempt to stuff a 600-page hardcover into my laptop bag to read on the bus. All the books I own and thousands that I don't are available to me in a convenient package. And even if I find myself waiting in a lobby for 20 minutes without my Kindle, I have my iPhone, where the book I'm reading is waiting for me, synced to the page I last read on my Kindle.
The Kindle represents more than just a cool device and a premium reading experience. I'm sorry if you like the smell of ink, or the texture of paper, or displaying your book collection on shelves as though they were trophies. The Kindle represents the beginning of a resurgence in reading, making books and newspapers and knowledge much easier for everyone to obtain. After all, that's what reading is about, right?
Zipcar
Zipcar offers members by-the-hour car rentals in urban areas. Scheduling is done online or via an iPhone app, and can be done mere minutes before you get the car. Members use a special magnetic card (or the iPhone app) to lock and unlock the car; keys and a gas card are inside. In most cases the cost is less than $10/hr, which gets you a standard compact like a Honda Civic, or a light utility vehicle like a Scion xB, and includes mileage, insurance ($500 deductible) and gas. In urban centers, garages containing zipcars are located every few blocks.
The result is that for people who live in urban areas that have fairly good public transportation and where car ownership is prohibitively expensive (parking in downtown San Francisco runs $500/month), Zipcar is an excellent option. It's perfect for me, since I visit San Francisco every few weeks.
Moreover, it changes the equation a bit for people who are deciding where to live. Although it is often more expensive to live in areas like downtown where good public transportation is available, if you can do away with the expense of owning a car, living closer to downtown becomes more viable. This is a net positive, since living closer to where you work, shop, and play puts less stress on both the environment and your pocketbook.
Netflix Watch Instantly Streaming
I've been a customer of Netflix since 2004. I've always thought their model for renting DVDs was almost perfect: huge selection, low hassle, very convenient, and affordable. In the past few years Netflix has been quietly transforming itself into a company that deals in streaming content as well as their traditional rent-by-mail service. It's heartening to see a company acknowledge the future and embrace change rather than fear and reject it.
The change I'm talking about is the diminishing importance of physical media for movies. Before Blu-ray even came out, pundits spoke of it as the last physical format for movies. For mass-market purposes, they are probably right. At normal viewing distances and screen sizes, 1080p Blu-ray discs are not too far off from the limit of human perception of detail in moving images. Certainly 1440p and higher will eventually come out, but the difference between that and what's currently available will be unnoticeable to most. In short, there is little compelling reason for another revolution in disc formats.
The advantages of delivering movies over the Internet are clear: cost, convenience, selection. The question is who's going to deliver the content, and how's it going to get onto the TV in my living room? Netflix wants to be the one to do that, and to some extent they already are.
Netflix now has a substantial catalog of titles available for streaming. Customers paying as little as $8.99 a month can stream unlimited content. Though a lot of the available content is cruft, I've noticed that more and more I'm able to find quality content. I've been watching Lost on Netflix streaming, available in HD to boot. There are tons of great movies available, from classics to new releases, though almost never any recent hits. But the quality and quantity have been moving relentlessly upward.
So, how does it end up on my TV? Because if it's just on my 13" laptop screen, it will never replace DVD, let alone Blu-ray. For his birthday, my roommate Eddy received a Blu-ray player with built-in Netflix streaming, a capability that is not uncommon among Blu-ray players (and also available on the Xbox 360 and PS3, already in millions of living rooms). The device has WiFi and can connect via my Netflix account to my "watch instantly" queue. In the several months that we've had the player, we haven't played a single Blu-ray disc, but we've watched at least a hundred hours of streaming Netflix.
I look forward to Netflix making more deals and expanding their TV and movie selection, as well as offering more titles in HD. Wave of the future, dude, 100% electronic.
Honorable Mention: Virgin America
It doesn't necessarily fit into the category of "innovative," but VA has certainly changed things for the better. With its fleet of new A319/A320 aircraft offering live TV and on-demand music, TV, and movies, flying VA is very comfortable. The in-flight entertainment system even lets you order food and drinks, and WiFi is available in-flight for about $10.
All this is nice, but it should be standard for new aircraft. The primary way VA has made things better is by giving other carriers some real competition. Previously, the best deals flying SEA/SFO were typically with Alaska, with its not-so-new fleet of 737s. A typical roundtrip ran $250-$300. VA flights can be found for as little as $39 one way (plus tax). A typical SEA/SFO roundtrip costs $110 with tax. This low-cost, comfortable and convenient flight has allowed me and my girlfriend to see each other quite often.
Gotta love a competitive market!
Sunday, December 20, 2009
Loose Dependency Injection
In the past year or so I've come to see the immense value in the principle of Inversion of Control (IoC)/Dependency Injection (DI) (see Fowler), and frameworks like Spring. Besides keeping classes and components isolated and focused in purpose, it also makes testing easier: instead of injecting real implementations, you can inject mocks into the component under test.
However, like any good idea, if taken to the extreme it becomes counterproductive. If everything a moderately complex class did was abstracted and injected, you would end up with a confusing and incoherent jumble of tiny classes. You would also risk exposing too much of the internals of a class by requiring any consumer of that class to create and inject pieces unrelated to the behavior the consumer wishes to dictate.
Let's make a simple example. Suppose we have a component that resizes an image, but in order to complete its work it needs to create a temporary file. Let's first take a look at an implementation that doesn't use DI.
public class ImageResizer {
    public File resizeImage(File image) throws IOException {
        File tmp = File.createTempFile("tmp", null);
        // do work on tmp ...
    }
}
Simple enough, but how are we going to test this? It uses a static method, which we don't own and can't change, to create the temporary file. We don't have any way to mock it or inspect it, so we're pretty much out of luck for testing it.
Now let's use a strict form of DI. We'll abstract the temporary file creation into a separate class, and require consumers to provide an implementation at construction time.
public class ImageResizer {
    /** Abstraction of temp file management */
    public static class TempFileFactory {
        File createTempFile() throws IOException {
            return File.createTempFile("tmp", null);
        }
    }

    private final TempFileFactory fileFactory;

    /** Dependency-injection constructor */
    public ImageResizer(TempFileFactory fileFactory) {
        this.fileFactory = fileFactory;
    }

    public File resizeImage(File image) throws IOException {
        File tmp = fileFactory.createTempFile();
        // do work on tmp ...
    }
}
Better. At least we can write a test that mocks TempFileFactory, injects the mock into the ImageResizer, and validates the interactions between ImageResizer and the temporary file. But now we've burdened consumers of ImageResizer -- which simply want to resize a file -- with the requirement of managing temporary files (by creating a TempFileFactory; alternatively we could have required consumers to inject a temporary file directly, which is probably even worse) and the awkward knowledge that ImageResizer uses temporary files. If we made a breakthrough in ImageResizer so that it no longer needed a temporary file, all the consumers would need to change their code.
So how do we get the benefits of testability and isolation without this downside? We still embrace the concept of DI, but use defaults to hide it from consumers, in what I call "Loose Dependency Injection":
public class ImageResizer {
    /** Package-private abstraction of temp file management */
    static class TempFileFactory {
        File createTempFile() throws IOException {
            return File.createTempFile("tmp", null);
        }
    }

    private final TempFileFactory fileFactory;

    /** Public constructor, injects its own dependency */
    public ImageResizer() {
        this.fileFactory = new TempFileFactory();
    }

    /** Package-private constructor for use by test */
    ImageResizer(TempFileFactory fileFactory) {
        this.fileFactory = fileFactory;
    }

    public File resizeImage(File image) throws IOException {
        File tmp = fileFactory.createTempFile();
        // do work on tmp ...
    }
}
Fundamentally we're still using DI; the difference is that there's only one implementation of the dependency, and it is "injected" by the default constructor. The consumer has no knowledge that the ImageResizer has anything to do with temporary files. ImageResizer could change to not use temporary files, and no client code would need to change. Tests for ImageResizer are easy to write because we can mock ImageResizer.TempFileFactory. The best of all worlds!
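To make the testing story concrete, here is a minimal sketch of what such a test might look like, assuming JUnit 4 and Mockito are available and the test class lives in the same package as ImageResizer so it can reach the package-private constructor and factory (the class and method names here are illustrative, not from the original post):

import static org.mockito.Mockito.*;

import java.io.File;
import org.junit.Test;

/** Lives in the same package as ImageResizer so it can see the package-private pieces. */
public class ImageResizerTest {

    @Test
    public void resizeImageObtainsItsTempFileFromTheFactory() throws Exception {
        // Mock the package-private dependency and hand back a stand-in file
        ImageResizer.TempFileFactory mockFactory = mock(ImageResizer.TempFileFactory.class);
        when(mockFactory.createTempFile()).thenReturn(new File("stand-in.tmp"));

        // Inject the mock through the package-private test constructor
        ImageResizer resizer = new ImageResizer(mockFactory);
        resizer.resizeImage(new File("input.jpg")); // placeholder input; the resize work itself is elided in the post

        // Validate the interaction between ImageResizer and the temp file machinery
        verify(mockFactory).createTempFile();
    }
}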
Wednesday, November 25, 2009
Scaling up is out, scaling out is in
One of the more interesting, if less visible, trends in the past half-decade has been that clock speeds on modern CPUs have stagnated. I'm writing this post on my Macbook, which turns one year old next week. It's equipped with a 2GHz processor and 2GB of RAM. It's the first computer I've bought since 2002, when I built a ~1.2GHz Athlon system with 1GB of RAM. Instead of a factor of 10 faster in 8 years, it's a factor of less than two, and I don't think we're going to see much more in terms of clock speed in the future. Check out the graph below:
Since around 2002 clock speeds have held steady at about 2GHz. The primary constraint has been thermal: as processors moved into the multi-GHz range they started to dissipate up to 100W of heat, which becomes impractical to cool (ever had your legs burned by your laptop?). "Scaling up" clock speeds had hit a wall, so hardware engineers had to find other ways of making things faster. They did increasingly clever things like superscalar execution (dispatching multiple instructions per clock cycle), new specialized instructions (SSE and friends), hyperthreading (a single physical core appearing as two logical processors to the OS), and then the logical conclusion: multi-core (multiple processor cores in a single package). Performance now comes from "scaling out" to multiple cores, and if you're running a service, multiple machines.
The consequence of this shift from faster clocks to more processors is that after decades of sitting on their asses waiting for the next doubling of clock speed to make up for lazy coding, software engineers actually have to write code differently to get it to run fast. This could mean traditional optimization: rewriting existing code to run faster without fundamentally changing the approach to the problem. But increasingly it means embracing the way hardware is evolving and writing code that takes advantage of multiple cores by splitting the problem into independent pieces that can be executed simultaneously.
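As a minimal, hypothetical illustration of that last point (this is not code from Kikini), here is one way a problem that splits into independent pieces -- summing a large array -- could be spread across cores with Java's ExecutorService:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelSum {
    public static long sum(final long[] data) throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            int chunk = (data.length + cores - 1) / cores;
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            // Split the work into independent pieces, roughly one per core
            for (int i = 0; i < data.length; i += chunk) {
                final int start = i;
                final int end = Math.min(i + chunk, data.length);
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long s = 0;
                        for (int j = start; j < end; j++) {
                            s += data[j];
                        }
                        return s;
                    }
                }));
            }
            // Combine the partial results
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}

Each chunk is summed independently on its own core and the partial results are combined at the end; most multi-core speedups have this same divide-and-combine shape.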
To some degree the service we're building at Kikini can naturally take advantage of multiple cores, since we're serving many simultaneous requests. However, due to the transactional nature of databases, there is a limit to how much performance you can get by simply adding cores. Write operations require locks, which block or abort other transactions, so even with infinite cores you'd still be constrained by how you design your database.
All this points to three main ways to achieve high performance:
- Optimize individual queries
- Design queries and the database schema to minimize locking to take advantage of multiple cores
- Partition data in clever ways to spread the load across multiple servers
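To make the last point a bit more concrete, here is a rough, hypothetical sketch (not Kikini's actual scheme) of spreading data across multiple database servers by hashing a partition key to choose a shard:

import java.util.List;

public class ShardRouter {
    private final List<String> shardJdbcUrls;

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    /** Pick the database shard responsible for a given user. */
    public String shardFor(String userId) {
        // Math.abs can overflow for Integer.MIN_VALUE, so mask the sign bit instead
        int bucket = (userId.hashCode() & 0x7fffffff) % shardJdbcUrls.size();
        return shardJdbcUrls.get(bucket);
    }
}

Each shard then sees only a fraction of the write traffic (and the locking that comes with it); the trade-off is that queries spanning many users now have to touch more than one server.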
Sunday, November 22, 2009
Working Around JSVC's Logging Limitations
JSVC is a popular option for people using Tomcat as their web container. The main advantage of JSVC is that it allows a process to start as root (necessary on most Linux systems to bind a port below 1024) and then downgrade to an unprivileged user; it also acts as a watchdog that restarts the JVM if it crashes. However, one big problem with JSVC is that it can only write the output of the JVM it hosts to two files on the filesystem, corresponding to stdout and stderr. This is problematic because it doesn't allow for log rotation or any other form of redirection.
At Kikini, we created a logging solution that appends log statements to SimpleDB, so logs from all our machines end up in a central location, unbounded by normal filesystem limits and easy to query and monitor, allowing us to diagnose problems quickly. The simplest way to use our logger is to redirect the output of the target process to the stdin of our logging process. However, JSVC makes this rather difficult since it is hard-coded to write only to files on the filesystem.
Fortunately we have a trick up our sleeve in the form of UNIX named pipes, which we can use as targets for JSVC to write to and sources for the logger to read from:
mkfifo pipe.out
mkfifo pipe.err
/usr/bin/startlogger.sh STDOUT < pipe.out
/usr/bin/startlogger.sh STDERR < pipe.err
/usr/bin/jsvc -outfile pipe.out -errfile pipe.err ...

Now JSVC will start up and write into the pipes we created, which will be redirected into the mylogger processes.
Friday, November 13, 2009
Using Maven Chronos Without an External JMeter Install
Performance is one of the things we're really focused on at Kikini. But we want to spend our time actually improving performance, not burning cycles on manual measurements and log interpretation. JMeter is probably the best open-source tool out there for measuring the performance of a web application, and I designed a JMeter test plan to simulate users visiting our site. Unfortunately, while JMeter is great at making measurements, it stops short of data analysis and reporting.
Ideally we would like to get perf reports out of every build, which means doing the reporting as part of our Maven build, with results available as easily readable charts on our build server. The top hit you're likely to get from searching for "maven jmeter" is the awful JMeterMavenPlugin. I say awful because it wasn't easy to integrate, and if you look at the source code it's obvious the project was done in spare time. There are a number of comments in the source like "this mess is necessary because...", which makes me think the whole thing is poorly designed; and if you search around you will indeed find a number of problems people have encountered trying to use it. Finally, the output from the plugin is just the plain JMeter log, not the reports I'd like.
All the way down in the middle of the second page of the Google results I found this gem: chronos-maven-plugin. Not only does this look like a well-designed and well-executed project, it produces wonderful HTML reports, perfect for plugging into our build server! This is a snippet of what the Chronos output looks like:
The only downside is that the Chronos plugin requires an external install of JMeter, which kind of defeats the whole purpose of Maven. Fortunately, inspired by an Atlassian post, I worked out a way to use the Chronos plugin without a manual JMeter install by using the maven-dependency-plugin. First I deployed the JMeter ZIP file as an artifact in our Artifactory repository:
<?xml version="1.0" encoding="UTF-8"?>
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.jmeter</groupId>
  <artifactId>jmeter</artifactId>
  <version>2.3.4</version>
  <packaging>zip</packaging>
  <description>Artifactory auto generated POM</description>
</project>
In my POM, I set jmeter.home to the location that we'll be unpacking JMeter into:
<properties>
  <jmeter-version>2.3.4</jmeter-version>
  <jmeter.home>${project.build.directory}/jakarta-jmeter-${jmeter-version}</jmeter.home>
</properties>
Next I use the dependency plugin in the pre-integration-test phase to unpack JMeter into the target folder:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>2.1</version>
  <executions>
    <execution>
      <id>unpack-jmeter</id>
      <phase>pre-integration-test</phase>
      <goals>
        <goal>unpack</goal>
      </goals>
      <configuration>
        <artifactItems>
          <artifactItem>
            <groupId>org.apache.jmeter</groupId>
            <artifactId>jmeter</artifactId>
            <version>${jmeter-version}</version>
            <type>zip</type>
          </artifactItem>
        </artifactItems>
        <outputDirectory>${project.build.directory}</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
Finally I configure Chronos to run:
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>chronos-maven-plugin</artifactId>
  <version>1.0-SNAPSHOT</version>
  <configuration>
    <input>${basedir}/src/test/jmeter/UserSession.jmx</input>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>jmeter</goal>
        <goal>savehistory</goal>
      </goals>
    </execution>
  </executions>
</plugin>
Bingo. Now anyone running our build can get the JMeter performance reports with nothing more complex than running "mvn verify chronos:report".