Random Tech Stuff, by Gabe Nell<br />
<br />
<b>Seamless, Secure Routing to Blocked Sites</b> (2012-01-25)<br />
Where I live, Internet access is restricted. Specific sites are blocked, as are sections of some sites, depending on the content. Blocking is done primarily in two ways:<br />
<ul><li>False DNS results<br />
<li>Broken connections, including unacknowledged SYNs, high packet loss, and injected RSTs.<br />
</ul>With access to a computer outside the country (such as a micro instance on EC2), these problems may be overcome by using the outside computer as a proxy. One simple solution is to use SSH as a SOCKS proxy. This is what I had been using for about a year. It works fine, but not all devices can use a SOCKS proxy, and performance is degraded by layering TCP inside TCP. <p>As a weekend project, I decided to implement a more sophisticated system, with the following goals: <ul><li>Only route data destined for blocked networks through the proxy so that domestic sites remain fast<br />
<li>All devices connected to my home network should automatically be able to access blocked sites with zero configuration<br />
</ul>With these goals in mind, I designed the following system: <p><img align="center" src="http://static.gabenell.com/blog/201201/routing.png"> <p><img align="center" src="http://static.gabenell.com/blog/201201/dns.png"> <ul><li>Router maintains a secure VPN tunnel to the proxy (OpenVPN)<br />
<li>Routing table has entries for all IP blocks owned by blocked sites, routes this traffic over the tunnel<br />
<li>Local DNS server is configured to forward requests for blocked domains to Google's public DNS server over the VPN tunnel<br />
<li>Local DNS server forwards other requests to the ISP's DNS server<br />
</ul>My router is a <a href="http://www.mikrotik.com/">Mikrotik RouterOS</a> device, which supports <a href="http://openvpn.net/">OpenVPN</a> tunnels. I configured a VPN tunnel at the router level to a Linux instance on EC2. This is pretty standard, so I'll skip the details. <p>Setting up the routing table is where things get a little tricky. Fortunately the blocked sites are all rather large companies, and each has its own AS number. Using radb it is possible to query which address blocks are advertised by an ASN. I wrote a script to parse this information and print out router commands for each block: <pre class="brush:bash">#!/bin/bash
if [ $# -ne 2 ]; then
    echo "Usage: `basename $0` [AS number] [friendly name]"
    exit 1
fi
asNumber=$1
asName=$2
# Query radb for all prefixes advertised by the ASN, strip the response
# header and footer lines, put each prefix on its own line, de-duplicate,
# and emit one RouterOS route command per prefix.
whois -h whois.radb.net "!gAS$asNumber" | head -n -1 | awk 'NR>1' | tr -d '\n' | tr ' ' '\n' | sort | uniq | tr '\n' ' ' | awk "{for(i=1;i<=NF;i++)print \"add dst-address=\"\$i\" gateway=ovpn-out1 comment=\\\"AS$asNumber $asName\\\"\"}"
</pre>
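For example, to generate the route commands for Twitter (assuming the script above is saved as <tt>asroutes.sh</tt>, a name I've made up):
<pre class="brush:bash">./asroutes.sh 13414 Twitter
</pre>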
Here are the results:
<pre class="brush:bash">add dst-address=199.16.156.0/22 gateway=ovpn-out1 comment="AS13414 Twitter"
add dst-address=199.59.148.0/22 gateway=ovpn-out1 comment="AS13414 Twitter"
add dst-address=24.75.96.0/21 gateway=ovpn-out1 comment="AS13414 Twitter"
</pre>I also did the same for Facebook (AS32934) and Google (AS15169), both of which have considerably more blocks. Unfortunately the data from radb contains redundant blocks: for example, it sometimes lists both a /23 and a /24 that falls inside it, making the /24 useless. The total number of routes is a bit over a thousand, which the Mikrotik can handle easily, but with some deduplication logic or a better data source it could be shrunk to a few hundred, as sketched below.
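<p>One low-effort way to do that deduplication (a sketch, assuming the <tt>aggregate</tt> utility is available; as far as I know it's packaged under that name on Debian/Ubuntu) is to merge the prefixes before generating the router commands:
<pre class="brush:bash"># Fetch the prefixes as before, then collapse overlapping and adjacent
# CIDR blocks into a minimal covering set before printing route commands
whois -h whois.radb.net "!gAS13414" | head -n -1 | awk 'NR>1' | tr ' ' '\n' | sort -u | aggregate
</pre>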
<p>That was the hard bit. Next up is DNS. I installed <a href="http://unbound.net/">Unbound</a> and configured it to forward to <a href="http://code.google.com/speed/public-dns/">Google's public DNS</a> server (8.8.8.8) for the blocked domains (which, recall, will be routed over the VPN tunnel and so not subject to poisoning), and to my ISP's DNS server for the rest:
<pre class="brush:bash">server:
    verbosity: 1
    interface: 0.0.0.0
    access-control: 0.0.0.0/0 allow
    msg-cache-size: 16m
    rrset-cache-size: 32m
    chroot: ""

forward-zone:
    name: "facebook.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "fbcdn.net"
    forward-addr: 8.8.8.8

forward-zone:
    name: "facebook.net"
    forward-addr: 8.8.8.8

forward-zone:
    name: "facebook.com.edgekey.net"
    forward-addr: 8.8.8.8

forward-zone:
    name: "facebook.com.edgesuite.net"
    forward-addr: 8.8.8.8

forward-zone:
    name: "akamaiedge.net"
    forward-addr: 8.8.8.8

forward-zone:
    name: "google.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "gmail.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "googleusercontent.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "blogspot.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "blogger.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "youtube.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "twitter.com"
    forward-addr: 8.8.8.8

forward-zone:
    name: "."
    forward-addr: 192.168.88.1
</pre>And we're done. Now any device can log onto the wifi in my apartment and have unrestricted access to all the normally blocked sites, while still having fast connectivity to domestic sites.<br />
<br />
<b>Connection keep-alive timeouts for popular browsers</b> (2010-11-04)<br />
Recently I needed to know how long the popular browsers will hold an HTTP keep-alive connection open before closing it. I was able to find documented values for IE and Firefox, but not for the other browsers. In fact I couldn't even find much in the way of anecdotes. So for the other browsers I decided to find out myself by testing against a Tomcat server configured with an hour-long keep-alive timeout. I then used each browser to make a single request and observed the TCP streams in Wireshark. Here are the results:<br />
<br />
<ul><li><b>IE:</b> 60 seconds (<a href="http://support.microsoft.com/kb/813827">documentation</a>)<br />
<li><b>Firefox:</b> 300 seconds (<a href="http://kb.mozillazine.org/Network.http.keep-alive.timeout">documentation</a>)<br />
<li><b>Chrome:</b> 300 seconds (observed)<br />
<li><b>Safari:</b> 30 seconds (observed)<br />
<li><b>Opera:</b> 120 seconds (observed)<br />
</ul><br />
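For reference, the hour-long server-side timeout I tested against can be set on Tomcat's HTTP connector in <tt>server.xml</tt>. A sketch (the port and protocol values are placeholders; <tt>keepAliveTimeout</tt> takes milliseconds, and <tt>maxKeepAliveRequests="-1"</tt> removes the per-connection request cap):<br />
<br />
<pre class="brush: xml"><Connector port="8080" protocol="HTTP/1.1"
           keepAliveTimeout="3600000"
           maxKeepAliveRequests="-1" />
</pre><br />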
Note that for IE and Firefox these values are configurable by the user, and the developers behind the other browsers may change their timeouts in future releases.<br />
<br />
<b>Authoring multipart Ubuntu cloud-init configuration with Java</b> (2010-09-17)<br />
Canonical's wonderful <a href="http://uec-images.ubuntu.com/releases/10.04/release/">Amazon EC2 Images</a> come with a powerful configuration tool called <tt><a href="https://help.ubuntu.com/community/CloudInit">cloud-init</a></tt> that lets you pass configuration via user-data. One of its more interesting capabilities is that cloud-init accepts several different configuration payloads combined into a single message, using MIME as the system for aggregating parts.<br />
<br />
Below is an example of how to create a multipart configuration compatible with cloud-init using Java:<br />
<br />
<pre class="brush:java">import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class CloudInitMultipart {

    public static void main(String[] args) throws Exception {
        String config = "#cloud-config\n"
                + "mounts:\n"
                + " - [ sdf, /mnt/data, \"auto\", \"defaults,nobootwait\", \"0\", \"0\" ]\n\n"
                + "packages:\n"
                + " - emacs23-nox\n\n";

        MimeMultipart mime = new MimeMultipart();

        MimeBodyPart part1 = new MimeBodyPart();
        part1.setText(config, "us-ascii", "cloud-config");
        part1.setFileName("cloud-config.txt");

        MimeBodyPart part2 = new MimeBodyPart();
        String script = "#!/bin/bash\n\n"
                + "NOW=`date +%s`\n"
                + "touch /mnt/$NOW";
        part2.setText(script, "us-ascii", "x-shellscript");
        part2.setFileName("runme.sh");

        mime.addBodyPart(part1);
        mime.addBodyPart(part2);

        MimeMessage msg = new MimeMessage(Session.getDefaultInstance(new Properties()));
        msg.setContent(mime);
        msg.writeTo(System.out);
    }
}
</pre><br />
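This will create a multipart configuration combining a cloud-config part, which installs emacs and creates an fstab entry, with a shell script part that creates a file. The output can then be used as the user-data when launching an EC2 instance. A hypothetical invocation (it assumes the JavaMail jars are on the classpath and the EC2 API tools are installed; the AMI ID is a placeholder):<br />
<br />
<pre class="brush:bash"># Write the MIME message to a file, then pass it as user-data
java CloudInitMultipart > user-data.mime
ec2-run-instances ami-xxxxxxxx -f user-data.mime
</pre><br />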
<br />
<b>How to Build Terracotta from Source</b> (2010-09-07)<br />
It seems that the folks at Terracotta have decided to make it nearly impossible to download any version older than the current one. As is common in real-world applications, sometimes it is desirable to stay a little behind the bleeding edge, because you know that what you've got works for what you're doing. Terracotta has made things more difficult than usual by holding back a critical fix for a compatibility issue between Java 1.6.0_20 and Terracotta 3.2.0. <a href="http://forums.terracotta.org/forums/posts/list/3823.page">The fix is in version 3.2.2, which is only available to customers with a support contract with Terracotta.</a><br />
<br />
So, I'll show you how to build 3.2.2 from source. It's a little trickier than implied in the above-linked thread, and the <a href="http://www.terracotta.org/confluence/display/devdocs/Building+Terracotta">Terracotta Build Page</a> doesn't explain it all.<br />
<br />
First, we need to check out the 3.2.2 source code:<br />
<br />
<pre class="brush: plain">svn co http://svn.terracotta.org/svn/tc/dso/tags/3.2.2</pre><br />
Next, set up some required environment variables. Terracotta needs to know where your JRE, JDK, and Ant live. The following locations worked for me on my Ubuntu 10.04 install with Sun's Java 6; substitute the locations for your OS/Java distro:<br />
<br />
<pre class="brush: plain">export ANT_HOME=/usr/share/ant
export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre
export JAVA_HOME_16=/usr/lib/jvm/java-6-sun
</pre><br />
In my case I only have Java 6 and don't care about previous versions, so we need to instruct the Terracotta build system not to try to use older releases. Modify the file <tt>3.2.2/code/base/jdk.def.yml</tt> to comment out the Java 1.5 section:<br />
<br />
<pre class="brush: bash">#
# All content copyright (c) 2003-2006 Terracotta, Inc.,
# except as may otherwise be noted in a separate copyright notice.
# All rights reserved
#
# Defines the various JDKs used by the build system.
#
# Each JDK specification begins with a name that uniquely identifies
# the JDK version. After the name come the following attributes:
#
#   min_version: The minimum JDK version
#   max_version: The maximum JDK version
#   env: A list of names of configuration properties that the build system
#        uses to locate the JDK installation
#   alias: A list of alternative names for the JDK

#J2SE-1.5:
#  min_version: 1.5.0_0
#  max_version: 1.5.999_999
#  env:
#    - J2SE_15
#    - JAVA_HOME_15
#  alias:
#    - tests-1.5
#    - "1.5"

JavaSE-1.6:
  min_version: 1.6.0_0
  max_version: 1.6.999_999
  env:
    - JAVASE_16
    - JAVASE_6
    - JAVA_HOME_16
  alias:
    - tests-1.6
    - "1.6"
</pre><br />
OK, now we're ready to build. Here's what I used to build the core Terracotta distribution using the Sun JDK. You may need to adjust the <tt>jdk</tt> parameter to match the location of your JDK.<br />
<br />
<pre class="brush: plain">cd 3.2.2/code/base
./tcbuild --no-extra dist dso OPENSOURCE jdk=/usr/lib/jvm/java-6-sun</pre><br />
The build will download all its dependencies and compile the Terracotta release. Note that this is the core distribution only; it does not build TIMs or anything like that. Once the build is complete, there will be a new folder:<br />
<br />
<pre class="brush: plain">3.2.2/code/base/build/dist</pre><br />
This contains the Terracotta distribution that you would otherwise have downloaded.<br />
<br />
<b>BASH Script Self-Destructs, grep to the Rescue</b> (2010-07-05)<br />
I was working on a bash script and periodically testing it out. It had gotten somewhere into the 30-40 line range when I made a fatal error. I added an <tt>if</tt> statement that looked something like this:<br />
<br />
<pre class="brush: plain">var1=0
var2=$0
if [ $var1 > $var2 ]; then
    echo "true"
else
    echo "false"
fi
</pre><br />
I had actually made two mistakes. First, I meant for <tt>var2</tt> to have a value of <tt>$?</tt>, which is the exit value of the last command, not the path of the script itself, which is what <tt>$0</tt> evaluates to. Second, I used the <tt>></tt> operator instead of <tt>-gt</tt> as required by bash. So instead of performing a comparison, the <tt>if</tt> statement redirected the (empty) output of the <tt>[</tt> command into the file containing my script, truncating it! After running this code, your script self-destructs into a 0-byte file. I was particularly annoyed because writing a decent-sized bash script is a meticulous process, and one that I'm obviously not expert at, so I was looking at a good half-hour of lost work.<br />
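For the record, here's what I had meant to write (corrected to use <tt>$?</tt> and <tt>-gt</tt>):<br />
<br />
<pre class="brush: plain">var1=0
var2=$?
if [ $var1 -gt $var2 ]; then
    echo "true"
else
    echo "false"
fi
</pre><br />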
<br />
Arcane UNIX nonsense had gotten me into this mess; I figured it could get me out as well. My file was very likely still sitting on some sectors somewhere on the disk. I knew my script contained the word "COLLECTD_SOCKET", which wasn't likely to appear anywhere else on the drive. So I unmounted the filesystem (on device /dev/sdf) and ran the following command:<br />
<br />
<pre class="brush: plain">grep -A40 -B10 COLLECTD_SOCKET /dev/sdf</pre><br />
What this does is search the raw contents of the entire drive, at the device level, for the term "COLLECTD_SOCKET", and print the 10 lines before and the 40 lines after each match. It took a little while (as you'd expect for reading the whole device), but I found a number of old versions of the script I was working on, including the version from just before my bug caused it to self-destruct.<br />
<br />
I guess the lesson here is that UNIX gives you lots of ammunition to shoot yourself with, but it also gives you plenty of gauze to patch yourself up afterwards.<br />
<br />
<b>Setting up Collectd Collection3 on Ubuntu Lucid 10.04</b> (2010-06-03)<br />
Unfortunately the <a href="http://collectd.org/wiki/index.php/First_steps#Creating_graphs">wiki</a> on how to set up collection3 is not that great. In particular it glosses over how to configure Apache. But if you're running Ubuntu Lucid 10.04, it's actually pretty easy to set up collectd and collection3. I'll walk you through the steps.<br />
<br />
First, you'll need to install the needed dependencies:<br />
<br />
<pre class="brush: plain">sudo apt-get update -y
sudo apt-get install -y apache2 libconfig-general-perl librrds-perl libregexp-common-perl libhtml-parser-perl collectd-core
</pre><br />
Then we need to configure collectd to sample some data and store it as RRDs. Drop this file in <tt>/etc/collectd/collectd.conf</tt>:<br />
<br />
<pre class="brush: plain">LoadPlugin cpu
LoadPlugin load
LoadPlugin memory
LoadPlugin disk
LoadPlugin rrdtool
<Plugin rrdtool>
    DataDir "/var/lib/collectd/rrd/"
</Plugin>
</pre><br />
Next we configure Apache to serve collection3. Copy this file into <tt>/etc/apache2/conf.d/collection3.conf</tt>:<br />
<br />
<pre class="brush: plain">ScriptAlias /collectd/bin/ /usr/share/doc/collectd-core/examples/collection3/bin/
Alias /collectd/ /usr/share/doc/collectd-core/examples/collection3/
<Directory /usr/share/doc/collectd-core/examples/collection3/>
    AddHandler cgi-script .cgi
    DirectoryIndex bin/index.cgi
    Options +ExecCGI
    Order Allow,Deny
    Allow from all
</Directory>
</pre><br />
Now let's reload Apache and start collectd:<br />
<br />
<pre class="brush: plain">sudo /etc/init.d/apache2 reload
sudo /etc/init.d/collectd start
</pre><br />
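To confirm that collectd is writing data, you can look inside the configured DataDir; a subdirectory named after the host should appear, containing one folder of RRD files per enabled plugin (this path is an assumption based on collectd's usual per-host layout):<br />
<br />
<pre class="brush: plain">ls /var/lib/collectd/rrd/
</pre><br />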
It'll take collectd a minute to gather enough data to usefully graph. Then you can point your browser to <tt>http://your.host.name/collectd/</tt><br />
<br />
And you'll be able to graph data!<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://gabenell.s3.amazonaws.com/blog/images/collection3_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="288" src="http://gabenell.s3.amazonaws.com/blog/images/collection3_1.png" width="320" /></a><a href="http://gabenell.s3.amazonaws.com/blog/images/collection3_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://gabenell.s3.amazonaws.com/blog/images/collection3_2.png" width="280" /></a></div><br />
<br />
Note 1: You may need to choose "hour" from the pulldown if you just started collectd, since it doesn't have enough data to graph a full day yet.<br />
Note 2: The Apache configuration is not secure; anyone could just navigate to your machine and see those graphs. Use SSL/.htaccess or other methods to lock down access.<br />
<br />
<b>Migrating Unfuddle Tickets to JIRA</b> (2010-05-05)<br />
I found myself needing to migrate bugs from <a href="http://www.unfuddle.com/">Unfuddle</a>, which exports them in a custom XML format, to <a href="http://www.atlassian.com/software/jira/">JIRA</a>, which can import CSV (<a href="http://confluence.atlassian.com/display/JIRA/Importing+Data+From+CSV">documentation</a>). I threw together a quick Java class to help me do this. It takes the backup.xml generated by Unfuddle and creates a CSV which can be read by JIRA. It imports the following fields:<br />
<ul><li>Summary<br />
</li>
<li>Status<br />
</li>
<li>Description<br />
</li>
<li>Milestone (as a custom JIRA field)<br />
</li>
<li>Assignee<br />
</li>
<li>Reporter<br />
</li>
<li>Resolution (if resolved)<br />
</li>
<li>Resolution description (as a comment)<br />
</li>
<li>Creation time<br />
</li>
<li>Resolved time (if resolved)<br />
</li>
</ul>Furthermore it outputs the bugs in the order of the ID in Unfuddle, so that if you're importing into an empty JIRA project, the bugs will have the same number as in Unfuddle. It assumes the JIRA usernames correspond to Unfuddle usernames, though you can easily map differences by modifying the <tt>lookupUser</tt> function. Once you generate the CSV, you can give the configuration file below to the JIRA CSV Import wizard to take care of the mappings. You'll want to update <br />
<ul><li>existingprojectkey<br />
</li>
<li>user.email.suffix<br />
</li>
</ul>to match your project. There are a few notable things that are missed with this tool: <br />
<ul><li>Time of day for creation/resolved<br />
</li>
<li>Comments<br />
</li>
</ul>The tool should run without modification and requires only Joda Time as a dependency under JDK 1.6. This is total slapdash, quick-n-dirty, git-er-done code for a one-off conversion. If anyone would like to extend this tool or generalize it, that would be great :)<br />
<br />
Java class <a href="http://static.gabenell.com/blog/201005/UnfuddleToJira.java">UnfuddleToJira.java</a>: <br />
<br />
<pre class="brush:java">// Original author Gabe Nell. Released under the Apache 2.0 License
// http://www.apache.org/licenses/LICENSE-2.0.html

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class UnfuddleToJira {

    private static final DateTimeFormatter DATE_FORMATTER = DateTimeFormat.forPattern("yyyyMMdd");

    private final Document doc;
    private final PrintStream output;
    private final Map<String, String> milestones;
    private final Map<String, String> people;

    public UnfuddleToJira(Document doc, PrintStream output) {
        this.doc = doc;
        this.output = output;
        this.milestones = parseMilestones(doc);
        this.people = parsePeople(doc);
    }

    private static Map<String, String> parseMilestones(Document doc) {
        Map<String, String> milestones = new HashMap<String, String>();
        NodeList milestoneNodes = doc.getElementsByTagName("milestone");
        for (int i = 0; i < milestoneNodes.getLength(); i++) {
            Element elem = (Element)milestoneNodes.item(i);
            String title = elem.getElementsByTagName("title").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            milestones.put(id, title);
        }
        System.out.println("Found " + milestones.size() + " milestones: " + milestones);
        return milestones;
    }

    private static Map<String, String> parsePeople(Document doc) {
        Map<String, String> people = new HashMap<String, String>();
        NodeList peopleNodes = doc.getElementsByTagName("person");
        for (int i = 0; i < peopleNodes.getLength(); i++) {
            Element elem = (Element)peopleNodes.item(i);
            String name = elem.getElementsByTagName("username").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            people.put(id, name);
        }
        System.out.println("Found " + people.size() + " people: " + people);
        return people;
    }

    private static String prepareForCsv(String input) {
        if (input == null) return "";
        return "\"" + input.replace("\"", "\"\"") + "\"";
    }

    private static String convertDate(String input) {
        return DATE_FORMATTER.print(new DateTime(input));
    }

    private String lookupUser(String id) {
        String person = people.get(id);
        /**
         * Here you can transform a person's username if it changed between
         * Unfuddle and JIRA. Eg: <tt>
         * if ("gabe".equals(person)) {
         *     person = "gabenell";
         * }
         * </tt>
         */
        return person;
    }

    private String lookupMilestone(String id) {
        return milestones.get(id);
    }

    private void writeCsvHeader() {
        StringBuilder builder = new StringBuilder(256);
        builder.append("Summary, ");
        builder.append("Status, ");
        builder.append("Assignee, ");
        builder.append("Reporter,");
        builder.append("Resolution,");
        builder.append("CreateTime,");
        builder.append("ResolveTime,");
        builder.append("Milestone,");
        builder.append("Description");
        output.println(builder.toString());
    }

    private void writeCsvRow(Ticket ticket) {
        StringBuilder builder = new StringBuilder(256);
        builder.append(prepareForCsv(ticket.summary)).append(", ");
        builder.append(prepareForCsv(ticket.status)).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.assigneeId))).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.reporterId))).append(", ");
        builder.append(prepareForCsv(ticket.resolution)).append(", ");
        builder.append(prepareForCsv(convertDate(ticket.createdTime))).append(", ");
        String resolveTime = ticket.resolution != null ? convertDate(ticket.lastUpdateTime) : null;
        builder.append(prepareForCsv(resolveTime)).append(", ");
        builder.append(prepareForCsv(lookupMilestone(ticket.milestoneId))).append(", ");
        builder.append(prepareForCsv(ticket.description));
        // JIRA doesn't have the notion of a resolution description, add it as a
        // comment
        if (ticket.resolutionDescription != null) {
            builder.append(",").append(prepareForCsv(ticket.resolutionDescription));
        }
        output.println(builder.toString());
    }

    public void writeCsv() throws Exception {
        NodeList ticketNodes = doc.getElementsByTagName("ticket");
        List<Ticket> tickets = new ArrayList<Ticket>();
        for (int i = 0; i < ticketNodes.getLength(); i++) {
            Node node = ticketNodes.item(i);
            Element nodeElem = (Element)node;
            Ticket ticket = new Ticket();
            NodeList ticketElements = nodeElem.getChildNodes();
            for (int j = 0; j < ticketElements.getLength(); j++) {
                Node ticketSubNode = ticketElements.item(j);
                String nodeName = ticketSubNode.getNodeName();
                if ("id".equals(nodeName)) {
                    ticket.id = ticketSubNode.getTextContent();
                } else if ("status".equals(nodeName)) {
                    ticket.status = ticketSubNode.getTextContent();
                } else if ("summary".equals(nodeName)) {
                    ticket.summary = ticketSubNode.getTextContent();
                } else if ("description".equals(nodeName)) {
                    ticket.description = ticketSubNode.getTextContent();
                } else if ("milestone-id".equals(nodeName)) {
                    ticket.milestoneId = ticketSubNode.getTextContent();
                } else if ("assignee-id".equals(nodeName)) {
                    ticket.assigneeId = ticketSubNode.getTextContent();
                } else if ("reporter-id".equals(nodeName)) {
                    ticket.reporterId = ticketSubNode.getTextContent();
                } else if ("resolution".equals(nodeName)) {
                    ticket.resolution = ticketSubNode.getTextContent();
                } else if ("resolution-description".equals(nodeName)) {
                    ticket.resolutionDescription = ticketSubNode.getTextContent();
                } else if ("created-at".equals(nodeName)) {
                    ticket.createdTime = ticketSubNode.getTextContent();
                } else if ("updated-at".equals(nodeName)) {
                    ticket.lastUpdateTime = ticketSubNode.getTextContent();
                }
            }
            tickets.add(ticket);
        }
        System.out.println("Writing " + tickets.size() + " tickets...");
        // Output to CSV in order of ticket number
        writeCsvHeader();
        Collections.sort(tickets);
        for (Ticket ticket : tickets) {
            writeCsvRow(ticket);
        }
    }

    public static class Ticket implements Comparable<Ticket> {
        public String id;
        public String summary;
        public String status;
        public String description;
        public String milestoneId;
        public String assigneeId;
        public String reporterId;
        public String resolution;
        public String resolutionDescription;
        public String createdTime;
        public String lastUpdateTime;

        @Override
        public int compareTo(Ticket other) {
            return Integer.parseInt(id) - Integer.parseInt(other.id);
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        if (args.length != 2) {
            System.err.println("Usage: UnfuddleToJira /path/to/unfuddle/backup.xml /path/to/jira/output.csv");
            return;
        }
        String inputFilename = args[0];
        String outputFilename = args[1];
        PrintStream output = new PrintStream(new FileOutputStream(outputFilename), true, "UTF-8");
        UnfuddleToJira converter = new UnfuddleToJira(factory.newDocumentBuilder().parse(inputFilename), output);
        converter.writeCsv();
        output.close();
    }
}
</pre><br />
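To compile and run it (a hypothetical invocation; adjust the Joda Time jar name to match the version you have):<br />
<br />
<pre class="brush: plain">javac -cp joda-time-1.6.jar UnfuddleToJira.java
java -cp .:joda-time-1.6.jar UnfuddleToJira backup.xml output.csv
</pre><br />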
Configuration file: <br />
<br />
<pre class="brush:plain"># written by PropertiesConfiguration
# Wed May 05 07:12:57 UTC 2010
existingprojectkey = WEB
importsingleproject = false
importexistingproject = true
mapfromcsv = false
field.Resolution = resolution
field.Milestone = customfield_Milestone:select
field.Assignee = assignee
field.Summary = summary
field.Status = status
field.Description = description
field.Reporter = reporter
field.CreateTime = created
value.Status.closed = 6
value.Resolution.works_for_me = 5
value.Resolution.will_not_fix = 2
value.Status.new = 1
value.Status.reassigned = 1
value.Resolution.invalid = 4
value.Resolution.postponed = 2
value.Status.accepted = 3
value.Resolution.fixed = 1
value.Resolution.duplicate = 3
user.email.suffix = @kikini.com
date.import.format = yyyyMMdd
field.ResolveTime = resolutiondate
date.fields = CreateTime
date.fields = ResolveTime
</pre><br />
<br />
<b>Installing Sun Java 6 on Ubuntu 10.04 Lucid Lynx</b> (2010-04-20)<br />
It looked like Canonical was going to totally abandon Sun's JDK with the release of Lucid Lynx. After <a href="http://ubuntuforums.org/showthread.php?t=1406969">heated</a> <a href="https://bugs.launchpad.net/ubuntu/+source/sun-java6/+bug/515678">discussions</a>, however, it was instead merely tucked away even deeper in the recesses of alternative repositories. It now lives in a partner repository, so as root you'll need to run<br />
<br />
<pre>add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update</pre><br />
to add the appropriate repository. Now you can use <tt>apt-get install</tt> as before to install the <tt>sun-java6-jdk</tt> or <tt>sun-java6-jre</tt> packages. For those curious, this is the "official" way to do this according to the <a href="http://www.ubuntu.com/getubuntu/releasenotes/1004">release notes</a>.<br />
<br />
<b>Connecting to JMX on Tomcat 6 through a firewall</b> (2010-04-19)<br />
One of the flaws (in my opinion, and one <a href="http://forums.sun.com/thread.jspa?threadID=5389090">shared</a> by <a href="http://jmsbrdy.com/monitoring-java-applications-running-on-ec2-i">others</a>) of the design of JMX/RMI is that the server listens on a port for connections, and when one is established it negotiates a new secondary port to open on the server side and expects the client to connect to that. Well, OK, except that it picks an available port at random, and if your target machine is behind a firewall you're out of luck, because you don't know which port to open up!<br />
<br />
With the <a href="http://blogs.mulesoft.org/apache-releases-tomcat-6-0-2/">release</a> of Tomcat 6.0.24, a new Listener (the <tt><a href="http://tomcat.apache.org/tomcat-6.0-doc/config/listeners.html">JmxRemoteLifecycleListener</a></tt>) is available that lets you connect to JMX running on your Tomcat server using jconsole. Using this Listener you can specify the secondary port number instead of it being picked at random. This way, you can open two known ports on your firewall and jconsole will happily connect and read data from Tomcat's JVM over JMX.<br />
<br />
Setting it up is pretty easy. First, copy <tt>catalina-jmx-remote.jar</tt> from the <tt>extras</tt> folder of the <a href="http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.26/bin/extras/">binary distribution</a> into Tomcat's <tt>lib</tt> folder.<br />
<br />
Update your server.xml to include the Listener:<br />
<br />
<pre class="brush: xml"><Listener className="org.apache.catalina.mbeans.JmxRemoteLifecycleListener" rmiRegistryPortPlatform="10001" rmiServerPortPlatform="10002"/></pre><br />
Replace the ports with whichever ones you wish, and make sure to open up those ports on your firewall. Be sure to <a href="http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html">properly configure JMX</a> with authentication and SSL. Or, if you're just setting this up for testing, you can go with the totally insecure and unsafe configuration and add the following JVM arguments to your Tomcat startup script (typically <tt>CATALINA_OPTS</tt> or <tt>JAVA_OPTS</tt>):<br />
<br />
<pre>-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false</pre><br />
Now you can start Tomcat. On your client machine, start jconsole and drop in the following URL for your remote process:<br />
<br />
<pre>service:jmx:rmi://<b>your.public.dns</b>:10002/jndi/rmi://<b>your.public.dns</b>:10001/jmxrmi</pre><br />
Obviously you need to replace <i>your.public.dns</i> with the DNS address of your Tomcat machine, and if you chose different ports, change those as well. With some luck, you'll connect and be getting data!<br />
<br />
If you're on EC2 or a similar network where you have an internal DNS name that's different from your external/public DNS name, one more step is required. Additionally, set the following property to the server's external/public DNS name:<br />
<br />
<pre>-Djava.rmi.server.hostname=<b>your.public.dns</b></pre><br />
And with that bit of magic you should be off and collecting data!<br />
<br />
<table><tbody>
<tr><td align="center"><a href="http://gabenell.s3.amazonaws.com/blog/images/jmx_connect.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="http://gabenell.s3.amazonaws.com/blog/images/jmx_connect.png" width="200" /></a></td><td align="center"><a href="http://gabenell.s3.amazonaws.com/blog/images/jmx_data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="139" src="http://gabenell.s3.amazonaws.com/blog/images/jmx_data.png" width="200" /></a></td></tr>
</tbody></table><br />
<br />
<b>Optimizing PostgreSQL/Tomcat for Write-Heavy Workloads</b> (2010-04-18)<br />
Recently I've been working on tuning the performance of a Tomcat web front-end and a PostgreSQL back-end. In particular I wanted to stress some write-heavy scenarios, so I designed a JMeter test plan and ran it using Maven Chronos (<a href="http://gabenell.blogspot.com/2009/11/using-maven-chronos-without-external.html">as described in this post</a>). I also have <a href="http://collectd.org/">collectd</a> running on the machines, reporting various system metrics to a central server for graphing. This is essential for identifying which system resources are contributing to a performance bottleneck.<br />
<br />
In this post I don't want to get too hung up on the exact nature of the queries or the hardware configurations. Instead I'd like to focus on the investigative process itself. Let's start off by showing some graphs from a 20-minute stress run. The JMeter test plan is configured to ramp up linearly from 0 to 50 client threads over a 5-minute period, then continue for another 15 minutes:<br />
<table><tbody>
<tr><td><br />
<table><tbody>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput1.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput1.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response1.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response1.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist1.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist1.png" width="200" /></a></td> </tr>
</tbody></table></td></tr>
<tr><td><br />
<table align="center"><tbody>
<tr><td align="center">Postgres</td><td align="center">Tomcat</td></tr>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu1.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu1.png" width="300" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu1.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu1.png" width="300" /></a></td> </tr>
</tbody></table></td></tr>
</tbody></table><br />
Right away we notice that the stress run isn't really stressing either the Tomcat or Postgres machine. (Aside: in this post I'm only going to show CPU graphs. Obviously you need to look at other resources as well. However, for the purposes of this discussion, looking at CPU is enough to get the idea.) At first it might seem that we're not hitting the server hard enough. Maybe 50 client threads is too few? Yet as we can see from the throughput graph, overall throughput rises until we get to about 15 threads, and after that it is fairly flat. So this suggests that the problem is not with the test setup, but with something in the server configurations.<br />
<br />
Also notice that performance is actually pretty bad from a query response time perspective. The response times are all over the map, with a median around 300ms, a 95th percentile at about 1.2 seconds, and some queries lasting as long as 3.5 seconds. Ouch!<br />
<br />
The most suspicious thing to me is that throughput doesn't increase when more than about 15 threads are hitting the servers. Both Tomcat and PostgreSQL are designed to be extremely capable in high-volume environments; no way could 15 threads be maxing us out. The huge variance in response times implies that requests are being queued rather than handled right away. After running the test again, I logged into Postgres and ran <tt>SELECT count(*) FROM pg_stat_activity</tt> a few times during the run. There were never more than 8 connections to the database.<br />
<br />
As it turns out, 8 is the default maximum number of connections allowed by the Apache Commons Database Connection Pool (DBCP). In our case this looks to be the first culprit, and it explains why we never got any throughput increase beyond a small number of client threads and why response time variance was so high. So let's bump the maximum DBCP connections up to 50 and see what it looks like:<br />
<br />
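If the pool is defined as a JNDI DataSource in Tomcat's <tt>context.xml</tt>, the change amounts to a single attribute. A hypothetical fragment (the resource name, driver, URL, and credentials are placeholders; <tt>maxActive</tt> is the DBCP 1.x attribute name):<br />
<br />
<pre class="brush: xml"><Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/mydb"
          username="app" password="secret"
          maxActive="50" maxIdle="50"/>
</pre><br />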
<table><tbody>
<tr><td><br />
<table><tbody>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput2.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput2.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response2.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response2.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist2.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist2.png" width="200" /></a></td> </tr>
</tbody></table></td></tr>
<tr><td><br />
<table align="center"><tbody>
<tr><td align="center">Postgres</td><td align="center">Tomcat</td></tr>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu2.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu2.png" width="300" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu2.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu2.png" width="300" /></a></td> </tr>
</tbody></table></td></tr>
</tbody></table><br />
Nice! Not only did our throughput increase by about 50%, but the response times are more consistent and have fewer extremes. The throughput graph shows that our throughput increases until about 40 client threads, which is below the maximum number of database connections. This suggests contention for the connection pool is no longer a big issue. Also, the second core on the Postgres machine finally began to be utilized. Our system is spending less time queuing and more time working.<br />
<br />
But check out the high IO wait times on the Postgres machine: 20-30% of CPU time is spent waiting on IO to complete rather than doing useful processing. This seems like a really high proportion of time. As I mentioned at the beginning, the test plan I'm running has a relatively high amount of writes. Other metrics on the Postgres machine related to disk IO (not reproduced here) also showed that this was a likely bottleneck. So I set about researching how to improve my Postgres configuration for write performance. The following were valuable resources:<br />
<br />
<ul><li><a href="http://www.westnet.com/~gsmith/content/postgresql/">PostgreSQL Performance Pitstop</a></li>
<li><a href="http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server">Tuning your PostgreSQL Server</a></li>
<li><a href="http://www.postgresql.org/docs/8.4/static/runtime-config-wal.html">Write Ahead Log documentation</a></li>
</ul><br />
I played around with a number of parameters. The most important for this workload turned out to be those related to writing WAL files. In particular, changing the following parameters to these values had the biggest impact:<br />
<br />
<pre>synchronous_commit = off
full_page_writes = off
</pre><br />
The results:<br />
<br />
<table><tbody>
<tr><td><br />
<table><tbody>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput3.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/throughput3.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response3.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/response3.png" width="200" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist3.png"><img height="100" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/hist3.png" width="200" /></a></td> </tr>
</tbody></table></td></tr>
<tr><td><br />
<table align="center"><tbody>
<tr><td align="center">Postgres</td><td align="center">Tomcat</td></tr>
<tr> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu3.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/pgcpu3.png" width="300" /></a></td> <td><a href="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu3.png"><img height="250" src="http://gabenell.s3.amazonaws.com/blog/images/pgperf/tccpu3.png" width="300" /></a></td> </tr>
</tbody></table></td></tr>
</tbody></table><br />
Clearly a major improvement. Throughput increased by another 25%, and response times not only dropped by about 50% but are now very consistent. On the Postgres machine very little time is spent waiting on IO. In fact it looks like our throughput increased all the way up to 50 client threads, suggesting that if we increase the number of threads we'll see that the system is capable of even more.<br />
<br />
One odd thing about the last test results is the periodic drops to zero throughput. That's a mystery I'll solve for you in a future post.<br />
<br />
It's important to fully understand the impact of these settings before deciding to put them into production. The combination of these settings makes it possible to lose transactions and increases the chance of a corrupt WAL in the event of an OS crash or power failure (though not as much as turning off fsync). As such, configuring Postgres this way should only be done if you can tolerate or otherwise mitigate this possibility.<br />
<br />
<b>Generating Constant Bandwidth on Linux using fio</b> (2010-03-28)<br />
It took a lot of searching for me to find a way to generate network traffic at a specific rate between two hosts, so I thought I would share the answer. It's pretty easy to test the <i>available</i> bandwidth between two hosts by using <tt>netcat</tt> to transfer a bunch of random data as fast as the network allows. However, I wanted to test a resource monitoring and graphing system, which meant I needed to generate network traffic at a known rate so that I could judge the resulting graphs against my expectations.<br />
<br />
I found you can use <tt>fio</tt>, which is a generic I/O testing tool, to achieve this. <tt>fio</tt> allows specifying the transfer rate and also has a network engine. So using <tt>fio</tt> I can configure one host as the receiver and one as the sender and transfer data at a known rate. Here's what the config files look like:<br />
<br />
Sender <tt>jobfile.ini</tt>:<br />
<pre class="brush: plain">[test1]
filename=receiver.dns.name/7777
rw=write
rate=750k
size=100M
ioengine=net
</pre>Receiver <tt>jobfile.ini</tt>:<br />
<pre class="brush: plain">[test1]
filename=localhost/7777
rw=read
size=100M
ioengine=net
</pre>Obviously you would replace "receiver.dns.name" with the DNS name of the receiving host, and adjust the size and rate parameters as you like. (It's worth noting that the <tt>fio</tt> documentation is either wrong or misleading on what the filename should be for the receiver. It claims the receiver should only specify the port, but when I tried that it failed to run. Setting the host to localhost seemed to work and the receiver started listening.) To run the test, simply run:<br />
<pre class="brush: plain">fio jobfile.ini
</pre>first on the receiving host, then on the sending host. <tt>fio</tt> will then transfer 100 Megabytes of data at a rate of 750KB/sec between the two hosts. And we can see from the chart that indeed a constant rate was generated:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://gabenell.s3.amazonaws.com/blog/images/fio_bw_graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="196" src="http://gabenell.s3.amazonaws.com/blog/images/fio_bw_graph.png" width="400" /></a></div><br />
The observed rate is a bit above the specified 750KB/sec, but what's being measured is the total number of bytes passing through the eth0 interface. Since the data is transferred over TCP there is some per-packet protocol overhead, which I believe accounts for the extra few KB/sec observed.<br />
<br />
<b>How I learned to stop worrying and love Unit Testing</b> (2010-02-21)<br />
I admit it. Throughout my whole career at Microsoft, even as a Dev Lead, I was not a true believer in Unit Testing. That's not to say I didn't write unit tests or require my team to write tests. But I didn't believe that the benefits reaped from unit testing were sufficiently valuable given the time it took to write the tests (for me, about equal to the time to implement the product code itself).<br />
<br />
Now, post-Microsoft, I am a true believer. A zealot, even. I can't imagine a world in which I write code that doesn't have a ton of unit tests covering it.<br />
<br />
So what changed? My eyes have been opened to a development world in which real testing infrastructure exists. In my former role, what I used was a testing framework known as Tux, which ships with Windows CE. It was enhanced for Windows Mobile and given a usable GUI. The result was something like JUnit, i.e., a simple framework for defining test groups and specifying setup/teardown functions. The GUI was very much like the NUnit GUI.<br />
<br />
So far, so good. There's nothing wrong with this setup. However, a test-running framework is necessary but not sufficient for unit testing. The missing piece was a mocking infrastructure.<br />
<br />
One of the most frustrating things about working for Microsoft (and I'm sure the same is true of other big software firms) was that everything, and I do mean <i>everything</i>, had to be developed in-house. For legal reasons we couldn't even look at solutions available in the open source community. The predictable result was that a massive amount of effort was expended duplicating functionality that already existed elsewhere. In many cases the reality of product schedules and resource constraints meant that we simply had to do without certain functionality entirely. This was the case with mocking. Developers were left to create their own mocks manually, or figure out how to write a test without using mocks. I identified the lack of a mocking infrastructure as a major problem, but failed to do anything about it.<br />
<br />
<i>Exit Gabe, stage left, from Microsoft to <a href="http://www.kikini.com/">Kikini</a> and a world of open source.</i><br />
<br />
At Kikini we use JUnit for running tests and a simply beautiful component called <a href="http://mockito.org/">Mockito</a> for mocking. I cannot emphasize enough how wonderful Mockito is. Mockito uses Reflection to allow you to mock any class or interface with incredible simplicity:<br />
<br />
<pre class="brush: java">MyClass myInstance = mock(MyClass.class);</pre><br />
Done. The mocked instance implements all public methods with smart return values, such as <tt>false</tt> for booleans, empty Collections for Collections, and <tt>null</tt> for Objects. Specifying a return value for a specific call is trivial:<br />
<br />
<pre class="brush: java">when(myInstance.myMethod(eq("expected_parameter"))).thenReturn("mocked_result");</pre><br />
The semantics are so beautiful that I am certain that readers who have never heard of Mockito or perhaps have never even used a mocking infrastructure can understand what is happening here. When the method <tt>myMethod()</tt> is invoked on the mock, and the parameter is "expected_parameter", then the String "mocked_result" is returned. The only thing which may not be completely obvious is the <tt>eq()</tt>, which means that the parameter must <tt>.equals()</tt> the given value. The default rules still apply so that if a parameter other than "expected_parameter" is given, the default <tt>null</tt> is returned. <br />
<br />
Verifying an interaction took place on a mock is just as trivial:<br />
<br />
<pre class="brush: java">verify(myInstance).myMethod(eq("expected_parameter"));</pre><br />
If the method <tt>myMethod()</tt> was not invoked with "expected_parameter", an exception is thrown and the test fails. Otherwise, it continues.<br />
<br />
Sharp-eyed readers will note that the functionality described so far requires that <tt>equals()</tt> be properly implemented, and when dealing with external classes this is sometimes not the case. What then? Let's suppose we have an external class <tt>UglyExternal</tt>, it has a method <tt>complexStuff(ComplexParameter param)</tt>, and <tt>ComplexParameter</tt> does not implement <tt>equals()</tt>. Are we out of luck? Nope.<br />
<br />
<pre class="brush: java">UglyExternal external = mock(UglyExternal.class);
MyClass myInstance = new MyClass(external);
myInstance.doStuff();
ArgumentCaptor<ComplexParameter> arg = ArgumentCaptor.forClass(ComplexParameter.class);
verify(external).complexStuff(arg.capture());
ComplexParameter actual = arg.getValue();
// perform validation on actual
</pre><br />
This is really awesome. We're able to capture the arguments given to mocks and run whatever validation we like on the captured argument.<br />
<br />
Now let's get even fancier. Let's say we have an external component that does work as a side-effect of a function call rather than a return value. A common example would be a callback. Let's say we're using an API like this:<br />
<br />
<pre class="brush: java">public interface ItemListener {
    public void itemAvailable(String item);
}

public class ExternalClass {
    public void doStuff(ItemListener listener) {
        // do work and call listener.itemAvailable()
    }
}
</pre><br />
Now in the course of doing its job, our class <tt>MyClass</tt> will provide itself as a callback to <tt>ExternalClass</tt>. How can we mock the interaction of <tt>ExternalClass</tt> with <tt>MyClass</tt>?<br />
<br />
<pre class="brush: java">ExternalClass external = mock(ExternalClass.class);
doAnswer(new Answer() {
    @Override
    public Object answer(InvocationOnMock invocation) throws Throwable {
        Object[] args = invocation.getArguments();
        ItemListener listener = (ItemListener)args[0];
        listener.itemAvailable("callbackResult1");
        return null;
    }
}).when(external).doStuff((ItemListener)isNotNull());
</pre><br />
We use the concept of an Answer, which allows us to write code to mock the behavior of <tt>ExternalClass.doStuff()</tt>. In this case we've made it so that any time <tt>ExternalClass.doStuff()</tt> is called, it will invoke <tt>ItemListener.itemAvailable("callbackResult1")</tt>.<br />
<br />
There is even more functionality in Mockito, but in the course of writing hundreds of tests over the past 9 months I have never had to employ anything more advanced. I would say that only 1% of tests require the fancy Answer mechanism, about 5% require argument capturing, and the remainder can be done with the simple when/verify functionality.<br />
<br />
The truly wonderful thing, and the point of my writing this blog entry, is that a mocking infrastructure like Mockito enables me to write effective unit tests very quickly. I would say that I spend 25% or less of my development time writing tests. Yet with this small time investment I have a product code to test code ratio of 1.15, which means I write almost as much test code as product code.<br />
<br />
Even more important, the product code I write is perforce highly componentized and heavily leverages dependency injection and inversion of control, principles which are well known to improve flexibility and maintainability. With a powerful mocking infrastructure it becomes very easy, and in fact natural, to write small classes with a focused purpose, as their functionality can be easily mocked (and therefore ignored) when testing higher-level classes. I have always been told that writing for testability can make your product code better, but I never really understood that until I had the right testing infrastructure to take advantage of it.<br />
<br />
Now, I'm a believer.<br />
<br />
<b>A Taxonomy of Software Developers</b> (2010-02-07)<br />
After spending years of my previous life at Microsoft as a Dev, Tech Lead, and Dev Lead, I've worked with a broad range of software developers from the US, China, India, and all over the world. I've also been involved in interviewing well over a hundred candidates, and in many hiring (and some firing) decisions. From this I've come up with a taxonomy describing the characteristics of the various software developers I've encountered, how to spot them, and what to do with them.<br />
<br />
<span style="font-size: large;">Typical Developers</span><br />
<br />
The hallmark of a Typical Developer is a relatively narrow approach to problem solving. When fixing a bug, they concentrate on their immediate task with little regard to the larger project. When they declare the bug fixed, what that means is that the exact repro steps in the bug will no longer repro the issue. However, frequently in fixing the issue described in the bug, they have missed a larger root cause, or have broken something else in the system. This is illustrated in Fig. 1:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="http://static.gabenell.com.s3.amazonaws.com/blog/20100207/dev_typical.png"/></div><br />
In most cases the code a Typical Developer writes is a very small net improvement for the overall project when viewed from a release management perspective. Sometimes the traction is zero if the issue that they created is just as severe as the issue they fixed. Sometimes the traction is slightly positive if the issue they created or the case they missed is easier to fix than the original issue.<br />
<br />
When viewed from an engineering management perspective, however, the picture is very different. This is due to the nature of the approach Typical Developers take when actually writing code. A typical bug has the form "under condition X, the project behaves as Y, when it should behave as Z." The Typical Developer is very likely to fix the problem in this way:<br />
<br />
<pre class="brush: cpp">// adding parameter isX to handle a special case
void doBehavior(boolean isX) {
    // usually we want to do Y, but in this special case we should do Z.
    if (isX == true) {
        doBehaviorZ();
    } else {
        doBehaviorY();
    }
}
</pre><br />
The Typical Developer simply figures out how to directly apply logic to the code that determines behavior, then makes the code behave differently based on that. This is reasonable, but if it's the <i>only</i> way the developer can think of to change behavior, after a while of working in the same code it begins to look something like this:<br />
<br />
<pre class="brush: cpp">void doBehavior(boolean alternate, String data, File output, Enum state) {
    if (state == STATE_A) {
        doBehaviorA(data, alternate);
    } else if (state == STATE_B && !(alternate || data == null)) {
        doBehaviorB(output);
    } else {
        switch (state) {
            case STATE_B:
            case STATE_D:
                doBehaviorA(data, !alternate);
                // FALLTHROUGH!
            case STATE_C:
                doBehaviorC(output);
                if (alternate) {
                    doBehavior(!alternate, null, null, state);
                }
                break;
            default:
                // We should never get here!
                assert(false);
                break;
        }
    }
}
</pre><br />
When I see code after months of a Typical Developer working on it, this is my reaction:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="http://static.gabenell.com.s3.amazonaws.com/blog/20100207/The_Scream.jpg" /></div><br />
The Typical Developer will never take a step back and think "Hmm, we're getting a lot of these kinds of issues. Maybe the <i>structure</i> of our code is wrong, and we should refactor it to accommodate all the known requirements and make it easier to change." (For contrast, a sketch of such a refactoring follows.)<br />
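To make the contrast concrete, here is a hedged sketch of the structural fix that sentence imagines: give each state one home for its behavior, so a new requirement adds a constant instead of another flag. All names are illustrative, mirroring the caricature above rather than any real codebase.<br />
<br />
<pre class="brush: java">// Hypothetical refactoring of the doBehavior() tangle: behavior-per-state
// instead of nested conditionals and boolean parameters.
interface Behavior {
    void perform(String data);
}

enum State implements Behavior {
    STATE_A { public void perform(String data) { /* A's logic */ } },
    STATE_B { public void perform(String data) { /* B's logic */ } },
    STATE_C { public void perform(String data) { /* C's logic */ } },
    STATE_D { public void perform(String data) { /* D's logic */ } };
}

// Call sites shrink to a single, flag-free dispatch:
//     state.perform(data);
</pre><br />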
<br />
Now the project is in trouble. The team may be able to release the current version (often there is no alternative) after exhaustive manual testing, but the team can never be confident that they've fully tested all the scenarios. The first priority after releasing will be to remove all the code written by the Typical Developer and rewrite it from scratch.<br />
<br />
Another characteristic of Typical Developers is insufficient testing. Often the code they write will be difficult or impossible to unit test. If unit testing is a requirement, they'll write tests which are just as bad as their code. In other words the tests will be unreliable, require big changes to get passing when a small code change is made, and not test anything important. Furthermore the same narrow approach to development shows through in manual testing. The Typical Developer will follow the steps in the bug when testing their fix, and never stop to think "what other behavior could be impacted by my change?"<br />
<br />
Typical Developers are quite willing to chalk up their constant regressions and low quality to factors like "I'm working in legacy code" or "I'm not familiar with this area" or "the tools aren't good enough." Though all of those things may be true, that is the nature of software development, and Typical Developers don't understand how to change their environment for the better.<br />
<br />
The root cause behind these failings is most often that the Typical Developer is simply not cut out for real software development. Because the software industry is so deeply in need of talent, no matter how marginal, Typical Developers will always find work. Hiring managers are too willing to fill manpower gaps in order to ship on time. (In fairness, Microsoft managers are pretty good about avoiding this pitfall. However, there are times when it is considered OK to "take a bet" on a marginal candidate.)<br />
<br />
A special type of Typical Developer is the brilliant person who simply doesn't care enough. They're in software development because it pays well and they can skate by on 40 hours a week. These Typical Developers are especially annoying because they'll employ their brilliance only when justifying their lazy workarounds, and not on actual design and implementation.<br />
<br />
What should managers do with Typical Developers? In most cases manage them out as quickly as they can. Though a Typical Developer may be of use in the final push of releasing a project, in the long run having them working on a project is a net negative. Even if Typical Developers came for free, I wouldn't hire them. It is exceedingly rare for a Typical Developer to become a Good Developer, though in rare circumstances I've seen it happen under the guidance of Great Managers. <br />
<br />
<span style="font-size: large;">Good Developers</span><br />
<br />
Good Developers fix bugs and deliver features on time, tested, and adaptable to future requirements. This is illustrated in Fig. 2:<br />
<div class="separator" style="clear: both; text-align: center;"></div><br />
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="http://static.gabenell.com.s3.amazonaws.com/blog/20100207/dev_good.png"/></div><br />
Once a Good Developer delivers a bugfix or feature, typically that's the last you hear of it. A Good Developer will not fall into the traps that a Typical Developer does. When they see a pattern emerging they identify it and take steps to solve the issue once and for all. They are not afraid of refactoring. They'll come into your office and say "Hey, it's not sustainable to do all these one-off fixes for this class of issue. I'm going to need a week to re-do the whole thing so we never have to worry about it again." And you say great, please do it!<br />
<br />
Good Developers will encounter the same environmental issues Typical Developers do, eg, legacy code, or weak tools. Good Developers will not let this stand. They'll realize that if a tool is not good enough to do a job, then they have to improve the tool or build a new tool. Once they've done that, then they'll get back to work on the original problem. <br />
<br />
Good Developers are Good Testers. Their code is written to be testable, and because they are able to take a larger view, they have a good idea of the impact of their changes and how they should be tested. Pride is also a factor here. Good Developers would be embarrassed and shamed if they delivered something that wasn't stable. <br />
<br />
From a release management perspective, Good Developers are well liked, though their perceived throughput may not be high since they are spending time making the system as a whole better and not just fixing a bug as fast as they possibly can. Good managers recognize and nurture this. Bad managers push them to put in the quick fix and deal with the engineering consequences in-between releases. Good Developers will protest against this but often acquiesce. A Good Developer in the hands of a Good Manager can turn into a Great Developer. <br />
<br />
Managers should work hard to keep Good Developers since they're so hard to find and hire. That does <b>not</b> mean forcing them to remain on the team, as doing so risks turning a Good Developer into the "brilliant" variety of Typical Developer described above. Reward Good Developers well and give them interesting things to work on.<br />
<br />
<span style="font-size: large;">Great Developers</span><br />
<br />
Exceedingly rare, the hallmark of the Great Developer is the ability to solve problems you didn't know you had. This is illustrated in Fig. 3:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><img border="0" height="380" width="600" src="http://static.gabenell.com.s3.amazonaws.com/blog/20100207/dev_great.png" /></a></div><br />
<br />
When tasked with work, a Great Developer will take a holistic view of their task and the project they're working on along with full cognizance of the priorities upper management has for this release and the next. A Great Developer will understand the impact of a feature while it's still in the spec-writing phase and point out factors the designers, PMs, and managers hadn't thought of.<br />
<br />
When designing and implementing a feature, a Great Developer will take the time to design in solutions to problems that Good Developers and Typical Developers have run into, even though they're not obviously connected. A solution from a Great Developer will often change how a number of components work and interact, solving a whole swath of problems at a stroke.<br />
<br />
Similar to Good Developers, a Great Developer will never let lack of tools support or unfamiliar code deter them. But they'll also re-engineer the tools and legacy environment to such a degree that they create something valuable not only to themselves but to many others as well.<br />
<br />
Unlike Good Developers, a Great Developer can almost never be coerced into compromising long-term quality for expediency. They'll either tell you flat out "no, we need more time, period" or they'll grumble and come in on the weekend to implement the real fix themselves.<br />
<br />
Sometimes mistaken for a Great Developer is the Good Developer in Disguise. These Good Developers have recognized the impact on others that a Great Developer has, and seek to emulate that by engaging almost exclusively in side projects related to tools improvement and "developer efficiency" initiatives. The Good Developer in Disguise has no actual time to do their own work, but fools management into believing that they're Great Developers. Truly Great Developers improve their environment as a mere side effect of them doing their own job the way they think it ought to be done.<br />
<br />
It goes without saying that Great Developers should be even more jealously guarded than Good Developers, with the same caveat about not turning them into prisoners. The flip side is that Great Developers should not be allowed to go <i>completely</i> off on their own into the wilderness. No doubt they will build something amazing, but it runs the risk of being something amazing that you don't need. Better to give broad, high-level goals and let them do their thing.<br />
<br />
<span style="font-size: large;">Final Note</span><br />
<br />
Although I named Typical Developers "typical," I mean that they're typical in terms of the overall industry. While there were enough Typical Developers at Microsoft, most fell into the Good Developer category.<br />
<br />
<span style="font-size: x-large;">Poor Beanshell Performance and Custom Functions for JMeter</span> (2010-01-29)<br />
<br />
I'm building a relatively complex JMeter test plan to simulate load on the <a href="http://www.kikini.com/">Kikini</a> website. As soon as you need to do anything remotely complex, you exceed the capability of the built-in JMeter configuration elements and functions. The initial version of my test plan therefore used the BeanShell capability, which allowed me to do relatively complex things in a familiar language (BeanShell is essentially interpreted Java).<br />
<br />
All fine and good until we need to run tests longer than 10 minutes or with more than 10 threads. An <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=40850">issue in BeanShell</a> causes massive slowdowns if used inside loops (eg, inside a sampler), which in fact was what I was doing. When I worked around the issue by resetting the interpreter on each call, I found that JMeter was spending so much time processing BeanShell code that it couldn't effectively scale up to more than about 10 threads. The bottom line is that BeanShell is unfit for use if it must be called repeatedly in a JMeter test.<br />
<br />
The only way I could find to get the complex behavior I want without compromising performance was to implement my own JMeter function. JMeter offers a number of <a href="http://jakarta.apache.org/jmeter/usermanual/functions.html">simple functions out-of-the-box</a>. Although JMeter isn't really an API, it does have a <a href="http://jakarta.apache.org/jmeter/api/org/apache/jmeter/functions/Function.html">Function interface</a> which you could implement. Then from inside any test element, you can call your function:<br />
<br />
<pre class="brush: plain">${__myFunction(arg1, arg2)}</pre><br />
And you'll get back a string that is the result of your function. Before we get to the function class itself, there is some background to discuss.<br />
<br />
First, JMeter isn't an API. But with a little bit of work, you can program against it. If you download the JMeter binary distribution, you can extract <tt>ApacheJMeter_core.jar</tt>. This JAR contains the interfaces you'll code against.<br />
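For example, something like the following pulls the core JAR out of the distribution. Note that the path inside the ZIP is an assumption based on the 2.3.4 layout, so adjust it as needed:<br />
<br />
<pre class="brush: bash"># -j junks the internal directory structure; -d sets the output directory
# (the path inside the ZIP is assumed, not verified)
unzip -j jakarta-jmeter-2.3.4.zip 'jakarta-jmeter-2.3.4/lib/ext/ApacheJMeter_core.jar' -d lib/
</pre><br />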
<br />
Second, you need a way to get your custom function onto JMeter's classpath. You can set the <tt>search_paths</tt> system property, and JMeter will find it. This is great because then you do not have to modify the JMeter distribution to use your custom functions.<br />
<br />
Once you're ready with your custom JAR, you can invoke JMeter:<br />
<br />
<pre class="brush: bash">jmeter -Jsearch_paths=/path/to/yourfunction.jar</pre><br />
Alright, on to the code. This is a skeleton (please ignore the naming) which will simply return Array.toString() on the arguments you give:<br />
<br />
<pre class="brush: java">package com.kikini.perf.jmeter.functions;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import org.apache.jmeter.engine.util.CompoundVariable;
import org.apache.jmeter.functions.AbstractFunction;
import org.apache.jmeter.functions.InvalidVariableException;
import org.apache.jmeter.samplers.SampleResult;
import org.apache.jmeter.samplers.Sampler;
public class MaskUserIDFunction extends AbstractFunction {
private static final List<String> DESC = Arrays.asList("uid_to_mask");
private static final String KEY = "__maskUserID";
private List<CompoundVariable> parameters = Collections.emptyList();
@Override
public String execute(SampleResult arg0, Sampler arg1) throws InvalidVariableException {
List<String> resolvedArgs = new ArrayList<String>(parameters.size());
for (CompoundVariable parameter : parameters) {
resolvedArgs.add(parameter.execute());
}
// TODO: mask the user ID in resolvedArgs.get(0). For demo purposes,
// just return the arguments given.
return resolvedArgs.toString();
}
@Override
public String getReferenceKey() {
return KEY;
}
@SuppressWarnings("unchecked")
@Override
public void setParameters(Collection arg0) throws InvalidVariableException {
parameters = new ArrayList<CompoundVariable>(arg0);
}
@Override
public List<String> getArgumentDesc() {
return DESC;
}
}</pre><br />
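As an aside, the TODO might be filled in with a small helper along these lines. The masking scheme (keep the last four characters, star the rest) is purely hypothetical and not from the real test plan:<br />
<br />
<pre class="brush: java">// Hypothetical masking: hide all but the last four characters of the ID.
private static String mask(String uid) {
    int keep = Math.min(4, uid.length());
    StringBuilder masked = new StringBuilder();
    for (int i = uid.length() - keep; i > 0; i--) {
        masked.append('*');
    }
    return masked.append(uid.substring(uid.length() - keep)).toString();
}

// ...and in execute():
//     return mask(resolvedArgs.get(0));
</pre><br />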
There are a few crucial things to note here. The package name contains "<tt>.functions</tt>". That is a requirement; otherwise your function will not be recognized by JMeter. Notice that the type of the arguments is <tt>CompoundVariable</tt>. You must call <tt>execute()</tt> on them to resolve them to a String. <br />
<br />
Otherwise this is relatively straightforward. Now I can call my function from inside a sampler:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://gabenell.s3.amazonaws.com/blog/images/jmeter_fn_call.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="151" src="http://gabenell.s3.amazonaws.com/blog/images/jmeter_fn_call.png" width="400" /></a></div><br />
<br />
And it will return the correct results:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://gabenell.s3.amazonaws.com/blog/images/jmeter_fn_result.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="100" src="http://gabenell.s3.amazonaws.com/blog/images/jmeter_fn_result.png" width="400" /></a></div><br />
So, how do Java functions perform versus the BeanShell functions? My test plan had about 10 samplers, most of which used BeanShell before, but now use native Java functions. My dedicated JMeter machine is a dual-core system with 2GB of RAM.<br />
<br />
Before: JMeter maxed out at ~45 requests per second, <b>90%+ CPU</b> usage<br />
After: Generates 150+ requests per second with <b>2-3% CPU</b> usage<br />
<br />
<b>Huge win!</b> I don't actually know what the new limit is, but I'm guessing I could get thousands of requests per second.<br />
<br />
<span style="font-size: x-large;">Releasing simpledb-appender as open source</span> (2010-01-24)<br />
<br />
I've released the SimpleDB <a href="http://logback.qos.ch/manual/appenders.html">appender</a> I wrote as open source under the Apache 2.0 License. The project is hosted here:<br />
<br />
<a href="http://code.google.com/p/simpledb-appender/">http://code.google.com/p/simpledb-appender/</a><br />
<br />
The purpose of this project is to allow Java applications using the <a href="http://www.slf4j.org/">SLF4J</a> API with <a href="http://logback.qos.ch/">Logback</a> to write logs to <a href="http://aws.amazon.com/simpledb/">Amazon SimpleDB</a>. This centralizes the logs and opens up powerful querying capabilities. Scripts and tools are also included so that even non-Java applications can have their stdout/stderr logged to SimpleDB.<br />
<br />
The project is tested and works well. Developers familiar with SLF4J should have no problem integrating it into their apps. The documentation for using it as a tool for non-Java applications is a little weak but I have a demo shell script that should at least get folks started.<br />
<br />
Let me know how it works for you!<br />
<br />
<span style="font-size: x-large;">Amazon Web Services Expanding into Asia</span> (2010-01-14)<br />
<br />
Last year, I privately speculated that having launched datacenters in the Eastern US and Western Europe, the next obvious locations for Amazon Web Services (AWS) would be the Western US and Asia. In December 2009, AWS <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=3178&categoryID=42">announced</a> availability zones in Northern California.<br />
<br />
What I didn't realize until today was that AWS had actually <a href="http://aws.amazon.com/about-aws/whats-new/2009/11/12/aws-asia/">announced</a> their intentions to expand into Asia back in November 2009. Multiple availability zones will be available in Singapore in the first half of 2010.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://gabenell.s3.amazonaws.com/blog/images/singapore_map.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="259" src="http://gabenell.s3.amazonaws.com/blog/images/singapore_map.png" width="320" /></a><br />
</div><br />
<br />
Singapore does make some sense as a location. A glance at the map (source: <a href="http://openstreetmap.org/">openstreetmap.org</a>) reveals that Singapore is pretty central, located roughly equidistant from China, India, and Australia. So if AWS is pursuing a strategy to minimize the average global latency, it is probably a good choice. It also offers a relatively stable political and economic environment, though there is some political risk to locating yourself in an authoritarian country.<br />
<br />
<br />
But when I first thought about a datacenter in Asia, my pick would have been Korea. Korea is one of the most connected (in the data networking sense) countries on Earth, and is in close proximity to the <i>other</i> two most important markets in Asia: China and Japan. Korea offers a very stable political and economic environment, and doesn't have the significant political risk associated with hosting in China or the less significant risk of Singapore. Latency from Korea to China and Japan is very low. I imagine the cost of running a datacenter in Korea is not much higher than in Singapore, given that living standards are comparable.<br />
<br />
Still, I can't complain. Hosting in Singapore will allow a better web experience for users throughout Asia. I hope to see AWS continue expanding geographically.<br />
<br />
<span style="font-size: x-large;">My Favorite Innovations of 2009</span> (2009-12-25)<br />
<br />
One of the great things about our culture is constant innovation. I can honestly say that new products and services have made my life better in some way in 2009, and I'd like to call those out as a way of congratulating the people and companies who created them. None of the following actually came about in 2009, but 2009 was the year that I started to use them.<br />
<br />
<span style="font-size: large;">Amazon Kindle</span><br />
<br />
The first time I saw a Kindle was in the Microsoft 117 Cafe, when Jerry Lin joined us for lunch and shared the latest gadget he'd received from Amazon. Jerry is known to be a prolific Amazon customer, notorious for receiving daily deliveries to his office (actually, he was rarely at work before the typical 11am-2pm delivery time, relying on the patience of neighboring officemates to sign for his packages). So it was no surprise to learn that his latest toy was a Kindle (<a href="http://en.wikipedia.org/wiki/File:AmazonKindleUser2.jpg">version 1</a>), which he demoed for us. The <a href="http://en.wikipedia.org/wiki/E_Ink">E Ink</a> screen provides a reading experience much closer to paper than a computer display, making it less stressful on your eyes. But this atypical display, combined with the curious vertical silver reading position indicator and scrollwheel, make the device look like something envisioned in the 1960s: a bizarre amalgamation of analog and digital.<br />
<br />
Nevertheless, the merits of the device were quite clear: newspapers, magazines, and books delivered wirelessly to a device with up to two weeks of battery life and a size and weight more compact than a single book. As a frequent (often international) traveler, I could immediately see the value of this. One of the less expected things about my time living in China was that I began to really miss the simple pleasure of <i>reading</i>. With something like the Kindle I could have an entire bookstore at my fingertips, anywhere in the world. Jerry did admit a downside: the available book selection, while large at hundreds of thousands, did not always contain the book you are looking for. But on the other hand the selection is large enough that you'll never run out of books you want to read. Jerry told us he had returned all the books he had bought from Amazon in recent years (which Amazon admirably allowed) and repurchased them on Kindle.<br />
<br />
I promptly ordered a Kindle, which at the time (late 2008) was on "back order." In fact, it was no longer in production, and all pending orders were upgraded to the superior version 2. In Feb 2009 I finally received my Kindle 2.<br />
<br />
The Kindle has changed my reading habits. The form factor makes the reading experience more pleasant, especially compared to reading large hardcover books. Wireless delivery of The Economist, which arrives Friday morning like clockwork, is a thousand times more reliable than receiving the same in the mail. By mail The Economist would arrive sometimes Friday, often times Saturday, and disappointingly often on Monday, which would leave me without enough time to read a full issue before the next issue arrived. The availability of an iPhone client and the capability of the Kindle and the iPhone to sync last-read positions makes it possible to read on-the-go without missing a beat. The overall result is that with the Kindle, I find myself reading more.<br />
<br />
I'd like to take a moment to emphasize this last point, and note that the same has been true every time a medium has evolved, despite criticism from those who oppose or fear change. When the phonographic record gave way to the CD, many viewed this as a step backward for recorded music. They moaned that the digital CD could never capture the nuances of an analog record, and the small packaging made album art less relevant. The truth is that the CD is capable of storing and playing back audio with a fidelity that comfortably exceeds the capability of most humans to perceive. The loss of a few square inches of medium for album art is regrettable, but it was never something that was important to the experience of music, much less central to it. But besides affording listeners a higher fidelity listening experience, the slimmer, smaller CD enabled listening to music not just in the home, but in the car and on the sidewalk and in the subway in a way that records or tapes never could. CD recorders enabled people to make flawless copies of their collections for their cars or public transit commutes. Listening to music was now a ubiquitous feature of life. I don't need to point out how this became even more true with the advent of MP3 encoding and devices like the iPod. It is far, far easier to count people<i> not wearing</i> earbuds on the subway or bus than counting those with. All this in the face of the exact same tired criticism from the same old critics.<br />
<br />
As it was with music and CDs/MP3s, so it is and will be with books and eBooks. Yes, eBooks as they exist today have lower fidelity compared to paper. Devices like the Kindle 2 support only 16 shades of black and white, and dealing with images and photographs is clunky. If anyone doubts that these problems will be solved in the next couple of years along with the inevitable march of technological progress, I'm prepared to back up my confident words with a wager. However, I doubt any readers would dare bet against this. And let's look at what even the primitive readers allow today: reading essentially your whole library at any time and place. No longer do I have to choose which single book to take with me on a trip, nor need I attempt to stuff a 600 page hardcover in my laptop bag to read on the bus. All the books I own and thousands that I don't are available to me in a convenient package. And even if I find myself waiting in a lobby for 20 minutes without my Kindle, I have my iPhone, where the book I'm reading is waiting for me, synced to the page I last read on my Kindle.<br />
<br />
The Kindle represents more than just a cool device and a premium reading experience. I'm sorry if you like the smell of ink, or the texture of paper, or displaying your book collection on shelves as though they were trophies. The Kindle represents the beginning of a resurgence in reading, making books and newspapers and knowledge much easier for everyone to obtain. After all, that's what reading is about, right?<br />
<br />
<span style="font-size: large;">Zipcar</span><br />
<br />
<a href="http://www.zipcar.com/">Zipcar</a> offers members by-the-hour car rentals in urban areas. Scheduling is done online or via an iPhone app, and can be done mere minutes before you get the car. Members use a special magnetic card (or the iPhone app) to lock and unlock the car; keys and a gas card are inside. In most cases the cost is less than $10/hr, which gets you a standard compact like a Honda Civic, or a light utility vehicle like a Scion xB, and includes mileage, insurance ($500 deductible) and gas. In urban centers, garages containing zipcars are located every few blocks.<br />
<br />
The result is that for people who live in urban areas that have fairly good public transportation and where car ownership is prohibitively expensive (parking in downtown San Francisco runs $500/month), Zipcar is an excellent option. It's perfect for me, since I visit San Francisco every few weeks.<br />
<br />
Moreover, it changes the equation a bit for people who are deciding where to live. Although it is often more expensive to live in areas like downtown where good public transportation is available, if you can do away with the expense of owning a car, living closer to downtown becomes more viable. This is a net positive, since living closer to where you work, shop, and play puts less stress on both the environment and your pocketbook.<br />
<br />
<span style="font-size: large;">Netflix Watch Instantly Streaming</span><br />
<br />
I've been a customer of Netflix since 2004. I've always thought their model for renting DVDs was almost perfect: huge selection, low hassle, very convenient, and affordable. In the past few years Netflix has been quietly transforming itself into a company that deals in streaming content as well as their traditional rent-by-mail service. It's heartening to see a company acknowledge the future and embrace change rather than fear and reject it.<br />
<br />
The change I'm talking about is the diminishing importance of physical media for movies. Before Blu-ray even came out, pundits spoke of it as the last physical format for movies. For mass-market purposes, they are probably right. At normal viewing distances and screen sizes, 1080p Blu-ray discs are not too far off from the limit of human perception of detail in moving images. Certainly 1440p and higher will eventually come out, but the difference between that and what's currently available will be unnoticeable to most. In short, there is little compelling reason for another revolution in disc formats.<br />
<br />
The advantages of delivering movies over the Internet are clear: cost, convenience, selection. The question is who's going to deliver the content, and how's it going to get onto the TV in my living room? Netflix wants to be the one to do that, and to some extent they already are.<br />
<br />
Netflix now has a substantial catalog of titles available for streaming. Customers paying as little as $8.99 a month can stream unlimited content. Though a lot of the available content is cruft, I have noticed that more and more I am able to find quality content. I've been watching Lost on Netflix streaming, available in HD to boot. There are tons of great movies available, from classics to new releases, though almost never any recent hits. But the quality and quantity have been moving relentlessly upward.<br />
<br />
So, how does it end up on my TV? Because if it's just on my 13" laptop screen, that will never replace DVD, let alone Blu-ray. For his birthday, my roommate Eddy received a Blu-ray player with built-in Netflix streaming capability, which is not uncommon among Blu-ray players (and is also a capability of the Xbox 360 and PS3, already in millions of living rooms). The device has WiFi, and can connect via my Netflix account to my "watch instantly" queue. In the several months that we've had the player, we haven't played a single Blu-ray disc, but we've watched at least a hundred hours of streaming Netflix.<br />
<br />
I look forward to Netflix making more deals and expanding their TV and movie selection, as well as offering more titles in HD. Wave of the future, dude, 100% electronic.<br />
<br />
<span style="font-size: large;">Honorable Mention: Virgin America</span><br />
<br />
It doesn't necessarily fit into the category of "innovative," but certainly VA has changed things for the better. With its fleet of new A319/A320 planes, with live TV and on-demand music, TV, and movies, flying VA is very comfortable. The in-flight entertainment system even allows ordering food and drinks. Additionally, WiFi is available in-flight for about $10.<br />
<br />
All this is nice, but it should be standard for new aircraft. The primary way VA has made things better is by giving other carriers some real competition. Previously, the best deals flying SEA/SFO were typically with Alaska, with its not-so-new fleet of 737s. A typical roundtrip ran $250-$300. VA flights can be found for as little as $39 one way (plus tax). A typical SEA/SFO roundtrip costs $110 with tax. This low-cost, comfortable and convenient flight has allowed me and my girlfriend to see each other quite often.<br />
<br />
Gotta love a competitive market!<br />
<br />
<span style="font-size: x-large;">Loose Dependency Injection</span> (2009-12-20)<br />
<br />
In the past year or so I've come to see the immense value in the principle of Inversion of Control (IoC)/Dependency Injection (DI) (<a href="http://martinfowler.com/articles/injection.html">see Fowler</a>), and frameworks like <a href="http://www.springsource.org/">Spring</a>. Besides keeping classes and components isolated and focused in purpose, it also makes testing easier because instead of injecting real implementations, you can inject mocks into the component under test.<br />
<br />
However, like any good idea, if taken to the extreme it becomes counterproductive. If everything a moderately complex class did was abstracted and injected, you would end up with a confusing and incoherent jumble of tiny classes. You would also risk exposing too much of the internals of a class by requiring any consumer of that class to create and inject pieces unrelated to the behavior the consumer wishes to dictate.<br />
<br />
Let's make a simple example. Suppose we had a component that resizes an image, but in order to complete its work, it needs to create a temporary file. Let's first take a look at an implementation that doesn't use DI.<br />
<br />
<pre class="brush: java">public class ImageResizer {
public File resizeImage(File image) throws IOException {
File tmp = File.createTempFile("tmp", null);
// do work on tmp ...
}
}
</pre><br />
Simple enough, but how are we going to test this? It uses a static method, which we don't own and can't change, to create the temporary file. We don't have any way to mock it or inspect it, so we're pretty much out of luck for testing it.<br />
<br />
Now let's use a strict form of DI. We'll abstract the temporary file creation into a separate class, and require consumers to provide an implementation at construction time.<br />
<br />
<pre class="brush: java">public class ImageResizer {
/** Abstraction of temp file management */
public static class TempFileFactory {
File createTempFile() throws IOException {
return File.createTempFile("tmp", null);
}
}
private final TempFileFactory fileFactory;
/** Dependency-injection constructor */
public ImageResizer(TempFileFactory fileFactory) {
this.fileFactory = fileFactory;
}
public File resizeImage(File image) throws IOException {
File tmp = fileFactory.createTempFile();
// do work on tmp ...
}
}
</pre><br />
Better. At least we can write a test to mock TempFileFactory, inject the mock into the ImageResizer, and validate the interactions between ImageResizer and the temporary file. But now we've burdened consumers of ImageResizer -- which simply want to resize a file -- with the requirement of managing temporary files (by creating a TempFileFactory; alternatively we could have required consumers to inject a temporary file, which is probably even worse) and the awkward knowledge that ImageResizer uses temporary files. If we made a breakthrough in the ImageResizer so that it no longer needed to use a temporary file, all the consumers would need to change their code.<br />
<br />
So how do we get the benefits of testability and isolation without this downside? We still embrace the concept of DI, but use defaults to hide this from consumers, in what I call "Loose Dependency Injection":<br />
<br />
<pre class="brush: java">public class ImageResizer {
/** Package-private abstraction of temp file management */
static class TempFileFactory {
File createTempFile() throws IOException {
return File.createTempFile("tmp", null);
}
}
private final TempFileFactory fileFactory;
/** Public constructor, injects its own dependency */
public ImageResizer() {
this.fileFactory = new TempFileFactory();
}
/** Package-private constructor for use by test */
ImageResizer(TempFileFactory fileFactory) {
this.fileFactory = fileFactory;
}
public File resizeImage(File image) throws IOException {
File tmp = fileFactory.createTempFile();
// do work on tmp ...
}
}
</pre><br />
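To show the testing payoff before summing up, here's a minimal sketch of such a test, assuming JUnit 4 and that the test class lives in the same package as ImageResizer (the assertion is left as a comment since the resize logic is elided above):<br />
<br />
<pre class="brush: java">import java.io.File;
import java.io.IOException;

import org.junit.Test;

public class ImageResizerTest {

    @Test
    public void resizeUsesTempFileFromFactory() throws IOException {
        final File canned = File.createTempFile("test", null);
        // hand-rolled fake; a mocking library would work just as well
        ImageResizer.TempFileFactory fake = new ImageResizer.TempFileFactory() {
            @Override
            File createTempFile() {
                return canned;
            }
        };
        new ImageResizer(fake).resizeImage(new File("in.png"));
        // assert on the contents of canned here ...
    }
}</pre><br />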
Fundamentally we're still using DI; the difference is that there's only one implementation of the dependency, and it is "injected" by the default constructor. The consumer has no knowledge that the ImageResizer has anything to do with temporary files. ImageResizer could change to not use temporary files, and no client code would need to change. Tests for ImageResizer are easy to write because we can mock ImageResizer.TempFileFactory. The best of all worlds!<br />
<br />
<span style="font-size: x-large;">Scaling up is out, scaling out is in</span> (2009-11-25)<br />
<br />
One of the more interesting, if less visible, trends in the past half-decade has been that clock speeds on modern CPUs have stagnated. I'm writing this post on my Macbook, which turns one year old next week. It's equipped with a 2GHz processor and 2GB of RAM. It's the first computer I've bought since 2002, when I built a ~1.2GHz Athlon system with 1GB of RAM. Instead of a factor of 10 faster over those six years, it's a factor of less than two, and I don't think we're going to see much more in terms of clock speed in the future. Check out the graph below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzzTLNuZH-5sVuX2yCiqnL_znrHKR8JyubxSPfcosdgW4z0peLToJ9J4I7f-F-0QSTXLdWNmmOvVsnaRJ6KLRgnw9iYq1tRR7ndzMaOT1cP2h5p-4KeVjvuXcdXf7wwZG97K5rlUEU4TkW/s1600/Picture+1.png" /><br />
</div><br />
Since around 2002 clock speeds have held steady at about 2GHz. The primary constraint has been thermal. As processors moved into the multi-GHz range they started to dissipate up to 100W of heat, which becomes impractical to cool (ever had your legs burned by your laptop?). "Scaling up" clock speeds had hit a wall. So hardware engineers had to focus on other ways of making things faster. They did some increasingly clever things like superscalar execution (dispatching multiple instructions per clock cycle), new specialized instructions (SSE, etc), hyperthreading (a single core appearing as two processors to the OS), then on to the logical conclusion of multi-core (multiple CPU cores in a single package). Performance now comes from "scaling out" to multiple cores, and if you're running a service, multiple machines.<br />
<br />
The consequence of this shift from faster clock cycles to more processors has been that after decades of sitting on their asses and waiting for the next doubling of clock speeds to make up for their lazy coding, software engineers have to actually write code differently to get it to run fast. This could mean traditional optimization, re-writing existing code to run faster without fundamentally changing the approach to the problem. But increasingly it means taking advantage of the way hardware is evolving by writing code to take advantage of multiple cores by splitting the problem into independent pieces that can be executed simultaneously.<br />
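As a toy illustration of "splitting the problem into independent pieces," here's a sketch using the period-appropriate Java 5 concurrency utilities: an embarrassingly parallel sum split into one task per core, with no shared state and no locks. Nothing here is from any real codebase.<br />
<br />
<pre class="brush: java">import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScaleOutDemo {
    public static void main(String[] args) throws Exception {
        final int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int i = 0; i < cores; i++) {
            final int offset = i;
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    // each task owns a disjoint slice: no shared state, no locks
                    for (long n = offset; n < 100000000L; n += cores) {
                        sum += n;
                    }
                    return sum;
                }
            }));
        }
        long total = 0;
        for (Future<Long> part : parts) {
            total += part.get();
        }
        pool.shutdown();
        System.out.println("sum = " + total);
    }
}</pre><br />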
<br />
To some degree the service we're building at <a href="http://www.kikini.com/">Kikini</a> can naturally take advantage of multiple cores, since we're serving many simultaneous requests. However, due to the transactional nature of databases, there is a limit to how much performance you can get by simply adding more cores. Write operations take locks which block or abort other transactions, so even if you had infinite cores you'd still be constrained by how you design your database.<br />
<br />
All this points to three main ways to achieve high performance:<br />
<ol><li>Optimize individual queries</li>
<li>Design queries and the database schema to minimize locking to take advantage of multiple cores<br />
</li>
<li>Partition data in clever ways to spread the load across multiple servers</li>
</ol>Fundamental to all of this is to always be measuring, which is why it's important to have an automated system like the one <a href="http://gabenell.blogspot.com/2009/11/using-maven-chronos-without-external.html">I described earlier this month</a>, so that engineers can stay focused on the important stuff.<br />
<br />
<span style="font-size: x-large;">Working Around JSVC's Logging Limitations</span> (2009-11-22)<br />
<br />
<a href="http://commons.apache.org/daemon/jsvc.html">JSVC</a> is a popular option for people using <a href="http://tomcat.apache.org/">Tomcat</a> as their web container. The main advantage of JSVC is that it lets a process start as root (most Linux systems require root to open a port below 1024) and then downgrade to an unprivileged user; it also acts as a watchdog to restart the JVM if it crashes. However, one big problem with JSVC is that it can only write the output of the JVM it's hosting to two files on the filesystem, corresponding to stdout and stderr. This is problematic since it doesn't allow for log rotation or any other form of redirection.<br />
<br />
At <a href="http://www.kikini.com/">Kikini</a>, we created a logging solution that appends log statements to <a href="http://aws.amazon.com/simpledb/">SimpleDB</a>, so that logs from all our machines end up in a central location, unbounded by normal filesystem limits and easily queried and monitored, allowing us to diagnose problems quickly. The simplest way to use our logger is to redirect the output from the target process to the stdin of our logging process. However, JSVC makes this rather difficult since it is hard-coded to only write to files on the filesystem.<br />
<br />
Fortunately we have a trick up our sleeve in the form of <a href="http://en.wikipedia.org/wiki/Named_pipe">UNIX named pipes</a>, which we can use as a target for JSVC to write to and a source for the logger to read from:<br />
<pre class="brush: bash">mkfifo pipe.out
mkfifo pipe.err
/usr/bin/startlogger.sh STDOUT < pipe.out
/usr/bin/startlogger.sh STDERR < pipe.err
/usr/bin/jsvc -outfile pipe.out -errfile pipe.err ...
</pre>Now JSVC will start up and write into the pipes we created, and its output will be redirected into the logger processes.<br />
<br />
<span style="font-size: x-large;">Using Maven Chronos Without an External JMeter Install</span> (2009-11-13)<br />
<br />
Performance is one of the things we're really focused on at <a href="http://www.kikini.com/">Kikini</a>. But we want to stay focused on actually improving performance, and not spend a lot of cycles making manual measurements and interpreting logs. <a href="http://jakarta.apache.org/jmeter/">JMeter</a> is probably the best open-source tool out there for measuring the performance of a web application. I designed a JMeter test plan to simulate users visiting our site. Unfortunately, while JMeter is great at making measurements, it stops short of data analysis and reporting.<br />
<br />
Ideally we would like to get perf reports out of every build, which means we would like to do reporting as part of our <a href="http://maven.apache.org/">Maven</a> build, with results available as easily readable charts on our build server. The top hit you're likely to get from searching for "maven jmeter" is the awful <a href="http://wiki.apache.org/jakarta-jmeter/JMeterMavenPlugin">JMeterMavenPlugin</a>. I say awful because it wasn't easy to integrate, and if you look at the source code it's obvious that the project was done in spare time. There are a number of comments in the source like "this mess is necessary because..." which makes me think the whole thing is poorly designed, and if you search around you will indeed find that there are a number of problems people have encountered trying to use it. Finally, the output from the plugin is just the simple JMeter log, and not the reports I'd like.<br />
<br />
All the way down in the middle of the second page of the Google results I found this gem: <a href="http://mojo.codehaus.org/chronos-maven-plugin/index.html">chronos-maven-plugin</a>. Not only does this look like a well-designed and well-executed project, it produces wonderful HTML reports, perfect for plugging into our build server! This is a snippet of what the Chronos output looks like:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBAOWbDe01NDkLou5uj1J5MI4Bs-O9Iq56hAKmU-4LGA2MZkR676phrcOy3Yv2jBWKwiUESzP4dRr8WVoiDWUiiUhnHFXHpS60fHC_Mts3R1cEnAAt6Oobh5Nanqkls3qXB7HIFv2awvZ4/s1600-h/Picture+1.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5403705427420421282" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBAOWbDe01NDkLou5uj1J5MI4Bs-O9Iq56hAKmU-4LGA2MZkR676phrcOy3Yv2jBWKwiUESzP4dRr8WVoiDWUiiUhnHFXHpS60fHC_Mts3R1cEnAAt6Oobh5Nanqkls3qXB7HIFv2awvZ4/s320/Picture+1.png" style="cursor: pointer; display: block; height: 189px; margin: 0px auto 10px; text-align: center; width: 320px;" /></a><br />
<br />
The only downside is that the Chronos plugin requires an external install of JMeter, which kind of defeats the whole purpose of Maven. Fortunately, inspired by <a href="http://blogs.atlassian.com/developer/2009/10/automated_performance_testing_using_jmeter_and_maven.html">an Atlassian post</a>, I worked out a way to use the Chronos plugin without making JMeter a manual install, by using the <a href="http://maven.apache.org/plugins/maven-dependency-plugin/">maven-dependency-plugin</a>. First I deployed the JMeter ZIP file as an artifact on our Artifactory repository:<br />
<br />
<pre class="brush: xml"><?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>org.apache.jmeter</groupId>
<artifactId>jmeter</artifactId>
<version>2.3.4</version>
<packaging>zip</packaging>
<description>Artifactory auto generated POM</description>
</project>
</pre><br />
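For reference, an artifact like that can also be published from the command line with the stock deploy plugin. This is only a sketch: the repository ID and URL below are placeholders, not our actual setup.<br />
<br />
<pre class="brush: bash"># upload the stock JMeter ZIP as a zip-packaged artifact; deploy-file
# generates a minimal POM like the one above by default
mvn deploy:deploy-file -Dfile=jakarta-jmeter-2.3.4.zip \
    -DgroupId=org.apache.jmeter -DartifactId=jmeter \
    -Dversion=2.3.4 -Dpackaging=zip \
    -DrepositoryId=internal -Durl=http://repo.example.com/libs-release-local
</pre><br />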
<br />
In my POM, I set jmeter.home to the location that we'll be unpacking JMeter into:<br />
<br />
<pre class="brush: xml"><properties>
<jmeter-version>2.3.4</jmeter-version>
<jmeter.home>${project.build.directory}/jakarta-jmeter-${jmeter-version}</jmeter.home>
</properties>
</pre><br />
<br />
Next I use the dependency plugin in the pre-integration-test step to unpack JMeter into the target folder:<br />
<br />
<pre class="brush: xml"><plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.1</version>
<executions>
<execution>
<id>unpack-jmeter</id>
<phase>pre-integration-test</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>org.apache.jmeter</groupId>
<artifactId>jmeter</artifactId>
<version>${jmeter-version}</version>
<type>zip</type>
</artifactItem>
</artifactItems>
<outputDirectory>${project.build.directory}</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
</pre><br />
<br />
Finally I configure Chronos to run:<br />
<br />
<pre class="brush: xml"><plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>chronos-maven-plugin</artifactId>
<version>1.0-SNAPSHOT</version>
<configuration>
<input>${basedir}/src/test/jmeter/UserSession.jmx</input>
</configuration>
<executions>
<execution>
<goals>
<goal>jmeter</goal>
<goal>savehistory</goal>
</goals>
</execution>
</executions>
</plugin>
</pre><br />
<br />
Bingo. Now anyone running our build can get the JMeter performance reports with nothing more complex than running "mvn verify chronos:report".