Thursday, November 4, 2010

Connection keep-alive timeouts for popular browsers

Recently I needed to know how long the popular browsers keep an idle HTTP keep-alive connection open before closing it. I was able to find documented values for IE and Firefox, but not for the other browsers. In fact I couldn't even find much in the way of anecdotes. So for the other browsers I decided to find out myself by testing against a Tomcat server configured with an hour-long keep-alive timeout (see the connector sketch at the end of this post). I then used each browser to make a single request and observed the TCP streams in Wireshark. Here are the results:

  • IE: 60 seconds (documentation)
  • Firefox: 300 seconds (documentation)
  • Chrome: 300 seconds (observed)
  • Safari: 30 seconds (observed)
  • Opera: 120 seconds (observed)

Note that for IE and Firefox these values are configurable by the user, and the developers behind the other browsers may change the timeout in future releases.
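
For anyone who wants to reproduce the test, the server side amounts to raising the keep-alive timeout on Tomcat's HTTP connector in server.xml. A sketch (keepAliveTimeout is in milliseconds, so 3600000 is one hour; maxKeepAliveRequests="-1" removes the per-connection request cap; the port is just an example):

<Connector port="8080" protocol="HTTP/1.1"
           keepAliveTimeout="3600000"
           maxKeepAliveRequests="-1" />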

Friday, September 17, 2010

Authoring multipart Ubuntu cloud-init configuration with Java

Canonical's wonderful Amazon EC2 images come with a powerful configuration tool called cloud-init that lets you pass configuration via user-data. One of the more interesting capabilities is that cloud-init accepts a combination of different configuration payloads in a single piece of user-data, using MIME multipart as the format for aggregating the parts.

Below is an example of how to create a multipart configuration compatible with cloud-init using Java:

import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;


public class CloudInitMultipart {

    public static void main(String[] args) throws Exception {
        String config = "#cloud-config\n" 
            + "mounts:\n" 
            + " - [ sdf, /mnt/data, \"auto\", \"defaults,nobootwait\", \"0\", \"0\" ]\n\n" 
            + "packages:\n"
            + " - emacs23-nox\n\n";
        MimeMultipart mime = new MimeMultipart();
        MimeBodyPart part1 = new MimeBodyPart();
        part1.setText(config, "us-ascii", "cloud-config");
        part1.setFileName("cloud-config.txt");
        MimeBodyPart part2 = new MimeBodyPart();
        String script = "#!/bin/bash\n\n" 
            + "NOW=`date +%s`\n" 
            + "touch /mnt/$NOW";
        part2.setText(script, "us-ascii", "x-shellscript");
        part2.setFileName("runme.sh");
        mime.addBodyPart(part1);
        mime.addBodyPart(part2);
        MimeMessage msg = new MimeMessage(Session.getDefaultInstance(new Properties()));
        msg.setContent(mime);
        msg.writeTo(System.out);
    }
}

This creates a multipart configuration combining a cloud-config part, which installs emacs and adds an fstab entry, with a shell-script part that creates a file at boot. The output can then be used as the user-data when launching an EC2 instance.
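
To actually launch with it, one approach (a sketch assuming the EC2 API command-line tools, a JavaMail jar named mail.jar, and a placeholder AMI ID) is to capture the program's output in a file and pass that file as the instance's user-data:

javac -cp mail.jar CloudInitMultipart.java
java -cp .:mail.jar CloudInitMultipart > user-data.txt
ec2-run-instances ami-12345678 --user-data-file user-data.txt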

Tuesday, September 7, 2010

How to Build Terracotta from Source

It seems that the folks at Terracotta have decided to make it nearly impossible to download any version older than the current one. As is common in real-world applications, sometimes it is desirable to stay on a version a little behind the bleeding edge because you know what you've got works for what you're doing. Terracotta has made things more difficult than usual by holding back a critical fix for a compatibility issue between Java 1.6.0_20 and Terracotta 3.2.0. The fix went into version 3.2.2, which is only available to customers with a Terracotta support contract.

So, I'll show you how to build 3.2.2 from source. It's a little trickier than implied in the above-linked thread, and the Terracotta Build Page doesn't explain it all.

First, we need to check out the 3.2.2 source code:

svn co http://svn.terracotta.org/svn/tc/dso/tags/3.2.2

Next, set up some required environment variables. Terracotta needs to know where your JRE, JDK, and Ant live. The following locations worked for me on my Ubuntu 10.04 install with Sun's Java 6; substitute the locations for your OS and Java distribution:

export ANT_HOME=/usr/share/ant
export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre
export JAVA_HOME_16=/usr/lib/jvm/java-6-sun

In my case I only have Java 6 and don't care about previous versions of Java, so we need to instruct the Terracotta build system not to look for older releases. Modify the file 3.2.2/code/base/jdk.def.yml to comment out the Java 1.5 section:

#
# All content copyright (c) 2003-2006 Terracotta, Inc.,
# except as may otherwise be noted in a separate copyright notice.
# All rights reserved
#

# Defines the various JDKs used by the build system.
#
# Each JDK specification begins with a unique name that uniquely identifies
# the JDK version.  After the name come the following attributes:
#
#   min_version: The minumum JDK version
#   max_version: The maximum JDK version
#   env: A list of names of configuration properties that the build system uses
#        to locate the JDK installation
#   alias: A list of alternative names for the JDK

#J2SE-1.5:
#    min_version: 1.5.0_0
#    max_version: 1.5.999_999
#    env:
#      - J2SE_15
#      - JAVA_HOME_15
#    alias:
#      - tests-1.5
#      - "1.5"

JavaSE-1.6:
    min_version: 1.6.0_0
    max_version: 1.6.999_999
    env:
      - JAVASE_16
      - JAVASE_6
      - JAVA_HOME_16
    alias:
      - tests-1.6
      - "1.6"

OK, now we're ready to build. Here's what I used to build the core Terracotta distribution using the Sun JDK; adjust the jdk parameter to point at your JDK location.

cd 3.2.2/code/base
./tcbuild --no-extra dist dso OPENSOURCE jdk=/usr/lib/jvm/java-6-sun

The build will download all its dependencies and compile the Terracotta release. Note that this is the core distribution only; it does not build TIMs or anything like that. Once the build is complete, there will be a new folder

3.2.2/code/base/build/dist

This contains the Terracotta distribution that you would have downloaded.

Monday, July 5, 2010

BASH Script Self-Destructs, grep to the Rescue

I was working on a bash script and periodically testing it out. It had gotten somewhere in the 30-40 line range when I made a fatal error. I added an if statement that looked something like this:

var1=0
var2=$0
if [ $var1 > $var2 ]; then
  echo "true"
else
  echo "false"
fi

I had actually made two mistakes. First, I meant for var2 to have a value of $?, which is the exit status of the last command, not the path of the script itself, which is what $0 evaluates to. Second, I used the > operator instead of the -gt operator that BASH requires for numeric comparison. So what the if statement ended up doing was redirect the (empty) output of the test command into the file containing my script, truncating it! After running this code, your script self-destructs into a 0-byte file. I was particularly annoyed because writing a decent-sized BASH script is a meticulous process, and one that I'm obviously no expert at, so I was looking at a good half-hour of lost work.
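
For the record, what I meant to write looks like this (using $? and -gt):

var1=0
var2=$?
if [ "$var1" -gt "$var2" ]; then
  echo "true"
else
  echo "false"
fi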

Arcane UNIX nonsense had gotten me into this mess; I figured it could get me out as well. My file was very likely still sitting on some sector somewhere on the disk. I knew my script contained the word "COLLECTD_SOCKET", which wasn't likely to appear anywhere else on the drive. So I unmounted the filesystem (on device /dev/sdf) and ran the following command:

grep -A40 -B10 COLLECTD_SOCKET /dev/sdf

What this does is search the raw contents of the entire drive at the device level for the term "COLLECTD_SOCKET" and print the 10 lines before and the 40 lines after each match. It took a little while (as you'd expect for reading the whole device), but I found a number of old versions of the script I was working on, including the version just before my bug caused it to self-destruct.

I guess the lesson here is that UNIX gives you lots of ammunition to shoot yourself with, but it also gives you plenty of gauze to help you heal yourself as well.

Thursday, June 3, 2010

Setting up Collectd Collection3 on Ubuntu Lucid 10.04

Unfortunately the wiki on how to set up collection3 is not that great; in particular, it glosses over how to configure Apache. But if you're running Ubuntu Lucid 10.04, it's actually pretty easy to set up collectd and collection3. I'll walk you through the steps.

First, you'll need to install the needed dependencies:

sudo apt-get update -y
sudo apt-get install -y apache2 libconfig-general-perl librrds-perl libregexp-common-perl libhtml-parser-perl collectd-core

Then we need to configure collectd to sample some data and store it as RRD files. Drop this file in /etc/collectd/collectd.conf:

LoadPlugin cpu
LoadPlugin load
LoadPlugin memory
LoadPlugin disk
LoadPlugin rrdtool
<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd/"
</Plugin>

Next we configure Apache to serve collection3. Copy this file into /etc/apache2/conf.d/collection3.conf:

ScriptAlias /collectd/bin/ /usr/share/doc/collectd-core/examples/collection3/bin/
Alias /collectd/ /usr/share/doc/collectd-core/examples/collection3/

<Directory /usr/share/doc/collectd-core/examples/collection3/>
    AddHandler cgi-script .cgi
    DirectoryIndex bin/index.cgi
    Options +ExecCGI
    Order Allow,Deny
    Allow from all
</Directory>

Now let's reload Apache and start collectd:

sudo /etc/init.d/apache2 reload
sudo /etc/init.d/collectd start

It'll take collectd a minute to gather enough data to usefully graph. Then you can point your browser to http://your.host.name/collectd/

And you'll be able to graph data!



Note 1: You may need to choose "hour" from the pulldown if you just started collectd, since it doesn't have enough data to graph a day yet
Note 2: The Apache configuration is not secure; anyone could navigate to your machine and see those graphs. Use SSL, basic authentication, or other methods to lock down access; one basic-auth sketch follows below.
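
For example, a minimal way to add basic authentication (a sketch; the password file path and username are placeholders) is to create a password file and add the auth directives to the Directory block in collection3.conf:

sudo htpasswd -c /etc/apache2/collection3.htpasswd youruser

    AuthType Basic
    AuthName "collectd graphs"
    AuthUserFile /etc/apache2/collection3.htpasswd
    Require valid-user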

Wednesday, May 5, 2010

Migrating Unfuddle Tickets to JIRA

I found myself needing to migrate bugs from Unfuddle, which exports them in a custom XML format, to JIRA, which can import CSV (documentation). I threw together a quick Java class to help me do this. It takes backup.xml generated from Unfuddle and creates a CSV which can be read by JIRA. It imports the following fields:
  • Summary
  • Status
  • Description
  • Milestone (as a custom JIRA field)
  • Assignee
  • Reporter
  • Resolution (if resolved)
  • Resolution description (as a comment)
  • Creation time
  • Resolved time (if resolved)
Furthermore it outputs the bugs in the order of the ID in Unfuddle, so that if you're importing into an empty JIRA project, the bugs will have the same number as in Unfuddle. It assumes the JIRA usernames correspond to Unfuddle usernames, though you can easily map differences by modifying the lookupUser function. Once you generate the CSV, you can give the configuration file below to the JIRA CSV Import wizard to take care of the mappings. You'll want to update
  • existingprojectkey
  • user.email.suffix
to match your project. There are a few notable things that are missed with this tool:
  • Time of day for creation/resolved
  • Comments
The tool should run without modification and requires only Joda Time as a dependency under JDK 1.6. This is total slapdash, quick-n-dirty, git-er-done code for a one-off conversion. If anyone would like to extend this tool or generalize it, that would be great :)

Java class UnfuddleToJira.java:

// Original author Gabe Nell. Released under the Apache 2.0 License
// http://www.apache.org/licenses/LICENSE-2.0.html

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class UnfuddleToJira {

    private static final DateTimeFormatter DATE_FORMATTER = DateTimeFormat.forPattern("yyyyMMdd");

    private final Document doc;
    private final PrintStream output;
    private final Map<String, String> milestones;
    private final Map<String, String> people;

    public UnfuddleToJira(Document doc, PrintStream output) {
        this.doc = doc;
        this.output = output;
        this.milestones = parseMilestones(doc);
        this.people = parsePeople(doc);
    }

    private static Map<String, String> parseMilestones(Document doc) {
        Map<String, String> milestones = new HashMap<String, String>();
        NodeList milestoneNodes = doc.getElementsByTagName("milestone");
        for (int i = 0; i < milestoneNodes.getLength(); i++) {
            Element elem = (Element)milestoneNodes.item(i);
            String title = elem.getElementsByTagName("title").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            milestones.put(id, title);
        }
        System.out.println("Found " + milestones.size() + " milestones: " + milestones);
        return milestones;
    }

    private static Map<String, String> parsePeople(Document doc) {
        Map<String, String> people = new HashMap<String, String>();
        NodeList peopleNodes = doc.getElementsByTagName("person");
        for (int i = 0; i < peopleNodes.getLength(); i++) {
            Element elem = (Element)peopleNodes.item(i);
            String name = elem.getElementsByTagName("username").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            people.put(id, name);
        }
        System.out.println("Found " + people.size() + " people: " + people);
        return people;
    }

    private static String prepareForCsv(String input) {
        if (input == null) return "";
        return "\"" + input.replace("\"", "\"\"") + "\"";
    }

    private static String convertDate(String input) {
        return DATE_FORMATTER.print(new DateTime(input));
    }

    private String lookupUser(String id) {
        String person = people.get(id);
        /**
         * Here you can transform a person's username if it changed between
         * Unfuddle and JIRA. Eg: <tt> 
         * if ("gabe".equals(person)) {
         *     person = "gabenell";
         * }
         * </tt>
         */
        return person;
    }

    private String lookupMilestone(String id) {
        return milestones.get(id);
    }

    private void writeCsvHeader() {
        StringBuilder builder = new StringBuilder(256);
        builder.append("Summary, ");
        builder.append("Status, ");
        builder.append("Assignee, ");
        builder.append("Reporter,");
        builder.append("Resolution,");
        builder.append("CreateTime,");
        builder.append("ResolveTime,");
        builder.append("Milestone,");
        builder.append("Description");
        output.println(builder.toString());
    }

    private void writeCsvRow(Ticket ticket) {
        StringBuilder builder = new StringBuilder(256);
        builder.append(prepareForCsv(ticket.summary)).append(", ");
        builder.append(prepareForCsv(ticket.status)).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.assigneeId))).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.reporterId))).append(", ");
        builder.append(prepareForCsv(ticket.resolution)).append(", ");
        builder.append(prepareForCsv(convertDate(ticket.createdTime))).append(", ");
        String resolveTime = ticket.resolution != null ? convertDate(ticket.lastUpdateTime) : null;
        builder.append(prepareForCsv(resolveTime)).append(", ");
        builder.append(prepareForCsv(lookupMilestone(ticket.milestoneId))).append(", ");
        builder.append(prepareForCsv(ticket.description));

        // JIRA doesn't have the notion of a resolution description, add it as a
        // comment
        if (ticket.resolutionDescription != null) {
            builder.append(",").append(prepareForCsv(ticket.resolutionDescription));
        }
        output.println(builder.toString());
    }

    public void writeCsv() throws Exception {
        NodeList ticketNodes = doc.getElementsByTagName("ticket");
        List<Ticket> tickets = new ArrayList<Ticket>();
        for (int i = 0; i < ticketNodes.getLength(); i++) {
            Node node = ticketNodes.item(i);
            Element nodeElem = (Element)node;
            Ticket ticket = new Ticket();
            NodeList ticketElements = nodeElem.getChildNodes();
            for (int j = 0; j < ticketElements.getLength(); j++) {
                Node ticketSubNode = ticketElements.item(j);
                String nodeName = ticketSubNode.getNodeName();
                if ("id".equals(nodeName)) {
                    ticket.id = ticketSubNode.getTextContent();
                } else if ("status".equals(nodeName)) {
                    ticket.status = ticketSubNode.getTextContent();
                } else if ("summary".equals(nodeName)) {
                    ticket.summary = ticketSubNode.getTextContent();
                } else if ("description".equals(nodeName)) {
                    ticket.description = ticketSubNode.getTextContent();
                } else if ("milestone-id".equals(nodeName)) {
                    ticket.milestoneId = ticketSubNode.getTextContent();
                } else if ("assignee-id".equals(nodeName)) {
                    ticket.assigneeId = ticketSubNode.getTextContent();
                } else if ("reporter-id".equals(nodeName)) {
                    ticket.reporterId = ticketSubNode.getTextContent();
                } else if ("resolution".equals(nodeName)) {
                    ticket.resolution = ticketSubNode.getTextContent();
                } else if ("resolution-description".equals(nodeName)) {
                    ticket.resolutionDescription = ticketSubNode.getTextContent();
                } else if ("created-at".equals(nodeName)) {
                    ticket.createdTime = ticketSubNode.getTextContent();
                } else if ("updated-at".equals(nodeName)) {
                    ticket.lastUpdateTime = ticketSubNode.getTextContent();
                }
            }
            tickets.add(ticket);
        }
        System.out.println("Writing " + tickets.size() + " tickets...");

        // Output to CSV in order of ticket number
        writeCsvHeader();
        Collections.sort(tickets);
        for (Ticket ticket : tickets) {
            writeCsvRow(ticket);
        }
    }

    public static class Ticket implements Comparable<Ticket> {

        public String id;
        public String summary;
        public String status;
        public String description;
        public String milestoneId;
        public String assigneeId;
        public String reporterId;
        public String resolution;
        public String resolutionDescription;
        public String createdTime;
        public String lastUpdateTime;

        @Override
        public int compareTo(Ticket other) {
            return Integer.parseInt(id) - Integer.parseInt(other.id);
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        if (args.length != 2) {
            System.err.println("Usage: UnfuddleToJira /path/to/unfuddle/backup.xml /path/to/jira/output.csv");
            return;
        }
        String inputFilename = args[0];
        String outputFilename = args[1];
        PrintStream output = new PrintStream(new FileOutputStream(outputFilename), true, "UTF-8");
        UnfuddleToJira converter = new UnfuddleToJira(factory.newDocumentBuilder().parse(inputFilename), output);
        converter.writeCsv();
        output.close();
    }

}
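
To compile and run it (a sketch; the Joda Time jar name is a placeholder for whichever version you have):

javac -cp joda-time-1.6.jar UnfuddleToJira.java
java -cp .:joda-time-1.6.jar UnfuddleToJira backup.xml output.csv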

Configuration file:

# written by PropertiesConfiguration
# Wed May 05 07:12:57 UTC 2010
existingprojectkey = WEB
importsingleproject = false
importexistingproject = true
mapfromcsv = false
field.Resolution = resolution
field.Milestone = customfield_Milestone:select
field.Assignee = assignee
field.Summary = summary
field.Status = status
field.Description = description
field.Reporter = reporter
field.CreateTime = created
value.Status.closed = 6
value.Resolution.works_for_me = 5
value.Resolution.will_not_fix = 2
value.Status.new = 1
value.Status.reassigned = 1
value.Resolution.invalid = 4
value.Resolution.postponed = 2
value.Status.accepted = 3
value.Resolution.fixed = 1
value.Resolution.duplicate = 3
user.email.suffix = @kikini.com
date.import.format = yyyyMMdd
field.ResolveTime = resolutiondate
date.fields = CreateTime
date.fields = ResolveTime

Tuesday, April 20, 2010

Installing Sun Java 6 on Ubuntu 10.04 Lucid Lynx

It looked like Canonical was going to totally abandon Sun's JDK with the release of Lucid Lynx. After heated discussions, however, it was instead merely tucked away even deeper into the recesses of alternative repositories. Now it lives in a partner repository, so as root you'll need to run

add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update

to add the appropriate repository. Now you can use apt-get install as before to install the sun-java6-jdk or sun-java6-jre packages. For those curious, this is the "official" way to do this according to the release notes.
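
For example, to install the full JDK (accept the license prompt when it appears):

apt-get install sun-java6-jdk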

Monday, April 19, 2010

Connecting to JMX on Tomcat 6 through a firewall

One of the flaws (in my opinion, and shared by others) of the design of JMX/RMI is that the server listens on a port for connections, and when one is established it negotiates a new secondary port to open on the server side and expects the client to connect to that. Well, OK, except that it will pick an available port at random, and if your target machine is behind a firewall, well, you're out of luck because you don't know which port to open up!

With the release of Tomcat 6.0.24, a new Listener (the JmxRemoteLifecycleListener) is available that lets you connect to JMX running on your Tomcat server using jconsole. Using this Listener you can specify the secondary port number instead of it being picked at random. This way, you can open two known ports on your firewall and jconsole will happily connect and read data from Tomcat's JVM over JMX.

Setting it up is pretty easy. First, copy catalina-jmx-remote.jar from the extras folder of the binary distribution into Tomcat's lib folder.

Update your server.xml to include the Listener:

<Listener className="org.apache.catalina.mbeans.JmxRemoteLifecycleListener" rmiRegistryPortPlatform="10001" rmiServerPortPlatform="10002"/>

Replace the ports with whichever ones you wish, and make sure to open up those ports on your firewall. Be sure to properly configure JMX with authentication and SSL. Or, if you're just setting this up for testing, you can go with the totally insecure and unsafe configuration and add the following JVM arguments to your Tomcat startup script (typically via CATALINA_OPTS or JAVA_OPTS):

-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
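
For example (a sketch assuming you use bin/setenv.sh, or wherever you normally set CATALINA_OPTS):

export CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"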

Now you can start Tomcat. On your client machine, start jconsole and drop in the following URL for your remote process:

service:jmx:rmi://your.public.dns:10002/jndi/rmi://your.public.dns:10001/jmxrmi

Obviously you need to replace your.public.dns with the DNS address of your Tomcat machine, and if you chose different ports, change those as well. With some luck, you'll connect and be getting data!

If you're on EC2 or a similar network where your internal DNS name differs from your external/public DNS name, one more step is required: additionally set the following property to the server's external/public DNS name

-Djava.rmi.server.hostname=your.public.dns

And with that bit of magic you should be off and collecting data!

Sunday, April 18, 2010

Optimizing PostgreSQL/Tomcat for Write-Heavy Workloads

Recently I've been working on tuning the performance of a Tomcat web front-end and PostgreSQL back-end. In particular I wanted to stress some write-heavy scenarios, so I designed a JMeter test plan and ran it using Maven Chronos (as described in this post). I also have collectd running on the machines, reporting various system metrics to a central server for graphing. This is essential for identifying which system resources are contributing to a performance bottleneck.

In this post I don't want to get too hung up on the exact nature of the queries or the hardware configurations. Instead I'd like to focus on the investigative process itself. Let's start off by showing some graphs from a 20-minute stress run. The JMeter test plan is configured to ramp up linearly from 0 to 50 client threads over a 5-minute period, then continue for another 15 minutes:


[Graphs: Postgres and Tomcat CPU utilization]

Right away we notice that the stress run isn't really stressing either the Tomcat or the Postgres machine. (Aside: in this post I'm only going to show CPU graphs. Obviously you need to look at other resources as well, but for the purposes of this discussion CPU is enough to get the idea.) At first it might seem that we're not hitting the server hard enough. Maybe 50 client threads is too few? Yet as we can see from the throughput graph, overall throughput rises until we reach about 15 threads and is fairly flat after that. This suggests that the problem is not with the test setup, but with something in the server configurations.

Also notice that performance is actually pretty bad from a response time perspective. The response times are all over the map, with a median around 300ms, a 95th percentile of about 1.2 seconds, and some queries lasting as long as 3.5 seconds. Ouch!

The most suspicious thing to me is that throughput doesn't increase when more than about 15 threads are hitting the servers. Both Tomcat and PostgreSQL are designed to be extremely capable in high-volume environments; no way could 15 threads be causing us to max out. The huge variance in response times implies that requests are being queued rather than handled right away. I ran the test again and, while it was running, logged into Postgres and ran SELECT count(*) FROM pg_stat_activity a few times. There were never more than 8 connections to the database.

As it turns out, 8 is the default maximum number of connections for the Apache Commons Database Connection Pool (DBCP). In our case this looks to be the first culprit: it explains why throughput never increased past a small number of client threads and why response time variance was so high. So let's bump the maximum DBCP connections up to 50 and see what it looks like:



[Graphs: Postgres and Tomcat CPU utilization]

Nice! Not only did our throughput increase by about 50%, but the response times are more consistent and have fewer extremes. The throughput graph shows throughput increasing until about 40 client threads, which is below the maximum number of database connections; this suggests contention for the connection pool is no longer a big issue. The second core on the Postgres machine also finally began to be utilized. Our system is spending less time queuing and more time working.
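
For reference, if your pool is a Tomcat-managed DBCP data source, the bump is a one-attribute change (maxActive) in the Resource definition, for example in context.xml. A sketch with placeholder names, URL, and credentials:

<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://db.example.com/mydb"
          username="app" password="secret"
          maxActive="50" maxIdle="10" />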

But check out the high IO wait times on the Postgres machine: 20-30% of CPU time is spent waiting on IO to complete rather than doing useful processing. That seems like a really high proportion. As I mentioned at the beginning, the test plan I'm running has a relatively high proportion of writes. Other metrics on the Postgres machine related to disk IO (not reproduced here) also pointed to this as a likely bottleneck. So I set about researching how to improve my Postgres configuration for write performance. The following were valuable resources:


I played around with a number of parameters. The most important for this workload turned out to be those related to writing WAL files. In particular, changing the following parameters to these values had the biggest impact:

synchronous_commit = off
full_page_writes = off

The results:



[Graphs: Postgres and Tomcat CPU utilization]

Clearly a major improvement. Throughput increased by another 25%, and response times not only dropped by about 50% but are now very consistent. On the Postgres machine very little time is spent waiting on IO. In fact it looks like our throughput increased all the way up to 50 client threads, suggesting that if we increase the number of threads we'll see that the system is capable of even more.

One odd thing about the last test results is the periodic drops to zero throughput. That's a mystery I'll solve for you in a future post.

It's important to fully understand the impact of these settings before deciding to put them into production. This combination of settings makes it possible to lose recently committed transactions, and turning off full_page_writes increases the chances of unrecoverable data corruption in the event of an OS crash or power failure (though not as much as turning off fsync). Configuring Postgres this way should only be done if you can tolerate or otherwise mitigate those possibilities.

Sunday, March 28, 2010

Generating Constant Bandwidth on Linux using fio

It took a lot of searching for me to find a way to generate network traffic at a specific rate between two hosts, so I thought I would share the answer. It's pretty easy to test the available bandwidth between two hosts by using netcat to transfer a bunch of random data as fast as the network allows. However, I wanted to test a resource monitoring and graphing system, which meant I needed to generate network traffic at a known rate so that I could judge the resulting graphs against my expectations.

I found you can use fio, which is a generic I/O testing tool, to achieve this. fio allows specifying the transfer rate and also has a network engine. So using fio I can configure one host as the receiver and one as the sender and transfer data at a known rate. Here's what the config files look like:

Sender jobfile.ini:

[test1]
filename=receiver.dns.name/7777
rw=write
rate=750k
size=100M
ioengine=net

Receiver jobfile.ini:

[test1]
filename=localhost/7777
rw=read
size=100M
ioengine=net

Obviously you would replace "receiver.dns.name" with the DNS name of the receiving host, and adjust the size and rate parameters as you like. (It's worth noting that the fio documentation is either wrong or misleading on what the filename should be for the receiver. It claims the receiver should only specify the port, but when I tried that it failed to run. Setting the host to localhost seemed to work and the receiver started listening.) To run the test, simply run:

fio jobfile.ini

first on the receiving host, then on the sending host. fio will then transfer 100 megabytes of data at a rate of 750KB/sec between the two hosts. And we can see from the chart that indeed a constant rate was generated:


The observed rate is a bit above the 750KB/sec specified, but what's being measured is the number of bytes being transferred through the eth0 interface. Since the data is transferred over TCP there is some overhead to the packet structure, which I believe accounts for the extra few KB/sec observed.

Sunday, February 21, 2010

How I learned to stop worrying and love Unit Testing

I admit it. Throughout my whole career at Microsoft, even as a Dev Lead, I was not a true believer in Unit Testing. That's not to say I didn't write unit tests or require my team to write tests. But I didn't believe that the benefits reaped from unit testing were sufficiently valuable given the time it took to write the tests (for me, about equal to the time to implement the product code itself).

Now, post-Microsoft, I am a true believer. A zealot even. I can't imagine a world in which I write code that doesn't have a ton of unit tests covering it.

So what changed? My eyes have been opened to a development world in which real testing infrastructure exists. In my former role, what I used was a testing framework known as Tux, which ships with Windows CE. It was enhanced for Windows Mobile and given a usable GUI. The result was something like JUnit, i.e., a simple framework for defining test groups and specifying setup/teardown functions. The GUI was very much like the NUnit GUI.

So far, so good. There's nothing wrong with this setup. However, a test-running framework is necessary but not sufficient for unit testing. The missing piece was a mocking infrastructure.

One of the most frustrating things about working for Microsoft (and I'm sure the same is true of other big software firms) was that everything, and I do mean everything, had to be developed in-house. For legal reasons we couldn't even look at solutions available in the open source community. The predictable result was that a massive amount of effort was expended duplicating functionality that already existed elsewhere. In many cases the realities of product schedules and resource constraints meant that we simply had to do without certain functionality entirely. This was the case with mocking. Developers were left to create their own mocks manually, or figure out how to write a test without using mocks. I identified the lack of a mocking infrastructure as a major problem, but failed to do anything about it.

Exit Gabe, stage left, from Microsoft to Kikini and a world of open source.

At Kikini we use JUnit for running tests and a simply beautiful component called Mockito for mocking. I cannot emphasize enough how wonderful Mockito is. Mockito uses Reflection to allow you to mock any class or interface with incredible simplicity:

MyClass myInstance = mock(MyClass.class);

Done. The mocked instance implements all public methods with smart return values, such as false for booleans, empty Collections for Collections, and null for Objects. Specifying a return value for a specific call is trivial:

when(myInstance.myMethod(eq("expected_parameter"))).thenReturn("mocked_result");

The semantics are so beautiful that I am certain that readers who have never heard of Mockito or perhaps have never even used a mocking infrastructure can understand what is happening here. When the method myMethod() is invoked on the mock, and the parameter is "expected_parameter", then the String "mocked_result" is returned. The only thing which may not be completely obvious is the eq(), which means that the parameter must .equals() the given value. The default rules still apply so that if a parameter other than "expected_parameter" is given, the default null is returned.

Verifying an interaction took place on a mock is just as trivial:

verify(myInstance).myMethod(eq("expected_parameter"));

If the method myMethod() was not invoked with "expected_parameter", an exception is thrown and the test fails. Otherwise, it continues.
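
Putting the pieces together, here is a minimal, self-contained test sketching the when/verify workflow (assuming JUnit 4 and Mockito on the classpath); it mocks a plain JDK interface so no other classes are needed:

import static org.junit.Assert.*;
import static org.mockito.Mockito.*;

import java.util.List;

import org.junit.Test;

public class MockitoBasicsTest {

    @SuppressWarnings("unchecked")
    @Test
    public void stubAndVerify() {
        // mock a standard JDK interface so the example stands on its own
        List<String> list = mock(List.class);

        // stub: when get(0) is called, return a canned value
        when(list.get(eq(0))).thenReturn("mocked_result");

        assertEquals("mocked_result", list.get(0));
        // unstubbed calls fall back to Mockito's defaults (null for Objects)
        assertNull(list.get(1));

        // verify the stubbed interaction actually took place
        verify(list).get(eq(0));
    }
}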

Sharp-eyed readers will note that the functionality described so far requires that equals() be properly implemented, and when dealing with external classes this is sometimes not the case. What then? Let's suppose we have an external class UglyExternal with a method complexStuff(ComplexParameter param), and ComplexParameter does not implement equals(). Are we out of luck? Nope.

UglyExternal external = mock(UglyExternal.class);
MyClass myInstance = new MyClass(external);
myInstance.doStuff();
ArgumentCaptor<ComplexParameter> arg = ArgumentCaptor.forClass(ComplexParameter.class);
verify(external).complexStuff(arg.capture());
ComplexParameter actual = arg.getValue();
// perform validation on actual

This is really awesome. We're able to capture the arguments given to mocks and run whatever validation we like on the captured argument.

Now let's get even fancier. Let's say we have an external component that does work as a side-effect of a function call rather than a return value. A common example would be a callback. Let's say we're using an API like this:

public interface ItemListener {
    public void itemAvailable(String item);
}

public class ExternalClass {
    public void doStuff(ItemListener listener) {
        // do work and call listener.itemAvailable()
    }
}

Now in the course of doing its job, our class MyClass will provide itself as a callback to ExternalClass. How can we mock the interaction of ExternalClass with MyClass?

ExternalClass external = mock(ExternalClass.class);
doAnswer(new Answer() {
    @Override
    public Object answer(InvocationOnMock invocation) throws Throwable {
        Object[] args = invocation.getArguments();
        ItemListener listener = (ItemListener)args[0];
        listener.itemAvailable("callbackResult1");
        return null;
    }
}).when(external).doStuff((ItemListener)isNotNull());

We use the concept of an Answer, which allows us to write code to mock the behavior of ExternalClass.doStuff(). In this case we've made it so that any time ExternalClass.doStuff() is called, it will invoke ItemListener.itemAvailable("callbackResult1").

There is even more to Mockito, but in the course of writing hundreds of tests over the past nine months I have never had to employ anything more advanced. I would say that only 1% of tests require the fancy Answer mechanism, about 5% require argument capturing, and the remainder can be done with the simple when/verify functionality.

The truly wonderful thing, and the point of my writing this blog entry, is that a mocking infrastructure like Mockito enables me to write effective unit tests very quickly. I would say that I spend 25% or less of my development time writing tests. Yet with this small time investment I have a product code to test code ratio of 1.15, which means I write almost as much test code as product code.

Even more important, the product code I write is perforce highly componentized and heavily leverages dependency injection and inversion of control, principles that are well known to improve flexibility and maintainability. With a powerful mocking infrastructure it becomes very easy, and in fact natural, to write small classes with a focused purpose, as their functionality can be easily mocked (and therefore ignored) when testing higher-level classes. I have always been told that writing for testability can make your product code better, but I never really understood that until I had the right testing infrastructure to take advantage of it.

Now, I'm a believer.

Sunday, February 7, 2010

A Taxonomy of Software Developers

After spending years of my previous life at Microsoft as a Dev, Tech Lead, and Dev Lead, I've worked with a broad range of software developers from the US, China, India, and all over the world. I've also been involved in interviewing well over a hundred candidates, and in many hiring (and some firing) decisions. From this I've come up with a taxonomy describing the characteristics of the various software developers I've encountered, how to spot them, and what to do with them.

Typical Developers

The hallmark of a Typical Developer is a relatively narrow approach to problem solving. When fixing a bug, they concentrate on their immediate task with little regard to the larger project. When they declare the bug fixed, what that means is that the exact repro steps in the bug will no longer repro the issue. However, frequently in fixing the issue described in the bug, they have missed a larger root cause, or have broken something else in the system. This is illustrated in Fig. 1:


In most cases the code a Typical Developer writes is a very small net improvement for the overall project when viewed from a release management perspective. Sometimes the traction is zero if the issue that they created is just as severe as the issue they fixed. Sometimes the traction is slightly positive if the issue they created or the case they missed is easier to fix than the original issue.

When viewed from an engineering management perspective, however, the picture is very different. This is due to the nature of the approach Typical Developers take when actually writing code. A typical bug has the form "under condition X, the project behaves as Y, when it should behave as Z." The Typical Developer is very likely to fix the problem in this way:

// adding parameter isX to handle a special case
void doBehavior(boolean isX) {
  // usually we want to do Y, but in this special case we should do Z.
  if (isX == true) {
    doBehaviorZ();
  } else {
    doBehaviorY();
  }
}

The Typical Developer simply figures out how to directly apply logic to the code that determines behavior, then makes the code behave differently based on that. This is reasonable, but if it's the only way the developer can think of to change behavior, after a while working in the same code it begins to look something like this:

void doBehavior(boolean alternate, String data, File output, State state) {
  if (state == STATE_A) {
    doBehaviorA(data, alternate);
  } else if (state == STATE_B && !(alternate || data == null)) {
    doBehaviorB(output);
  } else {
    switch(state) {
      case STATE_B:
      case STATE_D:
        doBehaviorA(data, !alternate);
        // FALLTHROUGH!
      case STATE_C:
        doBehaviorC(output);
        if (alternate) {
          doBehavior(!alternate, null, null, state);
        }
        break;
      default:
        // We should never get here!
        assert(false);
        break;
    }
  }
}

When I see code after months of a Typical Developer working on it, this is my reaction:


The Typical Developer will never take a step back and think, "Hmm, we're getting a lot of these kinds of issues. Maybe the structure of our code is wrong, and we should refactor it to accommodate all the known requirements and make it easier to change."

Now the project is in trouble. The team may be able to release the current version (often there is no alternative) after exhaustive manual testing, but the team can never be confident that they fully tested all the scenarios. The first priority after releasing will be to remove all the code written by the Typical Developer and rewrite it from scratch.

Another characteristic of Typical Developers is insufficient testing. Often the code they write will be difficult or impossible to unit test. If unit testing is a requirement, they'll write tests which are just as bad as their code. In other words the tests will be unreliable, require big changes to get passing when a small code change is made, and not test anything important. Furthermore the same narrow approach to development shows through in manual testing. The Typical Developer will follow the steps in the bug when testing their fix, and never stop to think "what other behavior could be impacted by my change?"

Typical Developers are quite willing to chalk up their constant regressions and low quality to factors like "I'm working in legacy code" or "I'm not familiar with this area" or "the tools aren't good enough." Though all of those things may be true, that is the nature of software development, and Typical Developers don't understand how to change their environment for the better.

The root cause behind these failings is most often that the Typical Developer is simply not cut out for real software development. Because the software industry is so deeply in need of talent, no matter how marginal, Typical Developers will always find work. Hiring managers are too willing to fill manpower gaps in order to ship on time. (In fairness, Microsoft managers are pretty good about avoiding this pitfall. However, there are times when it is considered OK to "take a bet" on a marginal candidate.)

A special type of Typical Developer is the brilliant person who simply doesn't care enough. They're in software development because it pays well and they can skate by with putting in 40hrs a week. These Typical Developers are especially annoying because they'll employ their brilliance only when justifying their lazy workarounds, and not on actual design and implementation.

What should managers do with Typical Developers? In most cases manage them out as quickly as they can. Though a Typical Developer may be of use in the final push of releasing a project, in the long run having them working on a project is a net negative. Even if Typical Developers came for free, I wouldn't hire them. It is exceedingly rare for a Typical Developer to become a Good Developer, though in rare circumstances I've seen it happen under the guidance of Great Managers.

Good Developers

Good Developers fix bugs and deliver features on time, tested, and adaptable to future requirements. This is illustrated in Fig. 2:


Once a Good Developer delivers a bugfix or feature, typically that's the last you hear of it. A Good Developer will not fall into the traps that a Typical Developer does. When they see a pattern emerging they identify it and take steps to solve the issue once and for all. They are not afraid of refactoring. They'll come into your office and say "Hey, it's not sustainable to do all these one-off fixes for this class of issue. I'm going to need a week to re-do the whole thing so we never have to worry about it again." And you say great, please do it!

Good Developers will encounter the same environmental issues Typical Developers do, e.g., legacy code or weak tools. Good Developers will not let this stand. They'll realize that if a tool is not good enough to do a job, then they have to improve the tool or build a new one. Once they've done that, they'll get back to work on the original problem.

Good Developers are Good Testers. Their code is written to be testable, and because they are able to take a larger view, they have a good idea of the impact of their changes and how they should be tested. Pride is also a factor here. Good Developers would be embarrassed and shamed if they delivered something that wasn't stable.

From a release management perspective, Good Developers are well liked, though their perceived throughput may not be high since they are spending time making the system as a whole better and not just fixing a bug as fast as they possibly can. Good managers recognize and nurture this. Bad managers push them to put in the quick fix and deal with the engineering consequences in-between releases. Good Developers will protest against this but often acquiesce. A Good Developer in the hands of a Good Manager can turn into a Great Developer.

Managers should work hard to keep Good Developers since they're so hard to find and hire. That does not mean forcing them to remain on the team, as doing so risks turning a Good Developer into the "brilliant" variety of Typical Developer described above. Reward Good Developers well and give them interesting things to work on.

Great Developers

Exceedingly rare, the hallmark of the Great Developer is the ability to solve problems you didn't know you had. This is illustrated in Fig. 3:



When tasked with work, a Great Developer will take a holistic view of their task and the project they're working on along with full cognizance of the priorities upper management has for this release and the next. A Great Developer will understand the impact of a feature while it's still in the spec-writing phase and point out factors the designers, PMs, and managers hadn't thought of.

When designing and implementing a feature, a Great Developer will take the time to design in solutions to problems that Good Developers and Typical Developers have run into, even though they're not obviously connected. A solution from a Great Developer will often change how a number of components work and interact, solving a whole swath of problems at a stroke.

Similar to Good Developers, a Great Developer will never let lack of tools support or unfamiliar code deter them. But they'll also re-engineer the tools and legacy environment to such a degree that they create something valuable not only to themselves but to many others as well.

Unlike Good Developers, a Great Developer can almost never be coerced into compromising long-term quality for expediency. They'll either tell you flat out "no, we need more time, period" or they'll grumble and come in on the weekend to implement the real fix themselves.

Sometimes mistaken for a Great Developer is the Good Developer in Disguise. These Good Developers have recognized the impact on others that a Great Developer has, and seek to emulate that by engaging almost exclusively in side projects related to tools improvement and "developer efficiency" initiatives. The Good Developer in Disguise has no actual time to do their own work, but fools management into believing that they're Great Developers. Truly Great Developers improve their environment as a mere side effect of them doing their own job the way they think it ought to be done.

It goes without saying that Great Developers should be even more jealously guarded than Good Developers, with the same caveat about not turning them into prisoners. The flip side is that Great Developers should not be allowed to go completely off on their own into the wilderness. No doubt they will build something amazing, but it runs the risk of being something amazing that you don't need. Better to give broad, high-level goals and let them do their thing.

Final Note

Although I named Typical Developers "typical," I mean typical in terms of the overall industry: while there were plenty of Typical Developers at Microsoft, most developers there fell into the Good Developer category.

Friday, January 29, 2010

Poor Beanshell Performance and Custom Functions for JMeter

I'm building a relatively complex JMeter test plan to simulate load on the Kikini website. As soon as you need to do anything remotely complex, you exceed the capability of the built-in JMeter configuration elements and functions. The initial version of my test plan therefore used the BeanShell capability, which allowed me to do relatively complex things in a familiar language (BeanShell is essentially interpreted Java).

All fine and good until we needed to run tests longer than 10 minutes or with more than 10 threads. An issue in BeanShell causes massive slowdowns if it is used inside loops (e.g., inside a sampler), which in fact was what I was doing. When I worked around the issue by resetting the interpreter on each call, I found that JMeter was spending so much time processing BeanShell code that it couldn't effectively scale up to more than about 10 threads. The bottom line is that BeanShell is unfit for use if it must be called repeatedly in a JMeter test.

The only way I could find to get the complex behavior I want without compromising performance was to implement my own JMeter function. JMeter offers a number of simple functions out-of-the-box. Although JMeter isn't really an API, it does have a Function interface which you could implement. Then from inside any test element, you can call your function:

${__myFunction(arg1, arg2)}

And you'll get back a string that is the result of your function. Before we get to the function class itself, there is some background to discuss.

First, as mentioned, JMeter isn't really an API, but with a little bit of work you can program against it. If you download the JMeter binary distribution, you can extract ApacheJMeter_core.jar. This JAR contains the interfaces you'll code against.

Second, you need a way to get your custom function onto JMeter's classpath. You can set the search_paths system property, and JMeter will find it. This is great because then you do not have to modify the JMeter distribution to use your custom functions.

Once you're ready with your custom JAR, you can invoke JMeter:

jmeter -Jsearch_paths=/path/to/yourfunction.jar

Alright, on to the code. This is a skeleton (please ignore the naming) which simply returns the toString() of the resolved argument list:

package com.kikini.perf.jmeter.functions;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import org.apache.jmeter.engine.util.CompoundVariable;
import org.apache.jmeter.functions.AbstractFunction;
import org.apache.jmeter.functions.InvalidVariableException;
import org.apache.jmeter.samplers.SampleResult;
import org.apache.jmeter.samplers.Sampler;

public class MaskUserIDFunction extends AbstractFunction {

    private static final List<String> DESC = Arrays.asList("uid_to_mask");
    private static final String KEY = "__maskUserID";

    private List<CompoundVariable> parameters = Collections.emptyList();

    @Override
    public String execute(SampleResult arg0, Sampler arg1) throws InvalidVariableException {
        List<String> resolvedArgs = new ArrayList<String>(parameters.size());
        for (CompoundVariable parameter : parameters) {
            resolvedArgs.add(parameter.execute());
        }
        // TODO: mask the user ID in resolvedArgs.get(0). For demo purposes,
        // just return the arguments given.
        return resolvedArgs.toString();
    }

    @Override
    public String getReferenceKey() {
        return KEY;
    }

    @SuppressWarnings("unchecked")
    @Override
    public void setParameters(Collection arg0) throws InvalidVariableException {
        parameters = new ArrayList<CompoundVariable>(arg0);
    }

    @Override
    public List<String> getArgumentDesc() {
        return DESC;
    }

}

There are a few crucial things to note here. The package name contains ".functions"; that is a requirement, otherwise your function will not be recognized by JMeter. Also notice that the type of the arguments is CompoundVariable. You must call execute() on them to resolve them to Strings.

Otherwise this is relatively straightforward. Now I can call my function from inside a sampler:
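
For example, in a sampler's parameter field the reference might look like this (the user ID is a made-up value):

${__maskUserID(12345)}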



And it will return the correct results:


So, how do Java functions perform versus the BeanShell functions? My test plan had about 10 samplers, most of which used BeanShell before, but now use native Java functions. My dedicated JMeter machine is a dual-core system with 2GB of RAM.

Before: JMeter maxed out at ~45 requests per second, 90%+ CPU usage
After: Generates 150+ requests per second with 2-3% CPU usage

Huge win! I don't actually know what the limit is now but I'm guessing I could get thousands of requests per second now.

Sunday, January 24, 2010

Releasing simpledb-appender as open source

I've released the SimpleDB appender I wrote as open source under the Apache 2.0 License. The project is hosted here:

http://code.google.com/p/simpledb-appender/

The purpose of this project is to allow Java applications using the SLF4J API with Logback to write logs to Amazon SimpleDB. This allows centralization of the logs and opens up powerful querying capabilities. Scripts and tools are also included so that even non-Java applications can have their stdout/stderr logged to SimpleDB.

The project is tested and works well. Developers familiar with SLF4J should have no problem integrating it into their apps. The documentation for using it as a tool for non-Java applications is a little weak but I have a demo shell script that should at least get folks started.

Let me know how it works for you!

Thursday, January 14, 2010

Amazon Web Services Expanding into Asia

Last year, I privately speculated that having launched datacenters in the Eastern US and Western Europe, the next obvious locations for Amazon Web Services (AWS) would be the Western US and Asia. In December 2009, AWS announced availability zones in Northern California.

What I didn't realize until today was that AWS actually announced its intention to expand into Asia back in November 2009. Multiple availability zones will be available in Singapore in the first half of 2010.



Singapore does make some sense as a location. A glance at the map (source: openstreetmap.org) reveals that Singapore is pretty central, located roughly equidistant from China, India, and Australia. So if AWS is pursuing a strategy of minimizing average global latency, it is probably a good choice. It also offers a relatively stable political and economic environment, though there is some political risk in locating yourself in an authoritarian country.


But when I first thought about a datacenter in Asia, my pick would have been Korea. Korea is one of the most connected (in the data networking sense) countries on Earth, and it is in close proximity to the other two most important markets in Asia: China and Japan. Korea is a very stable political and economic environment, and it doesn't carry the significant political risk associated with hosting in China or the lesser risk of Singapore. Latency from Korea to China and Japan is very low. And I imagine the cost of running a datacenter in Korea is not much higher than in Singapore, given that living standards are comparable.

Still, I can't complain. Hosting in Singapore will allow a better web experience for users throughout Asia. I hope to see AWS continue expanding geographically.