Wednesday, January 25, 2012

Seamless, Secure Routing to Blocked Sites

Where I live, Internet access is restricted. Specific sites are blocked, as well as sections of some sites depending on the content. Blocking is done primarily in two ways:
  • False DNS results
  • Broken connections, including unacknowledged SYNs, high packet loss, and injected RSTs.
With access to a computer outside the country (such as a micro instance on EC2), these problems may be overcome by using the outside computer as a proxy. One simple solution is to use SSH as a SOCKS proxy. This is what I had been using for about a year. It works fine, but not all devices can use a SOCKS proxy, and performance is degraded by layering TCP inside TCP.

As a weekend project, I decided to implement a more sophisticated system, with the following goals:

  • Only route data destined for blocked networks through the proxy so that domestic sites remain fast
  • All devices connected to my home network should automatically be able to access blocked sites with zero configuration
With these goals in mind, I designed the following system:

  • Router maintains a secure VPN tunnel to the proxy (OpenVPN)
  • Routing table has entries for all IP blocks owned by blocked sites, routes this traffic over the tunnel
  • Local DNS server is configured to forward requests for blocked domains to Google's public DNS server over the VPN tunnel
  • Local DNS server forwards other requests to the ISP's DNS server
My router is a Mikrotik RouterOS device, which supports OpenVPN tunnels. I configured a VPN tunnel at the router level to a Linux instance on EC2. This is pretty standard, so I'll skip the details.

Setting up the routing table is where things get a little tricky. Fortunately the blocked sites are all rather large companies, and they all have their own AS number. Using radb it is possible to query which address blocks are advertised by an ASN. I wrote a script to parse this information and print out router commands for each block:


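# Print RouterOS route commands for every address block advertised by an AS.
if [ $# -ne 2 ]; then
  echo "Usage: $(basename "$0") [AS number] [friendly name]"
  exit 1
fi

asNumber=$1
asName=$2
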
whois -h "!gAS$asNumber" | head -n -1 |awk 'NR>1' | tr -d '\n' | tr ' ' '\n' | sort | uniq | tr '\n' ' ' | awk "{for(i=1;i<=NF;i++)print \"add dst-address=\"\$i\" gateway=ovpn-out1 comment=\\\"AS$asNumber $asName\\\"\"}"
Here are the results for Twitter (AS13414):
add dst-address= gateway=ovpn-out1 comment="AS13414 Twitter"
add dst-address= gateway=ovpn-out1 comment="AS13414 Twitter"
add dst-address= gateway=ovpn-out1 comment="AS13414 Twitter"
I also did the same for Facebook (AS32934) and Google (AS15169), both of which have considerably more blocks. Unfortunately the data from radb contains redundant blocks: for example, it lists some /24 blocks that are already contained in a /23 that is also listed, so the /24 routes are useless. The total number of routes is a bit over a thousand, which the Mikrotik can handle easily, but with some logic or a better data source it could be shrunk to a few hundred.
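
The "some logic" for dropping redundant blocks could look roughly like the sketch below. It is not part of the original script: the function names `ip_to_int` and `collapse_cidrs` are made up for illustration, it assumes IPv4 blocks written as `a.b.c.d/len` with no leading zeros, and it only removes blocks covered by a shorter prefix (it does not merge adjacent blocks).

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Read CIDR blocks (one per line) on stdin and print only those that are not
# already covered by a shorter or equal prefix in the same input.
collapse_cidrs() {
  local cidr len net bits i covered
  local -a kept_net=() kept_len=()
  # Shortest prefixes first, so a covering block is seen before the blocks it covers.
  sort -u | sort -t/ -k2,2n | while read -r cidr; do
    [ -n "$cidr" ] || continue
    len=${cidr#*/}
    net=$(ip_to_int "${cidr%/*}")
    covered=0
    for i in "${!kept_net[@]}"; do
      # Compare only the leading bits that the kept prefix fixes.
      bits=$(( 32 - kept_len[i] ))
      if (( (net >> bits) == (kept_net[i] >> bits) )); then
        covered=1
        break
      fi
    done
    if (( covered == 0 )); then
      kept_net+=("$net")
      kept_len+=("$len")
      echo "$cidr"
    fi
  done
}
```

For example, feeding it 192.0.2.0/24, 192.0.3.0/24, 192.0.2.0/23 and 198.51.100.0/24 (documentation ranges, one per line) leaves only 192.0.2.0/23 and 198.51.100.0/24. The quadratic scan is fine for a list of about a thousand routes.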

That was the hard bit. Next up is DNS. I installed Unbound and configured it to forward to Google's public DNS server for the blocked domains (which, recall, will be routed over the VPN tunnel and so not subject to poisoning), and my ISP's DNS server for the rest:

 verbosity: 1
        access-control: allow
 msg-cache-size: 16m
        rrset-cache-size: 32m
 chroot: ""

    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: ""
    name: "."
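
Since the forwarding section is just one stanza per domain, it can be generated rather than typed. The helper below is a hypothetical sketch, not what I actually used: `forward_zones` is an invented name, and the upstream address and domain names passed to it are whatever you choose. `forward-zone:`, `name:` and `forward-addr:` are the standard Unbound directives for this.

```shell
# Print an Unbound forward-zone stanza for each domain, all pointing at the
# same upstream resolver. Usage: forward_zones UPSTREAM DOMAIN...
forward_zones() {
  local upstream=$1 domain
  shift
  for domain in "$@"; do
    printf 'forward-zone:\n    name: "%s"\n    forward-addr: %s\n' "$domain" "$upstream"
  done
}
```

You would call it once with the outside resolver and the blocked domains, and once more with your ISP's resolver and `.` as the catch-all zone, appending both outputs to the Unbound configuration.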
And, we're done. Now any device can log onto the wifi in my apartment and have unrestricted access to all the normally blocked sites, and still have fast connectivity to domestic sites.

Thursday, November 4, 2010

Connection keep-alive timeouts for popular browsers

Recently I needed to know how long the popular browsers will keep an HTTP keep-alive connection open before closing it. I was able to find documented values for IE and Firefox, but not for other browsers. In fact I couldn't even find much in the way of anecdotes. So for the other browsers I decided to find out myself by testing against a Tomcat server configured with an hour-long keep-alive timeout. I then used each browser to make a single request and observed the TCP streams in Wireshark. Here are the results:

  • IE: 60 seconds (documentation)
  • Firefox: 300 seconds (documentation)
  • Chrome: 300 seconds (observed)
  • Safari: 30 seconds (observed)
  • Opera: 120 seconds (observed)

Note that for IE and Firefox these values are configurable by the user, and the developers behind the other browsers may change the timeout in future releases.

Friday, September 17, 2010

Authoring multipart Ubuntu cloud-init configuration with Java

Canonical's wonderful Amazon EC2 Images come with a powerful configuration tool called cloud-init that lets you pass configuration via user-data. One of the more interesting capabilities is that cloud-init allows a combination of different configuration payloads using MIME as a system for aggregating parts.

Below is an example of how to create a multipart configuration compatible with cloud-init using Java:

import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class CloudInitMultipart {

    public static void main(String[] args) throws Exception {
        String config = "#cloud-config\n" 
            + "mounts:\n" 
            + " - [ sdf, /mnt/data, \"auto\", \"defaults,nobootwait\", \"0\", \"0\" ]\n\n" 
            + "packages:\n"
            + " - emacs23-nox\n\n";
        MimeMultipart mime = new MimeMultipart();
        MimeBodyPart part1 = new MimeBodyPart();
        part1.setText(config, "us-ascii", "cloud-config");
        MimeBodyPart part2 = new MimeBodyPart();
        String script = "#!/bin/bash\n\n" 
            + "NOW=`date +%s`\n" 
            + "touch /mnt/$NOW";
        part2.setText(script, "us-ascii", "x-shellscript");
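        MimeMessage msg = new MimeMessage(Session.getDefaultInstance(new Properties()));
        // Assumed completion: attach both parts and write the MIME message to stdout.
        mime.addBodyPart(part1);
        mime.addBodyPart(part2);
        msg.setContent(mime);
        msg.writeTo(System.out);
    }
}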

This will create a multipart configuration combining a cloud-config element, which installs emacs and creates an fstab entry, with a bash script that creates a file. The output can then be used as user-data for launching an EC2 instance with this configuration.

Tuesday, September 7, 2010

How to Build Terracotta from Source

It seems that the folks at Terracotta have decided to make it nearly impossible to download any version older than the current version. As is common in real-world applications, sometimes it is desirable to stay on a version a little behind the bleeding edge because you know what you've got works for what you're doing. Terracotta has made things more difficult than usual by holding back a critical fix for a compatibility issue between Java 1.6.0_20 and Terracotta 3.2.0. The fix is available in version 3.2.2, which is only available to customers with a support contract with Terracotta.

So, I'll show you how to build 3.2.2 from source. It's a little trickier than implied in the above-linked thread, and the Terracotta Build Page doesn't explain it all.

First, we need to check out the 3.2.2 source code:

svn co

Next, set up some required environment variables. Terracotta needs to know where your JRE, JDK, and Ant live. The following locations worked for me on my Ubuntu 10.04 install with Sun's Java 6; substitute the locations for your OS/Java distro:

export ANT_HOME=/usr/share/ant
export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre
export JAVA_HOME_16=/usr/lib/jvm/java-6-sun

In my case, I only have Java 6, and I don't care about previous versions of Java. So we need to instruct the Terracotta build system to not try to use older releases. Modify the file 3.2.2/code/base/jdk.def.yml to comment out the Java 1.5 stuff:

# All content copyright (c) 2003-2006 Terracotta, Inc.,
# except as may otherwise be noted in a separate copyright notice.
# All rights reserved

# Defines the various JDKs used by the build system.
# Each JDK specification begins with a unique name that uniquely identifies
# the JDK version.  After the name come the following attributes:
#   min_version: The minumum JDK version
#   max_version: The maximum JDK version
#   env: A list of names of configuration properties that the build system uses
#        to locate the JDK installation
#   alias: A list of alternative names for the JDK

#    min_version: 1.5.0_0
#    max_version: 1.5.999_999
#    env:
#      - J2SE_15
#      - JAVA_HOME_15
#    alias:
#      - tests-1.5
#      - "1.5"

    min_version: 1.6.0_0
    max_version: 1.6.999_999
    env:
      - JAVASE_16
      - JAVASE_6
      - JAVA_HOME_16
    alias:
      - tests-1.6
      - "1.6"

Ok, now we're ready to build. Here's what I used to build the core Terracotta distribution using the Sun JDK. You may need to adjust the jdk parameter to match the location of your JDK.

cd 3.2.2/code/base
./tcbuild --no-extra dist dso OPENSOURCE jdk=/usr/lib/jvm/java-6-sun

The build will download all its dependencies and compile the Terracotta release. Note that this is the core distribution only; it does not build TIMs or anything like that. Once the build is complete, there will be a new folder


This contains the Terracotta distribution that you would have downloaded.

Monday, July 5, 2010

BASH Script Self-Destructs, grep to the Rescue

I was working on a bash script and periodically testing it out. It had gotten somewhere in the 30-40 line range when I made a fatal error. I added an if statement that looked something like this:

if [ $var1 > $var2 ]; then
  echo "true"
else
  echo "false"
fi

I had actually made two mistakes. First, I meant for var2 to hold $?, which is the exit value of the last command, but I had assigned it $0, which evaluates to the path of the script itself. Second, I used the > operator instead of -gt, which is what BASH requires for a numeric comparison inside [ ]. There, > is an output redirection, so what the if statement ended up doing was redirecting the (empty) output of the test into the file containing my script! After running this code, your script self-destructs into a 0-byte file. I was particularly annoyed because writing a decent-sized BASH script is a meticulous process and one that I'm obviously not expert at, and so I was looking at a good half-hour of lost work.
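
To make the difference concrete, here is a small sketch. The function names `is_greater` and `redirect_demo` are invented for illustration; the second one reproduces the bug harmlessly inside a throwaway directory instead of on a real script.

```shell
# Numeric comparison done right: -gt compares integers.
is_greater() {
  if [ "$1" -gt "$2" ]; then
    echo "true"
  else
    echo "false"
  fi
}

# Reproduce the bug safely: "[ 5 > 3 ]" really runs "[ 5 ]" (true, because
# "5" is a non-empty string) with stdout redirected to a file named "3",
# which the shell creates, or truncates if it already exists.
redirect_demo() {
  local dir
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    [ 5 > 3 ]
    ls   # lists the file the redirection created
  )
  rm -rf "$dir"
}
```

Calling `is_greater 2 10` gives "false" as expected, whereas the redirecting form would have reported success and left an empty file named after its second operand. Quoting the operands also avoids surprises when a variable is empty.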

Arcane UNIX nonsense had gotten me into this mess, so I figured it could get me out as well. My file was very likely still sitting on some sector somewhere on the disk. I knew my script contained the word "COLLECTD_SOCKET", which wasn't likely to appear anywhere else on the drive. So I unmounted the filesystem (on device /dev/sdf) and ran the following command:

grep -A40 -B10 COLLECTD_SOCKET /dev/sdf

What this does is search the raw contents of the entire drive at the device level for the term "COLLECTD_SOCKET" and print the 10 lines before the match and 40 lines after the match. It took a little while (as you'd expect for reading the whole device) but I found a number of old versions of the script I was working on, including the version just before my bug caused it to self-destruct.
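
If you want to try the technique without risking a real disk, the sketch below fakes a tiny "device" as a temporary file with binary junk around some script text. Everything in it is made up for illustration (the `recover_demo` name, the file contents), and the `-a` flag is my addition here: it tells grep to treat binary input as text, since some grep versions otherwise only report that a binary file matches.

```shell
# Search a fake raw image for a marker string and print the surrounding lines.
recover_demo() {
  local img
  img=$(mktemp) || return 1
  # Stand-in for a raw device: binary bytes surrounding the text of a lost script.
  printf '\000\001\002junk\n#!/bin/bash\nCOLLECTD_SOCKET=/tmp/sock\necho done\n\000\377\n' > "$img"
  # -a: treat binary data as text; -B1/-A1: one line of context on each side.
  grep -a -B1 -A1 COLLECTD_SOCKET "$img"
  rm -f "$img"
}
```

On a real device you would widen the context (as with -A40 -B10 above) and redirect the output to a file on a different filesystem, so the recovered text does not overwrite the very sectors you are searching.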

I guess the lesson here is that UNIX gives you lots of ammunition to shoot yourself with, but it also gives you plenty of gauze to help you heal yourself as well.

Thursday, June 3, 2010

Setting up Collectd Collection3 on Ubuntu Lucid 10.04

Unfortunately the wiki on how to set up collection3 is not that great. In particular it glosses over how to configure apache. But if you're running Ubuntu Lucid 10.04, it's actually pretty easy to set up collectd and collection3. I'll walk you through the steps.

First, you'll need to install the needed dependencies:

sudo apt-get update -y
sudo apt-get install -y apache2 libconfig-general-perl librrds-perl libregexp-common-perl libhtml-parser-perl collectd-core

Then we need to configure collectd to sample some data and store the data as RRDs. Drop this file in /etc/collectd/collectd.conf

LoadPlugin cpu
LoadPlugin load
LoadPlugin memory
LoadPlugin disk
LoadPlugin rrdtool
<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd/"
</Plugin>

Next we configure apache to use collection3. Copy this file into /etc/apache2/conf.d/collection3.conf

ScriptAlias /collectd/bin/ /usr/share/doc/collectd-core/examples/collection3/bin/
Alias /collectd/ /usr/share/doc/collectd-core/examples/collection3/

<Directory /usr/share/doc/collectd-core/examples/collection3/>
    AddHandler cgi-script .cgi
    DirectoryIndex bin/index.cgi
    Options +ExecCGI
    Order Allow,Deny
    Allow from all
</Directory>

Now let's reload apache and start collectd:

sudo /etc/init.d/apache2 reload
sudo /etc/init.d/collectd start

It'll take collectd a minute to gather enough data to usefully graph. Then you can point your browser to the /collectd/ path on your server (the Alias configured above), and you'll be able to graph data!

Note 1: You may need to choose "hour" from the pulldown if you just started collectd, since it doesn't have enough data to graph a day yet
Note 2: The apache configuration is not secure; anyone could just navigate to your machine and see those graphs. Use SSL/.htaccess or other methods to lock down access

Wednesday, May 5, 2010

Migrating Unfuddle Tickets to JIRA

I found myself needing to migrate bugs from Unfuddle, which exports them in a custom XML format, to JIRA, which can import CSV (documentation). I threw together a quick Java class to help me do this. It takes backup.xml generated from Unfuddle and creates a CSV which can be read by JIRA. It imports the following fields:
  • Summary
  • Status
  • Description
  • Milestone (as a custom JIRA field)
  • Assignee
  • Reporter
  • Resolution (if resolved)
  • Resolution description (as a comment)
  • Creation time
  • Resolved time (if resolved)
Furthermore it outputs the bugs in the order of the ID in Unfuddle, so that if you're importing into an empty JIRA project, the bugs will have the same number as in Unfuddle. It assumes the JIRA usernames correspond to Unfuddle usernames, though you can easily map differences by modifying the lookupUser function. Once you generate the CSV, you can give the configuration file below to the JIRA CSV Import wizard to take care of the mappings. You'll want to update
  • existingprojectkey
to match your project. There are a few notable things that are missed with this tool:
  • Time of day for creation/resolved
  • Comments
The tool should run without modification and requires only Joda Time as a dependency under JDK 1.6. This is total slapdash, quick-n-dirty, git-er-done code for a one-off conversion. If anyone would like to extend this tool or generalize it, that would be great :)

Java class

// Original author Gabe Nell. Released under the Apache 2.0 License

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class UnfuddleToJira {

    private static final DateTimeFormatter DATE_FORMATTER = DateTimeFormat.forPattern("yyyyMMdd");

    private final Document doc;
    private final PrintStream output;
    private final Map<String, String> milestones;
    private final Map<String, String> people;

    public UnfuddleToJira(Document doc, PrintStream output) {
        this.doc = doc;
        this.output = output;
        this.milestones = parseMilestones(doc);
        this.people = parsePeople(doc);
    }

    private static Map<String, String> parseMilestones(Document doc) {
        Map<String, String> milestones = new HashMap<String, String>();
        NodeList milestoneNodes = doc.getElementsByTagName("milestone");
        for (int i = 0; i < milestoneNodes.getLength(); i++) {
            Element elem = (Element)milestoneNodes.item(i);
            String title = elem.getElementsByTagName("title").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            milestones.put(id, title);
        }
        System.out.println("Found " + milestones.size() + " milestones: " + milestones);
        return milestones;
    }

    private static Map<String, String> parsePeople(Document doc) {
        Map<String, String> people = new HashMap<String, String>();
        NodeList peopleNodes = doc.getElementsByTagName("person");
        for (int i = 0; i < peopleNodes.getLength(); i++) {
            Element elem = (Element)peopleNodes.item(i);
            String name = elem.getElementsByTagName("username").item(0).getTextContent();
            String id = elem.getElementsByTagName("id").item(0).getTextContent();
            people.put(id, name);
        }
        System.out.println("Found " + people.size() + " people: " + people);
        return people;
    }

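    // Quote a value for CSV, doubling any embedded double quotes.
    private static String prepareForCsv(String input) {
        if (input == null) return "";
        return "\"" + input.replace("\"", "\"\"") + "\"";
    }

    // Convert an ISO timestamp to the yyyyMMdd format used for the import.
    private static String convertDate(String input) {
        return DATE_FORMATTER.print(new DateTime(input));
    }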

    private String lookupUser(String id) {
        String person = people.get(id);
        /*
         * Here you can transform a person's username if it changed between
         * Unfuddle and JIRA. Eg: <tt>
         * if ("gabe".equals(person)) {
         *     person = "gabenell";
         * }
         * </tt>
         */
        return person;
    }

    private String lookupMilestone(String id) {
        return milestones.get(id);
    }

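    private void writeCsvHeader() {
        StringBuilder builder = new StringBuilder(256);
        builder.append("Summary, ");
        builder.append("Status, ");
        builder.append("Assignee, ");
        // Assumption: the remaining column names follow the order used in
        // writeCsvRow and the field names in the import configuration; the
        // "Comment" column name is a guess.
        builder.append("Reporter, ");
        builder.append("Resolution, ");
        builder.append("CreateTime, ");
        builder.append("ResolveTime, ");
        builder.append("Milestone, ");
        builder.append("Description, ");
        builder.append("Comment");
        output.println(builder.toString());
    }

    private void writeCsvRow(Ticket ticket) {
        StringBuilder builder = new StringBuilder(256);
        builder.append(prepareForCsv(ticket.summary)).append(", ");
        builder.append(prepareForCsv(ticket.status)).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.assigneeId))).append(", ");
        builder.append(prepareForCsv(lookupUser(ticket.reporterId))).append(", ");
        builder.append(prepareForCsv(ticket.resolution)).append(", ");
        builder.append(prepareForCsv(convertDate(ticket.createdTime))).append(", ");
        String resolveTime = ticket.resolution != null ? convertDate(ticket.lastUpdateTime) : null;
        builder.append(prepareForCsv(resolveTime)).append(", ");
        builder.append(prepareForCsv(lookupMilestone(ticket.milestoneId))).append(", ");
        // Assumption: the description is written here, as the last regular column.
        builder.append(prepareForCsv(ticket.description)).append(", ");

        // JIRA doesn't have the notion of a resolution description, add it as a
        // comment
        if (ticket.resolutionDescription != null) {
            builder.append(prepareForCsv(ticket.resolutionDescription));
        }
        output.println(builder.toString());
    }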

    public void writeCsv() throws Exception {
        NodeList ticketNodes = doc.getElementsByTagName("ticket");
        List<Ticket> tickets = new ArrayList<Ticket>();
        for (int i = 0; i < ticketNodes.getLength(); i++) {
            Node node = ticketNodes.item(i);
            Element nodeElem = (Element)node;
            Ticket ticket = new Ticket();
            NodeList ticketElements = nodeElem.getChildNodes();
            for (int j = 0; j < ticketElements.getLength(); j++) {
                Node ticketSubNode = ticketElements.item(j);
                String nodeName = ticketSubNode.getNodeName();
                if ("id".equals(nodeName)) {
                    ticket.id = ticketSubNode.getTextContent();
                } else if ("status".equals(nodeName)) {
                    ticket.status = ticketSubNode.getTextContent();
                } else if ("summary".equals(nodeName)) {
                    ticket.summary = ticketSubNode.getTextContent();
                } else if ("description".equals(nodeName)) {
                    ticket.description = ticketSubNode.getTextContent();
                } else if ("milestone-id".equals(nodeName)) {
                    ticket.milestoneId = ticketSubNode.getTextContent();
                } else if ("assignee-id".equals(nodeName)) {
                    ticket.assigneeId = ticketSubNode.getTextContent();
                } else if ("reporter-id".equals(nodeName)) {
                    ticket.reporterId = ticketSubNode.getTextContent();
                } else if ("resolution".equals(nodeName)) {
                    ticket.resolution = ticketSubNode.getTextContent();
                } else if ("resolution-description".equals(nodeName)) {
                    ticket.resolutionDescription = ticketSubNode.getTextContent();
                } else if ("created-at".equals(nodeName)) {
                    ticket.createdTime = ticketSubNode.getTextContent();
                } else if ("updated-at".equals(nodeName)) {
                    ticket.lastUpdateTime = ticketSubNode.getTextContent();
                }
            }
            tickets.add(ticket);
        }
        System.out.println("Writing " + tickets.size() + " tickets...");

        // Output to CSV in order of ticket number
        Collections.sort(tickets);
        writeCsvHeader();
        for (Ticket ticket : tickets) {
            writeCsvRow(ticket);
        }
    }

    public static class Ticket implements Comparable<Ticket> {

        public String id;
        public String summary;
        public String status;
        public String description;
        public String milestoneId;
        public String assigneeId;
        public String reporterId;
        public String resolution;
        public String resolutionDescription;
        public String createdTime;
        public String lastUpdateTime;

        public int compareTo(Ticket other) {
            return Integer.parseInt(id) - Integer.parseInt(other.id);
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        if (args.length != 2) {
            System.err.println("Usage: UnfuddleToJira /path/to/unfuddle/backup.xml /path/to/jira/output.csv");
            System.exit(1);
        }
        String inputFilename = args[0];
        String outputFilename = args[1];
        PrintStream output = new PrintStream(new FileOutputStream(outputFilename), true, "UTF-8");
        UnfuddleToJira converter = new UnfuddleToJira(factory.newDocumentBuilder().parse(inputFilename), output);
        converter.writeCsv();
        output.close();
    }
}


Configuration file:

# written by PropertiesConfiguration
# Wed May 05 07:12:57 UTC 2010
existingprojectkey = WEB
importsingleproject = false
importexistingproject = true
mapfromcsv = false
field.Resolution = resolution
field.Milestone = customfield_Milestone:select
field.Assignee = assignee
field.Summary = summary
field.Status = status
field.Description = description
field.Reporter = reporter
field.CreateTime = created
value.Status.closed = 6
value.Resolution.works_for_me = 5
value.Resolution.will_not_fix = 2
value.Status.reassigned = 1
value.Resolution.invalid = 4
value.Resolution.postponed = 2
value.Status.accepted = 3
value.Resolution.fixed = 1
value.Resolution.duplicate = 3
date.import.format = yyyyMMdd
field.ResolveTime = resolutiondate
date.fields = CreateTime
date.fields = ResolveTime