Thursday, July 9, 2015

Using HAProxy as a reverse proxy for AWS microservices

Amazon's EC2 micro instances offer a very affordable option for hosting a Docker-based microservices architecture. Each micro instance can host several Docker containers. For example, you may have a simple Node.js-based web application that you would like exposed under one subdomain (subdomain1) and a Java/Tomcat webapp surfaced under another (subdomain2). Each could be hosted through a separate (and perhaps clustered) Docker container, with additional containers used for exposing API services. The architecture might resemble:

Notice the use of HAProxy, which in this instance serves as both a load balancer and a reverse proxy. It is housed on its own EC2 micro instance and directs inbound traffic to its destination based upon subdomain routing rules (HAProxy supports a wide variety of routing rules, not unlike Apache mod_proxy). The DNS entries for subdomain1 and subdomain2 both direct traffic to the same public IP address, which in this case is handled by HAProxy. Its routing rules then forward the inbound request to the appropriate Docker container that is hosting the web site.

Configuring HAProxy is very straightforward, which may explain its popularity within the open source community. Let's look at an example HAProxy configuration file that would support the above architecture (haproxy.cfg is typically found in /etc/haproxy). We'll start with the global and defaults sections, which you can likely leave as-is for most scenarios:

global
        maxconn 4096
        user haproxy
        group haproxy
        log local0 debug

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        option http-server-close
        option forwardfor
        maxconn 2000
        timeout connect 5s
        timeout client  15m
        timeout server  15m

Next, we'll move on to the "frontend" section, which describes a set of listening sockets that accept client connections. It is also where we configure the rules that decide which backend host each inbound request is forwarded to.

frontend public
        bind *:80

        # Define hosts
        acl host_subdomain1 hdr(host) -i
        acl host_subdomain2 hdr(host) -i

        ## figure out which one to use
        use_backend subdomain1 if host_subdomain1
        use_backend subdomain2 if host_subdomain2

The "bind" directive indicates which port HAProxy will listen on (in this case, port 80). The "acl" directives define the criteria for how to route the inbound traffic. The first parameter that follows the directive is simply an internal name for future reference, with the remaining parameters defining how some element of the inbound request is matched as the basis for routing (the HAProxy docs provide more details). In this case, the first "acl" directive states that if the Host header of the inbound request equals the hostname given as the last parameter (compared case-insensitively because of the -i flag), the match is recorded under the acl identifier "host_subdomain1". Basically, we are simply defining the criteria for matching the inbound request.

The next part of this section is the "use_backend" directive, which ties a matching "acl" rule to a specific backend for processing. So, in the first example, it's saying that if the inbound request matched the "acl" named "host_subdomain1", then use the backend identified by "subdomain1". Those backends are defined in the section that follows:

backend subdomain1
        option httpclose
        option forwardfor
        cookie JSESSIONID prefix
        server subdomain-1 check

backend subdomain2
        option httpclose
        option forwardfor
        cookie JSESSIONID prefix
        server subdomain-2 localhost:8080 check

In the first "backend" section, we define the backend that was referenced by the name "subdomain1" (these are just arbitrary names; just be sure they match what you have in the frontend section). The "server" directive (which takes an arbitrary name as its first parameter) then identifies the host/port to forward the request to, given as the parameter that follows the server name. The "check" parameter enables health checks on that server, which is mostly relevant when load balancing. You can define multiple server directives if you are using load balancing/clustering.
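Before the DNS entries are in place, one way to check rules like these is to send a request straight to the proxy's IP address and set the Host header by hand. The following is a minimal sketch that assumes curl is installed; the function name check_route and the address and hostname in the usage comment are placeholders of mine, not values from the configuration above.

```shell
# Hypothetical helper: request "/" from the proxy while presenting a chosen
# Host header, then print only the HTTP status code that came back.
# $1 = IP address (or name) of the HAProxy instance
# $2 = hostname to present, i.e. the value an "acl ... hdr(host)" rule matches
check_route() {
    proxy="$1"
    host="$2"
    curl -s -o /dev/null -w '%{http_code}' -H "Host: ${host}" "http://${proxy}/"
}

# Example usage (placeholder values, substitute your own):
#   check_route 203.0.113.10 subdomain1.example.com
```

If both hostnames return a successful status but serve different applications, the acl and use_backend pairs are doing their job.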

There are a multitude of configuration options available for HAProxy, but this simple example illustrates how easily you can use it as a reverse proxy, redirecting inbound traffic to its final destination.

APIs & Microservices

One of the challenges in devising an API solution that runs using microservices is how to "carve" up the various service calls so that they can be hosted in separate Docker containers. The subdomain-based approach I described earlier for standard web applications isn't really practical when you want to convey a uniform API to your development community. However, it can still be accomplished in much the same fashion using URL path-based reverse proxy rules. For example, consider these two fictitious API calls:


The first example is a RESTful GET call that retrieves a given contract JSON object based upon the contractId that was specified. A similar pattern is used in example 2 for fetching a customer record. As you can see, the first path element is really the "domain" associated with the web service call. This could be used as a basis for separating those domain calls into their own separate microservice Docker container(s). One nice benefit of doing so is that one of those domains may typically experience more traffic than the other. By splitting them into separate services, you can perform targeted scaling.
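As a rough sketch of what such path-based rules might look like, the snippet below writes a HAProxy fragment to a file so it can be reviewed and merged by hand. Everything specific in it is an assumption for illustration: the path prefixes /contract/ and /customer/, the acl and backend names, the container ports, and the output file name.

```shell
# Write a sketch of path-based routing rules to a fragment file.
# Assumptions (not from the article): the /contract/ and /customer/ path
# prefixes, the acl/backend names, and the container addresses and ports.
OUT="${OUT:-haproxy-api-routes.cfg}"

cat > "$OUT" <<'EOF'
# --- add to the existing "frontend public" section ---
        # Match on the first element of the URL path
        acl is_contract path_beg /contract/
        acl is_customer path_beg /customer/

        use_backend contract_api if is_contract
        use_backend customer_api if is_customer

# --- new backend sections ---
backend contract_api
        option forwardfor
        server contract-1 127.0.0.1:8081 check

backend customer_api
        option forwardfor
        # a busier domain can be scaled by adding more "server" lines
        server customer-1 127.0.0.1:8082 check
EOF

echo "wrote $OUT"
```

The path_beg match plays the same role that hdr(host) played for the subdomain rules, so clients see a single hostname while each domain runs in its own container.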

If you enjoyed this article, please link to it accordingly.

Friday, February 13, 2015

Multi-Tier Architecture Tutorial using Docker


While there are many very good Docker tutorials currently available, I found that many are either too simplistic in the scenarios offered, or in some cases, too complex for my liking. Thus, I decided to write my own tutorial that describes a multi-tier architecture configured using Docker.


This tutorial assumes that you have successfully installed Git and Docker. I’m running Boot2Docker on a Mac, which enables Docker to be run nearly as though it were installed natively on the Mac (on the PC, Boot2Docker isn’t quite as straightforward). I assume some basic level of familiarity with Docker.


The scenario I developed for purposes of this tutorial is depicted below:

As you can see, it involves multiple tiers:

MongoDB: This is our persistence layer that will be populated with some demo data representing customers. 

API Server: This tier represents RESTful API services. It exposes the mongo data packaged as JSON (for purposes of this tutorial, it doesn’t really contain any of the business logic that would normally be present). This tier is developed in Java using Spring Web Services and Spring Data.

WebClient: This is a really simple tier that demonstrates a web application written in Google’s Polymer UI framework. All of the business logic resides in JavaScript, and CORS is used to access the API data from the services layer.

I’m not going to spend much time on the implementation of each tier, as it’s not really relevant for purposes of this Docker tutorial. I’ll touch on some of the code logic briefly just to set the context. Let’s begin by looking at how the persistence layer was configured.

MongoDB Configuration 

Note: The project code for this tier is located at:

Docker Hub contains an official image for mongo that we’ll use as our starting point. In case we need to build upon what is installed in this image, I decided to clone the image by simply copying its Dockerfile (images can be distributed either as binaries or declaratively via a text Dockerfile). Here’s the Dockerfile:

Line 5 identifies the base image used for the mongo installation - this being the official dockerfile/ubuntu image. Since no version was specified, we’re grabbing the latest.

Note: Comments in Dockerfiles start with #.

Line 8 begins the installation of the mongo database, along with some additional tools such as curl, git etc. apt-get is Ubuntu’s standard package management tool.

Lines 16 and 18 set up some mount points for where the mongo database files will reside (along with a git mount point in case we need it later). On line 21, we set the working directory to be that data mount point location (from line 16).

And lastly, we identify some ports that will need to be exposed by Mongo (lines 27 & 28).

Assuming you have grabbed the code from my git repository, you can launch this Docker image by running the startup script (in Windows Boot2Docker, you’ll need to fetch the code within the VM container that Docker is running on). That script simply contains the following:

docker run -t -d -p 27017:27017 --name mongodb jeffdavisco/mongodb mongod --rest --httpinterface --smallfiles

When run, it will launch a container from the jeffdavisco/mongodb image (downloading the image first if it isn’t already present locally). You can confirm it’s running by using the docker ps command:

As you can see from the above, in this case it launched Mongo using container Id of 4fb383781949. Once you have the container Id, you can look at the container’s server logs using:

docker logs --tail="all" 4fb383781949 #container Id

If you don’t see any results from docker ps, that means that the container started but then exited immediately. Try issuing the command docker ps -a, which will identify the container Id of the failed service; you can then use docker logs to identify what went wrong. You will need to remove that container prior to launching a new one, using docker rm followed by the container Id.
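Those troubleshooting steps can be strung together in a small helper. This is only a sketch: the function name reset_container is my own, and note that the name filter matches substrings, so pick a name that is specific enough.

```shell
# Hypothetical helper: show the last log lines of a stopped container and
# then remove it so the same --name can be reused.
# $1 = container name, e.g. mongodb
reset_container() {
    name="$1"
    # -a includes exited containers, -q prints only the container Ids
    ids="$(docker ps -a -q --filter "name=${name}")"
    if [ -z "$ids" ]; then
        echo "no container named ${name}"
        return 0
    fi
    for id in $ids; do
        docker logs --tail=20 "$id"
        # docker rm refuses to remove a container that is still running
        docker rm "$id"
    done
}

# Example: reset_container mongodb
```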

Once the Mongo container is running, we can now populate it with some demo data that we’ll use for the API service layer. A directory called northwind-mongo-master is present in the project files, and it contains an import script. Let’s look at the contents of this file:

This script simply runs through each .csv file present in the directory, and uses the mongoimport utility to load up the collection. However, there is one little wrinkle. Since mongoimport is connecting to a remote mongo database, the IP address of the remote host is required, and this remote host is the mongo container that was just launched. How did I determine which IP address to use? If you are using Boot2Docker, you can simply run boot2docker ip, which will provide you the IP address of the Docker host you are using. If you are running directly within a host running Docker, you could connect via localhost as your IP address (since that port was exposed to the Docker host when we launched the container).
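For readers without the repository at hand, a loop of that kind might look roughly like the following. This is a sketch and not the script from the project: the function name, the database name Northwind, the choice to name each collection after its file, and the assumption that every file has a header row are all mine.

```shell
# Sketch of an import loop; not the script from the repository.
# Assumptions: database name "Northwind", each collection named after its
# .csv file, and a header row in every file.
# $1 = directory with the .csv files, $2 = mongo host (default localhost)
import_csv_dir() {
    dir="$1"
    host="${2:-localhost}"
    for file in "$dir"/*.csv; do
        [ -e "$file" ] || continue          # directory had no .csv files
        collection="$(basename "$file" .csv)"
        mongoimport --host "$host" --port 27017 \
            --db Northwind --collection "$collection" \
            --type csv --headerline --file "$file"
    done
}

# Example (Boot2Docker): import_csv_dir . "$(boot2docker ip)"
```

Passing the host as an argument avoids editing the script for each environment.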

In order to load the test data, update that script so that it has the proper IP address for your environment. Then you can run the script, and it should populate the mongo database with the sample data. If you are using MongoHub, you can connect to the remote mongo host, and see the following collections available:

Fortunately, we don’t have to go through such hurdles on the remaining containers.

API Service Tier Configuration

The API service is a very simple RESTful web service that exposes some of the mongo data in JSON format (granted, you can access mongo directly through its REST API, but in most real-world scenarios, you’d never expose that publicly). The web application, written in Java Spring, uses the Spring Data library to access the remote mongo database.

The project files for this tier are located at:

While I won’t cover all of the Spring configuration/setup (after all, this isn’t a Spring demo), let’s briefly look at the main service class that will have an exposed web service:

Using the Spring annotations for Spring web services, we annotate this controller in line 5 so that Spring can expose the subsequent services using the base URL /customer. Then, we define one method that is used to expose all of the customer data from the Mongo collection. The CustomerDAO class contains the logic for interfacing with Mongo. 

Let’s look at the Dockerfile used for running this web service (using Jetty as the web server):

As you can see, in line 2 we’re identifying the base image for the API services as a Java 7 image. Similar to the mongo Dockerfile, the next thing we do in lines 8-15 is install the various required dependencies such as git, maven and the mongo client.

Then, unlike with our mongo Dockerfile, we use git to install our Java web application. This is done in lines 21 thru 26. We create a location for our source files, then install them using git clone. Lastly, in line 28, we run the startup shell script. Let’s take a look at that script’s contents:


echo `env`

mvn jetty:run

As you can see, it’s pretty minimal. We first echo the environment variables from this container so that, for troubleshooting purposes, we can see them when checking the docker logs. Then, we simply launch the web application using mvn jetty:run. Since the application’s source code was already downloaded via git clone, maven will automatically compile the webapp and then launch the jetty web server.

Now, you may be wondering, how does the web service know how to connect to the Mongo database? While we exposed the Mongo database port to the docker host, how is the connection defined within the java app so that it points to the correct location? This is done by using the environment variables automatically created when you specify a dependency/link between two containers. To get an understanding of how this is accomplished, let’s look at the startup script used to launch the container:

docker run -d --name tomcat-maven -p 8080:8080 \
    --link mongodb:mongodb jeffdavisco/tomcat-maven:latest \
    /local/git/docker-maven-tomcat/

As you can see, the --link option is used to notify docker that this container has a dependency on another container, in this case, our Mongo instance. When present, the --link option will create a set of environment variables that get populated when the container is started. Let’s examine what those environment variables look like using the docker logs command (remember, before you can launch this container, the mongo container must first be running):

JAVA_HOME=/usr/lib/jvm/java-7-oracle MONGODB_PORT_28017_TCP_ADDR= MONGODB_PORT_28017_TCP=tcp:// 

The environment variables starting with MONGODB represent those created through the link. How did it know the prefix was MONGODB? The prefix is taken from the alias given in the --link option (the part after the colon in mongodb:mongodb), which here is the same as the name we gave the mongo container with the optional --name parameter when we launched it.
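The same variables are visible to any process inside the linked container, so a script can use them as well. Here is a minimal sketch, assuming the mongo image also exposes port 27017 (so that Docker sets MONGODB_PORT_27017_TCP_ADDR and MONGODB_PORT_27017_TCP_PORT alongside the 28017 variables shown above); the function name mongo_address is hypothetical.

```shell
# Resolve the mongo address from the link variables, falling back to
# localhost:27017 when the container was started without --link.
# Assumption: port 27017 is exposed by the linked container, so Docker sets
# MONGODB_PORT_27017_TCP_ADDR and MONGODB_PORT_27017_TCP_PORT.
mongo_address() {
    host="${MONGODB_PORT_27017_TCP_ADDR:-localhost}"
    port="${MONGODB_PORT_27017_TCP_PORT:-27017}"
    echo "${host}:${port}"
}

# Example: echo "connecting to $(mongo_address)"
```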

Now, how is that incorporated into the API service? When we define the Spring Data properties for accessing mongo, we specify that an environment variable is used to identify the host. This is done in the spring-data.xml file (located under src/main/resources in the project code). 

                   write-fsync="true" />

So, when we start up the container, all that is required is to launch it using the startup script. Let’s confirm that the service is working by accessing it via the browser using:

In your environment, if you are using Boot2Docker, run boot2docker ip to identify the IP address on which the service is exposed. If running directly on a docker host, you should be able to specify localhost. This should bring back some JSON customer data such as:

Now, let’s look at the last of the containers, the one used for exposing the data via a web page. 

Web Application Tier

The web application tier consists of a lightweight Python web server that just serves up a static HTML page along with the corresponding JavaScript. The page uses Google’s Polymer UI framework for displaying the customer results in the browser. In order to permit Polymer to communicate directly with the API service, CORS was configured on the API server to permit the inbound requests (I wanted to keep the web app tier as lightweight as possible).

Note: The source code/project files for this tier can be found at:

Here is the relevant piece of code from the Javascript for making the remote call:

Obviously, the VMHOST value present in the url property isn’t a valid domain name. Instead, when the Docker container is launched, it will replace that value with the actual IP of the API server.  Before we get into the details of this, let’s first examine the Dockerfile used for the Python server:

This Dockerfile is quite similar to the others we’ve been using. Like the mongo container, this one is based on the official ubuntu image. We then specify in lines 11-14 that Python and related libraries are loaded. In lines 23-24, we prepare a directory for the git source code. We follow that up by fetching the code using git clone in line 27, and then set our working directory to that location (line 28). The CMD command in line 31 is actually ignored, because the launch script specifies which command to run (i.e., it overrides what is in the Dockerfile).

Let’s look at the startup script now used to launch our container:

docker run -d -e VMHOST= --name python -p 8000:8000 \
--link tomcat-maven:tomcat jeffdavisco/polymer-python:latest \

A couple of things to note in the above. Notice how I’m passing the IP address of my Boot2Docker instance using the environment flag (-e VMHOST=). Similar to what we needed to do when configuring the API service tier, you will have to modify this for your environment (if using Boot2Docker, run boot2docker ip to find the IP address of your docker host, or if running natively on Linux, you can use localhost). Notice the other thing we are doing is exposing port 8000, which is where our Python web server will be running. Lastly, we are instructing the docker container to run the startup shell script. We’ll look at this next.

The script contains the following:


# replace placeholder with actual docker host
sed -i "s/VMHOST/$VMHOST/g" post-service/post-service.html

python -m SimpleHTTPServer 8000

The sed command replaces the VMHOST placeholder token with the value of the environment variable passed to the docker container when it was launched ($VMHOST). Once the html file has been updated, we then launch the python server using the command python -m SimpleHTTPServer 8000. After running the startup script, if everything went well, you should be able to visit http://dockeriphost:8000 in your browser (where dockeriphost is equal to your boot2docker ip address, or the local docker host if on Linux). You should see something like:
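A note on the sed step in the script above: if $VMHOST is empty, the placeholder is silently replaced with nothing and the page ends up calling a broken URL. One possible guard is sketched below; the function name replace_vmhost is hypothetical, and it writes through a temporary file rather than using sed -i so that it behaves the same with GNU and BSD/macOS sed.

```shell
# Replace every VMHOST token in a file with the given host value.
# Writes through a temporary file instead of using "sed -i", whose syntax
# differs between GNU and BSD/macOS sed.
# $1 = file to update, $2 = replacement (defaults to the $VMHOST variable)
replace_vmhost() {
    file="$1"
    host="${2:-$VMHOST}"
    if [ -z "$host" ]; then
        echo "VMHOST is not set" >&2
        return 1
    fi
    sed "s/VMHOST/${host}/g" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example: replace_vmhost post-service/post-service.html
```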

You've now completed the tutorial! You have a docker container running mongo; another running an API service; and the last running a simple web server.

I hope you've enjoyed this tutorial, and I hope to have a follow-up to it shortly describing how these containers can be more easily managed using fig.