Nginx and Statsd

Add nginx-statsd to the official Nginx Debian package.

I like Nginx, but I love statsd. Zebrafish Labs (which seems to be offline) wrote an excellent plugin for nginx that sends statsd metrics for each nginx request. Its source is available on GitHub.

Unlike Apache httpd, the open-source version of nginx doesn’t support dynamic module loading, so we need to recompile nginx to add the plugin. I like to run all of my builds through Jenkins to centralize documentation and auditing when it’s time to modify or upgrade our software.

Create a new Jenkins job. It doesn’t need to track any upstream source control, because we’re pulling the source from the official upstream apt mirror.

Add this Gist as a Jenkins build step (Execute Shell) to extract the official deb, update the changelog, add the latest nginx-statsd plugin to the build, and recompile.

#!/bin/bash -ex

if [ ! -e "/etc/apt/sources.list.d/nginx.list" ]; then
  echo "ERROR! This build slave isn't configured with the nginx apt mirror! (see: http://wiki.nginx.org/Install)"
  exit 1
fi

apt-get source nginx

cd "$WORKSPACE"/nginx-*/debian/

# Increment the package version by updating the changelog
cat > changelog <<CHANGELOG;
nginx (1.6.2-${BUILD_NUMBER}.local) trusty; urgency=medium
  * Package built by jenkins
 -- jenkins <${USER}@${NODE_NAME}>  $(date -R)
CHANGELOG

# Download the plugin source into debian/modules (-p tolerates a reused workspace)
mkdir -p modules
cd "$WORKSPACE"/nginx-*/debian/modules

wget https://github.com/zebrafishlabs/nginx-statsd/archive/master.tar.gz
tar xvf master.tar.gz
rm master.tar.gz

# Enable the nginx-statsd module by adding --add-module to the configure call in debian/rules
sed -i 's|CFLAGS="" ./configure \\|CFLAGS="" ./configure --add-module=debian/modules/nginx-statsd-master \\|' "$WORKSPACE"/nginx-*/debian/rules

cd "$WORKSPACE"/nginx-*/

dpkg-buildpackage

I like to add a Post-build Action that archives all the *.deb files created by each build as artifacts, then upload the deb package to an internal apt mirror.

Since this modifies the official deb source package, the resulting binary will (most likely) share the same config as your existing nginx binary package, just adding the nginx-statsd plugin. However, take care if you previously installed nginx from source, as the official package is fairly liberal with its included modules.
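
Before rolling the package out, it can be worth checking that the plugin really made it into the binary. The sketch below assumes the module was added with the --add-module path used in the build step above; the function name has_statsd_module is made up for illustration. It reads the text printed by nginx -V on stdin and succeeds if an nginx-statsd module appears in the configure arguments.

```shell
#!/bin/bash
# Sketch: succeed if the configure arguments on stdin include the
# nginx-statsd module. has_statsd_module is a hypothetical helper name.
has_statsd_module() {
  # "--" stops grep from treating the pattern as an option.
  grep -q -- '--add-module=[^ ]*nginx-statsd'
}

# Usage against an installed binary (nginx -V prints to stderr):
#   nginx -V 2>&1 | has_statsd_module && echo "statsd module present"
```

If the check fails, the sed step in the build most likely did not match the configure line in debian/rules for the nginx version you pulled.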

Re-deploy nginx from this package and you’ll get some nifty new config options (from the nginx-statsd README):

http {
    # Set the server that you want to send stats to.
    statsd_server your.statsd.server.com;

    # Randomly sample 10% of requests so that you do not overwhelm your statsd server.
    # Defaults to sending all statsd (100%). 
    statsd_sample_rate 10; # 10% of requests

    server {
        listen 80;
        server_name www.your.domain.com;

        # Increment "your_product.requests" by 1 whenever any request hits this server. 
        statsd_count "your_product.requests" 1;

        location / {

            # Increment the key by 1 when this location is hit.
            statsd_count "your_product.pages.index_requests" 1;

            # Increment the key by 1, but only if $request_completion is set to something.
            statsd_count "your_product.pages.index_responses" 1 "$request_completion";

            # Send a timing to "your_product.pages.index_response_time" equal to the value
            # returned from the upstream server. If this value evaluates to 0 or empty-string,
            # it will not be sent. Thus, there is no need to add a test.
            statsd_timing "your_product.pages.index_response_time" "$upstream_response_time";

            # Increment a key based on the value of a custom header. Only sends the value if
            # the custom header exists in the upstream response.
            statsd_count "your_product.custom_$upstream_http_x_some_custom_header" 1 
                "$upstream_http_x_some_custom_header";

            proxy_pass http://some.other.domain.com;
        }
    }
}
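
To confirm that the host named in statsd_server actually accepts metrics, you can send one by hand. This is a minimal sketch of statsd's plain-text format (key:value|type, with an optional |@rate suffix for sampled metrics); the helper name statsd_line is hypothetical, and port 8125 in the usage comment is statsd's usual default, so adjust it if your server listens elsewhere.

```shell
#!/bin/bash
# Sketch: format one metric in statsd's plain-text format.
#   $1 = key, $2 = value, $3 = type ("c" for counter, "ms" for timing)
#   $4 = optional sample rate between 0 and 1 (e.g. 0.1)
statsd_line() {
  local key=$1 value=$2 type=$3 rate=${4:-}
  if [ -n "$rate" ]; then
    printf '%s:%s|%s|@%s\n' "$key" "$value" "$type" "$rate"
  else
    printf '%s:%s|%s\n' "$key" "$value" "$type"
  fi
}

# Usage (bash only, relies on its /dev/udp redirection; 8125 is assumed):
#   statsd_line "your_product.requests" 1 c > /dev/udp/your.statsd.server.com/8125
```

If the counter shows up in your graphs, the network path is fine and any missing metrics are more likely a config problem on the nginx side.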

You can use any nginx variable in either the statsd key name or the value. See ngx_http_core_module docs for a list. Some of my favorites are below:

server {
  statsd_count "nginx.requests" 1;
  statsd_count "nginx.responses.$status" 1 "$status";
  statsd_count "nginx.request_length" "$request_length";
  statsd_count "nginx.bytes_sent" "$bytes_sent";
 
  location /api/v1 {
    statsd_count    "nginx.location.api_v1" 1;
    statsd_timing   "nginx.upstream.api_v1.request_time" "$request_time";
    statsd_timing   "nginx.upstream.api_v1.upstream_response_time" "$upstream_response_time";
 
    include         proxy.conf;
    proxy_pass      http://api_v1;
    proxy_redirect  default;
  }
  location /api/v2 {
    statsd_count    "nginx.location.api_v2" 1;
    statsd_timing   "nginx.upstream.api_v2.request_time" "$request_time";
    statsd_timing   "nginx.upstream.api_v2.upstream_response_time" "$upstream_response_time";
 
    include         proxy.conf;
    proxy_pass      http://api_v2;
    proxy_redirect  default;
  }
}

I use this style of config to compare the total number of requests with the number of responses of each type. For example, nginx.responses.[45][0-9]{2} / nginx.requests gives the fraction of requests that this server answers with a 4xx or 5xx error; multiply by 100 for a percentage.
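
As a worked example of that ratio, here is a small sketch that turns two counter totals into a percentage. It assumes you have already summed the 4xx and 5xx counters over the same time window as nginx.requests; the function name error_rate_pct is made up for illustration.

```shell
#!/bin/bash
# Sketch: percentage of requests that ended in a 4xx/5xx response.
#   $1 = summed nginx.responses.4xx and 5xx counters
#   $2 = nginx.requests counter for the same window
error_rate_pct() {
  awk -v errors="$1" -v total="$2" 'BEGIN {
    # Avoid dividing by zero when no requests were counted.
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * errors / total
  }'
}

# Usage: error_rate_pct 25 1000   # prints 2.50
```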

This also lets you profile different upstreams by comparing request_time (the total time nginx spent on a request, from open to close of the client request) with upstream_response_time (the time nginx spent waiting for the application server to process and return data).
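
The gap between the two timings is a rough measure of the time spent outside the application server (reading the request, buffering, and sending the response to the client). A minimal sketch, assuming both values are single numbers in seconds taken from the same request or the same aggregate; the function name non_upstream_time is hypothetical.

```shell
#!/bin/bash
# Sketch: time not spent waiting on the upstream, in seconds.
#   $1 = request_time, $2 = upstream_response_time (same unit, single values)
non_upstream_time() {
  awk -v total="$1" -v upstream="$2" 'BEGIN {
    printf "%.3f\n", total - upstream
  }'
}

# Usage: non_upstream_time 0.250 0.200   # prints 0.050
```

A consistently large gap for one location points at slow clients or nginx-side work rather than at the application server behind it.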