Varnish on Gentoo

Overview:

Over the past few months we've been focusing on improving the performance of the web portal we've been building (make it fast). Part of this involved looking at various caching solutions, both hardware and software, to see what they deliver. One of these was Varnish, an HTTP accelerator that runs on Linux and was written from the ground up to be a high-performance caching reverse proxy. Let's get it running against the EnergizedWork wiki and see what happens.

Installation:

I use Gentoo as my Linux flavour, so it's off to portage to see what we've got:
hyperion # emerge -s varnish
Searching...   
[ Results for search key : varnish ]
[ Applications found : 1 ]
 
*  www-servers/varnish [ Masked ]
      Latest version available: 1.1.2
      Latest version installed: [ Not Installed ]
      Size of files: 570 kB
      Homepage:      http://varnish.linpro.no/
      Description:   Varnish is an HTTP accelerator
      License:       BSD-2

Hmmm, ok, my arch is amd64 so maybe it's not officially supported. Let's have a look to see what keywords are required:

hyperion # emerge -av varnish

These are the packages that would be merged, in order:

Calculating dependencies /
!!! All ebuilds that could satisfy "www-servers/varnish" have been masked.
!!! One of the following masked packages is required to complete your request:
- www-servers/varnish-1.1.2 (masked by: missing keyword)
- www-servers/varnish-1.1.1 (masked by: missing keyword)
- www-servers/varnish-1.0.4 (masked by: missing keyword)

For more information, see MASKED PACKAGES section in the emerge man page or 
refer to the Gentoo Handbook.
Ah, ok, 'missing keyword' - it's not been tested on this architecture. Let's add varnish to our package.keywords to move on:
hyperion # echo 'www-servers/varnish ~*' >> /etc/portage/package.keywords
Now we can successfully emerge varnish and on to the next step - configuration.
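One thing to watch: appending with >> adds a duplicate line every time the command is re-run. A minimal sketch of an idempotent variant follows. add_keyword is a hypothetical helper name, and it assumes package.keywords is a plain file, as in the command above.

```shell
#!/bin/sh
# add_keyword ENTRY [FILE]: hypothetical helper that appends ENTRY to FILE
# only when that exact line is not already present.
# Assumes FILE is a plain file (default: /etc/portage/package.keywords).
add_keyword() {
    entry=$1
    file=${2:-/etc/portage/package.keywords}
    # -x: match the whole line, -F: treat ENTRY as a fixed string, so the '*'
    # in '~*' is not read as a regex metacharacter.
    grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}

# Usage, as root:
# add_keyword 'www-servers/varnish ~*'
```

Running it twice leaves a single entry in the file.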

Configuration:

The ebuild finishes with this note:
hyperion # emerge varnish
...
  No demo-/sample-configfile is included in the distribution -
  please read the man-page for more info.
  A sample (localhost:8080 -> localhost:80) for gentoo is given in
     /etc/conf.d/varnishd"
Let's have a look at what's in conf.d/varnishd:
hyperion # cat /etc/conf.d/varnishd
# options passed to varnish on startup
# please see the varnishd man page for more options
VARNISHD_OPTS="-a localhost:8080 -b localhost:80"
Well it looks like 'man varnishd' is our next step:
VARNISHD(1)               BSD General Commands Manual              VARNISHD(1)

NAME
     varnishd -- HTTP accelerator daemon
...
     -a address[:port]
                 Listen for client requests on the specified address and port.  The address can be a host name (``localhost''), an IPv4 dotted-quad (``127.0.0.1''), or
                 an IPv6 address enclosed in square brackets (``[::1]'').  If address is not specified, varnishd will listen on all available IPv4 and IPv6 interfaces.
                 If port is not specified, the default HTTP port as listed in /etc/services is used.

     -b host[:port]
                 Use the specified host as backend server.  If port is not specified, the default is 8080.

OK, so -a is the 'frontend' address varnish will listen on and -b sets the origin server. Looking down the manpage I also see this option:
     -f config   Use the specified VCL configuration file instead of the builtin default.  See vcl(7) for details on VCL syntax.
It looks like there's a builtin default configuration. Phew! :) Since I'm running Jetty on port 8080 and I want to receive requests on port 80 I'll update the config to reflect this:
hyperion # cat /etc/conf.d/varnishd
VARNISHD_OPTS="-a localhost:80 -b localhost:8080"

Now it's time to give it a test run.

hyperion # /etc/init.d/varnishd start
 * Caching service dependencies ...                                                              [ ok ]

 * Starting varnish ...                                                                          [ ok ]

Testing:

Now it's time to browse around the wiki and see what varnish does. Straight off I notice that it feels snappier and more responsive, but maybe I'm delusional :) Let's use Firebug to have a look at the HTTP response headers:
Response Headers
Content-Type	text/html; charset=utf-8
Server	        Jetty(6.1.6)
Content-Length	15406
Date	        Sat, 15 Mar 2008 13:27:02 GMT
X-Varnish	1795100395
Age	        0
Via	        1.1 varnish
Connection	keep-alive
OK so varnish seems to be doing its thing. Good! After browsing around the wiki for a bit I decide to log in to see if everything's still working ok. Hmmmm, not so good. Varnish isn't caching anything anymore and the responsiveness seems to have gone back to its original state. After logging out and clearing my cookies I see varnish kick in again. So we're in a bit of a bind here. I don't want Varnish to cache page content when I'm logged in as that will most probably change, but I sure do want it to cache static resources such as images, css and javascript. Seems like I'll need a custom configuration; time to have a deeper look at VCL - the Varnish Configuration Language.

VCL:

The best place for VCL info currently seems to be the man page, which is very well put together and quite exhaustive, though you can look at the Varnish website for some examples and of course google around when you have something more specific to find.
hyperion # man vcl
VCL(7)               BSD Miscellaneous Information Manual               VCL(7)

NAME
     VCL -- Varnish Configuration Language
...
EXAMPLES
     The following code is the equivalent of the default configuration with the backend address set to "backend.example.com" and no backend port specified.

         backend default {
             set backend.host = "backend.example.com";
             set backend.port = "http";
         }

         sub vcl_recv {
             if (req.request != "GET" && req.request != "HEAD") {
                 pipe;
             }
             if (req.http.Expect) {
                 pipe;
             }
             if (req.http.Authenticate || req.http.Cookie) {
                 pass;
             }
             lookup;
         }

         sub vcl_pipe {
             pipe;
         }

         sub vcl_pass {
             pass;
         }

         sub vcl_hash {
             set req.hash += req.url;
             set req.hash += req.http.host;
             hash;
         }

         sub vcl_hit {
             if (!obj.cacheable) {
                 pass;
             }
             deliver;
         }

         sub vcl_miss {
             fetch;
         }

         sub vcl_fetch {
             if (!obj.valid) {
                 error;
             }
             if (!obj.cacheable) {
                 pass;
             }
             if (obj.http.Set-Cookie) {
                 pass;
             }
             insert;
         }

         sub vcl_deliver {
             deliver;
         }

         sub vcl_timeout {
             discard;
         }

         sub vcl_discard {
             discard;
         }
Looking closely at this default configuration, we can see that it won't cache any HTTP requests with cookies: the vcl_recv subroutine issues a 'pass' rather than a cache 'lookup' if the request contains a cookie (req.http.Cookie), and vcl_fetch likewise issues a 'pass' when the backend response sets one (obj.http.Set-Cookie). Looking further down the man page there's an example that might help:
     The following snippet demonstrates how to force Varnish to cache documents even when cookies are present.

         sub vcl_recv {
             if (req.request == "GET" && req.http.cookie) {
                 lookup;
             }
         }

         sub vcl_fetch {
             if (obj.http.Set-Cookie) {
                 insert;
             }
         }
OK that's not exactly what we want, but it gives us an angle of approach at least. Now we need to figure out how to cache only certain content types. The Varnish website gives us a helpful example:
sub vcl_recv {
 if (req.request == "GET" && req.url ~ "\.(gif|jpg|swf|css|js)$") {
    lookup;
 }
}
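Before relying on a pattern like this, it can help to check which URLs it actually matches. The sketch below approximates the VCL match with grep -E, on the assumption that the pattern behaves as a POSIX extended regular expression. is_static_url is a hypothetical helper and the sample URLs are made up for illustration.

```shell
#!/bin/sh
# is_static_url URL: hypothetical helper that succeeds when URL matches the
# static-resource pattern from the example above.
is_static_url() {
    printf '%s\n' "$1" | grep -Eq '\.(gif|jpg|swf|css|js)$'
}

# Try a few sample URLs (made up for illustration).
for url in /js/app.js /images/ew.png '/style.css?v=2' /images/LOGO.GIF; do
    if is_static_url "$url"; then
        echo "match:    $url"
    else
        echo "no match: $url"
    fi
done
```

Under this approximation only /js/app.js matches: png is not in the extension list, the trailing $ anchor rejects URLs that carry a query string, and the match is case-sensitive.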
Looks like this VCL is quite a powerful language. First off let's paste the default configuration from the man page into an external file, say /etc/varnish.conf. (Note: if you're a vi user like me and use the mouse to select the text you want to paste from the man page, make sure you put vi into paste mode, :set paste, so that the auto-indenting doesn't screw up the format of the pasted text - this can be quite annoying). Now let's update the conf.d/varnishd file to use it, using the -f option to tell varnish to load our external configuration file. Since we've specified our backend server in the configuration file we can remove the -b option from the VARNISHD_OPTS.
hyperion # cat /etc/conf.d/varnishd
VARNISHD_OPTS="-a localhost:80 -f /etc/varnish.conf"
Restarting varnish verifies that it's happy with the config and we can push on with our custom configuration.
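The default configuration pasted from the man page still names backend.example.com as its backend, so the backend block has to point at Jetty before varnish will serve the wiki. Here is a small sketch that prints such a block in the syntax of the man page example above. backend_block is a hypothetical helper, and giving the port as the numeric string "8080" is an assumption based on the earlier -b localhost:8080 setting.

```shell
#!/bin/sh
# backend_block HOST PORT: hypothetical helper that prints a VCL backend
# block in the syntax of the man page example quoted above.
backend_block() {
    cat <<EOF
backend default {
    set backend.host = "$1";
    set backend.port = "$2";
}
EOF
}

# Print the block for Jetty on localhost:8080; paste the result over the
# backend.example.com block at the top of /etc/varnish.conf.
backend_block localhost 8080
```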

Custom Configuration:

Now we need to modify our default configuration file to cache requests for static resources (javascript, css, images) even though they have cookies associated with them. Modifying the vcl_recv subroutine seems to do the trick:
        sub vcl_recv {
             if (req.request != "GET" && req.request != "HEAD") {
                 pipe;
             }
             if (req.http.Expect) {
                 pipe;
             }
             if (req.request == "GET" && req.url ~ "\.(gif|jpg|png|swf|css|js)$") {
                 remove req.http.Cookie;
                 lookup;
             }
             if (req.http.Authenticate || req.http.Cookie) {
                 pass;
             }
             lookup;
         }
The extra 'if' statement, placed before the if (req.http.Authenticate || req.http.Cookie) { condition, removes the cookies from the request (remove req.http.Cookie;) when the request is for a gif, jpg etc. (req.url ~ "\.(gif|jpg|png|swf|css|js)$") and then looks the object up in the cache (lookup;). Note that a request carries its cookies in the Cookie header; Set-Cookie only appears in responses, so removing req.http.Set-Cookie would do nothing. Restart varnish, and a browse around the wiki with Firebug seems to give us what we're looking for. Let's use wget to verify that it works just as well without cookies (we could additionally test with cookies using wget's --header option):
hyperion # wget -S -O /dev/null http://wiki.energizedwork.com/images/ew.png
--14:46:30--  http://wiki.energizedwork.com/images/ew.png
           => `/dev/null'
Resolving wiki.energizedwork.com... 
Connecting to wiki.energizedwork.com|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Content-Type: image/png
  Last-Modified: Mon, 03 Mar 2008 22:37:39 GMT
  Server: Jetty(6.1.6)
  Content-Length: 41364
  Date: Sat, 15 Mar 2008 14:46:30 GMT
  X-Varnish: 1655011862
  Age: 0
  Via: 1.1 varnish
  Connection: keep-alive
Length: 41,364 (40K) [image/png]

100%[===============================================================================================================================================>] 41,364        --.--K/s             

14:46:30 (603.47 MB/s) - `/dev/null' saved [41364/41364]
Looks good! We're done, for the moment at least.
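The cookie case can be checked in the same way. The following is a minimal sketch, not a definitive test: age_of and fetch_headers_with_cookie are hypothetical helper names, and the cookie value is a made-up placeholder. The idea is to request the same image twice with a Cookie header and read the Age header of the second response.

```shell
#!/bin/sh
# age_of: hypothetical helper that prints the value of the first Age header
# found in the response headers read on stdin (prints nothing if there is none).
age_of() {
    awk 'tolower($1) == "age:" { sub(/\r$/, "", $2); print $2; exit }'
}

# fetch_headers_with_cookie URL: hypothetical helper that requests URL with a
# dummy cookie. wget -S writes the response headers to stderr, hence 2>&1.
fetch_headers_with_cookie() {
    wget -S -O /dev/null --header='Cookie: JSESSIONID=dummy' "$1" 2>&1
}

# Usage (needs network access to the wiki, so it is left commented out):
# url=http://wiki.energizedwork.com/images/ew.png
# fetch_headers_with_cookie "$url" >/dev/null   # first request warms the cache
# sleep 2
# fetch_headers_with_cookie "$url" | age_of     # a value above 0 suggests a cache hit
```

An Age above 0 on the second request suggests the object was served from the cache despite the cookie. An Age of 0 is not conclusive either way, since a freshly cached object also reports 0.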

Conclusion

Varnish, while a relative newcomer to the scene, appears well thought-out, powerful, fully featured and pretty easy to get up and running. I'm looking forward to putting it through its paces over the coming weeks in a more rigorous environment. --Gus

Obeying HTTP-Cache headers

When content is sent to the browser, the backend can mark it as non-cacheable with a 'Pragma: no-cache' header or with Cache-Control directives such as 'no-cache' and 'private'. Ideally our proxy should honour these the same way any client would.
         sub vcl_fetch {
             if (!obj.valid) {
                 error;
             }
             if (!obj.cacheable) {
                 pass;
             }
             if (obj.http.Set-Cookie) {
                 pass;
             }
             if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
                 pass;
             }
             insert;
         }

So now, when we get a chunk of content back whose headers mark it as 'no-cache' or 'private', it is no longer inserted into the cache.
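To see which pages the backend marks this way, the same three tests can be mirrored in a small shell check run against response headers. This is only a sketch: marked_uncacheable is a hypothetical helper, and the grep pattern is an approximation of the VCL conditions, not a full Cache-Control parser.

```shell
#!/bin/sh
# marked_uncacheable: hypothetical helper. Reads response headers on stdin and
# succeeds if they carry Pragma: no-cache, or a Cache-Control header that
# mentions no-cache or private (the same three tests as the VCL above).
marked_uncacheable() {
    grep -Eiq '^[[:space:]]*(pragma:.*no-cache|cache-control:.*(no-cache|private))'
}

# Usage (needs network access to the wiki, so it is left commented out):
# wget -S -O /dev/null http://wiki.energizedwork.com/ 2>&1 | marked_uncacheable \
#     && echo "backend marks this page as uncacheable"
```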

We also need an addition to the vcl_recv routine:

         sub vcl_recv {
             ...
             if (req.http.Cache-Control ~ "no-cache") {
                 pass;
             }
             lookup;
         }

Ah, now that's more efficient. If the request carries 'no-cache', varnish won't even bother with a cache lookup and passes the request straight to the backend.

As a general rule, most sites don't worry about these cache-control settings, but a CMS or other type of management system will often try to create efficient caching rules, or protect private pages from being cached locally (e.g. banks). --Aaron

This page (revision-42) was last changed on 12-06-2008 12:19 by Aaron