Full-stack Varnish all the things

December 25th, 2015

While attending VUGX Rotterdam, which was a jolly time for the Varnish family, I was pleasantly surprised by some little-known but very useful new things available for your site.

Varnish Security Firewall

I’ll tell you about them in a moment, but I’ll start by plugging VSF, the Varnish Security Firewall: a really useful security framework for Varnish 4 that we recently made really, really easy to install. You can give it a spin right now to stop many of the most common attacks against your website.

Shameless plugs out of the way: Varnish itself is painless to operate, but most sites out there also need to run web servers, encrypt their traffic and load balance, and all of this together becomes pretty messy, requiring haproxy, nginx and sticky putty goo in a glorious sandwich.


SSL frontend with Hitch

One problem that keeps popping up is the lack of SSL support. No, Varnish is not going to link OpenSSL before hell freezes over in PHK’s basement, because let’s be honest: it belongs in its own binary. Still, you can now replace your heavy-handed Nginx stack with the lightweight Hitch, a revamped stud, which even has rudimentary Let’s Encrypt support.


An SSL frontend config for Hitch is just five lines:

frontend = "[*]:443"
backend  = "[127.0.0.1]:8000"
pem-file = "/etc/ssl/private/projects.hackeriet.pem"
user = "nobody"
write-proxy = on

With the write-proxy = on option, we can even use the PROXY protocol to pass the real client IP to Varnish instead of trusting X-Forwarded-For, but we’ll have to update Varnish to accept it by copying /lib/systemd/system/varnish.service to /etc/systemd/system/varnish.service and editing ExecStart to include -a :8000,PROXY like so:

ExecStart=/usr/sbin/varnishd -a :80 -a :8000,PROXY -a :6081 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m

On the backend side, it’s fairly straightforward to write SSL-backends for Varnish using stunnel.
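For illustration, a minimal stunnel client config might look something like this; the section name, file path and origin hostname are made-up examples, not from any particular deployment:

```
; /etc/stunnel/varnish-backend.conf (hypothetical path and names)
; Client mode: accept plaintext from Varnish locally, speak TLS to the origin
[tls-backend]
client = yes
accept = 127.0.0.1:8443
connect = origin.example.com:443
```

Then point an ordinary Varnish backend at 127.0.0.1:8443 and Varnish never needs to know TLS is involved.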

Varnish as a Web Server

For years we’ve been half-joking that what Varnish needs is web server capabilities, but little did we know that this has been around for years!


Today we can all enjoy Martin’s fsbackend vmod, which lets us serve (and cache) static files straight off the file system in a secure and fast way. It doesn’t come with much documentation though, so here’s how you use it to serve your sites with the standard “index.html”, some minimal trailing-slashes support and some content-typing:

vcl 4.0;
import fsbackend;

sub vcl_init {
  new webroot = fsbackend.root("/var/www");
  webroot.add_header("Cache-Control: public, max-age=3600");
}

sub vcl_backend_fetch {
  if (bereq.url ~ "^/mysite") {
    set bereq.url = regsub(bereq.url, "/mysite/?", "/");
    if (bereq.url ~ "/$") {
      set bereq.url = bereq.url + "index.html";
    }
    set bereq.backend = webroot.backend();
  }
}

sub vcl_backend_response {
  if (bereq.url ~ "\.html") {
    set beresp.http.Content-Type = "text/html";
  } elif (bereq.url ~ "\.txt") {
    set beresp.http.Content-Type = "text/plain; charset=utf-8";
  } else {
    set beresp.http.Content-Type = "text/plain";
  }
}


This module was historically preceded by Dridi’s efforts for Varnish 3 and 4, which by his own words were “a hack”, as well as by std.file(), which was not good enough for serving web pages.

Coupled with Hitch for SSL, this has allowed me to drop the Nginx sandwich completely on the static parts of my sites.

Regexes with sub-captures


Varnish’s built-in regexes are awesome, but sub-captures are barely accessible and usually force you to copy and match a single string many times to get multiple captures. This is both messy and inefficient. Enter the re vmod by Geoff from Uplex, which lets us access all sub-captures of a match efficiently:

import re;

sub vcl_init {
  new myregex = re.regex("bar(\d+)");
}

sub vcl_recv {
  if (myregex.match(req.http.Foo)) {
    set req.http.Baz = myregex.backref(1, "");
  }
}

Consistent Hashing

If you’re running a really large site where you’re regularly pinning your server bandwidth, you know that load balancing and sharding are where it’s at. That’s where the stateless consistent-hashing vmod that Niels from Uplex presented comes in.
When a frontend request hits your cluster, the vslp vmod looks up which cache is the master for that request, and then picks which backend owns the request, consistently across n servers. If you ever add frontends or backends, only 1/(n+1) of the objects have to be rebalanced, giving you a limited extra load cost for adding new servers. The code has been in heavy production for the last couple of years!

This means you can keep adding frontends and backends to your pool, and you can round-robin load balance while scaling (near-) linearly.
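To illustrate the property with a toy sketch (this is just the idea, not the actual vslp vmod), rendezvous-style hashing gives each (backend, URL) pair a weight and lets the highest weight win; a newly added backend only claims the keys it now wins, leaving the rest where they were:

```shell
#!/bin/sh
# Toy rendezvous ("highest random weight") hashing sketch.
# pick_backend URL backend... -> prints the backend that owns the URL.
pick_backend() {
  url="$1"; shift
  best=""; best_w=-1
  for b in "$@"; do
    # Weight = checksum of backend+url; deterministic across runs.
    w=$(printf '%s:%s' "$b" "$url" | cksum | cut -d' ' -f1)
    if [ "$w" -gt "$best_w" ]; then best_w=$w; best=$b; fi
  done
  printf '%s\n' "$best"
}

pick_backend "/index.html" cache1 cache2 cache3
```

Any node can compute this locally from the same backend list, which is what makes the scheme stateless.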

Dynamic Backends

When running Varnish on EC2 with an ELB, it’s real tricky to keep the backend list up to date. Dynamic backend vmods will fix this problem Real Soon Now.

With consistent hashing in the Varnish core, and dynamic backends, we might no longer need haproxy nor any other messy load balancer, simplifying the stack for quite a lot of sites out there.

There are also rumours of a FCGI or even WSGI vmod, which will allow us to forgo the ancient dogmatic cows of web adminning entirely, and go Full Stack Varnish. Let’s just pray that day will come soon enough.

Ho Ho Ho! I wish you a merry X-Forwarded-For!

PS The title is a joke, and so is everything “full stack”.

netctl is lame

October 10th, 2015

One morning not long ago I installed a different graphics card and my machine no longer connected to the network upon startup.
The interface was not coming up, and this being Arch Linux, it was netctl that wouldn’t start, but it was being catty about why.

Having written about systemd before, I thought this might be a piece of cake to figure out, so I rolled up my sleeves.

I know that after a bunch of checks it just calls /usr/lib/network start eth, so I called that directly and found that my onboard network card had been renamed from enp2s0 to enp3s0, so I changed the reference in /etc/netctl/eth to Interface=enp3s0.

Now /usr/lib/network start eth would get me online, but the machine would still not bring the interface up on boot! Netctl still wasn’t satisfied:

[root@wir ~]# systemctl status netctl
● netctl.service - (Re)store the netctl profile state
   Loaded: loaded (/usr/lib/systemd/system/netctl.service; enabled; vendor preset: disabled)
   Active: active (exited) since Sat 2015-10-10 12:53:55 CEST; 6min ago
  Process: 1027 ExecStart=/usr/bin/netctl restore (code=exited, status=1/FAILURE)
 Main PID: 1027 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/netctl.service

Oct 10 12:53:55 wir systemd[1]: Starting (Re)store the netctl profile state...
Oct 10 12:53:55 wir systemd[1]: Started (Re)store the netctl profile state.

Let’s try netctl directly:

# netctl start eth
# "failed, refer to journalctl -xn"


Oct 10 12:56:37 wir systemd[1]: sys-subsystem-net-devices-enp2s0.device: Job sys-subsystem-net-devices-enp2s0.device/start timed out.
Oct 10 12:56:37 wir systemd[1]: Timed out waiting for device sys-subsystem-net-devices-enp2s0.device.
-- Subject: Unit sys-subsystem-net-devices-enp2s0.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Unit sys-subsystem-net-devices-enp2s0.device has failed.
-- The result is timeout.
Oct 10 12:56:37 wir systemd[1]: Dependency failed for A basic dhcp ethernet connection.
-- Subject: Unit netctl@eth.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Unit netctl@eth.service has failed.
-- The result is dependency.
Oct 10 12:56:37 wir systemd[1]: netctl@eth.service: Job netctl@eth.service/start failed with result 'dependency'.
Oct 10 12:56:37 wir polkitd[762]: Unregistered Authentication Agent for unix-process:2441:16689 (system bus name :1.13, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (discon
Oct 10 12:56:37 wir systemd[1]: sys-subsystem-net-devices-enp2s0.device: Job sys-subsystem-net-devices-enp2s0.device/start failed with result 'timeout'.

Now I had no idea where to find sys-subsystem-net-devices-enp2s0.device, and started poking around in /lib/systemd/system,
/usr/lib/systemd/system and /etc/systemd/system. Finally I found /etc/systemd/system/netctl@eth.service which seems to be the failing service, and sure enough, it refers to the device, so I changed it:

Description=A basic dhcp ethernet connection
BindsTo=sys-subsystem-net-devices-enp3s0.device
After=sys-subsystem-net-devices-enp3s0.device

After that, netctl would come up at boot again.


I just wanted to say that this problem is lame. I was under the impression that the (already lame) predictable network interface renaming from eth0 to enpXsY was supposed to be stable, precisely to avoid this kind of problem.
Furthermore, systemd and netctl left a very long trail of breadcrumbs to follow before this was finally solved, instead of making the error obvious.
Note that the netctl@eth.service file was created by netctl and not by me, so I had no prior knowledge of its existence.

And yes, clearly I should have just called `netctl reenable eth`,
but that raises the question: how should I have known that without consulting the Arch wiki?

Another thing I could have done to save myself the trouble was to disable predictable network interface renaming entirely, by booting the kernel with `net.ifnames=0`.
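For the record, on a GRUB-booted box that kernel parameter would go somewhere like this (assuming the usual file locations):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0"
```

followed by a `grub-mkconfig -o /boot/grub/grub.cfg` to regenerate the boot config.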

But then I would not be learning.

My Varnish pet peeves

August 23rd, 2015

I’ve been meaning to write a blog entry about Varnish for years now. The closest I’ve come is a blog about how to make Varnish cache your Debian repos, one on making yourself a WikiLeaks cache, and releasing the Varnish Security Firewall, all without a word on this blog. So? SO? Well, after years it turns out there is a thing or two to say about Varnish. Read on to find out what annoys me and the people I meet the most.


Although you could definitely call me a “Varnish expert”, and even a sometime contributor, and I do develop programs, I cannot call myself a Varnish developer because I’ve shamefully never participated in a Monday evening bug wash. My role in the Varnish world is more… operative. I am often tasked with helping ops people use Varnish correctly, justifying its use and cost to their bosses, defending it from expensive and inferior competitors, and sitting up long nights with load tests just before launch days. I’m the guy that explains the low risk and high reward of putting Varnish in front of your critical site, the guy that makes it actually be low risk, and the first guy on the scene when the code has just taken a huge dump on the CEO’s new pet Jaguar. I am also sometimes the guy who tells these stories to the Varnish developers, although of course they also have other sources. The consequence of this… lifestyle choice… is that what code I do write is either short and to the point or… incomplete.


I know we all love Varnish, which is why, after nearly 7 years of working with this software, I’d like to share with you my pet peeves about the project. There aren’t many problems with this lovely and lean piece of software, but those that are there are sharp edges that pretty much everyone stubs a toe or snags their head on. Some of them are specific to a certain version, while others are “features” present in nearly all versions.

And for you Varnish devs who will surely read this, I love you all. I write this critique of the software you contribute to, knowing full well that I haven’t filed bug reports on any of these issues and therefore I too am guilty in contributing to the problem and not the solution. I aim to change that starting now :-) Also, I know that some of these issues are better lived with than fixed, the medicine being more hazardous than the disease, so take this as all good cooking; with a grain of salt.

Silent error messages in init scripts

Some genius keeps inserting 1>/dev/null 2>&1 into the startup scripts on most Linux distros. This might be in line with some wacko distro policy, but it makes conf errors, and in particular VCL errors, way harder to debug for the common man. Even worse, the `service varnish reload` script calls `varnish-vcl-reload -q`, and that’s q for please-silence-my-fatal-conf-mistakes; the best way to fix this is to *edit the init script and remove the offender*. Mind your p’s and q’s, eh. It makes me sad every time, but where do I file this particular bug report?


debug.health still not adequately documented

People go YEARS using Varnish without discovering watch varnishadm debug.health. Not to mention that it’s anyone’s guess that this has to do with probes, and that there are no other debug.* parameters, except for the totally unrelated debug parameter. Perhaps this was decided to be dev-internal at some point, but the probe status is actually really useful in precisely this form. debug.health is still absent from the param.show list and the man pages, while in 4.0 some probe status and backend info has been put into varnishstat, which I am surely not the only one to be very thankful for indeed.

Bad naming

Designing a language is tricky.


Explaining why purge is now ban, and that what is now purge used to be something else, is mind-boggling. This issue will be fixed in 10 years, when people are no longer running Varnish 2.1 anywhere. Explaining all the three-letter acronyms that start with V is just a gas.
Showing someone ban("req.url = "+ req.url) for the first time is bound to make them go “oh” like a raccoon just caught sneaking through your garbage.
Grace and Saint mode… that’s biblical, man. Understanding what it does and how to demonstrate the functionality is still for Advanced Users, explaining this to noobs is downright futile, and I am still unsure whether we wouldn’t all be better off just enabling it by default and forgetting about it.
I suppose if you’re going to be awesome at architecting and writing software, it’s going to get in the way of coming up with really awesome names for things, and I’m actually happy that’s still the way they prioritize what gets done first.

Only for people who grok regex

Sometimes you’ll meet Varnish users who do code but just don’t grok regex. It’s weak, I know, but this language isn’t for them.

Uncertain current working directory

This is a problem on some rigs which have VCL code in stacked layers, or really anywhere where it’s more appropriate to call the VCL a Varnish program, as in “a program written for the Varnish runtime”, rather than simply a configuration for Varnish.

You’ll typically want to organize your VCL in such a way that each VCL is standalone, with if-wrapped rules, and they’re all included from one main VCL file, stacking all the vcl_recv’s and vcl_fetch’es.

Because distros don’t agree on where to put varnishd’s current working directory, which happens to be wherever it was launched from instead of always chdir $(dirname $CURRENT_VCL_FILE), you can’t reliably specify include statements with relative paths. This forces us to use hardcoded absolute paths in includes, which is neither pretty nor portable.

Missing default director in 4.0

When translating VCL to 4.0 there is no longer any language for director definitions, which means they are done in vcl_init(), which means your default backend is no longer the director you specified at the top, which means you’ll have to rewrite some logic lest it bite you in the ass.

Also, director.backend() returns an object without a string representation, set via backend_hint, so you cannot do old-style name comparisons; in other words, backends are first-class objects but directors are another class of objects.


VCL doesn’t allow unused backends or probes

Adding and removing backends is a routine ordeal in Varnish.
Quite often you’ll find it useful to keep backup backends around that aren’t enabled, either as manual failover backups, because you’re testing something or just because you’re doing something funky. Unfortunately, the VCC is a strict and harsh mistress on this matter: you are forced to comment out or delete unused backends :-(

Workarounds include using the backends inside some dead code or constructs like

	set req.backend_hint = unused;
	set req.backend_hint = default;

It’s impossible to determine how many bugs this error message has avoided by letting you know that backend you just added, er yes that one isn’t in use sir, but you can definitely count the number of Varnish users inconvenienced by having to “comment out that backend they just temporarily removed from the request flow”.

I am sure it is wise to warn about this, but couldn’t it have been just that, a warning? Well, I guess maybe not, considering distro packaging is silencing error messages in init and reload scripts..

To be fair, this is now configurable in Varnish by setting vcc_err_unref to false, but couldn’t this be the default?
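For reference, vcc_err_unref is a varnishd runtime parameter, so on a Debian-style setup that might look something like this (file location and the other options vary per distro and init system):

```
# /etc/default/varnish (Debian-style; adapt to your distro)
DAEMON_OPTS="-a :80 \
             -f /etc/varnish/default.vcl \
             -p vcc_err_unref=off"
```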

saintmode_threshold default considered harmful


If many different URLs keep returning bad data or error codes, you might conceivably want the whole backend to be declared sick instead of growing some huge list of sick URLs for this backend. What if I told you your developers just deployed an application which generates 50x error codes, triggering your saint mode for an infinite number of URLs? Well, then you have just DoSed yourself, because you hit this threshold. I usually enable saint mode only after giving my clients a big fat warning about this one, because quite frankly it easily comes straight out of left field every time. Either saint mode is off, or the threshold is Really Large™ or even ∞, and only in some special cases do you actually want this set to an actual number.

Then again, maybe it is just my clients and the wacky applications they put behind Varnish.

What is graceful about the saint in V4?

While we are on the subject: grace mode being the most often misunderstood feature of Varnish, the thing has changed so radically in Varnish 4 that it is no longer recognizable to users, and they often make completely reasonable but devastating mistakes trying to predict its behavior.

To be clear on what has happened: saint mode is deprecated as a core feature in V4.0, while the new architecture now allows a type of “stale-while-revalidate” logic. A saintmode vmod is slated for Varnish 4.1.

But as of 4.0, say you have a bunch of requests hitting a slow backend. They’ll all queue up while we fetch a new one, right? Well yes, and then they all error out when that request times out, or if the backend fetch errors out. That sucks. So lets turn on grace mode, and get “stale-while-revalidate” and even “stale-if-error” logic, right? And send If-Modified-Since headers too, sweet as.

Now that’s gonna work when the request times out, but you might be surprised that it does not when the request errors out with 50x errors. Since beresp.saint_mode isn’t a thing anymore in V4, those error codes are actually going to knock the old object outta cache, and each request is going to break your precious stale-if-error until the backend probe declares the backend sick and your requests become grace candidates.

Ouch, you didn’t mean for it to do that, did you?


And if, gods forbid, your apphost returns 404s when some backend app is not resolving, bam, you are in a cascading hell fan fantasy.

What did you want it to do, behave sanely? A backend response always replaces another backend response for the same URL (not counting vary-headers). To get a poor man’s saint mode back in Varnish 4.0, you’ll have to return (abandon) those erroneous backend responses.
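A minimal sketch of what that could look like in 4.0 VCL; the status check is an assumption, tune it to your app:

```
sub vcl_backend_response {
  # Don't let an erroneous 50x response replace the object we already have
  if (beresp.status >= 500) {
    return (abandon);
  }
}
```

Note that abandon kills the fetch outright, so a cold miss with no graced object to fall back on will still see an error.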

Evil grace on unloved objects

For frequently accessed URLs grace is fantastic and will save you loads of grief, and those objects can have large grace times. However, rarely accessed URLs suffer a big penalty under grace, especially when they are dynamic and meant to be updated from the backend. If a URL is meant to be refreshed from the backend every hour, and Varnish sees many hours between each access, it’s going to serve up that many-hour-old stale object while it revalidates its cache.

stale while revalidate
This diagram might help you understand what happens in the “200 OK” and “50x error” cases of graceful request flow through Varnish 4.0.

Language breaks on major versions

This is a funny one, because the first major language break I remember was the one that I caused myself. We were making security.vcl, and I was translating rules from mod_security and having trouble with it because Varnish used POSIX regexes at the time. I was writing this really god-awful script to translate PCRE into POSIX when Kristian, who conceived of security.vcl, went to Tollef, who was working in the same department at the time, and asked in his classical brook-no-argument kind of way: “why don’t we just support Perl regexes?”.
Needless to say, (?i) spent a full 12 months afterwards cursing myself while rewriting tons of nasty client VCL code from POSIX to PCRE and fixing occasional site-devastating bugs related to case-sensitivity.

Of course, Varnish is all the better for the change, and would get nowhere fast if the devs were to hang on to legacy, but there is a lesson in here somewhere.


So what's a couple of sed 's/req.method/req.request/'s every now and again?
This is actually the main reason I created the VCL.BNF. For one, it got the devs thinking about the grammar itself as an actual thing (which may or may not have resulted in the cleanups that make VCL a very regular and clean language today), but my intent was to write a parser that could parse any version of VCL and spit out any other version of VCL, optionally pruning and pretty-printing of course. That is still really high on my todo list. Funny how my clients will book all my time to convert their code for days but will not spend a dime on me writing code that would basically make the conversion free and painless for everyone forever.

Indeed, most of these issues are really hard to predict consequences of implementation decisions, and I am unsure whether it would be possible to predict these consequences without actually getting snagged by the issues in the first place. So again: varnish devs, I love you, what are your pet peeves? Varnish users, what are your pet peeves?

Errata: vcc_err_unref has existed since Varnish 3.

Curiosity killed the printer

March 28th, 2015

A friend of mine wrote some thoughts about printers that are highly pertinent. I’ll let his words speak for themselves.

Curiosity killed the printer

Understanding today, I mean fully understanding, demands an expert in
a huge number of fields. Something as simple as a refrigerator
requires, for most people, an engineering degree. Looking at something
slightly more complicated, for example a microwave, you would probably
want a physics major in your back pocket as well. As Isaac Newton
rightfully said, we are indeed standing on the shoulders of giants. This
puts us in the joyful position of not needing to interpret some deep
meta-physical meaning of why, but rather, through empirical studies, we
can learn to understand actions and reactions. We can suddenly augment
the works of others, improvise on our newly lent wisdom to create works
of art previously unimaginable. This newfound way of science led us
to the creation of printers.

In any enterprise there most surely will be a printer available, these
days most likely a multi-functional beast connected to your Wifi and
implementing a fax-over-ip-through-skype-attachment protocol.
The question I pose is simple: Why won’t it work? All printers are
at heart a simple construction: churning through paper and creating
tiny dots, either by burning them in or by leaving colourful ink all over
the place.

These wonderful devices that help us rapidly deforest the planet,
and in return give us the latest news about Miley Cyrus and Norwegian
youths’ terrible scores in mathematics, are sorely misunderstood; the
unwanted child of any IT department. Often not accounted for, left
alone in their own network with no humans ever giving them any love.
It’s gotten so bad that the printers themselves jam paper every now and
then just to feel the tender touch of a human a few times a year. They
are truly misunderstood. As the little black box it is, a printer
does nothing but slave away; but for whom?

The need for access, the need to be able to print, from anywhere to
any printer, might come as a shock. But for some reason you will find
hundreds of thousands of printers online. I don’t know why they’re there.
Perhaps the printers have escaped the prison of corporate networks
to go live in the wild. But their diligence comes at a price: when the
evil scientists of the world learn the way of the printers, who knows
what will happen? The prophecy once claimed that one day the printers
would rise, and indeed they did; 420 000 of them. How did we let a
technology so simple become a weapon of these evil
scientists? These printers were just looking for a quiet, secluded
life in the country, with the possibility of being useful to people to
whom it is easy to do good. From the looks of it, the printers might have
been fed too much Tolstoy.

It’s easy for all of us to forget the true nature of things. Printers
in themselves are not dangerous; they just want to live a life in
peace. On the other hand people, especially those who use things they
don’t understand (the same people who demanded printers in the first
place) must be taught. It’s easy to brush something off as unneeded, to cut
some corners to make some extra short-term income. We are constantly
optimizing everything today; it’s so sad to see people, companies,
and technologies thrown out for the “newer, better version of you”.
Anywhere you turn your head you can learn, not just actions and
their reactions, but something about how physics, software, society,
or any other subject works at its core. You could probably learn
a lot from those plain old people that you see on your daily commute
if you just asked them. They will have much more interesting stories
than your daily news have to tell. We must always strive to ask
questions, be bold in our critique and speak our minds. It’s in our
nature to experience, to break, to assemble and to learn. Most of the things
we surround ourselves with every day do not give us a deeper understanding
of anything. In fact, the over-generalization we humans apply to all
things reduces the amount of knowledge needed to a minimum, letting
us do the important things in life, like sharing pictures of our newest
composition of milk and cereal.

As found on

NSA-proof SSH

January 6th, 2015

One of the biggest takeaways from 31C3 and the most recent Snowden-leaked NSA documents is that a lot of SSH stuff is… broken.

I’m not surprised, but then again I never am when it comes to this paranoia stuff. However, I do run a ton of SSH in production and know a lot of people that do. Are we all fucked? Well, almost, but not really.

Unfortunately, most of what Stribika writes about in the “Secure Secure Shell” doesn’t work for old production versions of SSH. The cliff notes for us real-world people, who will realistically be running SSH 5.9p1 for years, are hidden in the bettercrypto.org repo.

Edit your /etc/ssh/sshd_config:

Ciphers aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160
KexAlgorithms diffie-hellman-group-exchange-sha256

Basically, the nice and forward-secure aes-*-gcm and chacha20-poly1305 ciphers, the curve25519-sha256 Kex algorithm and the Encrypt-then-MAC message authentication modes are not available to those of us stuck in the early 2000s. That’s right: the provably NSA-proof stuff isn’t supported. Upgrading at this point makes sense.

Still, we can harden SSH, so go into /etc/ssh/moduli and delete all the moduli that have 5th column < 2048, and disable ECDSA host keys:

cd /etc/ssh
mkdir -p broken
mv moduli ssh_host_dsa_key* ssh_host_ecdsa_key* ssh_host_key* broken
awk '{ if ($5 > 2048){ print } }' broken/moduli > moduli
# create broken links to force SSH not to regenerate broken keys
ln -s ssh_host_ecdsa_key ssh_host_ecdsa_key
ln -s ssh_host_dsa_key ssh_host_dsa_key
ln -s ssh_host_key ssh_host_key

Your clients, which hopefully have more recent versions of SSH, could have the following settings in /etc/ssh/ssh_config or .ssh/config:

Host all-old-servers

    Ciphers aes256-gcm@openssh.com,aes128-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-ripemd160
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

Note: Sadly, the -ctr ciphers do not provide forward security and hmac-ripemd160 isn’t the strongest MAC. But if you disable these, there are plenty of places you won’t be able to connect to. Upgrade your servers to get rid of these poor auth methods!
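Incidentally, a quick way to check which of these algorithms your local OpenSSH build actually supports (the -Q query flag has existed since OpenSSH 6.3):

```shell
# Print the ciphers, MACs and key-exchange algorithms this build supports.
# Guarded so the snippet is a no-op on boxes without an ssh client.
if command -v ssh >/dev/null 2>&1; then
  ssh -Q cipher
  ssh -Q mac
  ssh -Q kex
fi
```

Cross-reference the output against the lists above before locking down a server you can’t easily console into.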

Handily, I have made a little script to do all this and more, which you can find in my Gone distribution.

There, done.


Updated Jan 6th to highlight the problems of not upgrading SSH.
Updated Jan 22nd to note CTR mode isn’t any worse.
Go learn about COMSEC if you didn’t get trolled by the title.

sound sound

December 8th, 2014


Recently I’ve been doing some video editing… though less editing than tweaking my system.
If you want your JACK output to speak with Kdenlive, a most excellent video editing suite,
and output audio in a nice way without choppiness and popping, which I promise you is not nice,
you’ll want to pipe it through PulseAudio, because the ALSA-to-JACK stuff doesn’t do well with Phonon, at least not on this convoluted setup.

Remember: to get that setup to work, ALSA pipes to JACK with the pcm.jack { type jack .. thing, and you remove the ALSA-to-PulseAudio stupidity at /usr/share/alsa/alsa.conf.d/50-pulseaudio.conf
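For completeness, the ALSA JACK plugin definition I mean is typically along these lines (written from memory, assuming a plain stereo setup and the alsa-plugins JACK pcm), e.g. in ~/.asoundrc:

```
# ~/.asoundrc (or /etc/asound.conf)
pcm.jack {
    type jack
    playback_ports {
        0 system:playback_1
        1 system:playback_2
    }
    capture_ports {
        0 system:capture_1
        1 system:capture_2
    }
}
```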

So, once that’s in place, it won’t play even though Pulse found your JACK, because your clients are defaulting to some ALSA device… this is when you change /etc/pulse/client.conf and set default-sink = jack_out.

danger at the docks

November 14th, 2014

Docker.io is taking the world by storm, but a day at the docks is not without its perils. Here I hope to inspire you to try out docker by showing you how to avoid its pitfalls.

In the days of yore

As the FreeBSD jailers and Solaris zoners will attest to, containerizing your services is a great boon, saving space and resources and providing easy management akin to chroots and potential security benefits, without the overheads of full-blown virtual machines.

Linux has had containers for the longest time, in the ancient form of User Mode Linux, which actually ran a kernel in userland, and more recently OpenVZ, which was more like jails.

The former didn’t lend itself to production deployments, and the latter never made it into the Linux mainline, coming at a time when people were more interested in virtualization than containment. In recent years, a kernel facility named cgroups has made LinuX Containers (LXC) possible, which has afforded the management, if not the security, of BSD jails.

what can be gained

The biggest potential benefit from containers is that CPU, memory and disk resources are 100% shared at native speeds, so no libraries and no data need ever be duplicated on disk nor in memory.

In FreeBSD jails, this was achieved by providing most of the system read-only like /usr, /lib and /bin, and sharing it amongst jails. This worked quite well, but was surprisingly tricky to update.

You can do similar stuff with LXC, just as long as you understand that if it breaks, you get to keep all the pieces. This gives you full control, and means that I for one have LXC instances in production with uptimes of 1200 days and counting.


Taking the single-container-single-responsibility approach further, you could, instead of deploying whole-system containers, create image filesystems that contain only the bare necessities. For instance, your Python application would hold, apart from its code, just the Python runtime, libc and other dependent libraries, and not much else.
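Hunting down those dependent libraries can be sketched with ldd; the binary path and the awk filter here are illustrative:

```shell
# List the shared libraries a binary needs, ready to be copied into a
# minimal image tree. BIN is illustrative.
BIN=/bin/sh
ldd "$BIN" | awk '/=> \//{print $3} /^\t\//{print $1}'
```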

Inspired by the “leaner is better” philosophy backed by the experience of running LXC in production, we built this minimal deployment framework complete with a tool to magically find all the required libraries.
leaner is better
Awesomely small images come from this approach, where the “contact surface” of the application has shrunk to nothing but the app itself. It was far from perfect, though, serving to make the images awesomely less debuggable and manageable, and it never made it into production proper.

layer upon layer is two steps further

In comes Docker, and its concept of filesystem image layers based on AUFS. The approach isn’t novel itself, having been used by live-CD distributions for the longest time, but it’s the first that provides tools to manage the layers effortlessly for containers. So you can now have 100 servers with 100 application layers, and all your Ruby applications share one runtime layer and your Python applications share another, and they all run on the same base image of Ubuntu, and they do all that transparently, without you having to consciously think about which bit goes where.

layers

Docker takes another step further, borrowing heavily from distributed social source control à la GitHub, allowing you to clone, build, push, pull, commit, share and remix images as easily as that.
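As a sketch, a hypothetical Python app’s image adds only a thin layer on top of base layers that every sibling image reuses from cache; all names and versions here are illustrative:

```dockerfile
# Every image starting with the same FROM/RUN lines shares those cached
# layers; only the COPY layer below is unique to this app.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python
COPY app.py /srv/app.py
CMD ["python", "/srv/app.py"]
```

Then `docker build -t myapp .` and `docker push myapp` give you the clone-commit-push workflow, with only the app layer changing between builds.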

This is the type of thing that blows disk-image-based virtualization straight out of the water.

Perils and rough starts

The Docker docs are well written and will get you spawning containers and dockerizing applications in no time at all. What they will not tell you is how to run containers in production for all values of production.

In particular, the following things require special attention:

  • changing ips
  • service discovery
  • dns issues
  • cache clash

.. and that is precisely what we will talk about next time.
see you later

systemd crash course, with remote logging

September 20th, 2014

live a better life

The world is taking systemd by storm and there is no looking back now.

Still, there are some elements that you would expect to be there that are missing. One of them is remote logging!

Another thing missing is a decent crash course [*]. This is frustrating because making init scripts and checking logs is the staple diet of any old sysadmin.

Read on to readjust gently but quickly.
she wants it

Systemd crash course

Find your “unit” – that’s the new name for “init script” to us oldtimers:

systemctl list-units --type=service
# this one is way more verbose
systemctl list-units

Start, stop, restart, reload, status:

systemctl start sshd
systemctl stop sshd
systemctl restart sshd
systemctl reload sshd
# status, gives some log output too
systemctl status sshd

Check ALL the logs, follow the logs, get a log for a service:

journalctl -l
journalctl -f
journalctl -u sshd

Install a systemd service:
(This is what a systemd service description looks like)

cat > ossec.service << EOF
[Unit]
Description=OSSEC Host-based Intrusion Detection System

[Service]
# Type can instead be "simple", in which case systemd expects the service
# to run in the foreground and you can drop the ExecStop directive.
Type=forking
# ExecStart/ExecStop need the full path to the executable
ExecStart=/var/ossec/bin/ossec-control start
ExecStop=/var/ossec/bin/ossec-control stop

[Install]
WantedBy=multi-user.target
EOF

# now copy that file into the magic place, /etc/init.d in the old days
install -Dm0644 ossec.service /etc/systemd/system/ossec.service

# now make systemd pick up the changes
systemctl daemon-reload

Enable or disable a service:

systemctl enable ossec
systemctl disable ossec

systemd components

Remote logging

OK so you now know your way around this beast.
Now you want remote logging.

According to the Arch wiki [#], systemd doesn’t actually do remote logging (yet – what else doesn’t it do?) but it will helpfully spew its logs onto the socket /run/systemd/journal/syslog if you knock twice, gently.

To convince systemd to write to this socket, go to /etc/systemd/journald.conf and set

ForwardToSyslog=yes
then issue a journald restart

systemctl restart systemd-journald

You can install syslog-ng and it should pick up the logs. Test it now by making a log entry with

logger -t WARN zool

and check /var/log/syslog.log

If you have a distro running systemd, then hopefully syslog-ng will be recent enough to know about systemd, and things should just work at this point.

If it doesn’t, syslog-ng.conf’s source src { system(); }; isn’t picking up the socket file. Fix this by adding the socket explicitly, changing the source in /etc/syslog-ng/syslog-ng.conf like so:

source src {
  system();
  internal();
  unix-dgram("/run/systemd/journal/syslog");
};
if you are working with a laptop or desktop then a console_all destination on tty12 is handy too:

destination console_all { file("/dev/tty12"); };
log { source(src); destination(console_all); };

the systemd monster

[*] IMHO Fedora’s cheatsheet on systemd is a little too cluttered
[#] Arch has a decent intro to systemd

No sockpuppets were harmed in the making of this blog entry. Any and all images are © whomever made them, and I love you for not suing me.

Grsecurity on the desktop

August 8th, 2014

In my last post I presented Grsecurity kernel packages for Debian Wheezy. Now would be a good time to review the hows and whys of Grsecurity, so you can decide if it is something you need.

Today we will quickly look at Grsecurity’s viability and impact on a typical desktop or laptop.

Why do I need Grsecurity on the desktop?

  • Often run insecure code? Limit its impact on the system.
  • Employ chroots and containers? Enforce stricter containment.
  • Connect to hostile networks? Reduce and mitigate impact of exploitation attacks.
  • Allow others to use your system? Increase monitoring and control over your machine.

Or perhaps you choose to run Grsecurity on your laptop simply for the sheer paranoia-factor and to impress friends.

How does Grsecurity behave on the desktop?

In addition to the invisible yet significant hardening efforts against kernel exploitation, there are some changes that an experienced user will notice immediately:

  • you need root to dmesg
  • grsec reports denied resource oversteps in dmesg
  • top only shows this user’s running processes
  • mappings and other sensitive information in /proc are only available to the process owner

The very few programs that depend on a specific kernel version, read /proc/kcore or write directly to kernel constructs will not work under Grsecurity.
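For binaries that legitimately need such access (a JIT that writes to executable pages, say), PaX flags can be relaxed per executable with paxctl; the path used here is purely illustrative:

```shell
# -c adds a PT_PAX header if the binary lacks one, -m disables MPROTECT
# for this one binary only; the java path is illustrative.
paxctl -cm /usr/lib/jvm/default/bin/java
# view the flags now set on the binary
paxctl -v /usr/lib/jvm/default/bin/java
```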

And of course, there is the feeling of solidity and the sight of the reported kernel version:

~# uname -a
Linux amaeth 3.2.60-grsec-lied #1 SMP Wed Aug 6 17:40:27 CEST 2014 x86_64 GNU/Linux

kernel security for debian

August 6th, 2014

TL;DR – links to Grsecurity-enabled up-to-date debian wheezy-kernel packages at the bottom of this post.

Kernel security is becoming more important nowadays, especially since the Linux kernel has grown so big and the platform is rife with misbehaving programs.

Some enthusiasts have illusions that the Linux kernel is somehow magically very secure, but those in the know will tell you there are quicker ways to gain root than sudo.
you did what?
Grsecurity is by far the best patchset for increasing the security of a system, whether it be a virtual LAMP box, a login shell server or a desktop machine. I have been tracking it and using it for years and consider it superior to the SELinux and AppArmor approaches.

It has only one major drawback: most distributions of Linux do not provide tested and up-to-date grsec-patched kernel packages, making Grsec-level security features nearly unobtainium for the common mortal.

I have been rolling my own kernel patches since the millennium, and so I put in the work to put Grsecurity back into Debian.

So far I have built and tested kernels for Debian 7.5 and 7.6 Stable codenamed Wheezy. This is the standard, debian-patched kernel with added Grsecurity.

I have built separate packages which are suitable for servers, virtualized servers and desktops, and these have been tested on run-of-the-mill LAMP boxen as well as custom and well-trafficked shell servers, and of course my trusty desktops and laptops.

Download and Install

You can download and install the grsec desktop, server and virtual server debian packages today!

Note, to avoid running out of space in /boot, change MODULES=most to MODULES=dep in /etc/initramfs-tools/initramfs.conf
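That change, plus the initramfs rebuild it requires, is a quick two-liner (Debian default paths assumed):

```shell
# bundle only the modules this machine needs, then rebuild the initramfs
sed -i 's/^MODULES=most/MODULES=dep/' /etc/initramfs-tools/initramfs.conf
update-initramfs -u
```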

Install the lied apt repository

sudo -i
wget http://www.delta9.pl/lied/SIGNING-KEY.GPG -O- | apt-key add -
wget http://www.delta9.pl/lied/lied.list -O /etc/apt/sources.list.d/lied.list
apt-get update

Grsecurity Desktop/Laptop HOWTO

apt-get install linux-image-3.2.60-grsec-lied

Grsecurity Server HOWTO

wget http://www.delta9.pl/lied/linux-image-3.2.54-grsec-nose-server_3.2.54-grsec-nose-server-1_amd64.deb
dpkg -i linux-image-3.2.54-grsec-nose-server_3.2.54-grsec-nose-server-1_amd64.deb

Grsecurity Virtual Server HOWTO

wget http://www.delta9.pl/lied/linux-image-3.2.54-grsec-nose-virtual_3.2.54-grsec-nose-virtual-1_amd64.deb
dpkg -i linux-image-3.2.54-grsec-nose-virtual_3.2.54-grsec-nose-virtual-1_amd64.deb

Furthermore, I commit to merging the patchsets and making Grsecurity packages available for Debian 8/Jessie through the same debian repo, and to keeping these packages up to date on all the platforms I care about.

Quick HOWTO: build your own
To build your own Grsec-enabled debian kernel packages, execute the following commands:

wget http://www.delta9.pl/lied/grsec-201408032014-debian-3.2.60-1.patch.gpg
# decrypt the signed patch to grsec-201408032014-debian-3.2.60-1.patch
gpg grsec-201408032014-debian-3.2.60-1.patch.gpg
apt-get source linux
cd linux-3.2.60
patch -p1 < ../grsec-201408032014-debian-3.2.60-1.patch
# fetch a tested kernel config for this flavour
wget http://www.delta9.pl/lied/grsec-config-server -O .config
make deb-pkg LOCALVERSION=-myversion

You can replace “grsec-config-server” with “grsec-config-desktop” or “-virtual” if you need one of the other configurations.