Archive for the ‘breakage’ Category

netctl is lame

Saturday, October 10th, 2015

One morning not long ago I installed a different graphics card and my machine no longer connected to the network upon startup.
The interface was not coming up, and this being Arch Linux, that meant netctl wouldn’t start, but it was being catty about why.

Having written about systemd before, I thought this might be a piece of cake to figure out, so I rolled up my sleeves.

I know that after a bunch of checks it just calls /usr/lib/network start eth, so I called that directly and found that my onboard network card had been renamed from enp2s0 to enp3s0, so I changed the reference in the profile /etc/netctl/eth to Interface=enp3s0.
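
For reference, the whole profile is tiny; roughly what an ethernet-dhcp netctl profile looks like (the Description is whatever you gave it):

Description='A basic dhcp ethernet connection'
Interface=enp3s0
Connection=ethernet
IP=dhcp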

Now /usr/lib/network start eth would get me online, but the machine would still not bring the interface up on boot! Netctl still wasn’t satisfied:

[root@wir ~]# systemctl status netctl
● netctl.service - (Re)store the netctl profile state
   Loaded: loaded (/usr/lib/systemd/system/netctl.service; enabled; vendor preset: disabled)
   Active: active (exited) since Sat 2015-10-10 12:53:55 CEST; 6min ago
  Process: 1027 ExecStart=/usr/bin/netctl restore (code=exited, status=1/FAILURE)
 Main PID: 1027 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/netctl.service

Oct 10 12:53:55 wir systemd[1]: Starting (Re)store the netctl profile state...
Oct 10 12:53:55 wir systemd[1]: Started (Re)store the netctl profile state.

Let’s try netctl directly:

# netctl start eth
# "failed, refer to journalctl -xn"

Wah.


Oct 10 12:56:37 wir systemd[1]: sys-subsystem-net-devices-enp2s0.device: Job sys-subsystem-net-devices-enp2s0.device/start timed out.
Oct 10 12:56:37 wir systemd[1]: Timed out waiting for device sys-subsystem-net-devices-enp2s0.device.
-- Subject: Unit sys-subsystem-net-devices-enp2s0.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sys-subsystem-net-devices-enp2s0.device has failed.
-- 
-- The result is timeout.
Oct 10 12:56:37 wir systemd[1]: Dependency failed for A basic dhcp ethernet connection.
-- Subject: Unit netctl@eth.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit netctl@eth.service has failed.
-- 
-- The result is dependency.
Oct 10 12:56:37 wir systemd[1]: netctl@eth.service: Job netctl@eth.service/start failed with result 'dependency'.
Oct 10 12:56:37 wir polkitd[762]: Unregistered Authentication Agent for unix-process:2441:16689 (system bus name :1.13, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (discon
Oct 10 12:56:37 wir systemd[1]: sys-subsystem-net-devices-enp2s0.device: Job sys-subsystem-net-devices-enp2s0.device/start failed with result 'timeout'.

Now I had no idea where to find sys-subsystem-net-devices-enp2s0.device, and started poking around in /lib/systemd/system,
/usr/lib/systemd/system and /etc/systemd/system. Finally I found /etc/systemd/system/netctl@eth.service, which seemed to be the failing service, and sure enough, it referred to the old device, so I changed it:

[Unit]
Description=A basic dhcp ethernet connection
BindsTo=sys-subsystem-net-devices-enp3s0.device
After=sys-subsystem-net-devices-enp3s0.device

After that, netctl would start cleanly and the interface would come up at boot again.

stair warning

I just wanted to say that this problem is lame. I was under the impression that the (already lame) predictable network card renaming from eth0 to enpXsY was supposed to be stable to avoid this problem.
Furthermore, systemd and netctl left a very long trail of breadcrumbs to follow before this was solved, instead of making the cause obvious.
Note that the netctl@eth.service file was created by netctl and not me, so I had no prior knowledge of its existence.

And yes, clearly I should have just called `netctl reenable eth`,
but that raises the question: how should I have known about this without consulting the Arch wiki?
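
For the record, that is a one-liner which, as far as I can tell, simply regenerates the unit file from the profile:

# regenerates /etc/systemd/system/netctl@eth.service from the corrected profile
netctl reenable eth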

Another thing I could have done to save myself trouble was to disable predictable network card renaming entirely, by booting the kernel with `net.ifnames=0`.
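
A minimal sketch of that, assuming grub2 is the bootloader:

# /etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0"
# then regenerate the config
grub-mkconfig -o /boot/grub/grub.cfg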

But then I would not be learning.

My Varnish pet peeves

Sunday, August 23rd, 2015

I’ve been meaning to write a blog entry about Varnish for years now. The closest I’ve come is writing about how to make Varnish cache your Debian repos and make you a WikiLeaks cache, and releasing Varnish Secure Firewall, though the latter went without a word on this blog. So? SO? Well, after years it turns out there is a thing or two to say about Varnish. Read on to find out what annoys me and the people I meet the most.

varnish on wood

Although you could definitely call me a “Varnish expert” and even a sometime contributor, and I do develop programs, I cannot call myself a Varnish developer because I’ve shamefully never participated in a Monday evening bug wash. My role in the Varnish world is more… operative. I am often tasked with helping ops people use Varnish correctly, justifying its use and cost to their bosses, defending it from expensive and inferior competitors, and sitting up long nites with load tests just before launch days. I’m the guy that explains the low risk and high reward of putting Varnish in front of your critical site, the guy that makes it actually be low risk, and the first guy on the scene when the code has just taken a huge dump on the CEO’s new pet Jaguar. I am also sometimes the guy who tells these stories to the Varnish developers, although of course they also have other sources. The consequence of this .. lifestyle choice .. is that what code I do write is either short and to the point or .. incomplete.

bug wash

I know we all love Varnish, which is why, after nearly 7 years of working with this software, I’d like to share with you my pet peeves about the project. There aren’t many problems with this lovely and lean piece of software, but those which are there are sharp edges that pretty much everyone stubs a toe or snags their head on. Some of them are specific to a certain version, while others are “features” present in nearly all versions.

And for you Varnish devs who will surely read this, I love you all. I write this critique of the software you contribute to, knowing full well that I haven’t filed bug reports on any of these issues and therefore I too am guilty in contributing to the problem and not the solution. I aim to change that starting now :-) Also, I know that some of these issues are better lived with than fixed, the medicine being more hazardous than the disease, so take this as all good cooking; with a grain of salt.

Silent error messages in init scripts

Some genius keeps inserting 1>/dev/null 2>&1 into the startup scripts on most Linux distros. This might be in line with some wacko distro policy, but it makes conf errors, and in particular VCL errors, way harder to debug for the common man. Even worse, the `service varnish reload` script calls `varnish-vcl-reload -q`, that’s q for please-silence-my-fatal-conf-mistakes, and the best way to fix this is to *edit the init script and remove the offender*. Mind your p’s and q’s eh, it makes me sad every time, but where do I file this particular bug report?

silent but deadly

debug.health still not adequately documented

People go YEARS using Varnish without discovering watch varnishadm debug.health. Not to mention that it’s anyone’s guess that this has to do with probes, and that there are no other debug.* parameters, except for the totally unrelated debug parameter. Perhaps this was decided to be dev-internal at some point, but the probe status is actually really useful in precisely this form. debug.health is still absent from the param.show list and the man pages, while in 4.0 some probe status and backend info has been put into varnishstat, which I am surely not the only one to be very thankful for indeed.
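
If you haven’t run into it, it’s just this one-liner; on 4.0 the same health information also shows up via varnishadm backend.list and in varnishstat:

# poll the probe status of every backend once a second
watch -n1 varnishadm debug.health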

Bad naming

Designing a language is tricky.

conufsed?

Explaining why purge is now ban and what is now purge is something else is mind-boggling. This issue will be fixed in 10 years when people are no longer running varnish 2.1 anywhere. Explaining all the three-letter acronyms that start with V is just a gas.
Showing someone ban("req.url = "+ req.url) for the first time is bound to make them go “oh” like a raccoon just caught sneaking through your garbage.
Grace and Saint mode… that’s biblical, man. Understanding what it does and how to demonstrate the functionality is still for Advanced Users, explaining this to noobs is downright futile, and I am still unsure whether we wouldn’t all be better off just enabling it by default and forgetting about it.
I suppose if you’re going to be awesome at architecting and writing software, it’s going to get in the way of coming up with really awesome names for things, and I’m actually happy that’s still the way they prioritize what gets done first.

Only for people who grok regex

Sometimes you’ll meet Varnish users who do code but just don’t grok regex. It’s weak, I know, but this language isn’t for them.

Uncertain current working directory

This is a problem on some rigs which have VCL code in stacked layers, or really anywhere where it’s more appropriate to call the VCL a Varnish program, as in “a program written for the Varnish runtime”, rather than simply a configuration for Varnish.

uncertainty

You’ll typically want to organize your VCL in such a way that each VCL is standalone with if-wrapped rules, and they’re all included from one main VCL file, stacking all the vcl_recv’s and vcl_fetches.

Because distros don’t agree on where varnishd’s current working directory should be (it happens to be wherever varnishd was launched from, instead of always chdir $(dirname $CURRENT_VCL_FILE)), you can’t reliably specify include statements with relative paths. This forces us to use hardcoded absolute paths in includes, which is neither pretty nor portable.

Missing default director in 4.0

When translating VCL to 4.0 there is no longer any language for director definitions, which means they are done in vcl_init(), which means your default backend is no longer the director you specified at the top, which means you’ll have to rewrite some logic lest it bite you in the ass.

Also, director.backend() returns something without a string representation to set as backend_hint, so you cannot do old-style name comparisons; i.e. backends are first-class objects but directors are another class of objects.
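
A minimal sketch of the 4.0 way, with a round-robin director built in vcl_init and picked explicitly in vcl_recv (backend names and addresses here are made up):

vcl 4.0;
import directors;

backend app1 { .host = "192.0.2.10"; .port = "8080"; }
backend app2 { .host = "192.0.2.11"; .port = "8080"; }

sub vcl_init {
    new rr = directors.round_robin();
    rr.add_backend(app1);
    rr.add_backend(app2);
}

sub vcl_recv {
    # no more top-level director definition to fall back on; pick it explicitly
    set req.backend_hint = rr.backend();
}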

the missing director

VCL doesn’t allow unused backends or probes

Adding and removing backends is a routine ordeal in Varnish.
Quite often you’ll find it useful to keep backup backends around that aren’t enabled, either as manual failover backups, because you’re testing something or just because you’re doing something funky. Unfortunately, the VCC is a strict and harsh mistress on this matter: you are forced to comment out or delete unused backends :-(

Workarounds include using the backends inside some dead code or constructs like

sub vcl_recv {
	set req.backend_hint = unused;
	set req.backend_hint = default;
	...
}

It’s impossible to determine how many bugs this error message has avoided by letting you know that the backend you just added, er yes sir that one, isn’t in use, but you can definitely count the number of Varnish users inconvenienced by having to “comment out that backend they just temporarily removed from the request flow”.

I am sure it is wise to warn about this, but couldn’t it have been just that, a warning? Well, I guess maybe not, considering distro packaging is silencing error messages in init and reload scripts..

To be fair, this is now configurable in Varnish by setting vcc_err_unref to false, but couldn’t this be the default?
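
Setting it is just a runtime parameter; a sketch of the command line form (the other options here are illustrative, and your distro may keep them in a defaults file):

varnishd -a :6081 -f /etc/varnish/default.vcl -p vcc_err_unref=false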

saintmode_threshold default considered harmful

saintmode

If many different URLs keep returning bad data or error codes, you might conceivably want the whole backend to be declared sick instead of growing some huge list of sick URLs for this backend. What if I told you your developers just deployed an application which generates 50x error codes, triggering your saintmode for an infinite number of URLs? Well, then you have just DoSed yourself because you hit this threshold. I usually enable saintmode only after giving my clients a big fat warning about this one, because quite frankly it easily comes straight out of left field every time. Either saintmode is off, or the threshold is Really Large™ or even ∞, and only in some special cases do you actually want this set to an actual number.

Then again, maybe it is just my clients and the wacky applications they put behind Varnish.

What is graceful about the saint in V4?

While we are on the subject, grace mode being the most often misunderstood feature of Varnish, the thing has changed so radically in Varnish 4 that it is no longer recognizable by users, and they often make completely reasonable but devastating mistakes trying to predict its behavior.

To be clear on what has happened: saint mode is deprecated as a core feature in V4.0, while the new architecture now allows a type of “stale-while-revalidate” logic. A saintmode vmod is slated for Varnish 4.1.

But as of 4.0, say you have a bunch of requests hitting a slow backend. They’ll all queue up while we fetch a new one, right? Well yes, and then they all error out when that request times out, or if the backend fetch errors out. That sucks. So let’s turn on grace mode, and get “stale-while-revalidate” and even “stale-if-error” logic, right? And send If-Modified-Since headers too, sweet as.

Now that’s gonna work when the request times out, but you might be surprised that it does not when the request errors out with 50x errors. Since beresp.saintmode isn’t a thing anymore in V4, those error codes are actually going to knock the old object outta cache, and each request is going to break your precious stale-if-error until the backend probe declares the backend sick and your requests become grace candidates.

Ouch, you didn’t mean for it to do that, did you?

The Saint

And if, gods forbid, your apphost returns 404s when some backend app is not resolving, bam, you are in a cascading hell of a fantasy.

What did you want it to do, behave sanely? A backend response always replaces another backend response for the same URL – not counting vary-headers. To get a poor man’s saint mode back in Varnish 4.0, you’ll have to return (abandon) those erroneous backend responses.
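
Something along these lines in the backend response handling; a rough sketch, tune the status check to whatever “erroneous” means for your app:

sub vcl_backend_response {
    # don't let a 5xx response replace a perfectly good, if stale, object
    if (beresp.status >= 500) {
        return (abandon);
    }
}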

Evil grace on unloved objects

For frequently accessed URLs grace is fantastic, and will save you loads of grief, and those objects could have large grace times. However, rarely accessed URLs suffer a big penalty under grace, especially when they are dynamic and meant to be updated from the backend. If that URL is meant to be refreshed from the backend every hour, and Varnish sees many hours between each access, it’s going to serve up that many-hour-old stale object while it revalidates its cache.
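
So be a little careful with blanket rules like this one; a sketch, where 24h is just an example value:

sub vcl_backend_response {
    # great for hot objects, but a rarely-hit URL can be served up to this
    # stale on the first request after a long quiet spell
    set beresp.grace = 24h;
}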

stale while revalidate
This diagram might help you understand what happens in the “200 OK” and “50x error” cases of graceful request flow through Varnish 4.0.

Language breaks on major versions

This is a funny one because the first major language break I remember was the one that I caused myself. We were making security.vcl and I was translating rules from mod_security and having trouble with it, because Varnish used POSIX regexes at the time, and I was writing this really god-awful script to translate PCRE into POSIX when Kristian, who conceived of security.vcl, went to Tollef (both were working in the same department at the time) and asked in his classic brook-no-argument kind of way “why don’t we just support Perl regexes?”.
Needless to say, (?i) spent a full 12 months afterwards cursing myself while rewriting tons of nasty client VCL code from POSIX to PCRE and fixing occasional site-devastating bugs related to case-sensitivity.

Of course, Varnish is all the better for the change, and would get nowhere fast if the devs were to hang on to legacy, but there is a lesson in here somewhere.

furby

So what's a couple of sed 's/req.method/req.request/'s every now and again?
This is actually the main reason I created the VCL.BNF. For one, it got the devs thinking about the grammar itself as an actual thing (which may or may not have resulted in the cleanups that make VCL a very regular and clean language today), but my intent was to write a parser that could parse any version of VCL and spit out any other version of VCL, optionally pruning and pretty-printing of course. That is still really high on my todo list. Funny how my clients will book all my time to convert their code for days but will not spend a dime on me writing code that would basically make the conversion free and painless for everyone forever.

Indeed, most of these issues are really hard to predict consequences of implementation decisions, and I am unsure whether it would be possible to predict these consequences without actually getting snagged by the issues in the first place. So again: varnish devs, I love you, what are your pet peeves? Varnish users, what are your pet peeves?

Errata: vcc_err_unref has existed since Varnish 3.

Curiosity killed the printer

Saturday, March 28th, 2015

A friend of mine wrote some thoughts about printers that are highly pertinent. I’ll let his words speak for themselves.

Curiosity killed the printer

Understanding today, I mean fully understanding, demands an expert in
a huge number of fields. Something as simple as a refrigerator
requires, for most people, an engineering degree. Looking at something
slightly more complicated, for example a microwave, you would probably
want a physics major in your back pocket as well. As Isaac Newton
rightfully put it, we are indeed standing on the shoulders of giants.
This puts us in the joyful position of not needing to interpret some
deep meta-physical meaning of why, but rather, through empirical
studies, we can learn to understand actions and reactions. We can
suddenly augment the works of others, improvise with our newly lent
wisdom to create works of art previously unimaginable. This newfound
way of science led us to the creation of printers.

In any enterprise there will most surely be a printer available, these
days most likely a multi-functional beast connected to your Wifi and
implementing a fax-over-ip-through-skype-attachment protocol.
The question I pose is simple: Why won’t it work? All printers are
at heart a simple construction, churning through paper and creating
tiny dots either by burning, or by leaving their colourful ink all
over the place.

These wonderful devices that help us rapidly deforest the planet,
and in return give us the latest news about Miley Cyrus and Norwegian
youths’ terrible scores in mathematics, are sorely misunderstood; the
unwanted child of any IT department. Often not accounted for, left
alone in their own network with no humans ever giving them any love.
It’s gotten so bad that the printers themselves jam paper every now
and then just to feel the tender touch of a human a few times a year.
They are truly misunderstood. As the little black box it is, a printer
does nothing but slave away; but for whom?

The need for access, the need to be able to print from anywhere to
any printer, might come as a shock. But for some reason you will find
hundreds of thousands of printers online. I don’t know why they’re
there. Perhaps the printers have escaped the prison of corporate
networks to go live in the wild. But their diligence comes at a price:
when the evil scientists of the world learn the way of the printers,
who knows what will happen? The prophecy once claimed that one day the
printers would rise, and indeed they did; 420 000 of them. How did we
let a technology of something so simple become a weapon of these evil
scientists? These printers that were just looking for a quiet,
secluded life in the country, with the possibility of being useful to
people to whom it is easy to do good. From the looks of it, the
printers might have been fed too much Tolstoy.

It’s easy for all of us to forget the true nature of things. Printers
in themselves are not dangerous, they just want to live a life in
peace. On the other hand people, especially those who do things they
don’t understand (the same people who demanded printers in the first
place), must be taught. It’s easy to brush something off as unneeded,
to cut some corners to make some extra short-term income. We are
constantly optimizing everything today; it’s so sad to see people,
companies, and technologies thrown out for the “newer, better version
of you”. Anywhere you turn your head you can learn, not just the
actions and their reactions, but something about how physics,
software, society, or any other subject works at its core. You can
probably learn a lot about those plain old people that you see on your
daily commute if you just asked them. They will have much more
interesting stories than your daily news has to tell. We must always
strive to ask questions, be bold in our critique and speak our minds.
It’s in our nature to experience, to break, to assemble and to learn.
Most of the things we surround ourselves with every day do not give us
a deeper understanding of anything. In fact the over-generalization we
humans apply to all things reduces the amount of knowledge needed to a
minimum, letting us do the important things in life, like sharing
pictures of our newest composition of milk and cereal.

As found on
http://lafa.hackeriet.no/2013-12-09-curiosity-killed-the-printer.md

NSA-proof SSH

Tuesday, January 6th, 2015

ssh-picture

One of the biggest takeaways from 31C3 and the most recent Snowden-leaked NSA documents is that a lot of SSH stuff is .. broken.

I’m not surprised, but then again I never am when it comes to this paranoia stuff. However, I do run a ton of SSH in production and know a lot of people that do. Are we all fucked? Well, almost, but not really.

Unfortunately most of what Stribika writes about the “Secure Secure Shell” doesn’t work for old production versions of SSH. The cliff notes for us real-world people, who will realistically be running SSH 5.9p1 for years, are hidden in the bettercrypto.org repo.

Edit your /etc/ssh/sshd_config:


Ciphers aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160
KexAlgorithms diffie-hellman-group-exchange-sha256

sshh
Basically, the nice and forward-secure aes-*-gcm and chacha20-poly1305 ciphers, the curve25519-sha256 Kex algorithm and the Encrypt-then-MAC message authentication modes are not available to those of us stuck in the early 2000s. That’s right, provably NSA-proof stuff, not supported. Upgrading at this point makes sense.

Still, we can harden SSH, so go into /etc/ssh/moduli and delete all the moduli that have 5th column < 2048, and disable the DSA, ECDSA and protocol 1 host keys:

cd /etc/ssh
mkdir -p broken
mv moduli ssh_host_dsa_key* ssh_host_ecdsa_key* ssh_host_key* broken
awk '{ if ($5 > 2048){ print } }' broken/moduli > moduli
# create broken links to force SSH not to regenerate broken keys
ln -s ssh_host_ecdsa_key ssh_host_ecdsa_key
ln -s ssh_host_dsa_key ssh_host_dsa_key
ln -s ssh_host_key ssh_host_key
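
Then restart the daemon so the trimmed moduli and remaining host keys take effect; a sketch, the service name varies by distro:

systemctl restart sshd    # or: service ssh restart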

Your clients, which hopefully have more recent versions of SSH, could have the following settings in /etc/ssh/ssh_config or .ssh/config:

Host all-old-servers

    Ciphers aes256-gcm@openssh.com,aes128-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-ripemd160
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

Note: Sadly, the -ctr ciphers do not provide forward security and hmac-ripemd160 isn’t the strongest MAC. But if you disable these, there are plenty of places you won’t be able to connect to. Upgrade your servers to get rid of these poor auth methods!

Handily, I have made a little script to do all this and more, which you can find in my Gone distribution.

There, done.

sshh obama

Updated Jan 6th to highlight the problems of not upgrading SSH.
Updated Jan 22nd to note CTR mode isn’t any worse.
Go learn about COMSEC if you didn’t get trolled by the title.

danger at the docks

Friday, November 14th, 2014

docker.io
Docker.io is taking the world by storm, but a day at the docks is not without its perils. Here I hope to inspire you to try out docker by showing you how to avoid its pitfalls.

In the days of yore

As the FreeBSD jailers and Solaris zoners will attest to, containerizing your services is a great boon, saving space and resources and providing easy management akin to chroots and potential security benefits, without the overheads of full-blown virtual machines.

FreeBSD Jail

Linux has had containers for the longest time, in the ancient form of User Mode Linux, which actually ran a kernel in userland, and more recently OpenVZ, which was more like jails.

The former didn’t lend itself to production deployments and the latter never made it into the linux mainline, coming at a time when people were more interested in virtualization than containment. In recent years, a kernel facility named cgroups has made LinuX Containers (LXC) possible, which has afforded the management, if not the security, of BSD jails.

what can be gained

The biggest potential benefit from containers is that CPU, memory and disk resources are 100% shared at native speeds, so no libraries and no data need ever be duplicated on disk nor in memory.

In FreeBSD jails, this was achieved by providing most of the system read-only like /usr, /lib and /bin, and sharing it amongst jails. This worked quite well, but was surprisingly tricky to update.

LXC
You can do similar stuff with LXC, just as long as you understand that if it breaks, you get to keep all the pieces. This gives you full control, and means that I for one have LXC instances in production with uptimes of 1200 days and counting.

minimalizing

Taking the single-container-single-responsibility approach further, you could, instead of deploying whole system containers, create image filesystems that contain only the bare necessities. For instance, your python application would have, apart from its code, just the python runtime, libc and other dependent libraries, and naught much else.

Inspired by the “leaner is better” philosophy backed by the experience of running LXC in production, we built this minimal deployment framework complete with a tool to magically find all the required libraries.
leaner is better
Awesomely small images come from this approach, where the “contact surface” of the application has shrunk to nothing but the app itself. It was far from perfect, serving to make the images awesomely less debuggable and manageable, and it never made it into production proper.

layer upon layer is two steps further

In comes Docker, and its concept of filesystem image layers based on AUFS. The approach isn’t novel in itself, having been used by live-CD distributions for the longest time, but Docker is the first to provide tools to manage the layers effortlessly for containers. So you can now have 100 servers with 100 application layers, and all your Ruby applications share one runtime layer and your Python applications share another, and they all run on the same base image of Ubuntu, and they do all that transparently, without you having to consciously think about which bit goes where.
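
A rough illustration of the idea (image and package names here are just placeholders):

# Dockerfile: every app built like this on the host shares the ubuntu base
# layer and the python runtime layer; only the COPY layer differs per app
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python2.7
COPY app.py /srv/app.py
CMD ["python2.7", "/srv/app.py"]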

layers

Docker takes another step further, borrowing heavily from distributed social source control a la github, allowing you to clone, build, push, pull, commit, share and remix images as easy as that.

This is the type of thing that blows disk-image-based virtualization straight out of the water.

Perils and rough starts

The Docker docs are well written and will get you spawning containers and dockerizing applications in no time at all. What they will not tell you is how to run containers in production for all values of production.

In particular, the following things require special attention:

  • changing ips
  • service discovery
  • dns issues
  • cache clash

.. and that is precisely what we will talk about next time.
see you later

systemd crash course, with remote logging

Saturday, September 20th, 2014

live a better life

The world is taking systemd by storm and there is no looking back now.

Still, there are some elements that you would expect to be there that are missing. One of them is remote logging!

Another thing missing is a decent crash course [*]. This is frustrating because making init scripts and checking logs is the staple diet of any old sysadmin.

Read on to readjust gently but quickly.
she wants it

Systemd crash course

Find “unit” – that’s the new name for “init script name” to us oldtimers:

systemctl list-units --type=service
# this one is way more verbose
systemctl list-units

Start, stop, restart, reload, status:

systemctl start sshd
systemctl stop sshd
systemctl restart sshd
systemctl reload sshd
# status, gives some log output too
systemctl status sshd

Check ALL the logs, follow the logs, get a log for a service:

journalctl -l
journalctl -f
journalctl -u sshd

Install a systemd service:
(This is what a systemd service description looks like)

    cat > ossec.service << EOF
[Unit]
Description=OSSEC Host-based Intrusion Detection System

[Service]
Type=forking
ExecStart=/var/ossec/bin/ossec-control start
ExecStop=/var/ossec/bin/ossec-control stop

[Install]
WantedBy=basic.target
EOF

# now copy that file into the magic place, /etc/init.d in the old days
install -Dm0644 ossec.service /usr/lib/systemd/system/ossec.service

# now make systemd pick up the changes
systemctl daemon-reload

Enable or disable a service:

systemctl enable ossec
systemctl disable ossec

systemd components

Remote logging

OK so you now know your way around this beast.
Now you want remote logging.

According to the Arch wiki [#], systemd doesn’t actually do remote logging (yet. what else doesn’t it do?) but it will helpfully spew its logs onto the socket /run/systemd/journal/syslog if you knock twice, gently.

To convince systemd to write to this socket, go to /etc/systemd/journald.conf and set

ForwardToSyslog=yes

then issue a journald restart

systemctl restart systemd-journald

You can install syslog-ng and it should pick up the logs. Test it now by making a log entry with

logger -t WARN zool

and check /var/log/syslog.log

If you have a distro running systemd, then hopefully syslog-ng will be recent enough to be aware enough of systemd that things should just work at this point.

If it doesn’t, syslog-ng.conf’s source src { system(); }; isn’t picking up the socket file. Fix this by adding the socket explicitly, changing the source in /etc/syslog-ng/syslog-ng.conf like so:

source src {
  unix-dgram("/run/systemd/journal/syslog");
  internal();
};

if you are working with a laptop or desktop then the console_all on tty12 is handy too:

log { source(src); destination(console_all); };
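
And since the whole point was remote logging: once syslog-ng owns the messages, shipping them off-host is just one more destination. A minimal sketch, assuming a (hypothetical) log host at logs.example.com listening on TCP port 514:

destination remote { tcp("logs.example.com" port(514)); };
log { source(src); destination(remote); };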

the systemd monster

[*] IMHO Fedora’s cheatsheet on systemd is a little too cluttered
[#] Arch has a decent intro to systemd

No sockpuppets were harmed in the making of this blog entry. Any and all images are © whomever made them, and I love you for not sueing me.

3g wwan pain revisited with gobi

Monday, June 30th, 2014

Hi all,
after a long hiatus I’ve finally found something annoying enough to share with you: namely, my 3g modem.
I have spoken at lengths about 3g on linux before.

I have a Thinkpad X201 laptop and it has a Qualcomm Gobi 2000 3g modem. This modem does some fancy mode switching, but not in the regular way of receiving some control bytes. Therefore, usb-modeswitch can’t help you.

Instead, the modem needs firmware loaded to switch from usb id 05c6:9204 to usb id 05c6:9205.
On linux, the firmware loading is achieved with gobi-loader.

All this is nicely documented at thinkwiki; unfortunately that doesn’t make it one bit easier for the regular joe.

The trouble is, the firmware is not redistributable, so the whole thing is quite tricky!

  1. download 7xwc48ww.exe from the Thinkpad support site,
  2. unpack the drivers with wine or cabextract. I used wine:
    cp 7xwc48ww.exe ~/.wine/drive_c
    wine 7xwc48ww.exe

    Make sure you run the driver installation after extraction, otherwise execute setup again: wine ~/.wine/drive_c/DRIVERS/WWANQL/setup.exe

  3. copy the firmware:
    cd ~/.wine/drive_c/Program Files/QUALCOMM/Images/Lenovo
    sudo mkdir /lib/firmware/gobi
    sudo cp 6/UQCN.mbn UMTS/* /lib/firmware/gobi/
    

    This was the tricky part, unpacking and selecting the correct firmware out of the 12 different sets of files provided in that directory.

  4. reload the driver: modprobe -r qcserial; modprobe qcserial
  5. dmesg should now show you have three USB serial devices /dev/ttyUSB0 (control interface), /dev/ttyUSB1 (the actual modem), and /dev/ttyUSB2 (the GPS, which you need windows to enable once).
    usb 2-1.4: Product: Qualcomm Gobi 2000
    usb 2-1.4: Manufacturer: Qualcomm Incorporated
    qcserial 2-1.4:1.1: Qualcomm USB modem converter detected
    usb 2-1.4: Qualcomm USB modem converter now attached to ttyUSB0
    qcserial 2-1.4:1.2: Qualcomm USB modem converter detected
    usb 2-1.4: Qualcomm USB modem converter now attached to ttyUSB1
    qcserial 2-1.4:1.3: Qualcomm USB modem converter detected
    usb 2-1.4: Qualcomm USB modem converter now attached to ttyUSB2
    
  6. If you have gotten this far, your 3g modem is basically working and you can set up wvdial as in my previous post, pointing it at the /dev/ttyUSB1 modem (a rough sketch follows below).

    Note however you still need to enable the modem with echo enable > /proc/acpi/ibm/wan
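
For completeness, a minimal wvdial.conf sketch; the APN and dial string here are provider-specific placeholders:

[Dialer Defaults]
Modem = /dev/ttyUSB1
Baud = 460800
Init1 = ATZ
Init2 = AT+CGDCONT=1,"IP","internet"
Stupid Mode = 1
Phone = *99#
Username = dummy
Password = dummy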

software defined radios with alsa, jack and pulseaudio and a professional sound card

Monday, September 9th, 2013

Preramble

waves

All the people out there are neatly divided in two piles:
the “it works for me and does what I need”-camp, and
the “always always always gets in the way so killitwithfire”-camp,
and this fragmentation may be the best argument that pulseaudio should be up for a whammy.

For all of you tl;dr’s (too lazy, do you read?) here’s a short summary:

  • alsa: just works. confusion in the .asoundrc
  • pulseaudio: controls per process, less buffer fuckups, “just works”
  • jack: controls per process, realtime, firewire/usb, pro audio apps
  • firewire: fantastic, massive pain but getting there
  • software defined radios: so worth it!

But read on to learn the recipe for the secret magic sauce.

The reason I am writing this is not because pulseaudio is evil and sucks. However, it was the last straw on a long and winding road that broke the camel’s back. Pulseaudio assumes you are running systemd, and talks to console-kit-daemon, which is surely one of Satan’s most trusted advisers and a harbinger of the Apocalypse.

Pulseaudio

We know all this, and yet why do I bother?
I didn’t come here to rant about Pulseaudio though:
I’ve gathered you here today to tell a story about Software Defined Radios.

Introducing a large and characteristic cast of characters, and howto make them work together in the best possible way.

My way.

The Cook

Well: a friend of mine got a hold of a few Terratec DVB-dongles with the awesome rtl-chipset and Elonics tuner, which means I can play with radio!

terratec dongle

Except the first time I tried I got stuck in gnuradio dependency hell and never got anything working… which was very nearly a year ago.

Things weren’t easy back then; gqrx, the pretty waterfall app, wasn’t mature enough and you were stuck using something far more fugly (.net code running under mono, shudder the thought).

You still have to build gnuradio from source (because the packaged versions aren’t new and shiny enough), but the piper’s playing a different tune now: with the advent of build-gnuradio it’s possible to sit back and relax while gnuradio and all its dependencies build before your very eyes.

Yes indeed this takes longer than getting the cows back from pasture but it’s worth it, because with a full gnuradio build you can now have a hope of getting gqrx the shiny waterfall to compile!

gnuradio companion

The Thief

Except you didn’t realize that without the -m option to build-gnuradio it builds gnuradio 3.6, which is not the 3.7 that gqrx needs! Joke’s on you haha ha ha.

Then you build gqrx and realize you can’t get it to talk to your Terratec, because why? Because it’s a DVB dongle and the kernel has helpfully inserted the DVB module to enable it! So run along now and add

# rtlsdr
blacklist dvb_usb_rtl28xxu

to your /etc/modprobe.d/blacklist.conf – now you are ready to fire up gqrx and gnuradio-companion.
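
Blacklisting only takes effect on the next boot, so you probably also want to kick the module out right away (assuming nothing is still holding it):

modprobe -r dvb_usb_rtl28xxu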

His Wife

That’s when you might discover that if you are lucky there is no sound in gqrx. It’s working and showing you a waterfall but you can’t hear the demodulated waves!

GQRX gnuradio waterfall

Why oh why, well let me tell you why: It absolutely needs, requires Pulseaudio to produce sound!

OK, fair enough, some of you out there are in the “works-for-me”-camp and ipso facto you’re done here, gqrx works and IT ALL JUST WORKS but the world is not so easy for the rest of us.

The rest of us bite the bullet and install pulseaudio to get this thing working. Which is as far as you need to go if you’re semi-sane and normal or even when you are running this thing on a Raspberry PI or you’re building a beagleboard spectrum analyzer.

Actually you don’t even need Pulseaudio for that last project..

Her Lover

echo2

What I have neglected to tell you however is that I have an Echo Audiofire. I was impressed with these little firewire-driven sound cards back when my bro had the small and portable Audiofire2.

Sound quality and flexibility wise they are unbeatable, certainly if you want professional quality sound input and output.

Firewire sound also has the major advantage over USB sound in that firewire packets aren’t quantized in time, which means a lot if you’re doing midi or other realtime music stuff. Latency is a killer.

You might also be aware that the higher the sample rate of your sound card, the higher the bandwidth of your homebrew SDR radio..

Anyways, firewire soundcards “just work” with asio drivers in Windoze but are a small pain of their own to set up in Linux. ALSA has never heard of them. Pulseaudio doesn’t speak firewire. For anything resembling realtime professional audio under Linux you’ll have to go FFADO and JACK.

JACK audio kit

Also, never think that just any firewire card will work in Linux: a lot of vendors continue to ignore the platform (understandably, because of the atrocious state of professional audio under linux) and there are some wondrous cards out there that have just pitiful support here.

The jackd brothers

You’re walking down a long path, you’re going to Mecca. You come upon a fork in the road where two brothers live. They are twins, and you know that one of them always speaks the truth, and the other always lies. You need to ask them the way to Mecca, but how?

As there are two problems with anything in this world, there are two problems with Jack. Firstly, jack forked into jack1 and jack2, and both versions are strangely alive to this day, and there is netjack1 and netjack2 and well, what the fuck.

FFADO

To complicate matters there are two competing linux driver subsystems for firewire and both of them live to this day, with one supporting some firewire devices and one supporting other firewire devices, and one being supported in jack1 and the other in jack2. Furthermore you need a recent FFADO to get it all working.

Thankfully in recent debians and ubuntus the right kind of jackd talks to the right kind of firewire device in the kernel and matches the right ffado to get things to work, but you still need to know your way around.

LMMS

The Answer, not The Question

Know what question to ask to get the right answer, which is that at least for the Echo Audiofire, jackd2 works nicely with ffado and recent-ish kernels as long as you run jackd as your X user with jackd -v -dfirewire, and then fire off qjackctl and ffado-mixer and then all your sweet sweet jack apps. For now, let’s assume you are jackd2’ing things, but let us just say that at this point it no longer matters.
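
In command form, roughly (run as your desktop user, not as root):

jackd -v -dfirewire &
qjackctl &
ffado-mixer &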

What you don’t know is that to get the Echo to work, you will likely have to upgrade your Echo firmware, either by hooking it up to a windoze with the right drivers and letting them reflash the rom, or play with the scary commands ffado-diag and ffado-fireworks-downloader, insert magic (literally!), etc.

Having done all this voodoo I still had problems that required rebooting the sound card or booting into windoze to reset it to a known state that jackd could talk to, but with newer kernels/libffado/jackd versions the problem evaporated somewhere along the line.

Jack meters

Realtime patchset to the Linux kernel? Lets not get into it… I am not a professional musician nor am I a sound engineer, and they would probably use windows or mac.

The Waitress

Confusion.

Synthesizer clutter

At that point you might be wondering where I’m going with things. Lets recap:
I’ve got a gqrx waterfall on Terratec DVB RTL-SDR that only supports Pulseaudio, and I’ve got an Echo Audiofire soundcard on firewire that only listens to jack. I can hook pulseaudio to Alsa.

Indeed, installing pulseaudio I discovered it will do this automatically, /usr/share/alsa/alsa.conf.d/pulse.conf suddenly appears and fucks your setup by putting everything in ALSA through Pulseaudio.

There is also some shit in /etc/pulseaudio/default.pa that is supposed to detect jackdbus and make pulseaudio use jack, but that stuff just never worked.

Of course, I have an .asoundrc file that takes everything from ALSA and puts it up JACK, so how do you think that’s gonna work?

Well, it doesn’t work.
So, it’s time to bring out the guns again.

The Heist

# convert alsa API over jack API
# use it with
# % aplay foo.wav

# use this as default
pcm.!default {
    type plug
    slave { 
       pcm "jack" 
       #rate 96000
       }
}

ctl.mixer0 {
    type hw
    card 1
}

# pcm type jack
pcm.jack {
    type jack
    playback_ports {
        0 system:playback_1
        1 system:playback_2
    }
    capture_ports {
        0 system:capture_1
        1 system:capture_2
    }
}

(that was .asoundrc)

load-module module-jack-sink
load-module module-jack-source 

in your /etc/pulseaudio/default.pa
but put it somewhere near the top, instead of load-module module-alsa-sink, before the ifexists module-jackdbus shit.

and rm /usr/share/alsa/alsa.conf.d/pulse.conf

Now remember that jack is running as you, so make sure that Pulseaudio is running as you as well:

sudo service pulseaudio stop
pulseaudio -v

The Payoff

Pulses playing through jack audio

At this point you can run your freshly compiled gqrx waterfall radio outputting to pulseaudio outputting to jackd and at the same time enjoy ALSA apps talking to jack directly and jack apps doing jack.

live online root migration

Monday, August 19th, 2013

Hey all, yep before you ask, yes OHM was fantastic.
Others have eloquently described that experience, so I’m here to tell you a different story.
fuck cairo

I got an SSD this summer. They finally reached my cost-benefit pricepoint, and I’m not going to waste breath talking about how phun SSDs are. However, I will tell you about the little things I did that others would probably not do, most notably how I migrated a live running debian linux system from one disk to another.

I already RAID-1 my home partition, which has completely different data storage requirements, and besides it wouldn’t fit on 10 SSDs.

The SSD was to be my root drive, the OS disk. The OS is not so important in my setup, because the argument goes that if something dies, I can always reinstall it, I can always boot from USB, I can always configure it, and the heavier stuff is in Gone already.

I decided to put NILFS2 on it, which I’ve successfully used before to save my girlfriend from inadvertently discovering how ext2undelete sucks, the argument being that log-structured filesystems are better suited to solid state drives. Then I researched the heck out of it and found out about F2FS, which seemed like a better fit – if nothing else then because it’s faster, I don’t need log-based snapshots on the root disk, and it’s not btrfs.

brain fuck schedule

I’m allowed the luxury to try out new things – I compile my very own kernels, complete with Con Kolivas patches as always, and I know how to keep the pieces if something breaks.

Let’s try out this new magic.

First, the simple stuff; cfdisk/gparted to create one small /boot partition, one large root partition (all primaries of course, because extended partitions are evil) and leave some space for “something unexpected”.. then mkfs.ext2 -L boot /dev/sda1; mkfs.f2fs -l root /dev/sda2.

Why the boot partition? Just in case, mainly because experience has taught me this makes booting a whole lot easier in case of troubles that may or may not be ancient and deprecated in 2013. Let’s get the files there: mount /dev/sda1 /mnt/target; rsync -a /boot /mnt/target/; umount /mnt/target.

What I am doing next is not difficult. It’s just not something anyone would consider safe. I want to sync the root filesystem over to the SSD. But I can’t be bothered rebooting into a USB stick, I want to continue using my system while I sync the 40 odd gigs (yes, fourty, ask me why), and I want to do it all in one go, creating the “cleanest” new filesystem with no fragmentation and with directories ordered spatially together on disk.

clever trap, sir

Therefore, first I disable any and all services that might write to root or var:

for daemon in prads gpsd munin-node hddtemp mpd atop powernowd atd cron ddclient dictd libvirt-bin ntpd ssh timidity smartd uml-utilities postfix rsyslog arpwatch mcelog
do
  service $daemon stop
done

Afterwards, I give it a minute while watching pidstat -d 10 to see if anything else might write to disk. I see nothing and proceed:

mount -o bind / /mnt/src
mount /dev/sda2 /mnt/target
rsync -a /mnt/src/ /mnt/target/

Why did I bindmount? Because I don’t want rsync to cross filesystem boundaries, which it sometimes does regardless of whether I tell it not to.

There are only two more things to do: update fstab and the boot loader. I’ve already disabled swap beforehand, but the rest of this I forget to do. Instead, I am excited to try out the new SSD-enabled system and reboot a little too early.

nuked

Thankfully I’m running grub – grub 1 at that, so much more user-friendly than grub2 – and I tell it to boot into (hd0,0). This fails because initrd cannot mount /dev/sda6. Well duh, that was the old disk at the old location. I mount /dev/sda2 and rewrite fstab. I reboot. Again it fails: the root= argument to the kernel doesn’t match fstab. I reboot again, edit the grub boot again, and it fails on the third try, because f2fs doesn’t have a fsck and I’ve told fstab to run a pass on it. This is a modern filesystem, it fsck’s on mount. I tell it not to pass with a magic zero in the fstab and hoppla, fourth time is the charm, my machine boots with no further ado, and somewhat faster.
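
For the curious, the fstab ended up looking roughly like this (a sketch, the mount options are illustrative; note the magic zero in the last column for the f2fs root):

# /etc/fstab
/dev/sda2  /      f2fs  defaults,noatime  0 0
/dev/sda1  /boot  ext2  defaults          0 2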

And before you ask, I am not fond of disk UUIDs either, they are terrible from a usability standpoint. Ever tried to type a UUID by hand, let alone remember one?

Furthermore,

Kernel command line: root=/dev/sda2 ro crashkernel=128M zcache vga=791
zcache: using lzo compressor
zcache: cleancache enabled using kernel transcendent memory and compression buddies
zcache: frontswap enabled using kernel transcendent memory and compression buddies
zcache: created ephemeral local tmem pool, id=0

hell yeah, zcache is finally in mainline, which means the ancient war between Time and Space has another battle and I get compressed memory pages which (hopefully) are both faster and more plentiful in physical RAM, now that I’ve foregone a swap partition.

I promptly recompile my kernel, timing the difference now that I use an SSD and zcache aand… it’s not much faster. Oh well, I guess it’s time to upgrade the system :-P

dark nite

bup quick reference

Thursday, April 25th, 2013

Git is nice and flexible. I wish my backups were that flexible. Thankfully, my wishes have been answered, as bup was created.
I used to look up the 28c3 bup slides for a quick reference, until I realized I was always looking for just one page of the slides. Best docs are short docs.

# Install
sudo apt-get install python2.6-dev python-fuse python-pyxattr python-pylibacl
git clone https://github.com/bup/bup.git
cd bup && make && make test && sudo make install
# index zz's home directory
bup index -ux /home/zz
# backup to default BUP_DIR and label the backup 'laptop'
bup save -n laptop /home/zz
# backup to remote myserver, naming the backup 'laptop'
bup save -r myserver -n laptop /home/zz
# index /home/zz on myserver
bup on myserver index -ux /home/zz
# backup myserver:/home/zz, naming the backup 'server'
bup on myserver save -n server /home/zz
# check the latest laptop backup
bup ls laptop/latest/home/zz
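
The slides stop there, but restoring is the part you’ll want in a pinch; a quick sketch:

# restore the latest laptop backup of zz's home into ./restored
bup restore -C ./restored laptop/latest/home/zz
# or mount the whole repository as a browsable filesystem (needs python-fuse)
mkdir -p /mnt/bup && bup fuse /mnt/bup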

It’s hard to migrate from tivoli, rsnapshot, tarsnap and friends when you don’t know how. So here we go, without further ado, all you needed to know about bup but never dared to ask, i.e.

Some reasons to use bup:

  • global deduplication
    • rsnapshot: 4.97G = 2.18G with bup
    • rsnapshot: 12.6G = 4.6G with bup
  • save transmission time
  • backups are oneliners
  • anytime snapshots
  • uid,gid,permissions,acl,selinux
  • par2 anti-bitrot and corruption resistance
  • runs on dd-wrt

This is awesome, but there are two caveats. One is that I am unaware of Enterprise™ shops using bup yet; the other is a common question: no, bup doesn’t encrypt data.

You can either encrypt or deduplicate. Choose. If you want the other, you probably want duplicity or tarsnap.