Posts Tagged ‘rant’

My Varnish pet peeves

Sunday, August 23rd, 2015

I’ve been meaning to write a blog entry about Varnish for years now. The closest I’ve come is writing about how to make Varnish cache your debian repos and make you a WikiLeaks cache, and I’ve released Varnish Secure Firewall, but without a word about it on this blog. So? SO? Well, after all these years it turns out there is a thing or two to say about Varnish. Read on to find out what annoys me and the people I meet the most.

varnish on wood

Although you could definitely call me a “Varnish expert” and even a sometimes contributor, and I do develop programs, I cannot call myself a Varnish developer because I’ve shamefully never participated in a Monday evening bug wash. My role in the Varnish world is more… operative. I am often tasked with helping ops people use Varnish correctly, justifying its use and cost to their bosses, defending it from expensive and inferior competitors, and sitting up long nights with load tests just before launch days. I’m the guy who explains the low risk and high reward of putting Varnish in front of your critical site, the guy who makes it actually be low risk, and the first guy on the scene when the code has just taken a huge dump on the CEO’s new pet Jaguar. I am also sometimes the guy who tells these stories to the Varnish developers, although of course they have other sources too. The consequence of this .. lifestyle choice .. is that what code I do write is either short and to the point or .. incomplete.

bug wash

I know we all love Varnish, which is why, after nearly 7 years of working with this software, I’d like to share with you my pet peeves about the project. There aren’t many problems with this lovely and lean piece of software, but those that are there are sharp edges that pretty much everyone stubs a toe or snags their head on. Some of them are specific to a certain version, while others are “features” present in nearly all versions.

And for you Varnish devs who will surely read this: I love you all. I write this critique of the software you contribute to knowing full well that I haven’t filed bug reports on any of these issues, and that I am therefore also guilty of contributing to the problem rather than the solution. I aim to change that, starting now :-) Also, I know that some of these issues are better lived with than fixed, the medicine being more hazardous than the disease, so take this like all good cooking: with a grain of salt.

Silent error messages in init scripts

Some genius keeps inserting 1>/dev/null 2>&1 into the startup scripts on most Linux distros. This might be in line with some wacko distro policy, but it makes conf errors, and in particular VCL errors, way harder to debug for the common man. Even worse, `service varnish reload` calls `varnish-vcl-reload -q` (that’s q for please-silence-my-fatal-conf-mistakes), and the best way to fix this is to *edit the init script and remove the offender*. Mind your p’s and q’s, eh. It makes me sad every time, but where do I file this particular bug report?
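
If you want to see whether your distro is doing this to you, something like the following will show it (paths and script names are illustrative; they differ between distros):

# look for silencing redirects and -q flags in the varnish init/reload scripts
grep -rn 'dev/null\|reload.*-q' /etc/init.d/varnish /etc/default/varnish* /usr/share/varnish/ 2>/dev/null
# the fix: delete the "1>/dev/null 2>&1" and the "-q" from whatever turns up,
# so conf and VCL errors actually reach your terminal and your logs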

silent but deadly

debug.health still not adequately documented

People go YEARS using Varnish without discovering watch varnishadm debug.health. Not to mention that it’s anyone’s guess that this has anything to do with probes, and that there are no other debug.* commands, except for the totally unrelated debug parameter. Perhaps this was decided to be dev-internal at some point, but the probe status is actually really useful in precisely this form. debug.health is still absent from the param.show list and the man pages, while in 4.0 some probe status and backend info has been put into varnishstat, which I am surely not the only one to be very thankful for.
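
If you have never seen it, it goes something like this (the exact output format, and what 4.0 exposes instead, varies by version):

# poll backend probe status once a second
watch -n1 varnishadm debug.health
# from 4.0 onwards, some of the same backend and probe info also shows up in
varnishadm backend.list
varnishstat -1 | grep VBE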

Bad naming

Designing a language is tricky.

conufsed?

Explaining why purge is now ban, and why what is now purge is something else entirely, is mind-boggling. This issue will be fixed in 10 years, when people are no longer running varnish 2.1 anywhere. Explaining all the three-letter acronyms that start with V is just a gas.
Showing someone ban("req.url == " + req.url) for the first time is bound to make them go “oh” like a raccoon just caught sneaking through your garbage; there is a fuller sketch of the idiom below.
Grace and Saint mode… that’s biblical, man. Understanding what it does and how to demonstrate the functionality is still for Advanced Users, explaining this to noobs is downright futile, and I am still unsure whether we wouldn’t all be better off just enabling it by default and forgetting about it.
I suppose if you’re going to be awesome at architecting and writing software, it’s going to get in the way of coming up with really awesome names for things, and I’m actually happy that’s still the way they prioritize what gets done first.
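
For the record, here is roughly how the two look side by side these days; a minimal Varnish 4-flavoured sketch, with a made-up ACL and request methods:

acl purgers {
	"127.0.0.1";
}

sub vcl_recv {
	if (req.method == "BAN" && client.ip ~ purgers) {
		# a ban is a stringified filter, lazily applied to objects already in cache
		ban("req.http.host == " + req.http.host + " && req.url == " + req.url);
		return (synth(200, "Banned"));
	}
	if (req.method == "PURGE" && client.ip ~ purgers) {
		# purge, these days, drops exactly the one object you just hit
		return (purge);
	}
}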

Only for people who grok regex

Sometimes you’ll meet Varnish users who do code but just don’t grok regex. It’s weak, I know, but this language isn’t for them.

Uncertain current working directory

This is a problem on some rigs which have VCL code in stacked layers, or really anywhere where it’s more appropriate to call the VCL a Varnish program, as in “a program written for the Varnish runtime”, rather than simply a configuration for Varnish.

Uncertainty

You’ll typically want to organize your VCL in such a way that each VCL is standalone, with if-wrapped rules, and they’re all included from one main vcl file, stacking all the vcl_recv’s and vcl_fetches.

Because distros don’t agree on varnishd’s current working directory (which happens to be wherever it was launched from, rather than, say, chdir $(dirname $CURRENT_VCL_FILE)), you can’t reliably specify include statements with relative paths. This forces us to use hardcoded absolute paths in includes, which is neither pretty nor portable.
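
Concretely, it is the difference between these two (paths are just examples; the vcl_dir parameter helps somewhat, when distros set it consistently):

# what you would like to write, relative to the including VCL file:
include "devicedetect.vcl";
include "backends.vcl";

# what you end up writing, because the compile-time cwd depends on where
# varnishd (or varnishadm) happened to be started:
include "/etc/varnish/devicedetect.vcl";
include "/etc/varnish/backends.vcl";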

Missing default director in 4.0

When translating VCL to 4.0 there is no longer any language for director definitions, which means they are done in vcl_init(), which means your default backend is no longer the director you specified at the top, which means you’ll have to rewrite some logic lest it bite you in the ass.

Also, director.backend() has no string representation and is just handed to req.backend_hint, so you cannot do the old-style name comparisons; backends are first-class objects, but directors are another class of objects entirely.
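
For reference, the 4.0 idiom ends up looking something like this (a minimal sketch; names and addresses are made up):

vcl 4.0;
import directors;

backend www1 { .host = "192.0.2.10"; .port = "80"; }
backend www2 { .host = "192.0.2.11"; .port = "80"; }

sub vcl_init {
	# directors are now vmod objects, created at init time
	new cluster = directors.round_robin();
	cluster.add_backend(www1);
	cluster.add_backend(www2);
}

sub vcl_recv {
	# and nothing picks them up automatically: no implicit default director anymore
	set req.backend_hint = cluster.backend();
}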

the missing director

VCL doesn’t allow unused backends or probes

Adding and removing backends is a routine ordeal in Varnish.
Quite often you’ll find it useful to keep backup backends around that aren’t enabled, either as manual failover backups, because you’re testing something or just because you’re doing something funky. Unfortunately, the VCC is a strict and harsh mistress on this matter: you are forced to comment out or delete unused backends :-(

Workarounds include using the backends inside some dead code or constructs like

sub vcl_recv {
	set req.backend_hint = unused;
	set req.backend_hint = default;
	# ...
}

It’s impossible to determine how many bugs this error message has avoided by letting you know that the backend you just added (er, yes sir, that one) isn’t in use, but you can definitely count the number of Varnish users inconvenienced by having to “comment out that backend they just temporarily removed from the request flow”.

I am sure it is wise to warn about this, but couldn’t it have been just that, a warning? Well, I guess maybe not, considering distro packaging is silencing error messages in init and reload scripts..

To be fair, this is now configurable in Varnish by setting vcc_err_unref to false, but couldn’t this be the default?
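
Flipping it off is easy enough once you know the parameter exists; something along these lines (the boolean value spelling may differ slightly between versions):

# at startup, alongside your usual -a/-f/-s options:
varnishd -p vcc_err_unref=off -f /etc/varnish/default.vcl
# or at runtime, before loading the VCL that still carries the unused backend:
varnishadm param.set vcc_err_unref off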

saintmode_threshold default considered harmful

saintmode

If many different URLs keep returning bad data or error codes, you might conceivably want the whole backend to be declared sick instead of growing some huge list of sick URLs for this backend. What if I told you your developers just deployed an application which generates 50x error codes, triggering your saint mode for an infinite number of URLs? Well, then you have just DoSed yourself, because you hit this threshold. I usually enable saint mode only after giving my clients a big fat warning about this one, because quite frankly it comes straight out of left field every time. Either saint mode is off, or the threshold is Really Large™ or even ∞, and only in some special cases do you actually want it set to an actual number.

Then again, maybe it is just my clients and the wacky applications they put behind Varnish.
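
For context, this is the Varnish 3 shape of the thing; a sketch with arbitrary numbers:

# Varnish 3: mark this object as sick for this backend for a while, then retry elsewhere
sub vcl_fetch {
	if (beresp.status >= 500) {
		set beresp.saintmode = 20s;
		return (restart);
	}
}

The threshold this section is about is the saintmode_threshold parameter: once more objects than that are on a backend’s sick list, the whole backend is considered sick, which is exactly the self-DoS described above.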

What is graceful about the saint in V4?

While we are on the subject: grace mode, the most often misunderstood feature of Varnish, has changed so radically in Varnish 4 that users no longer recognize it, and they often make completely reasonable but devastating mistakes trying to predict its behavior.

To be clear on what has happened: saint mode is deprecated as a core feature in V4.0, while the new architecture now allows a type of “stale-while-revalidate” logic. A saintmode vmod is slated for Varnish 4.1.

But as of 4.0, say you have a bunch of requests hitting a slow backend. They’ll all queue up while we fetch a new one, right? Well yes, and then they all error out when that request times out, or if the backend fetch errors out. That sucks. So let’s turn on grace mode, and get “stale-while-revalidate” and even “stale-if-error” logic, right? And send If-Modified-Since headers too, sweet as.

Now that’s gonna work when the request times out, but you might be surprised that it does not when the request errors out with 50x errors. Since beresp.saintmode isn’t a thing anymore in V4, those error codes are actually going to knock the old object outta cache, and each request is going to break your precious stale-if-error until the backend probe declares the backend sick and your requests become grace candidates.

Ouch, you didn’t mean for it to do that, did you?

The Saint

And if, gods forbid, your apphost returns 404s when some backend app is not resolving, bam, you are in a cascading hell fantasy.

What did you want it to do, behave sanely? A backend response always replaces another backend response for the same URL – not counting vary headers. To get a poor man’s saint mode back in Varnish 4.0, you’ll have to return (abandon) those erroneous backend responses.
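
Which, in a minimal sketch, is about this much VCL; the caveat being that plain 4.0 gives you no easy way to tell a background grace fetch from a foreground miss, so it is a blunt instrument:

sub vcl_backend_response {
	# don't let a 50x replace whatever (possibly graced) object we already have
	if (beresp.status >= 500) {
		return (abandon);
	}
}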

Evil grace on unloved objects

For frequently accessed URLs grace is fantastic and will save you loads of grief, and those objects can have large grace times. However, rarely accessed URLs suffer a big penalty under grace, especially when they are dynamic and meant to be updated from the backend. If a URL is meant to be refreshed from the backend every hour, and Varnish sees many hours between each access, it’s going to serve up that many-hours-old stale object while it revalidates its cache.
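
So it pays not to hand out one blanket grace value; a sketch of the sort of thing I mean, with a made-up URL pattern and numbers:

sub vcl_backend_response {
	set beresp.ttl = 1h;
	# generous grace for hot, mostly static objects...
	set beresp.grace = 6h;
	# ...but keep it short where serving hours-old data is worse than waiting for the backend
	if (bereq.url ~ "^/api/") {
		set beresp.grace = 30s;
	}
}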

stale while revalidate
This diagram might help you understand what happens in the “200 OK” and “50x error” cases of graceful request flow through Varnish 4.0.

Language breaks on major versions

This is a funny one, because the first major language break I remember was the one I caused myself. We were making security.vcl, and I was translating rules from mod_security and having trouble with it because Varnish used POSIX regexes at the time. I was writing this really god-awful script to translate PCRE into POSIX when Kristian, who conceived of security.vcl, went to Tollef (both were working in the same department at the time) and asked, in his classic brook-no-argument kind of way, “why don’t we just support Perl regexes?”.
Needless to say, (?i) spent a full 12 months afterwards cursing myself while rewriting tons of nasty client VCL code from POSIX to PCRE and fixing occasional site-devastating bugs related to case-sensitivity.

Of course, Varnish is all the better for the change, and would get nowhere fast if the devs were to hang on to legacy, but there is a lesson in here somewhere.

furby

So what's a couple of sed 's/req.method/req.request/'s every now and again?
This is actually the main reason I created the VCL.BNF. For one, it got the devs thinking about the grammar itself as an actual thing (which may or may not have resulted in the cleanups that make VCL a very regular and clean language today), but my intent was to write a parser that could parse any version of VCL and spit out any other version of VCL, optionally pruning and pretty-printing of course. That is still really high on my todo list. Funny how my clients will book all my time to convert their code for days but will not spend a dime on me writing code that would basically make the conversion free and painless for everyone forever.

Indeed, most of these issues are really hard to predict consequences of implementation decisions, and I am unsure whether it would be possible to predict these consequences without actually getting snagged by the issues in the first place. So again: varnish devs, I love you, what are your pet peeves? Varnish users, what are your pet peeves?

Errata: vcc_err_unref has existed since Varnish 3.

software defined radios with alsa, jack and pulseaudio and a professional sound card

Monday, September 9th, 2013

Preramble

waves

All the people out there are neatly divided in two piles:
the “it works for me and does what I need”-camp, and
the “always always always gets in the way so killitwithfire”-camp,
and this fragmentation may be the best argument that pulseaudio should be up for a whammy.

For all of you tl;dr’s (too lazy, do you read?) here’s a short summary:

  • alsa: just works. confusion in the .asoundrc
  • pulseaudio: controls per process, less buffer fuckups, “just works”
  • jack: controls per process, realtime, firewire/usb, pro audio apps
  • firewire: fantastic, massive pain but getting there
  • software defined radios: so worth it!

But read on to learn the recipe for the secret magic sauce.

The reason I am writing this is not because pulseaudio is evil and sucks. However, it was the last straw on a long and winding road that broke the camel’s back. Pulseaudio assumes you are running systemd, and talks to console-kit-daemon, which is surely one of Satan’s most trusted advisers and a harbinger of the Apocalypse.

Pulseaudio

We know all this, and yet why do I bother?
I didn’t come here to rant about Pulseaudio though:
I’ve gathered you here today to tell a story about Software Defined Radios.

Introducing a large and characteristic cast of characters, and how to make them work together in the best possible way.

My way.

The Cook

Well: a friend of mine got a hold of a few Terratec DVB-dongles with the awesome rtl-chipset and Elonics tuner, which means I can play with radio!

terratec dongle

Except the first time I tried I got stuck in gnuradio dependency hell and never got anything working… which was very nearly a year ago.

Things weren’t easy back then, gqrx, the pretty waterfall app wasn’t mature enough and you were stuck using something far more fugly (.net code running under mono, shudder the thought).

You still have to build gnuradio from source (because the packaged versions aren’t new and shiny enough), but the piper’s playing a different tune now: with the advent of build-gnuradio it’s possible to sit back and relax while gnuradio and all its dependencies build before your very eyes.

Yes indeed this takes longer than getting the cows back from pasture but it’s worth it, because with a full gnuradio build you can now have a hope of getting gqrx the shiny waterfall to compile!

gnuradio companion

The Thief

Except you didn’t realize that without the -m option to build-gnuradio it builds gnuradio 3.6, which is not the 3.7 that gqrx needs! Joke’s on you haha ha ha.

Then you build gqrx and realize you can’t get it to talk to your Terratec, because why? Because it’s a DVB dongle and the kernel has helpfully inserted the DVB module to enable it! So run along now and add

# rtlsdr
blacklist dvb_usb_rtl28xxu

to your /etc/modprobe.d/blacklist.conf – now you are ready to fire up gqrx and gnuradio-companion.

His Wife

That’s when you might discover, if you are unlucky, that there is no sound in gqrx. It’s working and showing you a waterfall but you can’t hear the demodulated waves!

GQRX gnuradio waterfall

Why oh why, well let me tell you why: It absolutely needs, requires Pulseaudio to produce sound!

OK, fair enough, some of you out there are in the “works-for-me”-camp and ipso facto you’re done here, gqrx works and IT ALL JUST WORKS but the world is not so easy for the rest of us.

The rest of us bite the bullet and install pulseaudio to get this thing working. Which is as far as you need to go if you’re semi-sane and normal, or even when you are running this thing on a Raspberry Pi or you’re building a beagleboard spectrum analyzer.

Actually you don’t even need Pulseaudio for that last project..

Her Lover

echo2

What I have neglected to tell you however is that I have an Echo Audiofire. I was impressed with these little firewire-driven sound cards back when my bro had the small and portable Audiofire2.

Sound quality and flexibility wise they are unbeatable, certainly if you want professional quality sound input and output.

Firewire sound also has the major advantage over USB sound in that firewire packets aren’t quantized in time, which means a lot if you’re doing midi or other realtime music stuff. Latency is a killer.

You might also be aware that the higher the sample rate of your sound card, the higher the bandwidth of your homebrew SDR radio..

Anyways, firewire soundcards “just work” with asio drivers in Windoze but are a small pain of their own to set up in Linux. ALSA never heard of them. Pulseaudio doesn’t speak firewire. For anything resembling realtime professional audio under Linux you’ll have to go FFADO and JACK.

JACK audio kit

Also, never think that just any firewire card will work in Linux: a lot of vendors continue to ignore the platform (understandably, because of the atrocious state of professional audio under linux) and there are some wondrous cards out there that have just pitiful support here.

The jackd brothers

You’re walking down a long path, you’re going to Mecca. You come upon a fork in the road where two brothers live. They are twins, and you know that one of them always speaks the truth, and the other always lies. You need to ask them the way to Mecca, but how?

As there are two problems with anything in this world, there are two problems with Jack. Firstly, jack forked into jack1 and jack2, and both versions are strangely alive to this day, and there is netjack1 and netjack2 and well, what the fuck.

FFADO

To complicate matters there are two competing linux driver subsystems for firewire and both of them live to this day, with one supporting some firewire devices and one supporting other firewire devices, and one being supported in jack1 and the other in jack2. Furthermore you need a recent FFADO to get it all working.

Thankfully in recent debians and ubuntus the right kind of jackd talks to the right kind of firewire device in the kernel and matches the right ffado to get things to work, but you still need to know your way around.

LMMS

The Answer, not The Question

Know what question to ask to get the right answer, which is that at least for the Echo Audiofire, jackd2 works nicely with ffado and recent-ish kernels as long as
you run jackd as your X user
with jackd -v -dfirewire, and then fire off qjackctl and ffado-mixer and then all your sweet sweet jack apps. For now, let’s assume you are jackd2’ing things, but let us just say that at this point it no longer matters.
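
In other words, something like this, as yourself rather than root:

# jack first, against the firewire stack
jackd -v -dfirewire &
# then the control surfaces, then all the jack apps you like
qjackctl &
ffado-mixer &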

What you don’t know is that to get the Echo to work, you will likely have to upgrade your Echo firmware, either by hooking it up to a windoze with the right drivers and letting them reflash the rom, or play with the scary commands ffado-diag and ffado-fireworks-downloader, insert magic (literally!), etc.

Having done all this voodoo I still had problems that required rebooting the sound card or booting into windoze to reset it to a known state that jackd could talk to, but with newer kernels/libffado/jackd versions the problem evaporated somewhere along the line.

Jack meters

Realtime patchset to the Linux kernel? Let’s not get into it… I am not a professional musician nor am I a sound engineer, and they would probably use windows or mac.

The Waitress

Confusion.

Synthesizer clutter

At that point you might be wondering where I’m going with things. Let’s recap:
I’ve got a gqrx waterfall on Terratec DVB RTL-SDR that only supports Pulseaudio, and I’ve got an Echo Audiofire soundcard on firewire that only listens to jack. I can hook pulseaudio to Alsa.

Indeed, installing pulseaudio I discovered it will do this automatically, /usr/share/alsa/alsa.conf.d/pulse.conf suddenly appears and fucks your setup by putting everything in ALSA through Pulseaudio.

There is also some shit in /etc/pulseaudio/default.pa that is supposed to detect jackdbus and make pulseaudio use jack, but that stuff just never worked.

Of course, I have an .asoundrc file that takes everything from ALSA and puts it up JACK, so how do you think that’s gonna work?

Well, it doesn’t work.
So, it’s time to bring out the guns again.

The Heist

# convert alsa API over jack API
# use it with
# % aplay foo.wav

# use this as default
pcm.!default {
    type plug
    slave {
        pcm "jack"
        #rate 96000
    }
}

ctl.mixer0 {
    type hw
    card 1
}

# pcm type jack
pcm.jack {
    type jack
    playback_ports {
        0 system:playback_1
        1 system:playback_2
    }
    capture_ports {
        0 system:capture_1
        1 system:capture_2
    }
}

(that was .asoundrc)

load-module module-jack-sink
load-module module-jack-source 

Those go in your /etc/pulseaudio/default.pa, somewhere near the top: instead of load-module module-alsa-sink, and before the ifexists module-jackdbus shit.

and rm /usr/share/alsa/alsa.conf.d/pulse.conf

Now remember that jack is running as you, so make sure that Pulseaudio is running as you as well:

sudo service pulseaudio stop
pulseaudio -v
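
To check that the jack sink actually came up, pactl is handy (assuming the pulseaudio utilities are installed; the sink is called jack_out by default, if memory serves):

pactl list short modules | grep jack
pactl list short sinks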

The Payoff

Pulses playing through jack audio

At this point you can run your freshly compiled gqrx waterfall radio outputting to pulseaudio outputting to jackd and at the same time enjoy ALSA apps talking to jack directly and jack apps doing jack.

failcode

Thursday, August 18th, 2011

In my time as an application programmer, developer and designer, a brief stint as team lead and project manager,
as well as my time as a systems consultant, I have witnessed first-hand, and heard many credible tales of, systematic failure that rivals any of the stories on The Daily WTF. My colleagues and I have seen so many examples of bad design, bad code and systemic failure that we have considered writing a book titled How To Write Ugly Code.

I have also read the Texas Instruments Chainsaw massacre and personally met Gomez while debugging applications.

My speciality and my interest lie in diagnostics and the analysis of problems, as well as system security, and my experience has shown that one can venture to say something about the qualitative differences between design methodologies and what they mean for the end result.

Firstly, however, it is worth noting that the software industry as a whole has one primary problem: the time pressure to deliver new features in the face of mounting expectations.

This pressure to deliver is seen as the driving force behind industry progress and ever leaner, more economical applications. Contrary to this belief, however, I have evidence that it creates incentives for sloppy work, overengineering and poor consideration of the problem domain. It seems the process itself rewards poor application design, regardless of development methodology.

Large corporate and government tenders, which affect many hundreds of thousands of people’s lives, get bid on by large software houses that believe they can deliver everything (at low risk to themselves: if they cannot deliver, it is very hard for the customer to contest this against a major software house).

What we get by and large out of this process are bloated top-down applications designed by people who do not understand the (whole) problem, leading to misguided decisions for such things as

  • choice of platform and language
  • choice of coding standards (check out Systems Hungarian if you don’t believe me)
  • programming methodology
  • communication tools: source control, ticket and forum tools for developers and system managers
  • Not Invented Here-practices
  • monkey-coding by people whose talents could be employed to solving the problem

What usually passes for “agile” development causes frequent, ineffective blame-game meetings.
Unit-test-driven development frequently causes micromanagement of program details and inflexible designs.
All these methodologies were designed to improve programs, not bog them down! Why then do they cause so much breakage?

The pressure to deliver requires the application developer to prefer large swathes of ready-made library code and a high level of abstraction to allow her to meet deadline demands.

A high abstraction level causes low debuggability and poor performance.
Low debuggability, because bugs are by definition conditions caused by circumstances unforeseen by the application developer. Abstractions are employed by the developer to hide implementation details, to aid clarity and speed of application development, at the cost of debuggability.

The very tools and abstractions employed by the application developer create the frame through which the developer can see the circumstances of her design and code. Bugs most frequently occur on the boundaries between abstractions, where the developer has no way to foresee these circumstances. Furthermore, in a system with a passably high level of abstraction there is a whole stack of hidden details which must be traced and unwound to discover the bug. Therefore, every additional layer of abstraction obscures the debugging process.

Debuggability and algorithmic simplicity are key to achieving optimal performance. In other words, if we have a clear problem statement it is possible to achieve performance. If there is no clear problem statement, and the program is further muddled by abstractions and interactions, there is no effective path to performance.

Any artist will be able to tell you that the most interesting, creative and innovative work comes out of a stress-free, playful environment. Since innovative coding is a creative activity, the same applies to developing applications, something that game developers and creative shops have known for years, and that behemoths like Google and Microsoft have picked up on, reinvesting up to 15% of their revenue into research and development and getting that part right, as witnessed by their sheer output of innovation.

If there is a clear path to solving these fundamental problems of IT, it is putting the people who know what they are doing in the pilot seat: enabling developers to choose for themselves not only toolchains, methodology and communication tools, but also engaging the systems thinkers in creating the specifications and architecture of the systems they are going to implement. The good news is that as customers and managers get savvy to this method of achieving IT success, we are going to see more developer autonomy and fewer spectacular fails.

Free society conference – my experience

Tuesday, November 9th, 2010

Until the very last minute I was unsure whether I’d make it to FSCONS, the Free Society Conference and Nordic Summit. I did not know what to think of it, despite gentle pushing from someone set to speak at the conference. Three days later, and with the event somewhat filtered in my mind, there is no doubt that it was well worth the opportunity costs and then some.

I'm going to FSCONS 2010!

My first impressions while browsing the event programme were that there was no way to attend all the interesting talks! An insurmountable problem, and I hadn’t even gotten there: my meat could not be in two to three places at the same time, while my mind could not rationally choose away interesting unknowns.. so I opted to leave it to a vague notion of chance and intent.

What I hadn’t expected was the powerful impact that the people attending would have on me. Cyber heroes and heroines, freedom fighters, game changers, inventors, uncelebrated cryptoanarchists and everything makers were some of those that I got to know, that engaged me in dialogue, that dared discuss openly some (most?) of the hardest problems that we, the world, are facing today. With the full intent to do something about these problems.


what’s wrong with IT?

Wednesday, March 24th, 2010

Hold on a bit.
I am a monk of the old order, one of the illuminati of software stacks. By no means a high priest, but like many of my brethren I have been ordained with most of the dirty little secrets over the years since I joined the convent. I never specialized so I am well read in ancient texts and new work, and I have meditated on subjects ranging from compiling compilers through 3D rendering and artificial intelligence to business processes and value chains. In the constant rush to deliver on the promises of technology I’ve seen projects that are set up for failure even before they start. I’ve seen enough code to write a book detailing example for example what you should not do during development.

The secrets are many, and they are complex and hard to grasp out of context, but to misuse an old adage: the truth is simple and it’s out there.

The reason applications fail is because they are complex, but the reason IT fails is that IT people expect the applications to be simple to manage, and the business has a nasty tendency to promote the clueless.

It’s amazing how much money is thrown out the window (and into greedy hands) by large corporations and public departments on hairy overcomplicated blackbox solutions that are supposed to meet simple business needs.

Take databases for example. It’s easy to argue that the database is part of core business (because all the critical application data ends up in the database) and thus the database must be highly available, highly performant and highly secure. Maybe that’s how the CTOs of the world justify spending millions on monstrous arcane iron black boxes to serve their modest database needs. Annually!

The same needs, if properly understood, could be served at a fraction of the cost while being easier to manage and debug!

This is not just a schpiel on Postgres (who cares it’s open source, it can do all that and more) but a general protection fault in how technology is driven today.

Another nice example is DNS, which is beyond core business in importance: without domain resolution nearly all infrastructure fails. DNS problems can cause the most obscure failures, simply because applications have no provision for DNS failure. Quite a few IT departments all over the world operate DNS through point-and-click wizards with nothing but a rudimentary understanding of its inner workings. Should they have that understanding? Hell yes, otherwise sooner or later it must fail as everything does, and when it does they have none of the tools to fix it!

Scarier still is that the rest of the world (or very nearly) has standardized on the most baroque and insecure DNS server in existence (BIND me in hell with seven furies burning marks in my skin), a precise analogy to what has happened in the world of e-mail (sendmail will do anything but!). We do this because we follow Best Business Practices, which is the IT analogue of what happens to you when you go through airport security: it is completely ineffective but feels safer.

Other examples of the same thing happening is the proliferation of security products that offer nothing but a smokescreen, the use of gigantic and poorly considered application frameworks and the abstraction and layering of simple concepts into behemoth object-relation collections.
Humans have a distinctly object-oriented view of the world, all the same the world is trying to tell us that objects don’t exist in their own right but depend on a whole slew of epiphenomena.

Software rots if it is not maintained.
None of the above are hard problems, regardless of what others might have tried to jam down your throat. Databases are a snooze to work on, DNS and mail should Just Work, and once we have a web application stack going for us we’re not going to consider how it works or what could be better. The difficulty that lies in application infrastructure is a people problem.

We want to buy a shrink-wrapped product and feel value for money without risk.

There is some sort of mass marketing effect happening where decision makers are best influenced by the greediest hands. We tend to think that the most expensive car has the best value with the least risk, and we are seldom so clear-sighted as to go back on decisions we have already made.

So what’s the fix?

Decision makers should spend more time evaluating the options before launching headlong into projects based on best business practices, and they should identify and listen more to the few quiet people who have a clue. The people with clue usually only get to vent their frustrations by making crass jokes about management and the hairiness of the most recent or most painful and embarrassing failure of technology. These things are not talked about openly, but they should be.

Ideally we should focus on long-term research into the difficult problems of technology: artificial intelligence, algorithms, how to feed the starving and save the world from imminent ecological disaster, quantum computing etc, instead of spending so much time failing at the simple things.

Lame things that suck

Saturday, December 5th, 2009

The world is a difficult place, we know.
Here’s a list of things that suck unnecessarily much:

  • Fink for OSX needs Xcode dev tools.
    Why not provide a gcc/libc-dev package? No idea. General lameness from the fink developers forces you to register at the Apple Developer Connection and download 700MB of apple crap just to install and compile source packaged software in fink. LAME.
  • Fink is not Cydia on the iPhone.
    Both are based on apt and dpkg. Both run on OSX. Pooling of efforts, everyone.
  • The very fact that you have to jailbreak an iPhone is ridiculous. Goes doubletime for Xbox and PlayStation chipping, Wii softmods and DS carding. This is vendor lockdown and should be lumped in with the criminally insane – vendors don’t need to take responsibility for user-created apps, but vendors must not stand in the way of software evolution.
  • Tar sands. Corruption.
  • There ain’t enough time to read all the cool web comics. Games? Don’t get me started.
  • to quote a friend and colleague, “every operating system in the world. Pick one and I will tell you how much it sucks.”

how to break your head : try linux compilination

Wednesday, July 29th, 2009

I’ve recently started compiling my own kernels again. Some people ask me why I’d ever want to do this – a valid question, since anyone who’s done it knows it’s a time-consuming hassle best left to the distro packagers and really nerdy people with too much time on their hands. Other people will give a blank face and ask “What is a Conpiling?” To these other people: this article is not for you, it will only serve to confuse that pretty little head of yours. If you know what ‘a compiling’ is, you may proceed. I don’t provide references; I banter them. Google’s your friend, pluckum.

Still, I am not here to discuss the reasons for compiling your own kernel – these are all too obvious to the initiated and completely uninteresting to anyone else. I’m more interested in the reasons why my friends, colleagues and I have *stopped* compiling our own kernels – despite some of us enjoying at least a compile a day (or ten!) for periods of time in the past. Only the gentoo rice boys remain, steadfastly compiling everything in sight despite snide comments about mean time between upgrades and ridicule about their USE_FLAGS selector GUIs.

Why don’t we compile anymore?
There is no stable upstream branch. In my own experience this has had direct consequences for the stability and quality of point releases.
Years after Linus’ bitkeeper schism, the SCO slimeballing and the death of the stable branch, we can look back and say that aye, we have a better audit trail and development has scaled through the roof. We have more kernel features than ever, and an astounding rate of patches make it into mainline every day.

These amazing developments are a far cry from the linux dev process back in the days of 2.2 and 2.4, but there is a dark side to them.
Regressions are no longer the domain of the bleeding edge, the -mm or -ac trees, the -alpha and -rc releases for the adventurous, masochistic or desperate. They are common things now. Getting bitten by that local sexploit and being too embarrassed to tell your friends about it. Software suspend used to work fine. The graphics card did not crap itself on the last point release, but at least my NIC doesn’t get bricked in this one. The wifi keeps screwing with you, but you don’t know if you should blame Ubuntu, Intel or Linus. On the internet no one can hear you scream.


Elitism is rife on the LKML, and more pointedly, in the mainline patch process. Who knew NIH would be such a big problem in an open source project? Admittedly, it is the largest and perhaps the most ambitious open source project of all, with all eyes on target, a million uses and powerful market forces pulling the project this way and that. Linux has long ago outgrown the boy’s room, the hacker dungeon and its academic roots. Most kernel patches that get into mainline are pushed there by large hardware and software vendors. Many kernel hackers hack the kernel on their day job, earning an engineer’s living.

Linux has reached the Enterprise in a big way. The system runs on and is optimized for Big Iron. The desktop is “good enough”, say the kernel hackers. Latency is fine for our uses, and those squeaky audiophiles should shut up and fork. Indeed they did, as embedded, realtime and audio people have all collectively decided to jump off the wagon.
Out-of-tree kernel hackers already know how the land lies. After years of pushing the same ingenious, useful patchsets they are sick of cleaning up, splitting out, documenting, backporting and forward porting, only to discover that no one read their patch. Maybe they will be lucky and see their ideas bastardized overnight into someone else’s pet project, far more likely to succeed once it is Invented Here(tm).

It’s not all bad: we want and need to trust the people that push stuff into the kernel. Who are you to think that you can do it better than them? They are doing their job, they do it well, so what if they all meet for beer and virgin sacrifice after hours, so what if there is no free seating in their society? Fork your own.

Weighing in at 800MB uncompressed, the Linux source is a behemoth. Counting only source, headers and assembly, there are 35,000 files in the linux kernel, with 10,667,648 lines of source code. This code is meticulously organized, not only into systems, subsystems and modules, but into domains of responsibility. Hey, if you’ve ever managed a large software project you would know how annoying, how encroaching it is when someone starts fiddling with your private bits.

On the other hand, linux has lost a lot of great contributions and spurned a lot of marvelous people because of this elitism. OpenMosix Israeli clustering, reiser4 the murderous file system, software suspend 2 the ‘it just works’ approach, page-in-from-swap, CK’s desktop efforts, the two kernel monty carlo and process snapshotting are only a few of the projects that failed to sufficiently influence the core developers, some of them even year after year.
It can be argued that despite the patches not making it to mainline, some of these ideas did find their way into the minds of the gitmasters and found other implementations on technical merit alone. To me this defeats the whole purpose of the open source model, which drives technology by sheer speed. We’ve had a working, cleaned up, documented version of the patch for two years – and the feature doesn’t make the cut. This is too little, too late.


Well, not everyone takes an interest in kernel politicking even if they follow the LKML or kerneltrap, and some people even like hitting bugs and fixing issues in their compiles, and trolling in epic flame wars. They too have left kernel compiling to other, more patient and masochistic people.
Maybe it’s because even grepping a single point release changelog is a major chore. The distro folks have gotten fairly good at kernel compiles; ubuntu ships a one-size-fits-all Just Works(tm) kernel, RedHat’s patchset has grown less offensive over the years and debian is and always was debian. Upgrades are relatively painless and usually somebody else already did the dirty work.
Linus Torvalds’ initial plan succeeded: by axing the stable/unstable tree he told the world that the responsibility for stability rests on the distributor. He also axed many hobbyists’ will to stay and play with new releases. I’d rather go play on milw0rm.


There are other compelling reasons not to roll one’s own: the number of configuration options has doubled over the past years, and most of these new options are not relevant to the hobbyist use case. Development not only in the kernel source but in the toolchain (gcc) has caused compile times to soar. I remember proudly compiling 2.4 kernels on my K7 within 10 minutes back in 2001. Today it might take longer to compile the tree on my Centrino dual-core.
And there it is: we’ve suffered feature creep and bloat. After a long download, an hour or more of configuring, and many failed initial make runs, a generic compiled bzImage weighs in at about 3412 kB. This is a modular kernel, mind you. What happened to lean and mean 800 kB kernels?

Memory is cheap you say.
But minds are not cheap!


I’m announcing a contest: what’s the smallest stable useful kernel you can make for your platform? Remember, it should run on other machines and be useful, and the compile reproducible. Choose your own definition of useful, but do find a concrete definition. Use any tree and patchsets that turn you on. Bonus points for packaging so others can plug your kernel into their system. I’ll make your package available.
As a side contest I’ll take compile times along with bogomips numbers and your .config file for reference.


PS. Yahoo! internal IT sucks. Where’s the wifi? Running our own cables, canned XP images in a linux lab, packet loss. This aint no funky party. I guess they are too busy. Paranoia maybe. Things aren’t wonderful.

LDAP and its many uses

Friday, June 19th, 2009

There is a nice article on Single Sign-On and LDAP in the Journal, and although it is not new, the man writing it has clearly spent some time finding novel (read: whack) uses for catalogue services.

Myself, on the other hand, I’ve been finding novel ways to break OpenLDAP. My 35-hour stint on Thursday set up more Active Directory-integrating workaround setups of the Slap Daemon than you can shake a bloody large stick at, including but not limited to The Inverted Translucent Reverse Meta Tree, where we do a slapo-translucent overlay in one slapd and a plain slapd database in the second slapd, then slapd-meta the sAMAccountName into uid and remap the suffixes in a third slapd process. Yep, that’s four separate catalogues to solve one application problem.

Don’t. Ask. Why.

The upshot is that you should stay the hell away from the slapd rewrite module as it will core, that the translucent overlay is magnificent at making very plain ldapsearches (objectclass=*) return no objects or fail, that slapd-meta is a very cool backend for remapping suffixes, attributes and your mom, that your application should never have to write to a read-only Active Directory tree, and that simplicity is instrumental in not going mental.
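
For the morbidly curious, the slapd-meta leg of that contraption looks roughly like this in slapd.conf. This is a sketch from memory with made-up suffixes and hostnames, so treat it as an illustration rather than a working config:

# back-meta proxies the AD tree in under a local suffix
database        meta
suffix          "dc=example,dc=org"

uri             "ldap://ad.example.com/ou=people,dc=example,dc=org"
# present the AD naming context under our local suffix...
suffixmassage   "ou=people,dc=example,dc=org" "cn=Users,dc=ad,dc=example,dc=com"
# ...and present sAMAccountName as uid to the application
map             attribute uid sAMAccountName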

Unfortunately, simple solutions to complicated problems are rather hard to come by.

PS. The problems I was trying to fix all came out of one single application bug and my attempts to work around it :-P