michael orlitzky

Motherfuckers need package management

posted 2015-07-05

Are you angry? Let's be angry.

The Problem

Your computer has a bunch of software on it. There's an operating system kernel, some libraries, and then the programs (like Firefox) that you actually use. Each of those things is called a “software package,” or simply “package” when you're unlikely to mistake the software for a dong.

Every software package has dependencies—the other shit that it needs to work properly. For example, Firefox probably requires Mesa to draw porn on your screen. But, Firefox might require version 9 of Mesa while your old-ass office suite needs version 8. So if you install the newer version, Firefox will work, but you won't be able to do spreadsheets or whatever your stupid job is.

This problem has a name. Actually a lot of names:

I guess people got bored of coming up with kinds of hell, so they decided to call all of these dependency hell. And they are all the same problem. You'd think someone would have figured out a way to fix it by now. Oh, right.

Package Managers

A package manager is a program that installs stuff. But it's supposed to do more than that. It's supposed to make sure that all of the shit you have installed works together.

There are really two parts to a package manager. There's the package manager program itself, and then a repository of packages that it uses. There is no standard way to build/install a package, and no common format for specifying dependencies. So, someone has to figure all that shit out, write it down, and then put it somewhere. That somewhere is the package repository. As long as you don't commit garbage to the package repository, the package manager will keep you out of dependency hell by refusing your requests to install conflicted shit.
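To make the "refusing your requests" part concrete, here is a minimal sketch of that check using the Firefox/Mesa example from earlier. The repository contents, the version numbers, and the `conflicts`/`install` helpers are all made up for illustration; a real package manager does far more (version ranges, slots, rebuilds), so read this as a toy under those assumptions.

```python
# Hypothetical repository metadata: package -> {dependency: required major version}.
# The names and versions mirror the Firefox/Mesa example; they are not real data.
REPO = {
    "firefox": {"mesa": 9},
    "old-ass-office-suite": {"mesa": 8},
    "mesa": {},
}


def conflicts(installed, candidate):
    """Return (dependency, other_pkg, other_version, wanted_version) for every clash."""
    problems = []
    for dep, wanted in REPO[candidate].items():
        for pkg in sorted(installed):
            have = REPO[pkg].get(dep)
            if have is not None and have != wanted:
                problems.append((dep, pkg, have, wanted))
    return problems


def install(installed, candidate):
    """Install candidate only if it clashes with nothing already installed."""
    problems = conflicts(installed, candidate)
    if problems:
        raise RuntimeError(f"refusing to install {candidate}: {problems}")
    return installed | {candidate}


installed = install(set(), "firefox")
try:
    install(installed, "old-ass-office-suite")
except RuntimeError as err:
    print(err)  # the second install is refused instead of silently breaking Firefox
```

The point of the sketch is only that the refusal depends on the repository metadata being right, which is why someone has to curate it.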

On Linux, the people who curate the package repository are called distribution developers, like Gentoo developers or Debian developers. All software sucks, so usually the distribution developer will have to fuck a bit with a package before it will work. The ability to fuck with software is one of the four freedoms; that's why most of the software available on Linux and BSD is Free Software.

Package managers are the killer feature on Linux/BSD. They're why Linux/BSD doesn't have viruses. They're also why those operating systems last more than a few years. Most people replace their Windows PCs every few years because “it's slow.” Well, computers don't get slow—Windows gets slow. And it gets slow because you install a bunch of garbage on it and never track what that garbage is or where it goes and after a few years your computer is a landfill so naturally you take it to be with its own kind. That doesn't happen when you use a package manager.

Back to dependency hell. Package managers exist. So we're done here, right? Everyone uses a package manager and dependency hell goes away? Right?

How to Fuck Up a Package Manager

Support only binary packages

There are two ways that you can install a package. The first is more familiar to Windows and Mac users: you download an executable that someone else built, and then stick it somewhere. This is called a “binary package.” The second way would be to download the source code and build it yourself, called a “source package.” Source packages have countless benefits over binary packages. Here are the two deal-breakers.

First, most software has a number of options that can be set at build time. You need to be able to change these. If somebody gives you a pre-built executable, you can't.

Second, if someone is giving you the executable, he needs to predict all of the possible ways and places that you might use the software. An executable for one type of computer architecture won't work on another. And there are like, a whole lot of computer architectures. When you build the source code yourself, you build it for your architecture, and then it automatically works there. But when someone else builds it for you, he has to,

  1. know that your architecture exists; and
  2. give a shit about you; and
  3. find a computer like yours to test it on

Usually one of these criteria is not met (guess which one). To distribute a binary package that works on ten architectures takes ten times as long as it does for a single architecture. With a source package, it doesn't matter, because you build it where you're going to use it. So source packages—at least optionally—are the way to go.

Despite the disadvantages, there are still some package managers that only support binary packages. That's one good way to fuck up your package manager.

Bundle dependencies

The package manager's job is to make sure that everything on your system is consistent. But that's like, hard, bro. What if we just, uh, didn't? Feel me? *takes a hit of javascript*

Some package managers only operate within the current directory or the current project. In that case, every dependency of your project is downloaded and stored alongside your project where it won't interfere with any of the other projects on the system. This is called “bundling.” You're less likely to get conflicts this way, because the package manager isn't really doing what you want it to. If I have five packages in directory foo and five packages in directory bar, it's possible that the packages in foo are consistent (within foo) and the packages in bar are consistent (within bar) but one of foo's dependencies just fucking hates a package in bar. And I would kind of like to know that, because if both foo and bar are important to me, someone needs to get his shit together and fix it.
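The foo/bar situation can be sketched in a few lines. The directory names come from the paragraph above; the library names, version pins, and the `cross_project_conflicts` helper are hypothetical, and "conflict" is simplified here to "two projects pin different versions of the same library."

```python
from itertools import combinations

# Hypothetical bundled pins per project directory: library name -> version.
# Each project is consistent on its own; nothing ever compares them to each other.
BUNDLES = {
    "foo": {"libshared": "1.2", "libfoo": "3.0"},
    "bar": {"libshared": "2.0", "libbar": "0.9"},
}


def cross_project_conflicts(bundles):
    """List libraries that two projects pin at different versions."""
    found = []
    for (a, deps_a), (b, deps_b) in combinations(sorted(bundles.items()), 2):
        for lib in sorted(deps_a.keys() & deps_b.keys()):
            if deps_a[lib] != deps_b[lib]:
                found.append((lib, a, deps_a[lib], b, deps_b[lib]))
    return found


for lib, a, ver_a, b, ver_b in cross_project_conflicts(BUNDLES):
    print(f"{lib}: {a} bundles {ver_a}, {b} bundles {ver_b}")
```

A bundling package manager never runs the equivalent of this check, because it only ever looks at one directory at a time.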

Bundling also makes it impossible to stay updated, and creates a security nightmare. Suppose someone finds a security vulnerability in libwhatever that has been fixed in a later version. I have around 1,000 programs installed on my workstation right now. How do I upgrade libwhatever if each of those thousand programs potentially has its own bundled copy of it? Am I supposed to go around to each program and attempt to update them all individually? How do I even find out what programs I have installed? I own more than one computer, by the way. Ain't nobody got time for that. People who think this is a good idea have never had to keep anything working for any amount of time.
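For a sense of what "go around to each program" means in practice, here is a rough sketch that hunts for bundled copies of one library. It assumes the npm layout, where each bundled copy lives at `node_modules/<name>/package.json` with a `version` field; the `bundled_copies` helper is my own name, other bundlers lay things out differently, and scoped package names are not handled.

```python
import json
import os


def bundled_copies(root, name):
    """Yield (path, version) for every node_modules/<name>/package.json under root."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) == "node_modules" and name in dirnames:
            manifest = os.path.join(dirpath, name, "package.json")
            if os.path.isfile(manifest):
                with open(manifest, encoding="utf-8") as fh:
                    version = json.load(fh).get("version", "unknown")
                yield os.path.join(dirpath, name), version


# Example use (not run here): list every copy under your home directory.
# for path, version in bundled_copies(os.path.expanduser("~"), "libwhatever"):
#     print(version, path)
```

Finding the copies is the easy half. Each one still has to be updated inside its own project, on every machine, which is exactly the babysitting the paragraph above complains about.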

What it comes down to is this: it's stupid to copy/paste code around. And a bundling package manager is essentially just a fancy interface for copy/pasting code into your project. Everyone knows that copy/pasting code is bad, and no self-respecting programmer (if there exists such a creature) would ever tell you otherwise in reference to (say) copying the body of a function. But when it comes to dependencies, motherfuckers' brains fall out. Some cognitive-dissonance-olympics-type shit lets these people maintain that copy/pasting a little bit of code is abhorrent, just absolutely unthinkable; but copy/pasting everything? That's a fucking great fucking way to solve this problem.

So the whole bundling idea is retarded but a lot of package managers do it. It can seem like a good idea as long as you don't ever write something useful: if there are only two decent programs that your package manager provides, well, it's not that annoying to have to update them both all the time. But once you have to babysit more than a few of them you'll realize that your package manager is fucked up.

Restrict it to a single programming language

Programs can be written in more than one programming language. Sometimes a program written in one language depends on a program written in another language. So an inevitable requirement of a package manager is that it should support more than one programming language. For example, you should be able to say that the Ruby bindings to PostgreSQL (written in Ruby) require libpq (written in C), because they fucking do.
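A sketch of what "supporting more than one language" amounts to: the dependency graph simply doesn't care what language a node is written in. The package names and metadata below are hypothetical stand-ins for the Ruby/libpq example, and `install_order` is an illustrative helper built on the standard library's `graphlib` (Python 3.9+), not anyone's actual resolver.

```python
from graphlib import TopologicalSorter

# Hypothetical package metadata: name -> (implementation language, dependencies).
PACKAGES = {
    "ruby-pg": ("ruby", ["ruby", "libpq"]),
    "ruby": ("c", []),
    "libpq": ("c", []),
}


def install_order(target):
    """Return target's dependency closure, ordered so dependencies come first."""
    graph = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in graph:
            deps = PACKAGES[name][1]
            graph[name] = set(deps)  # TopologicalSorter expects node -> predecessors
            stack.extend(deps)
    return list(TopologicalSorter(graph).static_order())


print(install_order("ruby-pg"))  # the two C packages, in some order, then ruby-pg
```

A language-specific package manager can only express the `ruby` edges here; the `libpq` edge has nowhere to go, so it ends up in a README instead.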

This need becomes obvious about ten minutes after you finish writing your package manager. Unfathomably, most new package managers fail this most basic requirement.

Tie it to one platform (Linux, Mac, Windows, etc.)

No one wants to duplicate all of this work for every platform out there. A package manager should work on any platform within reason (it's cool to require Cygwin on Windows, where everything not completely missing completely sucks). Otherwise, we'd need more than one package manager, and that would be more than one work.

Comparison of Existing Package Managers

Here is where we pick on specific people in particular, using a table.

The following table compares some popular package managers. It's set up so that a ✓ is a good thing, and an ✗ is bad. Here's what they mean.

pm (repo): package manager (repository one, repository two…)
src: package manager is capable of installing packages from source
global: package manager is capable of installing packages and dependencies globally
x-lang: package manager supports more than one programming language
x-plat: package manager is cross-platform
Comparison of fucked up package managers
pm (repo) src global x-lang x-plat
Apt (Debian)
Bower (Bower registry)
Bundler (RubyGems)
Cabal (Hackage, Stackage)
Cargo (Crates)
CocoaPods (PodSpecs)
Composer (Packagist)
CPAN
easy_install (PyPI)
elm-package
Emacs package.el (ELPA)
Ivy (Maven Central)
Leiningen (Maven Central)
Maven (Maven Central)
npm
NuGet
Octave (Octave Forge)
Pacman (Arch Linux)
PEAR
pip (PyPI)
R (CRAN)
RubyGems
SageMath (SPKGs)
Stack (Stackage)
Yum (Fedora)

If I've made any mistakes in the table, it's not because I secretly hate your package manager and want to make it look bad: I overtly hate your package manager, and it is bad. Let me know and I might fix the table.

Among this list we have some outstanding achievement awards:

*applause*

It turns out, writing a package manager is hard (whaaaaaaattt). Dependency resolution algorithms are hard. Updating/rebuilding packages for ABI changes is hard. Ensuring atomic operation is hard. Cross-compilation is hard. Tracking installed files is kinda hard. To create a simple user interface for all that shit is unbelievably hard. The older package managers have been around for a long time—lots of research and work has gone into them and it's not because the authors were idiots.

Here's what's easy and fun: parsing a text file of dependencies, downloading them, and then copy/pasting them into a directory. Guess what most new package managers do? Mmmmhhhmmm. To understand why we have a table full of 25 projects that all attempt to do exactly the same thing and all fail in exactly the same ways, you must first understand the CADT model of development.
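For scale, here is roughly how little code the "easy and fun" part takes. This is a sketch with invented names (`parse_deps`, `bundle`, a `name version` file format) and it cheats by copying from a local directory that stands in for the download step. Note what is missing: conflict checks, file tracking, atomicity, rebuilds, all the hard stuff from the previous paragraph.

```python
import shutil
from pathlib import Path


def parse_deps(text):
    """Parse 'name version' lines, skipping blank lines and # comments."""
    deps = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            name, version = line.split()
            deps.append((name, version))
    return deps


def bundle(deps, registry, vendor_dir):
    """Copy registry/<name>-<version> into vendor_dir/<name>. No checks of any kind."""
    vendor_dir = Path(vendor_dir)
    vendor_dir.mkdir(parents=True, exist_ok=True)
    for name, version in deps:
        source = Path(registry) / f"{name}-{version}"
        shutil.copytree(source, vendor_dir / name, dirs_exist_ok=True)
```

Swap the local copy for an HTTP fetch and you have the core of a bundling package manager, which is the problem.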

And the bundling is only getting worse. These days we have Docker and Vagrant which pretty much copy/paste an entire operating system into a container in order to run a single program. This only perpetuates the problem. Developers who use these things never realize that the libraries they're writing are a pain in the ass, so the libraries don't get fixed, so the only way to use them is to copy/paste them into your project…

What to do About It

I'm taking this opportunity to announce my new service, Computr, which entirely solves the dependency problem. Simply write the name of a package on a postcard, staple it to about $300, and mail it to me. Within a week, a brand new computer will show up on your doorstep with only that package installed. If you do that for every program you ever need to run for the rest of your life then you'll never hit another dependency conflict again.

But Seriously

There's only one package manager that aces the table above: Gentoo Portage/Prefix. Portage is the standard package manager on Gentoo Linux. The Gentoo Prefix project allows it to work anywhere. (The “Portage” part of this equation is somewhat flexible; there are other package managers like Paludis and pkgcore that use the same repository and ebuild format. That format is described in the Package Manager Specification.)

Here's how we fix this mess. The next time you invent your own programming language and think, “I should write a package manager for my language,” just go right ahead and unthink it. Instead, learn Gentoo Prefix, and begin contributing packages for your language to Gentoo. Then, document the typical usage instructions for your users, and tell them that Portage/Prefix is the package manager for your language. Yes, this is more work and less fun than writing a JSON parser and wget front-end. Sorry, quit being a bitch. It also has the potential to work, which your home-grown package manager lacks.

Once your new language has packages in the Gentoo repository, they become available to the tens of thousands of existing Gentoo (and its derivatives') users. And also to anyone using Gentoo Prefix on some other system. Oh, and all of these people can help you maintain your packages. Plus, if your library authors are using it, then they'll know when their library causes a conflict, because they won't be able to install it anymore. So they'll fix it, and shit will work, and you'll save about $300.

If all of the projects in the table did this tomorrow, dependency hell would cease to exist. But unfortunately, the sunk cost fallacy states, “fat chance asshole.” Language-specific, bundling package managers are going to stick around because it makes their authors feel cruddy to admit that their idea was dumb and throw everything away. So instead, they'll keep making their projects slightly-less-but-still-fundamentally fucked up over time. But maybe there's a tiny bit of hope for new languages. (Did you actually expect me to have a plan? I just needed to yell into Emacs for a while.)

Something about a carrot and a stick: if you're doing development in e.g. Ruby or Haskell, come give Gentoo (Prefix) a try. We don't have Rubygems hell or Cabal hell here—everything just works. Although, you'll probably want to use the Ruby overlay or Haskell overlay to get the most up-to-date packages.