posted 2015-07-05
Are you angry? Let's be angry.
Your computer has a bunch of software on it. There's an operating system kernel, some libraries, and then the programs (like Firefox) that you actually use. Each of those things is called a “software package,” or simply “package” when you're unlikely to mistake the software for a dong.
Every software package has dependencies—the other shit that it needs to work properly. For example, Firefox probably requires Mesa to draw porn on your screen. But, Firefox might require version 9 of Mesa while your old-ass office suite needs version 8. So if you install the newer version, Firefox will work, but you won't be able to do spreadsheets or whatever your stupid job is.
This problem has a name. Actually a lot of names:
I guess people got bored of coming up with kinds of hell, so they decided to call all of these dependency hell. And they are all the same problem. You'd think someone would have figured out a way to fix it by now. Oh, right.
A package manager is a program that installs stuff. But it's supposed to do more than that. It's supposed to make sure that all of the shit you have installed works together.
There are really two parts to a package manager. There's the package manager program itself, and then a repository of packages that it uses. There is no standard way to build/install a package, and no common format for specifying dependencies. So, someone has to figure all that shit out, write it down, and then put it somewhere. That somewhere is the package repository. As long as you don't commit garbage to the package repository, the package manager will keep you out of dependency hell by refusing your requests to install conflicted shit.
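To make the "refusing your requests" part concrete, here is a minimal sketch of that check. The repository contents, the `install` function, and the `Conflict` exception are all made up for illustration (they just echo the Firefox/Mesa example); no real package manager is this simple.

```python
# Toy repository: package -> {dependency: required major version}.
# Names and version numbers are illustrative assumptions.
REPO = {
    "firefox": {"mesa": 9},
    "old-office-suite": {"mesa": 8},
    "mesa": {},
}


class Conflict(Exception):
    """Raised when a request would leave the system inconsistent."""


def install(installed, requirements, package):
    """Record `package` as installed, or raise Conflict if its needs clash.

    installed:    set of package names already on the system
    requirements: dict of dependency -> version demanded by installed packages
    """
    for dep, version in REPO[package].items():
        wanted = requirements.get(dep)
        if wanted is not None and wanted != version:
            raise Conflict(
                f"{package} needs {dep} {version}, "
                f"but something installed needs {dep} {wanted}"
            )
    # Only change state once every check has passed.
    requirements.update(REPO[package])
    installed.add(package)


installed, requirements = set(), {}
install(installed, requirements, "firefox")
try:
    install(installed, requirements, "old-office-suite")
except Conflict as err:
    print("refused:", err)
```

The point of the sketch is the ordering: check everything against what is already installed, and only then touch the system.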
On Linux, the people who curate the package repository are called distribution developers, like Gentoo developers or Debian developers. All software sucks, so usually the distribution developer will have to fuck a bit with a package before it will work. The ability to fuck with software is one of the four freedoms; that's why most of the software available on Linux and BSD is Free Software.
Package managers are the killer feature on Linux/BSD. They're why Linux/BSD doesn't have viruses. They're also why those operating systems last more than a few years. Most people replace their Windows PCs every few years because “it's slow.” Well, computers don't get slow—Windows gets slow. And it gets slow because you install a bunch of garbage on it and never track what that garbage is or where it goes and after a few years your computer is a landfill so naturally you take it to be with its own kind. That doesn't happen when you use a package manager.
Back to dependency hell. Package managers exist. So we're done here, right? Everyone uses a package manager and dependency hell goes away? Right?
There are two ways that you can install a package. The first is more familiar to Windows and Mac users: you download an executable that someone else built, and then stick it somewhere. This is called a “binary package.” The second way would be to download the source code and build it yourself, called a “source package.” Source packages have countless benefits over binary packages. Here are the two deal-breakers.
First, most software has a number of options that can be set at build time. You need to be able to change these. If somebody gives you a pre-built executable, you can't.
Second, if someone is giving you the executable, he needs to predict all of the possible ways and places that you might use the software. An executable for one type of computer architecture won't work on another. And there are like, a whole lot of computer architectures. When you build the source code yourself, you build it for your architecture, and then it automatically works there. But when someone else builds it for you, he has to,
Usually one of these criteria is not met (guess which one). To distribute a binary package that works on ten architectures takes ten times as long as it does for a single architecture. With a source package, it doesn't matter, because you build it where you're going to use it. So source packages—at least optionally—are the way to go.
Despite the disadvantages, there are still some package managers that only support binary packages. That's one good way to fuck up your package manager.
The package manager's job is to make sure that everything on your system is consistent. But that's like, hard, bro. What if we just, uh, didn't? Feel me? *takes a hit of javascript*
Some package managers only operate within the current directory or the current project. In that case, every dependency of your project is downloaded and stored alongside your project where it won't interfere with any of the other projects on the system. This is called “bundling.” You're less likely to get conflicts this way, because the package manager isn't really doing what you want it to. If I have five packages in directory foo and five packages in directory bar, it's possible that the packages in foo are consistent (within foo) and the packages in bar are consistent (within bar) but one of foo's dependencies just fucking hates a package in bar. And I would kind of like to know that, because if both foo and bar are important to me, someone needs to get his shit together and fix it.
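Here is a small sketch of the foo/bar situation, assuming a made-up conflict table. The package names `a`, `b`, `x`, `y` and the `clashes` helper are hypothetical; the only point is that a per-directory check passes twice while the system-wide check does not.

```python
# Hypothetical packages: name -> set of packages it cannot coexist with.
CONFLICTS = {
    "a": {"x"},   # a (bundled in foo) cannot coexist with x (bundled in bar)
    "b": set(),
    "x": set(),
    "y": set(),
}


def clashes(packages):
    """Return every (package, enemy) pair where both are in `packages`."""
    present = set(packages)
    return sorted(
        (pkg, enemy)
        for pkg in present
        for enemy in CONFLICTS[pkg] & present
    )


foo = ["a", "b"]
bar = ["x", "y"]

print(clashes(foo))        # [] -- foo looks fine on its own
print(clashes(bar))        # [] -- so does bar
print(clashes(foo + bar))  # [('a', 'x')] -- the whole system does not
```

A bundling package manager only ever runs the first two checks, so the third result never reaches anyone who could fix it.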
Bundling also makes it impossible to stay updated, and creates a security nightmare. Suppose someone finds a security vulnerability in libwhatever that has been fixed in a later version. I have around 1,000 programs installed on my workstation right now. How do I upgrade libwhatever if each of those thousand programs potentially has its own bundled copy of it? Am I supposed to go around to each program and attempt to update them all individually? How do I even find out what programs I have installed? I own more than one computer, by the way. Ain't nobody got time for that. People who think this is a good idea have never had to keep anything working for any amount of time.
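A rough sketch of the difference, assuming an invented inventory of three programs and an invented fix version for libwhatever. None of these names or numbers come from a real system.

```python
# Assumption: every libwhatever version before FIXED_IN is vulnerable.
FIXED_IN = (1, 2, 4)

# Hypothetical inventory: program -> version of libwhatever it bundles.
bundled = {
    "program-a": (1, 2, 4),
    "program-b": (1, 2, 3),
    "program-c": (1, 0, 0),
}


def still_vulnerable(copies):
    """Names of entries whose copy of the library predates the fix."""
    return sorted(name for name, version in copies.items() if version < FIXED_IN)


# Bundled: every program has to be chased down and updated on its own.
print(still_vulnerable(bundled))   # ['program-b', 'program-c']

# Shared: one system-wide copy, so one upgrade covers every program.
shared = {"libwhatever": (1, 2, 3)}
shared["libwhatever"] = FIXED_IN
print(still_vulnerable(shared))    # []
```

With three programs this is merely annoying. With a thousand, and no inventory to begin with, nobody even gets as far as building the `bundled` dictionary.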
What it comes down to is this: it's stupid to copy/paste code around. And a bundling package manager is essentially just a fancy interface for copy/pasting code into your project. Everyone knows that copy/pasting code is bad, and no self-respecting programmer (if there exists such a creature) would ever tell you otherwise in reference to (say) copying the body of a function. But when it comes to dependencies, motherfuckers' brains fall out. Some cognitive-dissonance-olympics-type shit lets these people maintain that copy/pasting a little bit of code is abhorrent, just absolutely unthinkable; but copy/pasting everything? That's a fucking great fucking way to solve this problem.
So the whole bundling idea is retarded but a lot of package managers do it. It can seem like a good idea as long as you don't ever write something useful: if there are only two decent programs that your package manager provides, well, it's not that annoying to have to update them both all the time. But once you have to babysit more than a few of them you'll realize that your package manager is fucked up.
Programs can be written in more than one programming language. Sometimes a program written in one language depends on a program written in another language. So an inevitable requirement of a package manager is that it should support more than one programming language. For example, you should be able to say that the Ruby bindings to PostgreSQL (written in Ruby) require libpq (written in C), because they fucking do.
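As a minimal sketch of why single-language package managers fall over here, assume some invented metadata where each package records its language. The name `ruby-pg` is a stand-in for the Ruby bindings, and `missing_deps` is a hypothetical helper.

```python
# Hypothetical metadata: package -> (language, dependencies).
PACKAGES = {
    "ruby-pg": ("ruby", ["libpq"]),
    "libpq": ("c", []),
}


def missing_deps(package, visible_languages):
    """Dependencies of `package` that the package manager cannot even see."""
    _, deps = PACKAGES[package]
    return [d for d in deps if PACKAGES[d][0] not in visible_languages]


# A Ruby-only package manager has no way to express or install libpq.
print(missing_deps("ruby-pg", {"ruby"}))       # ['libpq']

# One that knows about both languages can track the whole chain.
print(missing_deps("ruby-pg", {"ruby", "c"}))  # []
```

In the first case the dependency still exists; it just becomes your problem instead of the package manager's.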
This need becomes obvious about ten minutes after you finish writing your package manager. Unfathomably, most new package managers fail this most basic requirement.
No one wants to duplicate all of this work for every platform out there. A package manager should work on any platform within reason (it's cool to require Cygwin on Windows, where everything not completely missing completely sucks). Otherwise, we'd need more than one package manager, and that would be more than one work.
Here is where we pick on specific people in particular, using a table.
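The following table compares some popular package managers. It's set up so that a ✓ is a good thing, and an ✗ is bad. Here's what the columns mean, going by the complaints above: src means it can install source packages, global means it keeps the whole system consistent instead of bundling, x-lang means it handles packages written in more than one language, and x-plat means it works on more than one platform.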
pm (repo) | src | global | x-lang | x-plat |
---|---|---|---|---|
Apt (Debian) | ✗ | ✓ | ✓ | ✗ |
Bower (Bower registry) | ✓ | ✗ | ✗ | ✓ |
Bundler (RubyGems) | ✓ | ✓ | ✗ | ✓ |
Cabal (Hackage, Stackage) | ✓ | ✓ | ✗ | ✓ |
Cargo (Crates) | ✓ | ✗ | ✗ | ✓ |
CocoaPods (PodSpecs) | ✓ | ✗ | ✗ | ✗ |
Composer (Packagist) | ✓ | ✗ | ✗ | ✓ |
CPAN | ✓ | ✓ | ✗ | ✓ |
easy_install (PyPI) | ✓ | ✓ | ✗ | ✓ |
elm-package | ✓ | ✗ | ✗ | ✓ |
Emacs package.el (ELPA) | ✓ | ✗ | ✗ | ✓ |
Ivy (Maven Central) | ✓ | ✗ | ✗ | ✓ |
Leiningen (Maven Central) | ✓ | ✗ | ✗ | ✓ |
Maven (Maven Central) | ✓ | ✗ | ✗ | ✓ |
npm | ✓ | ✗ | ✗ | ✓ |
NuGet | ✓ | ✗ | ✗ | ✗ |
Octave (Octave Forge) | ✓ | ✓ | ✗ | ✓ |
Pacman (Arch Linux) | ✗ | ✓ | ✓ | ✓ |
PEAR | ✓ | ✓ | ✗ | ✓ |
pip (PyPI) | ✓ | ✓ | ✗ | ✓ |
R (CRAN) | ✓ | ✓ | ✗ | ✓ |
RubyGems | ✓ | ✓ | ✗ | ✓ |
SageMath (SPKGs) | ✓ | ✗ | ✓ | ✓ |
Stack (Stackage) | ✓ | ✗ | ✗ | ✓ |
Yum (Fedora) | ✗ | ✓ | ✓ | ✗ |
If I've made any mistakes in the table, it's not because I secretly hate your package manager and want to make it look bad: I overtly hate your package manager, and it is bad. Let me know and I might fix the table.
Among this list we have some outstanding achievement awards:
require statements so that your package is broken until you install it using Composer.
*applause*
It turns out, writing a package manager is hard (whaaaaaaattt). Dependency resolution algorithms are hard. Updating/rebuilding packages for ABI changes is hard. Ensuring atomic operation is hard. Cross-compilation is hard. Tracking installed files is kinda hard. To create a simple user interface for all that shit is unbelievably hard. The older package managers have been around for a long time—lots of research and work has gone into them and it's not because the authors were idiots.
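To give a taste of why dependency resolution is hard, here is a toy backtracking resolver over an invented repository. It is a sketch of the general idea only, not how Portage or any other real package manager does it; `REPO` and `resolve` are assumptions made up for this example.

```python
# Hypothetical repository: package -> version -> {dependency: allowed versions}.
REPO = {
    "app": {1: {"liba": {1, 2}, "libb": {1}}},
    "liba": {1: {"libc": {1}}, 2: {"libc": {2}}},
    "libb": {1: {"libc": {1}}},
    "libc": {1: {}, 2: {}},
}


def resolve(todo, chosen):
    """Return a dict package -> version satisfying all constraints, or None.

    todo:   list of (package, allowed_versions) still to satisfy
    chosen: versions picked so far
    """
    if not todo:
        return chosen
    (pkg, allowed), rest = todo[0], todo[1:]
    if pkg in chosen:
        # Already picked: the earlier choice must satisfy this constraint too.
        return resolve(rest, chosen) if chosen[pkg] in allowed else None
    # Try newest versions first, backtracking when a choice dead-ends.
    for version in sorted(allowed & set(REPO[pkg]), reverse=True):
        deps = list(REPO[pkg][version].items())
        result = resolve(rest + deps, {**chosen, pkg: version})
        if result is not None:
            return result
    return None


print(resolve([("app", {1})], {}))
# {'app': 1, 'liba': 1, 'libb': 1, 'libc': 1}
```

Grabbing the newest liba looks right and is wrong: liba 2 wants libc 2, libb wants libc 1, and the resolver has to back out and settle for liba 1. Now add build options, slots, ABI rebuilds, and tens of thousands of packages.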
Here's what's easy and fun: parsing a text file of dependencies, downloading them, and then copy/pasting them into a directory. Guess what most new package managers do? Mmmmhhhmmm. To understand why we have a table full of 25 projects that all attempt to do exactly the same thing and all fail in exactly the same ways, you must first understand the CADT model of development.
And the bundling is only getting worse. These days we have Docker and Vagrant which pretty much copy/paste an entire operating system into a container in order to run a single program. This only perpetuates the problem. Developers who use these things never realize that the libraries they're writing are a pain in the ass, so the libraries don't get fixed, so the only way to use them is to copy/paste them into your project…
I'm taking this opportunity to announce my new service, Computr, which entirely solves the dependency problem. Simply write the name of a package on a postcard, staple it to about $300, and mail it to me. Within a week, a brand new computer will show up on your doorstep with only that package installed. If you do that for every program you ever need to run for the rest of your life then you'll never hit another dependency conflict again.
There's only one package manager that aces the table above: Gentoo Portage/Prefix. Portage is the standard package manager on Gentoo Linux. The Gentoo Prefix project allows it to work anywhere. (The “Portage” part of this equation is somewhat flexible; there are other package managers like Paludis and pkgcore that use the same repository and ebuild format. That format is described in the Package Manager Specification.)
Here's how we fix this mess. The next time you invent your own programming language and think, “I should write a package manager for my language,” just go right ahead and unthink it. Instead, learn Gentoo Prefix, and begin contributing packages for your language to Gentoo. Then, document the typical usage instructions for your users, and tell them that Portage/Prefix is the package manager for your language. Yes, this is more work and less fun than writing a JSON parser and wget front-end. Sorry, quit being a bitch. It also has the potential to work, which your home-grown package manager lacks.
Once your new language has packages in the Gentoo repository, they become available to the tens of thousands of existing Gentoo (and its derivatives') users. And also to anyone using Gentoo Prefix on some other system. Oh, and all of these people can help you maintain your packages. Plus, if your library authors are using it, then they'll know when their library causes a conflict, because they won't be able to install it anymore. So they'll fix it, and shit will work, and you'll save about $300.
If all of the projects in the table did this tomorrow, dependency hell would cease to exist. But unfortunately, the sunk cost fallacy states, “fat chance asshole.” Language-specific, bundling package managers are going to stick around because it makes their authors feel cruddy to admit that their idea was dumb and throw everything away. So instead, they'll keep making their projects slightly-less-but-still-fundamentally fucked up over time. But maybe there's a tiny bit of hope for new languages. (Did you actually expect me to have a plan? I just needed to yell into Emacs for a while.)
Something about a carrot and a stick: if you're doing development in e.g. Ruby or Haskell, come give Gentoo (Prefix) a try. We don't have RubyGems hell or Cabal hell here: everything just works. Although, you'll probably want to use the Ruby overlay or Haskell overlay to get the most up-to-date packages.