posted 2018-12-21
Before version 240, the systemd-tmpfiles program would follow symlinks present in a non-terminal path component while adjusting permissions and ownership. Often, and particularly with Z type entries, an attacker can introduce such a symlink and take control of arbitrary files on the system to gain root. The fs.protected_symlinks sysctl does not prevent this attack. Version 239 contained a partial fix, but only for the easy-to-exploit recursive Z type entries.
The systemd-tmpfiles program tries to avoid following symlinks in the last component of a path. To that end, the following trick is used in src/tmpfiles/tmpfiles.c:
fd = open(path, O_NOFOLLOW|O_CLOEXEC|O_PATH);
...
xsprintf(fn, "/proc/self/fd/%i", fd);
...
if (chown(fn, ...
The call to chown will follow the /proc/self/fd/%i symlink, but only once; it will then operate on the real file described by fd.
However, there is another way to exploit the code above. The call to open() will follow symlinks if they appear in a non-terminal component of path, even with the O_NOFOLLOW flag set. Citing the open(2) man page,
O_NOFOLLOW
If pathname is a symbolic link, then the open fails, with the error ELOOP. Symbolic links in earlier components of the pathname will still be followed.
So, for example, if the path variable contains /run/foo/a/b and a is a symlink, then open() will follow it. If systemd-tmpfiles changes the ownership of /run/foo/a/b after that of /run/foo, then the owner of /run/foo can exploit that fact to gain root by replacing /run/foo/a with a symlink. With a Z-type tmpfiles.d entry, the attacker can create this situation on their own.
The fs.protected_symlinks sysctl does not protect against these sorts of attacks. Due to the widespread and legitimate use of symlinks in situations like these, the symlink protection is much weaker than the corresponding hardlink protection.
Consider the following entry in /etc/tmpfiles.d/exploit-recursive.conf:
After systemd-tmpfiles has run once, my mjo user will own that directory:
mjo $ sudo ./build/systemd-tmpfiles --create
mjo $ ls -ld /var/lib/systemd-exploit-recursive
drwxr-xr-x 2 mjo mjo 4.0K 2018-02-13 09:38 /var/lib/systemd-exploit-recursive
At this point, I am able to create a directory foo and a file foo/passwd under /var/lib/systemd-exploit-recursive. The next time that systemd-tmpfiles is run (perhaps after a reboot), the following function will be called on foo:
static int
item_do_children(Item *i, const char *path, action_t action) {
        _cleanup_closedir_ DIR *d;
        struct dirent *de;
        int r = 0;

        assert(i);
        assert(path);

        /* This returns the first error we run into, but nevertheless
         * tries to go on */
        d = opendir_nomod(path);
        if (!d)
                return IN_SET(errno, ENOENT, ENOTDIR, ELOOP) ? 0 : -errno;

        FOREACH_DIRENT_ALL(de, d, r = -errno) {
                _cleanup_free_ char *p = NULL;
                int q;

                if (dot_or_dot_dot(de->d_name))
                        continue;

                p = strjoin(path, "/", de->d_name);
                if (!p)
                        return -ENOMEM;

                q = action(i, p);
                if (q < 0 && q != -ENOENT && r == 0)
                        r = q;

                if (IN_SET(de->d_type, DT_UNKNOWN, DT_DIR)) {
                        q = item_do_children(i, p, action);
                        if (q < 0 && r == 0)
                                r = q;
                }
        }

        return r;
}
The FOREACH_DIRENT_ALL macro defers to readdir(3), and thus requires the real directory stream pointer for foo, because we want it to see foo/passwd. However, while the macro is iterating, q = action(i, p) is performed on p, which consists of path = "foo" and some filename de->d_name, but without reference to its file descriptor. So, between the time that item_do_children() is called on foo and the time that q = action(i, p) is run on foo/passwd, I have the opportunity to replace foo with a symlink to /etc, causing /etc/passwd to be affected by the change of ownership and permissions.
But there's more: the FOREACH_DIRENT_ALL macro processes the contents of foo in whatever order readdir returns them. Since mjo owns foo, I can fill it with junk to buy myself as much time as I like before foo/passwd is reached:
mjo $ cd /var/lib/systemd-exploit-recursive
mjo $ mkdir foo
mjo $ cd foo
mjo $ echo $(seq 1 500000) | xargs touch
mjo $ touch passwd
Now, restarting systemd-tmpfiles will change ownership of all of those files…
mjo $ sudo ./build/systemd-tmpfiles --create
and it takes some time for it to process the 500,000 dummy files before reaching foo/passwd. At my leisure, I can replace foo with a symlink:
mjo $ cd /var/lib/systemd-exploit-recursive
mjo $ mv foo bar && ln -s /etc ./foo
After some time, systemd-tmpfiles will eventually reach the path foo/passwd, which now points to /etc/passwd, and grant me root access.
A similar but more difficult attack works against non-recursive entry types. Consider the following tmpfiles.d entries:
d /var/lib/systemd-exploit 0755 mjo mjo
d /var/lib/systemd-exploit/foo 0755 mjo mjo
f /var/lib/systemd-exploit/foo/bar 0755 mjo mjo
After /var/lib/systemd-exploit/foo is created but before the permissions are adjusted on /var/lib/systemd-exploit/foo/bar, there is a short window of opportunity for me to replace foo with a symlink to (for example) /etc/env.d. If I'm fast enough, tmpfiles will open foo/bar, following the foo symlink, and give me ownership of something sensitive in the /etc/env.d directory. However, this attack is more difficult because I can't arbitrarily widen my own window of opportunity with junk files, as was possible with the Z type entries.
Commit 936f6bdb, which is present in systemd v239, changes the recursive loop in two important ways. First, it passes file descriptors, rather than parent paths, to each successive iteration. That allows the next iteration to use the openat() system call, eliminating the non-terminal path components from the equation. Second, it ensures that each “open” call has the O_NOFOLLOW and O_PATH flags to prevent symlinks from being followed at the current depth. Note: only the recursive loop was made safe; the call to open() on the top-level path will still follow non-terminal symlinks and is vulnerable to the second attack above.
The commits in pull request 8822 aim to make everything safe from this type of symlink attack. As far as tmpfiles is concerned, the main idea is to use the chase_symlinks() function in place of the open() system call. Since chase_symlinks() calls openat() recursively, starting from the root, it will never follow a non-terminal symlink. Commit 1f56e4ce then introduces the CHASE_NOFOLLOW flag for that function, preventing it from following terminal symlinks. In subsequent commits (e.g. addc3e30), the consumers of chase_symlinks() were updated to pass CHASE_NOFOLLOW to it, preventing them from following any symlinks.
The complete fix is available in systemd v240.