Michael Orlitzky { Healthy OpenRC recipes to kick off the new year }

This is an XHTML version of the OpenRC service script guide. This version looks a little nicer, but the upstream one is guaranteed to be up-to-date. I have incorporated some suggestions and fixes from William Hubbs that weren't in my original text.

This document is aimed at developers or packagers who write OpenRC service scripts, either for their own projects, or for the packages they maintain. It contains advice, suggestions, tips, tricks, hints, and counsel; cautions, warnings, heads-ups, admonitions, proscriptions, enjoinders, and reprimands.

It is intended to prevent common mistakes that are found “in the wild” by pointing out those mistakes and suggesting alternatives. Each good/bad thing that you should/not do has a section devoted to it. We don't consider anything exotic, and assume that you will use start-stop-daemon to manage a fairly typical long-running UNIX process.

Don't write your own start/stop functions

OpenRC is capable of stopping and starting most daemons based on the information that you give it. For a well-behaved daemon that backgrounds itself and writes its own PID file by default, the following three OpenRC variables are likely all that you'll need: command, command_args, and pidfile.

Given those three pieces of information, OpenRC will be able to start and stop the daemon on its own. The following is taken from an OpenNTPD service script:

command="/usr/sbin/ntpd"

# The special RC_SVCNAME variable contains the name of this service.
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-p ${pidfile}"

If the daemon runs in the foreground by default but has options to background itself and to create a pidfile, then you'll also need command_args_background.

That variable should contain the flags needed to background your daemon, and to make it write a PID file. Take for example the following snippet of an NRPE service script:

command="/usr/bin/nrpe"
command_args="--config=/etc/nagios/${RC_SVCNAME}.cfg"
command_args_background="--daemon"
pidfile="/run/${RC_SVCNAME}.pid"

Since NRPE runs as root by default, it needs no special permissions to write to /run/nrpe.pid. OpenRC takes care of starting and stopping the daemon with the appropriate arguments, even passing the --daemon flag during startup to force NRPE into the background. NRPE knows how to write its own PID file, and presumably the PID file path in ${RC_SVCNAME}.cfg agrees with $pidfile in the service script above.

But what if the daemon isn't so well behaved? What if it doesn't know how to background itself or create a pidfile? If it can do neither, then use command_background=true, which will background the daemon and additionally pass --make-pidfile to start-stop-daemon, causing it to create the $pidfile for you (rather than the daemon itself being responsible for creating the PID file).
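
As a minimal sketch, the variables for such a daemon might look like the following. The daemon name food, its path, and its --config flag are hypothetical; only the OpenRC variable names are real.

```shell
# Hypothetical daemon "food": it stays in the foreground and never
# writes a PID file, so start-stop-daemon must do both jobs for it.
command="/usr/bin/food"
command_args="--config /etc/food.conf"

# Run the daemon in the background and have start-stop-daemon create
# the PID file on its behalf.
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"
```

The pidfile variable must still be set here, since it tells start-stop-daemon where to write the file.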

If your daemon doesn't know how to change its own user or group, then you can tell start-stop-daemon to launch it as an unprivileged user with command_user="user:group".

Finally, if your daemon always forks into the background but fails to create a PID file, then your only option is to use procname.

With procname, OpenRC will try to find the running daemon by matching the name of its process. That's not so reliable, but daemons shouldn't background themselves without creating a PID file in the first place. The next example is part of the CA NetConsole Daemon service script:

command="/usr/sbin/cancd"
command_args="-p ${CANCD_PORT}
              -l ${CANCD_LOG_DIR}
              -o ${CANCD_LOG_FORMAT}"
command_user="cancd"

# cancd daemonizes itself, but doesn't write a PID file and doesn't
# have an option to run in the foreground. So, the best we can do
# is try to match the process name when stopping it.
procname="cancd"

Reloading your daemon's configuration

Many daemons will reload their configuration files in response to a signal. Suppose your daemon will reload its configuration in response to a SIGHUP. It's possible to add a new “reload” command to your service script that performs this action. First, tell the service script about the new command.

extra_started_commands="reload"

We use extra_started_commands as opposed to extra_commands because the “reload” action is only valid while the daemon is running (that is, started). Now, start-stop-daemon can be used to send the signal to the appropriate process (assuming you've defined the pidfile variable elsewhere):

reload() {
  ebegin "Reloading ${RC_SVCNAME}"
  start-stop-daemon --signal HUP --pidfile "${pidfile}"
  eend $?
}

Don't restart/reload with a broken config

Often, users will start a daemon, make some configuration change, and then attempt to restart the daemon. If the recent configuration change contains a mistake, the result will be that the daemon is stopped but then cannot be started again (due to the configuration error). It's possible to prevent that situation with a function that checks for configuration errors, and a combination of the start_pre and stop_pre hooks.

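checkconfig() {
  # However you want to check this; return nonzero if the config is
  # broken. A shell function body can't be empty, so ":" stands in
  # here as a no-op until a real check is written.
  :
}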

start_pre() {
  # If this isn't a restart, make sure that the user's config isn't
  # busted before we try to start the daemon (this will produce
  # better error messages than if we just try to start it blindly).
  #
  # If, on the other hand, this *is* a restart, then the stop_pre
  # action will have ensured that the config is usable and we don't
  # need to do that again.
  if [ "${RC_CMD}" != "restart" ] ; then
    checkconfig || return $?
  fi
}

stop_pre() {
  # If this is a restart, check to make sure the user's config
  # isn't busted before we stop the running daemon.
  if [ "${RC_CMD}" = "restart" ] ; then
    checkconfig || return $?
  fi
}

reload() {
  checkconfig || return $?
  ebegin "Reloading ${RC_SVCNAME}"
  start-stop-daemon --signal HUP --pidfile "${pidfile}"
  eend $?
}
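
What checkconfig does depends entirely on the daemon. As one possible sketch, suppose a hypothetical daemon foo accepts a --check-config flag that exits nonzero on a bad configuration. The flag, the FOO_CONFIG variable, and the paths are all assumptions made for illustration; eerror is OpenRC's usual function for printing an error message.

```shell
# Hypothetical daemon and config path; adjust for a real service.
command="/usr/bin/foo"
FOO_CONFIG="/etc/foo.conf"

checkconfig() {
  # Fail early, with a clear message, if the file can't be read at all.
  if [ ! -r "${FOO_CONFIG}" ] ; then
    eerror "Cannot read ${FOO_CONFIG}"
    return 1
  fi

  # Assumption: "--check-config FILE" validates the file and exits
  # nonzero on error, without starting the daemon.
  if ! "${command}" --check-config "${FOO_CONFIG}" >/dev/null 2>&1 ; then
    eerror "${FOO_CONFIG} contains errors"
    return 1
  fi
}
```

Many real daemons offer something comparable under a different name, so check the daemon's own documentation before settling on a flag.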

PID files should be writable only by root

PID files must be writable only by root, which means additionally that they must live in a root-owned directory.

Some daemons run as an unprivileged user account, and create their PID files (as the unprivileged user) in a path like /run/foo/foo.pid. That can usually be exploited by the unprivileged user to kill root processes, since when a service is stopped, root usually sends a SIGTERM to whatever process ID is stored in the PID file (and the contents of that file are controlled by the unprivileged user). The main warning sign for that problem is using checkpath to set ownership on the directory containing the PID file. For example,

# BAD BAD BAD BAD BAD BAD BAD BAD
start_pre() {
  # Ensure that the pidfile directory is writable by the foo user.
  checkpath --directory --mode 0700 --owner foo:foo "/run/foo"
}
# BAD BAD BAD BAD BAD BAD BAD BAD

If the foo user owns /run/foo, then he can put whatever he wants in the /run/foo/foo.pid file. Even if root owns the PID file, the foo user can delete it and replace it with his own. To avoid security concerns, the PID file must be created as root and live in a root-owned directory. If your daemon is responsible for forking and writing its own PID file but the PID file is still owned by the unprivileged runtime user, then you may have an upstream issue.

Once the PID file is being created as root (before dropping privileges), it can be written directly to a root-owned directory. Typically this will be /run on Linux, and /var/run elsewhere. For example, the foo daemon might write /run/foo.pid. No calls to checkpath are needed. Note: there is nothing technically wrong with using a directory structure like /run/foo/foo.pid, so long as *root* owns the PID file and the directory containing it.

pidfile="@piddir@/${RC_SVCNAME}.pid"

Here @piddir@ is a placeholder that the build system replaces with the appropriate directory. A decent example of this is the Nagios core service script, where the full path to the PID file is specified at build-time.

Don't let the user control the PID file location

It's usually a mistake to let the end user control the PID file location through a conf.d variable, for a few reasons:

pidfile="/run/${RC_SVCNAME}.pid"

Upstream your service scripts (for packagers)

The ideal place for an OpenRC service script is upstream. Much like systemd services, a well-crafted OpenRC service script should be distribution-agnostic, and the best place for it is upstream. Why? For two reasons. First, having it upstream means that there's a single authoritative source for improvements. Second, a few paths in every service script are dependent upon flags passed to the build system. For example,

command=/usr/bin/foo

in an autotools-based build system should really be

command=@bindir@/foo

so that the user's value of --bindir is respected. If you keep the service script in your own distribution's repository, then you have to keep the command path and package synchronized yourself, and that's no fun.
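
To illustrate the substitution itself, here is a rough sketch of what a build system does when it turns a template into the installed service script. The values of bindir and piddir and the template contents are made up for the example; a real project would let its configure script or Makefile perform this step.

```shell
# Hypothetical install-time values; a real build system supplies these.
bindir="/usr/local/bin"
piddir="/run"

# Stand-in for the contents of a template file. The single quotes keep
# ${RC_SVCNAME} literal, since OpenRC expands it at runtime, not now.
template='command=@bindir@/foo
pidfile="@piddir@/${RC_SVCNAME}.pid"'

# Replace each @placeholder@ with its configured value.
rendered=$(printf '%s\n' "$template" \
  | sed -e "s|@bindir@|${bindir}|g" -e "s|@piddir@|${piddir}|g")

printf '%s\n' "$rendered"
```

With the values above, the rendered script sets command=/usr/local/bin/foo and places the PID file under /run, without anyone editing the service script by hand.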

Be wary of “need net” dependencies

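There are two things to know about “need net” dependencies. First, they are not satisfied by the loopback interface, so “need net” requires some other interface to be up. Second, depending on the value of rc_depend_strict in rc.conf, “need net” is satisfied either when any non-loopback interface is up or when all of them are, never by one interface in particular.

The first item means that “need net” is wrong for daemons that are happy with 0.0.0.0, and the second point means that “need net” is wrong for daemons that need a particular (for example, the WAN) interface. We'll consider the two most common users of “need net”: network clients who access network resources, and network servers who provide them.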

Network clients

Network clients typically want the WAN interface to be up. That may tempt you to depend on the WAN interface; but first, you should ask yourself a question: does anything bad happen if the WAN interface is not available? In other words, if the administrator wants to disable the WAN, should the service be stopped? Usually the answer to that question is “no,” and in that case, you should forgo the “net” dependency entirely.

Suppose, for example, that your service retrieves virus signature updates from the internet. In order to do its job correctly, it needs a (working) internet connection. However, the service itself does not require the WAN interface to be up: if it is, great; otherwise, the worst that will happen is that a “server unavailable” warning will be logged. The signature update service will not crash, and—perhaps more importantly—you don't want it to terminate if the administrator turns off the WAN interface for a second.

Network servers

Network servers are generally easier to handle than their client counterparts. Most server daemons listen on 0.0.0.0 (all addresses) by default, and are therefore satisfied to have the loopback interface present and operational. OpenRC ships with the loopback service in the boot runlevel, and therefore most server daemons require no further network dependencies.

The exceptions to this rule are daemons that produce negative side effects when the WAN is unavailable. For example, the Nagios server daemon will generate “the sky is falling” alerts for as long as your monitored hosts are unreachable. So in that case, you should require some other interface (often the WAN) to be up. A “need” dependency would be appropriate, because you want Nagios to be stopped before the network is taken down.
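
In a service script, that kind of hard dependency is declared in the depend function. The sketch below assumes the WAN interface is managed by a service named net.wan, which is only a placeholder; the actual name depends on how the network is configured on the system.

```shell
depend() {
  # Hard dependency: this service is started only after net.wan is up,
  # and it is stopped before net.wan is taken down.
  need net.wan
}
```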

If your daemon can optionally be configured to listen on a particular interface, then…

Depending on a particular interface

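If you need to depend on one particular interface, usually it's not easy to determine programmatically what that interface is. For example, if your sshd daemon listens on 192.168.1.100 (rather than 0.0.0.0), then you have two problems: parsing sshd_config to discover that address, and determining which network service corresponds to the interface that holds it.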

It's generally a bad idea to parse config files in your service scripts, but the second problem is the harder one. Instead, the most robust approach is to make the user specify the dependency when he makes a change to sshd_config. Include something like the following in the service configuration file,

# Specify the network service that corresponds to the "bind" setting
# in your configuration file. For example, if you bind to 127.0.0.1,
# this should be set to "loopback" to require the loopback interface.
rc_need="loopback"

This is a sensible default for daemons that are happy with 0.0.0.0, but lets the user specify something else like rc_need="net.wan" if he needs it. The burden is on the user to determine the right dependency whenever he changes the daemon's configuration file.
