Process supervision: runit (and supervise)

The SME Server addresses these issues by running processes under the runit process supervision environment, which:

Note: Gerrit Pape's runit came from previous work by Dan Bernstein on the supervise supervision environment. runit provides additional features, and has been released under a free software license.

The runit process tree

When a Linux system boots, it starts the init process, which then starts all other processes. When init enters "run-level 7", it starts /etc/runit/2 from an entry in /etc/inittab.

/etc/runit/2 starts the runsvdir master supervision process, which scans the /service/ directory for work to do. If the runsvdir command happened to fail, it would be restarted by init.

The runsvdir command looks for subdirectories under the /service/ directory, and starts a runsv process to manage that directory. If any of the runsv processes fail, they will be restarted by runsvdir.

Each runsv process looks for a run script under the directory it is managing. runsv runs the run script and keeps a connection to the process started by that script. If the process dies, it is restarted.

If the directory also has a log subdirectory, runsv runs run script in that directory and connects the output of the main program to the input of the "logger" process.

This produces a process tree which looks something like this:

[root@gsxdev1 events]# pstree 1
     | ...
     |          |       `-ulogd
     |          |-6*[runsv---multilog]
     |          |-runsv-+-multilog
     |          |       `-ntpd
     |          |-runsv-+-multilog
     |          |       `-tinydns
     |          |-runsv-+-cvm-unix
     |          |       `-multilog
     |          |-runsv-+-multilog
     |          |       `-mysqld
     |          |-5*[runsv-+-multilog]
     |          |          `-tcpsvd]
     |          |-runsv-+-multilog
     |          |       `-oidentd
     |          |-runsv-+-multilog
     |          |       `-smtp-auth-proxy
     |          |-runsv-+-multilog
     |          |       `-smbd---smbd
     |          |-runsv---httpd---10*[httpd]

This looks like a complex process tree, but is a critical part of the SME Server's design for reliability. Each process is independent, has a consistent management interface, has process limits imposed on it, and will restart if it happens to fail.

Note: For the curious, if init fails, the system reboots.

For further documentation on runit, refer to the runit manual page.

Run-level 7 and the e-smith-service wrapper

The SME Server runs in the normally unused run-level 7. This ensures that the only software running on the SME Server is software that we have chosen to run, and it is started and stopped in a consistent way. If we need to replace a standard startup script with one which runs the process under supervise, we can do so without modifying the original package.

In order to run a process under run-level 7, all you need to do is provide a link in the /etc/rc.d/rc7.d/ directory to your startup script. However, in most cases your process should only start if it is enabled in the configuration database.

If you look at the /etc/rc.d/rc7.d/ directory. you will see that it contains a large number of links to the /etc/rc.d/init.d/e-smith-service script.

S00microcode_ctl     -> /etc/rc.d/init.d/e-smith-service
S05syslog            -> /etc/rc.d/init.d/e-smith-service
S06cpuspeed          -> /etc/rc.d/init.d/e-smith-service
S15nut               -> ../init.d/e-smith-service
S15raidmonitor       -> /etc/rc.d/init.d/e-smith-service
S26apmd              -> /etc/rc.d/init.d/e-smith-service
S35bootstrap-console -> /etc/rc.d/init.d/e-smith-service

This script is key to ensuring that services start when they are enabled and do not start when they are disabled, as it:

Note: If a script exists in the /etc/init.d/supervise/ directory, e-smith-service will use that in preference to the one in the /etc/init.d/ directory. This allows us to install our own supervised startup scripts without modifying the original package.