
Who Watches the Watchers: Using Upstart to Manage Monit Which Manages Unicorn

So I finally have a spiffy new web application deployed on my VPS, and I’m ready to flip the switch and share it with the world…but what happens when my server reboots? Or what happens when Unicorn shuts down and doesn’t restart? Or maybe Unicorn is consuming too many system resources and I need to kill workers or restart the master process? What should I do? How should I set all this up?

In short, my solution consists of using Upstart to manage Monit and ensure it is always running, while Monit is configured to manage my Unicorn master and worker processes. I installed Monit and set it up with Upstart by hand over SSH, while the Monit service tests get pushed to the server when I do Capistrano deployments. We’ll dive into the details below…

Technology

  • Unicorn 4.3.1
  • Rackspace Server (256MB RAM), Ubuntu 12.04
  • Capistrano 2.12.0
  • Upstart
  • Monit

First Task: Install and Configure Monit

Installing from a package is easiest, but you could install from source as well.

$ sudo apt-get install monit

Once installed, you’ll find Monit’s configuration at /etc/monit/. In that directory you’ll see a monitrc file, which is the Monit control file with lots of settings, and a conf.d directory, which will store your Monit service tests.

The monitrc file is pretty well commented, so I’ll only list what I consider the most important settings (illustrated in the snippet after this list):

  • set daemon [value] - value, in seconds, represents how often the daemon will check the service tests
  • with start delay [value] - value, in seconds, represents how long Monit should wait after it loads up before it starts checking services. You can comment out or remove this line if you have no need for Monit to wait.
  • include /etc/monit/conf.d/* - I write one service test per file and store them in this directory
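To make that concrete, here’s roughly how those settings look in monitrc (the interval and delay values are just examples, not necessarily what I use):

set daemon 30                 # run the service tests every 30 seconds
  with start delay 120        # wait 2 minutes after Monit loads before the first check

include /etc/monit/conf.d/*   # one service test per file lives in this directory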

Additionally, the HTTPD section of the control file is extremely important, and you can read all about its options in the Monit HTTPD documentation. I say it’s important because without it enabled, every time you try to communicate with the Monit daemon you’ll see output like:

user@vps:/etc/monit$ sudo monit status
monit: error connecting to the monit daemon

So…we need to enable http support to communicate with the Monit daemon, but we care about security on our server, right? RIGHT?! Well fortunately for us, Monit takes security seriously as well, and we have to whitelist connections in order for them to interact with Monit’s built-in web server. Nice!

I kept my configuration very simple, since I only plan to access this server over SSH and didn’t need the fancy Monit web interface; I just needed to be able to communicate with the Monit daemon. Here’s my HTTPD config, which basically says “only listen on the loopback interface on port 2812, and whitelist localhost connections”:

set httpd port 2812 and use the address 127.0.0.1
  allow localhost

Iptables is out of the scope of this post, but I highly recommend you also configure iptables on your server to maximize security. My iptables rules are fairly locked down, with some exceptions for Nginx, SSH, the loopback interface, and a few others.

Note: Monit won’t pick up config changes until it’s reloaded; if you’re following along with this guide, we’ll handle that later on.
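For reference, when you do want to reload by hand it’s a single command:

$ sudo monit reload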

Second Task: Use Upstart to Manage Monit

“So why Upstart and not init.d?” you ask. Great question! Honestly, it doesn’t matter too much…but Upstart is the future, and it’s already loaded on Ubuntu, so you might as well get used to using it! I guess a better question would be: why not Upstart? I don’t know! Use it!

The Monit website has good instructions for adding Monit daemon configuration to Upstart, but I’ll list the steps here as well:

  1. Stop Monit and remove the default init.d configuration:
    $ sudo /etc/init.d/monit stop && sudo update-rc.d -f monit remove
  2. Create the Upstart Monit service configuration:
    $ sudo nano /etc/init/monit.conf
  3. Paste in the following text (again from the Monit website):
    description "Monit service manager"
        limit core unlimited unlimited
    
        start on runlevel [2345]
        stop on runlevel [!2345]
    
        expect daemon
        respawn
    
        exec /usr/bin/monit -c /etc/monit/monitrc
        pre-stop exec /usr/bin/monit -c /etc/monit/monitrc quit

Note: The last two lines above are correct for my installation on Ubuntu, but the Monit website has them listed as:

exec /usr/local/bin/monit -c /etc/monitrc
pre-stop exec /usr/local/bin/monit -c /etc/monitrc quit

The respawn stanza tells Upstart that if the daemon exits without the Upstart goal being changed to stop, Upstart should start the daemon again. You can read about Upstart configuration to your heart’s content at the Upstart Cookbook.

Upstart watches its configuration directories so you shouldn’t need to refresh Upstart to pick up the new Monit service. At this point we can test our setup and verify that Monit is working. Upstart syntax is a bit different from init.d syntax and you’ll use the following format for interacting with Monit:

$ sudo start monit
$ sudo stop monit

Here are some commands we can run to verify Monit is working:

user@vps:/etc/monit$ sudo monit status
monit: Status not available -- the monit daemon is not running
user@vps:/etc/monit$ sudo start monit
monit start/running, process 4773
user@vps:/etc/monit$ sudo stop monit
monit stop/waiting
user@vps:/etc/monit$ sudo start monit
monit start/running, process 4814
user@vps:/etc/monit$ ps aux | grep monit
root      4814  0.0  0.5 104128  1224 ?        Sl   21:40   0:00 /usr/bin/monit -c /etc/monit/monitrc
user@vps:/etc/monit$ sudo kill 4814
user@vps:/etc/monit$ ps aux | grep monit
root     21459  0.0  0.4 104128  1208 ?        Sl   21:41   0:00 /usr/bin/monit -c /etc/monit/monitrc

Awesome! Now we have Upstart to keep Monit running, the Monit daemon responds to our commands, and the only thing left to do is tell Monit what it should be monitoring!

Third Task: Teach Monit How to Care for Unicorns

And now you say to me, “Why are you using Monit? It’s so linuxy, and there’s this cool Ruby process monitoring framework called God…and you like Ruby, don’t you?” I think Ruby is an amazing language and I briefly looked into using God (which looks like an awesome project, so check it out), but Monit felt like the right solution to me; it’s tiny, easy to configure, and when it comes to server administration, I like to use traditional Linux tools when I can. That said, I’d like to play with God in the future and compare the two tools.

My Monit-Unicorn configuration is heavily inspired (read: I owe a lot of this code to Andrew Grim) by a blog post from 2010 called Where Unicorns go to die: Watching unicorn workers with monit. So go take a look at his blog!

I have one ERB service test template for master Unicorn processes, and one ERB service test template for worker processes (both get processed and uploaded to my server by Capistrano). As I mentioned above, I keep each service test in a separate file, and my templates make that easy to do.

Note: You should tailor the system resource values based on your application’s profile and the server you’re using.

My master template is pretty straightforward (a rough sketch follows the list below). My Unicorn service is named according to the application and environment I’m deploying to, and the check process top line matches that format. You can see that whenever Monit runs this service test (according to the daemon setting mentioned above), it will either:

  1. Send a start signal to the service if it isn’t already running (default Monit behavior)
  2. Send a stop signal to the correct Unicorn service based on specific system resource thresholds that would trigger a restart (note: a restart in Monit is done by running stop and then start). Easy!
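As a rough sketch of the idea (the application/environment name, paths, and thresholds below are placeholders rather than my real values), a master service test along these lines looks something like this:

check process unicorn_myapp_production
  with pidfile /var/www/myapp/shared/pids/unicorn.pid
  start program = "/etc/init.d/unicorn_myapp_production start"
  stop program = "/etc/init.d/unicorn_myapp_production stop"
  if totalmem > 100.0 MB for 2 cycles then restart
  if cpu > 80% for 3 cycles then restart
  group unicorn_myapp_production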

The worker template is still pretty straightforward, but right now you’re probably looking at it and wondering where <%= val %> comes from. It will all become clear if you look at my Capistrano recipe. First off, I’m using multistage, and in each of my environment config files (e.g. /rails_root/config/deploy/production.rb, /rails_root/config/deploy/staging.rb, etc.), I set up the two variables listed below which govern how many workers I will have for that environment’s master Unicorn process.

So let’s pretend I set unicorn_workers to “4” in staging.rb. As you’ll see below in my Capistrano recipe, this will create four Monit worker service tests, named monit_unicorn_#{application}_#{rails_env}_#{val}_worker, where #{val} represents the current iteration through the number of workers specified.
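As a sketch of that stage configuration (unicorn_workers is the only one of those variables named in this post, and the value is just an example):

# config/deploy/staging.rb (illustrative)
set :rails_env, "staging"
set :unicorn_workers, 4   # drives how many worker service tests get generated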

OK, but that doesn’t explain the start and stop worker lines in the worker service test…so what’s that about? Unicorn master processes do a great job on their own of managing workers, so I want to leave spawning workers up to the master. Monit expects to be the one keeping things running and needs that start task, so I give it a placebo of /bin/true to make it feel important (Thanks to Andrew Grim!).

The stop program references a specific Unicorn service and calls the kill_worker task, passing in the value which represents that worker’s pid file (Thanks, again, to Andrew Grim). You can see my complete Unicorn init script here.
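A worker service test in that style might look roughly like the sketch below; the names, paths, and memory threshold are placeholders, and the kill_worker invocation is simply my reading of how the init script gets called with that worker’s number:

check process unicorn_myapp_production_0_worker
  with pidfile /var/www/myapp/shared/pids/unicorn.worker.0.pid
  start program = "/bin/true"
  stop program = "/etc/init.d/unicorn_myapp_production kill_worker 0"
  if totalmem > 70.0 MB for 2 cycles then restart
  group unicorn_myapp_production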

At this point, we just need to add the process id to each worker pid file. This can be done simply enough using Unicorn’s after_fork hook: get the current process id with Process.pid and write it out to a file. In my case, I take the Unicorn pid file’s name and sub “.pid” for “.worker.#{worker.nr}.pid”.

after_fork do |server, worker|
  # Derive a per-worker pid file from the master pid file, e.g. unicorn.pid -> unicorn.worker.0.pid
  worker_pid = server.config[:pid].sub('.pid', ".worker.#{worker.nr}.pid")
  # Record this worker's pid so its Monit service test can watch it
  system("echo #{Process.pid} > #{worker_pid}")
end

Fourth Task: Deploying our Monit Service Tests with Capistrano

We’re almost done. At this point we have installed Monit, configured Monit with Upstart, modified Unicorn’s config to output worker pid files, and created the Monit service tests to look after our Unicorn masters and workers across any number of applications/environments we deploy on our server.

Fortunately, the deployment part is pretty easy. You can read through my Monit Capistrano recipe below, which is fairly self-describing, and then I’ll point out a few key pieces.

I didn’t originally have the monit:monitor and monit:unmonitor tasks, but after a couple of wonky deployments, where tweaking things left me with pid files that didn’t match the running processes and Monit watching a totally different set of pids, I decided it would be a good idea to resync things on deployment. The vast majority of the time I only run two Capistrano commands…cap deploy and cap deploy:setup…so my before and after hooks are set up for those tasks. I start each of those tasks by unmonitoring, and end them by re-enabling monitoring.

Additionally, after the monit:unmonitor task, I run the monit:unicorn_service_tests task which pushes my master and worker service tests to the server in case they’ve been changed, and then reloads the Monit configuration.
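Since I’m only summarizing the recipe here, this sketch captures its general shape using the task names above; the template locations, temp paths, and exact hook wiring are illustrative rather than lifted straight from my recipe:

# config/deploy/monit.rb (sketch)
require "erb"

namespace :monit do
  desc "Stop monitoring this app/env's Unicorn master and workers"
  task :unmonitor, :roles => :app do
    run "#{sudo} monit -g unicorn_#{application}_#{rails_env} unmonitor all"
  end

  desc "Resume monitoring this app/env's Unicorn master and workers"
  task :monitor, :roles => :app do
    run "#{sudo} monit -g unicorn_#{application}_#{rails_env} monitor all"
  end

  desc "Push the master and worker service tests to the server, then reload Monit"
  task :unicorn_service_tests, :roles => :app do
    # Render and upload the master service test
    master = ERB.new(File.read("config/monit/unicorn_master.erb")).result(binding)
    put master, "/tmp/monit_unicorn_#{application}_#{rails_env}"
    run "#{sudo} mv /tmp/monit_unicorn_#{application}_#{rails_env} /etc/monit/conf.d/"

    # Render and upload one worker service test per configured worker
    unicorn_workers.to_i.times do |val|
      worker = ERB.new(File.read("config/monit/unicorn_worker.erb")).result(binding)
      put worker, "/tmp/monit_unicorn_#{application}_#{rails_env}_#{val}_worker"
      run "#{sudo} mv /tmp/monit_unicorn_#{application}_#{rails_env}_#{val}_worker /etc/monit/conf.d/"
    end

    run "#{sudo} monit reload"
  end
end

# Unmonitor, refresh the service tests, deploy, then re-enable monitoring
before "deploy", "monit:unmonitor"
before "deploy", "monit:unicorn_service_tests"
after  "deploy", "monit:monitor"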

The last thing I want to mention about these service tests is that I originally wasn’t running the Monit commands by group name, and instead was explicitly running the commands for each of the service tests (master + x number of workers). Not only was this inefficient, it sometimes resulted in errors like:

[UTC Jul 14 08:43:46] info     : monit daemon at 4814 awakened
[UTC Jul 14 08:43:47] error    : monit: action failed -- Other action already in progress -- please try again later
[UTC Jul 14 08:43:54] error    : 'unicorn_appname_staging_0_worker' failed to start

Since my group names are dynamic and specific to the application and environment, using them for Monit commands is the right thing to do.
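In practice that means commands along these lines (the group name here just follows the application/environment naming scheme above):

$ sudo monit -g unicorn_appname_staging restart all
$ sudo monit -g unicorn_appname_staging unmonitor all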

Wrapping It Up

I’m currently using this setup on my VPS for hosting various applications and it’s been working well; however, I know there are still things that can be improved. If you have any suggestions, I’d love to hear them!

Looking to the future, the next version of Monit looks promising. A couple of items on their planned-features list really catch my eye in relation to this post:

Manage processes directly without requiring a pid file

We plan to use libev as the engine for Monit and will use it to handle i/o events, child processes, signals and file objects among other things.

And lastly, a big thanks to Andrew Grim for his blog post about watching Unicorn workers with Monit, which pointed me toward a better approach to managing worker processes than the one I had been working on.
