Rostamizadeh.Blog

A place for me to write about interesting technology topics.

Precompiling Assets Locally for Capistrano Deployment

I’ve had a goal of fast Capistrano deployments to my VPS for a while now, but I’ve constantly been plagued by asset precompilation taking anywhere from 4 to 15 minutes on my little server (I’m using Rackspace’s smallest offering, a VPS with 256MB RAM). When I precompile assets locally, it always finishes in under a minute…so I came up with an approach to leverage my local machine for precompilation and upload the assets to the server. I’ve also avoided any shenanigans with adding assets to my git repository (Ew! Don’t do that!).

To take my newly found performance boost one level further, I integrated local asset precompilation with Ben Curtis’ approach of skipping asset precompilation unless any assets have changed.

Technology

  • Rackspace Server (256MB RAM), Ubuntu 12.04
  • Capistrano 2.12.0
  • Rails 3.2.6

Local Asset Precompilation

I started out using Capistrano’s built-in asset precompilation, which is as simple as uncommenting the line below in the Capfile:

# Uncomment if you are using Rails' asset pipeline
load 'deploy/assets'

And Capistrano was successful at precompiling my assets on the server…it just took a long time to complete…sometimes a very long time to complete. I figured the first step in getting assets to precompile locally would be commenting the deploy/assets line in the Capfile back out and reading over the Capistrano assets.rb source to know exactly what I needed to re-implement. Go check out the Capistrano source if you haven’t already. The Capistrano code, in general, is very easy to read and well documented, and the assets methods are especially simple. The real magic in the Capistrano assets code is the symlink task, which needs to execute before deploy:finalize_update. I didn’t include symlinking in my first test deployment and it didn’t work well.

Here is my complete Capistrano recipe for handling assets:

assets:precompile and deploy
before 'deploy:finalize_update', 'deploy:assets:symlink'
after 'deploy:update_code', 'deploy:assets:precompile'

namespace :deploy do
  namespace :assets do

    task :precompile, :roles => :web do
      # Revision currently deployed on the server (read from current/REVISION)
      from = source.next_revision(current_revision)
      # Precompile only if commits since that revision touched an assets folder
      if capture("cd #{latest_release} && #{source.local.log(from)} vendor/assets/ lib/assets/ app/assets/ | wc -l").to_i > 0
        # Build the assets on the local machine and bundle them up
        run_locally("rake assets:clean && rake assets:precompile")
        run_locally "cd public && tar -jcf assets.tar.bz2 assets"
        # Upload to the shared directory, unpack, and remove the archive
        top.upload "public/assets.tar.bz2", "#{shared_path}", :via => :scp
        run "cd #{shared_path} && tar -jxf assets.tar.bz2 && rm assets.tar.bz2"
        # Tidy up the local working copy
        run_locally "rm public/assets.tar.bz2"
        run_locally("rake assets:clean")
      else
        logger.info "Skipping asset precompilation because there were no asset changes"
      end
    end

    task :symlink, :roles => :web do
      run("rm -rf #{latest_release}/public/assets &&
           mkdir -p #{latest_release}/public &&
           mkdir -p #{shared_path}/assets &&
           ln -s #{shared_path}/assets #{latest_release}/public/assets")
    end
  end
end

I discuss the finer points of the conditional logic on whether or not to precompile below, so I’ll skip over that for now and explain the process of precompiling and uploading first. The run_locally method is courtesy of Capistrano and allows us to run commands on the local machine. In order to keep things tidy, I run assets:clean before running assets:precompile. Next, a .tar.bz2 of the assets folder (I’ll explain why I went with bz2 below) is created, and Capistrano’s top.upload method is invoked to copy assets.tar.bz2 to the shared directory on my server via scp. I tried leaving off :via and using the default sftp behavior but kept running into: Net::SFTP::StatusException(4, "failure"). Instead of debugging that issue, I tried scp and it worked perfectly. Once the file is on the server, it is extracted and the archive is deleted. Lastly, I delete assets.tar.bz2 from my local machine and run assets:clean again…leaving my public directory nice and clean. Remember, it’s not best practice to store precompiled assets in a code repository.

As for the symlink method, I took that directly from the Capistrano symlink method. No need to change anything in that behavior.

So, why did I use bz2 instead of gz when bz2 takes longer to compress? My goal is fast deployments, and unfortunately, I can’t always be connected to lightning fast internet when I work remotely, so I’d rather spend a little more time compressing if that means faster uploads.

I ended up writing a quick and dirty performance test to see if bz2 was worthwhile. Here’s my shell script:

tar_compression_test.sh
#!/bin/sh
# Pass in a directory to compress
DIRECTORY=$1

echo "Beginning TAR:"
date
tar -cf test_tar.tar "$DIRECTORY"
date

echo "Beginning GZIP:"
date
tar -zcf test_gzip.tar.gz "$DIRECTORY"
date

echo "Beginning BZ2:"
date
tar -jcf test_bz2.tar.bz2 "$DIRECTORY"
date

Here are the output sizes of my assets directory in various formats:

  • tar: 6.1MB
  • gz: 5MB
  • bz2: 4.3MB

I ran the performance test five times and here are the average compression times:

  • tar: <1s
  • gz: <1s
  • bz2: 1.6s
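Since date only prints whole seconds, the script above can’t tell sub-second runs apart. Here is a minimal sketch of a finer-grained version, assuming tar is on the PATH and using Ruby’s Benchmark module. The time_command helper and the FORMATS table are names made up for this example, not part of my deploy code:

```ruby
require 'benchmark'

# Runs a command (given as an argv array, so no shell quoting is needed)
# and returns the elapsed wall-clock time in seconds as a Float.
def time_command(argv)
  Benchmark.realtime do
    system(*argv) or raise "command failed: #{argv.join(' ')}"
  end
end

# tar flag and archive name for each format compared above
FORMATS = {
  'tar' => ['-cf',  'test_tar.tar'],
  'gz'  => ['-zcf', 'test_gzip.tar.gz'],
  'bz2' => ['-jcf', 'test_bz2.tar.bz2']
}

# Only runs when a directory is passed on the command line.
if __FILE__ == $0 && ARGV[0]
  FORMATS.each do |name, (flag, archive)|
    seconds = time_command(['tar', flag, archive, ARGV[0]])
    puts format('%-4s %.2fs  %d bytes', name, seconds, File.size(archive))
  end
end
```

Saved under a name like tar_compression_test.rb (hypothetical), it would be run as ruby tar_compression_test.rb public/assets and print one line per format with the time and archive size.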

In my opinion, the space savings are worth the wait, considering I might be uploading a change while tethered to my phone or connected to public wifi!
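To put a rough number on that trade-off, here is a small worked calculation using the sizes and times listed above. It assumes gz costs 0s (it was reported as <1s, so this charges bz2 with its full 1.6s) and that upload speed is constant. The constant and method names are made up for this example:

```ruby
# Sizes (MB) and extra compression time (s) taken from the lists above.
# Treating gz as 0s is an assumption that is pessimistic for bz2.
GZ_MB   = 5.0
BZ2_MB  = 4.3
EXTRA_S = 1.6

# Upload speed (MB/s) below which the smaller archive wins overall:
# the time saved uploading fewer bytes equals the extra compression time.
def break_even_mb_per_s(larger_mb, smaller_mb, extra_seconds)
  (larger_mb - smaller_mb) / extra_seconds
end

# Seconds saved (positive) or lost (negative) by picking bz2 at a given speed.
def seconds_saved(upload_mb_per_s, larger_mb = GZ_MB, smaller_mb = BZ2_MB, extra_seconds = EXTRA_S)
  (larger_mb - smaller_mb) / upload_mb_per_s - extra_seconds
end

threshold = break_even_mb_per_s(GZ_MB, BZ2_MB, EXTRA_S)
puts format('bz2 wins below %.2f MB/s (about %.1f Mbit/s)', threshold, threshold * 8)
puts format('at 0.1 MB/s bz2 saves %.1fs', seconds_saved(0.1))
```

Under those assumptions the break-even point is 0.7MB / 1.6s, or roughly 0.44 MB/s (about 3.5 Mbit/s) of upload bandwidth. On anything slower, which is common on tethered or public connections, the smaller archive comes out ahead.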

Ben’s Approach to Selective Asset Precompilation

As I was working on writing my new Capistrano recipe, I stumbled upon a post by Ben Curtis called Skipping Asset Compilation with Capistrano. It seems he was also looking for a way to speed up Capistrano deployments and approached it from the angle of reducing how often precompilation is done. By default, Capistrano does it on every deploy; however, if no assets have changed…then there’s no need for it to be run. Per Ben’s post:

The trick, then, is to check the list of files that were changed in the range of commits that are being deployed, and compile the assets only if assets show up in that list.

Ben’s solution builds on Capistrano’s pending:default method, and limits the scm log to the assets folders. Here’s the piece we’re interested in from Ben’s code:

from = source.next_revision(current_revision)
if capture("cd #{latest_release} && #{source.local.log(from)} vendor/assets/ app/assets/ | wc -l").to_i > 0
  # assets:precompile
else
  logger.info "Skipping asset pre-compilation because there were no asset changes"
end

Let’s break this down a bit, since not everyone is familiar with the inner workings of Capistrano. I’ll explain in the context of Git, my scm of choice; other scms may have a different implementation.

Source is set to Capistrano::Deploy::SCM.new(scm, self) (in my deployments scm is set to Git). The SCM module has the next_revision method which looks like:

# Returns the revision number immediately following revision, if at
# all possible. A block should always be passed to this method, which
# accepts a command to invoke and returns the result, although a
# particular SCM's implementation is not required to invoke the block.
#
# By default, this method simply returns the revision itself. If a
# particular SCM is able to determine a subsequent revision given a
# revision identifier, it should override this method.
def next_revision(revision)
  revision
end

Since I’m using Git, I can look in the Capistrano Git class and see that next_revision is not being overridden, so it will simply return the revision passed to the method.

The current_revision variable is set to the commit hash stored in #{current_path}/REVISION.

Putting all these pieces together, we can see that from is set to the /path/to/app/current commit hash.

The next line uses some Git magic to find out if there have been any changes to the assets folders. Log is a method in the Git class which corresponds to the $ git log command.

# Returns a log of changes between the two revisions (inclusive).
def log(from, to=nil)
  scm :log, "#{from}..#{to}"
end

This gets evaluated to:

git log #{/path/to/app/current commit hash}..

The .. is indicative of the <since>..<until> options, which specify a range of commits. If you don’t specify an upper bound when filtering git log with the <since> option, the <until> option defaults to HEAD. So if there are commits that are newer than the current_path commit hash, running this command will show a list of all the more recent commits. Ben trims down this result even further by using the <path> option. This Git option allows you to specify any number of directories or files to filter the commits on…meaning commits that don’t change anything under your specified <path> option won’t be output when you run git log. The output from git log is then piped to the Linux command wc -l, which prints the number of newlines. If that count is greater than zero, there are new commits with modified assets! Easy!

The only tweak I made to Ben’s code was adding the lib/assets directory to the <path> option.
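The check can be sketched in plain Ruby, outside of Capistrano, to make the two halves easier to see: building the git log command and deciding based on its output. The asset_log_command and assets_changed? helpers, the commit hashes, and the sample log text are all made up for this example. Unlike the recipe, the sketch puts -- before the paths, which is how Git separates revisions from paths:

```ruby
# Hypothetical helper: builds the same kind of command the recipe runs on
# the server. `to` is left empty so Git falls back to HEAD.
def asset_log_command(from, paths, to = nil)
  "git log #{from}..#{to} -- #{paths.join(' ')}"
end

# Approximates the `wc -l ... > 0` check: any line of log output means at
# least one commit in the range touched one of the paths.
def assets_changed?(log_output)
  log_output.lines.count > 0
end

paths = %w[vendor/assets/ lib/assets/ app/assets/]
puts asset_log_command('abc1234', paths)

# Made-up git log output for a commit that changed a stylesheet
sample = "commit def5678\n\n    Tweak stylesheet\n"
puts assets_changed?(sample)  # some output: precompile
puts assets_changed?("")      # no output: skip precompilation
```

Because git log prints several lines per commit, the line count isn’t the number of commits. The recipe only cares whether it is zero or not.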

Wrapping It Up

There might be some corner cases where this setup won’t work, but I’ve yet to encounter them. I’ve probably done somewhere in the neighborhood of 100 deployments (and counting) with this code in place. When I change assets, they get precompiled correctly, and my worst experience with the upload of the bz2 involved a seven-minute upload over a slow connection. Overall I’m satisfied with these changes, but I’ll probably never stop looking for ways to improve my deploy process.
