Running Rails jobs in background threads

It's good practice for Rails developers to execute slow tasks in a background job, rather than running them during the HTTP request/response cycle. The idea is to send the browser an HTML page as soon as possible, without making it wait while the server grinds away doing something else (like sending emails).

The most common approach to running these longer "jobs" is to hand them off to a separate process to execute them independently of the web server. There are plenty of great libraries out there that provide the "background job" functionality for you.

There's only one problem – if your app runs on Heroku you'll need to start up a worker dyno (at a cost of $35 per month) to run these background jobs. When all you want to do is to send a few emails in response to people taking actions in your app, this can seem like overkill.

In Agile Planner, I'm sending notification emails in background threads instead.

Threading in Rails

I think a bit of background on how Rails behaves with threads wouldn't go amiss at this point.

Aaron Patterson has written a great post on the config.threadsafe! configuration setting, which explains how threads work with different types of web server. I recommend you read it.

Once you've read Aaron's post you'll understand why, if you're running on a threaded web server such as Puma, enabling threading in your Rails app allows a single process to handle multiple requests concurrently.

You'll also understand that if you're running on a multi-process web server (e.g. Unicorn), each process will only handle one request at a time (even if you enable threading), because of the way Unicorn's multi-process model works. Each Unicorn "worker" process takes one request at a time from the listening socket it shares with the other workers; the master process manages the workers but doesn't serve requests itself. While a worker is serving a request it won't accept another one, but once it has finished the response it goes back to accepting the next incoming connection.

None of this has much of an impact on the code I'm about to show you. I mention it simply because, if you're running on Unicorn (as I am), you needn't give too much thought to whether or not Rails is threadsafe in order to put threads to good use yourself.

Running tasks in a thread

Threading in Ruby is pretty simple. You can just do this (the new thread starts running straight away; there's no separate call to start it):

Thread.new do
  puts "I'm in a thread!"
end
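Thread.new hands back a Thread object immediately and the main thread carries on without waiting for the block. In a Rails request that's exactly what we want, but it also means nothing waits for the thread unless you ask. Here's a minimal plain-Ruby sketch (the sleep is a stand-in for slow work such as sending an email) showing that ordering, and showing Thread#value waiting for the thread and returning the block's result:

```ruby
# A thread-safe queue that records the order in which things happen.
events = Queue.new

worker = Thread.new do
  sleep 0.1                 # stand-in for slow work, such as sending an email
  events << :worker_finished
  "email sent"              # the block's return value becomes the thread's value
end

# Thread.new didn't wait for the block, so the main thread gets here first.
events << :main_continued

# Thread#value waits for the thread to finish (like #join) and returns its result.
result = worker.value
order = [events.pop, events.pop]
puts result
```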

To deliver an email to a list of users from a background thread, we just need to use ActionMailer from within a thread. A naive example might look like this:

Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
end

There are a couple of problems with this code.

Cleaning up the database connections

Each thread that uses ActiveRecord checks out its own database connection from the connection pool. You should release it when the thread terminates in order to avoid leaving dangling connections to your database.

To update the previous example, we could write:

Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
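One thing to watch: if the cleanup call is simply the last line of the thread, it only runs when every delivery succeeds. If one raises, the thread dies and its connection is never released. Putting the cleanup in an ensure clause guarantees it runs either way. Since ActiveRecord can't run in a self-contained snippet, this sketch uses a hypothetical TinyPool class as a stand-in for the connection pool (it is not the ActiveRecord API); in Rails, ActiveRecord::Base.connection_pool.with_connection wraps a block in a similar checkout and checkin.

```ruby
# TinyPool is a hypothetical, deliberately tiny stand-in for ActiveRecord's
# connection pool; it is NOT the ActiveRecord API.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "connection-#{i}" }
  end

  def checkout
    @available.pop(true) # non-blocking: raises ThreadError if nothing is available
  rescue ThreadError
    raise "pool exhausted"
  end

  def checkin(connection)
    @available << connection
  end

  def available_count
    @available.size
  end
end

POOL = TinyPool.new(2)

# Run the block in a new thread with a connection checked out from POOL.
def in_background
  Thread.new do
    connection = POOL.checkout
    begin
      yield connection
    ensure
      POOL.checkin(connection) # runs even if the block raised
    end
  end
end

# One job succeeds; the other fails part-way through, like a failed delivery.
ok     = in_background { |connection| "used #{connection}" }
failed = in_background { |_connection| raise "SMTP error" }

ok.join
begin
  failed.join # Thread#join re-raises the exception that killed the thread
rescue RuntimeError => e
  puts "background job failed: #{e.message}"
end

puts "connections available: #{POOL.available_count}"
```

Both connections end up back in the pool even though one job blew up. (Ruby 2.5 and later will also print a warning to stderr when a thread dies from an exception.)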

Avoiding race conditions

We've just seen that ActiveRecord gives every thread its own database connection. When you're using a relational database you'll usually find that data written via one connection won't become visible to other connections until the transaction is committed.

In our example, the code in the thread won't be able to find any users that have been added in the main thread until the main thread commits its transaction. Sometimes the transaction will be committed before the thread tries to access the database, but sometimes the thread will get there first. This could mean that some people won't receive our email!

A cheeky workaround to this problem is to load any ActiveRecord objects that you need in your thread before you enter the thread. You can do this by writing some Ruby that will force the query to run before you enter the thread:

# Note the call to to_a, which forces ActiveRecord to
# execute the query outside of the thread.
#
# The list_of_users variable is available to the thread.

list_of_users = User.where(my: 'query').to_a
Thread.new do
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
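The to_a call matters because an ActiveRecord relation is lazy: User.where(...) only describes a query, which runs when something first iterates over it. Without to_a, that first iteration happens inside the thread, at some later moment and on the thread's own connection. Ruby's lazy enumerators behave in a similar way, so here's a plain-Ruby analogy (not ActiveRecord; the array simply stands in for a table) showing that a to_a snapshot is fixed at the moment it's taken, while the lazy version sees whatever the source holds when it's finally evaluated:

```ruby
# An in-memory array standing in for the users table (an analogy, not ActiveRecord).
users_table = ["alice", "bob"]

# Like a relation: this describes the work but doesn't do it yet.
lazy_query = users_table.lazy.map(&:capitalize)

# Like calling to_a before spawning the thread: the work happens right now.
snapshot = lazy_query.to_a

# A new row appears after the snapshot was taken...
users_table << "carol"

# ...so the thread sees different results depending on when each one is evaluated.
from_snapshot, from_lazy = Thread.new { [snapshot, lazy_query.to_a] }.value
puts "snapshot: #{from_snapshot.inspect}"
puts "lazy:     #{from_lazy.inspect}"
```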

Consider performance when calling to_a on an ActiveRecord scope like this; it loads all the objects returned by the query into memory. If you're only going to be calling it on a small number of records it's fine, but if you had thousands of users to email this approach could have a negative impact on your app's performance. That's why I describe this approach as "cheeky".

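An alternative solution is to commit the data in the main thread before you call the code that spawns the thread. You can do this by wrapping the code that updates the database in ActiveRecord's transaction method. Bear in mind that if this block is itself nested inside another transaction it joins the outer one by default, so nothing is committed until the outer transaction finishes: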

AnyActiveRecordModel.transaction do
  # Modify data that the thread needs here...
  User.create!(user_attributes)
end

Thread.new do
  list_of_users = User.where(my: 'query')
  list_of_users.each do |user|
    MyNotificationMailer.say_hello(user).deliver
  end
  ActiveRecord::Base.connection.close
end
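To make the ordering argument concrete, here's a toy model of transaction visibility. FakeDatabase is hypothetical and far simpler than a real database: rows written inside #transaction are buffered and only become visible to other readers when the block finishes. It shows why committing before Thread.new is safe, and deliberately forces the "thread gets there first" case to show what goes wrong otherwise:

```ruby
# FakeDatabase is a hypothetical toy model of transaction visibility, far
# simpler than a real database. Rows written inside #transaction are buffered,
# and only become visible to other readers when the block finishes ("commit").
class FakeDatabase
  def initialize
    @committed = []
    @lock = Mutex.new
  end

  def transaction
    pending = []
    yield pending                                     # uncommitted writes
    @lock.synchronize { @committed.concat(pending) }  # commit: publish them
  end

  # What a different connection, e.g. one used by a background thread, can see.
  def visible_rows
    @lock.synchronize { @committed.dup }
  end
end

# Commit first, then spawn the thread: the row is guaranteed to be visible.
db = FakeDatabase.new
db.transaction { |rows| rows << "new user" }
seen_after_commit = Thread.new { db.visible_rows }.value

# Spawn the thread inside the transaction and wait for it before committing:
# this forces the "thread gets there first" case, so it sees nothing.
racy_db = FakeDatabase.new
seen_before_commit = nil
racy_db.transaction do |rows|
  rows << "new user"
  seen_before_commit = Thread.new { racy_db.visible_rows }.value
end

puts "thread spawned after commit sees:   #{seen_after_commit.inspect}"
puts "thread that read before commit sees: #{seen_before_commit.inspect}"
```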

Putting it all together

This code isn't something that I'd want to drop directly into a controller action. It's too long, and it's not very obvious at first glance what it does. I've got a few emails that need sending in the background like this, so I've bundled them all up into a class called MailFlinger.

Here's a snippet showing how the "Alice created a new board" emails get sent by Agile Planner:

class MailFlinger < Struct.new(:current_user)
  def board_created(board)
    background do
      recipients = board.account.users.rejecting(current_user)
      recipients.each do |recipient|
        AccountMailer.new_board(board, current_user, recipient).deliver
      end
    end
  end

  def background(&block)
    Thread.new do
      begin
        block.call
      ensure
        # Release this thread's database connection, even if sending failed.
        ActiveRecord::Base.connection.close
      end
    end
  end
end
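Because background returns the Thread it creates, board_created does too, which means a test can join it and then check what was sent. Here's a self-contained sketch of the same shape that runs without Rails. Account, Board and FakeMailer are hypothetical stand-ins, Array#reject replaces the app's own rejecting method, and there's no database, so there's no connection to release:

```ruby
# Hypothetical stand-ins for the app's models and mailer, so this runs without Rails.
Account = Struct.new(:users)
Board   = Struct.new(:name, :account)

module FakeMailer
  DELIVERIES = Queue.new # thread-safe, because deliveries happen in background threads

  def self.new_board(board, creator, recipient)
    DELIVERIES << "#{recipient}: #{creator} created #{board.name}"
  end
end

class MailFlinger < Struct.new(:current_user)
  def board_created(board)
    background do
      # Array#reject stands in for the app's own `rejecting` method.
      recipients = board.account.users.reject { |user| user == current_user }
      recipients.each do |recipient|
        FakeMailer.new_board(board, current_user, recipient)
      end
    end
  end

  # Returns the Thread, so callers and tests can join it if they need to.
  def background(&block)
    Thread.new(&block)
  end
end

account = Account.new(["alice", "bob", "carol"])
board   = Board.new("Roadmap", account)

MailFlinger.new("alice").board_created(board).join

sent = []
sent << FakeMailer::DELIVERIES.pop until FakeMailer::DELIVERIES.empty?
puts sent
```

In production code you'd normally let the thread run without joining it; the join is only there so the result can be inspected.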

A Hexagonal aside

If you're wondering why none of my example code shows emails getting sent from within a Rails controller, it's because I've been using some of the ideas in Hexagonal Rails to keep things tidy. I've already written about Hexagonal Rails, explaining how I arrived at controller actions that look like this:

class BoardsController < ApplicationController
  def create
    builder = BoardBuilder.new(current_account)
    builder.add_subscriber(CreateResponse.new(self))
    builder.create_board(name: board_params['name'])
  end
end

The BoardBuilder is responsible for creating a new board. If the database is updated successfully, BoardBuilder emits a board_created event (by calling #board_created on all its subscribers). The CreateResponse class knows about setting the flash message and sending an HTTP redirect to the browser. It also knows how to cope with a board_create_failed event, re-rendering the form with an error message, etc.

So by now you've probably worked out how MailFlinger fits in. It's a subscriber that knows what to do when it hears that a board has been created. I use it like this:

class BoardsController < ApplicationController
  def create
    builder = BoardBuilder.new(current_account)
    builder.add_subscriber(CreateResponse.new(self))
    builder.add_subscriber(MailFlinger.new(current_user))
    builder.create_board(name: board_params['name'])
  end
end
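The post doesn't show BoardBuilder's internals, so here's a hypothetical, stripped-down sketch of the publish/subscribe side. The method names add_subscriber, create_board, board_created and board_create_failed come from the text above; everything else (the name check standing in for saving to the database, the Board struct, RecordingSubscriber) is invented for illustration:

```ruby
# Hypothetical sketch of BoardBuilder's publish/subscribe side; the real class
# isn't shown in the post, and saving to the database is replaced by a name check.
Board = Struct.new(:name)

class BoardBuilder
  def initialize(account)
    @account = account # unused in this sketch; kept to match the controller's call
    @subscribers = []
  end

  def add_subscriber(subscriber)
    @subscribers << subscriber
  end

  def create_board(attributes)
    name = attributes[:name].to_s.strip
    if name.empty?
      publish(:board_create_failed, attributes)
    else
      publish(:board_created, Board.new(name))
    end
  end

  private

  # Send the event to every subscriber that handles it, so a subscriber such
  # as MailFlinger can simply not define the events it doesn't care about.
  def publish(event, *args)
    @subscribers.each do |subscriber|
      subscriber.public_send(event, *args) if subscriber.respond_to?(event)
    end
  end
end

# A subscriber that records what it hears, in place of CreateResponse.
class RecordingSubscriber
  attr_reader :events

  def initialize
    @events = []
  end

  def board_created(board)
    @events << [:created, board.name]
  end

  def board_create_failed(attributes)
    @events << [:failed, attributes]
  end
end

recorder = RecordingSubscriber.new
builder = BoardBuilder.new(:some_account)
builder.add_subscriber(recorder)
builder.add_subscriber(Object.new) # handles neither event, so it's skipped
builder.create_board(name: "Roadmap")
builder.create_board(name: "   ")
p recorder.events
```

The respond_to? check is a design choice of this sketch; an alternative is to require every subscriber to implement every event.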

If you'd like to read more about that approach, check out Refactoring with Hexagonal Rails.

This approach became especially useful once I started building the API, which is implemented using a different set of controllers built from the same domain objects.

In the API controllers I just need to add the appropriate subscribers to each domain object. There are some differences: the API code doesn't need to set any flash messages, for example.

I'll be extending MailFlinger again later when Agile Planner starts sending more emails in response to user activity.

When to use threads for background jobs

There is one more potential issue with this approach, and you'll need to decide whether or not the technique is suitable on a case by case basis.

When Ruby creates a new thread, the thread ends either when it finishes its work or when the main Ruby process exits, whichever comes first. Unless you go out of your way to prevent it, Ruby kills any remaining threads when the process exits. In other words, when your web server gets restarted, your background threads get killed too, and any emails they hadn't sent yet are lost.

Paul Brannan pointed out (in the comments) that we can use Ruby's at_exit method to define a block of code that Ruby will execute when it wants to shut down. This means that you could write an at_exit handler that would delay Ruby's exit until your job had finished.

However… jkburges noticed that Paul's technique creates a memory leak, so it's not a good idea if you've got a long-running process and/or will be creating a lot of threads.
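One way to keep the at_exit idea while keeping memory bounded is to register a single handler once, and keep live threads in a shared set that each thread removes itself from when it finishes. This is a sketch of that alternative, not Paul's code or a fix proposed in the comments; the BackgroundThreads module and its timeout are hypothetical:

```ruby
require "set"

# Hypothetical helper (not from the post or its comments): keeps track of
# running background threads so that ONE at_exit handler can wait for them,
# rather than registering a new at_exit block for every thread.
module BackgroundThreads
  @threads = Set.new
  @lock = Mutex.new

  class << self
    def spawn(&block)
      # Hold the lock while creating and registering the thread, so a thread
      # that finishes instantly can't try to remove itself before it's added.
      @lock.synchronize do
        thread = Thread.new do
          begin
            block.call
          ensure
            # Finished threads remove themselves, so the set stays small.
            @lock.synchronize { @threads.delete(Thread.current) }
          end
        end
        @threads << thread
        thread
      end
    end

    # Wait (up to a limit per thread) for everything still running.
    def wait_for_all(timeout_per_thread = 10)
      running = @lock.synchronize { @threads.to_a }
      running.each do |thread|
        begin
          thread.join(timeout_per_thread)
        rescue StandardError => e
          warn "background thread failed: #{e.message}"
        end
      end
    end

    def count
      @lock.synchronize { @threads.size }
    end
  end
end

# Registered once, when this file is loaded.
at_exit { BackgroundThreads.wait_for_all }

results = Queue.new
3.times { |i| BackgroundThreads.spawn { sleep 0.05; results << i } }
BackgroundThreads.wait_for_all
puts "finished: #{results.size}, still tracked: #{BackgroundThreads.count}"
```

This still can't help if the process is killed outright (for example with SIGKILL), because at_exit handlers don't run in that case.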

If you like the idea of running jobs in threads instead of in a separate process, I recommend you take a quick look at Sucker Punch. It's an in-memory job runner that uses a separate thread for each "worker". I haven't used it myself, but it looks good. It can also be used with the Active Job API, which is a good idea if you're using Rails: you'd be able to switch to a more robust job runner if and when your application required it.
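To give a feel for what an in-memory job runner does, here's a minimal sketch: a shared queue of jobs and a fixed pool of worker threads pulling from it. TinyJobRunner is hypothetical; it is not Sucker Punch's code or API, and a real library handles many details this ignores. As with plain threads, jobs still sitting in the queue are lost if the process exits.

```ruby
# A minimal in-memory job runner: a shared queue plus a fixed pool of worker
# threads. TinyJobRunner is hypothetical; it is not Sucker Punch's code or API.
class TinyJobRunner
  STOP = Object.new.freeze # sentinel that tells a worker to exit

  def initialize(workers: 2)
    @queue = Queue.new
    @workers = Array.new(workers) do
      Thread.new do
        loop do
          job = @queue.pop # blocks until a job (or STOP) arrives
          break if job.equal?(STOP)

          begin
            job.call
          rescue StandardError => e
            warn "job failed: #{e.message}" # one bad job shouldn't kill its worker
          end
        end
      end
    end
  end

  def enqueue(&job)
    @queue << job
  end

  # Let the workers finish everything queued so far, then stop them.
  def shutdown
    @workers.size.times { @queue << STOP }
    @workers.each(&:join)
  end
end

sent = Queue.new
runner = TinyJobRunner.new(workers: 2)
%w[alice bob carol].each do |name|
  runner.enqueue { sent << "hello #{name}" }
end
runner.enqueue { raise "SMTP error" } # logged; the worker carries on
runner.shutdown

emails = []
emails << sent.pop until sent.empty?
puts emails.sort
```

Unlike spawning one thread per task, the fixed pool caps how many threads (and therefore database connections) the jobs can use at once.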