CLI testing with RSpec and Cucumber-less Aruba

At Bulletproof, we are increasingly finding that homebrew systems tools are critical to delivering services to customers.

These tools are generally wrapping a collection of libraries and other general Open Source tools to solve specific business problems, like automating a service delivery pipeline.

Traditionally these systems tools tend to lack good tests (or simply any tests) for a number of reasons:

  • The tools are quick and dirty
  • The tools model business processes that are often in flux
  • The tools are written by systems administrators

Sysadmins don’t necessarily have a strong background in software development. They are likely proficient in Bash, and have hacked a little Python or Ruby. If they’ve really gotten into the infrastructure as code thing, they might have delved into the innards of Chef and Puppet and been exposed to those projects’ respective testing frameworks.

In a lot of cases, testing is seen as “something I’ll get to when I become a real developer”.

The success of technical businesses can be tied to the quality of their tools.

Ask any software developer how they’ve felt inheriting an untested or undocumented code base, and you’ll likely hear wails of horror. Working with such a code base is a painful exercise in frustration.

And this is what many sysadmins are doing on a daily basis when hacking on their janky scripts that have evolved to send and read email.

So let’s build better systems tools:

  • We want to ensure our systems tools are of a consistent high quality
  • We want to ensure new functionality doesn’t break old functionality
  • We want to verify we don’t introduce regressions
  • We want to streamline peer review of changes

We can achieve much of this by skilling up sysadmins on how to write tests, adopting a developer mindset when writing systems tools, and providing a good framework that helps frame questions that can be answered with tests.

We want our engineers to feel confident that their changes are going to work, and that they are consistently meeting our quality standards.

But what do you test?

We’ve committed to testing, but what exactly do we test?

Unit and integration tests are likely not relevant unless the CLI tool is large and unwieldy.

The user of the tool doesn’t care whether the tool is tested. The user cares whether they can achieve a goal. Therefore, the tests should verify that the user can achieve those goals.

Acceptance tests are a good fit because we want to treat the CLI tool as a black box and test what the user sees.

Furthermore, we don’t care how the tool is actually built.

We can write a generic set of high level tests that are decoupled from the language the tool is implemented in, and refactor the tool to a more appropriate language once we’re more familiar with the problem domain.

How do you test command line applications?

Aruba is a great extension to Cucumber that helps you write high level acceptance tests for command line applications, regardless of the language those CLI apps are written in.

There are actually two parts to Aruba:

  1. Pre-defined Cucumber steps for running + verifying behaviour of command line applications locally
  2. An API that performs the actual testing, which the Cucumber steps call

Here’s what the pre-defined steps look like in a feature file:

Scenario: create a file
  Given a file named "foo/bar/example.txt" with:
    """
    hello world
    """
  When I run `cat foo/bar/example.txt`
  Then the output should contain exactly "hello world"

The other player in the command line application testing game is serverspec. It can do very similar things to Aruba, and provides some fancy RSpec matchers and helper methods to make the tests look neat and elegant:

describe package('httpd') do
  it { should be_installed }
end

describe service('httpd') do
  it { should be_enabled   }
  it { should be_running   }
end

describe port(80) do
  it { should be_listening }
end

The cool thing about serverspec that sets it apart from Aruba is that it can test things locally and remotely via SSH.

This is useful when testing automation that creates servers somewhere: run the tool, connect to the server created, verify conditions are met.
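
For the remote case, serverspec is pointed at the target host over SSH with a little configuration. Here’s a minimal sketch in the serverspec v2 style (the host and user below are placeholders):

# spec/spec_helper.rb (sketch; serverspec v2 style)
require 'serverspec'
require 'net/ssh'

set :backend, :ssh

host = ENV['TARGET_HOST'] || 'hello.example.org' # placeholder host
options = Net::SSH::Config.for(host)
options[:user] ||= 'deploy' # placeholder user

set :host, host
set :ssh_options, options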

But what happens when we want to test the behaviour of tools that create things both locally and remotely? For local testing Aruba is awesome. For remote testing, serverspec is a great fit.

But Aruba is Cucumber, and serverspec is RSpec. Does this mean we have to write and maintain two separate test suites?

Given we’re trying to encourage people who have traditionally never written tests before to write tests, we want to remove extraneous tooling to make testing as simple as possible.

A single test suite is a good start.

This test suite should be able to run both local + remote tests, letting us use the powerful built-in tests from Aruba, and the great remote tests from serverspec.

There are two obvious ways to slice this:

  1. Use serverspec like Aruba - build common steps around serverspec matchers
  2. Use the Aruba API without the Cucumber steps

We opted for the second approach - use the Aruba API from within RSpec, sans the Cucumber steps.

Opinions on Cucumber within Bulletproof R&D are split between love and loathing. There’s a reasonable argument to be made that Cucumber adds a layer of abstraction to tests that increases maintenance of tests and slows down development. On the other hand, Cucumber is great for capturing high level user requirements in a format those users are able to understand.

Again, given we are trying to keep things as simple as possible, eliminating Cucumber from the testing setup to focus purely on RSpec seemed like a reasonable approach.

The path was pretty clear:

  1. Do a small amount of grunt work to allow the Aruba API to be used in RSpec
  2. Provide a small amount of coaching to developers on workflow
  3. Let the engineers run wild

How do you make Aruba work without Cucumber?

It turns out this was easier than expected.

First, add Aruba to your Gemfile:

# Gemfile
source 'https://rubygems.org'

group :development do
  gem 'rake'
  gem 'rspec'
  gem 'aruba'
end

Run the obligatory bundle to ensure all dependencies are installed locally:

bundle

Add a default Rake task to execute tests, to speed up the developer’s workflow, and make tests easy to run from CI:

# Rakefile

require 'rspec/core/rake_task'

RSpec::Core::RakeTask.new(:spec)

task :default => [:spec]

Bootstrap the project with RSpec (this generates .rspec and spec/spec_helper.rb):

$ rspec --init

Require and include the Aruba API bits in the specs:

# spec/template_spec.rb

require 'aruba'
require 'aruba/api'
require 'yaml' # the specs below parse the tool's output as YAML

include Aruba::Api

This pulls in just the API helper methods in the Aruba::Api namespace: these are what we’ll use to run commands, test output, and inspect files. The include Aruba::Api line makes those methods available in the current namespace, so our specs can call them directly.
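
As a quick sanity check, the helpers can now be called like ordinary methods from any spec (an illustrative example, not part of the tool’s test suite):

# With the requires and include above in place:
describe "the Aruba API, sans Cucumber" do
  it "runs a command and captures its output" do
    run_simple "echo hello world"
    all_output.should include("hello world")
    assert_exit_status(0)
  end
end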

Then we set up PATH so the tests know where executables are:

# spec/template_spec.rb
require 'pathname'

root = Pathname.new(__FILE__).parent.parent

# Allows us to run commands directly, without worrying about the CWD
ENV['PATH'] = "#{root.join('bin').to_s}#{File::PATH_SEPARATOR}#{ENV['PATH']}"

The PATH environment variable is used by Aruba to find commands we want to run. We could specify a full path in each test, but by setting PATH above we can just call the tool by its name, completely pathless, like we would be doing on a production system.
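
In other words, once PATH is set the short form is all we need (a hypothetical invocation of genud, the tool we’ll be testing below):

run_simple "genud --help"                        # resolved via the PATH set above
run_simple "#{root.join('bin', 'genud')} --help" # explicit path, no longer needed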

How do you go about writing tests?

The workflow for writing stepless Aruba tests that still use the Aruba API is pretty straightforward (see the example step definition after this list):

  1. Find the relevant step from Aruba’s cucumber.rb
  2. Look at how the step is implemented (what methods are called, what arguments are passed to the method, how is output captured later on, etc)
  3. Take a quick look at how the method is implemented in Aruba::Api
  4. Write your tests in pure-RSpec
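
For example, Aruba’s I run step was (roughly, depending on your Aruba version) a thin wrapper around a single API method:

# From Aruba's cucumber.rb (approximate; check your installed version)
When /^I run `([^`]*)`$/ do |cmd|
  run_simple(unescape(cmd), false)
end

Which tells us run_simple is the method to reach for in our own specs.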

Here’s an example test:

# spec/template_spec.rb

# genud is the name of the tool we're testing
describe "genud" do
  describe "YAML templates" do
    it "should emit valid YAML to STDOUT" do
      fqdn     = 'bprnd-test01.bulletproof.net'
      template = root.join('templates', 'test.yaml.erb') # a known template input

      # Run the command with Aruba's run_simple helper
      run_simple "genud --fqdn #{fqdn} --template #{template}"

      # Test the YAML can be parsed
      lambda {
        userdata = YAML.parse(all_output)
        userdata.should_not be_nil
      }.should_not raise_error
      assert_exit_status(0)
    end
  end
end

Multiple inputs, and DRYing up the tests

Testing multiple inputs and outputs of the tool is important for verifying the behaviour of the tool in the wild.

Specifically, we want to know the same inputs create the same outputs if we make a change to the tool, and we want to know that new inputs we add are valid in multiple use cases.

We also don’t want to write test cases for each instance of test data - generating the tests automatically would be ideal.

Our first approach at doing this was to glob a bunch of test data and test the behaviour of the tool for each instance of test data:

# spec/template_spec.rb

describe "genud" do
  describe "YAML templates" do
    it "should emit valid YAML to STDOUT" do

      # The inputs we want to test
      Dir.glob(root + 'templates' + "*.yaml.erb") do |template|
        fqdn     = 'hello.example.org'

        # Run the command with Aruba's run_simple helper
        run_simple "genud --fqdn #{fqdn} --template #{template}"

        # Test the YAML can be parsed
        lambda {
          userdata = YAML.parse(all_output)
          userdata.should_not be_nil
        }.should_not raise_error
        assert_exit_status(0)
      end
    end
  end
end

This worked great provided all the tests were passing, but the tests became a black box as soon as one of the test data inputs caused a failure.

The engineer would need to add a bunch of puts statements all over the place to determine which input was causing the failure. Even worse, because every template runs inside a single example, an early failure aborts the example and masks failures in later test data.

To combat this, we restructured the tests: the Dir.glob moves out to the describe scope and generates a separate example for each template, rather than looping inside a single test:

# spec/template_spec.rb

describe "genud" do
  Dir.glob(root + 'templates' + "*.yaml.erb") do |template|
    describe "YAML templates" do
      describe "#{File.basename(template)}" do
        it "should emit valid YAML to STDOUT" do

          fqdn     = 'hello.example.org'

          # Run the command with Aruba's run_simple helper
          run_simple "genud --fqdn #{fqdn} --template #{template}"

          # Test the YAML can be parsed
          lambda {
            userdata = YAML.parse(all_output)
            userdata.should_not be_nil
          }.should_not raise_error
          assert_exit_status(0)
        end
      end
    end
  end
end

This produces a nice clean test output that decouples the tests from one another while providing the engineer more insight into what test data triggered a failure:

$ bundle exec rake

genud
  YAML templates
    test.yaml.erb
      should emit valid YAML to STDOUT
  YAML templates
    test2.yaml.erb
      should emit valid YAML to STDOUT

Where to from here?

The above test rig is a good first pass at meeting our goals for building systems tools:

  • We want to ensure our systems tools are of a consistent high quality
  • We want to ensure new functionality doesn’t break old functionality
  • We want to verify we don’t introduce regressions
  • We want to streamline peer review of changes

… but we want to take it to the next level: integrating serverspec into the same test suite.

Having a quick feedback loop to verify local operation of the tool is essential to engineer productivity, especially when remote operations of these types of systems tools can take upwards of 10 minutes to complete.

But we also have to verify that the output of the local operation actually creates the desired service at the other end. serverspec will help us do this.
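
As a rough sketch of where we’re headed (entirely hypothetical: serverspec v2 style, with placeholder host, service, and template names), the two could live side by side in a single spec:

# spec/end_to_end_spec.rb (hypothetical sketch)
require 'aruba'
require 'aruba/api'
require 'serverspec'
require 'net/ssh'

include Aruba::Api

set :backend, :ssh
set :host, 'bprnd-test01.bulletproof.net' # placeholder

describe "genud end to end" do
  it "generates userdata that results in a working web server" do
    # Local: drive the tool with the Aruba API
    run_simple "genud --fqdn bprnd-test01.bulletproof.net --template test.yaml.erb"
    assert_exit_status(0)

    # Remote: verify the resulting server with serverspec matchers
    service('httpd').should be_running
    port(80).should be_listening
  end
end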