Lindsay Holmwood is an engineering manager living in the Australian Blue Mountains. He is the creator of Visage, Flapjack & cucumber-nagios, and organises the Sydney DevOps Meetup.

Escalating Complexity

Back in 2009, when I was backpacking around Europe, I remember waking up on the morning of June 1 and reading about how an Air France flight had disappeared somewhere over the Atlantic.

The lack of information on what happened to the flight intrigued me, and given the traveling I was doing, I was left wondering “what if I was on that plane?”

Keeping an ear out for updates, in December 2011 I stumbled upon the Popular Mechanics article describing the final moments of the flight. I was left fascinated by how a technical system so advanced could fail so horribly, apparently because of the faulty meatware operating it.

Read more...

Data failures, compartmentalisation challenges, monitoring pipelines

To recap, pipelines are a useful way of modelling monitoring systems.

Each compartment of the pipeline manipulates monitoring data before making it available to the next.

At a high level, this is how data flows between the compartments:

basic pipeline

This design gives us a nice separation of concerns that enables scalability, fault tolerance, and clear interfaces.
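
As a rough illustration of the compartment idea, here is a minimal Ruby sketch in which each compartment is a callable that receives the previous compartment's output and hands its own output to the next. The stage names (COLLECT, CHECK, NOTIFY), the sample data, and the threshold are all assumptions made up for this example, not the compartments of any particular monitoring system.

```ruby
# Hypothetical sketch: stage names, data shapes, and the 2.0 threshold
# are assumptions for illustration only.

# Collection: produce raw samples (hard-coded here instead of polling hosts).
COLLECT = lambda do |_input|
  [
    { host: "app-01", metric: "load", value: 0.7 },
    { host: "app-02", metric: "load", value: 4.2 },
  ]
end

# Checking: attach a state to each sample by comparing it to a threshold.
CHECK = lambda do |samples|
  samples.map { |s| s.merge(state: s[:value] > 2.0 ? :critical : :ok) }
end

# Notification: keep only the results someone needs to hear about.
NOTIFY = lambda do |results|
  results.select { |r| r[:state] == :critical }
         .map { |r| "#{r[:host]} #{r[:metric]} is #{r[:state]}" }
end

PIPELINE = [COLLECT, CHECK, NOTIFY].freeze

# Apply the compartments in order, feeding each one the previous output.
def run_pipeline(stages, input = nil)
  stages.reduce(input) { |data, stage| stage.call(data) }
end

puts run_pipeline(PIPELINE)
```

Because every compartment only sees the data it is handed, any one of them could be replaced or run on a separate machine without the others changing, which is where the clear interfaces come from.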

The problem

What happens when there is no data available for the checks to query?

Read more...

Pipelines: a modern approach to modelling monitoring

Over the last few years I have been experimenting with different approaches for scaling systems that monitor large numbers of heterogeneous hosts, specifically in hosting environments.

This post outlines a pipeline approach for modelling and manipulating monitoring data.

Read more...

Rebooting Flapjack

This is the first time I’ve actually blogged about Flapjack.

Read more...

Upcoming speaking engagements and travel

My next 2 months are going to be jam-packed with conferences and travel!

  • Devopsdays NZ, March 8 2013. I will be giving a talk that analyses AA261 through a DevOps lens, looking at the collaborative maintenance and operation of the MD-83 in the crash.
  • Monitorama, March 28-29 2013. I’m looking forward to slowing down and listening at Monitorama, which has a tremendous lineup of speakers. I’ll be keen to hear what others think of the work we’ve been doing on Flapjack over the last 6 months.
  • Mountain West Ruby Conf 2013, April 3-5 2013. MWRC has added an extra day of DevOps content to the conference this year, and I’ll be joining an esteemed speaker lineup to talk about what both dev and ops can learn from AF447 when responding to rapidly evolving failure scenarios.
  • I’ll be staying in the Netherlands for a little under a week between conferences, visiting family and friends. Hopefully I can visit a meetup or two.
  • Open Source Data Center Conference 2013, April 17-18 2013. This will be my first time in Nuremberg, and I’m really looking forward to saying I have attended both OSDCs. I’ll be talking about Ript, a DSL for describing firewall rules, and a tool for incrementally applying them.
  • Puppet Camp Nuremberg 2013, April 19 2013. Straight after OSDC I’ll be talking about how we are using Puppet at Bulletproof Networks in multi-tenant, isolated environments.

Read more...

How I make interesting technical presentations

Whenever I talk at conferences, I am routinely asked how I go about preparing and making my presentations.

There are no hard and fast rules, but these are some things I have learnt:

Start analog

The most limiting thing you can do when you start putting together a presentation is to reach for slideware. I use a paper notebook to brainstorm my ideas with multicoloured pens, then scan it so I can refer back to it quickly when putting the slides together.

mindmapping a talk

Read more...

DevOps Down Under 2012 - what happened?

Almost 2 days ago Patrick kicked off a discussion about organising another Australian DevOps conference in 2013 amongst a small group of passionate DevOps people who are actively involved in the Australian community.

While the discussion was trundling on without me, I felt I owed everyone involved an explanation of what happened with this year’s unrealised conference, and why the conference fell flat.

Let’s start at the beginning.

Read more...

Ript: quick, reliable, and painless firewalling

Running your own servers? Hate managing firewall rules?

For the last year at Bulletproof Networks I’ve been working on a little tool called Ript to make writing firewall rules a joy, and applying them quick, reliable, and painless.

Ript is a clean and opinionated Domain Specific Language for describing firewall rules, and a tool with database migrations-like functionality for applying these rules with zero downtime.

The DSL

At Ript’s core is an easy to use Ruby DSL for describing both simple and complex sets of iptables firewall rules. After defining the hosts and networks you care about:

partition "joeblogsco" do
  label "www.joeblogsco.com",      :address => "172.19.56.216"
  label "app-01",                  :address => "192.168.5.230"
  label "joeblogsco uat subnet",   :address => "192.168.5.0/24"
  label "joeblogsco stage subnet", :address => "10.60.2.0/24"
  label "joeblogsco prod subnet",  :address => "10.60.3.0/24"
  label "bad guy",                 :address => "172.19.110.247"
  label "bad guys",                :address => "10.0.0.0/8"
end

…you use Ript’s helpers for accepting, dropping, & rejecting packets, as well as for performing DNAT and SNAT:

Read more...

Incentivising automated changes

Matthias Marschall wrote a great piece last week on the pitfalls of making manual changes to production systems. TL;DR: Making manual changes in the heat of the moment will bite you at the most inopportune times.

The article finishes with this suggestion:

You should have your configuration management tool (like Puppet or Chef) setup so that you can try out possible solutions without having to go in and do it manually.

In my experience, this is the key to solving the problem.

Rather than coercing people to follow a “no manual changes” policy, you make the incentives for making changes with automation better than for making changes manually.

Specifically:

  • Make it simple. Reduce the number of steps to make the change with automation. It should be quicker to find the place in your Chef or Puppet code and deploy than to log into the box, edit a file, and restart a service.
  • Make it fast. The time from thinking about the change to the change being applied should be shorter with automation than by doing it manually.
  • Make it safe. Provide a rollback mechanism for changes. A safety harness can be as simple as a thin process around “git revert” + deploy.

It’s a perfect example of how tools should complement culture.
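
To make the “git revert” + deploy idea concrete, here is a minimal Ruby sketch of such a safety harness. Everything other than the git command itself is an assumption: `./deploy.sh` is a placeholder for whatever actually deploys your Chef or Puppet code, and the `rollback` helper name is made up for this example.

```ruby
# Hypothetical rollback helper: revert a commit, then redeploy.
# DEPLOY_COMMAND is a placeholder, not a real tool; substitute your own.
DEPLOY_COMMAND = ["./deploy.sh"].freeze

# Returns the commands to run, in order, to roll back `commit`.
def rollback_commands(commit)
  [
    ["git", "revert", "--no-edit", commit],
    DEPLOY_COMMAND,
  ]
end

# Runs each command in order and stops at the first one that fails.
# The runner is injectable so the sequence can be dry-run without a repo.
def rollback(commit, runner: ->(cmd) { system(*cmd) })
  rollback_commands(commit).all? { |cmd| runner.call(cmd) }
end

# Dry run: print what would be executed instead of executing it.
rollback("HEAD", runner: ->(cmd) { puts cmd.join(" "); true })
```

Keeping the wrapper this thin is deliberate: the fewer steps a rollback has, the more likely people are to reach for it instead of logging into the box.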

Read more...

Instrumenting your monitoring checks with New Relic

This post is part 3 of 3 in a series on monitoring scalability.

In parts 1 and 2 of this series I talked about check latency and how you can mitigate its effects by splitting data collection + storage out from alerting, while looking at monitoring systems through the prism of an MVC web application.

This final post in the series provides a concrete example of how to instrument your monitoring checks so you can identify which exact parts of your checks are inducing latency in your monitoring system.

When debugging performance bottlenecks, I tend to use a simple but effective workflow:

  1. observe the system
  2. analyse the results
  3. optimise the bottleneck that is having the most impact
  4. rinse and repeat until the system is performing within the expected performance parameters

What if we continue to look at monitoring checks as micro MVC web applications? What tools exist to aid this optimisation workflow, and how can we hook instrumentation into our checks?
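
As a small taste of the "observe" step, here is a Ruby sketch that times named segments of a check using only the standard monotonic clock. It is not the New Relic instrumentation the full post covers; the `CheckTimer` class and the segment names are hypothetical, and the `sleep` call stands in for a slow query.

```ruby
# Hypothetical helper: records wall-clock seconds per named segment.
class CheckTimer
  attr_reader :timings

  def initialize
    @timings = {}
  end

  # Times the block and returns the block's own result. The elapsed
  # time is recorded even if the block raises.
  def measure(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @timings[name] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  # Name of the slowest segment: the first candidate to optimise.
  def slowest
    @timings.max_by { |_name, seconds| seconds }&.first
  end
end

timer = CheckTimer.new
data = timer.measure(:fetch) { sleep 0.05; [1, 2, 3] } # stand-in for querying a host
timer.measure(:evaluate) { data.sum }                  # stand-in for threshold logic
timer.timings.each { |name, s| puts format("%-10s %.3fs", name, s) }
puts "slowest segment: #{timer.slowest}"
```

Timing each segment separately is what lets step 3 of the workflow target the bottleneck with the most impact rather than the check as a whole.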

Read more...