Back in 2009, when I was backpacking around Europe, I remember waking up on the morning of June 1 and reading that an Air France flight had disappeared somewhere over the Atlantic.
The lack of information about what had happened to the flight intrigued me, and given the travelling I was doing, I was left wondering: “what if I was on that plane?”
Keeping an ear out for updates, in December 2011 I stumbled upon the Popular Mechanics article describing the final moments of the flight. I was left fascinated by how such an advanced technical system could fail so horribly, apparently because of the faulty meatware operating it.
Monitorama, March 28-29 2013. I’m looking forward to slowing down and listening at Monitorama, which has a tremendous lineup of speakers. I’ll be keen to hear what others think of the work we’ve been doing on Flapjack over the last six months.
Mountain West Ruby Conf 2013, April 3-5 2013. MWRC has added an extra day of DevOps content to the conference this year, and I’ll be joining an esteemed speaker lineup to talk about what both dev and ops can learn from AF447 when responding to rapidly evolving failure scenarios.
I’ll be staying in the Netherlands for a little under a week between conferences, visiting family and friends. Hopefully I can visit a meetup or two.
Open Source Data Center Conference 2013, April 17-18 2013. This will be my first time in Nuremberg, and I’m really looking forward to saying I have attended both OSDCs. I’ll be talking about Ript, a DSL for describing firewall rules, and a tool for incrementally applying them.
Whenever I talk at conferences, I am routinely asked how I go about preparing and making my presentations.
There are no hard and fast rules, but these are some things I have learnt:
The most limiting thing you can do when you start putting together a presentation is to reach for slideware. I use a paper notebook to brainstorm my ideas with multicoloured pens, then scan it so I can refer back to it quickly when putting the slides together.
A couple of days ago, Patrick kicked off a discussion amongst a small group of passionate DevOps practitioners actively involved in the Australian community about organising another Australian DevOps conference in 2013.
While the discussion was trundling on without me, I felt I owed everyone involved an explanation of what happened with this year’s unrealised conference, and why it fell flat.
Matthias Marschall wrote a great piece last week on the pitfalls of making manual changes to production systems. TL;DR: making manual changes in the heat of the moment will bite you at the most inopportune times.
The article finishes with this suggestion:
You should have your configuration management tool (like Puppet or Chef) set up so that you can try out possible solutions without having to go in and do it manually.
In my experience, this is the key to solving the problem.
Rather than coercing people to follow a “no manual changes” policy, you make the incentives for making changes with automation better than those for making changes manually.
Make it simple. Reduce the number of steps required to make a change with automation. It should be quicker to find the right place in your Chef or Puppet code and deploy than to log into a box, edit a file, and restart a service.
Make it fast. The time from thinking about a change to that change being applied should be shorter with automation than with doing it manually.
Make it safe. Provide a rollback mechanism for changes. A safety harness can be as simple as a thin wrapper around “git revert” + deploy, as in the sketch below.
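As an illustration, here is a minimal sketch of such a harness as a Rake task. It assumes your Chef or Puppet code lives in git and that a deploy task already exists; the task names and branch here are hypothetical, not a prescription:

    # Rakefile: a thin rollback harness around "git revert" + deploy.
    # Assumes your configuration management code lives in git and a
    # "deploy" task already exists; names here are hypothetical.
    desc "Revert the last configuration change and redeploy"
    task :rollback do
      sh "git revert --no-edit HEAD"  # undo the most recent commit
      sh "git push origin master"     # publish the revert
      Rake::Task["deploy"].invoke     # reuse the normal deploy path
    end

Rolling back then becomes a single “rake rollback” — one command, which is quicker and safer than undoing edits by hand on every box.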
It’s a perfect example of how tools should complement culture.
This post is part 3 of 3 in a series on monitoring scalability.
In parts 1 and 2 of this series I talked about check latency and how you can mitigate its effects by splitting data collection + storage out from alerting, while looking at monitoring systems through the prism of an MVC web application.
This final post in the series provides a concrete example of how to instrument your monitoring checks, so you can identify exactly which parts of your checks are introducing latency into your monitoring system.
When debugging performance bottlenecks, I tend to use a simple but effective workflow:
observe the system
analyse the results
optimise the bottleneck that is having the most impact
rinse and repeat until the system performs within its expected parameters
What if we continue to look at monitoring checks as micro MVC web applications? What tools exist to aid this optimisation workflow, and how can we hook instrumentation into our checks?
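As a taste of what that instrumentation can look like, here is a minimal sketch of a disk usage check timed with the Benchmark module from Ruby’s standard library. The check itself is hypothetical; the point is timing each MVC-ish phase separately and emitting the timings alongside the result:

    #!/usr/bin/env ruby
    # A hypothetical disk usage check, instrumented per phase.
    require "benchmark"

    timings = {}
    df_fields = nil
    percent_used = nil

    # "Model": collect the raw data
    timings[:collect] = Benchmark.realtime do
      df_fields = `df -P /`.lines.last.split
    end

    # "Controller": analyse the data against a threshold
    timings[:analyse] = Benchmark.realtime do
      percent_used = df_fields[4].to_i # "85%" => 85
    end

    # "View": render the check result, with per-phase timings appended
    status = percent_used > 90 ? "CRITICAL" : "OK"
    perfdata = timings.map { |phase, t| "#{phase}=#{(t * 1000).round(2)}ms" }.join(" ")
    puts "DISK #{status}: #{percent_used}% used | #{perfdata}"

With per-phase timings flowing out of every check, the observe/analyse/optimise loop above has real numbers to work with, and the slowest phase becomes obvious.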