Scaling systems configuration at Facebook: the paradigms, design, and software behind managing massive numbers of systems with open source and small teams

Phil Dibowitz (Facebook)
Operations
Location: Portland Ballroom
Average rating: 4.64 (11 ratings)
Slides: PDF

For many years, Facebook managed its systems with cfengine2. With many individual clusters of over 10k nodes, a slew of different, constantly changing system configurations, and small teams, this system was showing its age: its complexity was steadily increasing, limiting its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management (open source, proprietary, and a potential home-grown solution), we built a system based on one of the existing open source configuration management tools (our choice will be announced in February). The evaluation process involved understanding the direction we wanted to take in managing the next several iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure.


Phil Dibowitz

Facebook

Phil Dibowitz has been working in systems engineering for 12 years and is currently a production engineer at Facebook. Initially, he worked on the traffic infrastructure team, automating load balancer configuration management and designing and building the production IPv6 infrastructure. Phil now leads the team responsible for rebuilding the configuration management system from the ground up. Prior to Facebook, he worked at Google managing the large Gmail environment, and at Ticketmaster, where he co-authored and open sourced a configuration management tool called Spine (https://github.com/ticketmaster/spine). Phil also contributes to and maintains various open source projects (http://www.phildev.net/) and has spoken at conferences and LUGs on a variety of topics, from Path MTU Discovery to X.509.


Comments

Phil Dibowitz
07/29/2013 1:21pm PDT

OK, slides are now linked from above. I gave a similar talk as a keynote at #ChefConf, of which there is video on YouTube, and I'll be presenting at Velocity East as well.

Phil Dibowitz
07/28/2013 11:43pm PDT

Yup, I'll deliver a PDF export of my slides to the OSCON folks tomorrow at the office. I don't believe they recorded video of the talk, though.

Michael Shadle
07/28/2013 3:40pm PDT

Is there a video or slides from this?
