Thoughts on Apache Mesos
I’ve been following the development of Apache Mesos for quite some time now, from its humble beginnings as a research paper and Apache incubation in 2010 to its graduation at the ASF and the establishment of the commercial entity Mesosphere in 2013.
A number of things have happened in the past couple of months, so I thought it was a good time to jot down some (rather random) notes concerning Mesos and its ecosystem. If I missed something relevant, please do let me know.
A lot has been said concerning Mesos and YARN. I’ve seen statements such as Mesos’ resource request model is weirdly backwards, and I’ve also noticed the (admittedly delayed) increasing popularity of Mesos over the past few years. One of the key factors here might be the hype around Docker and the resulting need for an orchestration or coordination layer. I’ll come back to the Mesos vs. YARN topic towards the end of this post.
I will confess that I didn’t fully grasp the potential of Mesos until the day I sat down and read the Mesos research paper. It lays out the design philosophy, motivation and justification concerning resource allocation, isolation guarantees and fault tolerance.
A core challenge Mesos addresses is that of satisfying the constraints of a framework without actually knowing about them. Here’s where the sometimes misunderstood resource offer process comes into play, and one way to understand it is by analogy. Mesos behaves like the parent hosting a kids’ birthday party: say you’ve got some 15 kids (== frameworks) to supply with food (== resources) and can’t possibly know their inclinations (== placement preferences). But you can offer them a piece of pizza or a bowl of rocket and they are free to accept it (now or later) or to reject it. Further, it might be that the dad who dropped off one of the guests told you that his youngster is a vegetarian, so there’s no point in you offering him, say, a beef burger (== filters), etc.
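To make the birthday party analogy a bit more concrete, here is a minimal toy sketch of the offer cycle in Python. It is not the Mesos API: the names Offer, Framework and Master, as well as the cpus/mem_mb fields, are made up for illustration. The host (Master) never inspects what a framework needs; it only sees accept or decline, and remembers declines as filters so it doesn’t offer the same thing to the same guest again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Resources available on one machine (hypothetical, simplified)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A framework with private requirements the master never looks at."""
    name: str
    min_cpus: float
    min_mem_mb: int

    def accepts(self, offer: Offer) -> bool:
        # The placement preference lives here, inside the framework.
        return offer.cpus >= self.min_cpus and offer.mem_mb >= self.min_mem_mb


class Master:
    """Toy host: hands out offers and records who declined what."""

    def __init__(self, frameworks):
        self.waiting = list(frameworks)
        self.filters = {fw.name: set() for fw in frameworks}
        self.placements = {}

    def offer(self, offer: Offer):
        """Offer to each waiting framework in turn; return the taker's name or None."""
        for fw in list(self.waiting):
            if offer.agent in self.filters[fw.name]:
                continue  # this framework already declined this agent
            if fw.accepts(offer):
                self.placements[fw.name] = offer.agent
                self.waiting.remove(fw)
                return fw.name
            self.filters[fw.name].add(offer.agent)  # remember the decline
        return None


if __name__ == "__main__":
    master = Master([Framework("spark", 2.0, 4096), Framework("web", 0.5, 256)])
    for o in [Offer("agent-1", 1.0, 512), Offer("agent-2", 4.0, 8192)]:
        print(o.agent, "->", master.offer(o))
    # agent-1 -> web
    # agent-2 -> spark
```

The point of the sketch is the division of labour: the matching logic sits entirely in Framework.accepts, while Master only needs a list of who is waiting and a set of declines per framework. Real Mesos is considerably richer (fair sharing across frameworks, time-limited filters, partial use of an offer), so take this as an illustration of the idea only.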
Fun fact, albeit a widely known one I suppose: Mesos and Spark have something in common, namely Matei Zaharia. Originating from a town close to Ontario, Canada, he was a student at AMPLab, UC Berkeley, where he contributed heavily to both Mesos and Spark. These days he serves as the CTO of Databricks, the commercial entity shepherding Spark.
So, coming back to Mesos vs. YARN: luckily it’s not an either/or these days. With project Myriad (a joint effort of eBay, Mesosphere and MapR, currently being submitted to the ASF for incubation) you can have the cluster scheduling cake and eat it, too. In a nutshell, Myriad is a Mesos framework for dynamically scaling YARN clusters, allowing you to run Hadoop apps such as Spark alongside non-Hadoop applications such as Node.js, Memcached, RoR, or what have you. Exciting times!
That’s it: my thoughts about Apache Mesos as of the time of writing, mid-February 2015. I’d keep an eye on Myriad, and for starters you can test-drive Mesos if you haven’t yet.