Tuesday, October 16, 2007

Sinfonia: A New Paradigm for Building Scalable Distributed Systems

Authors: Marcos K. Aguilera (HP Labs), Arif Merchant (HP Labs), Mehul Shah (HP Labs), Alistair Veitch (HP Labs), and Christos Karamanolis (VMWare)

Paper: http://www.sosp2007.org/papers/sosp064-aguilera.pdf

(SOSP presentation)

This is an infrastructure for distributed applications. The general idea is to create a set of linear memories, and a particular memory address is identified by a node id and an offset. They use small, short-lived transactions with the semantics that if all read values match what the transaction expects, then the transaction commits.

They have a simplified 2-phase commit protocol with, essentially, no coordinator. The transaction blocks if _any_ of the application nodes crash, but they argue this isn't that big a deal because the action may involve application data located on the application node, and if the application node isn't available, you're screwed anyway (rough paraphrase).

They built a cluster file system and a group communication service.

Crap...wasn't paying attention to questions (at least, not writing them down). From what I remember:

- Why this and not a federated array of bricks?
- Are there any pathological cases you found that resulted in frequent rollback?
- other stuff...

No comments: