Authors: Xiaoxin Chen (VMware), Tal Garfinkel (VMware), E. Christopher Lewis (VMware), Pratap Subrahmanyam (VMware), Carl A. Waldspurger (VMware), Dan Boneh (Stanford), Jeffrey Dwoskin (Princeton), Dan R. K. Ports (MIT)
Questions:
Q: Cloaking needs to be atomic? How?
A: Has to be atomic w/ respect to OS. No real fallout for concurrent systems.
Q: Why is MARSHALL mini-benchmark worse than PASSTHRU?
A:
Q: Could you reverse it and use it in the OS to protect against malicious VM?
A: Huh. Maybe.
Q: What's your threat model?
A: Don't worry about I/O (things like SSL protect network).
Q: Why is mmap read performance worse than write?
A: Write has to touch disk anyway, so minimal additional overhead. Read needs an extra page fault.
Monday, March 3, 2008
Tuesday, October 16, 2007
VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems
Authors: Ripal Nathuji (Georgia Institute of Technology) and Karsten Schwan (Georgia Institute of Technology)
Paper: http://www.sosp2007.org/papers/sosp111-nathuji.pdf
Datacenters need power management. Cooling makes things worse. Industry has deployed ACPI.
How do you maintain power information in virtualization layer?
Lots of platform heterogeneity in a datacenter: variations in power, performance, and manageability. Homogenize using VMs. Can restrict the physical utilization of a VM ("soft scaling"). Virtualization SLAs can be honored without the guest OS being specifically aware of them. Take advantage of a feedback loop.
Implementation: VPM events created when guest OS makes a power call. Dom0 can retrieve these events.
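A minimal sketch of that event flow, with hypothetical names (this is not the paper's actual Xen code): guest power calls are recorded as VPM events, and a Dom0 policy drains them to pick a "soft scaling" CPU cap for the VM.

```python
from collections import deque

class VPMChannel:
    """Per-VM queue of power-management events raised by the guest OS (sketch)."""
    def __init__(self):
        self.events = deque()

    def guest_power_call(self, kind, value):
        # e.g. kind="P-state", value=2: the guest asks for a lower frequency
        self.events.append((kind, value))

    def drain(self):
        while self.events:
            yield self.events.popleft()

def dom0_policy(channel, current_cap=1.0):
    """Toy Dom0 policy: shrink the VM's CPU cap when the guest requests deeper
    P-states ("soft scaling"), restore it when the guest asks for P0."""
    cap = current_cap
    for kind, value in channel.drain():
        if kind == "P-state":
            cap = 1.0 if value == 0 else max(0.25, 1.0 - 0.25 * value)
    return cap

ch = VPMChannel()
ch.guest_power_call("P-state", 2)   # guest thinks it is slowing the CPU down
print(dom0_policy(ch))              # Dom0 translates that into a utilization cap
```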
Future work: VPM tokens, idle power management (additional VPM C-states), efficient soft-scale consolidations
Q: Any estimate of whether adherence to OS power interface has a cost?
A: Want to look at it. Looking at a lightweight paravirtualization solution.
Q: What about consolidating nodes (i.e., shutting down physical nodes)?
A: Well, we can do this on-the-fly automatically.
Integrating Concurrency Control and Energy Management in Device Drivers
Authors: Kevin Klues (Stanford University, Washington University in St. Louis, Technical University of Berlin), Vlado Handziski (Technical University of Berlin), Chenyang Lu (Washington University in St. Louis), Adam Wolisz (Technical University of Berlin, University of California Berkeley), David Culler (Arch Rock Co., University of California Berkeley), David Gay (Intel Research Berkeley), and Philip Levis (Stanford University)
Paper: http://www.sosp2007.org/papers/sosp186-klues.pdf
(SOSP presentation)
Existing embedded devices usually rely on the application for power savings; manually shutting pieces off and turning them back on is painful. ICEM makes I/O operations split-phase, i.e., asynchronous. Three types of device driver (a sketch of the shared type follows the list):
- virtualized
  - only a functional interface
  - assume multiple users
  - buffer I/O requests for energy savings
  - must be able to tolerate longer latencies
- dedicated
  - assume a single user
  - no concurrency control
  - explicit energy management
- shared
  - functional and lock interface
  - multiple users
  - explicit concurrency control through a split-phase lock
  - implicit energy management based on pending requests
  - used for stringent timing requirements
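A rough sketch of the "shared" driver type described above, in Python rather than the nesC/TinyOS code ICEM actually targets (names and structure are illustrative, not the paper's API): a split-phase lock serializes clients, and the device is powered on only while requests are pending.

```python
from collections import deque

class SharedDriver:
    """Illustrative shared driver: split-phase lock plus implicit energy
    management. The device powers on when the first request arrives and
    powers off as soon as the request queue drains."""
    def __init__(self):
        self.waiters = deque()
        self.holder = None
        self.powered = False

    def request(self, client):
        # Split-phase: the call returns immediately; the client gets the lock
        # later via its `granted` callback.
        self.waiters.append(client)
        if self.holder is None:
            self._grant_next()

    def release(self):
        self.holder = None
        self._grant_next()

    def _grant_next(self):
        if self.waiters:
            if not self.powered:
                self.powered = True          # implicit power-up on demand
                print("device on")
            self.holder = self.waiters.popleft()
            self.holder.granted(self)
        elif self.powered:
            self.powered = False             # no pending work: power down
            print("device off")

class Client:
    def __init__(self, name):
        self.name = name
    def granted(self, driver):
        print(f"{self.name} uses device")
        driver.release()

d = SharedDriver()
d.request(Client("sensor read"))
d.request(Client("radio send"))
```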
Q: Doesn't it ultimately boil down to application decisions no matter what?
A: Thinking of letting application send hints to system
Q: Does any of this apply to mainstream OSes?
A: Not yet...where we'd really like to see this is in mobile phone OSes.
Q: How does the programming model change for app writers?
A: Very much like async I/O.
Q: Can any of the transaction work apply here? You're sort of grouping operations into a transaction.
A: Hadn't thought about it.
Q: Send is bottleneck. Done anything about that?
A: We're just specifying an architecture. You can specify policy.
AutoBash: Improving Configuration Management with Operating System Causality Analysis
Configuration management sucks. Want to automate:
- Replay mode - automatically search for a solution (see the sketch after this list)
- Observation mode - highly interactive problem solving
- Health monitoring mode - make sure things stay working
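A toy sketch of replay mode under stated assumptions: the predicate and solution functions below are hypothetical, and a deep copy stands in for Speculator's OS-level speculation and rollback. The idea is to try each stored solution in isolation, run the predicates, and commit only a state in which they all pass.

```python
import copy

def replay_mode(state, candidate_solutions, predicates):
    """Toy replay mode: apply each stored solution to a speculative copy of
    the configuration state, test it with predicates, and commit only a copy
    in which every predicate passes."""
    for solution in candidate_solutions:
        trial = copy.deepcopy(state)   # stand-in for speculation/rollback
        solution(trial)
        if all(pred(trial) for pred in predicates):
            return trial               # "commit" the speculative state
    return state                       # nothing worked: keep the original

# Hypothetical example: a broken web-server configuration.
state = {"port": 80, "docroot": None}
solutions = [lambda s: s.update(port=8080),
             lambda s: s.update(docroot="/var/www")]
predicates = [lambda s: s["docroot"] is not None]
print(replay_mode(state, solutions, predicates))  # {'port': 80, 'docroot': '/var/www'}
```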
Q: How far would just having transactions in Linux get you?
A: mumble.
Q: How well would this work in a distributed file system environment? Also, how important is it to have predicates for every piece of software?
A: Speculator doesn't work on a distributed system today; it should extend to more general distributed systems.
Q: Maybe you could mine the installation process for information about what the correct configuration is supposed to be. Now the question: where can you get the predicates?
A: mumble.
Q: How well would this work with persistent state transactions (rather than speculation)?
A: If only work in persistent state, bad state can become incorporated into the persistent state. That's bad.
Q: What if the things you try only partially fix the problem or just gives you a clue about what to try next?
A: Which predicates work should tell you something about what needs to happen.
Q: What if you need to apply multiple solutions to get it to work? Can the system figure that out?
A: Future work.
Staged Deployment in Mirage, an Integrated Software Upgrade Testing and Distribution System
Authors: Olivier Crameri (EPFL), Nikola Knezevic (EPFL), Dejan Kostic (EPFL), Ricardo Bianchini (Rutgers), and Willy Zwaenepoel (EPFL)
Paper: http://www.sosp2007.org/papers/sosp076-crameri.pdf
This paper seems to focus on the issue of clustering machines that behave identically with respect to an upgrade.
Heuristically categorize dependencies used at runtime. Can be user-defined. Fingerprint resources. Categorize based on set of resources.
How effective is automatic resource classification? Good: no errors, though a small (single-digit) number of vendor-specific rules was needed.
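A rough sketch of the clustering idea, with made-up resource names (not Mirage's actual implementation): fingerprint the set of resources an application touches on each machine and group machines with identical fingerprints, so one tested representative can stand in for its whole cluster.

```python
import hashlib
from collections import defaultdict

def fingerprint(resource_contents):
    """Hash each resource (config file, library, env var) the application
    touches at runtime; the machine's signature is the set of hashes."""
    return frozenset(hashlib.sha1(c.encode()).hexdigest() for c in resource_contents)

def cluster(machines):
    """Group machines that should behave identically w.r.t. an upgrade:
    identical resource fingerprints -> same cluster."""
    clusters = defaultdict(list)
    for name, resources in machines.items():
        clusters[fingerprint(resources)].append(name)
    return list(clusters.values())

# Hypothetical deployment: hostA and hostC share the same dependencies.
machines = {
    "hostA": ["libssl-0.9.8", "app.conf: port=80"],
    "hostB": ["libssl-0.9.7", "app.conf: port=80"],
    "hostC": ["libssl-0.9.8", "app.conf: port=80"],
}
print(cluster(machines))   # [['hostA', 'hostC'], ['hostB']]
```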
Q: Isn't this going to slow things down and make things easier for people to exploit security flaws?
A: As we said, there's a tradeoff
Q: Could this help you narrow down differences in configuration that cause bugs?
A: Hopefully.
Q: Isn't the number of configurations subject to combinatorial explosions?
A: Sure, possible...in practice hopefully not? We're studying this now.
Dynamo: Amazon's Highly Available Key-Value Store
Authors: Guiseppe DeCandia (Amazon.com), Deniz Hastorun (Amazon.com), Madan Jampani (Amazon.com), Gunavardhan Kakulapati (Amazon.com), Avinash Lakshman (Amazon.com), Alex Pilchin (Amazon.com), Swami Sivasubramanian (Amazon.com), Peter Vosshall (Amazon.com), and Werner Vogels (Amazon.com)
Paper: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Amazon is a loosely coupled, service-oriented architecture. Each service is independent, but must adhere to latency SLAs. Availability is paramount.
An RDBMS is a poor fit even though the key-value model fits well: most features go unused, it scales up rather than out, and it has availability limitations.
Generally care more about availability than consistency.
Needs to be always writable, even in failure, even without previous context.
Want "knobs" to tune tradeoffs between cost, consistency, durability, and latency.
Overview:
- consistent hashing
- optimistic replication
- "sloppy quorum"
- anti-entropy mechanisms
- object versioning
"Sloppy quorum": N replicas in ideal state, read from at least R nodes, write to at least W nodes. "Sloppy" because the membership is dynamic based on node availability. Different values for N, R, and W yield different characteristics for the resulting system.
Each write creates a new version. In the worst case, a client might read a stale version; a write based on it creates a branch in the version history.
It is up to the application to resolve version history conflicts! All (relevant) versions returned to app!
Use vector clocks to take care of version history (preserves causality).
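A small sketch of the vector-clock comparison (generic code, not Dynamo's): one version descends from another if its clock is greater than or equal component-wise; otherwise the two versions are concurrent and form a branch the application must reconcile.

```python
def descends(a, b):
    """True if vector clock `a` descends from (or equals) `b`:
    every counter in `b` is <= the matching counter in `a`."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile_needed(a, b):
    """Concurrent versions (neither descends from the other) form a branch
    in the version history and must be reconciled by the application."""
    return not descends(a, b) and not descends(b, a)

v1 = {"nodeA": 1}                 # written through node A
v2 = {"nodeA": 1, "nodeB": 1}     # later write through node B that saw v1
v3 = {"nodeA": 2}                 # concurrent write through node A that saw v1
print(descends(v2, v1))           # True: v2 supersedes v1
print(reconcile_needed(v2, v3))   # True: a branch; the app must merge them
```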
Lessons learned:
- (missed first)
- repartitioning is slow because propagating data to new nodes takes forever (gets throttled; lots of random disk I/O)
- fix: use fixed arcs (equal-size partitions of the ring); allow transfer of a whole database file (a file copy, a linear read on disk)
- no transactional semantics
- no ordered traversal
- no large objects
- does not scale indefinitely
Q: (missed)
A: We overprovision to deal with typical failure scenarios, including a whole datacenter dying.
Q: When you need to add capacity, don't you need to shed load off of everybody?
A: Nodes have lots of neighbors. Adding nodes does pull load away from a bunch of others.
Q: How do you do reconciliation?
A: Use Merkle hash trees for reconciliation.
Q: How do you prove that you met SLAs?
A: not sure
Q: Talk about the kind of conflicts you saw?
A: 99.94% of reads return a single value. Most of the rest return two versions. Some of those might be write retries that happened in parallel.
Q: How often do you not achieve quorum?
A: Never!
Q: Ever been a partition?
A: Sure... a rack fails. Clients can't see it, though.
Q: Clients might potentially see lots of versions (even if it's rare). How do clients do reconciliation? No version ancestry?
A: Very application-specific. Sometimes last-write wins. Somewhat hard model to program to. Could potentially not garbage collect. No proof of convergence.
Q: Why did you go with consistent hashing?
A: High availability.
Q: What happens when some keys are more popular than others?
A: Often we don't see that. Often falls into the noise.
PeerReview: Practical Accountability for Distributed Systems
Authors: Andreas Haeberlen (MPI-SWS), Petr Kouznetsov (MPI-SWS), and Peter Druschel (MPI-SWS)
Paper: http://www.sosp2007.org/papers/sosp118-haeberlen.pdf
How do you detect faults when the system is federated and you can't see all of it? Specifically, how do you detect faults, how do you identify faulty nodes, and how do you convince others? Obviously, we need verifiable evidence.
General solution: keep a log, and have an auditor that periodically inspects the log. The log is a hash chain (to prevent changing the log ex post facto).
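A minimal sketch of a tamper-evident, hash-chained log (a generic construction, not PeerReview's exact log format): each entry's hash covers the previous hash, so rewriting history breaks a chain the auditor can recompute.

```python
import hashlib

class HashChainLog:
    """Append-only log where entry i commits to entry i-1 via a hash chain.
    An auditor who replays the chain can detect any retroactive edit."""
    def __init__(self):
        self.entries = []          # (message, chained_hash)
        self.head = "genesis"

    def append(self, message):
        self.head = hashlib.sha256((self.head + message).encode()).hexdigest()
        self.entries.append((message, self.head))
        return self.head           # this value would be signed and sent to peers

    def audit(self):
        h = "genesis"
        for message, recorded in self.entries:
            h = hashlib.sha256((h + message).encode()).hexdigest()
            if h != recorded:
                return False       # the log was tampered with after the fact
        return True

log = HashChainLog()
log.append("recv m1"); log.append("send m2")
log.entries[0] = ("recv forged", log.entries[0][1])  # attempt to rewrite history
print(log.audit())   # False: the hash chain exposes the modification
```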
Probabilistic log checking allows for scalability (otherwise overhead would be quadratic).
Q: How do you prevent collusion?
A: We used consistent hashing to choose witnesses, then secure routing.
Q: How do you deal with selective processing?
A: (reiterates what said in the talk)
Q: Seems most appropriate for malicious faults, given that it's all the same state machines. Is this useful for failing software?
A: (nothing useful...offline)
Q: (you misrepresented my CATS system...) How do you make logs visible in a secure way?
A: ??? Assume always at least one correct witness node.
Q: Why is non-repudiation work from 70s not applicable?
A: (Not sure what you're saying, offline)