This was a project ideas page for Joe and Mothy's 262 class
P2 Engine Architecture
- Parallel dataflow execution using lightweight threads.
P2 is currently built around a single-threaded, event-driven select
loop for asynchronous I/O. Two consequences of this are that P2
cannot take advantage of multiple processors, and that it is
cumbersome to perform long-running I/O operations (such as disk
access or invoking a separate database process) in the middle of
executing a P2 rule.
This project investigates the issues in using the Capriccio
lightweight threads package as a better basis for the P2 runtime.
Capriccio is attractive because it supports very large numbers of
coroutine-like threads, with lightweight synchronization primitives.
However, it also presents challenges, for example: efficiently
scheduling P2's dataflow graph based on ordering constraints imposed by
language semantics, and resource control to ensure that long-running
CPU-intensive dataflow invocations do not starve or hold up the rest
of the system.
This project assumes fairly strong C++/STL skills and a working
knowledge of asynchronous I/O programming on Unix.
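As a rough sketch of the threading benefit (using std::thread and a
mutex as stand-ins for Capriccio's much cheaper user-level threads
and its synchronization primitives; all names here are made up):

    // Sketch: each rule firing runs in its own lightweight thread, so a
    // long-running I/O step no longer stalls the rest of the engine.
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex table_lock;        // guards a shared (hypothetical) P2 table
    std::vector<int> table;

    void fire_rule(int event) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50)); // fake disk I/O
        std::lock_guard<std::mutex> g(table_lock);   // lightweight sync
        table.push_back(event);                      // tuple insertion
    }

    int main() {
        std::vector<std::thread> firings;
        for (int e = 0; e < 8; ++e)
            firings.emplace_back(fire_rule, e);      // firings proceed concurrently
        for (auto& t : firings) t.join();
        std::cout << table.size() << " tuples inserted\n";
    }

The open questions above (scheduling against the dataflow graph's
ordering constraints, resource control) are exactly what this naive
version ignores.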
- Coordination extensions to P2.
In the P2 declarative overlay system, actions (insertion in a table,
sending a message, etc.) occur as a result of single "events" (arrival
of a message, timers, etc.). While sufficient to implement overlay
networks remarkably concisely, this "single event => action" model
does make it somewhat cumbersome to express certain kinds of
short-term "stateful" behavior, such as "perform action X when events
Y and Z are seen within 3 seconds of each other", etc.
This project is about combining P2's powerful datalog evaluation with
a more sophisticated event coordination facility, and evaluating what
the payoff is in terms of system complexity, simplicity of expression,
and runtime efficiency, as compared with implementing such stateful
constructs in P2's existing query languages. The idea is to combine
some of the facilities of modern publish-subscribe systems with
distributed relational query processing.
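To make the "Y and Z within 3 seconds" example concrete, here is a
minimal, purely illustrative composite-event detector in C++ (the
event names and structure are invented for this sketch):

    #include <chrono>
    #include <deque>
    #include <iostream>

    using Clock = std::chrono::steady_clock;

    // Fire X when events Y and Z are both seen within a 3-second window.
    struct Detector {
        std::deque<Clock::time_point> ys, zs;   // recent Y and Z arrivals
        static constexpr std::chrono::seconds window{3};

        void expire(std::deque<Clock::time_point>& q, Clock::time_point now) {
            while (!q.empty() && now - q.front() > window) q.pop_front();
        }
        // Each return value of true means "perform action X".
        bool on_y(Clock::time_point t) { ys.push_back(t); expire(zs, t); return !zs.empty(); }
        bool on_z(Clock::time_point t) { zs.push_back(t); expire(ys, t); return !ys.empty(); }
    };

    int main() {
        Detector d;
        auto t0 = Clock::now();
        d.on_y(t0);
        if (d.on_z(t0 + std::chrono::seconds(2)))
            std::cout << "X fires: Y and Z within 3s\n";
    }

The project's question is whether such windowed composition is better
provided as a first-class coordination facility than encoded rule by
rule.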
- Parallel Dataflow with Overlays. Partitioned-parallel
dataflow execution is a good way to scale up cluster-based data
processing. It has been used successfully in databases since the
1980s (see the DeWitt/Gray paper on "Parallel Database Systems" in
the red book, the Graefe paper to be read on 11/3, as well as the work
on Flux).
Recently this idea has proven very important in web search engines;
Google calls
it MapReduce,
and it underlies much of their batch processing. Traditionally,
parallel dataflow assumes a very reliable set of "fate-shared"
machines. As clusters scale out, they are beginning to look more and
more like distributed systems. Investigate the use of overlay
technologies in a high-performance parallel dataflow environment for
clusters.
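For calibration, here is the essence of partitioned-parallel
aggregation in roughly the MapReduce mold, with threads standing in
for cluster nodes (a toy, not a proposal):

    #include <iostream>
    #include <string>
    #include <thread>
    #include <unordered_map>
    #include <vector>

    int main() {
        std::vector<std::string> input = {"a", "b", "a", "c", "b", "a"};
        const int P = 2;                 // degree of parallelism
        std::vector<std::unordered_map<std::string, int>> parts(P);

        // "Exchange" step: route each record to a partition by hash of its key.
        std::vector<std::vector<std::string>> routed(P);
        std::hash<std::string> h;
        for (auto& w : input) routed[h(w) % P].push_back(w);

        // Each worker aggregates its partition with no cross-partition talk.
        std::vector<std::thread> workers;
        for (int p = 0; p < P; ++p)
            workers.emplace_back([&, p] { for (auto& w : routed[p]) ++parts[p][w]; });
        for (auto& t : workers) t.join();

        for (auto& part : parts)
            for (auto& [w, n] : part) std::cout << w << ": " << n << "\n";
    }

The interesting part of the project is everything this toy elides:
partitioning across machines that fail, join, and churn.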
Networking
- Declarative Protocol Implementation. P2 was designed as a
system for constructing overlay networks; however, one can certainly
imagine using it to implement the core functionality of Internet
routing. Consider a declarative implementation of Internet protocols
like BGP, or even TCP/IP. Can these be made efficient in P2? Does
the declarative perspective help to prove properties about
correctness? Does the flexibility of the language and the dataflow
system enable new optimizations to be defined and exploited? Some
related work here
includes Click, XORP,
and logics
for routing.
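As a hint of why the declarative angle is plausible, the classic
two-rule path program ("path(X,Z) :- link(X,Y), path(Y,Z)") boils
down to a reachability fixpoint; a naive bottom-up evaluation in C++
looks like this (toy topology, no optimization):

    #include <iostream>
    #include <set>
    #include <utility>

    int main() {
        std::set<std::pair<int,int>> link = {{1,2},{2,3},{3,4}};
        // path(X,Y) :- link(X,Y).
        std::set<std::pair<int,int>> path(link.begin(), link.end());

        bool changed = true;
        while (changed) {            // apply the recursive rule to a fixpoint
            changed = false;
            for (auto [x, y] : link)
                for (auto [y2, z] : path)
                    if (y == y2 && path.insert({x, z}).second) changed = true;
        }
        for (auto [x, z] : path) std::cout << "path(" << x << "," << z << ")\n";
    }

Whether protocols like BGP or TCP can be expressed this way and still
made efficient is the open question.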
- Ephemeral, Customized Overlay Networks. P2 allows overlay
networks to be concisely specified, and constructed on the fly. This
raises the possibility of customized, "ephemeral" overlays that are
set up for a specific task (e.g. disseminating a command with a
randomized gossip algorithm, constructing a multicast/aggregation tree
with controlled fanout/in, producing a random-neighbor DHT for
secure routing, etc.) This project would investigate the
feasibility and motivation for such "ECO" networks: how quickly can
overlays be brought up and torn down, and can you demonstrate the
benefit of a specific customized overlay for a particular task?
Internet-Scale Monitoring/Querying
All of the following projects could use P2 or the PIER system as an
underlying platform. Talk with Joe for more information.
- Distributed Queries in P2.
The PIER system implemented
distributed relational queries over Distributed Hash Tables (DHTs).
P2 offers the opportunity to revisit the PIER design, and consider
both (a) implementing distributed relational operators (join,
aggregation, etc.) without a DHT, or (b) opening up the
traditional DHT interface to allow particular relational operators
to perform more efficiently/reliably.
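For reference, the PIER-style rehash join is simple at its core:
put() both relations into the DHT keyed on the join attribute, so
matching tuples meet at the same node. A single-process caricature
(a hash map playing the DHT, node boundaries elided):

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main() {
        // R(key, a) and S(key, b); join on key.
        std::vector<std::pair<int, std::string>> R = {{1, "r1"}, {2, "r2"}};
        std::vector<std::pair<int, std::string>> S = {{2, "s2"}, {3, "s3"}};

        // "put(key, tuple)": each tuple is routed to the node owning key.
        std::unordered_map<int, std::pair<std::vector<std::string>,
                                          std::vector<std::string>>> dht;
        for (auto& [k, a] : R) dht[k].first.push_back(a);
        for (auto& [k, b] : S) dht[k].second.push_back(b);

        // Each "node" then joins the R- and S-tuples it received, locally.
        for (auto& [k, bucket] : dht)
            for (auto& a : bucket.first)
                for (auto& b : bucket.second)
                    std::cout << k << ": " << a << " x " << b << "\n";
    }

Options (a) and (b) above ask what changes when the DHT is removed or
opened up.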
- Pronghorn: A new generation of P2P Filesharing. Projects
like PIER and P2 promise p2p-scale querying with the full power of a
language like SQL. Unfortunately, the queryable data in p2p
filesharing today is impoverished: just short filenames and ID3 tags.
Microsoft's WinFS filesystem (in the upcoming Windows
"Longhorn"/"Vista") promises to change all that by putting a full
relational database in for filesystem metadata. What P2P filesharing
facilities could be enabled by coupling these two technologies? Get a
jump on the trend early, and see what technical challenges arise!
- Distributed Aggregate Triggers. The challenge here is to
track a distributed property of a distributed system. For example,
given a p2p application running on n sites, track the sum of
the bandwidth those n sites are imposing in total on any
external site. Initial ideas on this topic appear in
a HotNets
2004 paper; your challenge would be to move substantially beyond
these ideas, perhaps
using distributed
sketching ideas implemented
in P2.
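One baseline approach (a standard trick, not a summary of the paper
above) is to split the global threshold into per-site budgets and
stay silent until a budget is breached; all constants below are
invented:

    #include <iostream>
    #include <vector>

    int main() {
        const double T = 100.0;            // global bandwidth threshold
        const int n = 4;                   // number of sites
        std::vector<double> local(n, 0.0);
        const double budget = T / n;       // each site's silent allowance

        auto report = [&](int site, double bw) {
            local[site] += bw;
            if (local[site] > budget)      // local breach: coordinate globally
                std::cout << "site " << site << " exceeded its budget; "
                             "poll all sites for the exact global sum\n";
        };
        report(2, 30.0);                   // 30.0 > 25.0, forces a global check
    }

Moving "substantially beyond" means tightening the communication
bounds and handling skewed, time-varying loads.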
- P2P firewall anomaly detection. Assume a collection of
end-hosts want to "band together" to look for anomalous network
behaviors. To this end, they use a P2P query engine to
collaboratively share and query the contents of their firewall logs,
in the hopes of allowing end-users to address high-level questions
like "has something changed?" or "is it just me or is everybody
experiencing something weird?" Define specific collaborative queries
the end-hosts can run to identify anomalous patterns on the network,
and investigate how to efficiently implement them
in P2.
- Distributed End-Host Clustering. In the spirit of the
previous project, design a distributed clustering algorithm to
automatically group end-hosts in a P2P network by various features
(traffic patterns, software versions, firewall traces, etc.). Be able to
continuously track the clusters over time, to enable end-hosts to
determine when they drift from their usual cluster, or when their
entire cluster experiences changes.
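Centralized k-means is the obvious baseline to decentralize; a tiny
version over made-up two-dimensional feature vectors (say, traffic
rate and distinct peers contacted):

    #include <iostream>
    #include <vector>

    using Vec = std::vector<double>;

    double dist2(const Vec& a, const Vec& b) {
        double d = 0;
        for (size_t i = 0; i < a.size(); ++i) d += (a[i]-b[i]) * (a[i]-b[i]);
        return d;
    }

    int main() {
        std::vector<Vec> hosts = {{1, 1}, {1.2, 0.9}, {8, 8}, {7.5, 8.2}};
        std::vector<Vec> centers = {hosts[0], hosts[2]};   // k = 2

        for (int iter = 0; iter < 10; ++iter) {            // Lloyd iterations
            std::vector<Vec> sum(2, Vec(2, 0.0));
            std::vector<int> cnt(2, 0);
            for (auto& h : hosts) {
                int c = dist2(h, centers[0]) < dist2(h, centers[1]) ? 0 : 1;
                for (int i = 0; i < 2; ++i) sum[c][i] += h[i];
                ++cnt[c];
            }
            for (int c = 0; c < 2; ++c)
                if (cnt[c])
                    for (int i = 0; i < 2; ++i) centers[c][i] = sum[c][i] / cnt[c];
        }
        for (auto& h : hosts)
            std::cout << "host -> cluster "
                      << (dist2(h, centers[0]) < dist2(h, centers[1]) ? 0 : 1) << "\n";
    }

The project's substance is in the distribution: running this
continuously, in-network, and detecting drift.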
Virtual Machines
- LOX: a library O/S for Xen.
Xen is a GPL-licensed hypervisor for ia32-architecture machines. This
project investigates one of the implications of assuming that
hypervisor functionality is part of a computing platform. Not only
is writing an operating system for a hypervisor much simpler than
writing one for "real" hardware (devices are simple and standardised,
for instance), but it becomes possible to use a highly optimised and
simplified per-application operating system, much like the notion of
an Exokernel "library OS" or a Nemesis domain.
This project will explore the tradeoffs involved in the design of a
per-application operating system for VMM environments. Issues to
investigate include: what are the performance benefits (if any) to be
gained for different classes of application? What design of library
OS is appropriate for (say) a database system? Or a web server? Can
such an operating system be built in a modular, customizable manner?
- DOX: a device O/S for Xen.
A somewhat different direction to LOX (see project proposal above),
DOX also explores what follows from assuming virtualization technology
is present on a computing platform. The Xen hypervisor implements
device drivers using existing operating systems: an OS in one virtual
machine (Linux, BSD, Windows) is granted access to the physical
hardware and runs the standard device drivers, and then exports a
"virtual device", usually with some standard format (like a
block-oriented virtual disk) to other VMs.
There's no reason why different device drivers shouldn't execute in
different virtual machines, under different operating systems. More
long-term, though, there's no reason why the operating system for a
device driver needs to be a so-called general-purpose system like
Linux or Windows. This project will look at questions like: what
is the appropriate operating system to run inside a Xen domain and
provide high-performance, highly robust virtualization of a complex
I/O device (such as a RAID or SATA controller)? How can such an OS be
customized to expose the distinctive/unique functionality of such
devices without sacrificing reusability?
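To make the "virtual device" shape concrete, here is a hypothetical,
much-simplified block interface in the spirit of Xen's split drivers
(this is not Xen's actual ring protocol; all types are invented):

    #include <cstdint>
    #include <iostream>
    #include <queue>

    struct BlkRequest  { uint64_t sector; uint32_t count; bool write; };
    struct BlkResponse { uint64_t sector; bool ok; };

    struct VirtualBlockDevice {
        std::queue<BlkRequest>  requests;   // guest -> driver domain
        std::queue<BlkResponse> responses;  // driver domain -> guest

        void submit(const BlkRequest& r) { requests.push(r); }
        void service() {                    // runs inside the driver-domain OS
            while (!requests.empty()) {
                BlkRequest r = requests.front(); requests.pop();
                // ...a real driver would touch hardware here...
                responses.push({r.sector, true});
            }
        }
    };

    int main() {
        VirtualBlockDevice vbd;
        vbd.submit({0, 8, false});          // read 8 sectors at sector 0
        vbd.service();
        std::cout << "responses: " << vbd.responses.size() << "\n";
    }

The project asks what the OS around service() should be, once it
need not be Linux or Windows.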
Sensor Networks
- Epoch scheduling services. Various data collection tasks
in sensornets work on a fixed schedule of "epochs": time is divided
into epochs, and the sensors must cooperate within each epoch to
achieve some shared task. Typically they are expected to sleep
in power-save mode for most of the epoch, and wake up on a schedule to
communicate. TinyDB is an
example of such a design. There are a variety of challenges in
designing a robust epoch scheduling service. How do you allocate
wakeup and communication timeslots to the nodes, taking into account
the vagaries of time synchronization, unpredictable task lengths
(e.g. sending variable-sized data), and a scalable, possibly dynamic
population of nodes?
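A strawman schedule, just to fix notation (all constants invented):
give node i a fixed slot offset within each epoch, padded with a
guard interval for clock skew:

    #include <iostream>

    int main() {
        const double epoch_len = 30.0; // seconds per epoch
        const double slot_len  = 0.5;  // per-node communication slot
        const double guard     = 0.1;  // padding for time-sync error
        const int    n         = 20;   // node population (assumed static!)

        for (int node = 0; node < n; ++node) {
            double wake = node * (slot_len + guard); // offset within each epoch
            std::cout << "node " << node << ": wake at t+" << wake
                      << "s, sleep after t+" << wake + slot_len << "s\n";
        }
        std::cout << "epoch utilization: "
                  << 100.0 * n * (slot_len + guard) / epoch_len << "%\n";
    }

Every hard part of the project is a violated assumption here: skewed
clocks, variable task lengths, and a dynamic population.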
- Sketch-based approximate aggregates. Evaluate the practical
use of approximation techniques
like distributed
sketches in a real sensornet query implementation. Consider the
routing opportunities of duplicate-insensitive sketches as well
(e.g. Tributaries
and Deltas).
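As a reminder of why these sketches compose so nicely: a
Flajolet-Martin-style distinct-count sketch sets one bit per item
(the index of the lowest set bit of the item's hash) and merges by
bitwise OR, so duplicates and overlapping merges are free. A
bare-bones version (a single sketch, hence high variance; real uses
average many):

    #include <cstdint>
    #include <functional>
    #include <iostream>

    struct FMSketch {
        uint32_t bits = 0;
        void add(uint64_t item) {
            uint64_t h = std::hash<uint64_t>{}(item);
            int r = 0;
            while (r < 31 && !(h & (1ull << r))) ++r; // lowest set bit of hash
            bits |= (1u << r);
        }
        void merge(const FMSketch& o) { bits |= o.bits; } // duplicate-insensitive
        double estimate() const {
            int r = 0;
            while (bits & (1u << r)) ++r;             // lowest unset bit
            return (1u << r) / 0.77351;               // FM correction factor
        }
    };

    int main() {
        FMSketch a, b;
        for (uint64_t i = 1; i <= 500; ++i) a.add(i);
        for (uint64_t i = 250; i <= 750; ++i) b.add(i); // overlaps with a
        a.merge(b);
        std::cout << "estimated distinct items: " << a.estimate() << "\n";
    }

The duplicate-insensitivity is what opens up the multipath routing
tricks mentioned above.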
- Source-routed tours in
sensornets. The BBQ
system proposed using source-routed "tours" of network nodes for data
collection. Recently, various protocols were designed to make such
tours more robust to failure (see Joe for details). The practicality
of these protocols remains very much in question. This project would
evaluate the existing protocol ideas and hopefully improve upon
them.
- Mixing push and pull in probabilistic
querying. BBQ
proposed a "pull-oriented" paradigm for probabilistically querying
sensors. A CS262 project last year proposed a "push-based" paradigm
called Ken, which was recently accepted for publication. These two
approaches are different; it is unclear how to integrate them
intelligently, and what third approaches might be sensible.
Security and Privacy in Distributed Query Processing
- Proof Sketches. We have developed a new technique based on
distributed
sketches to prove that the result of a distributed aggregate query
(e.g. a vote, an average computation, etc.) is within epsilon of the
correct result, despite tampering by adversarial aggregators.
However, the idea has limitations with respect to "suppression
attacks" in which data (individual values, or sub-aggregates) are
simply not forwarded along. Other work on duplicate-insensitive
aggregation
(e.g. Tributaries
and Deltas) has focused on tolerating loss in distributed
aggregation, but has not focused on coordinated suppression attacks.
Clearly, redundancy for duplicate-insensitive messages can mitigate
such attacks; but how do we construct a suitably redundant aggregation
topology (a DAG? a family of trees?) to provide strong probabilistic
guarantees? And can we maintain such a topology in the face of
network churn?
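The reason redundancy is cheap here is worth spelling out: because
duplicate-insensitive summaries merge by OR, a node may forward its
sketch to several parents in the aggregation DAG, and neither the
extra copies nor the loss of all-but-one path changes the answer. In
miniature (a toy bitmap standing in for the sketch):

    #include <cstdint>
    #include <iostream>

    int main() {
        const uint32_t child = 0x0B;            // child's sketch bits
        uint32_t parentA = 0, parentB = 0;

        parentA |= child;                       // child sends to BOTH parents
        parentB |= child;                       //   for redundancy

        uint32_t root_both = parentA | parentB; // root merges both paths
        uint32_t root_one  = parentA;           // ...or one path was suppressed

        std::cout << std::boolalpha
                  << "duplicates are harmless: " << (root_both == child) << "\n"
                  << "one surviving path suffices: " << (root_one == root_both) << "\n";
    }

The research question is how much redundancy buys a strong
probabilistic guarantee, and at what maintenance cost under churn.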
- Privacy and Verifiability in Distributed Querying.
Privacy-preserving multi-party querying has been a hot topic in recent
years (see Rakesh Agrawal's various papers for overviews -- this is a
good starting point). This work has typically not considered
wide-area environments with multi-hop collaborative execution models,
in which not only is privacy at issue, but also the veracity of
the answer. The rather open-ended idea here is to revisit one of the
key themes from this line of work in a distributed context, and
consider the new issues that arise in verifying the correctness of the
query answers while preserving privacy.
Miscellany
- Disordered Disk I/O.
The use of asynchronous disk I/O API calls to improve performance of
concurrent applications over Unix is well known: it's almost always a
good idea to let the application get on with something else useful
while it waits for the disk. Less well investigated
are the delays (in particular, latency penalties) resulting from
more subtle ordering constraints on disk I/O imposed by imperative
programming and the I/O interface itself. For example, a simple
for-loop to sum the records in a file can be terribly inefficient
in disk seeks. In theory, it doesn't matter in which order the disk
blocks making up the file are read from the disk, so it should be
possible to move the disk arm a minimum number of tracks to get all
the blocks. The problem here is that the application cannot express
this opportunity for reordering and parallelism to the operating
system.
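A hypothetical "unordered read" interface makes the point; the API
below is invented, and a sort stands in for the disk servicing
requests in arm-friendly order:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct Block { int lba; int payload; };

    // Stand-in for an OS call: returns blocks in seek-friendly order,
    // not file order. A real implementation would issue them all at once.
    std::vector<Block> read_unordered(std::vector<Block> blocks) {
        std::sort(blocks.begin(), blocks.end(),
                  [](const Block& a, const Block& b) { return a.lba < b.lba; });
        return blocks;
    }

    int main() {
        // File blocks scattered across the disk (addresses made up).
        std::vector<Block> file = {{900, 3}, {12, 5}, {407, 2}, {33, 7}};

        long long sum = 0;
        for (const Block& b : read_unordered(file))
            sum += b.payload;                 // order-insensitive aggregation
        std::cout << "sum = " << sum << "\n"; // same answer, far fewer seeks
    }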
Taken to the limit, this project will investigate alternative O/S APIs,
file systems, and device drivers which allow such unordered I/O to be
efficiently specified by (new) applications, using a combination of
techniques which might include continuation support, map/reduce-like
constructs, and new file abstractions which leave record ordering
unbound. Unlike databases, which tend to focus on throughput
optimization, the emphasis here is on reducing latency. There's plenty
of scope for either a simulation-based study or (better) an
implementation using a modified OS like BSD or Linux over Xen.