Friday, April 03, 2015

Graphs in Rust

Graphs are a bit awkward to construct in Rust because of Rust's stringent lifetime and mutability requirements. Graphs of objects are very common in OO programming. In this tutorial I'm going to go over a few different approaches to implementation. My preferred approach uses arena allocation and makes slightly advanced use of explicit lifetimes. I'll finish up by discussing a few potential Rust features which would make using such an approach easier.

There are essentially two orthogonal problems: how to handle the lifetime of the graph and how to handle it's mutability.

The first problem essentially boils down to what kind of pointer to use to point to other nodes in the graph. Since graph-like data structures are recursive (the types are recursive, even if the data is not) we are forced to use pointers of some kind rather than have a totally value-based structure. Since graphs can be cyclic, and ownership in Rust cannot be cyclic, we cannot use Box<Node> as our pointer type (as we might do for tree-like data structures or linked lists).

No graph is truly immutable. Because there may be cycles, the graph cannot be created in a single statement. Thus, at the very least, the graph must be mutable during its initialisation phase. The usual invariant in Rust is that all pointers must either be unique or immutable. Graph edges must be mutable (at least during initialisation) and there can be more than one edge into any node, thus no edges are guaranteed to be unique. So we're going to have to do something a little bit advanced to handle mutability.

...

Read the full tutorial, with examples. There's also some discussion of potential language improvements in Rust to make dealing with graphs easier.

4 comments:

Robert said...

Seems to me it would be really nice to have the option of saying "I don't care about managing lifetimes myself here, just use a tracing GC for all this stuff".

I know that was once part of the Rust plan, and then the plan was to push it out to a library with language support for tracing hooks, and it seems now the option has disappeared entirely. Which makes me sad.

Niko Matsakis said...

I just wrote a quick post showing another way to model graphs:

http://smallcultfollowing.com/babysteps/blog/2015/04/06/modeling-graphs-in-rust-using-vector-indices/

I would offer a slight correction to the previous commenter, though: the option of tracing Gc is not currently available, but it is something I hope we will work on soon.

gasche said...

In Mezzo, they use a dynamic mechanism called "adoption and abandon" as a way to handle shared/cyclic nodes in a relatively ergonomic way (but paying the cost of a runtime check when taking ownership of a node during traversal).

See for example this code example for graphs. Adoption and Abandon has been described in various places, such as this blog post.

Anonymous said...

A graph is just a tuple G = (V, E) where V is some set of vertices, and E a set of edges, generally represented as tuples of vertices. In general, either your vertices or your edges are implicit to some degree. And lots of things in your program form graphs with or without your awareness or intention. Becoming aware of hidden graph structures in your code is often an "ah-hah" moment.

There are tons of ways to represent graphs. There is no precondition that they be either pointer-based or dynamic. You can form graphs from anything that allows any kind of associativity: arrays, genejric maps, function calls (as mentioned above), match statements, etc.

The point is you have a set of "things", and you have a set of "pairs of things", where each item in a given pair is a "thing". If you have a set of "things", and *any* other structure in your code which associates pairs of "things", that forms a graph -- whether or not that is what you indended!

It's very easy to form static graphs. For example, you can represent vertices as functions, edges as calls from one function to another. Yes: every program consists of a graph over functions! You could represent nodes as enum values, and edges as match clauses. Whether or not you intend, this forms a graph.

It's not that hard, guys.