Featherweight Musings

Rustfmt-ing Rust

2015-10-07T06:38:00.000+01:00

Over on my new blog I have a post on running rustfmt on the Rust repo and how you can help with the process, if you would like to:

http://www.ncameron.org/blog/rustfmt-ing-rust/

My Git and GitHub work flow

2015-06-01T03:40:00.000+01:00

Every now and then I get a bunch of questions about my Git workflow. Hopefully, this will be useful, even though there are already a bunch of tutorials and blogs on Git. It is aimed at pretty much Git newbies, but assumes some knowledge of version control concepts. Some of these things might not be best practice, I'd appreciate people letting me know if I could do things better!

Also, I only describe what I do, not why, i.e., the underlying concepts you should understand. To do that would probably take a book rather than a blog post and I'm not the right person to write such a thing.

Starting out

I operate in two modes when using Git - either I'm contributing to an existing repo (e.g., Rust) or I'm working on my own repo (e.g., rustfmt), which might just be a personal thing, essentially just using GitHub for backup, or which might be a community project that I started. The workflow for the two scenarios is a bit different.

Lets start with contributing to someone else's repo. The first step is to find that repo on GitHub and fork it (I'm assuming you have a GitHub account set up, it's very easy to do if you haven't). Forking means that you have your own personal copy of the repo hosted by GitHub and associated with your account. So for example, if you fork https://github.com/rust-lang/rust, then you'll get https://github.com/nrc/rust. It is important you fork the version of the repo you want to contribute to. In this case, make sure you fork rust-lang's repo, not somebody else's fork of that repo (e.g., nrc's).

Then make a local clone of your fork so you can work on it locally. I create a directory, then `cd` into it and use:

git clone git@github.com:nrc/rust.git .

Here, you'll replace the 'git@...' string with the identifier for your repo found on its GitHub page. The trailing `.` means we clone into the current directory instead of creating a new directory.

Finally. you'll want to create a reference to your fork (e.g., nrc/rust, called 'origin') and the original repo (rust-lang/rust, called 'upstream'):

git remote add upstream https://github.com/rust-lang/rust.git

Now you're all set to go and contribute something!

If I'm starting out with my own repo, then I'll first create a directory and write a bit of code in there, probably add a README.md file, and make sure something builds. Then, to make it a git repo I use

git init

then make an initial commit (see the next section for more details). Over on GitHub, go to the repos page and add a new, empty repo, choose a cool name, etc. The we have to associate the local repo with the one on GitHub:

git remote add origin git@github.com:nrc/rust-fmt.git

Finally, we can make the GitHub repo up to date with the local one (again, see below for more details):

git push origin master

Doing work

I usually start off by creating a new branch for my work. Create a branch called 'foo' using:

git checkout -b foo

There is always a 'master' branch which corresponds with the current state (as of the last time you updated) of the repo without any of your branches. I try to avoid working on master. You can switch between branches using `git checkout`, e.g.,

git checkout master
git checkout foo

Once I've done some work, I commit it. Eventually, when you submit the work upstream, a commit should be a self-contained, modular piece of work. However, when working locally I prefer to make many small commits and then sort them out later. I generally commit when I context switch to work on something else, when I have to make a decision I'm not sure about, or when I reach a point which seems like it could be a natural break in the proper commits I'll submit later. I usually commit using

git commit -a

The `-a` means all the changed files git knows about will be included in the commit. This is usually what I want. I sometimes use `-m "The commit message"`, but often prefer to use a text editor since it allows me to check which files are being committed.

Often, I don't want to create a new commit, but just add my current work to the last commit, then I use:

git commit -a --amend

If you've created new files as part of your work, you need to tell Git about them before committing, use:

git add path/to/file_name.rs

Updating

When I want to update the local repo to the upstream repo I use `git pull upstream master` (with my master branch checked out locally). Commonly, I want to update my master and then rebase my working branch to branch off the updated master.

Assuming I'm working on the foo branch, the recipe I use to rebase is:

git checkout master
git pull upstream master
git checkout foo
git rebase master

The last step will often require manual resolution of conflicts, after that you must `git add` the changed files and then `git rebase --continue`. That might happen several times.

If you've got a lot of commits, I find it is usually easier to squash a bunch of commits before rebasing - it sometimes means dealing with conflicts fewer times.

On the subject of updating the repo, there is a bit of a debate about rebasing vs merging. Rebasing has the advantage that it gives you a clean history and fewer merge commits (which are just boilerplate, most of the time). However, it does change your history, which if you are sharing your branch is very, very bad news. My rule of thumb is to rebase private branches (never merge) and to only merge (never rebase) branches which have been shared publicly. The latter generally means the master branch of repos that others are also working on (e.g., rustfmt). But sometimes I'll work on a project branch with someone else.

Current status

With all these repos, branches, commits, and so forth, it is pretty easy to get lost. Here are few commands I use to find out what I'm doing.

As an aside, because Rust is a compiled language and the compiler is big, I have multiple Rust repos on my local machine so I don't have to checkout branches too often.

Show all branches in the current repo and highlight the current one:

git branch

Show the history of the current branch (or any branch, foo):

git log
git log foo

Which files have been modified, deleted, etc.:

git status

All changes since last commit (excludes files which Git doesn't know about, e.g., new files which haven't been `git add`ed):

git diff

The changes in the last commit and since that commit:

git diff HEAD~1

Tidying up

Like I said above, I like to make a lot of small, work in progress commits and then tidy up later. To do that I use:

git rebase -i HEAD~n

Where `n` is the number of commits I want to tidy up. `rebase -i` lets you move commits, around squash them together, reword the commit messages, and so forth. I usually do a `rebase -i` before every rebase and a thorough one before submitting work.

Submitting work

Once I've tidied up the branch, I push it to my GitHub repo using:

git push origin foo

I'll often do this to backup my work too if I'm spending more than a day or so on it. If I've done this and rebased since then, then you need to add `-f` to the above command. Sometimes I want my branch to have a different name on the GitHub repo than I've had locally:

git push origin foo:bar

(The common use case here is foo = "fifth-attempt-at-this-stupid-piece-of-crap-bar-problem").

When ready to submit the branch, I go to the GitHub website and make a pull request (PR). Once that is reviewed, the owner of the upstream repo (or, often, a bot) will merge it into master.

Alternatively, if it is my repo I might create a branch and pull request, or I might manually merge and push:

git checkout master
git merge foo
git push origin master

Misc.

And here is a bunch of stuff I do all the time, but I'm not sure how to classify.

Delete a branch when I'm all done:

git branch -d foo

or

git branch -D foo

You need the capital 'D' if the branch has not been merged to master. With complex merges (e.g., if the branch got modified) you sometimes need capital 'D', even if the branch is merged.

Sometimes you need to throw away some work. If I've already committed, I use the following to throw away the last commit:

git reset HEAD~1

or

git reset HEAD~1 --hard

The first version leaves changes from the commit as uncommitted changes in your working directory. The second version throws them away completely. You can change the '1' to a larger number to throw away more than one commit.

If I have uncommitted changes I want to throw away, I use:

git checkout HEAD -f

This only gets rid of changes to tracked files. If you created new files, those won't be deleted.

Sometimes I need more fine-grained control of which changes to include in a commit. This often happens when I'm reorganising my commits before submitting a PR. I usually use some combination of `git rebase -i` to get the ordering right, then pop off a few commits using `git reset HEAD~n`, then add changes back in using:

git add -p

which prompts you about each change. (You can also use `git add filename` to add all the changes in a file). After doing all this, use `git commit` to commit. My muscle memory often appends the `-a`, which ruins all the work put in to separating out changes.

Sometimes this is too much work, in which case the best thing to do is save all the changes from your commits as a diff, edit them around in a text editor, then patch them back piece by piece when committing. Something like:

git diff ... >patch.diff
...
patch -p1

Every now and again, I'll need to copy a commit from one branch to another. I use `git log branch-name` to show the commits, copy the hash from the commit I want to copy, then use

git cherry-pick hash

to copy the commit into the current branch.

Finally, if things go wrong and you can't see a way out, `git reflog` is the secret magic that can fix nearly everything. It shows a log of pretty much everything Git has done, down to a fine level of detail. You can usually use this info to get out of any pickle (you'll have to google the specifics). However, Git only know about files which have been committed at least once, so even more reason to do regular, small commits.

rustfmt - call for contributions

2015-04-30T04:36:00.001+01:00

I've been experimenting with a rustfmt tool for a while now. Its finally in working shape (though still very, very rough) and I'd love some help on making it awesome.

rustfmt is a reformatting tool for Rust code. The idea is that it takes your code, tidies it up, and makes sure it conforms to a set of style guidelines. There are similar tools for C++ (clang format), Go (gofmt), and many other languages. Its a really useful tool to have for a language, since it makes it easy to adhere to style guidelines and allows for mass changes when guidelines change, thus making it possible to actually change the guidelines as needed.

Eventually I would like rustfmt to do lots of cool stuff like changing glob imports to list imports, or emit refactoring scripts to rename variables to adhere to naming conventions. In the meantime, there are lots of interesting questions about how to lay out things like function declarations and match expressions.

My approach to rustfmt is very incremental. It is usable now and gives good results, but it only touches a tiny subset of language items, for example function definitions and calls, and string literals. It preserves code elsewhere. This makes it immediately useful.

I have managed to run it on several crates (or parts of crates) in the rust distro. It also bootstraps, i.e., you can rustfmt on rustfmt before every check-in, in fact this is part of the test suite.

It would be really useful to have people running this tool on their own code or on other crates in the rust distro, and filing issues and/or test cases where things go wrong. This should actually be a useful tool to run, not just a chore, and will get more useful with time.

It's a great project to hack on - you'll learn a fair bit about the Rust compiler's frontend and get a great understanding of more corners of the language than you'll ever want to know about. It's early days too, so there is plenty of scope for having a big impact on the project. I find it a lot of fun too! Just please forgive some of the hackey code that I've already written.

Here is the rustfmt repo on GitHub. I just added a bunch of information to the repo readme which should help new contributors. Please let me know if there is other information that should go in there. I've also created some good issues for new contributors. If you'd like to help out and need help, please ping me on irc (I'm nrc).

Contributing to Rust

2015-04-13T22:40:00.002+01:00

I wrote a few things about contributing to Rust. What with the imminent 1.0 release, now is a great time to learn more about Rust and contribute code, tests, or docs to Rust itself or a bunch of other exciting projects.

The main thing I wanted to do was make it easy to find issues to work on. I also stuck in a few links to various things that new contributors should find useful.

I hope it is useful, and feel free to ping me (nrc in #rust-internals) if you want more info.

New tutorial - arrays and vectors in Rust

2015-04-12T02:36:00.001+01:00

I've just put up a new tutorial on Rust for C++ programmers: arrays and vectors. This covers everything you might need to know about array-like sequences in Rust (well, not everything, but at least some of the things).

As well as the basics on arrays, slices, and vectors (Vec), I dive into the differences in representing arrays in Rust compared with C/C++, describe how to use Rust's indexing syntax with your own collection types, and touch on some aspects of dynamically sized types (DSTs) and fat pointers in Rust.

Graphs in Rust

2015-04-03T02:17:00.002+01:00

Graphs are a bit awkward to construct in Rust because of Rust's stringent lifetime and mutability requirements. Graphs of objects are very common in OO programming. In this tutorial I'm going to go over a few different approaches to implementation. My preferred approach uses arena allocation and makes slightly advanced use of explicit lifetimes. I'll finish up by discussing a few potential Rust features which would make using such an approach easier.

There are essentially two orthogonal problems: how to handle the lifetime of the graph and how to handle it's mutability.

The first problem essentially boils down to what kind of pointer to use to point to other nodes in the graph. Since graph-like data structures are recursive (the types are recursive, even if the data is not) we are forced to use pointers of some kind rather than have a totally value-based structure. Since graphs can be cyclic, and ownership in Rust cannot be cyclic, we cannot use Box<Node> as our pointer type (as we might do for tree-like data structures or linked lists).

No graph is truly immutable. Because there may be cycles, the graph cannot be created in a single statement. Thus, at the very least, the graph must be mutable during its initialisation phase. The usual invariant in Rust is that all pointers must either be unique or immutable. Graph edges must be mutable (at least during initialisation) and there can be more than one edge into any node, thus no edges are guaranteed to be unique. So we're going to have to do something a little bit advanced to handle mutability.

...

Read the full tutorial, with examples. There's also some discussion of potential language improvements in Rust to make dealing with graphs easier.

Creating a drop-in replacement for the Rust compiler

2015-02-23T20:33:00.001+00:00

Many tools benefit from being a drop-in replacement for a compiler. By this, I mean that any user of the tool can use `mytool` in all the ways they would normally use `rustc` - whether manually compiling a single file or as part of a complex make project or Cargo build, etc. That could be a lot of work; rustc, like most compilers, takes a large number of command line arguments which can affect compilation in complex and interacting ways. Emulating all of this behaviour in your tool is annoying at best, especially if you are making many of the same calls into librustc that the compiler is.

The kind of things I have in mind are tools like rustdoc or a future rustfmt. These want to operate as closely as possible to real compilation, but have totally different outputs (documentation and formatted source code, respectively). Another use case is a customised compiler. Say you want to add a custom code generation phase after macro expansion, then creating a new tool should be easier than forking the compiler (and keeping it up to date as the compiler evolves).

I have gradually been trying to improve the API of librustc to make creating a drop-in tool easier to produce (many others have also helped improve these interfaces over the same time frame). It is now pretty simple to make a tool which is as close to rustc as you want it to be. In this tutorial I'll show how.

Note/warning, everything I talk about in this tutorial is internal API for rustc. It is all extremely unstable and likely to change often and in unpredictable ways. Maintaining a tool which uses these APIs will be non- trivial, although hopefully easier than maintaining one that does similar things without using them.

This tutorial starts with a very high level view of the rustc compilation process and of some of the code that drives compilation. Then I'll describe how that process can be customised. In the final section of the tutorial, I'll go through an example - stupid-stats - which shows how to build a drop-in tool.

Continue reading on GitHub...

Recent syntactic changes to Rust

2015-01-11T22:27:00.001+00:00

The last few weeks I implemented a few syntactic changes in Rust. I wanted to go over those and explain the motivation so it doesn't just seem like churn. None of the designs were mine, but I agree with all of them. (Oh, by the way, did you see we released the 1.0 Alpha?!!!!!!).

Slicing syntax

Slicing syntax has changed from `foo[a..b]` to `&foo[a..b]`, although that might not look like much of a change, it is actually the deepest one I'll cover here. Under the covers we moved from having a `Slice` trait to using the `Index` trait. So `&foo[a..b]` works exactly the same way as overloaded indexing `foo[a]`, the difference being that slicing is indexing using a range. One advantage of this approach is that ranges are now first class expressions - you can write `for i in 0..5 { ... }` and get the expected result. The other advantage is that the borrow becomes explicit (the newly required `&`). Since borrowing is important in Rust, it is great to see where things are borrowed and so we are trying to make that as explicit as possible. Previously, the borrow was implicit. The final benefit of this approach is that there is one fewer 'blessed' traits and one fewer kind of expression in the language.

Fixed length and repeating arrays

The syntax for these changed from `[T, ..n]` and `[expr, ..n]` to `[T; n]` and `[expr; n]`, respectively. This is a pretty minor change and the motivation was to allow the wider use of range syntax (see above) - the `..n` could ambiguously be a range or part of these expressions. The `..` syntax is somewhat overloaded in Rust already, so I think this is a nice change in that respect. The semicolon seems just as clear to me, involves fewer characters, and there is less confusion with other expressions.

Sized bound

Used for dynamically sized types the `Sized?` bound indicated that a type parameter may or not be bounded by the `Sized` trait (type parameters have the `Sized` bound by default). Being bound by the `Sized` trait indicates to the compiler that the object has statically known size (as opposed to DST objects). We changed the syntax so that rather than writing `Sized? T` you write `T: ?Sized`. This has the advantage that it is more regular - now all bounds (regular or optional) come after the type parameter name. It will also fit with negative bounds when they come about, which will have the syntax `!Foo`.

`self` in `use` imports

We used to accept `mod` in `use` imports to allow the import of the module itself, e.g., `use foo::bar::{mod, baz};` would import `foo::bar` and `foo::bar::baz`. We changed `mod` to `self`, making the example `use foo::bar::{self, baz};`. This is a really minor change, but I like it - `self`/`Self` has a very consistent meaning as a variable of some kind, whereas `mod` is a keyword; I think that made the original syntax a bit jarring. We also recently introduced scoped enums, which make the pattern of including a module (in this case an enum) and its components more common. Especially in the enum case, I think `self` fits better than `mod` because you are referring to data rather than a module.

`derive` in attributes

The `derive` attribute is used to derive some standard traits (e.g., `Copy`, `Show`) for your data structures. It was previously called `deriving`, and now it is `derive`. This is another minor change, but makes it more consistent with other attributes, and consistency is great!

Thanks to Japaric for doing a whole bunch of work converting our code to using the new syntaxes.

My thoughts on Rust in 2015

2014-12-27T22:52:00.000+00:00

Disclaimer: these are my personal thoughts on where I would like to see Rust go in 2015 and what I would like to work on. They are not official policy from the Rust team or anything like that.

The obvious big thing coming up in 2015 is the 1.0 release. All my energy will be pretty much devoted to that in the first quarter. From the compiler's point of view, we are in pretty good shape for the release. I'll be mainly working on associated types (with Niko) - trying to make them as feature complete and as bug free as possible. This is primarily motivated by the standard library; we want to provide everything the standard library needs here. Other things on my work list are a number of syntactic changes (`?Sized`, slicing (this is not purely syntactic), other minor changes as they come up) and coercions (here, there are a few backwards incompatible things, and providing a really solid, ergonomic story around coercions is important). Any other time is likely to be spent on 'polish' things such as improving error messages and refactoring the compiler in minor ways; possibly even some hacking on the libraries if that would be most useful.

We (the Rust community) also need to do some planning for post-1.0 Rust. There are obviously a lot of features people would like post-1.0 and it would be chaos to try to implement them all asap. So, we need to decide what is in scope for the immediate future of Rust. My personal preference is to focus on stability for a while and only add a minimum of new language features. Stability for me means building better, broader libraries, supporting the ecosystem (work on Cargo and crates.io, for example), fixing bugs, and making the compiler friendlier to work with - both for regular users (error messages, build times, etc.) and for tooling.

'A minimum of language features' will probably mean quite a few new things actually, since we postponed a lot of work due to 1.0 which are pretty high priority. UFCS, custom DST coercions, cross-borrowing coercions, and being able to return a bare trait from a function are high on my wish list. There are a couple of large features which seem debatable for immediate work - higher kinded types and efficient inheritance. Ideally, we would put both of these off, but the former is in great demand for improving the standard libraries and the latter would really help work on Servo (and also improve the compiler in many ways, depending on the chosen solution). Both of these need a lot of design work as well as being big implementation jobs.

Finally, the biggest piece of Rust making me uncomfortable right now is macros. We will have limited macro support for 1.0. I would like to start looking at macro rules 2.0 and a better system for syntax extensions as soon as possible. The longer we leave this, the more people will use and get used to the existing solution or start using external tools to replace syntax extensions.

My vision for macro rules, is basically a tweaked version of today's. Fundamentally, I think hygiene needs to be built into everything from the ground up, including more sophisticated 'type' annotations on macro arguments. Unfortunately, I think this means a lot of work, including rewriting the compiler's name resolution machinery (but then we want to do that anyway).

Syntax extensions/procedural macros have many more open questions. I'm not sure how to make these hygienic and not too painful to write. There is also the question of the interface we make available. Using the compiler's AST and giving extensions access to the entire compiler is pretty awful. There are a number of alternatives: my preference is to separate libsyntax's AST from the compiler's and make the former available as an interface, along with a library of convenience functions. Other alternatives are using source text or token trees as input and output, or relying much more heavily on quasi-quoting.

Looking further ahead, I don't have too many big language features I'm wanting to push forward (efficient inheritance and nested enums/refinement types, being the exception; perhaps parameterised modules at some point). There are a bunch of smaller ones I'm keen on though (type ascription, explicit lifetimes in the syntax, parameterising with ints to make fixed length arrays more usable (which is related to CTFE), etc.), but they are mostly low priority.

I am keen, however, on pushing forward on making the compiler a better piece of software and on improving support for tooling. I think these are two sides of the same coin really. Some ideas I have, in rough order of importance/priority/ease of implementation:

fix parallel codegen. This is currently broken with what I suspect is a minor bug. There is another bug preventing having it turned on by default. Given that this halves build times of the compiler for me, I am keen to get it fixed.
Land incremental codegen. Stuart Pernsteiner came super-close to finishing this. It should be relatively easy to finish it and land it. It should then have a massive effect on compilation performance.
Improve the output of save-analysis to be more useful and general purpose. I hope this can become an API of sorts for many compiler-based tools, especially in the medium term, before we are ready to expose a better and more permenant API.
Separate the libsyntax and compiler ASTs. This is primarily in support of the macro changes described above, but I think it is important for allowing long term evolution of the compiler too.
Refactor name resolution. Again important for macro reform, but also it is a confusing and buggy part of the compiler.
Make the CFG the first class data structure for all stages from borrow checking onwards.
Refactor trans to use its own IR, i.e., make it a series of lowering steps: CFG -> IR -> LLVM. This should hopefully make trans easier to understand and extend, and allow for adding our own optimisation passes.
Refactor metadata. This is a really ugly part of the compiler. We could make much better use of metadata for tools (there is a lot of overlap with save-analysis output and debuginfo, and it is used by RustDoc), but it is really hard to use at the moment. It is also crucial to have good metadata support for incremental compilation. I hope metadata can become just a straightforward serialisation of some compiler IR, this may mean we need another IR between the compiler's AST and the CFG.
Incremental compilation. This is a huge job of work, but really necessary for a lot of tools and would go along way to solving our compile time problems. It is very dependent on a better compiler architecture.

I guess that not all of this will get done in 2015, but I can dream...

Side projects

Some things I want to work on in my spare time (and possibly a little bit of Mozilla time, if it is prioritised right): get DXR up and going. Frustratingly, this was working back in June or July, but then DXR underwent a massive refactoring and I had to port across the Rust plugin. That has dragged on because I have been low on both time and motivation. But it is such a useful tool that I am keen to get this working, and it is getting close now. Once it is 'done' there are always ways to make it better (for example, my latest addition is showing everything imported by a glob import).

Syntax extensions on methods. I hacked this up on the plane to Portland last month, but fixing the details has taken a long while. I think it is ready now, but I need to implement something that uses it to make sure that the design is right. This project meant changing a few things with syntax extensions, and I hope to deprecate and remove some of the older, less flexible versions. Plus, there are some other follow up things.

The motivation for syntax extensions on methods is libhoare. I want to be able to put pre- and postconditions on methods. I have this working in some cases, but there is a lot of duplicate code and I think I can do better. Perhaps not though. Perhaps it can also inform some of the design around libraries for syntax extensions.

A while back I thought of adding `scanln`, a compliment to `println` using the same infrastructure, but for easy input rather easy output. I ended up forgetting about this because the RFC for adding it was rejected due to no out-of-tree implementation, but my implementation required big changes to the existing `format!` support. I would like to resurrect this project, since lack of really easy input is the number one thing which puts me off using Rust for a lot of small tasks. I believe it also raises the bar for Rust being taught at universities, etc.

I have some crazy ideas around a tool that would be a hybrid of grep/sed-with-knowledge-of-syntax-and-types, refactoring tools, and a Rustfix/Rustfmt kind of thing. It seems I need this kind of thing a lot - I spend a lot of time on tasks which are marginally more complicated than sed can handle, but much easier than needing a full refactoring tool.

rustaceans.org

2014-12-23T03:45:00.000+00:00

I was getting frustrated trying to map people's irc nicks to their GitHub usernames (and back again). I assume other people were having the same problem too. It's pretty hard to envisage a good technical solution to this. The best I could come up with was having a community phone book for the Rust community. I had been meaning to experiment a bit with some modern web dev technologies, so I thought this would be a good opportunity.

Some of the technologies I was interested in were the modern, client-side, JS frameworks (Ember, Angular, React, etc.), node.js, and RESTful APIs. I ended up using Ember, node.js, and the GitHub API. I had fun learning about these technologies and learnt a lot, although I don't think I did more than scratch the surface, especially with Ember, which is HUGE.

What made the project a little bit more interesting is that I have absolutely no interest in getting involved with user credentials - there is simply too much that can go wrong, security-wise, and no one wants to remember another username and password. To deal with this, I observed that pretty much everyone in the Rust community already has a GitHub account, so why not let GitHub do the hard work with security and logins, etc. It is possible to use GitHub authentication on your own website, but I thought it would be fun to use pull requests to maintain user data in the phone book, rather than having to develop a UI for adding and editing user data.

The design of rustaceans.org follows from the idea of making it pull request based: there is a repository, hosted on GitHub, which contains a JSON file for each user. Users can add or update their information by sending a pull request. When a PR is submitted, a GitHub hook sends a request to the rustaceans.org backend (the node.js bit). The backend does a little sanity checking (most importantly that the user has only updated their own data), then merges the PR, then updates the backing database with the user's new data (the db could be considered a cache for the user data repository, it can be completely rebuilt from the repo when necessary).

The backend exposes a web service to access the db. This provides two functions as an http API (I would say it is RESTful, but I'm not 100% sure that it is) - search for a user with some string, and get a specific user by GitHub username. These just pull data out of the database and return it as JSON (not quite the same JSON as users submit, the data has been processed a little bit, for example, parsing the 'notes' field as markdown).

The frontend is the actual rustaceans.org webpage, which is just a small Ember app, and is a pretty simple UI wrapper around the backend web service. There is a front page with some info and a search box, and you can use direct links to users, e.g., http://www.rustaceans.org/nick29581.

All the implementation is pretty straightforward, which I think verifies the design to some extent. The hardest part was learning the new technologies. While using the site is certainly different from a regular setup where you would modify your own details on the site, it seems to be pretty successful. I've had no complaints, and we have a fair number of rustaceans in the db. Importantly, it has needed very little manual intervention - users presumably understand the procedures, and automation is working well.

Please take a look! And if you know any of those technologies, have a look at the source code and let me know where I could have done better (also, patches and issues are very welcome). And of course, if you are part of the Rust community, please add yourself!

Notes on training for sport

2014-12-15T01:56:00.003+00:00

I like training for sports. I have rock climbed a lot, and done a bit of swimming and kick boxing. I wouldn't say I'm very good at any of those, but I have definitely improved a lot. I think some of what I learned might be interesting, so here it is. None of this is very scientific, especially since the number of participants in the study is one. There are also lots of better sources - articles by coaches and sports scientists, rather than amateurs like me. Still, if you are interested, read on.

* Don't get injured. Injury prevention should be your number one goal when training. That means knowing your limits, warming up, and doing 'pre-hab' exercises (these are exercises that don't directly get you closer to your goals, but reduce the risk of injury by training the antagonistic muscles or improving mobility, etc). The weeks you miss due to injury will affect progress more than any other factor in your training.

* Training must by super-specialised. Training works because the body adapts to the pressures put on it by training. But that adaptation is much more specialised than you might think, especially once you get more advanced in your training. This can lead to some surprises. For example, most climbs are most demanding on the fingers (but see section on technique, below), that means doing pullups will not help you achieve these kinds of climbs. Likewise, be very precise about where you are training on the power-endurance spectrum, being able to run for many kms will not help you run 100m any faster.

This also goes for training the antagonists. For example, doing pressups or bench presses makes climbers more imbalanced, not less. This is because it is usually the shoulders which cause more trouble than the elbows (those exercises are good for balancing the elbows). Rebalancing the shoulders requires exercises that bring the shoulder blades back and down, such as rowing-type exercises. Balancing the elbow is also interesting - climbing tends to overuse the brachioradialis (used for pull ups with the palms out or curls with the palms down) vs the biceps, so doing curls (palms up) can improve muscle balance in the elbow, even though the biceps are usually thought of as a climbing muscle.

Also, yoga is terrible for climbers, balance-wise.

* Technique is important. You always think your technique is good enough, but it can usually be better. This is obvious for technique-based sports like climbing and kick boxing, but it holds true for basically all exercise, even lifting weights - my biggest gains in the bench press and deadlift came from technique coaching, and that was starting from 'good' technique.

* Stretch when warm. Stretching is not a warm up and stretching cold is really bad, I got injured this way a few times. This is important for yoga in particular, you really need to go gentle until you are warmed up.

* To train effectively you need to repeat the same thing and make it progressively more difficult. Doing one session of an exercise doesn't make any difference, you have to do that session once a week (or more) for six weeks (or more) and keep increasing resistance or going for longer. However, if you keep doing the same thing for too long, you'll hit a plateau and won't improve. I found this hardest with the things I have most fun doing - I don't want to change them up because I love doing it, but if you don't make changes, you don't keep improving.

* Protein is the king of nutrients, at least as far as training is concerned. I had this vividly illustrated when I turned vegetarian, my performance dropped and I lost muscle mass. I solved that problem with protein shakes and I think getting plenty of protein is the best thing you can do for you diet when training hard.

There is a meme that too much protein is bad for you in some way, but I don't think that is true, at least as long as you have an otherwise balanced diet (plenty of fibre, vitamins, etc.). Research linking excess protein to kidney damage only indicates that if you already have kidney damage, then excess protein can make it worse. No research (afaik) indicates that excess protein can cause kidney damage in the first place.

I haven't found any other supplements to be anywhere near as worth while. BCAAs seem to have a significant but small effect. Some carbs in drinking water when training also seems to help a little. Creatine made a big difference, but the weight gain (in water retention) meant it was not worth it for climbing (except maybe to escape a plateau), and when you stop taking it the withdrawal is harsh, training-wise.

Thoughts on numeric types

2014-10-22T04:31:00.000+01:00

Rust has fixed width integer and floating point types (`u8`, `i32`, `f64`, etc.). It also has pointer width types (`int` and `uint` for signed and unsigned integers, respectively). I want to talk a bit about when to use which type, and comment a little on the ongoing debate around the naming and use of these types.

Choosing a type

Hopefully you know whether you want an integer or a floating point number. From here on in I'll assume you want an integer, since they are the more interesting case. Hopefully you know if you need your integer to be signed or not. Then it gets interesting.

All other things being equal, you want to use the smallest integer you can for performance reasons. Programs run as fast or faster on smaller integers (especially if the larger integer has to be emulated in software, e.g., `u64` on a 32 bit machine). Furthermore, smaller integers will give smaller code.

At this point you need to think about overflow. If you pick a type which is too small for your data, then you will get overflow and usually bugs. You very rarely want overflow. Sometimes you do - if you want arithmetic modulo 2^n where n is 8, 16, 32, or 64, you can use the fixed width type and rely on overflow behaviour. Sometimes you might also want signed overflow for some bit twiddling tricks. But usually you don't want overflow.

If your data could grow to any size, you should use a type which will never overflow, such as Rust's `num::bigint::BigInt`. You might be able to do better performance-wise if you can prove that values might only overflow in certain places and/or you can cope with overflow without 'upgrading' to a wider integer type.

If, on the other hand, you choose a fixed width integer, you are asserting that the value will never exceed that size. For example, if you have an ascii character, you know it won't exceed 8 bits, so you can use `u8` (assuming you're not going to do any arithmetic which might cause overflow).

So far, so obvious. But, what are `int` and `uint` for? These types are pointer width, that means they are the same size as a pointer on the system you are compiling for. When using these types, you are asserting that a value will never grow larger than a pointer (taking into account details about the sign, etc.). This is actually quite a rare situation, the usual case is when indexing into an array, which is itself quite rare in Rust (since we prefer using an iterator).

What you should never do is think "this number is an integer, I'll use `int`". You must always consider the maximum size of the integer and thus the width of the type you'll use.

Language design issues

There are a few questions that keep coming up around numeric types - how to name the types? Which type to use as a default? What should `int`/`uint` mean?

It should be clear from the above that there are only very few situations when using `int`/`uint` is the right choice. So, it is a terrible choice for any kind of default. But what is a good choice? Well first of all, there are two meanings for 'default': the 'go to' type to represent an integer when programming (especially in tutorials and documentation), and the default when a numeric literal does not have a suffix and type inference can't infer a type. The first is a matter of recommendation and style, and the second is built-in to the compiler and language.

In general programming, you should use the right width type, as discussed above. For tutorials and documentation, it is often less clear which width is needed. We have had an unfortunate tendency to reach for `int` here because it is the most familiar and the least scary looking. I think this is wrong. We should probably use a variety of sized types so that newcomers to Rust get aquainted with the fixed width integer types and don't perpetuate the habit of using `int`.

For a long time, Rust had `int` as the default type when the compiler couldn't decide on something better. Then we got rid of the default entirely and made it a type error if no precise numeric type could be inferred. Now we have decided we should have a default again. The problem is that there is no good default. If you aren't being explicit about the width of the value, you are basically waving your hands about overflow and taking a "it'll be fine, whatever" approach, which is bad programming. There is no default choice of integer which is appropriate for that situation (except a growable integer like BigInt, but that is not an appropriate default for a systems langauge). We could go with `i64` since that is the least worst option we have in terms of overflow (and thus safety). Or we could go with `i32` since that is probably the most performant on current processors, but neither of these options are future-proof. We could use `int`, but this is wrong since it is so rare to be able to reason that you won't overflow when you have an essentially unknown width. Also, on processors with less than 32 bit pointers, it is far to easy to overflow. I suspect there is no good answer. Perhaps the best thing is to pick `i64` because "safety first".

Which brings us to naming. Perhaps `int`/`uint` are nor the best names since they suggest they should be the default types to use when they are not. Names such as `index`/`uindex`, `int_ptr`/`u_ptr`, and `size`/`usize` have been suggested. All of these are better in that they suggest the proper use for these types. They are not such nice names though, but perhaps that is OK, since we should (mostly) discourage their use. I'm not really sure where I stand on this, again I don't really like any of the options, and at the end of the day, naming is hard.

A gotcha with raw pointers and unsafe code

2014-09-13T02:06:00.000+01:00

This bit me today. It's not actually a bug and it only happens in unsafe code, but it is non-obvious and something to be aware of.

There are a few components to the issue. First off, we must look at `&expr` where `expr` is an rvalue, that is a temporary value. Rust allows you to write (for example) `&42` and through some magic, `42` will be allocated on the stack and `x` will be a reference to it with an inferred lifetime shorter than the value. For example, `let x = &42i;` works as does

struct Foo<'a> {
f: &'a int,
}
fn main() {
let x = Foo { f: &42 };
}

Next, we must know that borrowed pointers (`&T`) can be implicitly coerced to raw pointers (`*T`). So if you write `let x: *const int = &42;`, `x` is a raw pointer produced by coercing the borrowed pointer. Once this happens, you have no safety guarantees - a raw pointer can point at memory that has already been freed. This is fine, since you see the raw pointer type and must be aware, but if the type comes from a struct field what looks like a borrowed pointer could actually be a raw pointer:

struct Bar {
f: *const int,
}
fn main() {
let x = Bar { f: &42 };
}

Imagine that `Bar` is some other module or crate, then you might assume that `main` is ok here. But it is not. Since the borrowed pointer has a narrow scope and is not stored (the raw pointer does not count for this analysis), the compiler can choose to delete the `42` allocated on the stack and reuse that memory straight after (or even during, probably) the `let` statement. So, `x.f` is potentially a dangling pointer as soon as `x` is available and accessing it will give you bugs. That is OK, you can only do so in an `unsafe` block and thus you should check (i.e., as a programmer you should check) that you can't get a dangling pointer. You must do this whenever you dereference a raw pointer, and the fact that it must happen in unsafe code is your cue to do so.

The final part of this gotcha was that I was already in unsafe code and was transmuting. Of course transmuting is awful and you should never do it, but sometimes you have to. If you do `unsafe { transmute(x) }` then there is no cue in the code that you have a raw pointer. You have no cue to check the dereference, because there is no dereference! You just get a weird bug that only appears on some platforms and depends on the optimisation level of compilation.

Unfortunately, there is nothing we can really do from the language point of view - you just have to be super-careful around unsafe code, and especially transmutes.

Hat-tip to eddyb for figuring out what was going on here.

LibHoare - pre- and postconditions in Rust

2014-07-23T22:13:00.000+01:00

I wrote a small macro library for writing pre- and postconditions (design by contract style) in Rust. It is called LibHoare (named after the logic and in turn Tony Hoare) and is here (along with installation instructions). It should be easy to use in your Rust programs, especially if you use Cargo. If it isn't, please let me know by filing issues on GitHub.

The syntax is straightforward, you add `[#precond="predicate"]` annotations before a function where `predicate` is any Rust expression which will evaluate to a bool. You can use any variables which would be in scope where the function is defined and any arguments to the function. Preconditions are checked dynamically before a function is executed on every call to that function.

You can also write `[#postcond="predicate"]` which is checked on leaving a function and `[#invariant="predicate"]` which is checked before and after. You can write any combination of annotations too. In postconditions you can use the special variable `result` (soon to be renamed to `return`) to access the value returned by the function.

There are also `debug_*` versions of each annotation which are not checked in --ndebug builds.

The biggest limitation at the moment is that you can only write conditions on functions, not methods (even static ones). This is due to a restriction on where any annotation can be placed in the Rust compiler. That should be resolved at some point and then LibHoare should be pretty easy to update.

If you have ideas for improvement, please let me know! Contributions are very welcome.

# Implementation

The implementation of these syntax extensions is fairly simple. Where the old function used to be, we create a new function with the same signature and an empty body. Then we declare the old function inside the new function and call it with all the arguments (generating the list of arguments is the only interesting bit here because arguments in Rust can be arbitrary patterns). We then return the result of that function call as the result of the outer function. Preconditions are just an `assert!` inserted before calling the inner function and postconditions are an `assert!` inserted after the function call and before returning.

Rust for C++ programmers - part 9: destructuring pt2 - match and borrowing

2014-07-17T02:19:00.001+01:00

(Continuing from part 8, destructuring).

(Note this was significantly edited as I was wildly wrong the first time around. Thanks to /u/dpx-infinity for pointing out my mistakes.)

When destructuring there are some surprises in store where borrowing is concerned. Hopefully, nothing surprising once you understand borrowed references really well, but worth discussing (it took me a while to figure out, that's for sure. Longer than I realised, in fact, since I screwed up the first version of this blog post).

Imagine you have some `&Enum` variable `x` (where `Enum` is some enum type). You have two choices: you can match `*x` and list all the variants (`Variant1 => ...`, etc.) or you can match `x` and list reference to variant patterns (`&Variant1 => ...`, etc.). (As a matter of style, prefer the first form where possible since there is less syntactic noise). `x` is a borrowed reference and there are strict rules for how a borrowed reference can be dereferenced, these interact with match expressions in surprising ways (at least surprising to me), especially when you a modifying an existing enum in a seemingly innocuous way and then the compiler explodes on a match somewhere.

Before we get into the details of the match expression, lets recap Rust's rules for value passing. In C++, when assigning a value into a variable or passing it to a function there are two choices - pass-by-value and pass-by-reference. The former is the default case and means a value is copied either using a copy constructor or a bitwise copy. If you annotate the destination of the parameter pass or assignment with `&`, then the value is passed by reference - only a pointer to the value is copied and when you operate on the new variable, you are also operating on the old value.

Rust has the pass-by-reference option, although in Rust the source as well as the destination must be annotated with `&`. For pass-by-value in Rust, there are two further choices - copy or move. A copy is the same as C++'s semantics (except that there are no copy constructors in Rust). A move copies the value but destroys the old value - Rust's type system ensures you can no longer access the old value. As examples, `int` has copy semantics and `Box<int>` has move semantics:

fn foo() {
    let x = 7i;
    let y = x;                // x is copied
    println!("x is {}", x);   // OK

    let x = box 7i;
    let y = x;                // x is moved
    //println!("x is {}", x); // error: use of moved value: `x`
}

Rust determines if an object has move or copy semantics by looking for destructors. Destructors probably need a post of their own, but for now, an object in Rust has a destructor if it implements the `Drop` trait. Just like C++, the destructor is executed just before an object is destroyed. If an object has a destructor then it has move semantics. If it does not, then all of its fields are examined and if any of those do then the whole object has move semantics. And so on down the object structure. If no destructors are found anywhere in an object, then it has copy semantics.

Now, it is important that a borrowed object is not moved, otherwise you would have a reference to the old object which is no longer valid. This is equivalent to holding a reference to an object which has been destroyed after going out of scope - it is a kind of dangling pointer. If you have a pointer to an object, there could be other references to it. So if an object has move semantics and you have a pointer to it, it is unsafe to dereference that pointer. (If the object has copy semantics, dereferencing creates a copy and the old object will still exist, so other references will be fine).

OK, back to match expressions. As I said earlier, if you want to match some `x` with type `&T` you can dereference once in the match clause or match the reference in every arm of the match expression. Example:

enum Enum1 {
    Var1,
    Var2,
    Var3
}

fn foo(x: &Enum1) {
    match *x { // Option 1: deref here.
        Var1 => {}
        Var2 => {}
        Var3 => {}
    }

    match x {
        // Option 2: 'deref' in every arm.
        &Var1 => {}
        &Var2 => {}
        &Var3 => {}
    }
}

In this case you can take either approach because `Enum1` has copy semantics. Let's take a closer look at each approach: in the first approach we dereference `x` to a temporary variable with type `Enum1` (which copies the value in `x`) and then do a match against the three variants of `Enum1`. This is a 'one level' match because we don't go deep into the value's type. In the second approach there is no dereferencing. We match a value with type `&Enum1` against a reference to each variant. This match goes two levels deep - it matches the type (always a reference) and looks inside the type to match the referred type (which is `Enum1`).

Either way, we must ensure that we (that is, the compiler) must ensure we respect Rust's invariants around moves and references - we must not move any part of an object if it is referenced. If the value being matched has copy semantics, that is trivial. If it has move semantics then we must make sure that moves don't happen in any match arm. This is accomplished either by ignoring data which would move, or making references to it (so we get by-reference passing rather than by-move).

enum Enum2 {
    // Box has a destructor so Enum2 has move semantics.
    Var1(Box<int>),
    Var2,
    Var3
}

fn foo(x: &Enum2) {
    match *x {
        // We're ignoring nested data, so this is OK
        Var1(..) => {}
        // No change to the other arms.
        Var2 => {}
        Var3 => {}
    }

    match x {
        // We're ignoring nested data, so this is OK
        &Var1(..) => {}
        // No change to the other arms.
        &Var2 => {}
        &Var3 => {}
    }
}

In either approach we don't refer to any of the nested data, so none of it is moved. In the first approach, even though `x` is referenced, we don't touch its innards in the scope of the dereference (i.e., the match expression) so nothing can escape. We also don't bind the whole value (i.e., bind `*x` to a variable), so we can't move the whole object either.

We can take a reference to any variant in the second match, but not in the derferenced version. So, in the second approach replacing the second arm with `a @ &Var2 => {}` is OK (`a` is a reference), but under the first approach we couldn't write `a @ Var2 => {}` since that would mean moving `*x` into `a`. We could write `ref a @ Var2 => {}` (in which `a` is also a reference), although it's not a construct you see very often.

But what about if we want to use the data nested inside `Var1`? We can't write:

    match *x {
        Var1(y) => {}
        _ => {}
    }

    match x {
        &Var1(y) => {}
        _ => {}
    }

because in both cases it means moving part of `x` into `y`. We can use the 'ref' keyword to get a reference to the data in `Var1`: `&Var1(ref y) => {}`.That is OK, because now we are not dereferencing anywhere and thus not moving any part of `x`. Instead we are creating a pointer which points into the interior of `x`.

Alternatively, we could destructure the Box (this match is going three levels deep): `&Var1(box y) => {}`. This is OK because `int` has copy semantics and `y` is a copy of the `int` inside the `Box` inside `Var1` (which is 'inside' a borrowed reference). Since `int` has copy semantics, we don't need to move any part of `x`. We could also create a reference to the int rather than copy it: `&Var1(box ref y) => {}`. Again, this is OK, because we don't do any dereferencing and thus don't need to move any part of `x`. If the contents of the Box had move semantics, then we could not write `&Var1(box y) => {}`, we would be forced to use the reference version. We could also use similar techniques with the first approach to matching, which look the same but without the first `&`. For example, `Var1(box ref y) => {}`.

Now lets get more complex. Lets say you want to match against a pair of reference-to-enum values. Now we can't use the first approach at all:

fn bar(x: &Enum2, y: &Enum2) {
    // Error: x and y are being moved.
    // match (*x, *y) {
    //     (Var2, _) => {}
    //     _ => {}
    // }

    // OK.
    match (x, y) {
        (&Var2, _) => {}
        _ => {}
    }
}

The first approach is illegal because the value being matched is created by dereferencing `x` and `y` and then moving them both into a new tuple object. So in this circumstance, only the second approach works. And of course, you still have to follow the rules above for avoiding moving parts of `x` and `y`.

If you do end up only being able to get a reference to some data and you need the value itself, you have no option except to copy that data. Usually that means using `clone()`. If the data doesn't implement clone, you're going to have to further destructure to make a manual copy or implement clone yourself.

What if we don't have a reference to a value with move semantics, but the value itself. Now moves are OK, because we know no one else has a reference to the value (the compiler ensures that if they do, we can't use the value). For example,

fn baz(x: Enum2) {
    match x {
        Var1(y) => {}
        _ => {}
    }
}

There are still a few things to be aware of. Firstly, you can only move to one place. In the above example we are moving part of `x` into `y` and we'll forget about the rest. If we wrote `a @ Var1(y) => {}` we would be attempting to move all of `x` into `a` and part of `x` into `y`. That is not allowed, an arm like that is illegal. Making one of `a` or `y` a reference (using `ref a`, etc.) is not an option either, then we'd have the problem described above where we move whilst holding a reference. We can make both `a` and `y` references and then we're OK - neither is moving, so `x` remains in tact and we have pointers to the whole and a part of it.

Similarly (and more common), if we have a variant with multiple pieces of nested data, we can't take a reference to one datum and move another. For example if we had a `Var4` declared as `Var4(Box<int>, Box<int>)` we can have a match arm which references both (`Var4(ref y, ref z) => {}`) or a match arm which moves both (`Var4(y, z) => {}`) but you cannot have a match arm which moves one and references the other (`Var4(ref y, z) => {}`). This is because a partial move still destroys the whole object, so the reference would be invalid.

Rust for C++ programmers - part 8: destructuring

2014-07-13T05:13:00.000+01:00

First an update on progress. You probably noticed this post took quite a while to come out. Fear not, I have not given up (yet). I have been busy with other things, and there is a section on match and borrowing which I found hard to write and it turns out I didn't understand very well. It is complicated and probably deserves a post of its own, so after all the waiting, the interesting bit is going to need more waiting. Sigh.

I've also been considering the motivation of these posts. I really didn't want to write another tutorial for Rust, I don't think that is a valuable use of my time when there are existing tutorials and a new guide in the works. I do think there is something to be said for targeting tutorials at programmers with different backgrounds. My first motivation for this series of posts was that a lot of energy in the tutorial was expended on things like pointers and the intuition of ownership which I understood well from C++, and I wanted a tutorial that concentrated on the things I didn't know. That is hopefully where this has been going, but it is a lot of work, and I haven't really got on to the interesting bits. So I would like to change the format a bit to be less like a tutorial and more like articles aimed at programmers who know Rust to some extent, but know C++ a lot better and would like to bring up their Rust skills to their C++ level. I hope that complements the existing tutorials better and is more interesting for readers. I still have some partially written posts in the old style so they will get mixed in a bit. Let me know what you think of the idea in the comments.

Destructuring

Last time we looked at Rust's data types. Once you have some data structure, you will want to get that data out. For structs, Rust has field access, just like C++. For tuples, tuple structs, and enums you must use destructuring (there are various convenience functions in the library, but they use destructuring internally). Destructuring of data structures doesn't happen in C++, but it might be familiar from languages such as Python or various functional languages. The idea is that just as you can create a data structure by filling out its fields with data from a bunch of local variables, you can fill out a bunch of local variables with data from a data structure. From this simple beginning, destructuring has become one of Rust's most powerful features. To put it another way, destructuring combines pattern matching with assignment into local variables.

Destructuring is done primarily through the let and match statements. The match statement is used when the structure being desctructured can have difference variants (such as an enum). A let expression pulls the variables out into the current scope, whereas match introduces a new scope. To compare:

fn foo(pair: (int, int)) {
    let (x, y) = pair;
    // we can now use x and y anywhere in foo

    match pair {
        (x, y) => {
            // x and y can only be used in this scope
        }
    }
}

The syntax for patterns (used after `let` and before `=>` in the above example) in both cases is (pretty much) the same. You can also use these patterns in argument position in function declarations:

fn foo((x, y): (int, int)) {
}

(Which is more useful for structs or tuple-structs than tuples).

Most initialisation expressions can appear in a destructuring pattern and they can be arbitrarily complex. That can include references and primitive literals as well as data structures. For example,

struct St {
    f1: int,
    f2: f32
}

enum En {
    Var1,
    Var2,
    Var3(int),
    Var4(int, St, int)
}

fn foo(x: &En) {
    match x {
        &Var1 => println!("first variant"),
        &Var3(5) => println!("third variant with number 5"),
        &Var3(x) => println!("third variant with number {} (not 5)", x),
        &Var4(3, St{ f1: 3, f2: x }, 45) => {
            println!("destructuring an embedded struct, found {} in f2", x)
        }
        &Var4(_, x, _) => {
            println!("Some other Var4 with {} in f1 and {} in f2", x.f1, x.f2)
        }
        _ => println!("other (Var2)")
    }
}

Note how we destructure through a reference by using `&` in the patterns and how we use a mix of literals (`5`, `3`, `St { ... }`), wildcards (`_`), and variables (`x`).

You can use `_` wherever a variable is expected if you want to ignore a single item in a pattern, so we could have used `&Var3(_)` if we didn't care about the integer. In the first `Var4` arm we destructure the embedded struct (a nested pattern) and in the second `Var4` arm we bind the whole struct to a variable. You can also use `..` to stand in for all fields of a tuple or struct. So if you wanted to do something for each enum variant but don't care about the content of the variants, you could write:

fn foo(x: En) {
    match x {
        Var1 => println!("first variant"),
        Var2 => println!("second variant"),
        Var3(..) => println!("third variant"),
        Var4(..) => println!("fourth variant")
    }
}

When destructuring structs, the fields don't need to be in order and you can use `..` to elide the remaining fields. E.g.,

struct Big {
    field1: int,
    field2: int,
    field3: int,
    field4: int,
    field5: int,
    field6: int,
    field7: int,
    field8: int,
    field9: int,
}

fn foo(b: Big) {
    let Big { field6: x, field3: y, ..} = b;
    println!("pulled out {} and {}", x, y);
}

As a shorthand with structs you can use just the field name which creates a local variable with that name. The let statement in the above example created two new local variables `x` and `y`. Alternatively, you could write

fn foo(b: Big) {
let Big { field6, field3, ..} = b;
println!("pulled out {} and {}", field3, field6);
}

Now we create local variables with the same names as the fields, in this case `field3` and `field6`.

There are a few more tricks to Rust's destructuring. Lets say you want a reference to a variable in a pattern. You can't use `&` because that matches a reference, rather than creates one (and thus has the effect of dereferencing the object). For example,

struct Foo {
field: &'static int
}

fn foo(x: Foo) {
let Foo { field: &y } = x;
}

Here, `y` has type `int` and is a copy of the field in `x`.

To create a reference to something in a pattern, you use the `ref` keyword. For example,

fn foo(b: Big) {
let Big { field3: ref x, ref field6, ..} = b;
println!("pulled out {} and {}", *x, *field6);
}

Here, `x` and `field6` both have type `&int` and are references to the fields in `b`.

One last trick when destructuring is that if you are detructuring a complex object, you might want to name intermediate objects as well as individual fields. Going back to an earlier example, we had the pattern `&Var4(3, St{ f1: 3, f2: x }, 45)`. In that pattern we named one field of the struct, but you might also want to name the whole struct object. You could write `&Var4(3, s, 45)` which would bind the struct object to `s`, but then you would have to use field access for the fields, or if you wanted to only match with a specific value in a field you would have to use a nested match. That is not fun. Rust lets you name parts of a pattern using `@` syntax. For example `&Var4(3, s @ St{ f1: 3, f2: x }, 45)` lets us name both a field (`x`, for `f2`) and the whole struct (`s`).

That just about covers your options with Rust pattern matching. There are a few features I haven't covered, such as matching vectors, but hopefully you know how to use `match` and `let` and have seen some of the powerful things you can do. Next time I'll cover some of the subtle interactions between match and borrowing which tripped me up a fair bit when learning Rust.

Rust for C++ programmers - part 7: data types

2014-05-23T23:57:00.000+01:00

In this post I'll discuss Rust's data types. These are roughly equivalent to classes, structs, and enums in C++. One difference with Rust is that data and behaviour are much more strictly separated in Rust than C++ (or Java, or other OO languages). Behaviour is defined by functions and those can be defined in traits and `impl`s (implementations), but traits cannot contain data, they are similar to Java's interfaces in that respect. I'll cover traits and impls in a later post, this one is all about data.

Structs

A rust struct is similar to a C struct or a C++ struct without methods. Simply a list of named fields. The syntax is best seen with an example:

struct S {
field1: int,
field2: SomeOtherStruct
}

Here we define a struct called `S` with two fields. The fields are comma separated; if you like, you can comma-terminate the last field too.

Structs introduce a type. In the example, we could use `S` as a type. `SomeOtherStruct` is assumed to be another struct (used as a type in the example), and (like C++) it is included by value, that is, there is no pointer to another struct object in memory.

Fields in structs are accessed using the `.` operator and their name. An example of struct use:

fn foo(s1: S, s2: &S) {
    let f = s1.field1;
    if f == s2.field1 {
        println!("field1 matches!");
    }
}

Here `s1` is struct object passed by value and `s2` is a struct object passed by reference. As with method call, we use the same `.` to access fields in both, no need for `->`.

Structs are initialised using struct literals. These are the name of the struct and values for each field. For example,

fn foo(sos: SomeOtherStruct) {
let x = S { field1: 45, field2: sos }; // initialise x with a struct literal
println!("x.field1 = {}", x.field1);
}

Structs cannot be recursive, that is you can't have cycles of struct names involving definitions and field types. This is because of the value semantics of structs. So for example, `struct R { r: Option<R> }` is illegal and will cause a compiler error (see below for more about Option). If you need such a structure then you should use some kind of pointer; cycles with pointers are allowed:

struct R {
r: Option<Box<R>>
}

(Note, `Box` is the new syntax for a unique/owning pointer, this has changed from `~R` which was introduced a few posts back).

If we didn't have the `Option` in the above struct, there would be no way to instantiate the struct and Rust would signal an error.

Structs with no fields do not use braces in either their definition or literal use. Definitions do need a terminating semi-colon though, presumably just to facilitate parsing.

struct Empty;

fn foo() {
let e = Empty;
}

Tuples

Tuples are anonymous, heterogeneous sequences of data. As a type, they are declared as a sequence of types in parentheses. Since there is no name, they are identified by structure. For example, the type `(int, int)` is a pair of integers and `(i32, f32, S)` is a triple. Enum values are initialised in the same way as enum types are declared, but with values instead of types for the components, e.g., `(4, 5)`. An example:

// foo takes a struct and returns a tuple
fn foo(x: SomeOtherStruct) -> (i32, f32, S) {
    (23, 45.82, S { field1: 54, field2: x })
}

Tuples can be used by destructuring using a `let` expression, e.g.,

fn bar(x: (int, int)) {
    let (a, b) = x;
    println!("x was ({}, {})", a, b);
}

We'll talk more about destructuring next time.

Tuple structs

Tuple structs are named tuples, or alternatively, structs with unnamed fields. They are declared using the `struct` keyword, a list of types in parentheses, and a semicolon. Such a declaration introduces their name as a type. Their fields must be accessed by destructuring (like a tuple), rather than by name. Tuple structs are not very common.

struct IntPoint (int, int);

fn foo(x: IntPoint) {
    let IntPoint(a, b) = x; // Note that we need the name of the tuple
                             // struct to destructure.
    println!("x was ({}, {})", a, b);
}

Enums

Enums are types like C++ enums or unions, in that they are types which can take multiple values. The simplest kind of enum is just like a C++ enum:

enum E1 {
    Var1,
    Var2,
    Var3
}

fn foo() {
    let x: E1 = Var2;
    match x {
        Var2 => println!("var2"),
        _ => {}
    }
}

However, Rust enums are much more powerful than that. Each variant can contain data. Like tuples, these are defined by a list of types. In this case they are more like unions than enums in C++. Rust enums are tagged unions rather untagged (as in C++), that means you can't mistake one variant of an enum for another at runtime. An example:

enum Expr {
    Add(int, int),
    Or(bool, bool),
    Lit(int)
}

fn foo() {
    let x = Or(true, false);   // x has type Expr
}

Many simple cases of object-oriented polymorphism are better handled in Rust using enums.

To use enums we usually use a match expression. Remember that these are similar to C++ switch statements. I'll go into more depth on these and other ways to destructure data next time. Here's an example:

fn bar(e: Expr) {
    match e {
        Add(x, y) => println!("An `Add` variant: {} + {}", x, y),
        Or(..) => println!("An `Or` variant"),
        _ => println!("Something else (in this case, a `Lit`)"),
    }
}

Each arm of the match expression matches a variant of `Expr`. All variants must be covered. The last case (`_`) covers all remaining variants, although in the example there is only `Lit`. Any data in a variant can be bound to a variable. In the `Add` arm we are binding the two ints in an `Add` to `x` and `y`. If we don't care about the data, we can use `..` to match any data, as we do for `Or`.

Option

One particularly common enum in Rust is `Option`. This has two variants - `Some` and `None`. `None` has no data and `Some` has a single field with type `T` (`Option` is a generic enum, which we will cover later, but hopefully the general idea is clear from C++). Options are used to indicate a value might be there or might not. Any place you use a null pointer in C++ to indicate a value which is in some way undefined, uninitialised, or false, you should probably use an Option in Rust. Using Option is safer because you must always check it before use; there is no way to do the equivalent of dereferencing a null pointer. They are also more general, you can use them with values as well as pointers. An example:

use std::rc::Rc;

struct Node {
    parent: Option<Rc<Node>>,
    value: int
}

fn is_root(node: Node) -> bool {
    match node.parent {
        Some(_) => false,
        None => true
    }
}

Here, the parent field could be either a `None` or a `Some` containing an `Rc<Node>`. In the example, we never actually use that payload, but in real life you usually would.

There are also convenience methods on Option, so you could write the body of `is_root` as `node.is_none()` or `!node.is_some()`.

Inherited mutabilty and Cell/RefCell

Local variables in Rust are immutable by default and can be marked mutable using `mut`. We don't mark fields in structs or enums as mutable, their mutability is inherited. This means that a field in a struct object is mutable or immutable depending on whether the object itself is mutable or immutable. Example:

struct S1 {
    field1: int,
    field2: S2
}
struct S2 {
    field: int
}

fn main() {
    let s = S1 { field1: 45, field2: S2 { field: 23 } };
    // s is deeply immutable, the following mutations are forbidden
    // s.field1 = 46;
    // s.field2.field = 24;

    let mut s = S1 { field1: 45, field2: S2 { field: 23 } };
    // s is mutable, these are OK
    s.field1 = 46;
    s.field2.field = 24;
}

Inherited mutability in Rust stops at references. This is similar to C++ where you can modify a non-const object via a pointer from a const object. If you want a reference field to be mutable, you have to use `&mut` on the field type:

struct S1 {
    f: int
}
struct S2<'a> {
    f: &'a mut S1   // mutable reference field
}
struct S3<'a> {
    f: &'a S1       // immutable reference field
}

fn main() {
    let mut s1 = S1{f:56};
    let s2 = S2 { f: &mut s1};
    s2.f.f = 45;   // legal even though s2 is immutable
    // s2.f = &mut s1; // illegal - s2 is not mutable
    let s1 = S1{f:56};
    let mut s3 = S3 { f: &s1};
    s3.f = &s1;     // legal - s3 is mutable
    // s3.f.f = 45; // illegal - s3.f is immutable
}

(The `'a` parameter on `S2` and `S3` is a lifetime parameter, we'll cover those soon).

Sometimes whilst an object is logically mutable, it has parts which need to be internally mutable. Think of various kinds of caching or a reference count (which would not give true logical immutability since the effect of changing the ref count can be observed via destructors). In C++, you would use the `mutable` keyword to allow such mutation even when the object is const. In Rust we have the Cell and RefCell structs. These allow parts of immutable objects to be mutated. Whilst that is useful, it means you need to be aware that when you see an immutable object in Rust, it is possible that some parts may actually be mutable.

RefCell and Cell let you get around Rust's strict rules on mutation and aliasability. They are safe to use because they ensure that Rust's invariants are respected dynamically, even though the compiler cannot ensure that those invariants hold statically. Cell and RefCell are both single threaded objects.

Use Cell for types which have copy semantics (pretty much just primitive types). Cell has `get` and `set` methods for changing the stored value, and a `new` method to initialise the cell with a value. Cell is a very simple object - it doesn't need to do anything smart since objects with copy semantics can't keep references elsewhere (in Rust) and they can't be shared across threads, so there is not much to go wrong.

Use RefCell for types which have move semantics, that means nearly everything in Rust, struct objects are a common example. RefCell is also created using `new` and has a `set` method. To get the value in a RefCell, you must borrow it using the borrow methods (`borrow`, `borrow_mut`, `try_borrow`, `try_borrow_mut`) these will give you a borrowed reference to the object in the RefCell. These methods follow the same rules as static borrowing - you can only have one mutable borrow, and can't borrow mutably and immutably at the same time. However, rather than a compile error you get a runtime failure. The `try_` variants return an Option - you get `Some(val)` if the value can be borrowed and `None` if it can't. If a value is borrowed, calling `set` will fail too.

Here's an example using a ref-counted pointer to a RefCell (a common use-case):

use std::rc::Rc;
use std::cell::RefCell;

Struct S {
    field: int
}

fn foo(x: Rc<RefCell<S>>) {
    {
        let s = x.borrow();
        println!("the field, twice {} {}", s.f, x.borrow().field);
        // let s = x.borrow_mut(); // Error - we've already borrowed the contents of x
    }

    let s = x.borrow_mut(); // O, the earlier borrows are out of scope
    s.f = 45;
    // println!("The field {}", x.borrow().field); // Error - can't mut and immut borrow
    println!("The field {}", s.f);
}

If you're using Cell/RefCell, you should try to put them on the smallest object you can. That is, prefer to put them on a few fields of a struct, rather than the whole struct. Think of them like single threaded locks, finer grained locking is better since you are more likely to avoid colliding on a lock.

Rust for C++ programmers - part 6: Rc, Gc, and * pointers

2014-05-18T01:19:00.001+01:00

So far we've covered unique and borrowed pointers. Unique pointers are very similar to the new std::unique_ptr in C++ and borrowed references are the 'default' pointer you usually reach for if you would use a pointer or reference in C++. Rust has a few more, rarer pointers either in the libraries or built in to the language. These are mostly similar to various kinds of smart pointers you might be used to in C++.

This post took a while to write and I still don't like it. There are a lot of loose ends here, both in my write up and in Rust itself. I hope some will get better with later posts and some will get better as the language develops. If you are learning Rust, you might even want to skip this stuff for now, hopefully you won't need it. Its really here just for completeness after the posts on other pointer types.

It might feel like Rust has a lot of pointer types, but it is pretty similar to C++ once you think about the various kinds of smart pointers available in libraries. In Rust, however, you are more likely to meet them when you first start learning the language. Because Rust pointers have compiler support, you are also much less likely to make errors when using them.

I'm not going to cover these in as much detail as unique and borrowed references because, frankly, they are not as important. I might come back to them in more detail later on.

Rc<T>

Reference counted pointers come as part of the rust standard library. They are in the `std::rc` module (we'll cover modules soon-ish. The modules are the reason for the `use` incantations in the examples). A reference counted pointer to an object of type `T` has type `Rc<T>`. You create reference counted pointers using a static method (which for now you can think of like C++'s, but we'll see later they are a bit different) - `Rc::new(...)` which takes a value to create the pointer to. This constructor method follows Rust's usual move/copy semantics (like we discussed for unique pointers) - in either case, after calling Rc::new, you will only be able to access the value via the pointer.

As with the other pointer types, the `.` operator does all the dereferencing you need it to. You can use `*` to manually dereference.

To pass a ref-counted pointer you need to use the `clone` method. This kinda sucks, and hopefully we'll fix that, but that is not for sure (sadly). You can take a (borrowed) reference to the pointed at value, so hopefully you don't need to clone too often. Rust's type system ensures that the ref-counted variable will not be deleted before any references expire. Taking a reference has the added advantage that it doesn't need to increment or decrement the ref count, and so will give better performance (although, that difference is probably marginal since Rc objects are limited to a single thread and so the ref count operations don't have to be atomic). As in C++, you can also take a reference to the Gc pointer.

An Rc example:

use std::rc::Rc;

fn bar(x: Rc<int>) { }
fn baz(x: &int) { }

fn foo() {
    let x = Rc::new(45);
    bar(x.clone());   // Increments the ref-count
    baz(&*x);         // Does not increment
    println!("{}", 100 - *x);
} // Once this scope closes, all Rc pointers are gone, so ref-count == 0
   // and the memory will be deleted.

Ref counted pointers are always immutable. If you want a mutable ref-counted object you need to use a RefCell (or Cell) wrapped in an `Rc`.

Cell and RefCell

Cell and RefCell are structs which allow you to 'cheat' the mutability rules. This is kind of hard to explain without first covering Rust data structures and how they work with mutability, so I'm going to come back to these slightly tricky objects later. For now, you should know that if you want a mutable, ref counted object you need a Cell or RefCell wrapped in an Rc. As a first approximation, you probably want Cell for primitive data and RefCell for objects with move semantics. So, for a mutable, ref-counted int you would use `Rc<Cell<int>>`.

Gc

Garbage collected pointers are written using `Gc<T>` and are created and used much like ref-counted pointers. The difference is that Gc pointers do not automatically turn into references, you need to explicitly call `borrow()`. That is (afaik) not by design and should get fixed. You also don't need to use `clone()` to copy Gc pointers like you do with Rc pointers, again, I hope this inconsistency gets addressed (in theory, this is because copying an Rc pointer does some work incrementing the ref count, whereas a Gc copy is just a copy). Currently Gc is a prototype implementation and is actually implemented using reference counting (so if you have cycles, you are screwed). You should probably just use Rc pointers for now (indeed, the Gc module is marked as experimental which means we expect it to change considerably in the near future and may even be removed).

Gc example:

use std::gc::Gc;

fn bar(x: Gc<int>) { }
fn baz(x: &int) { }

fn foo() {
    let x = Gc::new(45);
    bar(x);
    baz(x.borrow());
    println!("{}", 100 - *x.borrow());
}

*T - unsafe pointers

Finally Rust has unsafe pointers. These are denoted `*T` and are created using `&` (you might need to specify a type to get a `*T` rather than a `&T` since the `&` operator can create either a borrowed reference or an unsafe pointer). These are like C pointers, just a pointer to memory with no restrictions on how they are used (you can't do pointer arithmetic without casting, but you can do it that way if you must). Unsafe pointers are the only pointer type in Rust which can be null. There is no automatic dereferencing of unsafe pointers (so to call a method you have to write `(*x).foo()`) and no automatic referencing. The most important restriction is that they can't be dereferenced (and thus can't be used) outside of an unsafe block. In regular Rust code you can only pass them around.

So, what is unsafe code? Rust has strong safety guarantees, and (rarely) they prevent you doing something you need to do. Since Rust aims to be a systems language, it has to be able to do anything that is possible and sometimes that means doing things the compiler can't verify is safe. To accomplish that, Rust has the concept of unsafe blocks, marked by the `unsafe` keyword. In unsafe code you can do unsafe things - dereference an unsafe pointer, index into an array without bounds checking, call code written in another language via the FFI, or cast variables. Obviously, you have to be much more careful writing unsafe code than writing regular Rust code. In fact, you should only very rarely write unsafe code. Mostly it is used in very small chunks in libraries, rather than in client code. In unsafe code you must do all the things you normally do in C++ to ensure safety. Furthermore, you must ensure that by the time the unsafe block finishes, you have re-established all of the invariants that the Rust compiler would usually enforce, otherwise you risk causing bugs in safe code too.

An example of using an unsafe pointer:

fn foo() {
    let x = 5;
    let xp: *int = &5;
    println!("x+5={}", add_5(xp));
}

fn add_5(p: *int) -> int {
    unsafe {
        if !p.is_null() { // Note that *-pointers do not auto-deref, so this is
                          // a method implemented on *int, not int.
            *p + 5
        } else {
            -1            // Not a recommended error handling strategy.
        }
    }
}

As with borrowed references, unsafe pointers are immutable by default and can be made mutable using the `mut` keyword, for example `*mut int`.

And that concludes our tour of Rust's pointers. Next time we'll take a break from pointers and look at Rust's data structures. We'll come back to borrowed references again in a later post though.

Rust for C++ programmers - part 5: borrowed references

2014-05-06T09:55:00.003+01:00

In the last post I introduced unique pointers. This time I will talk about another kind of pointer which is much more common in most Rust programs: borrowed pointers (aka borrowed references, or just references).

If we want to have a reference to an existing value (as opposed to creating a new value on the heap and pointing to it, as with unique pointers), we must use `&`, a borrowed reference. These are probably the most common kind of pointer in Rust, and if you want something to fill in for a C++ pointer or reference (e.g., for passing a parameter to a function by reference), this is probably it.

We use the `&` operator to create a borrowed reference and to indicate reference types, and `*` to dereference them. The same rules about automatic dereferencing apply as for unique pointers. For example,

fn foo() {
    let x = &3;   // type: &int
    let y = *x;   // 3, type: int
    bar(x, *x);
    bar(&y, y);
}

fn bar(z: &int, i: int) {
    // ...
}

The `&` operator does not allocate memory (we can only create a borrowed reference to an existing value) and if a borrowed reference goes out of scope, no memory gets deleted.

Borrowed references are not unique - you can have multiple borrowed references pointing to the same value. E.g.,

fn foo() {
    let x = 5;                // type: int
    let y = &x;               // type: &int
    let z = y;                // type: &int
    let w = y;                // type: &int
    println!("These should all 5: {} {} {}", *w, *y, *z);
}

Like values, borrowed references are immutable by default. You can also use `&mut` to take a mutable reference, or to denote mutable reference types. Mutable borrowed references are unique (you can only take a single mutable reference to a value, and you can only have a mutable reference if there are no immutable references). You can use mutable reference where an immutable one is wanted, but not vice versa. Putting all that together in an example:

fn bar(x: &int) { ... }
fn bar_mut(x: &mut int) { ... } // &mut int is a reference to an int which
                                 // can be mutated

fn foo() {
    let x = 5;
    //let xr = &mut x;     // Error - can't make a mutable reference to an
                           // immutable variable
    let xr = &x;           // Ok (creates an immutable ref)
    bar(xr);
    //bar_mut(xr);         // Error - expects a mutable ref

    let mut x = 5;
    let xr = &x;           // Ok (creates an immutable ref)
    //*xr = 4;             // Error - mutating immutable ref
    //let xr = &mut x;     // Error - there is already an immutable ref, so we
                           // can't make a mutable one

    let mut x = 5;
    let xr = &mut x;       // Ok (creates a mutable ref)
    *xr = 4;               // Ok
    //let xr = &x;         // Error - there is already a mutable ref, so we
                           // can't make an immutable one
    //let xr = &mut x;     // Error - can only have one immutable ref at a time
    bar(xr);               // Ok
    bar_mut(xr);           // Ok
}

Note that the reference may be mutable (or not) independently of the mutableness of the variable holding the reference. This is similar to C++ where pointers can be const (or not) independently of the data they point to. This is in contrast to unique pointers, where the mutableness of the pointer is linked to the mutableness of the data. For example,

fn foo() {
    let mut x = 5;
    let mut y = 6;
    let xr = &mut x;
    //xr = &mut y;        // Error xr is immutable

    let mut x = 5;
    let mut y = 6;
    let mut xr = &mut x;
    xr = &mut y;          // Ok

    let mut x = 5;
    let mut y = 6;
    let mut xr = &x;
    xr = &y;              // Ok - xr is mut, even though the referenced data is not
}

If a mutable value is borrowed, it becomes immutable for the duration of the borrow. Once the borrowed pointer goes out of scope, the value can be mutated again. This is in contrast to unique pointers, which once moved can never be used again. For example,

fn foo() {
    let mut x = 5;            // type: int
    {
        let y = &x;           // type: &int
        //x = 4;              // Error - x has been borrowed
        println!("{}", x);    // Ok - x can be read
    }
    x = 4;                    // OK - y no longer exists
}

The same thing happens if we take a mutable reference to a value - the value still cannot be modified. In general in Rust, data can only ever be modified via one variable or pointer. Furthermore, since we have a mutable reference, we can't take an immutable reference. That limits how we can use the underlying value:

fn foo() {
    let mut x = 5;            // type: int
    {
        let y = &mut x;       // type: &mut int
        //x = 4;              // Error - x has been borrowed
        //println!("{}", x); // Error - requires borrowing x
        let z = *y + x;       // Ok - doesn't require borrowing
    }
    x = 4;                    // OK - y no longer exists
}

Unlike C++, Rust won't automatically reference a value for you. So if a function takes a parameter by reference, the caller must reference the actual parameter. However, pointer types will automatically be converted to a reference:

fn foo(x: &int) { ... }

fn bar(x: int, y: ~int) {
    foo(&x);
    // foo(x);   // Error - expected &int, found int
    foo(y);      // Ok
    foo(&*y);    // Also ok, and more explicit, but not good style
}

`mut` vs `const`

At this stage it is probably worth comparing `mut` in Rust to `const` in C++. Superficially they are opposites. Values are immutable by default in Rust and can be made mutable by using `mut`. Values are mutable by default in C++, but can be made constant by using `const`. The subtler and more important difference is that C++ const-ness applies only to the current use of a value, whereas Rust's immutability applies to all uses of a value. So in C++ if I have a `const` variable, someone else could have a non-const reference to it and it could change without me knowing. In Rust if you have an immutable variable, you are guaranteed it won't change.

As we mentioned above, all mutable variables are unique. So if you have a mutable value, you know it is not going to change unless you change it. Furthermore, you can change it freely since you know that no one else is relying on it not changing.

Borrowing and lifetimes

One of the primary safety goals of Rust is to avoid dangling pointers (where a pointer outlives the memory it points to). In Rust, it is impossible to have a dangling borrowed reference. It is only legal to create a borrowed reference to memory which will be alive longer than the reference (well, at least as long as the reference). In other words, the lifetime of the reference must be shorter than the lifetime of the referenced value.

That has been accomplished in all the examples in this post. Scopes introduced by `{}` or functions are bounds on lifetimes - when a variable goes out of scope its lifetime ends. If we try to take a reference to a shorter lifetime, such as in a narrower scope, the compiler will give us an error. For example,

fn foo() {
    let x = 5;
    let mut xr = &x; // Ok - x and xr have the same lifetime
    {
        let y = 6;
        //xr = &y     // Error - xr will outlive y
    }                 // y is released here
}                     // x and xr are released here

In the above example, x and xr don't have the same lifetime because xr starts later than x, but it's the end of lifetimes which is more interesting, since you can't reference a variable before it exists in any case - something else which Rust enforces and which makes it safer than C++.

Explicit lifetimes

After playing with borrowed pointers for a while, you'll probably come across borrowed pointers with an explicit lifetime. These have the syntax `&'a T` (cf `&T`). They're kind of a big topic since I need to cover lifetime-polymorphism at the same time so I'll leave it for another post (there are a few more less common pointer types to cover first though). For now, I just want to say that `&T` is a shorthand for `&'a T` where `a` is the current scope, that is the scope in which the type is declared.

A thought on language design

2014-05-03T22:28:00.002+01:00

Language designers often aim to make their languages simpler, more elegant, or smaller. If we are doing practical language design (as opposed to design for fun or research), these are not goals, only proxies. Programming languages should be easy to use and easy to learn.

'Easy to use' means (usually) easy to read, write, and operate on using tools. Easy to learn means not just learning the language's features, but learning the idioms/patterns/techniques of the language too.

Simpler, more elegant, and smaller are only good proxies for easy to use and learn because our natural tendency is to make languages too big and too complex ("I'll just add this feature and that feature and, oh, the feature is pretty cool too").

Language design is about balance and trade-offs. No one wants to write software in the lambda calculus and not many people want to do it in Cobol. We need to design languages which balance their complexity and size, not striving to be the smallest or simplest, but to be the easiest and most pleasant to use.

Rust for C++ programmers - part 4: unique pointers

2014-04-29T03:42:00.001+01:00

Rust is a systems language and therefore must give you raw access to memory. It does this (as in C++) via pointers. Pointers are one area where Rust and C++ are very different, both in syntax and semantics. Rust enforces memory safety by type checking pointers. That is one of its major advantages over other languages. Although the type system is a bit complex, you get memory safety and bare-metal performance in return.

I had intended to cover all of Rust's pointers in one post, but I think the subject is too large. So this post will cover just one kind - unique pointers - and other kinds will be covered in follow up posts.

First, an example without pointers:

fn foo() {
let x = 75;

// ... do something with `x` ...
}

When we reach the end of `foo`, `x` goes out of scope (in Rust as in C++). That means the variable can no longer be accessed and the memory for the variable can be reused.

In Rust, for every type `T` we can write `~T` for an owning (aka unique) pointer to `T`. We use the `box` keyword to allocate space on the heap and initialise that space with the supplied value (this has very recently changed from using `~` for allocation too). This is similar to `new` in C++. For example,

fn foo() {
let x = box 75;
}

Here `x` is a pointer to a location on the heap which contains the value `75`. `x` has type `~int`; we could have written `let x: ~int = box 75;`. This is similar to writing `int* x = new int(75);` in C++. Unlike in C++, Rust will tidy up the memory for us, so there is no need to call `free` or `delete`. Unique pointers behave similarly to values - they are deleted when the variable goes out of scope. In our example, at the end of the function `foo`, `x` can no longer be accessed and the memory pointed at by `x` can be reused.

Owning pointers are dereferenced using the `*` as in C++. E.g.,

fn foo() {
let x = box 75;
println!("`x` points to {}", *x);
}

As with primitive types in Rust, owning pointers and the data they point to are immutable by default. Unlike C, you can't have a mutable (unique) pointer to immutable data or vice-versa. Mutability of the data follows from the pointer. E.g.,

fn foo() {
    let x = box 75;
    let y = box 42;
    // x = y;         // Not allowed, x is immutable.
    // *x = 43;       // Not allowed, *x is immutable.
    let mut x = box 75;
    x = y;            // OK, x is mutable.
    *x = 43;          // OK, *x is mutable.
}

Owning pointers can be returned from a function and continue to live on. If they are returned, then their memory will not be freed, i.e., there are no dangling pointers in Rust. The memory will not leak however, eventually it must go out of scope and then it will be free. E.g.,

fn foo() -> ~int {
    let x = box 75;
    x
}

fn bar() {
    let y = foo();
    // ... use y ...
}

Here, memory is initialised in `foo`, and returned to `bar`. `x` is returned from `foo` and stored in `y`, so it is not deleted. At the end of `bar`, `y` goes out of scope and so the memory is reclaimed.

Owning pointers are unique (also called linear) because there can be only one (owning) pointer to any piece of memory at any time. This is accomplished by move semantics. When one pointer points at a value, any previous pointer can no longer be accessed. E.g.,

fn foo() {
    let x = box 75;
    let y = x;
    // x can no longer be accessed
    // let z = *x;   // Error.
}

Likewise, if an owning pointer is passed to another function or stored in a field it can no longer be accessed:

fn bar(y: ~int) {}

fn foo() {
    let x = box 75;
    bar(x);
    // x can no longer be accessed
    // let z = *x;   // Error.
}

Rust's unique pointers are similar to C++ `std::unique_ptr`s. In Rust, as in C++, there can be only one unique pointer to a value and that value is deleted when the pointer goes out of scope. Rust does most of its checking statically rather than at runtime. So, in C++ accessing a unique pointer whose value has moved will result in a runtime error (since it will be null). In Rust this produces a compile time error and you cannot go wrong at runtime.

We'll see later that it is possible to create other pointer types which point at a unique pointer's value in Rust. This is similar to C++. However, in C++ this allows you to cause errors at runtime by holding a pointer to freed memory. That is not possible in Rust (we'll see how when we cover Rust's other pointer types).

As shown above, owning pointers must be dereferenced to use their values. However, method calls automatically dereference, so there is no need for a `->` operator or to use `*` for method calls. In this way, Rust pointers are a bit similar to both pointers and references in C++. E.g.,

fn bar(x: ~Foo, y: ~~~~~Foo) {
x.foo();
y.foo();
}

Assuming that the type `Foo` has a method `foo()`, both these expressions are OK.

Using the `box` operator on an existing value does not take a reference to that value, it copies that value. So,

fn foo() {
    let x = 3;
    let mut y = box x;
    *y = 45;
    println!("x is still {}", x);
}

In general, Rust has move rather than copy syntax (as seen above with unique pointers). Primitive types have copy semantics, so in the above example the value `3` is copied, but for more complex values it would be moved. We'll cover this in more detail later.

Sometimes when programming, however, we need more than one reference to a value. For that, Rust has borrowed pointers. I'll cover those in the next post.

Rust for C++ programmers - part 3: primitive types and operators

2014-04-23T03:53:00.000+01:00

Rust has pretty much the same arithmetic and logical operators as C++. `bool` is the same in both languages (as are the `true` and `false` literals). Rust has similar concepts of integers, unsigned integers, and floats. However the syntax is a bit different. Rust uses `int` to mean an integer and `uint` to mean an unsigned integer. These types are pointer sized. E.g., on a 32 bit system, `uint` means a 32 bit unsigned integer. Rust also has explicitly sized types which are `u` or `i` followed by 8, 16, 32, or 64. So, for example, `u8` is an 8 bit unsigned integer and `i32` is a 32 bit signed integer. For floats, Rust has `f32` and `f64` (`f128` is coming soon too).

Numeric literals can take suffixes to indicate their type (using `i` and `u` instead of `int` and `uint`). If no suffix is given, Rust tries to infer the type. If it can't infer, it uses `int` or `f64` (if there is a decimal point). Examples:

fn main() {
    let x: bool = true;
    let x = 34;   // type int
    let x = 34u; // type uint
    let x: u8 = 34u8;
    let x = 34i64;
    let x = 34f32;
}

As a side note, Rust lets you redefine variables so the above code is legal - each `let` statement creates a new variable `x` and hides the previous one. This is more useful than you might expect due to variables being immutable by default.

Numeric literals can be given as binary, octal, and hexadecimal, as well as decimal. Use the `0b`, `0o`, and `0x` prefixes, respectively. You can use an underscore anywhere in a numeric literal and it will be ignored. E.g,

fn main() {
    let x = 12;
    let x = 0b1100;
    let x = 0o14;
    let x = 0xe;
    let y = 0b_1100_0011_1011_0001;
}

Rust has chars and strings, but since they are Unicode, they are a bit different from C++. I'm going to postpone talking about them until after I've introduced pointers, references, and vectors (arrays).

Rust does not implicitly coerce numeric types. In general, Rust has much less implicit coercion and subtyping than C++. Rust uses the `as` keyword for explicit coercions and casting. Any numeric value can be cast to another numeric type. `as` cannot be used to convert between booleans and numeric types. E.g.,

fn main() {
    let x = 34u as int;     // cast unsigned int to int
    let x = 10 as f32;      // int to float
    let x = 10.45f64 as i8; // float to int (loses precision)
    let x = 4u8 as u64;     // gains precision
    let x = 400u16 as u8;   // 144, loses precision (and thus changes the value)
    println!("`400u16 as u8` gives {}", x);
    let x = -3i8 as u8;     // 253, signed to unsigned (changes sign)
    println!("`-3i8 as u8` gives {}", x);
    //let x = 45u as bool; // FAILS!
}

Rust has the following numeric operators: `+`, `-`, `*`, `/`, `%`; bitwise operators: `|`, `&`, `^`, `<<`, `>>`; comparison operators: `==`, `!=`, `>`, `<`, `>=`, `<=`; short-circuit logical operators: `||`, `&&`. All of these behave as in C++, however, Rust is a bit stricter about the types the operators can be applied to - the bitwise operators can only be applied to integers and the logical operators can only be applied to booleans. Rust has the `-` unary operator which negates a number. The `!` operator negates a boolean and inverts every bit on an integer type (equivalent to `~` in C++ in the latter case). Rust has compound assignment operators as in C++, e.g., `+=`, but does not have increment or decrement operators (e.g., `++`).

Formatting change

2014-04-20T02:03:00.000+01:00

Quick note - yes, I have changed the formatting on my blog - hope you like it!

Rust for C++ programmers - part 2: control flow

2014-04-20T01:56:00.002+01:00

If

The `if` statement is pretty much the same in Rust as C++. One difference is that the braces are mandatory, but brackets around the expression being tested are not. Another is that `if` is an expression, so you can use it the same way as the ternary `?` operator in C++ (remember from last time that if the last expression in a block is not terminated by a semi-colon, then it becomes the value of the block). There is no ternary `?` in Rust. So, the following two functions do the same thing:

fn foo(x: int) -> &'static str {
    let mut result: &'static str;
    if x < 10 {
        result = "less than 10";
    } else {
        result = "10 or more";
    }
    return result;
}

fn bar(x: int) -> &'static str {
    if x < 10 {
        "less than 10"
    } else {
        "10 or more"
    }
}

The first is a fairly literal translation of what you might write in C++. The second is in better Rust style.

You can also write `let x = if ...`, etc.

Loops

Rust has while loops, again just like C++:

fn main() {
    let mut x = 10;
    while x > 0 {
        println!("Current value: {}", x);
        x -= 1;
    }
}

There is no do...while loop in Rust, but we do have the `loop` statement which just loops forever:

fn main() {
    loop {
        println!("Just looping");
    }
}

Rust has `break` and `continue` just like C++.

For loops

Rust also has `for` loops, but these are a bit different. Lets say you have a vector of ints and you want to print them all (we'll cover vectors/arrays, iterators, and generics in more detail in the future. For now, know that a `Vec<T>` is a sequence of `T`s and `iter()` returns an iterator from anything you might reasonably want to iterate over). A simple `for` loop would look like:

fn print_all(all: Vec<int>) {
    for a in all.iter() {
        println!("{}", a);
    }
}

If we want to index over the indices of `all` (a bit more like a standard C++ for loop over an array), you could do

fn print_all(all: Vec<int>) {
    for i in range(0, all.len()) {
        println!("{}: {}", i, all.get(i));
    }
}

Hopefully, it is obvious what the `range` and `len` functions do.

Switch/Match

Rust has a match expression which is similar to a C++ switch statement, but much more powerful. This simple version should look pretty familiar:

fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        y => println!("x is something else {}", y),
    }
}

There are some syntactic differences - we use `=>` to go from the matched value to the expression to execute, and the match arms are separated by `,` (that last `,` is optional). There are also some semantic differences which are not so obvious: the matched patterns must be exhaustive, that is all possible values of the matched expression (`x` in the above example) must be covered. Try removing the `y => ...` line and see what happens; that is because we only have matches for 0, 1, and 10 and obviously there are lots of other ints which don't get matched. In that last arm, `y` is bound to the value being matched (`x` in this case). We could also write:

fn print_some(x: int) {
    match x {
        x => println!("x is something else {}", x)
    }
}

Here the `x` in the match arm introduces a new variable which hides the argument `x`, just like declaring a variable in an inner scope.

If we don't want to name the variable, we can use `_` for an unnamed variable, which is like having a wildcard match. If we don't want to do anything, we can provide an empty branch:

fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        _ => {}
    }
}

Another semantic difference is that there is no fall through from one arm to the next.

We'll see in later posts that match is extremely powerful. For now I want to introduce just a couple more features - the 'or' operator for values and `if` clauses on arms. Hopefully an example is self-explanatory:

fn print_some_more(x: int) {
    match x {
        0 | 1 | 10 => println!("x is one of zero, one, or ten"),
        y if y < 20 => println!("x is less than 20, but not zero, one, or ten"),
        y if y == 200 => println!("x is 200 (but this is not very stylish)"),
        _ => {}
    }
}

Just like `if` expressions, `match` statements are actually expressions so we could re-write the last example as:

fn print_some_more(x: int) {
    let msg = match x {
        0 | 1 | 10 => "one of zero, one, or ten",
        y if y < 20 => "less than 20, but not zero, one, or ten",
        y if y == 200 => "200 (but this is not very stylish)",
        _ => "something else"
    };

    println!("x is {}", msg);
}

Note the semi-colon after the closing brace, that is because the `let` statement is a statement and must take the form `let msg = ...;`. We fill the rhs with a match expression (which doesn't usually need a semi-colon), but the `let` statement does. This catches me out all the time.

Motivation: Rust match statements avoid the common bugs with C++ switch statements - you can't forget a `break` and unintentionally fall through; if you add a case to an enum (more later on) the compiler will make sure it is covered by your `match` statement.

Method call

Finally, just a quick note that methods exist in Rust, similarly to C++. They are always called via the `.` operator (no `->`, more on this in another post). We saw a few examples above (`len`, `iter`). We'll go into more detail in the future about how they are defined and called. Most assumptions you might make from C++ or Java are probably correct.

Rust for C++ programmers - an intermission - why Rust

2014-04-18T23:54:00.002+01:00

I realise that in terms of learning Rust, I had jumped straight to the 'how' and skipped the 'why'. I guess I am in enough of a Rust bubble that I can't imagine why you wouldn't want to learn it. So, I will make a bit more of an effort to explain why things are how they are. Here I will try to give a bit of an overview/motivation.

If you are using C or C++, it is probably because you have to - either you need low-level access to the system, or need every last drop of performance, or both. Rust aims to do offer the same level of abstraction around memory, the same performance, but be safer and make you more productive.

Concretely, there are many languages out there that you might prefer to use to C++: Java, Scala, Haskell, Python, and so forth, but you can't because either the level of abstraction is too high - you don't get direct access to memory, you are forced to use garbage collection, etc. - or there are performance issues - either performance is unpredictable or its simply not fast enough. Rust does not force you to use garbage collection, and as in C++, you get raw pointers to memory to play with. Rust subscribes to the 'pay for what you use' philosophy of C++. If you don't use a feature, then you don't pay any performance overhead for its existence. Furthermore, all language features in Rust have predictable (and usually small) cost.

Whilst these constraints make Rust a (rare) viable alternative to C++, Rust also has benefits: it is memory safe - Rust's type system ensures that you don't get the kind of memory errors which are common in C++ - memory leaks, accessing un-initialised memory, dangling pointers - all are impossible in Rust. Furthermore, whenever other constraints allow, Rust strives to prevent other safety issues too - for example, all array indexing is bounds checked (of course, if you want to avoid the cost, you can (at the expense of safety) - Rust allows you to do this in unsafe blocks, along with many other unsafe things. Crucially, Rust ensures that unsafety in unsafe blocks stays in unsafe blocks and can't affect the rest of your program). Finally, Rust takes many concepts from modern programming languages and introduces them to the systems language space. Hopefully, that makes programming in Rust more productive, efficient, and enjoyable.

I would like to motivate some of the language features from part 1. Local type inference is convenient and useful without sacrificing safety or performance (it's even in modern versions of C++ now). A minor convenience is that language items are consistently denoted by keyword (`fn`, `let`, etc.), this makes scanning by eye or by tools easier, in general the syntax of Rust is simpler and more consistent than C++. The `println!` macro is safer than printf - the number of arguments is statically checked against the number of 'holes' in the string and the arguments are type checked. This means you can't make the printf mistakes of printing memory as if it had a different type or addressing memory further down the stack by mistake. These are fairly minor things, but I hope they illustrate the philosophy behind the design of Rust.