Featherweight Musings: 2015

Wednesday, October 07, 2015

Rustfmt-ing Rust

Over on my new blog I have a post on running rustfmt on the Rust repo and how you can help with the process, if you would like to:

http://www.ncameron.org/blog/rustfmt-ing-rust/

Monday, June 01, 2015

Every now and then I get a bunch of questions about my Git workflow. Hopefully, this will be useful, even though there are already a bunch of tutorials and blogs on Git. It is aimed at pretty much Git newbies, but assumes some knowledge of version control concepts. Some of these things might not be best practice, I'd appreciate people letting me know if I could do things better!

Also, I only describe what I do, not why, i.e., the underlying concepts you should understand. To do that would probably take a book rather than a blog post and I'm not the right person to write such a thing.

Starting out

I operate in two modes when using Git - either I'm contributing to an existing repo (e.g., Rust) or I'm working on my own repo (e.g., rustfmt), which might just be a personal thing, essentially just using GitHub for backup, or which might be a community project that I started. The workflow for the two scenarios is a bit different.

Lets start with contributing to someone else's repo. The first step is to find that repo on GitHub and fork it (I'm assuming you have a GitHub account set up, it's very easy to do if you haven't). Forking means that you have your own personal copy of the repo hosted by GitHub and associated with your account. So for example, if you fork https://github.com/rust-lang/rust, then you'll get https://github.com/nrc/rust. It is important you fork the version of the repo you want to contribute to. In this case, make sure you fork rust-lang's repo, not somebody else's fork of that repo (e.g., nrc's).

Then make a local clone of your fork so you can work on it locally. I create a directory, then `cd` into it and use:

git clone git@github.com:nrc/rust.git .

Here, you'll replace the 'git@...' string with the identifier for your repo found on its GitHub page. The trailing `.` means we clone into the current directory instead of creating a new directory.

Finally. you'll want to create a reference to your fork (e.g., nrc/rust, called 'origin') and the original repo (rust-lang/rust, called 'upstream'):

git remote add upstream https://github.com/rust-lang/rust.git

Now you're all set to go and contribute something!

If I'm starting out with my own repo, then I'll first create a directory and write a bit of code in there, probably add a README.md file, and make sure something builds. Then, to make it a git repo I use

git init

then make an initial commit (see the next section for more details). Over on GitHub, go to the repos page and add a new, empty repo, choose a cool name, etc. The we have to associate the local repo with the one on GitHub:

git remote add origin git@github.com:nrc/rust-fmt.git

Finally, we can make the GitHub repo up to date with the local one (again, see below for more details):

git push origin master

Doing work

I usually start off by creating a new branch for my work. Create a branch called 'foo' using:

git checkout -b foo

There is always a 'master' branch which corresponds with the current state (as of the last time you updated) of the repo without any of your branches. I try to avoid working on master. You can switch between branches using `git checkout`, e.g.,

git checkout master
git checkout foo

Once I've done some work, I commit it. Eventually, when you submit the work upstream, a commit should be a self-contained, modular piece of work. However, when working locally I prefer to make many small commits and then sort them out later. I generally commit when I context switch to work on something else, when I have to make a decision I'm not sure about, or when I reach a point which seems like it could be a natural break in the proper commits I'll submit later. I usually commit using

git commit -a

The `-a` means all the changed files git knows about will be included in the commit. This is usually what I want. I sometimes use `-m "The commit message"`, but often prefer to use a text editor since it allows me to check which files are being committed.

Often, I don't want to create a new commit, but just add my current work to the last commit, then I use:

git commit -a --amend

If you've created new files as part of your work, you need to tell Git about them before committing, use:

git add path/to/file_name.rs

Updating

When I want to update the local repo to the upstream repo I use `git pull upstream master` (with my master branch checked out locally). Commonly, I want to update my master and then rebase my working branch to branch off the updated master.

Assuming I'm working on the foo branch, the recipe I use to rebase is:

git checkout master
git pull upstream master
git checkout foo
git rebase master

The last step will often require manual resolution of conflicts, after that you must `git add` the changed files and then `git rebase --continue`. That might happen several times.

If you've got a lot of commits, I find it is usually easier to squash a bunch of commits before rebasing - it sometimes means dealing with conflicts fewer times.

On the subject of updating the repo, there is a bit of a debate about rebasing vs merging. Rebasing has the advantage that it gives you a clean history and fewer merge commits (which are just boilerplate, most of the time). However, it does change your history, which if you are sharing your branch is very, very bad news. My rule of thumb is to rebase private branches (never merge) and to only merge (never rebase) branches which have been shared publicly. The latter generally means the master branch of repos that others are also working on (e.g., rustfmt). But sometimes I'll work on a project branch with someone else.

Current status

With all these repos, branches, commits, and so forth, it is pretty easy to get lost. Here are few commands I use to find out what I'm doing.

As an aside, because Rust is a compiled language and the compiler is big, I have multiple Rust repos on my local machine so I don't have to checkout branches too often.

Show all branches in the current repo and highlight the current one:

git branch

Show the history of the current branch (or any branch, foo):

git log
git log foo

Which files have been modified, deleted, etc.:

git status

All changes since last commit (excludes files which Git doesn't know about, e.g., new files which haven't been `git add`ed):

git diff

The changes in the last commit and since that commit:

git diff HEAD~1

Tidying up

Like I said above, I like to make a lot of small, work in progress commits and then tidy up later. To do that I use:

git rebase -i HEAD~n

Where `n` is the number of commits I want to tidy up. `rebase -i` lets you move commits, around squash them together, reword the commit messages, and so forth. I usually do a `rebase -i` before every rebase and a thorough one before submitting work.

Submitting work

Once I've tidied up the branch, I push it to my GitHub repo using:

git push origin foo

I'll often do this to backup my work too if I'm spending more than a day or so on it. If I've done this and rebased since then, then you need to add `-f` to the above command. Sometimes I want my branch to have a different name on the GitHub repo than I've had locally:

git push origin foo:bar

(The common use case here is foo = "fifth-attempt-at-this-stupid-piece-of-crap-bar-problem").

When ready to submit the branch, I go to the GitHub website and make a pull request (PR). Once that is reviewed, the owner of the upstream repo (or, often, a bot) will merge it into master.

Alternatively, if it is my repo I might create a branch and pull request, or I might manually merge and push:

git checkout master
git merge foo
git push origin master

Misc.

And here is a bunch of stuff I do all the time, but I'm not sure how to classify.

Delete a branch when I'm all done:

git branch -d foo

or

git branch -D foo

You need the capital 'D' if the branch has not been merged to master. With complex merges (e.g., if the branch got modified) you sometimes need capital 'D', even if the branch is merged.

Sometimes you need to throw away some work. If I've already committed, I use the following to throw away the last commit:

git reset HEAD~1

or

git reset HEAD~1 --hard

The first version leaves changes from the commit as uncommitted changes in your working directory. The second version throws them away completely. You can change the '1' to a larger number to throw away more than one commit.

If I have uncommitted changes I want to throw away, I use:

git checkout HEAD -f

This only gets rid of changes to tracked files. If you created new files, those won't be deleted.

Sometimes I need more fine-grained control of which changes to include in a commit. This often happens when I'm reorganising my commits before submitting a PR. I usually use some combination of `git rebase -i` to get the ordering right, then pop off a few commits using `git reset HEAD~n`, then add changes back in using:

git add -p

which prompts you about each change. (You can also use `git add filename` to add all the changes in a file). After doing all this, use `git commit` to commit. My muscle memory often appends the `-a`, which ruins all the work put in to separating out changes.

Sometimes this is too much work, in which case the best thing to do is save all the changes from your commits as a diff, edit them around in a text editor, then patch them back piece by piece when committing. Something like:

git diff ... >patch.diff
...
patch -p1

Every now and again, I'll need to copy a commit from one branch to another. I use `git log branch-name` to show the commits, copy the hash from the commit I want to copy, then use

git cherry-pick hash

to copy the commit into the current branch.

Finally, if things go wrong and you can't see a way out, `git reflog` is the secret magic that can fix nearly everything. It shows a log of pretty much everything Git has done, down to a fine level of detail. You can usually use this info to get out of any pickle (you'll have to google the specifics). However, Git only know about files which have been committed at least once, so even more reason to do regular, small commits.

Thursday, April 30, 2015

rustfmt - call for contributions

I've been experimenting with a rustfmt tool for a while now. Its finally in working shape (though still very, very rough) and I'd love some help on making it awesome.

rustfmt is a reformatting tool for Rust code. The idea is that it takes your code, tidies it up, and makes sure it conforms to a set of style guidelines. There are similar tools for C++ (clang format), Go (gofmt), and many other languages. Its a really useful tool to have for a language, since it makes it easy to adhere to style guidelines and allows for mass changes when guidelines change, thus making it possible to actually change the guidelines as needed.

Eventually I would like rustfmt to do lots of cool stuff like changing glob imports to list imports, or emit refactoring scripts to rename variables to adhere to naming conventions. In the meantime, there are lots of interesting questions about how to lay out things like function declarations and match expressions.

My approach to rustfmt is very incremental. It is usable now and gives good results, but it only touches a tiny subset of language items, for example function definitions and calls, and string literals. It preserves code elsewhere. This makes it immediately useful.

I have managed to run it on several crates (or parts of crates) in the rust distro. It also bootstraps, i.e., you can rustfmt on rustfmt before every check-in, in fact this is part of the test suite.

It would be really useful to have people running this tool on their own code or on other crates in the rust distro, and filing issues and/or test cases where things go wrong. This should actually be a useful tool to run, not just a chore, and will get more useful with time.

It's a great project to hack on - you'll learn a fair bit about the Rust compiler's frontend and get a great understanding of more corners of the language than you'll ever want to know about. It's early days too, so there is plenty of scope for having a big impact on the project. I find it a lot of fun too! Just please forgive some of the hackey code that I've already written.

Here is the rustfmt repo on GitHub. I just added a bunch of information to the repo readme which should help new contributors. Please let me know if there is other information that should go in there. I've also created some good issues for new contributors. If you'd like to help out and need help, please ping me on irc (I'm nrc).

Monday, April 13, 2015

Contributing to Rust

I wrote a few things about contributing to Rust. What with the imminent 1.0 release, now is a great time to learn more about Rust and contribute code, tests, or docs to Rust itself or a bunch of other exciting projects.

The main thing I wanted to do was make it easy to find issues to work on. I also stuck in a few links to various things that new contributors should find useful.

I hope it is useful, and feel free to ping me (nrc in #rust-internals) if you want more info.

Sunday, April 12, 2015

New tutorial - arrays and vectors in Rust

I've just put up a new tutorial on Rust for C++ programmers: arrays and vectors. This covers everything you might need to know about array-like sequences in Rust (well, not everything, but at least some of the things).

As well as the basics on arrays, slices, and vectors (Vec), I dive into the differences in representing arrays in Rust compared with C/C++, describe how to use Rust's indexing syntax with your own collection types, and touch on some aspects of dynamically sized types (DSTs) and fat pointers in Rust.

Friday, April 03, 2015

Graphs in Rust

Graphs are a bit awkward to construct in Rust because of Rust's stringent lifetime and mutability requirements. Graphs of objects are very common in OO programming. In this tutorial I'm going to go over a few different approaches to implementation. My preferred approach uses arena allocation and makes slightly advanced use of explicit lifetimes. I'll finish up by discussing a few potential Rust features which would make using such an approach easier.

There are essentially two orthogonal problems: how to handle the lifetime of the graph and how to handle it's mutability.

The first problem essentially boils down to what kind of pointer to use to point to other nodes in the graph. Since graph-like data structures are recursive (the types are recursive, even if the data is not) we are forced to use pointers of some kind rather than have a totally value-based structure. Since graphs can be cyclic, and ownership in Rust cannot be cyclic, we cannot use Box<Node> as our pointer type (as we might do for tree-like data structures or linked lists).

No graph is truly immutable. Because there may be cycles, the graph cannot be created in a single statement. Thus, at the very least, the graph must be mutable during its initialisation phase. The usual invariant in Rust is that all pointers must either be unique or immutable. Graph edges must be mutable (at least during initialisation) and there can be more than one edge into any node, thus no edges are guaranteed to be unique. So we're going to have to do something a little bit advanced to handle mutability.

...

Read the full tutorial, with examples. There's also some discussion of potential language improvements in Rust to make dealing with graphs easier.

Monday, February 23, 2015

Creating a drop-in replacement for the Rust compiler

Many tools benefit from being a drop-in replacement for a compiler. By this, I mean that any user of the tool can use `mytool` in all the ways they would normally use `rustc` - whether manually compiling a single file or as part of a complex make project or Cargo build, etc. That could be a lot of work; rustc, like most compilers, takes a large number of command line arguments which can affect compilation in complex and interacting ways. Emulating all of this behaviour in your tool is annoying at best, especially if you are making many of the same calls into librustc that the compiler is.

The kind of things I have in mind are tools like rustdoc or a future rustfmt. These want to operate as closely as possible to real compilation, but have totally different outputs (documentation and formatted source code, respectively). Another use case is a customised compiler. Say you want to add a custom code generation phase after macro expansion, then creating a new tool should be easier than forking the compiler (and keeping it up to date as the compiler evolves).

I have gradually been trying to improve the API of librustc to make creating a drop-in tool easier to produce (many others have also helped improve these interfaces over the same time frame). It is now pretty simple to make a tool which is as close to rustc as you want it to be. In this tutorial I'll show how.

Note/warning, everything I talk about in this tutorial is internal API for rustc. It is all extremely unstable and likely to change often and in unpredictable ways. Maintaining a tool which uses these APIs will be non- trivial, although hopefully easier than maintaining one that does similar things without using them.

This tutorial starts with a very high level view of the rustc compilation process and of some of the code that drives compilation. Then I'll describe how that process can be customised. In the final section of the tutorial, I'll go through an example - stupid-stats - which shows how to build a drop-in tool.

Continue reading on GitHub...

Sunday, January 11, 2015

Recent syntactic changes to Rust

The last few weeks I implemented a few syntactic changes in Rust. I wanted to go over those and explain the motivation so it doesn't just seem like churn. None of the designs were mine, but I agree with all of them. (Oh, by the way, did you see we released the 1.0 Alpha?!!!!!!).

Slicing syntax

Slicing syntax has changed from `foo[a..b]` to `&foo[a..b]`, although that might not look like much of a change, it is actually the deepest one I'll cover here. Under the covers we moved from having a `Slice` trait to using the `Index` trait. So `&foo[a..b]` works exactly the same way as overloaded indexing `foo[a]`, the difference being that slicing is indexing using a range. One advantage of this approach is that ranges are now first class expressions - you can write `for i in 0..5 { ... }` and get the expected result. The other advantage is that the borrow becomes explicit (the newly required `&`). Since borrowing is important in Rust, it is great to see where things are borrowed and so we are trying to make that as explicit as possible. Previously, the borrow was implicit. The final benefit of this approach is that there is one fewer 'blessed' traits and one fewer kind of expression in the language.

Fixed length and repeating arrays

The syntax for these changed from `[T, ..n]` and `[expr, ..n]` to `[T; n]` and `[expr; n]`, respectively. This is a pretty minor change and the motivation was to allow the wider use of range syntax (see above) - the `..n` could ambiguously be a range or part of these expressions. The `..` syntax is somewhat overloaded in Rust already, so I think this is a nice change in that respect. The semicolon seems just as clear to me, involves fewer characters, and there is less confusion with other expressions.

Sized bound

Used for dynamically sized types the `Sized?` bound indicated that a type parameter may or not be bounded by the `Sized` trait (type parameters have the `Sized` bound by default). Being bound by the `Sized` trait indicates to the compiler that the object has statically known size (as opposed to DST objects). We changed the syntax so that rather than writing `Sized? T` you write `T: ?Sized`. This has the advantage that it is more regular - now all bounds (regular or optional) come after the type parameter name. It will also fit with negative bounds when they come about, which will have the syntax `!Foo`.

`self` in `use` imports

We used to accept `mod` in `use` imports to allow the import of the module itself, e.g., `use foo::bar::{mod, baz};` would import `foo::bar` and `foo::bar::baz`. We changed `mod` to `self`, making the example `use foo::bar::{self, baz};`. This is a really minor change, but I like it - `self`/`Self` has a very consistent meaning as a variable of some kind, whereas `mod` is a keyword; I think that made the original syntax a bit jarring. We also recently introduced scoped enums, which make the pattern of including a module (in this case an enum) and its components more common. Especially in the enum case, I think `self` fits better than `mod` because you are referring to data rather than a module.

`derive` in attributes

The `derive` attribute is used to derive some standard traits (e.g., `Copy`, `Show`) for your data structures. It was previously called `deriving`, and now it is `derive`. This is another minor change, but makes it more consistent with other attributes, and consistency is great!

Thanks to Japaric for doing a whole bunch of work converting our code to using the new syntaxes.