Tuesday, May 06, 2014

Rust for C++ programmers - part 5: borrowed references

In the last post I introduced unique pointers. This time I will talk about another kind of pointer which is much more common in most Rust programs: borrowed pointers (aka borrowed references, or just references).

If we want to have a reference to an existing value (as opposed to creating a new value on the heap and pointing to it, as with unique pointers), we must use `&`, a borrowed reference. These are probably the most common kind of pointer in Rust, and if you want something to fill in for a C++ pointer or reference (e.g., for passing a parameter to a function by reference), this is probably it.

We use the `&` operator to create a borrowed reference and to indicate reference types, and `*` to dereference them. The same rules about automatic dereferencing apply as for unique pointers. For example,
fn foo() {
    let x = &3;   // type: &int
    let y = *x;   // 3, type: int
    bar(x, *x);
    bar(&y, y);
}

fn bar(z: &int, i: int) {
    // ...
}
The `&` operator does not allocate memory (we can only create a borrowed reference to an existing value) and if a borrowed reference goes out of scope, no memory gets deleted.
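For readers following along with a current compiler, here is a minimal sketch of the same example. It assumes today's `i32` in place of this post's `int`, and the names `sum_via_ref` and `basic_refs` are made up for illustration.

```rust
// Takes a borrowed reference and a plain value; nothing is allocated or freed.
fn sum_via_ref(z: &i32, i: i32) -> i32 {
    *z + i // explicit dereference of the borrowed reference
}

fn basic_refs() -> i32 {
    let x = &3; // type: &i32
    let y = *x; // 3, type: i32
    sum_via_ref(x, *x) + sum_via_ref(&y, y)
}

fn main() {
    println!("{}", basic_refs());
}
```

Both calls add 3 to 3, so `basic_refs` returns 12.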

Borrowed references are not unique - you can have multiple borrowed references pointing to the same value. E.g.,

fn foo() {
    let x = 5;                // type: int
    let y = &x;               // type: &int
    let z = y;                // type: &int
    let w = y;                // type: &int
    println!("These should all be 5: {} {} {}", *w, *y, *z);
}
Like values, borrowed references are immutable by default. You can also use `&mut` to take a mutable reference, or to denote mutable reference types. Mutable borrowed references are unique (you can only take a single mutable reference to a value, and you can only have a mutable reference if there are no immutable references). You can use a mutable reference where an immutable one is wanted, but not vice versa. Putting all that together in an example:
fn bar(x: &int) { ... }
fn bar_mut(x: &mut int) { ... }  // &mut int is a reference to an int which
                                 // can be mutated

fn foo() {
    let x = 5;
    //let xr = &mut x;     // Error - can't make a mutable reference to an
                           // immutable variable
    let xr = &x;           // Ok (creates an immutable ref)
    bar(xr);
    //bar_mut(xr);         // Error - expects a mutable ref

    let mut x = 5;
    let xr = &x;           // Ok (creates an immutable ref)
    //*xr = 4;             // Error - mutating immutable ref
    //let xr = &mut x;     // Error - there is already an immutable ref, so we
                           // can't make a mutable one

    let mut x = 5;
    let xr = &mut x;       // Ok (creates a mutable ref)
    *xr = 4;               // Ok
    //let xr = &x;         // Error - there is already a mutable ref, so we
                           // can't make an immutable one
    //let xr = &mut x;     // Error - can only have one mutable ref at a time
    bar(xr);               // Ok
    bar_mut(xr);           // Ok
}
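The part of this that compiles can be sketched in current Rust as follows. This assumes `i32` in place of `int`, and the function names are hypothetical. Current compilers also track borrows more finely than the one described in this post, so the sketch sticks to a single mutable reference.

```rust
// Reads through an immutable reference.
fn read_value(x: &i32) -> i32 {
    *x
}

// Writes through a mutable reference.
fn add_one(x: &mut i32) {
    *x += 1;
}

fn shared_and_mutable() -> i32 {
    let mut x = 5;
    let xr = &mut x;           // the only live reference to x
    add_one(xr);               // Ok - a mutable ref is wanted; x is now 6
    let seen = read_value(xr); // Ok - a mutable ref works where an immutable one is wanted
    *xr += seen;               // Ok - x is now 12
    x
}

fn main() {
    println!("{}", shared_and_mutable());
}
```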
Note that the reference may be mutable (or not) independently of the mutableness of the variable holding the reference. This is similar to C++ where pointers can be const (or not) independently of the data they point to. This is in contrast to unique pointers, where the mutableness of the pointer is linked to the mutableness of the data. For example,
fn foo() {
    let mut x = 5;
    let mut y = 6;
    let xr = &mut x;
    //xr = &mut y;        // Error - xr is immutable

    let mut x = 5;
    let mut y = 6;
    let mut xr = &mut x;
    xr = &mut y;          // Ok

    let mut x = 5;
    let mut y = 6;
    let mut xr = &x;
    xr = &y;              // Ok - xr is mut, even though it is an immutable reference
}
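A small sketch of the two independent kinds of mutability, written for a current compiler (so `i32` rather than `int`; the function name is made up):

```rust
fn variable_vs_reference() -> (i32, i32, i32) {
    let x = 5;
    let y = 6;
    let mut xr = &x;  // mutable variable holding an immutable reference
    let first = *xr;  // reads x through xr
    xr = &y;          // Ok - xr itself is rebound; x and y are untouched

    let mut a = 1;
    let ar = &mut a;  // immutable variable holding a mutable reference
    *ar += 1;         // Ok - the data can change, though ar cannot be rebound

    (first, *xr, a)
}

fn main() {
    println!("{:?}", variable_vs_reference());
}
```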
If a mutable value is borrowed, it becomes immutable for the duration of the borrow. Once the borrowed pointer goes out of scope, the value can be mutated again. This is in contrast to unique pointers, which once moved can never be used again. For example,
fn foo() {
    let mut x = 5;            // type: int
    {
        let y = &x;           // type: &int
        //x = 4;              // Error - x has been borrowed
        println!("{}", x);    // Ok - x can be read
    }
    x = 4;                    // OK - y no longer exists
}
The same thing happens if we take a mutable reference to a value - the value still cannot be modified. In general in Rust, data can only ever be modified via one variable or pointer. Furthermore, since we have a mutable reference, we can't take an immutable reference. That limits how we can use the underlying value:
fn foo() {
    let mut x = 5;            // type: int
    {
        let y = &mut x;       // type: &mut int
        //x = 4;              // Error - x has been borrowed
        //println!("{}", x);  // Error - requires borrowing x
        let z = *y + x;       // Ok - doesn't require borrowing
    }
    x = 4;                    // OK - y no longer exists
}
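Here is a minimal sketch of the useful direction, again in current syntax with a hypothetical function name: modify the value through the mutable reference while the borrow lasts, then go back to using the variable directly.

```rust
fn borrow_then_release() -> i32 {
    let mut x = 5;
    {
        let y = &mut x; // x is mutably borrowed inside this block
        *y += 1;        // modify through the reference; x is now 6
    }                   // y goes out of scope, so the borrow ends
    x += 4;             // Ok - x can be modified directly again
    x
}

fn main() {
    println!("{}", borrow_then_release());
}
```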
Unlike C++, Rust won't automatically reference a value for you. So if a function takes a parameter by reference, the caller must reference the actual parameter. However, pointer types will automatically be converted to a reference:
fn foo(x: &int) { ... }

fn bar(x: int, y: ~int) {
    foo(&x);
    // foo(x);   // Error - expected &int, found int
    foo(y);      // Ok
    foo(&*y);    // Also ok, and more explicit, but not good style
}
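In current Rust the unique pointer `~int` is written `Box<i32>`, and the conversion to a reference happens when you borrow the box rather than when you pass it by value. A minimal sketch under that assumption (the function names are made up for illustration):

```rust
fn double(x: &i32) -> i32 {
    *x * 2
}

fn call_with_refs(x: i32, y: Box<i32>) -> i32 {
    let a = double(&x);  // the caller must take the reference explicitly
    // double(x);        // Error - expected &i32, found i32
    let b = double(&y);  // Ok - &Box<i32> is converted to &i32 automatically
    let c = double(&*y); // Also ok, and more explicit
    a + b + c
}

fn main() {
    println!("{}", call_with_refs(1, Box::new(2)));
}
```

With the arguments shown in `main`, the three calls give 2, 4 and 4.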

`mut` vs `const`

At this stage it is probably worth comparing `mut` in Rust to `const` in C++. Superficially they are opposites. Values are immutable by default in Rust and can be made mutable by using `mut`. Values are mutable by default in C++, but can be made constant by using `const`. The subtler and more important difference is that C++ const-ness applies only to the current use of a value, whereas Rust's immutability applies to all uses of a value. So in C++ if I have a `const` reference (or pointer) to an object, someone else could have a non-const reference to the same object and it could change without me knowing. In Rust if you have an immutable variable, you are guaranteed it won't change.

As we mentioned above, all mutable references are unique. So if you have a mutable value, you know it is not going to change unless you change it. Furthermore, you can change it freely since you know that no one else is relying on it not changing.

Borrowing and lifetimes

One of the primary safety goals of Rust is to avoid dangling pointers (where a pointer outlives the memory it points to). In Rust, it is impossible to have a dangling borrowed reference. It is only legal to create a borrowed reference to memory which will be alive longer than the reference (well, at least as long as the reference). In other words, the lifetime of the reference must be no longer than the lifetime of the referenced value.

That has been accomplished in all the examples in this post. Scopes introduced by `{}` or functions are bounds on lifetimes - when a variable goes out of scope its lifetime ends. If we try to take a reference to a value with a shorter lifetime, such as one declared in a narrower scope, the compiler will give us an error. For example,
fn foo() {
    let x = 5;
    let mut xr = &x;  // Ok - x and xr have the same lifetime
    {
        let y = 6;
        //xr = &y;    // Error - xr will outlive y
    }                 // y is released here
}                     // x and xr are released here
In the above example, x and xr don't have the same lifetime because xr starts later than x, but it's the end of lifetimes which is more interesting, since you can't reference a variable before it exists in any case - something else which Rust enforces and which makes it safer than C++.
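A runnable variation on that example, as a sketch in current syntax (`i32` for `int`, hypothetical function name). The failing assignment stays commented out, and the reference is pointed at a value that does live long enough:

```rust
fn reference_outlived_by_value() -> i32 {
    let x = 5;
    let y = 6;
    let mut xr = &x;  // Ok - x lives until the end of the function
    let before = *xr;
    {
        let _z = 7;
        // xr = &_z;  // Error - _z is released at the end of this block,
        //            // but xr is read after it
        xr = &y;      // Ok - y is still alive wherever xr is used
    }                 // _z is released here
    before + *xr
}

fn main() {
    println!("{}", reference_outlived_by_value());
}
```

Uncommenting the assignment to `&_z` should make the compiler reject the function, because `xr` is read after the inner block ends.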

Explicit lifetimes

After playing with borrowed pointers for a while, you'll probably come across borrowed pointers with an explicit lifetime. These have the syntax `&'a T` (cf. `&T`). They're kind of a big topic, since I need to cover lifetime-polymorphism at the same time, so I'll leave them for another post (there are a few more, less common pointer types to cover first though). For now, I just want to say that `&T` is a shorthand for `&'a T` where `a` is the current scope, that is, the scope in which the type is declared.
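Just to show what the syntax looks like in practice, here is a minimal sketch in current Rust. The function `larger` and the lifetime name `'a` are made up for illustration, and this only scratches the surface of the topic.

```rust
// 'a ties the returned reference to both arguments, so the result
// may not be used after either of the referenced values is gone.
fn larger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if *a >= *b { a } else { b }
}

fn pick_larger() -> i32 {
    let x = 5;
    let y = 6;
    *larger(&x, &y) // both x and y are still alive here
}

fn main() {
    println!("{}", pick_larger());
}
```

The point of the annotation is that the compiler can check callers of `larger` without looking at its body.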

11 comments:

  1. Anonymous, 1:01 pm

    Thank you for writing these guides!
    Why is it possible to read a value through its original variable while a mutable reference exists, but not possible to create an immutable reference to that value? (in your example: let z = *y + x) Seems slightly inconsistent since, unless I'm missing something, an immutable reference can also only be used to read the value.

  2. Anonymous, 5:12 pm

    I don't know Rust, so perhaps I misunderstand the things being shown here, but these two examples don't look right. The first looks like a mutable reference,

    //let xr = &mut x; // Error - can only have one immutable ref at a time


    And, here, the referenced data is mut,

    let mut x = 5;
    let mut y = 6;
    let mut xr = &x;
    xr = &y; // Ok - xr is mut, even though the referenced data is not

  3. This comment:
    //let xr = &mut x; // Error - can only have one immutable ref at a time

    Should read as "one mutable ref at a time"

    BTW, great blog!

  4. Anonymous, 7:17 am

    Why is the println() disallowed while z=y*+x is allowed? It says that x must be borrowed in order to println, but isn't the same true of z=y*+x? How else is the value of x added to the reference of y unless it is borrowed?

  5. Isn't it y* + x is equivalent to y.add(x) and that requires borrowing of x according to http://static.rust-lang.org/doc/master/std/ops/trait.Add.html :

    pub trait Add {
    fn add(&self, rhs: &RHS) -> Result;
    }

    So both y* + x and print!(..., x) should have the same rules.

  6. The text says

    "So in C++ if I have a `const` variable, someone else could have a non-const reference to it and it could change without me knowing."

    Sorry, but that's just nonsense. It is not possible to change a `const` object in C++ - the behavior would be undefined. What you can have in C++ is multiple access paths to the same non-const object. Some access paths might be const, other access paths might be non-const. The object is modifiable through non-const access paths and the changes are immediately visible through all access paths (including const ones).

    But the key moment here is that the object in question is non-const. If it is const, it is unchangeable in C++.

  7. Nathan, 11:28 pm

    Actually, you can easily change a constant variable in C++. You just have to do some shenanigans:

    int main(int argc, char * argv[] ) {
    const volatile int myderp = 4;
    int * change = (int *)(& myderp);
    *change = 5;
    printf("change: %d\n", *change);
    printf("myderp: %d\n", myderp);
    }

    I compiled and verified this; without declaring the truly silly "const volatile" myderp was being optimized to "4" in the printf statement, but it was very definitely mutated.

    Replies
    1. You are not changing a constant variable in your code. You are simply attempting to do so, which triggers undefined behavior. In your case that behavior manifested itself in that bizarre fashion: as if the variable got modified. But that manifestation is unreliable and meaningless within the realm of C++ language.

    2. This comment has been removed by the author.

  8. Nathan, 11:29 pm

    Should have added: http://en.cppreference.com/w/cpp/language/const_cast

    Even for C++ style casting, there is const_cast<>. Yes, it's technically undefined behavior, but it works great!

    Replies
    1. The purpose of 'const_cast' is to allow you to remove constness from a const access path (a pointer or a reference) that leads to a non-const object. The fact that the object itself is non-const is the key moment here.

      'const_cast' cannot be legally used to modify const objects. It is not intended for that purpose and any attempts to do so trigger undefined behavior.
