Wednesday, April 23, 2014

Rust for C++ programmers - part 3: primitive types and operators

Rust has pretty much the same arithmetic and logical operators as C++. `bool` is the same in both languages (as are the `true` and `false` literals). Rust has similar concepts of integers, unsigned integers, and floats. However the syntax is a bit different. Rust uses `int` to mean an integer and `uint` to mean an unsigned integer. These types are pointer sized. E.g., on a 32 bit system, `uint` means a 32 bit unsigned integer. Rust also has explicitly sized types which are `u` or `i` followed by 8, 16, 32, or 64. So, for example, `u8` is an 8 bit unsigned integer and `i32` is a 32 bit signed integer. For floats, Rust has `f32` and `f64` (`f128` is coming soon too).

Numeric literals can take suffixes to indicate their type (using `i` and `u` instead of `int` and `uint`). If no suffix is given, Rust tries to infer the type. If it can't infer, it uses `int` or `f64` (if there is a decimal point). Examples:
fn main() {
    let x: bool = true;
    let x = 34;   // type int
    let x = 34u;  // type uint
    let x: u8 = 34u8;
    let x = 34i64;
    let x = 34f32;
}
As a side note, Rust lets you redefine variables so the above code is legal - each `let` statement creates a new variable `x` and hides the previous one. This is more useful than you might expect due to variables being immutable by default.

Numeric literals can be given as binary, octal, and hexadecimal, as well as decimal. Use the `0b`, `0o`, and `0x` prefixes, respectively. You can use an underscore anywhere in a numeric literal and it will be ignored. E.g,
fn main() {
    let x = 12;
    let x = 0b1100;
    let x = 0o14;
    let x = 0xe;
    let y = 0b_1100_0011_1011_0001;
}
Rust has chars and strings, but since they are Unicode, they are a bit different from C++. I'm going to postpone talking about them until after I've introduced pointers, references, and vectors (arrays).

Rust does not implicitly coerce numeric types. In general, Rust has much less implicit coercion and subtyping than C++. Rust uses the `as` keyword for explicit coercions and casting. Any numeric value can be cast to another numeric type. `as` cannot be used to convert between booleans and numeric types. E.g.,
fn main() {
    let x = 34u as int;     // cast unsigned int to int
    let x = 10 as f32;      // int to float
    let x = 10.45f64 as i8; // float to int (loses precision)
    let x = 4u8 as u64;     // gains precision
    let x = 400u16 as u8;   // 144, loses precision (and thus changes the value)
    println!("`400u16 as u8` gives {}", x);
    let x = -3i8 as u8;     // 253, signed to unsigned (changes sign)
    println!("`-3i8 as u8` gives {}", x);
    //let x = 45u as bool;  // FAILS!
}
Rust has the following numeric operators: `+`, `-`, `*`, `/`, `%`; bitwise operators: `|`, `&`, `^`, `<<`, `>>`; comparison operators: `==`, `!=`, `>`, `<`, `>=`, `<=`; short-circuit logical operators: `||`, `&&`. All of these behave as in C++, however, Rust is a bit stricter about the types the operators can be applied to - the bitwise operators can only be applied to integers and the logical operators can only be applied to booleans. Rust has the `-` unary operator which negates a number. The `!` operator negates a boolean and inverts every bit on an integer type (equivalent to `~` in C++ in the latter case). Rust has compound assignment operators as in C++, e.g., `+=`, but does not have increment or decrement operators (e.g., `++`).

Sunday, April 20, 2014

Formatting change

Quick note - yes, I have changed the formatting on my blog - hope you like it!

Rust for C++ programmers - part 2: control flow

If

The `if` statement is pretty much the same in Rust as C++. One difference is that the braces are mandatory, but brackets around the expression being tested are not. Another is that `if` is an expression, so you can use it the same way as the ternary `?` operator in C++ (remember from last time that if the last expression in a block is not terminated by a semi-colon, then it becomes the value of the block). There is no ternary `?` in Rust. So, the following two functions do the same thing:
fn foo(x: int) -> &'static str {
    let mut result: &'static str;
    if x < 10 {
        result = "less than 10";
    } else {
        result = "10 or more";
    }
    return result;
}

fn bar(x: int) -> &'static str {
    if x < 10 {
        "less than 10"
    } else {
        "10 or more"
    }
}
The first is a fairly literal translation of what you might write in C++. The second is in better Rust style.

You can also write `let x = if ...`, etc.

Loops

Rust has while loops, again just like C++:
fn main() {
    let mut x = 10;
    while x > 0 {
        println!("Current value: {}", x);
        x -= 1;
    }
}
There is no do...while loop in Rust, but we do have the `loop` statement which just loops forever:
fn main() {
    loop {
        println!("Just looping");   
    }
}
Rust has `break` and `continue` just like C++.

For loops

Rust also has `for` loops, but these are a bit different. Lets say you have a vector of ints and you want to print them all (we'll cover vectors/arrays, iterators, and generics in more detail in the future. For now, know that a `Vec<T>` is a sequence of `T`s and `iter()` returns an iterator from anything you might reasonably want to iterate over). A simple `for` loop would look like:
fn print_all(all: Vec<int>) {
    for a in all.iter() {
        println!("{}", a);
    }
}
If we want to index over the indices of `all` (a bit more like a standard C++ for loop over an array), you could do
fn print_all(all: Vec<int>) {
    for i in range(0, all.len()) {
        println!("{}: {}", i, all.get(i));
    }
}
Hopefully, it is obvious what the `range` and `len` functions do.

Switch/Match

Rust has a match expression which is similar to a C++ switch statement, but much more powerful. This simple version should look pretty familiar:
fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        y => println!("x is something else {}", y),
    }
}
There are some syntactic differences - we use `=>` to go from the matched value to the expression to execute, and the match arms are separated by `,` (that last `,` is optional). There are also some semantic differences which are not so obvious: the matched patterns must be exhaustive, that is all possible values of the matched expression (`x` in the above example) must be covered. Try removing the `y => ...` line and see what happens; that is because we only have matches for 0, 1, and 10 and obviously there are lots of other ints which don't get matched. In that last arm, `y` is bound to the value being matched (`x` in this case). We could also write:
fn print_some(x: int) {
    match x {
        x => println!("x is something else {}", x)
    }
}
Here the `x` in the match arm introduces a new variable which hides the argument `x`, just like declaring a variable in an inner scope.

If we don't want to name the variable, we can use `_` for an unnamed variable, which is like having a wildcard match. If we don't want to do anything, we can provide an empty branch:
fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        _ => {}
    }
}
Another semantic difference is that there is no fall through from one arm to the next.

We'll see in later posts that match is extremely powerful. For now I want to introduce just a couple more features - the 'or' operator for values and `if` clauses on arms. Hopefully an example is self-explanatory:
fn print_some_more(x: int) {
    match x {
        0 | 1 | 10 => println!("x is one of zero, one, or ten"),
        y if y < 20 => println!("x is less than 20, but not zero, one, or ten"),
        y if y == 200 => println!("x is 200 (but this is not very stylish)"),
        _ => {}
    }
}
Just like `if` expressions, `match` statements are actually expressions so we could re-write the last example as:
fn print_some_more(x: int) {
    let msg = match x {
        0 | 1 | 10 => "one of zero, one, or ten",
        y if y < 20 => "less than 20, but not zero, one, or ten",
        y if y == 200 => "200 (but this is not very stylish)",
        _ => "something else"
    };

    println!("x is {}", msg);
}
Note the semi-colon after the closing brace, that is because the `let` statement is a statement and must take the form `let msg = ...;`. We fill the rhs with a match expression (which doesn't usually need a semi-colon), but the `let` statement does. This catches me out all the time.

Motivation: Rust match statements avoid the common bugs with C++ switch statements - you can't forget a `break` and unintentionally fall through; if you add a case to an enum (more later on) the compiler will make sure it is covered by your `match` statement.

Method call

Finally, just a quick note that methods exist in Rust, similarly to C++. They are always called via the `.` operator (no `->`, more on this in another post). We saw a few examples above (`len`, `iter`). We'll go into more detail in the future about how they are defined and called. Most assumptions you might make from C++ or Java are probably correct.

Friday, April 18, 2014

Rust for C++ programmers - an intermission - why Rust

I realise that in terms of learning Rust, I had jumped straight to the 'how' and skipped the 'why'. I guess I am in enough of a Rust bubble that I can't imagine why you wouldn't want to learn it. So, I will make a bit more of an effort to explain why things are how they are. Here I will try to give a bit of an overview/motivation.

If you are using C or C++, it is probably because you have to - either you need low-level access to the system, or need every last drop of performance, or both. Rust aims to do offer the same level of abstraction around memory, the same performance, but be safer and make you more productive.

Concretely, there are many languages out there that you might prefer to use to C++: Java, Scala, Haskell, Python, and so forth, but you can't because either the level of abstraction is too high - you don't get direct access to memory, you are forced to use garbage collection, etc. - or there are performance issues - either performance is unpredictable or its simply not fast enough. Rust does not force you to use garbage collection, and as in C++, you get raw pointers to memory to play with. Rust subscribes to the 'pay for what you use' philosophy of C++. If you don't use a feature, then you don't pay any performance overhead for its existence. Furthermore, all language features in Rust have predictable (and usually small) cost.

Whilst these constraints make Rust a (rare) viable alternative to C++, Rust also has benefits: it is memory safe - Rust's type system ensures that you don't get the kind of memory errors which are common in C++ - memory leaks, accessing un-initialised memory, dangling pointers - all are impossible in Rust. Furthermore, whenever other constraints allow, Rust strives to prevent other safety issues too - for example, all array indexing is bounds checked (of course, if you want to avoid the cost, you can (at the expense of safety) - Rust allows you to do this in unsafe blocks, along with many other unsafe things. Crucially, Rust ensures that unsafety in unsafe blocks stays in unsafe blocks and can't affect the rest of your program). Finally, Rust takes many concepts from modern programming languages and introduces them to the systems language space. Hopefully, that makes programming in Rust more productive, efficient, and enjoyable.

I would like to motivate some of the language features from part 1. Local type inference is convenient and useful without sacrificing safety or performance (it's even in modern versions of C++ now). A minor convenience is that language items are consistently denoted by keyword (`fn`, `let`, etc.), this makes scanning by eye or by tools easier, in general the syntax of Rust is simpler and more consistent than C++. The `println!` macro is safer than printf - the number of arguments is statically checked against the number of 'holes' in the string and the arguments are type checked. This means you can't make the printf mistakes of printing memory as if it had a different type or addressing memory further down the stack by mistake. These are fairly minor things, but I hope they illustrate the philosophy behind the design of Rust.

Rust for C++ programmers - part 1: Hello world

This is the first in a series of blog posts (none written yet) which aim to help experienced C++ programmers learn Rust. Expect updates to be sporadic at best. In this first blog post we'll just get setup and do a few super basic things. Much better resources are at the tutorial and reference manual.

First you need to install Rust. You can download a nightly build from http://www.rust-lang.org/install.html (I recommend the nighlties rather than 'stable' versions - the nightlies are stable in that they won't crash too much (no more than the stable versions) and you're going to have to get used to Rust evolving under you sooner or later anyway). Assuming you manage to install things properly, you should then have a `rustc` command available to you. Test it with `rustc -v`.

Now for our first program. Create a file, copy and paste the following into it and save it as `hello.rs` or something equally imaginative.
fn main() {
    println!("Hello world!");
}
Compile this using `rustc hello.rs`, and then run `./hello`. It should display the expected greeting \o/

Two compiler options you should know are `-o ex_name` to specify the name of the executable and `-g` to output debug info; you can then debug as expected using gdb or lldb, etc. Use `-h` to show other options.

OK, back to the code. A few interesting points - we use `fn` to define a function or method. `main()` is the default entry point for our programs (we'll leave program args for later). There are no separate declarations or header files as with C++. `println!` is Rust's equivalent of printf. The `!` means that it is a macro, for now you can just treat it like a regular function. A subset of the standard library is available without needing to be explicitly imported/included (we'll talk about that later). The `println!` macros is included as part of that subset.

Lets change our example a little bit:
fn main() {
    let world = "world";
    println!("Hello {}!", world);
}
`let` is used to introduce a variable, world is the variable name and it is a string (technically the type is `&'static str`, but more on that in a later post). We don't need to specify the type, it will be inferred for us.

Using `{}` in the `println!` statement is like using `%s` in printf. In fact, it is a bit more general than that because Rust will try to convert the variable to a string if it is not one already*. You can easily play around with this sort of thing - try multiple strings and using numbers (integer and float literals will work).

If you like, you can explicitly give the type of `world`:
    let world: &'static str = "world";
In C++ we write `T x` to declare a variable `x` with type `T`. In Rust we write `x: T`, whether in `let` statements or function signatures, etc. Mostly we omit explicit types in `let` statements, but they are required for function arguments. Lets add another function to see it work:
fn foo(_x: &'static str) -> &'static str {
    "world"
}

fn main() {
    println!("Hello {}!", foo("bar"));
}
The function `foo` has a single argument `_x` which is a string literal (we pass it "bar" from `main`). We don't actually use that argument in `foo`. Usually, Rust will warn us about this. By prefixing the argument name with `_` we avoid these warnings. In fact, we don't need to name the argument at all, we could just use `_`.

The return type for a function is given after `->`. If the function doesn't return anything (a void function in C++), we don't need to give a return type at all (as in `main`). If you want to be super-explicit, you can write `-> ()`, `()` is the void type in Rust. `foo` returns a string literal.

You don't need the `return` keyword in Rust, if the last expression in a function body (or any other body, we'll see more of this later) is not finished with a semicolon, then it is the return value. So `foo` will always return "world". The `return` keyword still exists so we can do early returns. You can replace `"world"` with `return "world";` and it will have the same effect.



* This is a programmer specified conversion which uses the `Show` trait, which works a bit like `toString` in Java. You can also use `{:?}` which gives a compiler generated representation which is sometimes useful for debugging. As with printf, there are many other options.

Monday, April 07, 2014

Anger

There was a lot of anger about Brendan being appointed CEO. There has been a lot of anger since he quit. It is in no way my place to tell people when they should and should not be angry. One of the big points made by progressive movements is that everyone has a right to get angry about things which affect them and people who aren't affected shouldn't tell people not to be angry. Doing so is a control tactic and just generally unfair (especially where it intersects with myths and stereotypes - 'the gay agenda', 'the angry black woman', etc.). I agree. It is one reason I said nothing much about the whole affair. Better for me to listen.

Still, after all this has played out (hopefully), I am left feeling a bit angry myself. And a lot disappointed. Previously, I have mostly agreed with the progressive movements (feminism, LGBT rights, anti-racism, and so forth). When I have not agreed, I have often had my mind changed. I have learnt a lot and I have a great respect for many people in these movements. It feels bad to be on the wrong side of that. It seems to me that the subtlety in the discussion was lost - assumptions were made, opinions were fought for, there was not much attempt to establish empathy and tolerance, nor to accurately learn the specifics of the situation.

Back to anger. Though it is important not to tell people when they are allowed to get angry or what that anger should look like, I would like to suggest how that anger should be used. Anger can be constructive - it is one of the most motivating human emotions and has led to great changes over the years. It can also be amazingly destructive with no purpose - from a child's tantrum to pretty much every war ever fought. Sometimes it is good, emotionally, to get angry and break things. But we must try to put some thought into what gets broken. It was in large part anger that brought LGBT (and other civil) rights to where they are today. We need more of that, and less just breaking stuff, even if it makes us feel better.

In the last couple of weeks we've seen a lot of (justified) anger, but the result has not been positive. Things got broken, but nothing has changed for the better. A small, non-profit organisation which fights for freedom and privacy on the internet against corporate interests and overbearing governments has been damaged in many ways. All to harm a man who made a semi-public donation to an admittedly odious cause. I can't think of anyone who's life has got better from this, maybe some CEOs of other companies who probably have private views worse than Brendan's, but weren't as honest about declaring them.

Sunday, April 06, 2014

Marriage

I have a lot of thoughts and feelings around the recent Brendan/CEO kerfuffle. Mostly though I think it has been better to listen than to talk, so I haven't blogged about it. I will just say that I am very unhappy with the result of it all. I would also like to voice my thoughts on marriage.

One element of misunderstanding, it seems to me, is about whether marriage is important. For some people, marriage is viewed as a purely religious/cultural construct which should be dictated by their religion/culture. They don't see why it is so important for gay people (or sometimes other minority groups) to be able to marry. Especially if alternatives such as civil unions exist. They have the privilege of not being denied the marriage of their choice.

In contrast, for many people marriage is a large and practical thing since it can affect things such as immigration status, benefits, hospital visitation, etc. (As well as having their relationships treated as second class in the eyes of the wider society, of course). In my view, it is unfortunate that the practical side of things exists. I am lucky enough to live in a country where marriage is (mostly) not important in that way, and I prefer it greatly.

Marriage is a wonderful thing, and I would not deny it to anybody who wants it. In my view, it should not involve either the state or any cultural or religious institution. I find the fact that a couple has to be married by a third party weird. In my ideal world, the people being married would only have to marry themselves to each other, and no-one else would get a say. Marriage should simply be a public declaration of commitment in front of the people who are important to those being married. No-one should have to officiate or register it, and no-one should have to say who can or who can't get married. And certainly, being married should have no effects on your legal or moral life.

To clarify, I don't think marriage should lead to tax breaks or extra respect from any institution. I don't believe adultery should be judged any better or worse because of it, etc.

Once marriage brings material benefit from the state or the legal system, and once marriage is bestowed by an institution rather than being freely chosen, it becomes just another tool for enforcing established power. By allowing powerful groups of people to bestow benefits either social or material on individuals, it becomes open to corruption. It becomes something minorities have to fight for and which exclusive majorities seek to prevent others obtaining. That an expression of love and commitment ends up like this is immensely saddening, and says a lot about human society.

And don't even get me started about the commercial side of things. The whole wedding industry makes me feel sick.

A very rough few thoughts on initialising data structures in Rust

This is not a well thought out blog post. It ought to be a tweet, but it is too long.

Thoughts (in kind of random order):
  • Rust has an ownership model based on hierarchies.
  • It is easy to support tree-based data structures this way.
  • Structures with cycles are much more annoying.
  • The real pain is in initialisation.
  • You can have a data structure with cycles using ref-counting or borrowed references as long as you 'delete' the whole data structure at once.
  • Rust helps you do this with its type system.
  • BUT there is no way to initialise such a data structure because of the way `mut` works. Specifically the requirement that `mut` variables are unique and have move/borrow semantics.
  • Therefore you need unsafe blocks, which is a pain.
  • Since it seems that you ought to be able to verify safe initialisation (I'm not sure how though).
  • I think this is an area of the language which has not caused much pain because compilers (especially the frontend) and browsers are both very 'tree-heavy' programs. Long term we should find a solution to this, but it is certainly something that can be put off till post-1.0 (since it has an unsafe solution and I can' imagine a safe solution not being backwards-compatible).
  • Kind of related - when using a data structure where all components have the same lifetime (i.e., the whole structure must be destroyed at once) you end up writing a huge number of really un-interesting lifetime parameters on functions and data which basically just get passed around. I would love to have a shorthand here - perhaps modules could have lifetime parameters? Not sure if that would help that much. More thought required in any case...

Saturday, April 05, 2014

Some thoughts on data structures in Rust

One thing I'm not so keen on in Rust (more of an itch, than a pain) is that there are a lot of data structures. C++ has classes and structs (which are basically the same thing), enums, and unions. Java has classes and enums (unless its gained some recently). Scripting and functional languages tend to have more (lists, tuples, dictionaries/maps, and so forth). (I'm ignoring arrays, lists, and other sequences for this post). Rust has:
  •     structs
  •     tuples
  •     tuple structs
  •     unit structs
  •     enums
  •     unit variants
  •     'tuple' variants
  •     struct variants
This seems like a lot, and I think it is a bit confusing. But there is good motivation for all of them. Still I think it is a little bit of an ad-hoc collection.

We've been thinking recently about how to support structures like the DOM, and there have been a bunch of different proposals. One of mine (https://github.com/rust-lang/rfcs/pull/24) suggests using a kind of unification of structs and enums to support Java style inheritance. I think that idea probably won't work. But I have a lot of thoughts which I would like to preserve for posterity. I'm not proposing any changes to Rust here, just trying to put into order the data structures a bit, and maybe provide a design for a core calculus or something. Basically intellectual masturbation, but perhaps it is interesting.

Classifying data structures

I think it is useful to classify the data structure along several axes:
  1. Are there many variants or just one (enums  - many, everything else - just one);
  2. Do the data structures give types (yes, except for the variants of enums);
  3. Does the data structure introduce a name, and thus nominal typing (tuples don't, everything else does);
  4. Are the fields of the structure named (enums and unit structs/variants - NA, tuples and tuple variants - no, everything else - yes);
  5. Can the data structure be instantiated (enums - no, everything else - yes);
  6. Can the data structure be a variant of another (variants - yes, everything else - no).
Do these categories uniquely identify each data structure? Yes.

Are there combinations of categories which are empty? Also, yes. Some of these are not interesting - there are no instantiable data structure which don't introduce a type. Some are just kind of holes - why is there no anonymous record type (i.e., a structure with named fields, but which can't be named itself)? Some are design decisions - why can't variants be used as types (maybe they will in the future)? Finally, some are really interesting questions - why don't we have variants which are themselves enums? Why don't enums have fields (named or not)? What would it mean to instantiate an enum? Could we have un-named variants?

What would a minimal set of data structures look like

In RFC PR 24, I proposed nesting enums and unifying enums and structs. I would like to go even further and try and think about a minimal set of data structures with the same expressivity as Rust data structures (more, in fact).

First a few observations. If fields are not named, then order is important, otherwise it isn't. That means mixing named and un-named fields doesn't really work. If a structure is named then it is nominally typed, otherwise, it is structurally typed. A unit structure is the same as a structure with n fields (named or not), where n = 0.

Lets start with records - we want named and un-named records with named and un-named fields. That gives us structs and tuples, both with and without names (as opposed to Rust which is missing un-named structs). Here is a Rust-like syntax (I use `[]` to mean a comma-separated sequence):
S ::= 'struct' id '{' ([id:T] | [T]) '}'
T ::= id | '{' ([id:T] | [T]) '}'
e ::= id? '{' ([id:e] | [e]) '}'
The next thing we need are enums and variants. Lets follow my unification proposal and just add variants to structs. The next question is what the variants should be - lets just use the existing data structures. And then we have the question of should an enum be instantiable - rather than following my RFC and using enum vs struct to differentiate, lets just add a keyword - abstract. Now we have:
S ::= 'abstract'? 'struct' id '{' ([id:T] | [T]), [S] '}'
T ::= ... | id | '{' ([id:T] | [T]) '}'
e ::= ... | id? '{' [[id:e] | [e]] '}'
Note that the initialiser syntax has changed to be a sequence of sequences of possibly named expressions. That is because I imagine the fields of outer data structures being inherited by inner ones. We have to accommodate un-named fields (where order is important) and named fields (where order is not, at least within the sub-sequence).

I believe this emulates all Rust data structures and adds anonymous records. Enums are abstract structs with no fields, structs are structs with named fields and no variants. Unit, tuple, and struct variants are nullary structs, structs with un-named fields, and structs with named fields - each nested inside an abstract struct with no fields. And so forth. One difference is that our encoding of variants introduce types, but that is an increase in expressivity.

With reference to my earlier classification, the obviously missing structures are un-named structures with variants and un-named variants - I don't believe either are practically useful; neither are present in Rust.

We have also added expressivity in the form of inheritance of variants from their enclosing structures. In terms of data, we can therefore emulate Java-style classes (we need to add virtual functions to emulate their behaviour, see the RFC for details of that).

To make the syntax more friendly we would add unit variants and structs. Then eliminate the struct keyword. We could use different kinds of brackets. Continuing we could end up back at Rust. I hope the little syntax snippet gives a glimpse of an elegant foundation for all these data structures.

A practical matter

I said above that my RFC/proposal wouldn't actually work, and this thought experiment has the same problem - how to implement enums? As with any value we must know the size of the value at compile time. In a language like Java where everything is a pointer, that is easy - the size is always one word. In Rust we want to really pass values by value. Since each enum variant could be a different size, that is a problem.

The current Rust implementation is to use the maximum size of the variants as the static size for an enum value. Then pad any variant values which are smaller than that. That is fine if we don't have too many enum objects or all enum variants are similar sizes. That is mostly the case.

However, under the scheme sketched above, we might expect both those constraints to be false. If we think about the DOM for instance, each object could be wildly different in size and we would have lots of objects, so there would be a lot of wasted memory.

One solution might be to represent enum values as a tag and a pointer to a bunch of fields, rather than as a tag and a bunch of fields. That would work, although you would have to do be careful to copy the pointed-to fields and not just the pointer when passing by value. However, it adds the cost of a dereference to every access and it is often convenient to really have a value. Rust is a systems language in the C-spirit and so there should not be this kind of magic.

In the context of virtual structs, we have previously discussed treating struct objects as dynamically sized types. We could then allocate precisely sized instances and refer to them via pointer which gives subtyping. However, such values must have restricted uses and so this would be incompatible with the current use of enum values. A solution suggested by bill-myers on GitHub is to use some keyword or attribute (`unsized`, say) which indicates that instances should be dynamically sized and mostly passed by reference. Omitting the keyword would give the current semantics and it is up to the programmer to ensure there is not too much memory wasted by padding.

Wednesday, March 12, 2014

Directory tiles, incentivisation, and indirection

Note: I have absolutely nothing to do with directory tiles, but they have been on my mind (and of plenty of other peoples), so here are my thoughts. My opinions are not those of my employer, but I am very grateful to be able to voice these opinions.

I agree with the need to diversify our revenue streams over the long run, and I think I agree with the sentiment that we should not 'leave money on the table'. However, some of the discussion around directory tiles makes me uneasy. (I think the worst thing about the whole thing has been how badly the idea was communicated, but that has been pretty well established, so I won't beat that horse any further. Personally, I don't think any revenue from the idea would be significant because, aiui, only new users will see the ads and then at most nine times, which if I were a company, I wouldn't pay for. But I don't know anything about such things and there are plenty of people at Mozilla who DO know about this stuff, so I'll believe it could make money somehow).

My position only makes sense if you assume that advertising is bad per se. I do. I know many people do not, especially in the technology sector. But I think advertising is a bane of modern civilisation - not just on the internet, but in magazines, on billboards, on public transport, in sport; it is poison. It is particularly insidious in that we don't realise we are seeing and being influenced by advertising, in part by design and in part because of its ubiquity. In the rare times I have spent time away from it (the subway in Prague a decade ago, hiking in the mountains) returning to a world full of advertising feels as jarring and unpleasant as it ought to. I would love to live in a world where we paid for websites and software. If I could pay 1/10th of a cent for every page I viewed and if Mozilla could be funded by administering that and taking 1/10th of a percent of it, then I would be very happy indeed. Unfortunately, people on the whole seem to prefer free, and so it is a dream.

The argument I dislike is 'we already send search traffic to Google in exchange for cold, hard cash, and Google in turn makes money by showing these people ads; therefore, showing our users ads directly in the browser is no different or no worse or something'. As a software engineer, I know that indirection is very, very important. It is totally incorrect to treat a value the same way as a pointer to a value, and I believe the analogy holds with monetisation too. It comes down to incentives - money is a very powerful incentive (not the only one, and I trust Mozilla more than pretty much any other organisation to balance other incentives, but it is still an incentive - we each want to keep getting paid to keep doing these awesome jobs we have).

So, with the current system, in order to maximise search traffic and thus income, we might tweak our design to make the search box more prominent or otherwise encourage more users to search via the search box (I believe that this is not quite the case because we do not get paid _per search_, i.e., there is some slack in the system, and I don't think we have ever done something like this or intend to). That is not so bad, I would not feel bad about encouraging people to search the internet - it is kind of an essential task. Now Google (or whoever we might sell search traffic to in the future) is incentivised to show their users more ads, but there is no incentive for us to modify the browser to show the user more ads.

With directory tiles (or any system where we are directly showing ads to users) the above does not hold. The monetary incentive is to show users more ads and more often. And that (in my opinion) makes for a worse user experience.

In summary, when the incentive is indirect, optimising for it does not negatively affect our users. When the incentive is direct, optimising for it does negatively impact our users. And that makes me very uneasy about going down that road.

However, I fear we might have to. We need money to fund our mission for a better web and the current situation may not last forever (maybe it will, and we can all live happily ever after, but I fear everything changes). Fortunately, I trust Mozilla better than anyone to ignore the incentive described above, I just wish we didn't have to ignore it.