Sunday, April 29, 2012

Android phone

I have recently acquired an Android smartphone - the HTC One X, to be precise. It is a very nice piece of kit indeed. It is my first smartphone (I know, and I call myself a geek). My wife has an iPhone, and I've played with various Android devices over the years, but it seems to me that the Android platform is finally coming into its own, at least, for those who want a more 'interesting' experience than the iPhone.

Obviously, the hardware is very nice - the screen is beautiful (super-high resolution and very nice colours), and everything is snappy, quick, and very, very pretty (which means the graphics bits must be doing some hard work). The size (4.7 inches) is ideal: it is not so big as to cause thumb strain, but the screen is very 'visible'; it feels a lot more usable for a lot more things than the iPhone or similar-sized devices. The main camera is really good, though the flash is a bit green for close-ups. The forward-facing camera (for Skype etc.) has poorer colour balance and is a little fish-eyed, but is still better than a standard webcam.

The Android and toned-down-Sense UI work very well - so much less clunky than older versions of Android. Most of the UI is very easy to use and intuitive. I would go so far as to say it is a better UI than iOS, but maybe that is because I have a bit of a geeky mindset. All the basic utilities and OS things (app selection, home screen layout, clock, calendar, and so forth) work very well; the balance of summary vs detail is nearly always just right. There is great consistency: the same gestures tend to do the same things wherever you are, and they seem more easily discoverable than on the iPhone. Having three perma-buttons rather than one is a win, I think, although having a menu button (like the Samsung phones) would probably be good, since all the apps have one, but often in different places. Actually, the recent apps (kind of task switching) button could probably have been got rid of; I only use it rarely. The mail application could do with some work though: I want to delete an email with one click and without reading the message, I want to see all my new mail (from multiple accounts) in one place, and I want more than one message on my lock screen.

The thing I really like (and which really differentiates it from the iPhone) is how everything is so customisable. Everything I have wanted to change so far, I have been able to (except, actually, the home screen clock, if you want to keep the weather forecast with it - you can make it smaller, but you are stuck with the calendar-style digits - and the aforesaid mail-on-lock-screen thing). And you can customise without ruining the look and feel; a very impressive balance has been struck!

On the downside, I guess the UI guidelines are not as strictly enforced across apps as on the iPhone, some apps should be a bit more standard.

On to the fun stuff: it was a bit of effort to get a debug link to the phone. Setting the phone up was easy, but getting it to talk to the laptop, not so much. There are a lot of options for connecting, and it is unclear which to use for development work. HTC don't make the required drivers available separately either, so you have to install their sync software, which includes the drivers; then the sync software (which tries to auto-run) conflicts with ADB, so getting ADB to work reliably was tricky. Eventually I got it working, and managed to install a nightly version of Fennec (mobile Firefox), so the path is now clear (hopefully). I'm looking forward to getting a debug version of Fennec on the phone and working out how to debug Android FF, but that first requires building Android FF locally, which is no picnic (top tip: you can download the Android apk from the Try server instead). I also installed the NDK plugin for Eclipse, so all I need now is a little bit of time and I can hack up some Android apps - fun times!

So, they say smartphones live or die by their apps, and so far my experience of Android apps has been good. All the ones I've downloaded have been very polished, usable, and powerful - just as much as the equivalent iPhone apps, if not more so. Plus, they all tend to be fairly customisable, just like the UI. In fact, using the app on my phone is often a more attractive way of accessing a service than using the website on my laptop. If anyone has recommendations for apps I should be using, please leave a comment!

Of course, the real selling point of Android over the iPhone is the openness of the system vs. living in a walled garden. I can't imagine buying a phone I can't put my own programs on; it just seems ridiculous. But, so far, other than the nightly version of Firefox, all the apps I've installed have been through the Google store, so I guess the practical effects are minor.

And last, but not least, the web. Obviously the mobile web is a big thing right now, especially for Mozilla. I found the mobile web disappointing. I tried a whole bunch of browsers, and none of them were as well-adjusted for mobile as I would like; they all felt like they needed some tweaking. All had bugs too, unlike desktop, where you rarely notice bugs in any modern browser. More importantly, websites need to have mobile versions. Desktop versions of websites just don't work very well on mobile; mostly the problem is screen size, but slow internet connections and slow processors cause problems too. When websites do make mobile versions, they sometimes use redirects, which is annoying - you have to wait for another page load (even worse when they ask you first). Or the mobile site is designed for iPhones or devices with similar-sized screens, so it is actually too big on the One X. I'm sure it is possible to do this right, but very few websites manage it, and using apps is nearly always a better experience. I hope that this changes, but as it requires some intelligence and effort, I'm not hopeful.

Back to Christchurch

This weekend we're in Christchurch for a friend's wedding (which was good, and the first wedding I've been to where the priest started with an earthquake warning). This morning I went into town to check out the Re:START mall (which I never got round to seeing whilst I lived here). It's actually very cool, nicely designed and with some interesting shops and cafes. I actually think it is an improvement on the centre, pre-quake. The city centre does still feel fairly post-apocalyptic, though, and it looks like it will for quite a while - a shame really.

It's actually very nice to be back down south. The air is cooler and so much less sticky, the sky seems bigger, and I feel more relaxed here than in Auckland, maybe familiarity, or maybe something about all the European trees (plus it is autumn, my favourite time of year, Hagley park was very nice). On Monday I am going out to the mountains to climb at Castle Hill, I am very excited.

Even with 90% of the city rubble, I still prefer Christchurch to Auckland, and not just because of all the nice things around Christchurch. I think I dislike big cities. I definitely dislike the climate in Auckland, and it lacks the quirkiness that NZ does so well. They say Auckland is a Marmite city - either you love it or hate it - and I think I am starting to not love it. The only thing I disagree with the stereotype on is that I've actually found people in Auckland to be friendlier than elsewhere; perhaps not as polite, but it definitely feels easier to make friends. Perhaps because people have not lived there their entire life, so have more open social circles. Perhaps just because the Project wall is a good place to meet people. The only trouble is, because it is a big city, everyone lives so far away, so dropping in for a cup of tea is a bit tricky.

Thursday, April 26, 2012

Firefox presentations

My manager, Robert O'Callahan, just got back from a visit to Samsung in Korea, where he gave a bunch of talks on Firefox and browsers in general. The presentations make a good overview of what is interesting right now in the world of Mozilla, and include overviews of the Layers system and concurrency in Firefox, which I talked about in my last post. My PLT friends might be most interested in the talk on Rust.

The presentations are linked from Robert's blog post: http://robert.ocallahan.org/2012/04/korea.html

Sunday, April 22, 2012

Layers, shadow layers, and multi-process Firefox

The layers system is an optimisation layer between the layout module (how the DOM is translated into graphical objects) and the graphics module (how the graphical objects are actually rendered). It is within the layers system that I have been doing the bulk of my work in the past few months (adding mask layers - I'll save that for another post). Shadow layers extend the layers system to multiple processes or threads, and are the primary way for the layout/graphics parts of Firefox to utilise concurrency. It is shadow layers that this post is meant to be about.

Layers

Once a web page is laid out, it could just be rendered in one big bang, but then when some part of the page changes (which happens a lot), the whole page has to be re-rendered, and that is very bad. Layers organise a page into a coarse tree (yes, there are A LOT of different trees in a web browser). Container layers are the internal nodes in this tree, and the other kinds of layer objects are the leaves. Image layers are used for videos and some images, canvas layers for HTML canvases, colour layers for areas of plain colour, and Thebes layers for pretty much everything else (Thebes is a graphics abstraction layer in Firefox). Note that Thebes layers can contain quite a lot of content, not necessarily a single element, in fact, they can contain anything from a single element to the entire page. The layer tree is pretty much a scene graph (at the moment it is a tree, but it could become a DAG in the future), each layer can have a transform and opacity and so forth.

Layers are either active or inactive. Inactive basically means nothing interesting is happening and so when we repaint, we can just blit a cached copy of the content. Active layers are used for any moving content and also in some situations with transparency (videos are a complicated special case).

There are four layers backends: DirectX 10 (used on Windows 7 and Vista), DirectX 9 (Windows XP), OpenGL (Linux (sometimes), Mac, and mobile), and software (anywhere else, or if hardware acceleration is not working). So, for each kind of layer described above there are four classes, one for each backend (actually more, but wait for it...). Note, this is a perfect application for virtual classes, my favourite future language feature. Hardware acceleration is only used for active layers, inactive layers are rendered once using basic layers (the software backend), and then reused. Each backend also has a layer manager which organises things.

Rendering is fairly simple, the layer manager calls Render() on the root node of the layer tree, and rendering progresses down the tree (a depth first, post-order traversal). As the traversal unwinds, each container layer composites the results of rendering its children, until the root has the whole page rendered.

Note, basic layers might still get to use hardware acceleration at a lower level, if there is a hardware backend implemented lower down in the stack, which is often the case, e.g., the Direct2D backend for Cairo.

Concurrency in Firefox

There was an attempt to implement one process per tab in Firefox (Chrome does this and it is neat, because when you kill a tab, you kill its process and any memory leaks that may have cropped up) called Electrolysis, but it didn't work out so well. I don't know the ins and outs, apparently the story is "long and sad", I believe it is essentially due to the way extensions work. Anyway, this has (for now) been abandoned on desktop, but is implemented on mobile. Multiple processes are also used for things like plugins, and multiple threads for Windows stuff and a whole bunch of stuff I know nothing about.

As far as graphics/layout is concerned, the big area of concurrency is off-main-thread compositing (OMTC). Here we use one thread for rendering individual layers, and a separate thread for compositing the layers together. We can also use separate processes rather than threads (which happens with process-per-tab); the mechanisms are the same. OMTC is pretty much done on mobile (I think) and will be coming to desktop soon. You can test multiple-process Firefox in various arcane ways on Linux, but it is very magical - I don't understand exactly what is going on, and results vary. From now on, I will talk about threads, but pretty much everything can be applied to processes too; in fact, the shadow layers system was designed for multiple processes first.

The benefits of OMTC? Better responsiveness of the UI, mostly, and better use of hardware accelerated graphics. Also, security, because (if we are using processes) the child processes don't need privileged access to the OS. Plus some other stuff that I forget, sorry.

Shadow layers

The compositing thread is the parent thread (it also handles all the browser, as opposed to web page, stuff) and each tab can have a rendering thread, a child thread. The child thread has a layer tree (after all, this thread did all the layout work beforehand). There is a shadow of the layer tree on the parent thread. Rendering of the various leaf layers is done on the child thread, and composition of the layers is done on the parent thread.

So far, only the software backend can be used to render the leaf layers, but each backend can be used for compositing (except DX10, which is different, due to interactions with Direct2D). Since this is not really working on Windows, in practice we only use the OpenGL backend for compositing.

There are a whole bunch of classes used for shadow layers: there are shadowable versions of each class that can be present on the child thread (each of the Basic layers classes) and shadow versions of each class which can be used for compositing (Basic, OpenGL, and DX9 layers).

IPC is handled using IPDL, a custom Mozilla language for defining the interactions between processes (or threads). Communication is transaction based, during a transaction the layer tree is built (or, if we are lucky, reused, but that deserves a post of its own) and at the end of the transaction the layer tree is copied to the shadow layer tree. For leaf layers, only the rendered buffer needs to be copied, and for the container layers, only the information needed to composite its children is copied (roughly speaking, in both cases). Then everything can be composited together and pasted to the screen.

Sunday, April 15, 2012

New programming languages and barriers to entry

So, the last blog post got me thinking (musing, if you will) on mission creep for languages. If we look at the modern history of programming languages - that is from Java onward - we find that most popular languages have started off in a niche and crept into the mainstream. E.g., Java from applets, C# from GUI Windows programs, Javascript from client-side scripting, Perl from server-side scripting, Python and Ruby from small scripts, and so forth.

This is a source of many bad things! Languages are often designed well for their niche, but end up causing problems for other tasks. A prime example is dynamic typing, which is great for short scripts but makes large systems extremely difficult to comprehend.

This problem is made worse because there has been no successful language designed for mainstream programming since C++. Unfortunately this is a political and commercial problem, rather than a technical one. It is hard to promote a language to wide-spread use, and it is expensive to develop one, and no-one has really found a way to make money out of an actual language.

Another cause of this problem is the high barrier to entry for creating a new language. It is much easier for even large and rich organisations to use an imperfect off-the-shelf language than to make and use their own. This is partly a business problem also, but it might have a (mostly) technical solution too.

Why is it hard to make a new language? One reason is that, because a new language is a lot of work, it has to be really good; if languages were more disposable, then we would have to put less effort into making them perfect, i.e., dealing with edge cases. The next big reason is that a language is only as good as its tools and libraries, and non-library code that it ties into. This is kind of a network effect. It actually takes a smart person only a few months to write a commercial strength compiler for a small language. For a bigger language and a good, optimising compiler, you are still only looking at a small team for less than a year. But a language needs more than a compiler, it needs libraries for doing everything from list and string manipulation to network access to running a GUI to IO and on and on. It needs an IDE, it needs a debugger, a profiler. It must inter-operate with local and remote programs in a variety of languages. And all of that takes time and testing.

Some moves have been made on the tools front: we can leverage the likes of GCC and LLVM to make writing the compiler easier, numerous parser generators to make the front end easier, and extensible IDEs like Eclipse. But, we could do much better, it would be nice to be able to quickly and easily generate an IDE without having to develop a new skill set (see Eclipse), even if the result was not quite as fully featured. It would be nice to get a debugger up and running without too much fuss, we need tools for this kind of thing! We really need some kind of language-neutral libraries which can be used quickly and easily by a new programming language (and without having to run on the JVM). And we need a standard, platform-independent way to communicate between languages, and which is efficient, not just inter-process message passing.

I think there are huge technical and non-technical problems to doing this, but how cool would it be if we could quickly and easily use languages without having to worry about mission creep or dedicating a huge team to it and then getting slammed on Slashdot because it doesn't do what everyone wants.

Tuesday, April 10, 2012

A musing on language design - transparency v. readability

People who spend a lot of time thinking about programming languages (and, I think, even just programming) inevitably end up thinking about language design. There is some truism/quote floating around (which I couldn't be bothered to find) about every language theorist having a pet language design secretly hidden in their deepest thoughts. Sometimes that pet language emerges, for better or worse. I admit to having such a pet, but also realise my imagination of a language is nowhere near a real language and the amount of work required to make it real is immense. Also I think real language design is difficult and not very exciting -- too many 'little' decisions about whether an int should convert into a float, or how to handle data transfer between big-endian and little-endian systems, which make me want to bleed from my eyes.

ANYWAY, my musing was on transparency v. readability, which is a much more interesting design decision for a language, at the philosophical/fundamental level. At some point a language designer must aim for readability or 'write-ability'. Generally speaking, 'serious' programming languages (e.g., Java, C#) go for the former, scripting languages go for the latter. Of course everyone wants both, but there are some fundamental trade-offs and you have to privilege one or the other.

One place that this decision manifests is in the 'transparency' (I'm sure there's better word for this) of entities in your language. At one end of the spectrum is Java, where everything must be spelt out, arrays are very different from array-like objects (e.g., ArrayLists), primitives (e.g., int) are very different from primitive-like objects (e.g., BigDecimal), there is no operator overloading, but there are no surprises. C# makes things a little easier with delegates and getters and setters, but basically subscribes to the Java philosophy.

C++ is an interesting case, transparency is much more important here, operator overloading is very flexible and allows user-defined objects to be used (almost) exactly like built-in types, such as ints, arrays, and pointers. Unfortunately, there are a lot of rough edges and it is rare that the programmer can actually use objects like built-in types without having to think about it. Also, being allowed to override assignment etc. can lead to some really evil bugs.

There is a tradition, starting with Smalltalk, and present in most OO-languages to some extent, of "everything is an object". In practice this means there are no built-in types and instead one has object versions of things like arrays, coupled with some form of trickiness to make them easier to use, ideally as easy to use as built-in types should be. Such trickiness is often used to replace things like loops and other control structures with objects too. In theory it means the language can be transparently extended with new control structures. In practice, the trickiness is often difficult to understand, means there are multiple ways to get the same result, and makes the language more complex. Java uses a similar idea to support for-each loops.

Scripting languages often take a similar approach, see Python and Javascript. Here many things (e.g., arrays, dictionaries) are ordinary objects, but not quite - the trickiness rears its head so that arrays can be used like a programmer might expect, and you end up with layers of hidden methods and meta-methods, and so forth, which make the language easy to use for casual programmers and allow lots of tricksy programming for those willing to get their hands dirty. This might be a good approach, but this kind of transparency is hardly elegant (well, I guess it kind of is, or neat at least), and the languages are far from simple, or easy to understand in an holistic way.

These scripting languages and those in the Smalltalk tradition (Self, Grace) kind of fill the other end of the spectrum. However, as you can see, it is not really a linear scale from readability to transparency, more different approaches to the same problem, on some multi-dimensional axes.

Is there an optimal solution? Probably not, after all, different languages are used for different purposes. As scripting languages are more and more used for large programs and not just small scripts, I think that a focus on transparency might turn out to have been a poor decision. But there are many other decisions in scripting language design which will turn out to be poor in this situation too.

Perhaps a good question to ask is: is there a problem? Do we really need elegant languages? Could we have a language which was not elegant - i.e., many special cases, lots of different classes of entities, little unification, perhaps lacking certain kinds of transparency - but that was pleasant to write programs in and to read? (Is this, in fact, Java or C++?) Possibly not; having universal concepts makes a language fundamentally easier to understand, but it is also possible that we have gone too far (I'm not definitely saying we have). Are there corpus studies that have investigated how often programmers actually need to mimic built-in types? Is there some compromise level of operator overloading that makes things easier than any of the current positions?

Sunday, April 01, 2012

Memory management, reference counting, and ownership in the Mozilla codebase

Memory management is always an issue in large C++ programs. In small programs with a single programmer, manual memory management is fine (and fast). But in large programs, it is doomed to failure. Garbage collection is the usual solution, and, usually, a good one. But it has disadvantages, namely pauses and slow down, which is not acceptable in system software, such as a web browser (frankly, I would have thought you could do this cleverly and sneak garbage collection in when the browser is quiescent, which happens quite often. But, as browsers are used more and more as software platforms, this becomes less and less practical).

In Mozilla, we use reference counting for memory management. This is a compromise between safety, complexity, and speed. It is fast and there are no pauses, not as fast as manual memory management, but close enough. It is safer than manual memory management, but leaks can still be a problem. And it is much less complex than manual memory management, but an order of magnitude more complex than garbage collected 'fire and forget'.

C++ operator overloading allows for semi-transparent use of reference counting, but logically, you have to keep in mind that objects are reference counted. Transparency doesn't really work. Key to the way reference counting in Firefox works is the idea of object ownership. Object ownership is a subject close to my heart because lots of my research has been around ownership types, which statically enforce the ownership hierarchy which is implicit in the Mozilla system.

OK, now for some details, we'll get back to speculation later...

A reference-counted pointer uses the nsRefPtr class, so for a pointer to an object of class C, you would use nsRefPtr<C>. nsRefPtrs are quite clever: they do the bulk of the ref counting goodness; when you copy one, the reference count goes up, and when one goes out of scope, the ref count goes down. nsRefPtrs are used to represent owning references; for example, a Layer object owns its mask layer, so it has a field of type nsRefPtr<MaskLayer> (well, there is no MaskLayer class, but you get the idea). Raw pointers are used for non-owning references; for example, if the mask layer keeps a reference to its owner, it would have a field of type Layer*. This is important because it means you can have reference cycles, but indicate which are counted (owning) references and which aren't, so the aggregate will still get destroyed when it ought to.

We also need ownership transfer, and this is done using already_AddRefed. These are temporary references where, conceptually, the reference count is not incremented, but the referred-to object is kept alive. Or, equivalently, a reference where the reference count is not increased by copying. Note that when we copy the reference into an nsRefPtr, the count is increased. So if we have an nsRefPtr field and it is the only reference to an object, then the ref count is one. A getter which returns an already_AddRefed reference to the field does not increase this count, but when we store that reference in another nsRefPtr, the count is bumped to two. However, if we store the reference in a raw pointer, the reference count stays at one. Ownership transfer happens by calling the forget() method on an nsRefPtr; this decrements the ref count and returns an already_AddRefed to the object. Note that if the ref count drops to 0, the object will not be destroyed until the already_AddRefed also goes out of scope (which is why you should often use nsRefPtrs locally, even if it is just a temporary reference).

There are also getter_AddRefs references for use as 'out' parameters, and dont_AddRef references, which I have no idea about, but I won't go into them here.

So, in summary, we use a fairly sophisticated reference counting scheme which has at its core the concept of object ownership. For this kind of programming, could ownership types help? Well first, you'd actually have to motivate people to get over the extra syntactic overhead of ownership types, and for that you would probably need more benefit than improving reference counting. On the other hand, you only need a pretty lightweight system to help with reference counting. If you could statically enforce an ownership hierarchy, then, together with the system described above, you could be sure to avoid reference cycles and thus many memory leaks. This would be great. I would also hope that you could use the types to simplify the reference counting system, possibly making it more transparent, certainly reducing the number of reference types. But the challenge is that the implicit ownership described above is very unlike most ownership types systems, it is dynamic, supports multiple ownership and lightweight ownership transfer. So this is a non-trivial problem.

So, here is a challenge to language research people - can you come up with an ownership type system that simplifies this style of coding and reduces memory leaks without too much syntactic overhead? This is a great project because there is a huge corpus of real-world code that is constantly expanding, is already cross-referenced (MXR, DXR), and is all open source, and lots of friendly people have experience applying all kinds of tools to it (unfortunately it will take some Google-fu to track down all the various blog posts about various things, but hey). The Rust language has some concepts for improving pointers, but I hope that we could do better with ownership types. There is real motivation within Mozilla to start using Rust to rework much of the project, so if you are quick, there is opportunity for any research to see real-world use. Anyone keen?

Footnote: if you are interested in finding out more about nsRefPtr, look at the nsCOMPtr docs - they are almost identical, but nsCOMPtrs are used for XPCOM objects. The documentation doesn't seem to have been updated to reflect the widespread use of nsRefPtr.