Sunday, June 24, 2012

Some mask layers optimisations

I've been working on a couple of small optimisations for mask layers; they are needed for Boot to Gecko, but will help on all platforms.

Drawing the mask layer (bug 757346)

The common case for mask layers is a single rounded rect clip, and we would like this to be fast (obviously). The general case is some number of rounded rect clips; here the mask is the intersection of those rounded rects. Before mask layers, clipping was done by drawing each rounded rect and then using it as a clip; doing this multiple times gives the intersection we want. So, when we did the mask layers work, we kept this method of drawing the mask: we used the rounded rects as clips and painted alpha = 1 into the whole mask image, giving the desired mask.
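
In sketch form, the old approach looked something like this (a hypothetical context API along the lines of gfxContext; Context, RoundedRect and Color are made-up names):

    // Old way: every rounded rect becomes a clip, then alpha = 1 is
    // painted into the whole mask surface, giving the intersection.
    void DrawMaskOldWay(Context* aCtx, const nsTArray<RoundedRect>& aClips)
    {
      for (uint32_t i = 0; i < aClips.Length(); ++i) {
        aCtx->NewPath();
        aCtx->RoundedRectangle(aClips[i].rect, aClips[i].corners);
        aCtx->Clip();                     // intersect with the current clip
      }
      aCtx->SetColor(Color(0, 0, 0, 1));
      aCtx->Paint();                      // fill alpha = 1 through the clips
    }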

Unfortunately, clipping to an arbitrary path like this is often slow (apparently). So, for the common case of one rounded rect, we just draw the rounded rect straight onto the mask surface (kind of obvious, really). For the general case, we use the first n-1 rounded rects as clips and draw the last one, again giving the desired effect; this is ever so slightly faster too, but the real optimisation is for the common case.
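
With the same hypothetical API, the new version is:

    // New way: clip to the first n-1 rects and draw the last one, so the
    // common single-rect case is a plain fill with no clipping at all.
    void DrawMaskNewWay(Context* aCtx, const nsTArray<RoundedRect>& aClips)
    {
      for (uint32_t i = 0; i + 1 < aClips.Length(); ++i) {
        aCtx->NewPath();
        aCtx->RoundedRectangle(aClips[i].rect, aClips[i].corners);
        aCtx->Clip();
      }
      const RoundedRect& last = aClips[aClips.Length() - 1];
      aCtx->NewPath();
      aCtx->RoundedRectangle(last.rect, last.corners);
      aCtx->SetColor(Color(0, 0, 0, 1));
      aCtx->Fill();
    }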

Sharing mask images (bug 757347, work in progress)

If we have multiple, similar masks on one page, then currently they get a mask image each. This is not efficient in terms of memory or speed (because the mask has to be created multiple times, which is slow). So we would like multiple mask layers to share a single mask image. This turns out to be more complicated than it sounds.

The basic idea is that we have a global hashtable, and every image we create for a mask is put into it, keyed on the rounded rects used to create it. We keep a count of how many mask layers are using the image, and when that falls to zero, we delete the image. When we need to create a new mask layer, we first check whether we can reuse an image from the cache.
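
In outline, the lookup goes something like this (a sketch; the real classes are described below, and these names are illustrative):

    // Get a mask image for a set of rounded rect clips, reusing a
    // cached one if some other mask layer already built it.
    ImageContainer* GetMaskImage(const nsTArray<RoundedRect>& aClips)
    {
      if (Entry* e = gMaskImageCache.Lookup(aClips)) {
        e->mLayerCount++;                    // one more user of this image
        return e->mContainer;
      }
      ImageContainer* container = CreateMaskImage(aClips);  // the slow path
      gMaskImageCache.Put(aClips, container);          // count starts at 1
      return container;
    }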

This sounds fairly simple, but to do it we had to totally re-organise the way mask layers are built. Fortunately, the changes are limited to the creation of masks and their transforms; the backends did not need to change. Rather than store only the part of a mask which overlaps with the masked layer's visible region, we store the whole mask and nothing but the mask. This means that a layer with the same mask in different places can use one image, and different layers can share a single mask even if it sits at a different place within each layer.

The new way to create a mask image is to calculate a bounding rect for the clipping rounded rects, in the masked layer's coordinate space. The mask image will be the same size as the bounding rect (modulo a maximum texture size), so the transform for the mask layer simply translates the mask to the right place (formally, it is the transform which takes us from mask space to the masked layer's space, and it might involve a scale). Any other layer which reuses the mask just needs a different transform.
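
A sketch of that calculation (the gfxRect/gfxMatrix usage here is approximate, and the helper is mine):

    // Compute the mask image size and the mask layer's transform.
    gfxMatrix ComputeMaskTransform(const nsTArray<RoundedRect>& aClips,
                                   int aMaxTextureSize,
                                   gfxIntSize* aSurfaceSize)
    {
      // Bounding rect of the rounded rects, in the masked layer's space.
      gfxRect bounds = aClips[0].rect;
      for (uint32_t i = 1; i < aClips.Length(); ++i) {
        bounds = bounds.Union(aClips[i].rect);
      }

      // The image is the size of the bounds, capped at the maximum
      // texture size; if it is capped, the transform must also scale.
      aSurfaceSize->width = std::min(aMaxTextureSize, int(bounds.width));
      aSurfaceSize->height = std::min(aMaxTextureSize, int(bounds.height));

      gfxMatrix transform;
      transform.Scale(bounds.width / aSurfaceSize->width,
                      bounds.height / aSurfaceSize->height);
      transform.Translate(gfxPoint(bounds.x, bounds.y));
      return transform;  // takes mask space to the masked layer's space
    }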

To manage the cache of mask images we create a new class, MaskLayerImageCache. This holds the hashtable which stores the images, and has methods for using the cache. We also need classes for an entry in the hashtable (MaskLayerImageEntry), a key for looking up images (MaskLayerImageKey), and another representation of the rounded rects used to create the mask (PixelRoundedRect). An entry object keeps a reference to the image container (we actually cache the container, not the image itself) and to its key. When we get a hit in the hashtable we return both the key and the image container; the entry itself is private to MaskLayerImageCache. The created mask layer (which is an image layer) keeps a reference to the image container, and we keep a reference to the key in the mask layer's user data.
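
The shape of all that is roughly the following (simplified, and the member names are approximate):

    class MaskLayerImageCache
    {
    public:
      // On a hit, returns the container and hands back the key too.
      ImageContainer* FindImageFor(const MaskLayerImageKey** aKey);
      void PutImage(const MaskLayerImageKey* aKey,
                    ImageContainer* aContainer);
      void Sweep();  // drop entries which no mask layer uses any more

    protected:
      class MaskLayerImageEntry : public PLDHashEntryHdr
      {
      public:
        const MaskLayerImageKey* mKey;
        nsRefPtr<ImageContainer> mContainer;  // owning ref, keeps it alive
      };

      nsTHashtable<MaskLayerImageEntry> mMaskImageContainers;
    };

    class MaskLayerImageKey
    {
    public:
      nsTArray<PixelRoundedRect> mRoundedClipRects;
      mutable uint32_t mLayerCount;  // how many mask layers use the image
    };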

(Now it gets interesting.) Image containers are reference counted, and we don't want the container to die while it is still referenced by a mask layer, but nor do we want to keep the image around forever. So the entry has an owning reference to the image container which keeps it alive (as do the mask layers, although they don't actually need to). We then maintain our own count, stored in the image's key, of how many mask layers reference the image; it is maintained primarily by the mask layer's user data (which uses an nsRefPtr). We must separate the key from the entry because the hashtable can move entries around in memory, and the mask layer (well, its user data) needs a permanent link to something that can hold the reference count.
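
So the count lives in the key, and the user data's nsRefPtr does the book-keeping; a sketch:

    // The key implements the nsRefPtr protocol, but AddRef/Release only
    // bump the layer count; nothing is deleted here - the cache's sweep
    // (below) is what actually destroys unused images.
    void MaskLayerImageKey::AddRef() const { ++mLayerCount; }
    void MaskLayerImageKey::Release() const { --mLayerCount; }

    class MaskLayerUserData : public LayerUserData
    {
    public:
      // Counts this mask layer as a user of the key's image.
      nsRefPtr<const MaskLayerImageKey> mKey;
    };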

Once the count of mask layers in the key falls to zero, we can remove the linked image from the cache, destroying the last reference and causing it to be released. To do this, we sweep the cache once the layer tree has been constructed, checking for zero reference counts. Because we sweep after building the new tree but before the old one is destroyed, an image can go unreferenced between layer tree builds without being destroyed; this means we can sometimes avoid building a new mask image if it was used in the previous cycle.
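
The sweep itself is then just a walk over the hashtable (a sketch, using nsTHashtable-style iteration):

    // Called once per layer tree construction: any entry whose key shows
    // no remaining users is removed, which releases the entry's owning
    // reference and lets the image container die.
    void MaskLayerImageCache::Sweep()
    {
      for (auto iter = mMaskImageContainers.Iter(); !iter.Done();
           iter.Next()) {
        if (iter.Get()->mKey->mLayerCount == 0) {
          iter.Remove();
        }
      }
    }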

This is all very nice (no, really, it saves us quite a lot of work in some cases), but it is not perfect. The main problem is with OMTC, where the image might be copied across to the parent process multiple times (once for each reference). This is a weakness of our multi-thread system, and needs a separate fix; it ought to be done soon-ish (for other reasons too).

The last problem with this work is that mask images can be shared between backends, and in our earlier work the different backends required different image formats (A8 vs ARGB32). This has been surprisingly difficult to fix, and I'm on about my third attempt; as always, Android is being particularly problematic. Hopefully I have it sorted now, fingers crossed...

Tuesday, June 12, 2012

Mask Layers on the Direct3D backends

In the next few posts I'll describe how mask layers are used in the various graphic backends. The Direct3D 9 and 10 backends are pretty similar, so I'll describe them together. The Direct3D 10 backend is the cleanest, so that's where I will start.

A mask layer is loaded from the surface created in SetupMaskLayer into a DX10 texture in LayerD3D10::LoadMaskTexture. This should be fast because we use a DX10 texture as the backend for that surface. In fact, LoadMaskTexture does more than that: it also sets everything up for the shader to use the mask, and returns a flag indicating the kind of shader to use (with a mask or without).

Each DX10 layer class has a GetAsTexture method which returns a texture snapshot of that layer. Currently, that is only implemented for image layers, where it returns the backing texture. This method is called by LoadMaskTexture to get the texture for the mask layer. GetAsTexture returns a shader resource view (SRV) which wraps the texture memory, so LoadMaskTexture can simply set this SRV as a resource for the effect (a DX10 abstraction for a group of shaders and state); the resource is called tMask (t for texture).

GetAsTexture also returns the size of the texture. LoadMaskTexture uses this, together with the mask layer's effective transform, to calculate the texture's bounding rectangle in the coordinate space which will be used for rendering; this is passed to the effect as vMaskQuad (v for vector: it is just a vector of length four, containing the position and dimensions of the bounding quad).
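
Putting those two paragraphs together, LoadMaskTexture is roughly the following (tMask and vMaskQuad are the real effect variable names; ToLayerD3D10, Get2DTransform and the other details are my approximation):

    ShaderFlags LayerD3D10::LoadMaskTexture()
    {
      Layer* maskLayer = GetLayer()->GetMaskLayer();
      if (!maskLayer) {
        return SHADER_NO_MASK;
      }

      gfxIntSize size;
      nsRefPtr<ID3D10ShaderResourceView> maskSRV =
        ToLayerD3D10(maskLayer)->GetAsTexture(&size);
      if (!maskSRV) {
        return SHADER_NO_MASK;
      }

      // The mask's bounding quad in the rendering coordinate space,
      // from the mask layer's effective transform (assumed 2D here).
      gfxRect bounds(0, 0, size.width, size.height);
      bounds = Get2DTransform(maskLayer).TransformBounds(bounds);

      mEffect->GetVariableByName("tMask")->AsShaderResource()
             ->SetResource(maskSRV);
      float quad[4] = { float(bounds.x), float(bounds.y),
                        float(bounds.width), float(bounds.height) };
      mEffect->GetVariableByName("vMaskQuad")->AsVector()
             ->SetFloatVector(quad);
      return SHADER_MASK;
    }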

When a layer is rendered (RenderLayer), a system of shader flags is used (including the flag returned from LoadMaskTexture) to work out which shader to use. LoadMaskTexture is called as part of this calculation, so by the time the shader flags are ready, the mask and its quad will have been passed to the GPU if they are needed. SelectShader is used to pick the actual shaders (a technique, technically, which is a group of passes, each a combination of vertex and fragment shaders), and the other arguments are passed to the effect. The technique is called and the layer is rendered.

There is one kind of vertex shader and a bunch of different kinds of fragment shaders (for different colour schemes, etc.). When we added mask layers, we introduced vertex shaders for layers with masks and for layers with masks and a 3D transform. We doubled the number of fragment shaders, adding a version of each for layers with masks (plus a couple of extras for 3D). For masks, the vertex shader calculates the location on the mask at which to sample (in the texture's coordinate space), and the fragment shader pretty much just multiplies the alpha value sampled from the mask with the result of rendering the rest of the layer at that point.
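
In C++ terms, the masked fragment shaders boil down to one extra multiply (the real code is HLSL; Color, Texture and SampleAlpha here are my stand-ins):

    // Per pixel: sample the mask's alpha at the interpolated coordinate
    // and scale the layer's (premultiplied) colour by it.
    Color MaskPixel(const Color& aLayerPixel, const Texture& aMask,
                    float aMaskU, float aMaskV)
    {
      float maskAlpha = aMask.SampleAlpha(aMaskU, aMaskV);
      return aLayerPixel * maskAlpha;
    }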

The vertex shader operates on each corner of the rendered quad (which is a rectangular window onto a layer). It figures out the location of the corner it is working on in the current coordinate space (which is that of the containing layer) and uses that to calculate a position in the coordinate space of the texture being rendered (an image of the layer). For mask layers, we do likewise to calculate a position on the mask's texture; the graphics hardware interpolates these positions to give texture coordinates for each pixel.
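
In effect, for each corner (C++ again standing in for HLSL, with vMaskQuad from before; the names are mine):

    // Map a vertex position in the rendering coordinate space to a
    // texture coordinate on the mask, using the mask's bounding quad.
    // aMaskQuad holds (x, y, width, height), as in vMaskQuad.
    Point2D MaskCoord(const Point2D& aLayerPos, const Rect& aMaskQuad)
    {
      return Point2D((aLayerPos.x - aMaskQuad.x) / aMaskQuad.width,
                     (aLayerPos.y - aMaskQuad.y) / aMaskQuad.height);
    }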

For layers with a 3D transform (with perspective), this interpolation could lead to distortion: the graphics hardware does perspective-correct interpolation, but our mask layer was rendered to the texture after being transformed, so the interpolation should be linear. We therefore have to do a little dance to 'undo' the perspective-correct interpolation in any vertex and fragment shaders which could be used on layers with 3D transforms.
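
The dance is the usual one (this part is from memory, so treat it as a sketch): pre-multiply the mask coordinate by w in the vertex shader and divide it back out in the fragment shader, so the hardware's perspective division cancels and the interpolation comes out linear:

    // Vertex shader side: carry (u*w, v*w, w) instead of (u, v), so the
    // hardware's perspective-correct interpolation ends up interpolating
    // u and v linearly in screen space.
    Point3D PremultiplyByW(const Point2D& aMaskCoord, float aW)
    {
      return Point3D(aMaskCoord.x * aW, aMaskCoord.y * aW, aW);
    }

    // Fragment shader side: divide the interpolated values back out.
    Point2D RecoverMaskCoord(const Point3D& aInterpolated)
    {
      return Point2D(aInterpolated.x / aInterpolated.z,
                     aInterpolated.y / aInterpolated.z);
    }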

The DirectX 9 backend works pretty similarly to the DirectX 10 one: the shaders and textures work differently, but the concepts are the same. The management of the mask layers changes to reflect the way that LayerManager is written, but again the changes are minor. On the DirectX 9 backend we support shadow layers for OMTC, but I'll discuss that when I talk about the OpenGL backend, because that is where more of the effort for OMTC went.

Thursday, June 07, 2012

More on my new phone

So I am still pretty much in geek-love with my phone (HTC One X). But there has been something about it that has been annoying me, and I have only just worked out what it is. I'm not sure quite how to phrase this, but using the phone for passive computing tasks is amazing, better (mostly) than using a proper computer. Reading stuff on the internet, listening to music, watching videos, playing games, and so forth: all an awesome experience. But anything that involves active interaction is frustrating to the point of being impossible - writing an email, doing some coding, in fact getting any work done at all. This is kind of obvious given the input methods.

To put it another way, the device is perfect for consumption, but terrible for creation. This bothers me because, for me, using a computer has always been a creative experience; the whole point is to make things. The phone is much more like a TV than a computer in terms of interaction, and that makes me sad.

It is an amazingly powerful device, and it seems to me that I ought to be able to be productive with it, but the only useful thing I've done with it is test Firefox. I suppose in a way that's good: it means I don't need to get too attached to another expensive gadget.

Tuesday, June 05, 2012

An ill-thought out theory about what makes programming interesting

This one is going to be a bit random. Bear with me. I've been thinking a bit about what makes some programming tasks interesting and some not so interesting (to me). As a result of this thinking, I present my theory of different kinds of programming (TM). I'm sure someone has come up with this before, but I've not come across it, I think.

So, let us define normal programs as 1st order programming: they operate on data and produce an output. Most programming tasks are of this nature. Let us call data itself 0th order programming; that is, there is no real programming at all. 2nd order programs are those that operate on 1st order programs (e.g., compilers, modern web browsers); 3rd order programs operate on 2nd order ones (e.g., compiler-compilers, parser generators, tools for the analysis of compilers or compiler theories, such as Coq); and so forth, although I can't think of any 4th order programs, and as with most nth-order type theories, the higher orders blur together and probably stop being interesting.

Furthermore, we can have half orders, where the programming task is meant to be extensively reused. So standard libraries and frameworks are 1.5th order, and compiler libraries and so forth (e.g., llvm) are 2.5th order. Possibly there are 0.5th order programs too, but I think of these not so much as libraries but as hybrid data/programs, such as a TeX document.

I think that from a programming perspective, the problems get more interesting as we go up the scale (for some definition of interesting, which might just be complicated). So, for a programmer, making a web page (0th order) is boring as hell, making web-based software is much more interesting, and making a compiler for that software is more interesting still. In fact, I don't think it is just about complicated, I think that as we go up the scale the problems are qualitatively different (and, to me at least, more interesting).

BUT, there are advantages to lower-order programming, and disadvantages to higher-order programming: working with data, you can be creative and solve hard problems about all kinds of non-programming things. Likewise, 1st order programs tend to let you work closer to the 'people' problems, which, apparently, are the really interesting ones. The lower order a program is (in general), the easier it is to test and to specify the problem (the lack of a clear specification being one of the most annoying things, for me, about programming). As we go up the orders, the tasks can actually get repetitive, the risk of failure is higher, and the problems are more abstract and have a less direct impact on the real world. And at some point the complexity stops being fun - I have dabbled in a few 3rd order tasks, and progress was very slow, the output very small, and a lot of the problems encountered were tedious edge cases.

So thinking about myself, 2nd order programming seems to be my Goldilocks kind of programming, not too abstract and complex, but operating on code, not data. All the projects I have really enjoyed fall into this category - compilers, interpreters, Firefox.

I wonder where programming language theory falls in my scheme. I think of it as just programming really, but with maths rather than a compiler. It doesn't feel like any of my categories though (but has elements of all of them); maybe I need a different scheme.