Featherweight Musings: Some mask layers optimisations

I've been working on a couple of small optimisations for mask layers, required for Boot to Gecko, but which will help on all platforms.

Drawing the mask layer (bug 757346)

The common case for mask layers is a single rounded rect clip. We would like this to be fast (obviously). The general case is some number of rounded rect clips, in this case the mask is the intersection of these rounded rects. Before mask layers, clipping was done by just drawing each rounded rect, then using that as a clip. Doing this multiple times gives the intersection we want. So, when we did the mask layers work, we kept this method of drawing the mask, we used the rounded rects as clips, and painted alpha = 1 into the whole mask image, giving the desired mask.

Unfortunately, clipping to an arbitrary path like this is often slow (apparently). So, for the common case of one rounded rect, we just draw the rounded rect onto the mask surface (kind of obvious, really). For the general case, we use the first n-1 rounded rects as clips and we draw the last one, again giving the desired effect, and ever so slightly faster, but the real optimisation is for the common case.

Sharing mask images (bug 757347, work in progress)

If we have multiple, similar masks on one page, then currently they get a mask image each. This is not efficient in terms of memory or speed (because the mask has to be created multiple times, which is slow). So we would like multiple mask layers to share a single mask image. This turns out to be more complicated than it sounds.

The basic idea is that we have a global hashtable, and every image we create for a mask is put into it, using the rounded rects used to create it as a key. We keep a count of how many mask layers are using the image, and when that falls to zero, we delete the image. When we need to create a new mask layer, we first check if we can reuse an image from the cache.

This sounds fairly simple, but we need to totally re-organise the way mask layers are built to do this. Fortunately, the changes are limited to the creation of masks and their transforms, the backends did not need to change. Rather than store the part of a mask which overlaps with the masked layer's visible region, we store the whole mask and nothing but the mask. This means that a layer with the same mask in different places can use one image, or different layers can share a single mask if it is at different places within each layer.

The new way to create a mask image is to calculate a bounding rect for the clipping rounded rects, this is in the masked layer's coordinate space. The mask image will be the same size as the bounding rect (modulo a maximum texture size), so the transform for the mask layer simply translates the mask to the right place (formally, it is the transform which takes us from mask space to the masked layer's space, and might involve a scale). For any other layer which reuses the mask, we just need a different transform.

To manage the cache of mask images we create a new class: MaskLayerImageCache, this holds the hashtable which stores the images, and has methods for using the cache. We also need classes for an entry in the hashtable (MaskLayerImageEntry), a key for looking up images (MaskLayerImageKey), and another representation of the rounded rects used to create the mask (PixelRoundedRect). An entry object keeps a reference to the image container (we actually cache the container, not the image itself), and to its key. When we get a hit in the hashtable, we return both the key and the image container, the entry is private to MaskLayerImageCache. The created mask layer (which is an image layer) keeps a reference to the image container. We keep a reference to the key in the mask layer's user data.

(Now it gets interesting), image containers are reference counted and we don't want the container to die when it is still referenced by a mask layer, but nor do we want to keep the image around for ever. So the entry has an owning ref to the image container which keeps it alive (as do the mask layers, but they don't actually need to), we then maintain our own count of how many mask layers reference the image in that image's key, which is primarily maintained by the mask layer's user data (which uses an nsRefPtr). We must separate the key from the entry because the hashtable can move around the entries in memory, and the mask layer (well, its user data) needs a permanent link to something to keep a reference count.

Once the count of mask layers in the key gets to zero we can remove the linked image from the cache, destroying the last reference and causing it to be released. To do this we sweep the cache once the layer tree has been constructed checking for zero reference counts. We do this after building but before the tree is destroyed, so that an image can be unreferenced between layer tree builds and not be destroyed, this means, we can sometimes save building a new mask image if it was used in the previous cycle.

This is all very nice (no, really, it saves us quite a lot of work in some cases), but it is not perfect. The main problem is with OMTC, where the image might be copied across to the parent process multiple times (once for each reference). This is a weakness of our multi-thread system, and needs a separate fix; it ought to be done soon-ish (for other reasons too).

The last problem with this work is that mask images can be shared between backends, and in our earlier work the different backends required different image formats (A8 vs ARGB32). This has been surprisingly difficult to fix, and I'm on about my third attempt, as always, Android is being particularly problematic. Hopefully, I have it sorted now, fingers crossed...

Featherweight Musings

Sunday, June 24, 2012

Some mask layers optimisations

No comments: