Featherweight Musings

Saturday, April 13, 2013

The layers refactoring has landed!

I'm happy to report that the layers refactoring has landed on mozilla-central and is now in Nightly. We have already fixed a bunch of bugs (WebGL on b2g, plugins on Android, b2g tests, fixed position layers, ...) and are working on more. Nothing seems insurmountable, though, and it looks like the refactoring will stay landed.

We've tried to document the new system in the classes. The best place to start is gfx/layers/Compositor.h. To give an overview of the changes, there is now only one kind of layer on the compositor thread (composite layers) and one layer manager. These use a compositor interface to actually do the compositing, and there are (or will be) multiple compositor backends (see gfx/layers/opengl/CompositorOGL.h, for example). To implement a new OMTC backend, you should only have to implement a Compositor and one or more TextureHosts, and possibly also a TextureClient. There are other changes to how basic layers on the content thread interact with layers on the compositor thread. If it is not clear from the docs what is going on, please let me (nrc on irc) or nical know and we'll try to improve the docs!
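
To make the shape of that design a little more concrete, here is a rough model in Python (the real code is C++ in gfx/layers). Only the names Compositor, TextureHost, and CompositorOGL come from the tree; every method name and signature below is invented for illustration and should not be read as the actual interface.

```python
from abc import ABC, abstractmethod


class TextureHost(ABC):
    """Compositor-side handle on some pixel data (hypothetical interface)."""

    @abstractmethod
    def describe(self) -> str:
        ...


class Compositor(ABC):
    """Backend interface that the compositor-side layer code talks to."""

    @abstractmethod
    def draw_quad(self, texture: TextureHost, rect: tuple) -> str:
        ...


class FakeGLTextureHost(TextureHost):
    def __init__(self, texture_id: int):
        self.texture_id = texture_id

    def describe(self) -> str:
        return f"GL texture {self.texture_id}"


class FakeGLCompositor(Compositor):
    """Stand-in for a backend such as CompositorOGL; it only records calls."""

    def __init__(self):
        self.log = []

    def draw_quad(self, texture, rect):
        entry = f"draw {texture.describe()} at {rect}"
        self.log.append(entry)
        return entry


def composite(compositor: Compositor, layers):
    """Backend-agnostic compositing: this code only sees the Compositor interface."""
    return [compositor.draw_quad(texture, rect) for texture, rect in layers]


frame = composite(FakeGLCompositor(), [(FakeGLTextureHost(1), (0, 0, 256, 256))])
print(frame)
```

The point of the sketch is that `composite` never mentions OpenGL, so a new backend would only need to supply its own Compositor and TextureHost subclasses.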

We would love some help getting this tested! The easiest way is to simply use Firefox Nightly for Android. If it crashes we should see it in the crash reports. If you notice anything rendering incorrectly, please either file a bug or let us know via irc (or even leave a comment here). You don't need to set any prefs etc. for Android; you will automatically get the refactored version.

We are most worried about FirefoxOS/b2g because that is where we use the most esoteric code paths and have the least automated testing. If you are working on b2g and are able to help us by running b2g built with m-c rather than one of the b2g branches, that would be great. Again, no prefs necessary, just using m-c is enough.

If you are on Linux or Mac and are feeling brave, then you can help us by running Firefox with OMTC on these platforms. Please bear in mind that this setup is unsupported for now and is known to be buggy and missing some features (plugins, for example). The most useful thing would be to compare Aurora (no refactoring) to Nightly (with refactoring) and let us know if anything has got worse. For both platforms you must set the pref "layers.offmainthreadcomposition.enabled" to true (in about:config). For Linux, you must be able to run normally with OpenGL with no issues; to do this you need to set "layers.acceleration.force-enabled" to true. If you have not tried this before, I would try it on its own before trying OMTC, as there is lots of driver sadness around. For Linux you must also set the environment variable "MOZ_USE_OMTC=1". Note that when using OMTC, about:support will report the layers backend for the content thread only, i.e., it will not really be true: it will appear that you do not have HWA when you do.
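
If you would rather not click through about:config, the same prefs can go in a user.js file in your profile. The following is a small sketch that prints the lines to add and the extra environment needed on Linux. The pref and variable names are the ones above; the helper functions and the "linux"/"mac" platform strings are just made up for this example.

```python
import json

# Pref needed on both Linux and Mac, as named in the post.
OMTC_PREFS = {"layers.offmainthreadcomposition.enabled": True}
# Linux additionally needs OpenGL layers forced on.
LINUX_EXTRA_PREFS = {"layers.acceleration.force-enabled": True}


def user_js_lines(platform: str) -> list:
    """Return user.js lines for the given platform ("linux" or "mac")."""
    prefs = dict(OMTC_PREFS)
    if platform == "linux":
        prefs.update(LINUX_EXTRA_PREFS)
    # json.dumps renders Python's True as the JavaScript literal true.
    return [f'user_pref("{name}", {json.dumps(value)});' for name, value in prefs.items()]


def launch_env(platform: str) -> dict:
    """Extra environment variables to set when launching Firefox."""
    return {"MOZ_USE_OMTC": "1"} if platform == "linux" else {}


for line in user_js_lines("linux"):
    print(line)
print(launch_env("linux"))
```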

Finding bugs for us to fix is much appreciated!

Finally, thank you to everyone working on mozilla-central for your patience whilst we fix(ed) bugs, we appreciate it! And thank you to the whole graphics team for their help getting this planned, written, finished, reviewed, and landed.

Thursday, March 28, 2013

Layers refactoring update

We missed our target landing date of 18th March. But other than being a few weeks late, things are progressing nicely; in fact, we are nearly done. All our tests pass and we are starting to get reviews of the code. Although we still have a couple of bugs to clear up, I think we are in good shape. To land we just need to get all our reviews and address any comments that arise (and fix those bugs, obviously). We hope that won't take too long and we now aim to land on mozilla-central as soon after the next uplift (2nd April) as possible. All going well, that means the layers refactoring will be in Firefox version 23. Once we land on mozilla-central it would be great to have lots of people test this; I'll blog about how to do that once we land.

You can keep an eye on our tests on tbpl and our reviews and responses on bugzilla.

Saturday, March 23, 2013

Firefox on Raspberry Pi

Some time ago I acquired a Raspberry Pi. These little computers are awesome; it is amazing that a full-blown computer can be had in a tiny little package for so little money. The possibilities for tinkering geeks and for education are endless. Of course, the first thing I wanted to do was to get Firefox running on the thing. I also wanted to be able to build Firefox for it so that I can hack on the graphics support for it, and to understand the process so that when an army of volunteers shows up wanting to hack on Firefox for Raspberry Pi (which I hope they do), I'll be able to help. (For a variety of pretty sad reasons, we can't support accelerated graphics in a supported configuration on the Raspberry Pi. That would be a massive boost to performance. I'm sure there are a lot of other, interesting things we could do too. Most of them are gated on hardware acceleration though.)

Anyway, the Raspberry Pi is way too underpowered to actually compile Firefox on. So that means cross compiling Firefox for the Raspberry Pi (ARMv6, Raspbian) on my PC (x64, Ubuntu). And then I spent three months (seriously) in an entire world of pain. I am really not a Linux whiz, I've never cross-compiled anything before, and I am not that familiar with Firefox's build system, so there was a lot to learn and a lot of painful ways to screw up. Not least of which was that I ended up upgrading Ubuntu in the middle of this, and after that I could no longer debootstrap wheezy. So, if you are running Ubuntu 12.04, DO NOT upgrade to 12.10!

Oleg Romashin (romaxa) has excellent instructions here for doing this. A lot of my pain came from not following these to the letter. He also helped me out so many times on this journey that it was embarrassing, so big thanks to romaxa! Any credit for getting this working at all goes to him, I just blundered through it and hope I can help others by sharing my experience.

Anyway, the overall plan is to build a crosstool-ng toolchain, set up a chroot, install Raspbian into the chroot, use Scratchbox2 to manage the whole thing, and finally use Scratchbox2 to build Firefox for an ARMv6 target.

Glossary

If you know all about this stuff, please skip this section. I had to look up most of these terms, so hopefully it will be helpful to some.

Cross-compile - to build a piece of software on one platform which will run on another platform. The target system is the one we will run the software on; the host system is the one we will build the software on.
chroot - chroot changes the root of the file system to a new directory for all programs executed inside the chroot ('inside' means run using the chroot program, not inside in terms of the directory structure). This allows the files and programs in the chroot to see a different set of files/programs/settings from the rest of the system. From inside the chroot, programs cannot see outside to the rest of the real file system.
toolchain - a bunch of tools and libraries used to compile a program. Compiling a program for a different target will require a different toolchain.
Crosstool-ng - a toolchain builder. You enter the configuration settings and crosstool-ng gives you a complete toolchain which you can then use to compile your software.
Scratchbox2 - a tool for making cross compiling easier. Scratchbox2 provides a virtual environment so configure and the like think they are in the target environment when they are executing on the host.
Raspbian - a version of the Debian Linux distro tailored for the Raspberry Pi.
Wheezy - a version of Debian/Raspbian. For some reason Linux distros use weird names instead of (or as well as) version numbers. I'm not sure why anyone would choose 'wheezy'; it does not exactly have connotations of speed and reliability. But I guess this is what you get when engineers choose names instead of marketing people.
debootstrap - tool for installing Debian into an existing OS/file system.
Linaro - an organisation which produces open source software for ARM systems. In the context of cross-compiling for ARM, Linaro usually refers to the Linaro compiler, a version of GCC specifically targeting ARM.

Building Crosstool-ng

Basically, follow the instructions at http://www.bootc.net/archives/2012/05/26/how-to-build-a-cross-compiler-for-your-raspberry-pi/. But don't download the tarball; instead, clone the repo from http://crosstool-ng.org/hg/crosstool-ng and build according to the instructions here (http://crosstool-ng.org/#using_the_latest_development_stuff).

When you come to running menu config (ct-ng menuconfig), you should add support for C++ and use the latest versions of everything.

Follow the instructions on the wiki page for setting up your chroot and installing Scratchbox2. The wiki suggests putting Raspbian and Crosstool-ng in separate directories, but this did not work for me - I got errors when building Firefox, specifically __off_t and __pid_t being undefined types. The fix is to install Raspbian into $PATH_TO_CROSSTOOLS/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot. My CHROOTPATH variable is then $PATH_TO_CROSSTOOLS/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot/wheezy-hf. (As an aside, the hf stands for hardware floating point, and means we are targeting ARM processors with hardware floating point capability (such as the Raspberry Pi). Some ARM chips do not have hardware support for floating point (-sf) and for those we would have to use software floating point routines. That requires different build targets for all the libraries etc.)

Install the necessary packages

Next, update any packages already installed with sudo chroot $CHROOTPATH apt-get update. Then install the packages from the wiki page (you can probably do without the Qt packages if you will use Gtk, like I did). I had to install additional packages. I'm not sure that all of these are essential, in fact I'm sure some aren't, but I'm not exactly sure which. In any case it's only a few MB. I installed: binutils-dev, libc-dev, and the mozilla build prereqs from this wiki page, which are currently: mercurial g++ make autoconf2.13 yasm libgtk2.0-dev libglib2.0-dev libdbus-1-dev libdbus-glib-1-dev libasound2-dev libcurl4-openssl-dev libiw-dev libxt-dev mesa-common-dev. Some of them should already be installed (make, at least) and some you won't need (mercurial - because you already have the repo outside the chroot, g++ - because we have a compiler installed by crosstool-ng, probably others).

You will need a custom mozconfig file. There is one on the wiki page, but I used a different one, which you can find here. My mozconfig will give you a version of Firefox which is closer to the versions we distribute for Linux, but not as performant as the version using the mozconfig from the wiki.

Then build Firefox with sb2 MOZCONFIG=$PATH_TO_MOZCONFIG make -f client.mk and make a tarball of the distributable using sb2 MOZCONFIG=$PATH_TO_MOZCONFIG make package. You can then copy it over to your Raspberry Pi using scp, a USB stick, or whatever. Extract using tar -xjf $FILENAME, which will create a firefox directory. Run using ./firefox in that directory. Then luxuriate in your Firefox on Raspberry Pi experience! (Warning - it will be pretty slow.)
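
If you end up repeating those steps a lot, it can help to generate the command lines in one place. This is only a sketch that prints the commands from the paragraph above; the mozconfig path and tarball name are placeholders I made up, not the names your build will really produce.

```python
import shlex


def host_build_commands(mozconfig_path: str) -> list:
    """Commands run on the build machine, via Scratchbox2 (sb2)."""
    env = f"MOZCONFIG={shlex.quote(mozconfig_path)}"
    return [f"sb2 {env} make -f client.mk", f"sb2 {env} make package"]


def pi_commands(tarball: str) -> list:
    """Commands run on the Raspberry Pi after copying the tarball over."""
    return [f"tar -xjf {shlex.quote(tarball)}", "cd firefox", "./firefox"]


# Both arguments below are hypothetical example values.
for command in host_build_commands("/home/me/mozconfig-rpi") + pi_commands("firefox.tar.bz2"):
    print(command)
```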

An alternative configuration

At the end of all this, you'll get a version of Firefox which is as close as possible to that on other platforms. But unfortunately it does not support hardware acceleration. Alternatively, you can use the mozconfig on the wiki page, which will give you a version of Firefox which uses Qt rather than GTK and EGL rather than GLX. That is not a supported configuration, but will give you hardware acceleration, which in turn allows for using OMTC and tiling, which might be enabled (I haven't tested; it looks like it might need a bit of fiddling with settings and possibly environment variables, but it might work out of the box).

Friday, March 22, 2013

Finding instructions generated from JS

This is a bit of a beginner's tip for hacking on the JS engine (because I have been hacking a tiny bit on the JS engine (specifically the ARM assembler), and I am definitely a beginner).

If you have just written some code-generating code, you probably want to see what code it actually generates. I found this not as easy as I expected.

The plan is to execute the generated code, break as we execute it (just after in my case, though I imagine just before is often more useful), then use gdb to see which instructions are under the program counter.

To break in the generated code, I just called |MacroAssemblerARMCompat::breakpoint()| in the code generation code, which, when we generate code, inserts a breakpoint instruction (and does some other fanciness too, but we don't need that for now).

I could not come up with a minimal test case in the JS shell which hit the breakpoint. So I had to try and find a test that did. (As an aside, just because the VM generates code does not mean that it will run it; I did not realise that.) I ran
./jit_test.py -f $PATH_TO_JS_SHELL
which runs all the tests and gives a command to run the failing ones. Hitting that breakpoint causes a segfault, and so any test that exercises it will fail.

The output commands look like
[objdir]/js -f [srcdir]/js/src/jit-test/lib/prolog.js -e "const platform='linux2'; const libdir='[srcdir]/js/src/jit-test/lib/'; const scriptdir='[srcdir]/js/src/jit-test/tests/v8-v5/'" -f [srcdir]/js/src/jit-test/tests/v8-v5/check-raytrace.js
You can then run gdb with the js shell (gdb ./js, assuming you are in the objdir) and start execution with
r -f [srcdir]/js/src/jit-test/lib/prolog.js -e "const platform='linux2'; const libdir='[srcdir]/js/src/jit-test/lib/'; const scriptdir='[srcdir]/js/src/jit-test/tests/v8-v5/'" -f [srcdir]/js/src/jit-test/tests/v8-v5/check-raytrace.js
(which are the arguments from the command above). Execution will quickly stop when you hit the breakpoint. At this point you can use a gdb command like
x /10i $pc-36
to give you the 10 instructions up to and including the one pointed to by the pc. You can adjust the 10 and 36 to get the required number of instructions. This will give output something like
   ...
   0x766a0c30:    sub    sp, sp, #20
   0x766a0c34:    stm    sp, {r0, r1, r2, r3, r4}
   0x766a0c38:    vpush    {d5}
   0x766a0c3c:    vpush    {d2-d3}
   0x766a0c40:    vpush    {d0}
=> 0x766a0c44:    bkpt    0x000b
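
The 36 in the command above is just arithmetic: assuming the generated code is in ARM mode (not Thumb), every instruction is 4 bytes, so to see N instructions ending at the pc you start (N - 1) * 4 bytes before it. A tiny sketch of that calculation, with helper names I made up:

```python
ARM_INSTRUCTION_BYTES = 4  # assumes ARM mode; Thumb code mixes 2- and 4-byte encodings


def gdb_examine_command(count: int) -> str:
    """Build an x/i command showing `count` instructions ending at the pc."""
    offset = (count - 1) * ARM_INSTRUCTION_BYTES
    return f"x /{count}i $pc-{offset}"


def first_address(pc: int, count: int) -> int:
    """Address of the first instruction that the command will display."""
    return pc - (count - 1) * ARM_INSTRUCTION_BYTES


print(gdb_examine_command(10))
print(hex(first_address(0x766A0C44, 10)))
```

For the listing above this gives a start address of 0x766a0c20, which is four instructions before the first line shown after the "...".
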
The Mozilla pages on hacking JS and JavaScript tests were very useful along the way. Thanks to Marty Rosenberg and Nicolas Pierron for helping me along my way.

Thursday, March 21, 2013

Stupid British politicians and their stupid education policies

Urgh:
On Thursday Sir Michael Wilshaw, the chief inspector of England's schools, waded into the row, ordering the academics to get "out of their ivory towers". Pupils needed to learn some basic facts by heart, especially in maths and English, he said.
From The Guardian.

This makes me so angry! First, the 'ivory towers' thing is just a cheap dig and a populist way to attack people who have spent their (in this case long and decorated) careers in education, and who are probably some of the most educated people in the country. Perhaps they know something about education and you should listen?

But what really gets me is "...learn some basic facts by heart, especially in maths...". This sums up all that is wrong with the British (western?) attitude to maths. Learning maths by heart is not learning maths at all. If we taught maths properly then we might end up with some better (and more) science and technology students.

Saturday, March 16, 2013

How I learnt to stop worrying and love open source

I've always been a fan of open source, basically because who doesn't like free stuff? But I've never really seen the greatness that people get so excited about. There are two reasons for this. First, I am put off by some of the more fanatical elements of the community (I realise that this is due to a vocal minority, by the way). Second, people need to make a living, and giving away your product seems like a tough business model. Of course it can work, Mozilla being an excellent example, and there are many others. But there is no simple model along the lines of 'make something, sell it to people, ..., profit' which can be applied to open source software in general. Maybe that is not a bad thing, but it has stopped me fully embracing the idea of open source as a software engineering solution.

Open source has many, many advantages. After working for Mozilla for a year, I almost can't imagine how it is possible to work on a closed source project. Having the involvement of a wider community, being able to search the web for our code, our bugs, documentation, blogs giving insight into the code, not worrying about secrets, and so forth are truly wonderful. Contributing to an open source project is also the best way to learn about software engineering and any specific domain of it. If you are a student, or are looking for work, or looking to improve your software skills in any way, then there is absolutely nothing better you can do for yourself than to find an open source project and get stuck in (plug time - anyone interested in graphics in web browsers should get in touch!). It is probably my biggest regret about university that I didn't get involved with some open source projects, and instead worked on my own projects.

Anyway, all this is in the past. As of the last few weeks I LOVE open source and I am now truly a believer. The reason is that I acquired some new hardware, in particular a Raspberry Pi and a Samsung Chromebook. Both of these have ARM processors, which although found in pretty much every phone and tablet, are pretty much a minority interest in terms of 'real' computing. First off I have been amazed at the quantity and quality of open source software specifically aimed at such devices. There are no closed source equivalents and, due to economics I suppose, there never will be for niche areas like this.

Secondly, and here is the amazing bit, where the software I want for the platform I want doesn't exist, I can just compile it! This is so simple, yet so powerful. Even for niche areas you can usually get things like an OS and a browser, but what about all the other bits and pieces you want? For example, I use Sublime Text 2 as my main editor; it is a lovely piece of software and I use it on Windows, Linux, and Mac. But it is not open source and there are no binaries for Linux/ARM, so I cannot use it. But SciTE is also a lovely text editor and IS open source, so I can just compile it and use it anywhere I want. That is amazing! No really, we take this a bit for granted, but in terms of encouraging innovation, open source is miles ahead due to this very simple fact.

Tuesday, February 19, 2013

OMTC Questions

mayankleoboy1 asked some questions in the comments of another blog post and I figured the answers might be of interest to a wider audience, so here they are (edited for order because it makes them easier to answer):

So when is the expected date for GFX and the m-c trees to merge ?

Around March 18th, as long as there are no unexpected problems. This is our goal date, not a promise :-)

[...] a lot of the OMT* is being done on priority for FFOS and Android, and later trickling to desktops. Has the traditional desktpos (win, lin and mac) market become second tier platform for mozilla ?

No, certainly not, although we have a lot of work to do on mobile, so that is a focus for many engineers right now. OMT* is developed for mobile because it is needed most there. Without OMTC, Firefox for Android is really unusably bad. Without OMTA FirefoxOS is really slow and jittery in some key places. Desktop Firefox works pretty well without them (although it will work better with).

Why is that OMT* work lags on windows, compared to OSX and Linux ? AFAIK, windows makes 90% of mozilla users. So shouldnt windows desktop get more priority ?

As I said above, the focus of the OMT* work has been for mobile and that means OpenGL. We don't support OpenGL on Windows, so we don't have OMT* there. We only have it on Mac and Linux to make mobile development easier - it is not yet a supported configuration on either platform (although it will be in the future). Implementing OMT* on a different graphics backend has been very daunting. One goal of the layers refactoring is to make that easier. Our current focus for OMTC is Windows, in particular for the Metro browser. Unless there are unforeseen hurdles, Windows will be the next platform to get OMTC. (OMTA has a few other issues before it can be used anywhere other than FirefoxOS (including on Android), not least of which is testing).

And yes, Windows is a higher priority for Mozilla (in general) than Linux and Mac, although user share is not the sole determinant of priority (Linux gets a lot of love (relative to its user share) because it is more closely aligned to our mission and a lot of developers use it, for example).

Thursday, February 07, 2013

Skia canvas on Windows XP

Using Skia as the rendering backend for canvas has been an option for a while now. Skia is now the default for Windows XP users. That will filter out to nightlies today or possibly tomorrow. It should make canvas perform a bit better on XP.

At the moment our benchmarking does not make a solid case for making it the default on other platforms. If you are not on XP and would like to experiment (possibly exposing yourself to 'fun' bugs) you can use Skia by setting the pref gfx.canvas.azure.backends to 'skia'.

Thanks to Rik Cabanier, Matt Woodrow, Jet, and George Wright for getting this done.

Tuesday, February 05, 2013

A fun bug

(Actually this is a two for one kind of a deal)

I've spent the last two days finding two tricky bugs in my port of tiled Thebes layers to the async compositing API. I think they are kind of fun, so I'll try and describe them here. I'll try to elide the details a bit. If you want to check out the real code, look at ContentClient.cpp, ContentHost.cpp, and BasicTiledThebesLayer.cpp on the graphics branch.

First, the old way. A tile buffer keeps a bunch of tiles (the actual tiles, not references, that is important) and each tile keeps a reference to a gfxReusableSurfaceWrapper. A gfxReusableSurfaceWrapper is kind of neat: it keeps a reference to a surface and can be locked for reading. When we want to write to it we ask for its surface. If it is locked, then we get a fresh surface (with a new gfxReusableSurfaceWrapper to wrap it). If it is not locked, we get the same surface as last time.

To render the tiled layer, the content thread gets a surface for each tile and paints to it. When the tiled layer is rendered, a copy of the tile buffer is made in the heap and a reference is passed to the compositor thread. The compositor thread locks all of the gfxReusableSurfaceWrappers (via the tiles and buffers) and blits them to the screen.

Note that if the gfxReusableSurfaceWrapper is locked and we get a new surface when painting, then we store the new gfxReusableSurfaceWrapper in the tile and lose track of the old gfxReusableSurfaceWrapper. Also, gfxReusableSurfaceWrappers are reference counted. They are destroyed when there are no more references to them. Finally, it is very important that when a gfxReusableSurfaceWrapper is destroyed it is not locked for reading; we assert that.
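
Here is a toy Python model of that behaviour, in case it helps. It is not the real gfxReusableSurfaceWrapper (which is C++ and reference counted); the class and method names are invented, and it only captures the "fresh surface if locked, same surface otherwise" rule and the way a tile forgets its old wrapper.

```python
class Surface:
    """Stand-in for a drawing surface; it just carries a label."""

    def __init__(self, label):
        self.label = label


class ReusableSurfaceWrapper:
    """Toy model of the behaviour described for gfxReusableSurfaceWrapper."""

    def __init__(self, surface):
        self.surface = surface
        self.read_locks = 0

    def read_lock(self):
        self.read_locks += 1

    def read_unlock(self):
        assert self.read_locks > 0
        self.read_locks -= 1

    def get_writable(self):
        """Unlocked: reuse this wrapper. Locked: return a fresh wrapper and surface."""
        if self.read_locks == 0:
            return self
        return ReusableSurfaceWrapper(Surface(self.surface.label + "'"))


class Tile:
    def __init__(self, wrapper):
        self.wrapper = wrapper

    def paint(self):
        # The tile stores whichever wrapper it gets back and forgets the old one.
        self.wrapper = self.wrapper.get_writable()
        return self.wrapper.surface


tile = Tile(ReusableSurfaceWrapper(Surface("A")))
first = tile.paint()        # not locked: same surface as before
old_wrapper = tile.wrapper
old_wrapper.read_lock()     # the compositor starts reading it
second = tile.paint()       # locked: fresh surface, old wrapper forgotten by the tile
print(first.label, second.label, tile.wrapper is old_wrapper)
```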

This sounds fun already, right? But the fun bit is still to come...

As far as we are concerned, the main effect of refactoring into the new compositing API is that we add another layer between the tiles and the gfxReusableSurfaceWrappers. We add a TextureClient. The tile holds a reference to the TextureClient and the TextureClient holds a reference to the gfxReusableSurfaceWrapper. The TextureClient lives on the heap and is also reference counted.

What could go wrong?

What goes wrong is that we trigger an assertion by trying to destroy a locked gfxReusableSurfaceWrapper. Figuring out why took me a little while. What should happen is that the copy of the buffer and its tiles on the compositor thread keeps the gfxReusableSurfaceWrappers alive once the tiles on the content thread forget about them. That works because we only lock the tiles for reading when we pass them to the compositor and because when we copy the buffer (a bitwise copy) we copy all the tiles, creating another reference to each gfxReusableSurfaceWrapper. But, with the TextureClients, the tiles are copied and we add another reference to the TextureClients, but they are not copied and so we only have one reference to the gfxReusableSurfaceWrappers. Thus, the next time around if we get new gfxReusableSurfaceWrappers and forget about the old ones, then they are destroyed, even though they are locked by the compositor! The fix is to do a 'deep' copy, copying the TextureClients rather than making another reference to them.
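
The difference between the two kinds of copy is easier to see with explicit reference counts. The following is a simplified Python model, not the Gecko code: the class names echo the real ones but all methods are made up, and manual add_ref/release calls stand in for C++ smart pointers.

```python
class RefCounted:
    def __init__(self):
        self.refs = 0

    def add_ref(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.on_destroy()

    def on_destroy(self):
        pass


class Wrapper(RefCounted):
    """Models the surface wrapper: must not be destroyed while locked."""

    def __init__(self):
        super().__init__()
        self.locked = False

    def on_destroy(self):
        if self.locked:
            raise AssertionError("destroying a wrapper that is locked for reading")


class TextureClient(RefCounted):
    """Holds one reference to a wrapper."""

    def __init__(self, wrapper):
        super().__init__()
        self.wrapper = wrapper
        wrapper.add_ref()

    def replace_wrapper(self, new_wrapper):
        new_wrapper.add_ref()
        self.wrapper.release()   # the old wrapper is forgotten here
        self.wrapper = new_wrapper


def repaint_after_copy(deep: bool) -> str:
    wrapper = Wrapper()
    client = TextureClient(wrapper)
    client.add_ref()                 # reference from the content-side tile
    wrapper.locked = True            # locked for the compositor to read
    if deep:
        compositor_client = TextureClient(wrapper)  # second reference to the wrapper
    else:
        compositor_client = client                  # wrapper still has one reference
    compositor_client.add_ref()      # reference from the compositor-side tile
    try:
        client.replace_wrapper(Wrapper())  # content repaints with a new wrapper
    except AssertionError as error:
        return f"assertion: {error}"
    return "ok"


print(repaint_after_copy(deep=False))
print(repaint_after_copy(deep=True))
```

With the shallow copy the old wrapper's count drops to zero while it is still locked; with the deep copy the compositor's own TextureClient keeps it alive.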

What could go wrong?

This gives rise to the really fun bug. Because if you do the 'deep' copy on the compositor thread, you still hit the same assertions, just much less often. What is happening here is that there is a gap between when the tiles are locked (content thread) and when we make a copy (compositor thread). Sometimes we might get to repaint (content) before we composite the previous round and that means we un-reference the gfxReusableSurfaceWrappers after we lock and before we copy. That took a while to find, but in retrospect doing the 'deep' copy on the compositor thread was dumb; I'm not sure why I did that. The fix is easy: just move the deep copy to the content thread.
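
The race comes down to the order of three events on a single wrapper. Here is a deliberately tiny sketch that replays an ordering and reports whether the wrapper dies while locked; the event names are mine and the real code obviously involves two threads rather than a list.

```python
def dies_while_locked(order) -> bool:
    """Replay events on one wrapper; True if it is destroyed while locked."""
    refs = 1        # held by the content-side TextureClient
    locked = False
    for event in order:
        if event == "lock":          # content thread, when handing tiles over
            locked = True
        elif event == "deep_copy":   # takes a second reference for the compositor
            refs += 1
        elif event == "repaint":     # content swaps in a new wrapper, drops the old
            refs -= 1
            if refs == 0 and locked:
                return True
    return False


# Deep copy on the content thread: always happens before the next repaint.
print(dies_while_locked(["lock", "deep_copy", "repaint"]))
# Deep copy on the compositor thread: a repaint can sneak in first.
print(dies_while_locked(["lock", "repaint", "deep_copy"]))
```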

Monday, February 04, 2013

Throttling off main thread animations

For the last few months I have been working mostly on throttling off main thread animations (OMTA), in between a little of the layers refactoring, which I'm now returning to. Under OMTA, CSS animations and transitions are animated on the compositor thread. That makes things run faster (because the main thread is free to do other work) and smoother (because if the main thread gets bogged down in some work, the compositor thread can carry on animating smoothly). Much of the work for OMTA was done by one of our awesome interns, David Zbarsky.

The old way of doing CSS animations (and the way we still do things for most properties) is for layout to do all the work. Every frame of the animation the necessary parts of the webpage are laid out (the process of converting HTML to graphical objects) and rendered (converting those graphical objects to pixels) afresh with the correctly interpolated property value. If we have off main thread composition (where each layer is rendered on the main thread, but layers are composited together on a separate thread) then we can instead lay out the web page once and change the way we composite to take account of the animation. The initial implementation did this in such a way that the main thread still did a layout run for each frame, to keep its model up to date, and the compositor did its own animations too. That got the smoothness but not the speed-up. In fact, since we did the interpolating twice, presumably it slowed things down slightly. My task was to finish off the work to stop animating on the main thread (bug 780692). That is the 'throttling' bit. It has been surprisingly difficult; easily the hardest and most frustrating problem I have worked on at Mozilla. But also lots of fun.

The main difficulty is that we do sometimes need to 'catch up' on the main thread, mostly when we need to respond to some JS/DOM stuff. For example, if we have to test whether the mouse cursor is over an element with an animated scale, we need the current value of that scale to be able to tell whether the cursor is inside that element. That means that layout, which runs on the main thread, needs to have an accurate picture of the state of the animation. We call this update of layout a mini-flush. We do a mini-flush periodically (every 200ms at the moment) and when we need to have accurate information for DOM stuff. What happens during a mini-flush is that we calculate the animated values for that moment in time and post them to the compositor. It gets tricky because we want to avoid doing a full (and very expensive) re-layout of everything and only update the animating property of the animated element. It gets even trickier because it might have been a restyle which requires the animation data and we cannot start a new restyle pass while one is already in progress.
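
To get a feel for how much main thread work throttling saves, here is a back-of-the-envelope simulation. The 200ms mini-flush interval is from above; the 60 frames per second, the linear interpolation, and all the function names are my assumptions for the sketch, and it ignores the on-demand mini-flushes triggered by DOM queries.

```python
FRAMES_PER_SECOND = 60   # assumed compositor frame rate
MINI_FLUSH_MS = 200      # periodic mini-flush interval mentioned in the post


def interpolate(start: float, end: float, progress: float) -> float:
    """Linear interpolation of an animated value such as opacity."""
    return start + (end - start) * progress


def simulate(duration_ms: int):
    """Return (values sampled by the compositor, values seen by the main thread)."""
    compositor_samples = []
    main_thread_updates = []
    next_flush = MINI_FLUSH_MS
    frame_count = duration_ms * FRAMES_PER_SECOND // 1000
    for frame in range(frame_count + 1):
        t = frame * 1000 / FRAMES_PER_SECOND
        value = interpolate(0.0, 1.0, t / duration_ms)
        compositor_samples.append(value)      # the compositor animates every frame
        if t >= next_flush:                   # the main thread only catches up now and then
            main_thread_updates.append(value)
            next_flush += MINI_FLUSH_MS
    return compositor_samples, main_thread_updates


compositor, main_thread = simulate(1000)
print(len(compositor), len(main_thread))
```

Under these assumptions a one second animation is sampled 61 times on the compositor but the main thread only updates 5 times, rather than once per frame.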

I have skipped *a lot* of the details here. There is a lot of interesting discussion in bug 780692 if you are interested in this stuff.

Currently OMTA is only used on Firefox OS. There is work in progress to port it to Firefox on Android, and that shouldn't be too hard. It should work fine on desktop (that is where I did most of the development work), but requires OMTC (which in turn, currently, requires hardware acceleration), which is a little way off for all platforms. Once we have that, OMTA should be good to go.

If you are writing a webpage, there is no way to guarantee you'll get OMTA. But you have a good chance if you use CSS animations or transitions to animate either the transform or opacity, and don't have a 3D transform on that element. For example, most of the 'windowing' animations on Firefox OS (window opening animation, window changing animation, etc.) get OMTA.
