It seems to be trendy to blog about startup times, so let’s talk about the recent changes in Bluefish startup time. What caused the changes? The current Subversion trunk compiles on gtk-3, while the latest stable release (2.0.3) did not, most notably because it uses GtkItemFactory to generate menus. I already made several optimizations in the new code: I moved all the Bluefish-specific stock icons from external PNG files to inline pixmaps, and I merged the separate toolbar and menu definition files back into a single file, both to reduce disk seek times.
So is the new gtk-3 compatible version faster? Building both versions against the same gtk-2 release shows that the old version is faster. On my development machine (Core 2 Duo, 3.1 GHz) the time is too short to see a real difference (total startup took 0.6 s), so I tested on an older computer (IBM T43 laptop with a 1.8 GHz Pentium M and an SSD) to make the startup time easier to measure. I measured several times to make the results more accurate:
GtkItemFactory uses a struct that is compiled directly into the binary. GtkUIManager uses an XML format that first needs to be parsed. In Bluefish we load this file from disk, which adds some seek time (but the laptop has an SSD, so that should not be much). Could this change be the cause of the 0.3 s difference?
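For comparison, the menu definition that GtkUIManager parses at startup is XML along these lines (a minimal sketch of the format, not Bluefish’s actual definition file; the action names are made up):

```xml
<ui>
  <menubar name="MainMenu">
    <menu action="FileMenu">
      <menuitem action="OpenAction"/>
      <menuitem action="QuitAction"/>
    </menu>
  </menubar>
</ui>
```

With GtkItemFactory the equivalent information lived in a GtkItemFactoryEntry array inside the binary, so no parsing (and no extra file read) was needed at startup.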
Luckily not everything is slower: the syntax scanning speed has increased a lot over the last few releases. This was measured on the same laptop:
Given that the scanner engine spent close to 50% of its time setting GtkTextTags (see the Bluefish editor widget design), the boost is pretty impressive.
I have several different computers, running Gnome 2, Gnome 3, Unity and Mac OSX. New interfaces always take a while to get used to, so after first trying Gnome 3 and Unity, the “classic” Gnome 2 interface was still my favourite for getting my work done.
Gnome 3 has the best looks (yes, I like it better than OSX), but to get my work done I don’t need the best looks. A long time ago I ran Enlightenment with the Aliens theme to have a very cool desktop, but I always switched to Sawfish when I had some real programming to do.
So what makes Gnome 3 my first choice? The main reason is the keyboard control. Hit <alt><f1> or the <windows> key, type a few characters of the program name and hit enter. Even better: start typing the name of the Bluefish project file I used recently, hit enter, and I have my project open. I don’t have to type the exact name of the command (typing “te” already selects “gnome-terminal” for me, “tru” selects my “bluefish_trunk” Bluefish project file, “fi” selects Firefox, etc.), which makes it very fast and convenient. Switching virtual desktops (called workspaces in Gnome 3) is <ctrl><alt><up> or <ctrl><alt><down>, and when I need a new desktop it is automatically created by hitting <down> one extra time.
Some other things I like a lot: tiling windows side by side by dragging a window to the right/left edge, and restoring the original size when moving the window again. However, I would like to be able to widen the windows after tiling: the left window can be widened from its bottom-right corner, but there is no way to make the right window a bit wider. I like <alt><`> for switching between windows of the same application (it feels natural because it is so close to <alt><tab>). I like <alt><f2> to start new commands, especially when using <ctrl><enter> to start that command in a new terminal.
What would make things even better for me:
- <Alt><tab> behavior per window, per desktop. It just doesn’t make sense to me that switching between two web pages in two Firefox windows is different from switching between two web pages in Chrome-and-Firefox. I often <alt><tab> between a couple of terminal windows and Bluefish windows. The default just switches applications, and I usually need a specific window of an application on the same virtual desktop. The alternate-tab extension, however, makes me tab between all open windows on all virtual desktops (which is usually a long list).
- Easier mouse access to virtual desktops. The hot corner is on the left, but to switch to a different virtual desktop without a key combination I have to move the mouse all the way to the right (which is a long way on a widescreen display). I have the workspaces menu extension installed to have the virtual desktops in the top bar, but it needs two mouse clicks to switch between two desktops. An improvement would be to make the top-right of the screen a second hot corner that activates the workspaces area by default (I have the right-hot-corner extension installed, but I first have to move the mouse to the top right for the hot corner, and then to the middle to activate the workspaces area).
- Better use of the vertical screen space. The top bar of each window is quite high, yet it only has a close button and is used to drag the window. Especially when maximising a window, the top of the screen has a lot of unused space. This is an area where Unity tries to do good things (except that the menu thing in Unity is slow and buggy, as I posted earlier). Luckily Bluefish has a fullscreen feature!
- Make opening a new window the default. For most programs I use I have multiple windows open (terminals, Bluefish sessions, Firefox sessions, etc.). If I want to switch to an open window it is much faster to select that window in the overview mode than to click the icon in the dash (which selects just one of the open sessions, usually not the one I need). I want to use the dash to start a new session, regardless of whether I already have a session running. Having to hold <ctrl> while clicking is annoying. The same goes for starting a program using the keyboard: if I type “fi” and hit enter, I don’t want any of my existing Firefox sessions, I want a new one!
We have something like 3000 printers. They are named something like “MF2301” and “MF2302”. Luckily the printer properties show the location of the printer.
So let’s fire up system-config-printer and search for the right printer. The first thing I notice is that it takes ages for system-config-printer to start with 3000 printers: somewhere close to 20 seconds. What’s happening? Is it requesting the status of each printer? Is the delay caused by 3000 icons (Nautilus is faster when displaying a directory with 3000 files)? Then we have the search field. Hmm, it only allows searching by name, not by location or description. Unfortunately our users are humans, not computers, so they usually know the location where they are, but not the number of the printer. So I first have to walk to the printer, write the name down, walk back to my thin client, select the correct printer and click “set as default”.
Now I want to print something. So I hit <ctrl><p> in OpenOffice.org, and it shows MF00001 as the printer? That was not my default printer! OpenOffice.org shows a dropdown with all printers, so I have to scroll through the 3000 printers to select the right printer again.
So much for printing, for now.
Update: let me be clear, I don’t want to bash the developers of the printer settings tool (I’m very glad it exists!), I just want to show some of the issues that arise in large desktop deployments.
For long-running applications it is important that there are no memory leaks. For an application that runs only briefly, its memory is freed when it quits; but for an application that runs for days or weeks or more, any memory it allocates should be freed by the application itself, otherwise it will not be available to other programs for a long time.
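A minimal sketch of the kind of leak this is about (hypothetical code, not from Bluefish): a function that allocates a buffer on every call and never frees it, so the process’ memory use grows for as long as it runs:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of the input,
 * which the caller is responsible for freeing. */
char *make_copy(const char *src)
{
    char *copy = malloc(strlen(src) + 1);
    strcpy(copy, src);
    return copy;
}

/* Leaks: the copy is used but never freed, so every call permanently
 * loses a few bytes. In an editor that runs for weeks this adds up. */
size_t leaky_length(const char *src)
{
    char *copy = make_copy(src);
    size_t len = strlen(copy);
    return len;          /* 'copy' is never freed: a leak */
}

/* Correct version: free the allocation when done with it. */
size_t fixed_length(const char *src)
{
    char *copy = make_copy(src);
    size_t len = strlen(copy);
    free(copy);
    return len;
}
```

This is exactly the kind of allocation the tools below will point out.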
A very useful tool for memory leak debugging is valgrind. It makes your program run a lot slower (10X?), but it will report all interesting memory allocations that have not been explicitly freed. GTK does one thing that valgrind doesn’t like: the slice allocator does its own memory management, so memory appears to be leaking when it is actually ready for reuse by the slice allocator. Luckily you can turn that off with the environment variable G_SLICE=always-malloc:
$ G_SLICE=always-malloc valgrind --tool=memcheck src/bluefish
valgrind will now report whether you have memory leaks. To see where the leaking memory is allocated, use:
$ G_SLICE=always-malloc valgrind --tool=memcheck --leak-check=full --num-callers=32 src/bluefish
Valgrind will show several false positives: memory that GTK allocates that is not supposed to be freed.
Sometimes knowing the origin of the leaked memory is not enough, because the leak is a reference-counted GObject and you can’t find where the reference was increased that should have been decreased, or a similar bug. A useful tool to debug that is the gobject-list debugging library (http://cgit.collabora.co.uk/git/user/danni/gobject-list.git/tree/) or the similar refdbg (http://gitorious.org/refdbg/refdbg), which is packaged for Debian.
The gobject-list library is used like this:
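As far as I know it is an LD_PRELOAD interposer, so after building it from the git checkout the invocation looks roughly like this (the library path is an assumption; use wherever your build put libgobject-list.so):

```
$ LD_PRELOAD=/path/to/libgobject-list.so src/bluefish
```

It then tracks GObject creation and destruction and can list the objects that are still alive.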
I get a lot of output here, and there seem to be many false positives (I hope, because valgrind doesn’t report them!?). I have to play with this a little more to learn how to use it effectively.
Gnome Shell, but also Unity, makes extensive use of the capabilities of modern video hardware. Which is a good thing. The downside is that they no longer function without access to that hardware. In an organisation that uses thin clients and terminal servers over a wide area network this becomes a bit of a problem. Protocols like NX (NoMachine) and VNC (many products, such as ThinLinc) that can handle the high latencies of wide area networks do not provide access to these video hardware functions.
This means that the thin clients are limited to the “fallback” Gnome desktop. But how long will that be maintained? When will the first open source project decide to drop support for the old-fashioned Gnome desktop? What if Empathy or NetworkManager stops working with the fallback desktop? Does that make our thin clients worthless?
What is your strategy regarding thin clients?
Several people posted GList anti-patterns, calling the code completely broken. Although I agree such code is broken, it must be said that for a very long time the glib documentation didn’t even specify whether g_list_append() would return the first or the last item of the list. I know for sure that some of the early Bluefish code called g_list_append() in a loop. Currently there is a note in the glib documentation that g_list_append() has to traverse the entire list on every call. That note is a major improvement, but for a beginning programmer it would be better to show with an example that g_list_prepend() in a loop followed by a single g_list_reverse() is actually much faster than g_list_append() in a loop.
There are quite a few linked list implementations that look like glib’s GQueue implementation (they keep a pointer to both the head and the tail), in which append() is as fast as prepend(), so programmers coming from another language or library might not even think about the difference between the two. Perhaps we should recommend that beginning programmers use a GQueue instead of a GList?
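To make the difference concrete, here is a sketch with a minimal singly-linked list that mimics the GList append/prepend semantics (my own toy code, not GLib itself): append() has to walk to the tail on every call, so building a list of n items that way is O(n²), while prepend() is O(1) per item and a single reverse() at the end restores the order.

```c
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node *next;
} Node;

/* O(n) per call, like g_list_append(): walk to the tail first.
 * (Error handling and freeing omitted for brevity.) */
static Node *list_append(Node *list, int data)
{
    Node *node = malloc(sizeof *node);
    node->data = data;
    node->next = NULL;
    if (!list)
        return node;
    Node *tail = list;
    while (tail->next)
        tail = tail->next;
    tail->next = node;
    return list;
}

/* O(1) per call, like g_list_prepend(). */
static Node *list_prepend(Node *list, int data)
{
    Node *node = malloc(sizeof *node);
    node->data = data;
    node->next = list;
    return node;
}

/* Single O(n) pass, like g_list_reverse(). */
static Node *list_reverse(Node *list)
{
    Node *prev = NULL;
    while (list) {
        Node *next = list->next;
        list->next = prev;
        prev = list;
        list = next;
    }
    return prev;
}

/* The fast idiom: prepend in the loop, reverse once at the end.
 * Same result as calling list_append() n times, but O(n) instead of O(n²). */
static Node *build_list(int n)
{
    Node *list = NULL;
    for (int i = 0; i < n; i++)
        list = list_prepend(list, i);
    return list_reverse(list);
}
```

With the naive list_append() loop, appending the millionth item walks a million nodes first; the prepend-then-reverse version touches each node a constant number of times.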
The Bluefish syntax scanning engine uses a DFA state table. To move from one state to another we use a 16-bit unsigned integer, so we can address at most 65536 states. But we now have a 100% complete HTML5 + SVG (+ all attributes of all tags in both languages) language definition, which needs about 70000 states.
A state table only needs to be able to switch to other states within the same context. Right now we keep all contexts in the same state table. A possible design change would be to create a separate state table for each context. That would raise the limit to 65536 states per context. Right now I cannot imagine which language file would hit that limit (but then, 640K of memory was enough for any computer, right?)
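A rough sketch of the idea (a simplified illustration, not the actual Bluefish data structures): instead of one global table indexed by a 16-bit state number, each context carries its own table, so a state number only has to be unique within its context and the 65536 limit applies per context.

```c
#include <stdint.h>

/* Simplified per-context DFA table. Each context owns its own
 * transition rows, so a uint16_t state id is scoped to one context
 * instead of the whole language file. */
typedef struct {
    /* rows[state][input byte] = next state within this context */
    uint16_t (*rows)[256];
    uint16_t num_states;
} Context;

/* Advance the DFA one input byte within a single context. */
static uint16_t next_state(const Context *ctx, uint16_t state, unsigned char c)
{
    return ctx->rows[state][c];
}
```

Switching contexts (e.g. entering a CSS block inside HTML) then means switching which Context’s table the scanner indexes into, rather than jumping around inside one giant table.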
Update: the new design is working, we now have a 65536 state limit per context, and no single language is getting even close to that limit.
After a couple of redesigns the optimisations are almost done. In almost all cases everything works perfectly. It’s now down to good testing before this branch is merged into trunk.
Memory usage for the XML file with a million matches is now only 19% of the original memory usage (from 437Mb down to 85Mb). Scanning a complete file is 10% faster. But best of all: a small change in the file no longer causes the complete file to be rescanned; most of the time only a small region needs rescanning.
After my previous post Valentijn suggested using a tree structure with pointers instead of a stack. That indeed removes a lot of redundant information from memory. With the memory usage in mind I redesigned some of the internal data structures, and the first results are promising. The queue is now replaced by pointers to the parent structure, I reduced the number of pointers, and I added a state which is stored bitwise in a single integer.
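Storing state bitwise is a standard packing trick; a sketch of how several small fields can share one 32-bit integer (the field names and bit widths here are made up, not the actual Bluefish layout):

```c
#include <stdint.h>

/* Hypothetical packing: a 16-bit context id, a 12-bit block id and a
 * 4-bit flag field share one 32-bit integer, instead of occupying
 * three separate (and padded) struct members. */
#define BLOCK_BITS 12
#define FLAG_BITS  4

static uint32_t pack_state(uint32_t context, uint32_t block, uint32_t flags)
{
    return (context << (BLOCK_BITS + FLAG_BITS))
         | (block << FLAG_BITS)
         | flags;
}

static uint32_t get_context(uint32_t state)
{
    return state >> (BLOCK_BITS + FLAG_BITS);
}

static uint32_t get_block(uint32_t state)
{
    return (state >> FLAG_BITS) & ((1u << BLOCK_BITS) - 1);
}

static uint32_t get_flags(uint32_t state)
{
    return state & ((1u << FLAG_BITS) - 1);
}
```

For millions of cache entries, shrinking each entry from several words to one integer is exactly the kind of change that shows up in the totals below.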
The syntax scanning cache for the 12Mb XML file with 1 million matches is now down to 24Mb for the cache itself, plus 40Mb of GSequenceNodes, 7Mb for blocks and 13Mb for context changes. That’s only 22% of the original memory usage! The code is in a separate branch: http://bluefish.svn.sourceforge.net/viewvc/bluefish/branches/bluefish_optimise_bftextview/src/
On normal small to medium sized files the difference in memory usage is much smaller: about 80% of the original memory usage. The scanning speed on small to medium sized files also hasn’t changed much (~1.1X faster). Only on very large files does the memory usage make a difference in scanning speed.
The other big improvement I want to add in this branch is more reuse of previous scan results. Currently everything following a change is always re-scanned. But many of the previous results are probably still valid.
I’m still playing with the 12Mb XML file. The syntax scanner matches a pattern more than a million times in this file (1038246 times, to be precise). Each time we have a match, we store the context stack and the block stack. The cache structures consume 56Mb, plus 40Mb of GSequenceNodes. The items on the stacks are refcounted, so their memory consumption isn’t spectacular: there are 221348 blocks in this file, which use 5.1Mb, and 570036 context changes, which use 8.9Mb of memory. But the stack itself is a GList. We actually have about 2 million GLists here (two stacks for every cache item), and together they consume a whopping 277Mb of memory for this file!
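The overhead is easy to see: a GList node holds a data pointer plus next and prev pointers, so on a 64-bit machine every list element costs at least 24 bytes before the payload or allocator bookkeeping is counted. A sketch with an equivalent struct (mirroring GList’s layout, not GLib code):

```c
#include <stddef.h>

/* Same layout as GLib's GList node: three pointers per element. */
typedef struct MyListNode {
    void *data;
    struct MyListNode *next;
    struct MyListNode *prev;
} MyListNode;

/* Rough lower bound on the memory used by n list nodes,
 * ignoring malloc/slice-allocator bookkeeping. */
static size_t list_overhead(size_t n)
{
    return n * sizeof(MyListNode);
}
```

If my arithmetic is right, 277Mb at 24 bytes per node is roughly 11.5 million nodes, i.e. about 5–6 stack entries per list on average across the 2 million lists, which sounds plausible for these stacks.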
What to do: don’t store the complete block stack and context stack every time we find a match. That would help to reduce the 277Mb for the stacks. That still leaves the 96Mb for the cache structures and the balanced tree.