Bluefish editor widget design

In Bluefish, Gnome, Linux desktop, open source, Programming on August 14, 2010 by oli4444

Many people ask why Bluefish developers chose not use the GtkSourceView widget. The reason: we hoped we could build an editor widget better suited for development work. If we succeeded? Try for yourself.

Before I start with the design, let me briefly describe the old editor widget design:

  • used libpcre regular expression compiler/matcher
  • every type of match (e.g. keyword, function name, etc.) needed it’s own regular expression pattern
  • because libpcre regular expression patterns are limited in size, some of these were split up in tens of regular expression patterns (for example all php function names)
  • every piece of text thus had to be scanned for matches up to 50 times (each pattern needs a separate matching run)
  • large file scanning could block the widget for a couple of seconds
  • patterns could be bound to scan only within the match of other patterns – having some kind of context sensitive matching (e.g. javascript within html), but not very flexible
  • there was no way we could re-use the scanning for autocompletion

So now the new design. Let’s start with the design requirements:

  1. syntax highlighting should be pretty fast; the user continuously changes the syntax while typing, and the highlighting should keep up with that
  2. syntax should be defined in a language file; new languages can be added and language files can be updated without a change in the scanning engine
  3. the language file should not contain any highlighting colors, it should map to textstyles that the user can define, such that all languages have a similar look
  4. the syntax scanning should support all kinds of languages, markup such as html and xml, and programming languages like javascript and php, and it should be capable of handling thousands of patterns.
  5. the syntax scanning should be context-aware (in a comment? in a php block? in a CSS block?) and block-aware (<p> opened, <b> opened, </b> closed etc.)
  6. the widget should allow context-aware autocompletion
  7. scanning large blocks of text should not block/freeze the gui

We have one additional constraint: Because we wanted to use GtkTextView as the base class the actual highlighting cannot be done in a separate thread or in the background (we have to set GtkTextTag’s from the main thread).

This resulted in the following high-level design:

  • we use a DFA engine to scan syntax because it is very fast (O(n) on the input data) independent from the number of patterns (O(1) on the number of patterns)1
  • because we want to scan context-sensitive we compile a DFA table for each context
  • the complete DFA is in a single continuous memory block to maximize CPU cache and minimize memory paging effects
  • for each context we also compile a GCompletion with all possible autocompletion strings in that context
  • all language file parsing and compiling is done in a separate thread so we exploit the possibilities of multi-core computers
  • we keep a stack of contexts and a stack of blocks during the scanning run
  • we scan for syntax in short timeslots that block the UI, but after the short timeslot we return control back to the gtk main loop.
  • we mark all text that needs scanning with a GtkTextTag, and thus we can quickly find where we should start scanning (the first bit that is marked as needscanning)
  • on text change we simply mark the changed area with the needscanning GtkTextTag and resume scanning
  • we should thus be capable to resume scanning on any given position
  • that means that we should be able to reconstruct the block-stack and the context-stack at any given position
  • a very fast way to look-up a given context-stack and block-stack at a given position is if we keep them in a balanced-tree which scales O(log n) on the number of stored positions. But we are in a worst-case situation for normal binary-tree’s: we insert data in sorted order. Glib has a nice Treap implementation that we use that is much better when data is inserted in sorted order 2
  • for autocompletion we look-up the position in the balanced tree, peek at the context stack to get the current context, and use the corresponding GCompletion to find the possible strings

Because of this design we get an additional bonus: we can look-up found syntax by a position very cheap (because of the balanced tree) and we can link additional information to that structure -> showing a tooltip when you move the mouse over a function is thus possible as well!

Some results on a 64bit 2.7GHz AMD CPU:

  • using the HTML language file (with 2575 patterns) on several fairly large HTML files the syntax highlighting parses about 1Mb per second. More than 50% of the time is spent on setting GtkTextTag’s in the GtkTextView widget, so the GtkTextView widget clearly is our main bottleneck with the new scanning engine and the scanner itself runs over 2Mb/s.
  • using the PHP language file (with over 7500 patterns) on some fairly large PHP files, the syntax highlighting parses about 500Kb of PHP per second. These files have have much more colors, and settings GtkTextTag’s in these files takes 75% of the time. The scanner itself still runs over 2Mb/s, confirming that our DFA engine performance is independent from the number of patterns

1: actually this is not completely true. It is only O(1) on the CPU, but more patterns mean more memory usage, which means that the cache size of your CPU comes into play, and virtual memory/paging comes into play.
2: because we do relatively few searches and lots of updates on the balanced tree it would be interesting to see if a red/black tree would perform better


One Response to “Bluefish editor widget design”

  1. Okay Oliver I’m visiting….now I’ve got to see about the architecture’s affect on performance. Often it comes down to the results of the compiler output. I’ve not looked at output from gcc for this. Since I’ve not made the run yet it’s all speculation on my part. As I recall Bluefish used miniGW-64 which again I’ve not used recently.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: