After my previous post, Valentijn suggested using a tree structure with pointers instead of a stack. That indeed removes a lot of redundant information from memory. With memory usage in mind I redesigned some of the internal data structures, and the first results are promising. The queue is now replaced by pointers to the parent structure, I reduced the number of pointers, and I added a state that is stored bitwise in a single integer.
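To make the idea concrete, here is a minimal sketch in C of a parent-pointer tree with flags packed into one integer. It is not the actual Bluefish code: the names `ContextNode`, `context_push`, `context_pop`, `context_depth` and the three `CTX_FLAG_*` values are made up for illustration, and the field sizes are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state flags, packed into one integer instead of
 * being stored as separate fields. */
enum {
    CTX_FLAG_OPEN     = 1u << 0,  /* context has not been closed yet */
    CTX_FLAG_FOLDABLE = 1u << 1,  /* block may be folded in the editor */
    CTX_FLAG_DIRTY    = 1u << 2   /* result needs a rescan */
};

/* One node per context change. Instead of copying the whole context
 * stack at every position, each node only points at its parent. */
typedef struct ContextNode {
    struct ContextNode *parent;   /* NULL for the root context */
    uint16_t context_id;          /* which syntax context this is */
    uint16_t flags;               /* bitwise OR of CTX_FLAG_* values */
} ContextNode;

/* Enter a new context below `parent`. */
ContextNode *context_push(ContextNode *parent, uint16_t context_id)
{
    ContextNode *node = malloc(sizeof *node);
    assert(node != NULL);
    node->parent = parent;
    node->context_id = context_id;
    node->flags = CTX_FLAG_OPEN;
    return node;
}

/* Leave a context: clear the OPEN bit and return the parent. The node
 * is not freed here, because cached positions may still point at it. */
ContextNode *context_pop(ContextNode *node)
{
    node->flags &= (uint16_t)~CTX_FLAG_OPEN;
    return node->parent;
}

/* The "stack" at any position is recovered by walking to the root. */
unsigned context_depth(const ContextNode *node)
{
    unsigned depth = 0;
    for (; node != NULL; node = node->parent)
        depth++;
    return depth;
}
```

In this sketch every position in the cache needs only a single pointer to its innermost context, and nested positions share their ancestors, which is where the saving over a per-position stack copy comes from.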
The syntax scanning cache for the 12Mb XML file with 1 million matches is now back to 24Mb for the cache on 40Mb of QSequenceNodes, 7Mb blocks and 13Mb for context changes. That’s only 22% of the original memory usage! Code is in a separate branch: http://bluefish.svn.sourceforge.net/viewvc/bluefish/branches/bluefish_optimise_bftextview/src/
On normal small to medium-sized files the difference in memory usage is much smaller: the new code needs about 80% of the original memory. The scanning speed on small to medium-sized files hasn't changed much either (~1.1X faster). Only on very large files does the memory usage make a difference in the scanning speed.
The other big improvement I want to add in this branch is more reuse of previous scan results. Currently everything following a change is always re-scanned, even though many of the previous results are probably still valid.

