SparseSet Documentation

Version:
0.2
Author:
Andreas Meissl (e9726184@student.tuwien.ac.at)
Date:
2007-06-20

Introduction

Sparse sets are sets usually containing only a few members. The typical representations for dense sets doesn't fit very well for sparse sets in terms of runtime and memory consumption. And even for sparse set optimized representations don't fit for each use case.

This collection of five different implementations of the same set interface was tested for a concrete use case as followpos set (as described in "Compilers. Principles, Techniques and Tools" by Aho, Sethi, Ullman - Addison-Wesley, 1985) in deterministic finite automata (DFA).

Representations

The following five representations of sets were implemented and tested.

Tests

Two concrete use cases of a DFA were used to log calls to the different methods of the set interface. The first one was representing the Microsoft SQL grammar, the second one a concrete regular expression. The test overhead (parsing the log file, temporarly storing the set instances) was measured using an empty implementation (sps::Dummy) of the interface. A description of the log file format can be found at the sps::SparseSetTest::Test() function documentation.

The tests were executed on a Win32 system (Intel Core 2 Duo, 2.4 GHz, 2GB RAM).

DFAs for SQL grammar

In this case many small DFAs were build. For each production of the SQL grammar one. This resulted in more dense sets. The follow position numbers stood small and therefore the representations without offsets (sps::DenseBitVector, sps::RiceSet, sps::BitTree) didn't cause so much memory overhead like for the second testcase.

The Details

The test log dfalog_sql.txt with 2,244,167 instructions (including constructor and destructor calls).

DFA for regular expression

In this case one big DFA was build. The regular expression
(abc|bcd|efg|h?i?j){100,1000} 
was used for the test. This caused many small sets partially filled with high numbers. Therefore the representations without offsets consumed much more memory than the representations with offsets.

The Details

The test log dfalog_regex.txt with 656,380 instructions (including constructor and destructor calls).
Generated on Wed Jun 20 21:20:43 2007 for SparseSet by  doxygen 1.5.2