SparseSet Documentation

Version:: 0.2

Author:: Andreas Meissl (e9726184@student.tuwien.ac.at)

Date:: 2007-06-20

Introduction

Sparse sets are sets usually containing only a few members. The typical representations for dense sets doesn't fit very well for sparse sets in terms of runtime and memory consumption. And even for sparse set optimized representations don't fit for each use case.

This collection of five different implementations of the same set interface was tested for a concrete use case as followpos set (as described in "Compilers. Principles, Techniques and Tools" by Aho, Sethi, Ullman - Addison-Wesley, 1985) in deterministic finite automata (DFA).

Representations

The following five representations of sets were implemented and tested.

sps::BitList
A sorted linked list of nodes existing of an offset and a machine word storing the bits.
sps::BitTree
A binary tree, bits are stored in the leaves, the position of the leaf in the tree encodes the most significant bits of the set members.
sps::DenseBitVector
The typical dense set implementation. A dynamic vector of machine words. The position of the node in the vector determines the offset.
sps::SparseBitVector
A sorted dynamic vector of nodes existing of an offset and a machine word storing the bits.
sps::RiceSet
A sparse set representation by Preston Briggs and Linda Torczon (Rice University) published in the ACM Letters on Programming Languages and Systems, Vol. 2, Nos. 1-4, March-December 1993, Pages 59-69. A dense and a sparse vector. Each node in the sparse vector links to a node in the dense vector and vice versa.

Tests

Two concrete use cases of a DFA were used to log calls to the different methods of the set interface. The first one was representing the Microsoft SQL grammar, the second one a concrete regular expression. The test overhead (parsing the log file, temporarly storing the set instances) was measured using an empty implementation (sps::Dummy) of the interface. A description of the log file format can be found at the sps::SparseSetTest::Test() function documentation.

The tests were executed on a Win32 system (Intel Core 2 Duo, 2.4 GHz, 2GB RAM).

DFAs for SQL grammar

In this case many small DFAs were build. For each production of the SQL grammar one. This resulted in more dense sets. The follow position numbers stood small and therefore the representations without offsets (sps::DenseBitVector, sps::RiceSet, sps::BitTree) didn't cause so much memory overhead like for the second testcase.

The Details

The test log dfalog_sql.txt with 2,244,167 instructions (including constructor and destructor calls).

sps::Dummy
- Runtime: 1,781 ms
- Memory peak: 1,056 KB
sps::BitList
- Runtime: 1,828 ms
- Memory peak: 1,460 KB
sps::BitTree
- Runtime: 1,953 ms
- Memory peak: 1,432 KB
sps::DenseBitVector
- Runtime: 1,829 ms
- Memory peak: 1,228 KB
sps::SparseBitVector
- Runtime: 1,828 ms
- Memory peak: 1,232 KB
sps::RiceSet
- Runtime: 1,875 ms
- Memory peak: 5,884 KB

DFA for regular expression

In this case one big DFA was build. The regular expression

(abc|bcd|efg|h?i?j){100,1000}

was used for the test. This caused many small sets partially filled with high numbers. Therefore the representations without offsets consumed much more memory than the representations with offsets.

The Details

The test log dfalog_regex.txt with 656,380 instructions (including constructor and destructor calls).

sps::Dummy
- Runtime: 563 ms
- Memory peak: 1,836 KB
sps::BitList
- Runtime: 625 ms
- Memory peak: 3,564 KB
sps::BitTree
- Runtime: 813 ms
- Memory peak: 6,204 KB
sps::DenseBitVector
- Runtime: 1,203 ms
- Memory peak: 27,364 KB
sps::SparseBitVector
- Runtime: 594 ms
- Memory peak: 2,548 KB
sps::RiceSet
- Runtime: 2,250 ms
- Memory peak: 758,968 KB

Generated on Wed Jun 20 21:20:43 2007 for SparseSet by

1.5.2