SparseSet Documentation
- Version:
- 0.2
- Author:
- Andreas Meissl (e9726184@student.tuwien.ac.at)
- Date:
- 2007-06-20
Sparse sets are sets usually containing only a few members. The typical representations for dense sets doesn't fit very well for sparse sets in terms of runtime and memory consumption. And even for sparse set optimized representations don't fit for each use case.
This collection of five different implementations of the same set interface was tested for a concrete use case as followpos
set (as described in "Compilers. Principles, Techniques and Tools" by Aho, Sethi, Ullman - Addison-Wesley, 1985) in deterministic finite automata (DFA).
The following five representations of sets were implemented and tested.
- sps::BitList
A sorted linked list of nodes existing of an offset and a machine word storing the bits. - sps::BitTree
A binary tree, bits are stored in the leaves, the position of the leaf in the tree encodes the most significant bits of the set members. - sps::DenseBitVector
The typical dense set implementation. A dynamic vector of machine words. The position of the node in the vector determines the offset. - sps::SparseBitVector
A sorted dynamic vector of nodes existing of an offset and a machine word storing the bits. - sps::RiceSet
A sparse set representation by Preston Briggs and Linda Torczon (Rice University) published in the ACM Letters on Programming Languages and Systems, Vol. 2, Nos. 1-4, March-December 1993, Pages 59-69. A dense and a sparse vector. Each node in the sparse vector links to a node in the dense vector and vice versa.
Two concrete use cases of a DFA were used to log calls to the different methods of the set interface. The first one was representing the Microsoft SQL grammar, the second one a concrete regular expression. The test overhead (parsing the log file, temporarly storing the set instances) was measured using an empty implementation (sps::Dummy) of the interface. A description of the log file format can be found at the sps::SparseSetTest::Test() function documentation.
The tests were executed on a Win32 system (Intel Core 2 Duo, 2.4 GHz, 2GB RAM).
In this case many small DFAs were build. For each production of the SQL grammar one. This resulted in more dense sets. The follow position numbers stood small and therefore the representations without offsets (sps::DenseBitVector, sps::RiceSet, sps::BitTree) didn't cause so much memory overhead like for the second testcase.
The test log dfalog_sql.txt
with 2,244,167 instructions (including constructor and destructor calls).
- sps::Dummy
- Runtime: 1,781 ms
- Memory peak: 1,056 KB
- sps::BitList
- Runtime: 1,828 ms
- Memory peak: 1,460 KB
- sps::BitTree
- Runtime: 1,953 ms
- Memory peak: 1,432 KB
- sps::DenseBitVector
- Runtime: 1,829 ms
- Memory peak: 1,228 KB
- sps::SparseBitVector
- Runtime: 1,828 ms
- Memory peak: 1,232 KB
- sps::RiceSet
- Runtime: 1,875 ms
- Memory peak: 5,884 KB
In this case one big DFA was build. The regular expression (abc|bcd|efg|h?i?j){100,1000}
was used for the test. This caused many small sets partially filled with high numbers. Therefore the representations without offsets consumed much more memory than the representations with offsets.
The test log dfalog_regex.txt
with 656,380 instructions (including constructor and destructor calls).
Generated on Wed Jun 20 21:20:43 2007 for SparseSet by
1.5.2