Mergemem Project

Description

The mergemem program reduces the memory consumption of processes under the Linux operating system. Many programs contain memory areas with identical content that remain undetected by the operating system. Typically, these areas contain data that were generated on startup and remain unchanged for long periods. With mergemem such areas are detected and shared. The sharing is performed at the operating system level and is invisible to user-level programs.
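The detection step can be pictured with a small user-space sketch. It is not mergemem's implementation (the real sharing happens inside the operating system); it only assumes a fixed 4096-byte page size and uses zlib.crc32 as a stand-in checksum to find candidate pages, followed by a full byte comparison before two pages count as identical. The names PAGE_SIZE, split_pages and find_identical_pages are hypothetical.

```python
import zlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: page size on i386


def split_pages(memory):
    """Split a memory image into full pages; a trailing partial page is ignored."""
    return [memory[i:i + PAGE_SIZE]
            for i in range(0, len(memory) - PAGE_SIZE + 1, PAGE_SIZE)]


def find_identical_pages(images):
    """images maps a process name to its memory image (bytes).

    Returns a list of groups; each group lists the (name, page_index)
    pairs whose pages have exactly the same content.
    """
    buckets = defaultdict(list)
    for name, memory in images.items():
        for index, page in enumerate(split_pages(memory)):
            # Cheap checksum first: only pages in the same bucket can be equal.
            buckets[zlib.crc32(page)].append((name, index, page))

    groups = []
    for candidates in buckets.values():
        # Checksums may collide, so confirm equality on the actual bytes.
        by_content = defaultdict(list)
        for name, index, page in candidates:
            by_content[page].append((name, index))
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    a = bytes(PAGE_SIZE) + b"\x01" * PAGE_SIZE
    b = b"\x02" * PAGE_SIZE + bytes(PAGE_SIZE)
    print(find_identical_pages({"a": a, "b": b}))
```

In this example only the all-zero page occurs in both images, so it is the only group reported. A real merger would then map both pages to one physical page and rely on copy-on-write to separate them again on modification.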

mergemem is particularly useful if you run many instances of interpreters and emulators (like Java or Prolog) that keep their code in private data areas. Other programs can benefit as well, albeit to a lesser degree.

mergemem was realized in a student project by Philipp Richter and Philipp Reisner at TU Wien.

Getting the package

Please refer to one of the following sites via http:

Alternatively, mergemem can be obtained via anonymous ftp:

% ftp ftp.complang.tuwien.ac.at
  login: anonymous
  password: my@email.address
  ftp>cd pub/ulrich/mergemem/versions
  ftp>ls -ltr
  ftp>binary
  ftp>get XXX
  ftp>quit

Installation and security considerations

From the mmlib (3) manual:
In a hostile environment only select processes should be merged whose purpose and operation is well known. Arbitrary processes of untrusted users should not be merged.

Sharing pages of unrelated user processes might provide an indirect hint about the existence of other users' pages of same content. Statistical information about sharing and memory usage might be exploited by unauthorized processes to this end. Even without such information the following scenario is possible.

A hostile process tries to guess the content of a confidential page by creating a set of arbitrary pages containing some guesses. An authorized process merges one of those pages with the confidential page. The sharing of the two pages is not directly visible to the hostile process. But modifying a shared page takes much longer, because it causes a copy-on-write page fault.
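The scenario can be made concrete with a toy model. Everything in it is an assumption made for illustration: the class ToyMemory, its method names and the two cost constants are hypothetical, the costs are arbitrary units rather than measurements, and none of it is part of mmlib. The sketch only shows why a slow write reveals which guess was merged with the confidential page.

```python
class ToyMemory:
    """Toy model of copy-on-write sharing (hypothetical, not mmlib)."""

    WRITE_COST = 1        # made-up cost of an ordinary write
    COW_FAULT_COST = 100  # made-up extra cost of a copy-on-write fault

    def __init__(self):
        self.frames = {}   # frame id -> page content
        self.mapping = {}  # (owner, page name) -> frame id
        self.next_frame = 0

    def add_page(self, owner, name, content):
        self.frames[self.next_frame] = bytearray(content)
        self.mapping[(owner, name)] = self.next_frame
        self.next_frame += 1

    def merge_identical(self):
        """Map all pages with equal content to one frame, as a merger would."""
        first_seen = {}
        for key, frame in list(self.mapping.items()):
            content = bytes(self.frames[frame])
            if content in first_seen:
                self.mapping[key] = first_seen[content]
            else:
                first_seen[content] = frame

    def write(self, owner, name, offset, value):
        """Write one byte and return the modeled cost of doing so."""
        key = (owner, name)
        frame = self.mapping[key]
        sharers = sum(1 for f in self.mapping.values() if f == frame)
        cost = self.WRITE_COST
        if sharers > 1:
            # Shared page: the writer first gets a private copy.
            self.frames[self.next_frame] = bytearray(self.frames[frame])
            self.mapping[key] = self.next_frame
            frame = self.next_frame
            self.next_frame += 1
            cost += self.COW_FAULT_COST
        self.frames[frame][offset] = value
        return cost


if __name__ == "__main__":
    memory = ToyMemory()
    memory.add_page("victim", "confidential", b"key=B")
    guesses = ["key=A", "key=B", "key=C"]
    for guess in guesses:
        memory.add_page("attacker", guess, guess.encode())
    memory.merge_identical()  # performed by an authorized process
    slow = [g for g in guesses
            if memory.write("attacker", g, 0, 0) > ToyMemory.WRITE_COST]
    print(slow)
```

Only the write to the page that had been merged pays the extra cost, so the hostile process learns which guess matched without ever reading the confidential page. This is why the manual advises merging only processes whose purpose and operation are well known.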

To install, simply uncompress and untar the distribution in a convenient directory and follow the instructions in the INSTALL file.

Further developments

Currently available: the tentative manual mmlib (3), the source (based on 0.13), and pre 0.14.

Performance considerations

Blocking: Pages are merged with many small system calls, so mergemem does not block the operating system in its operation.

Worst-case scenario: Pages are merged and split again immediately afterwards.

Sample savings

The following programs were run under i386-Linux 2.0.33. Memory requirements and savings are given per instance. The numbers should be multiplied by the actual number of instances (i.e., big savings are only realized when running many instances).

Depending on what the processes do afterwards, the amount of sharing may decrease, but most of the sharing seems to remain.
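As a worked example of the multiplication, the sketch below uses the SICStus Prolog figures for the original saved state listed further down (2216 kB for the first instance, +964 kB per further instance, +84 kB per further merged instance). It assumes that memory grows linearly with the number of instances and that the sharing persists; the function name total_memory_kb is hypothetical.

```python
def total_memory_kb(first_kb, further_kb, instances):
    """Total memory for a number of instances, assuming linear growth."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    return first_kb + (instances - 1) * further_kb


if __name__ == "__main__":
    instances = 10
    unmerged = total_memory_kb(2216, 964, instances)
    merged = total_memory_kb(2216, 84, instances)
    print(unmerged, merged, unmerged - merged)  # 10892 2972 7920
```

For ten instances the saving is 9 x 880 kB = 7920 kB, whereas for two instances it is only the 880 kB shown in the table.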

SICStus Prolog 3#6

Original saved state

1st instance ................  2216 kB
further instance ............  +964 kB
further instance merged .....   +84 kB  i.e.  880 kB saved

I.e. any further instance requires initially only 84 kB instead of 964 kB.

Saved state with lots of code and libraries

1st instance ................  7608 kB
further instance ............ +3616 kB
further instance merged .....   +84 kB  i.e. 3532 kB saved

JDK Java (Version 1.1.3)

Invocation: bin/appletviewer demo/name/example1.html

SpreadSheet

1st instance ................ 12488 kB
further instance ............ +7612 kB
further instance merged ..... +2452 kB  i.e. 5160 kB saved

MoleculeViewer

1st instance ................ 13476 kB
further instance ............ +8612 kB
further instance merged ..... +3112 kB  i.e. 5500 kB saved

MoleculeViewer+SpreadSheet

MoleculeViewer .............. 13476 kB
MoleculeViewer+SpreadSheet .. +7744 kB
merged ...................... +2508 kB  i.e. 5236 kB saved

ghostview and ghostscript

Viewing the same document:

1st instance ................  3888 kB
further instance ............ +1284 kB
further instance merged .....  +476 kB  i.e.  808 kB saved

Viewing two different documents:

1st document ................  3536 kB
1st and 2nd document ........ +2724 kB
after merging ............... +2163 kB  i.e.  561 kB saved

SWI-Prolog (Version 2.9.9)

1st instance ................  1004 kB
further instance ............  +516 kB
further instance merged .....  +160 kB  i.e.  356 kB saved

The measurements were performed as follows: First, file buffers etc. were flushed by running a process that allocates and modifies slightly more memory than is physically available. Then, free was used to measure the available memory. (For this reason the first instance appears to use a lot of memory; part of it is due to file buffering.)
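A similar measurement can be sketched in a few lines. The sketch assumes a /proc/meminfo with "Key: value kB" lines as found on current Linux kernels, which is not necessarily the format of Linux 2.0.33, and it reads MemFree directly instead of calling free. The flushing step is deliberately left out, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_free_kb(path="/proc/meminfo"):
    """Return the currently free memory in kB (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemFree"]


def instance_cost_kb(free_before_kb, free_after_kb):
    """Memory taken by whatever was started between two readings."""
    return free_before_kb - free_after_kb


if __name__ == "__main__":
    sample = "MemTotal:       64000 kB\nMemFree:        12000 kB\n"
    before = parse_meminfo(sample)["MemFree"]
    print(instance_cost_kb(before, before - 964))  # 964
```

On a real system, read_free_kb would be called once before and once after starting a further instance, and again after merging.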

Related information about hash functions

Some of this material may be useful for finding faster hash functions/checksums: Ozan Yigit, Bob Jenkins (old) Hash Functions and Block Ciphers