
Caching

Caching is a well-established technique for reducing the retrieval time and the network bandwidth needed for repeatedly requested data. After the data has been requested for the first time, it is stored (cached) on a medium that can deliver it to the requester quickly, or at least faster than the original request could be satisfied. Repeated requests for the same data are then satisfied from the caching medium. Which caching medium is used depends on the application domain [Tan96].
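The basic idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the caching medium is an in-memory dictionary, and `slow_fetch` is a hypothetical stand-in for the slow original retrieval.

```python
import time


class SimpleCache:
    """Keeps a copy of each object after it has been requested once."""

    def __init__(self, fetch):
        self._fetch = fetch   # slow retrieval function (hypothetical)
        self._store = {}      # the caching medium: here an in-memory dict
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            # Repeated request: satisfied from the caching medium.
            self.hits += 1
            return self._store[key]
        # First request: retrieve the data the slow way and keep a copy.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = value
        return value


def slow_fetch(key):
    """Stand-in for the original, slow data source (an assumption)."""
    time.sleep(0.01)
    return "data for " + key


cache = SimpleCache(slow_fetch)
first = cache.get("report")    # miss: goes to slow_fetch
second = cache.get("report")   # hit: served from the dict
```

Only the first call pays the cost of `slow_fetch`; the second is answered from the stored copy.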

In networking, caching not only reduces the transmission time but also the network bandwidth required for repeatedly requested data. Assume that several users are located in the same area and that they all require the same object, which is stored at some distant location. These users may either request it directly from the distant location or use a cache server located ``closer'' to them.

The first time the data is requested from the cache server, the cache server retrieves the object from the distant location and stores it. Subsequent requests are served from the cache server's copy. To ensure that the cache holds an up-to-date copy of the object, control messages have to be exchanged between the cache server and the server holding the original object. These messages are usually small compared to the size of the original data.
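One possible way to keep the cached copy up to date is sketched below. It assumes that the original server assigns a version number to each object and that the control message is a simple version query; the class and method names (`OriginServer`, `CacheServer`, `version`, `fetch`) are hypothetical and do not correspond to any particular protocol.

```python
class OriginServer:
    """Holds the original objects together with a version number."""

    def __init__(self):
        self._objects = {}   # name -> (version, data)
        self.transfers = 0   # number of full object transfers
        self.checks = 0      # number of small control messages

    def put(self, name, data):
        version = self._objects.get(name, (0, None))[0] + 1
        self._objects[name] = (version, data)

    def version(self, name):
        # Control message: only the version number is sent back.
        self.checks += 1
        return self._objects[name][0]

    def fetch(self, name):
        # Full transfer: version and data are sent back.
        self.transfers += 1
        return self._objects[name]


class CacheServer:
    """Serves its own copy as long as the origin reports the same version."""

    def __init__(self, origin):
        self._origin = origin
        self._copies = {}    # name -> (version, data)

    def get(self, name):
        copy = self._copies.get(name)
        if copy is not None and copy[0] == self._origin.version(name):
            return copy[1]                  # copy is still up to date
        copy = self._origin.fetch(name)     # missing or stale: retrieve it
        self._copies[name] = copy
        return copy[1]


origin = OriginServer()
origin.put("page", "v1 contents")
cache = CacheServer(origin)

a = cache.get("page")               # not cached yet: full transfer
b = cache.get("page")               # cached: one control message only
origin.put("page", "v2 contents")   # the original object changes
c = cache.get("page")               # stale copy detected, transferred again
```

In this run the three requests cause two full transfers and two control messages; the second request is answered without moving the object across the network again.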

If several cache servers are available, they may cooperate to achieve even greater benefits. If a cache server does not hold the requested data, it may ask other related caches for it. Arranging related caches in a hierarchical structure is known as cascading. Another technique, called neighboring, combines several caches into one big virtual cache: if some data are requested, each cache server can ask all other caches for them.

Cascading: The caches form a hierarchy (e.g., a tree), and if one cache server does not have the requested data, it asks its parent server. However, the height of the hierarchy has to be chosen carefully. A deep hierarchy usually brings very few benefits, since it takes too much time to pass through all levels, possibly up to the root server.
Neighboring: This strategy combines several caches into one big virtual cache. Each cache may access the data stored in the other caches. This improves the access time further if the requested data are available in one of the other caches and the link to these caches is faster than the link to the server holding the original data.
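
The two strategies can be combined in one small model. The following sketch is an illustration only, not the mechanism of any particular cache server: the `Cache` class, its `peek` method and the dictionary standing in for the server holding the original data are assumptions. A cache looks in its own storage first, then asks its neighbors, and finally asks its parent.

```python
class Cache:
    """A cache with an optional parent (cascading) and neighbors (neighboring)."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent         # next level up in the hierarchy, if any
        self.origin = origin or {}   # original data; only the root uses it
        self.neighbors = []          # caches forming the virtual cache
        self.store = {}

    def peek(self, key):
        """Answer a neighbor's query from local storage only."""
        return self.store.get(key)

    def get(self, key):
        """Return (value, name of the place that supplied it)."""
        if key in self.store:
            return self.store[key], self.name
        # Neighboring: ask the other caches of the virtual cache.
        for neighbor in self.neighbors:
            value = neighbor.peek(key)
            if value is not None:
                self.store[key] = value
                return value, neighbor.name
        # Cascading: ask the parent, which may in turn ask its own parent.
        if self.parent is not None:
            value, source = self.parent.get(key)
        else:
            value, source = self.origin[key], "origin"
        self.store[key] = value
        return value, source


origin = {"doc": "contents"}
root = Cache("root", origin=origin)
a = Cache("a", parent=root)
b = Cache("b", parent=root)
a.neighbors.append(b)
b.neighbors.append(a)

r1 = a.get("doc")   # misses everywhere, travels up to the origin
r2 = b.get("doc")   # supplied by neighbor a
r3 = b.get("doc")   # supplied by b's own copy
```

Each level a request has to climb adds another round trip, which is why a deep hierarchy gains little, while a neighbor is only worth asking if it is reachable faster than the server holding the original data.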

Both techniques described here are already in use on the ``World Wide Web'' (WWW) for the ``Hyper Text Transfer Protocol'' (HTTP) and the ``File Transfer Protocol'' (FTP). Cascading was introduced by the CERN WWW server ([Nie96] and [W3C95]), which was the first cache server (and even the first WWW server) available for the WWW. Neighboring was introduced by the Harvest Object Cache [BDH+94], which has evolved into the Squid software ([Pea97] and [Wes97]).


gschwind@infosys.tuwien.ac.at