Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web.

Authors:
  • David Karger Laboratory for Computer Science, MIT, Cambridge, MA
  • Eric Lehman Laboratory for Computer Science, MIT, Cambridge, MA
  • Tom Leighton Laboratory for Computer Science, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA
  • Rina Panigrahy Laboratory for Computer Science, MIT, Cambridge, MA
  • Matthew Levine Laboratory for Computer Science, MIT, Cambridge, MA
  • Daniel Lewin Laboratory for Computer Science, MIT, Cambridge, MA

Complete Citation

Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and Lewin, D. 1997. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on theory of Computing (El Paso, Texas, United States, May 04 - 06, 1997). STOC '97. ACM, New York, NY, 654-663. DOI= http://doi.acm.org/10.1145/258533.258660

Abstract

Hash tables are extensively used in networking to implement data-structures that associate a set of keys to a set of values, as they provide O(1), query, insert and delete operations. However, at moderate or high loads collisions are quite frequent which not only increases the access time, but also induces nondeterminism in the performance. Due to this non-determinism, the performance of these hash tables degrades sharply in the multi-threaded network processor based environments, where a collection of threads perform the hashing operations in a loosely synchronized manner. In such systems, it is critical to keep the hash operations more deterministic.

A recent series of papers have been proposed, which employs a compact on-chip memory to enable deterministic and fast hash queries. While effective, these schemes require substantial onchip memory, roughly 10-bits for every entry in the hash table. This limits their general usability; specifically in the network processor context, where on-chip resources are scarce. In this paper, we propose a novel hash table construction called Peacock hash, which reduces the on-chip memory by more than 10-folds while keeping a high degree of determinism in performance. This significantly reduced on-chip memory not only makes Peacock hashing much more appealing for the general use but also makes it an attractive choice for the implementation of a hash hardware accelerator on a network processor.

Annotations

This paper is interesting to me because of the consistent hashing that is introduced. Basically, the problem with distributed hash tables is that the number of hash tables is not likely to be consistent. A table may be added or removed at any time. In tradition hashing functions, this meant that many (if not most) objects would be remapped if the number of tables changed. For any large scale system this is obviously a major problem.

Thus, this paper introduces the idea of a consistent hash function. The consistent hash function will map an object to the same server (as long as it exists) regardless of the number of hash tables currently in the set. It achieves this by basically adding a second hash function that hashes the hash tables them selves into the number range. The items are placed in each hash table that is the nearest. When an new hash table is added only the immediately adjacent tables need to be checked for any migrating items, for a O(log n / K) number of items. Same goes for when a hash table is removed from the set.

-- DavidSalyers - 04 Jun 2008

Topic revision: r1 - 04 Jun 2008 - 18:40:05 - DavidSalyers
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback