Jump to content

User:Guanaco96/sandbox

From Wikipedia, the free encyclopedia

Definitions

[edit]

Consider a family of functions . We say that is an LSH family[1][2][3] for

  • a metric space ,
  • a threshold ,
  • an approximation factor ,
  • and probabilities .

if it satisfies the following condition. For any two points and a hash function chosen uniformly at random from :

  • If , then (i.e., p and q collide) with probability at least ,
  • If , then with probability at most .

Such a family is called -sensitive.

Alternatively[4] it is defined with respect to a universe of items U that have a similarity function . An LSH scheme is a family of hash functions H coupled with a probability distribution D over the functions such that a function chosen according to D satisfies the property that for any .

Locality-preserving hashing

[edit]

A locality-preserving hash is a hash function f that maps points in a metric space to a scalar value such that

for any three points .[citation needed]

In other words, these are hash functions where the relative distance between the input values is preserved in the relative distance between the output hash values; input values that are closer to each other will produce output hash values that are closer to each other.

This is in contrast to cryptographic hash functions and checksums, which are designed to have random output difference between adjacent inputs.

Locality preserving hashes are related to space-filling curves.[how?]

  1. ^ Cite error: The named reference MOMD was invoked but never defined (see the help page).
  2. ^ Gionis, A.; Indyk, P.; Motwani, R. (1999). "Similarity Search in High Dimensions via Hashing". Proceedings of the 25th Very Large Database (VLDB) Conference.
  3. ^ Indyk, Piotr.; Motwani, Rajeev. (1998). "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality.". Proceedings of 30th Symposium on Theory of Computing.
  4. ^ Charikar, Moses S. (2002). "Similarity Estimation Techniques from Rounding Algorithms". Proceedings of the 34th Annual ACM Symposium on Theory of Computing. pp. 380–388. CiteSeerX 10.1.1.147.4064. doi:10.1145/509907.509965.