> it’s hard or impossible to define hash functions that map approximately equal keys to identical hash values. Why? Let’s take a look at what happens with the common epsilon comparison, for example: two numbers a, b are considered equal if |a – b| < ε. With this definition of equality all numbers between -ε/2; and ε/2 are considered equal and therefore must have the same hash value h. But the numbers between 0 and ε are also equal to each other, so their hash value must be h as well.
> If we continue in this manner we see that all all numbers must have the same hash value h regardless of the choice of ε. The idea of approximate comparison is unfortunately hard to reconcile with the non-approximate nature of hash functions.
I think there is at least one non-sequitur in there? The only thing you prove is that directly modeling hash keys on top of that notion of equality is not useful / the model is weak. I don't think equality is typically associative with epsilon comparison.
(The license of this comment forbids it from being used as support in favor of floats as hash keys)
This is calling up dim old memories of Java (which I haven't used in years) and boxing primitive float values into Float objects which provide their own implementation of equals() and hashCode().
AFAICT the pre-hash values come from this method [0] which returns an integer which is a 1:1 representation of the float bytes except that NaNs are all collapsed to one canonical form. So at least at this phase, it doesn't quantize any virtually-equal floats together.
Skimming an implementation of HashMap [1], I didn't notice any obvious "do something special for Floats" code.
> it’s hard or impossible to define hash functions that map approximately equal keys to identical hash values. Why? Let’s take a look at what happens with the common epsilon comparison, for example: two numbers a, b are considered equal if |a – b| < ε. With this definition of equality all numbers between -ε/2; and ε/2 are considered equal and therefore must have the same hash value h. But the numbers between 0 and ε are also equal to each other, so their hash value must be h as well.
> If we continue in this manner we see that all all numbers must have the same hash value h regardless of the choice of ε. The idea of approximate comparison is unfortunately hard to reconcile with the non-approximate nature of hash functions.
I think there is at least one non-sequitur in there? The only thing you prove is that directly modeling hash keys on top of that notion of equality is not useful / the model is weak. I don't think equality is typically associative with epsilon comparison.
(The license of this comment forbids it from being used as support in favor of floats as hash keys)
This is calling up dim old memories of Java (which I haven't used in years) and boxing primitive float values into Float objects which provide their own implementation of equals() and hashCode().
AFAICT the pre-hash values come from this method [0] which returns an integer which is a 1:1 representation of the float bytes except that NaNs are all collapsed to one canonical form. So at least at this phase, it doesn't quantize any virtually-equal floats together.
Skimming an implementation of HashMap [1], I didn't notice any obvious "do something special for Floats" code.
[0] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
[1] https://github.com/openjdk/jdk/blob/master/src/java.base/sha...
JavaScript has a whole definition of equality for this particular case called SameValueZero: https://tc39.es/ecma262/multipage/abstract-operations.html#s... Everything is compared bitwise equal (so to speak), except 0 == -0.
I don't really understand the reason for this instead of Object.is equality (aka SameValue), which would distinguish -0 and 0.Dang, the author never got to C++, and I was curious. Guess I'll have to dig into that one myself