Using floating point numbers as hash keys (2017)

readafterwrite.wordpress.com

16 points by jstrieb 7 days ago

baobun 21 hours ago

> it’s hard or impossible to define hash functions that map approximately equal keys to identical hash values. Why? Let’s take a look at what happens with the common epsilon comparison, for example: two numbers a, b are considered equal if |a – b| < ε. With this definition of equality all numbers between -ε/2; and ε/2 are considered equal and therefore must have the same hash value h. But the numbers between 0 and ε are also equal to each other, so their hash value must be h as well.

> If we continue in this manner we see that all all numbers must have the same hash value h regardless of the choice of ε. The idea of approximate comparison is unfortunately hard to reconcile with the non-approximate nature of hash functions.

I think there is at least one non-sequitur in there? The only thing you prove is that directly modeling hash keys on top of that notion of equality is not useful / the model is weak. I don't think equality is typically associative with epsilon comparison.

(The license of this comment forbids it from being used as support in favor of floats as hash keys)

Terr_ a day ago

This is calling up dim old memories of Java (which I haven't used in years) and boxing primitive float values into Float objects which provide their own implementation of equals() and hashCode().

AFAICT the pre-hash values come from this method [0] which returns an integer which is a 1:1 representation of the float bytes except that NaNs are all collapsed to one canonical form. So at least at this phase, it doesn't quantize any virtually-equal floats together.

Skimming an implementation of HashMap [1], I didn't notice any obvious "do something special for Floats" code.

[0] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...

[1] https://github.com/openjdk/jdk/blob/master/src/java.base/sha...

anematode a day ago

JavaScript has a whole definition of equality for this particular case called SameValueZero: https://tc39.es/ecma262/multipage/abstract-operations.html#s... Everything is compared bitwise equal (so to speak), except 0 == -0.

    > new Map().set(-0, -0).set(0, 0).set(NaN, NaN)
    Map { 0 → 0, NaN → NaN }

I don't really understand the reason for this instead of Object.is equality (aka SameValue), which would distinguish -0 and 0.

advael a day ago

Dang, the author never got to C++, and I was curious. Guess I'll have to dig into that one myself