True Inversion of a Hash in Ruby
by Tilo Sloboda, Nov 2004

The Ruby Hash.invert method should come with a warning label.

 

What do you expect if you want to compute an inverted Hash?

If you have a math background, you would expect that performing an “invert” operation twice results in the original hash.

Let's see what Ruby's built-in Hash#invert method does to a hash:

        # given a hash containing words for numbers in several languages (English, German, Spanish, Japanese)
        #
        h = {"eins"=>1, "drei"=>3, "uno"=>1, "one"=>1, "two"=>2, "san"=>3, "ichi"=>1, "three"=>3, "four"=>4}
        h.invert
         => {1=>"one", 2=>"two", 3=>"drei", 4=>"four"}  # oops, data is lost!

        #   the above result IS SIMPLY WRONG!   Why?

        h.invert.invert
         => {"two"=>2, "one"=>1, "drei"=>3, "four"=>4}  # and you cannot go back...

        h.invert.invert == h        # if you have a math background, your stomach will ache looking at this ;-)
         => false
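A quick way to see the loss, independent of which key happens to survive in a particular Ruby version, is to compare sizes: the inverted hash can hold at most one entry per distinct value. A small check:

```ruby
h = {"eins"=>1, "drei"=>3, "uno"=>1, "one"=>1, "two"=>2,
     "san"=>3, "ichi"=>1, "three"=>3, "four"=>4}

inverted = h.invert

# Hash#invert keeps only one key per value, so its size equals the
# number of distinct values, not the number of original entries.
puts h.size                   # 9 entries in the original
puts inverted.size            # 4 entries after inversion
puts h.size - inverted.size   # 5 key/value pairs were silently dropped

# Which keys get dropped depends on the hash iteration order of the Ruby version.
lost = h.keys - inverted.values
puts lost.sort.inspect
```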

Not only is the above result very questionable, but you actually lose data that was stored in the original hash, and you cannot get it back. I would say that Ruby's built-in Hash.invert method is simply broken! For an explanation of why, see below.

 

Correctly working Hash.inverse Implementation

If you use the new implementation, you can either access it as Hash.inverse, or you can replace the old Hash.invert with the new one and still access the old method as Hash.old_invert (see below).

        # given a hash containing words for numbers in several languages (English, German, Spanish, Japanese)
        #
        > h = {"eins"=>1, "drei"=>3, "uno"=>1, "one"=>1, "two"=>2, "san"=>3, "ichi"=>1, "three"=>3, "four"=>4}
         => {"uno"=>1, "three"=>3, "two"=>2, "eins"=>1, "ichi"=>1, "san"=>3, "one"=>1, "drei"=>3, "four"=>4}

        > h.inverse
         => {1=>["one", "ichi", "eins", "uno"], 2=>"two", 3=>["drei", "san", "three"], 4=>"four"}  # preserves data!

        > h.inverse.inverse
         => {"uno"=>1, "three"=>3, "two"=>2, "eins"=>1, "san"=>3, "ichi"=>1, "one"=>1, "drei"=>3, "four"=>4}  # you can revert the operation

        > h.inverse.inverse == h   # true here (but see the corner case below)
         => true
 
Isn't that a much more pleasing result?
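The file invert_hash.rb itself is not reproduced here. As a minimal sketch, assuming Ruby 1.9 or later (where a plain Hash keeps insertion order), Hash#inverse can follow the same logic as the OrderedHash version in the corner-case note: each value becomes a key, keys that share a value are collected into an Array, and an Array value is treated as a list of separate values. That last rule is what lets a second inversion rebuild the original pairs.

```ruby
class Hash
  # Sketch of Hash#inverse: maps each value back to its key(s).
  # Keys that share a value are collected into an Array; an Array value
  # is treated as several values, each mapping back to the same key.
  def inverse
    result = {}
    each_pair do |key, value|
      values = value.is_a?(Array) ? value : [value]
      values.each do |v|
        result[v] = result.key?(v) ? [result[v], key].flatten : key
      end
    end
    result
  end
end

h = {"eins"=>1, "drei"=>3, "uno"=>1, "one"=>1, "two"=>2,
     "san"=>3, "ichi"=>1, "three"=>3, "four"=>4}

p h.inverse
p h.inverse.inverse == h   # => true (Hash#== ignores key order)
```

On Ruby 1.9 and later, this sketch also round-trips the Hash of Arrays from the corner-case note, because the plain Hash keeps the Array elements in insertion order. Like the original, it uses `flatten`, so Array keys would be split apart.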

 

Overloading Hash.invert

If you want to replace the old method completely, you can do this:
        class Hash
            alias old_invert invert     # keep the built-in method available as old_invert

            def invert                  # requires Hash#inverse to be defined already
               inverse
            end
        end

 

If you always want Hash.invert to be replaced, you can modify the file invert_hash.rb by removing the line that contains __END__.

 

Download

Hash.inverse is also available through the Ruby Facets library.

... and is referenced in the Ruby Cookbook.

 

License

Freely available under the terms of the open-source "Artistic License" in combination with Addendum A (below).
In case you did not get a copy of the license along with the software, it is also available at: http://www.unixgods.org/~tilo/artistic-license.html

 

 

Note: There is one corner case

If you start out with a Hash of Arrays and invert it twice, you end up with a similar Hash that includes all the original values, but possibly out of order, e.g.:
        h = {:key1 => [:a, :b, :c], :key2 => [:d, :e, :f]}

        h.inverse
         => {:a=>:key1, :b=>:key1, :c=>:key1, :d=>:key2, :e=>:key2, :f=>:key2} 

        h.inverse.inverse
         => { :key2 => [:d, :e, :f], :key1 => [:a, :c, :b]}  # preserves data, but h.inverse.inverse != h  in this case because order in the arrays is not preserved
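
The reason for this is that a regular Hash in Ruby 1.8 does not preserve insertion order. (Since Ruby 1.9, Hash itself preserves insertion order, so the OrderedHash below is only needed on older Rubies.)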
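The round trip fails only because of the Array values: Hash#== ignores the order of keys, but Array#== compares element by element, so a single reordered Array value makes the two hashes unequal. A small check using the values from the example above:

```ruby
# Hash equality does not depend on the order in which keys were inserted ...
a = {:key1 => [:a, :b, :c], :key2 => [:d, :e, :f]}
b = {:key2 => [:d, :e, :f], :key1 => [:a, :b, :c]}
puts a == b                         # true

# ... but Array equality does depend on element order.
puts [:a, :b, :c] == [:a, :c, :b]   # false

# So a hash whose Array value came back reordered is no longer equal.
c = {:key2 => [:d, :e, :f], :key1 => [:a, :c, :b]}
puts a == c                         # false
```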

To fix this, you will need an OrderedHash:
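        require 'active_support'   # we need an OrderedHash

        class Hash

          # Returns the inverse mapping: each value becomes a key, and keys that
          # share a value are collected into an Array. Array values are treated as
          # lists of separate values, so inverting twice gives back the original pairs.
          def inverse
            i = ActiveSupport::OrderedHash.new
            each_pair do |k, v|
              if v.is_a?(Array)
                v.each do |x|
                  i[x] = i.has_key?(x) ? [i[x], k].flatten : k
                end
              else
                i[v] = i.has_key?(v) ? [i[v], k].flatten : k
              end
            end
            i
          end

        end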

Note

The corner case is mentioned in this blog post, but the author accidentally wrote "If your original hash used arrays as hash keys" instead of "If your original hash used arrays as hash values". It doesn't make much sense to use arrays as hash keys ;-) Using an OrderedHash fixes the problem of arrays as hash values.

 

 

Why is Ruby's Hash.invert broken?

I believe that it's broken because of an inaccurate design assumption.

Simple explanation:

Typically you use a Hash to keep track of some data, storing values associated with each item's key. In the real world, multiple keys can map to the same value. Ruby's Hash.invert does not account for this, so it can't cope with it.
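Grouping the keys by their value makes this many-to-one relationship visible. A minimal sketch using Enumerable#group_by and Hash#transform_values (Ruby 2.4 or later); unlike Hash.inverse, it always returns an Array of keys, even when only one key maps to a value, and no key is lost:

```ruby
h = {"eins"=>1, "drei"=>3, "uno"=>1, "one"=>1, "two"=>2,
     "san"=>3, "ichi"=>1, "three"=>3, "four"=>4}

# group_by yields [key, value] pairs; group them by value, then keep only the keys.
keys_by_value = h.group_by { |_key, value| value }
                 .transform_values { |pairs| pairs.map(&:first) }

p keys_by_value
```

Always using Arrays makes the result uniform to consume, at the cost of losing the simple `value => key` form for values that have a single key.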

Longer Explanation:

The Ruby Hash class is a misnomer at best.

If you studied algorithms in computer science, then you learned that a hash is a data structure with a mapping function that computes a key for each piece of data you want to place in the hash, e.g. f(value) = key. The key concept of a hash is that the key is computed from the data/value. Often (for hash tables without collision resolution), algorithm designers assume that the key is unique for each piece of data, and that no two pieces of data generate the same key:

    (A1)  For all value1, value2 with f(value1) = key1 and f(value2) = key2:   value1 != value2  <==>  key1 != key2

    That means, in plain English: each value has only one key, and each key has only one value.

Now here's what's wrong with Ruby's implementation of class Hash, and why class Hash in Ruby is not the same as a hash data structure in CS at all! Ruby's class Hash is actually a misnomer; for lack of a better word, it should rather be called Dictionary. And that's how users use it: as a lookup dictionary for arbitrary key/value pairs, in which (in general) several different keys can look up the same data.

Ruby's Hash.invert method was probably based on assumption A1, which is not necessarily true for the data we may want to put into a hash. That's why Hash.invert does not work properly.
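Assumption A1 holds for a given hash exactly when no two keys share a value. A minimal check, using the hypothetical helper name `invert_lossless?`, shows that Hash#invert round-trips only in that case:

```ruby
# Hypothetical helper: true when every value occurs only once (assumption A1),
# which is exactly when Hash#invert loses no entries.
def invert_lossless?(hash)
  hash.values.uniq.size == hash.size
end

unique = {"one"=>1, "two"=>2, "three"=>3}
shared = {"one"=>1, "uno"=>1, "two"=>2}

puts invert_lossless?(unique)         # true
puts unique.invert.invert == unique   # true

puts invert_lossless?(shared)         # false
puts shared.invert.invert == shared   # false
```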