Tuesday, April 28, 2009

CaseInsensitiveMap<V>

I recently wrote some utility code which we'll be using to process bundle manifests, typically to map header names to typed values. It was surprising that this code was neither available elsewhere nor completely trivial to implement.

So what was it? A map with String keys which is insensitive to the case of the keys and which preserves the case of the keys. Apache Commons has a CaseInsensitiveMap which whacks the keys to lower case on input. I was tempted to use that code, but in our environment, it's important to preserve the case of keys.

For example, the bundlor tool will eventually use this code to create bundle manifests from templates. If the user carefully spells headers in the template with mixed case, e.g. "Export-Package", they would probably be disappointed to see the headers converted to lower case in the generated manifest.

Implementing an (Apache licensed) case-preserving CaseInsensitiveMap was an interesting exercise. I saved some time by extending AbstractMap and AbstractSet (for the key and entry sets). I used an instance of ConcurrentHashMap to maintain the state. I exposed one internal type to be package visible so I could get 100% unit test coverage and made some inner classes static to stop findbugs complaining.

Why blog about this? Simply to increase your chances of finding the class if you need it.

8 comments:

rocket-gwt said...

What about a TreeMap with a The case insensitive Comparator that lives on String.

Glyn said...

Thanks for bringing this up! Given the TreeMap implementation is about 6 lines of code, I was really tempted to switch. If I hadn't already written and tested the current code, I might have gone down that route.

However, re-running my unit tests against the 6 line implementation showed up several weaknesses.

The entry set test method threw a ConcurrentModificationException, although it was not doing anything multi-threaded or particularly complex. I think this is a usability issue for my users.

put with a null key or a null value did not throw a NPE, so this would need overriding. There were some other NPEs that would need fixing up too.

addAll on the entry set no longer threw UnsupportedOperationException, but Map.entrySet says that the entry set does not support addAll. Similarly for the key set.

Finally, there are some disadvantages to sorting.

Firstly, sorting is not required, so TreeMap appearing in the inheritance tree of the 6 line implementation is a bit of a pain.

Secondly, sorted tree algorithms cost O(log n) for put/get compared to O(1) for hash tables (assuming the collision lists don't grow too long).

Thirdly, the javadoc for TreeMap says that the ordering must be consistent with equals, meaning that the comparator should return 0 if and only if equals returns true, if the Map interface is to be correctly implemented. This would not be the case for case-insensitive comparison.

rocket-gwt said...

I'm really shocked the Set views for key/ entries don't throw UOE for any of the add operations. It just doesn't make sense to add a key, after all where's the value.

While TreeSets are slightly slower than HashMaps this figure is not a magnitude number, but from my experience a few percentage points. After all TreeMsps etc use red/black trees which are slightly slower than a map with buckets etc. Sorting comes for free it's no big deal...

If the String case insensitive Comparator is trivial to replace so that it handles or denies null keys and it's equals is consistent...

The most mystifying problem you describe is the ConcurrentModEx I thought the views were live so one would imagine these sorts of things wouldn't happen...

Oh well I guess the important thing is your solution works :)...

Glyn said...

Don't be too shocked about the lack of UOE for addAll operations. Although it's not correct, it's benign!

The problem is that TreeMap uses an AbstractSet to implement its key and entry sets but does not override addAll.

AbstractSet.addAll (inherited from AbstractCollection) only throws UOE if it calls add and add throws UOE. So if you call addAll with a null or empty collection, you don't get a UOE. But, to be fair, it hasn't updated the set, so it's fairly harmless.

In my opinion, this behaviour does not implement Map.keySet or Map.entrySet which both state that the returned set "does not support the add or addAll operations".

The solution in my implementation is to override addAll and throw UOE regardless of the input parameter.

I don't mind being called a pedant on this issue. I was much more concerned by the unexpected CME which could present a usability issue for callers of my code.

Anonymous said...

FYI Glyn, the link in the blog to the source of your implementation no longer works.

Glyn said...

Thanks Ryan. Fixed the bad links.

Unknown said...

Thirdly, the javadoc for TreeMap says that the ordering must be consistent with equals, meaning that the comparator should return 0 if and only if equals returns true, if the Map interface is to be correctly implemented. This would not be the case for case-insensitive comparison.

Well, no Map can be both case-insensitive and consistent with String.equals… It is thus not possible to implement a case-insensitive Map.

Glyn said...

@Didier: hmmm, I'm not sureI follow. I don't see any mention of consistency in the Map javadoc. If a case-insensitive Map never contains two keys which differ only in case, I don't see how such a map can be inconsistent.

Projects

OSGi (130) Virgo (59) Eclipse (10) Equinox (9) dm Server (8) Felix (4) WebSphere (3) Aries (2) GlassFish (2) JBoss (1) Newton (1) WebLogic (1)