jeudi 18 septembre 2008

The method hashCode() and the method equals(Object obj)

The Object class implement two methods : hashCode() and equals(Object obj).

Most of the time you don't need to override them, except when you intend to use those objects with the collection API.

First let's refresh our memory about equals and hashcode and what they are for.

Most of the time developpers know what equals is for. It defines when two objects should be regarded as equal despite the fact they don't share the same memory location.

But when it comes to explain what is hashCode for, strangely there is much fewer able to explain.

Let's have a look to the javadoc :

hashCode returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by HashTable.

Indeed the main purpose for hashCode is for the collection API, as in the collection API many datastructure rely on the hashTable pattern : HashSet, LinkedHashSet, HashMap, HashTable for example.

How a HashTable work ?

A HashTable is an array of Vector (also called Bucket) with an initial capacity c, when you add an object to the hashTable, first you evaluate the bucketPosition = (hashCode modulo c) and you add the object to the bucket living in the bucketPosition index.

Then inside the bucket you use the equals method to check if the object don't already exist.

let's see what really happen when you add an object in a hashTable :

//this could be the body of a add method
bucketPosition = object.hashCode() % c;
Bucket bucket = hashTable[buketPosition];
//check the object is not already in the bucket
boolean prensentInTheBucket = false;
Iterator it = bucket.iterator();
while (it.hasNext()){
Object objToCompare = it.next();
if (objToCompare.equals(object)){
prensentInTheBucket = true;
break;
}
}
//we did not found it, good !!!
if(!prensentInTheBucket){
bucket.add(object);
}



Checking if the hashTable contains the object also rely on hashCode


//this could be the body of a contains method
boolean containObject = false
bucketPosition = object.hashCode() % c;
Bucket bucket = hashTable[buketPosition];
Iterator it = bucket.iterator();
while (it.hasNext()){
Object objToCompare = it.next();
if (objToCompare.equals(object)){
containObject = true;
break;
}
}


What's the very big advantage about storing the object this way ?

It's the efficienty of course, when you add or search the object in the collection you're not going to search the whole collection to find if the object exist. You limit your search to a single bucket. Thus you just have to process a limited number of comparison.

But what's needed to make the add/search action consistent.

1) First if two objects are equal they should return the same hashCode, otherwise you're going two put two objects regarded as equals in two different bucket (sad).

By the way, there is no reciprocity two object not equals may return the same hashCode

2) You're hashCode should be stable, I mean here that if a slight change on the object generate a different hashcode you may never be able to retreive your object. HashCode should be stable.

3) Of course three logical extra rules apply as well Symmetry, Transitivity, and Reflexivity but they are out of the scope of our study.

Aucun commentaire: