Tuesday Aug 26 2008 4:15 pm by Smokinn

After reading a great article on bloom filters here, I ported his bloom filter to PHP:

EDIT: After some tests I've realized I have a bug somewhere. My false positive rate is WAY higher than the python script.. I'll repost the code once I've found and fixed the bug.

And.. we're back

class BloomFilter {

public $population;

public $vector;

public function __construct($bit_size, $num_hashes) {

$this->bit_size = $bit_size;

$this->num_hashes = $num_hashes;

$this->vector = array();

$this->population = 0;

}

public function contains($string) {

foreach($this->_hash_indices($string) as $index) {

if($this->vector[$index] != 1)

return false;

}

return true;

}

public function insert($string) {

foreach($this->_hash_indices($string) as $index) {

$this->vector[$index] = 1;

}

$this->population++;

}

private function _hash_indices($string) {

$indices = array();

for($i=1; $i < ($this->num_hashes + 1); $i++){

$indices[] = ($this->_num_hash($string) + $i * $this->_hash_string($string)) % $this->bit_size;

}

return $indices;

}

private function _num_hash($string) {

$size = strlen($string);

if ($size==0) return 0;

$chars = str_split($string);

$i = 0;

foreach($chars as $char) {

$i += ($i>>14) + ($char * 0xd2d84a61);

}

$i += ($i>>14) + ($size * 0xd2d84a61);

return $i;

}

private function _hash_string($string) {

$val = 0;

$array = str_split($string);

foreach($array as $char) {

$val = ($val << 4) + ord($char);

$tmp = $val & 0xf0000000;

if ($tmp != 0) {

$val = $val ^ ($tmp >> 24);

$val = $val ^ $tmp;

}

}

return $val;

}

}

Turns out the problem was that I was hashing with crc32b for the first hash and that was causing way too many false positives. Tim gave me the code for _num_hash (He found it on a python mailing list as a fast potential replacement for their internal hash function) and now with this hash it works well.

Comments
Post a comment
Name:
Email (optional):
URL (optional):
(Allowed tags: <a> <p> <strong> <em>)

Sorry, but due to spambots, to post I'll need you to prove you're human.

Of the six following animals, just select the two that are not fluffy

About the Site:

I might update. Don't hold your breath though.

About Me:

Name: Guillaume Theoret

Age: 801488945 seconds

Job: Mostly web dev

Some Friends:
Search:

RSS Feeds:

RSS