Flower

This is an implementation of Bloom Filters for Elixir. It uses NIFs written in Rust for better performance, since Bloom Filters rely highly on mutability.

What are Bloom Filter?

TL;DR: Huge amount of data ➜ small Bloom Filter : Was X not in Huge amount of data?

For more about this topic consider, checking out:

Installation

The package can be installed by adding flower to your list of dependencies in mix.exs:

def deps do
  [
    {:flower, "~> 0.1.4"},
  ]
end

Also, you need to have Rust installed for development, since this uses NIFs.

Docs can be found at https://hexdocs.pm/flower.

Documentation can be generated with ExDoc.

Usage

There is also a small walk through in the Example Section

alias Flower.Bloom, as: Bloom

Create new Bloom Filter

expected_number_of_unique_elements = 1000
# With a size atom
iex> filter = Bloom.new(:"8 KB", expected_number_of_unique_elements)
...
# Or by a maximum byte size:
iex> filter = Bloom.new_by_byte_size(8 * 1024, expected_number_of_unique_elements)
...
# Or by a bit address length:
iex> filter = Bloom.new(16, expected_number_of_unique_elements)
...

Bit Address Length	Size Atom	Bit Address Length	Size Atom	Bit Address Length	Size Atom
		13	1 KB	23	1 MB
		14	2 KB	24	2 MB
		15	4 KB	25	4 MB
6	8 Byte	16	8 KB	26	8 MB
7	16 Byte	17	16 KB	27	16 MB
8	32 Byte	18	32 KB	28	32 MB
9	64 Byte	19	64 KB	29	64 MB
10	128 Byte	20	128 KB	30	128 MB
11	256 Byte	21	256 KB	31	256 MB
12	512 Byte	22	512 KB	32	512 MB

Insert Elements

iex> Bloom.insert(filter, 42)
:ok
iex> Bloom.insert(filter, [1,2,{:atom, <<1,2,3,4>>}])
:ok
iex> Bloom.insert(filter, "Hello!")
:ok

Check for Elements

iex> Bloom.has_not?(filter, "Hello!")
false
iex> Bloom.has?(filter, [1,2,{:atom, <<1,2,3,4>>}])
true
iex> Bloom.has?(filter, :atom)
false
# `has?` is always the opposite of `has_not?`.
iex> Bloom.has?(filter, 42) != Bloom.has_not?(filter, 42)
true

Was actually inserted?	has?	has_not?
yes	yes	no
no	most of the times: no	most of the times: yes

Write To Disk

filename = "prime_filter.bloom"
file = File.stream!(filename, [:delayed_write, :binary], 8096)

(Bloom.stream(filter) |> Stream.into(file) |> Stream.run)

Read From Disk

filename = "prime_filter.bloom"
file = File.stream!(filename, [:read_ahead, :binary], 8096)

{:ok, new_filter} = Bloom.from_stream(file)

Funky Stuff

iex> Bloom.estimate_count(filter)
3
iex> Bloom.false_positive_probability(filter)
3.2348227494719115e-28
# This number is only that small, because we have just 3 elements in 8 KBs.

Example

Prime Numbers

Checking if a number is not a prime and below 100_000.

# helper module
defmodule Step do
    def throught(from, to, step, func) when from < to do
        func.(from)
        throught(from+step, to, step, func)
    end
    def throught(_,_,_,_), do: :ok
end && :ok

# alias so we need to write less
alias Flower.Bloom, as: Bloom


# Select appropriate size.
# We will put about 100_000 elements in.
# 64 KB = 512 KBit
# 512 KBit / 100_000 ~= 5.25 Bits per element.
# Meeh. Let's see...
non_primes = Bloom.new(:"64 KB", 100_000)
# We can also write
non_primes = Bloom.new(19, 100_000)
# or (it will choose the next smaller size)
non_primes = Bloom.new_by_byte_size(64 * 1024, 100_000)

# Let's put some non primes in:
Bloom.insert(non_primes, 42)
Bloom.insert(non_primes, "one hundred")
Bloom.insert(non_primes, [:ok, {Anything, 1.0e9}])

# Let's double check:
Bloom.has?(non_primes, 42)
Bloom.has?(non_primes, "one hundred")
Bloom.has?(non_primes, 7)
Bloom.has?(non_primes, [:ok, {Anything, 1.0e9}])
Bloom.has?(non_primes, 52)
# Works!


# Now we can get a estimate of
# the number of items we
# put in.
Bloom.estimate_count(non_primes)


# Apply Sieve of Eratosthenes.
# This may take a few seconds.
# At least we can do it with
# constant memory.
for x <- 2..50_000 do
  # Skip multiples of previous numbers.
  # This is actually not safe to do
  # with a bloom filter since they are
  # approximations. But let's don't care for now.
  if(Bloom.has_not?(non_primes, x)) do
    Step.throught(x*2, 100_000, x, fn non_prime ->
        # Much of the time is used to calculate
        # hashes.
        Bloom.insert(non_primes, non_prime)
    end)
 end
end && :ok

non_primes |> Bloom.has_not?(12) # not a prime
non_primes |> Bloom.has_not?(11) # prime
non_primes |> Bloom.has_not?(6719) # no prime
non_primes |> Bloom.has_not?(4245) # prime
non_primes |> Bloom.has_not?(9973) # no prime
non_primes |> Bloom.has_not?(3549) # prime
non_primes |> Bloom.has_not?(89591) # prime
non_primes |> Bloom.has_not?(84949) # no prime

# Let's write it to disk:
filename = "prime_filter.bloom"
file = File.stream!(filename, [:delayed_write, :read_ahead, :binary], 8096)
(Bloom.stream(non_primes)
|> Stream.into(file)
|> Stream.run)
# Let's inspect the size of the filter:
File.lstat!(filename).size

# We can also read from disk:
{:ok, new_non_primes} = Bloom.from_stream(file)
new_non_primes |> Bloom.has_not?(12) # not a prime
new_non_primes |> Bloom.has_not?(11) # prime
# Works!

# Maybe a bit high, 64 KB for 100_000 Numbers
# is not that much.
Bloom.false_positive_probability(non_primes)

# Let's reestimate .
# There are 9_592 primes below 100_000, so this
# should yield about 100_000 - 9_592 = 90_408.
Bloom.estimate_count(non_primes)

Side note: If you want to check primes, ~~google~~ search for Miller–Rabin primality test.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
config		config
lib		lib
native/bitarray		native/bitarray
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
mix.exs		mix.exs
mix.lock		mix.lock

License

CodeSteak/Flower

Folders and files

Latest commit

History

Repository files navigation

Flower

What are Bloom Filter?

Installation

Usage

Create new Bloom Filter

Insert Elements

Check for Elements

Write To Disk

Read From Disk

Funky Stuff

Example

Prime Numbers

About

Topics

Resources

License

Stars

Watchers

Forks

Languages