Dictionary
Installation
pip install litedict
Use cases
You can use this to implement a persistent dictionary. It also uses some SQLite functions to look up entries by matching keys against a pattern (see examples). Values are JSON encoded before being saved, and the underlying database uses a TEXT column for the values. The encoder/decoder can be overridden, for example to use the pickle module, converting objects to bytes and then to a hex string. JSON is the default because these SQLite structures are meant to be usable from anywhere, not just Python. With the values stored as JSON strings, it's easier to interact with the database from different applications using different programming languages.
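To illustrate why JSON text values keep the data portable, here is a minimal sketch that uses only the standard library. The table name `kv` and its columns are hypothetical and chosen for illustration; litedict's actual schema may differ.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; not litedict's real table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

# Values are serialized to JSON text before being written.
record = {"name": "sensor-1", "readings": [1, 2.5, 3]}
conn.execute(
    "INSERT INTO kv (key, value) VALUES (?, ?)", ("device", json.dumps(record))
)
conn.commit()

# Any SQLite client, in any language, sees a plain JSON string in the column.
(raw,) = conn.execute("SELECT value FROM kv WHERE key = ?", ("device",)).fetchone()
decoded = json.loads(raw)
```

Because `raw` is ordinary JSON text, a program written in another language could read the same row with its own SQLite and JSON libraries.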
Examples
Initialize dictionary and set up 2 key names.
from litedict import SQLDict
TEST_1 = "key_test_1"
TEST_2 = "key_test_2"
The dictionary object inherits from collections.abc.MutableMapping, so you can use it as you would use a normal Python dictionary.
d = SQLDict(":memory:")
d[TEST_1] = "asdfoobar"
assert d[TEST_1] == "asdfoobar"
del d[TEST_1]
assert d.get(TEST_1, None) is None
Values are JSON encoded before being saved, so you can store numbers and operate on them as numbers. You don't have to parse the stored string as an int yourself; the values are encoded and decoded internally.
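The mapping interface and the JSON round trip described above can be sketched with the standard library alone. `TinySQLDict` and its `kv` table are hypothetical stand-ins written for this example, not litedict's implementation; the point is only that implementing five methods on `collections.abc.MutableMapping` yields the rest of the dictionary API (`get`, `update`, `items`, and so on).

```python
import json
import sqlite3
from collections.abc import MutableMapping


class TinySQLDict(MutableMapping):
    """Hypothetical, simplified stand-in; not litedict's implementation."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def __setitem__(self, key, value):
        # Encode to JSON text on the way in.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        # Decode from JSON text on the way out.
        return json.loads(row[0])

    def __delitem__(self, key):
        with self.conn:
            cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self.conn.execute("SELECT key FROM kv ORDER BY key").fetchall()
        return (row[0] for row in rows)

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


counters = TinySQLDict()
counters["hits"] = 41
counters["hits"] += 1  # read, add, write back: the stored value is an int
counters["ratio"] = 0.5
```

The `+=` line works because the value read back is already an `int`, so no manual string parsing is needed.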
Pattern matching
By using SQLite's GLOB operator, we can select a set of keys by pattern.
d[TEST_1] = "asdfoobar"
d[TEST_2] = "foobarasd"
d["key_testx_3"] = "barasdfoo"
assert d.glob("key_test*") == ["asdfoobar", "foobarasd", "barasdfoo"]
assert d.glob("key_test_?") == ["asdfoobar", "foobarasd"]
assert d.glob("key_tes[tx]*") == ["asdfoobar", "foobarasd", "barasdfoo"]
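For readers unfamiliar with GLOB, the following sketch runs the same three patterns directly through the `sqlite3` module. The `kv` table and the `glob_values` helper are assumptions made for this example and are not part of litedict; the explicit `ORDER BY` is added here so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration; not litedict's actual schema.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?)",
    [
        ("key_test_1", "asdfoobar"),
        ("key_test_2", "foobarasd"),
        ("key_testx_3", "barasdfoo"),
    ],
)


def glob_values(pattern):
    """Return the values whose key matches a GLOB pattern, ordered by key."""
    rows = conn.execute(
        "SELECT value FROM kv WHERE key GLOB ? ORDER BY key", (pattern,)
    )
    return [value for (value,) in rows]


star = glob_values("key_test*")  # * matches any run of characters
single = glob_values("key_test_?")  # ? matches exactly one character
char_class = glob_values("key_tes[tx]*")  # [tx] matches either 't' or 'x'
```

Unlike `LIKE`, GLOB is case sensitive and uses Unix-style wildcards.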
Transactions
We can use this to wrap multiple key operations inside a transaction and make sure all of them execute atomically.
with d.transaction():
    d["asd"] = "efg"
    d["foo"] = "bar"
    assert d.conn.in_transaction
try:
    with d.transaction():
        d["failed"] = "no"
        assert d.conn.in_transaction
        # raising inside the block makes the transaction
        # roll back and undo the operation
        raise Exception
except Exception:
    # check the transaction successfully rolled back
    assert d.get("failed") is None
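The commit-or-rollback behaviour shown above can be sketched with plain `sqlite3` and a context manager. The `transaction` helper and `kv` table below are hypothetical and only illustrate the general pattern; they are not taken from litedict's source.

```python
import sqlite3
from contextlib import contextmanager

# isolation_level=None puts the connection in autocommit mode, so BEGIN,
# COMMIT and ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")


@contextmanager
def transaction(connection):
    """Hypothetical helper: commit on success, roll back on any exception."""
    connection.execute("BEGIN")
    try:
        yield connection
    except BaseException:
        connection.execute("ROLLBACK")
        raise
    else:
        connection.execute("COMMIT")


# Both inserts are committed together.
with transaction(conn):
    conn.execute("INSERT INTO kv VALUES ('asd', 'efg')")
    conn.execute("INSERT INTO kv VALUES ('foo', 'bar')")

# The exception triggers a rollback, so 'failed' is never stored.
try:
    with transaction(conn):
        conn.execute("INSERT INTO kv VALUES ('failed', 'no')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

keys = [key for (key,) in conn.execute("SELECT key FROM kv ORDER BY key")]
```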
Use a custom encoder/decoder
You can pass both functions during initialization. Make sure the encoder returns a string; the decoder receives that string and returns the original value.
import pickle
d = SQLDict(
":memory:",
encoder=lambda x: pickle.dumps(x).hex(),
decoder=lambda x: pickle.loads(bytes.fromhex(x)),
)
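Before passing an encoder/decoder pair to the dictionary, it can help to check the round trip on its own. The sketch below names the two lambdas `encode` and `decode` (names chosen for this example) and exercises them on a value that JSON cannot represent faithfully.

```python
import pickle


def encode(value):
    """Pickle the value and return the bytes as a hex string."""
    return pickle.dumps(value).hex()


def decode(text):
    """Reverse of encode(): hex string -> bytes -> original object."""
    return pickle.loads(bytes.fromhex(text))


# Sets and tuples are not JSON types, but pickle preserves them.
original = {"tags": {"a", "b"}, "point": (1, 2)}
encoded = encode(original)
restored = decode(encoded)
```

Keep in mind that `pickle.loads` can execute arbitrary code, so this encoder is only appropriate when the database file comes from a trusted source.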
Benchmarks
We will have a look at some benchmarks. First we will import some libraries and create a utility function to generate random strings.
from string import ascii_lowercase, printable
from random import choice
import random
import gc
import pickle
import json
def random_string(string_length=10, fuzz=False, space=False):
    """Generate a random string of fixed length."""
    letters = ascii_lowercase
    letters = letters + " " if space else letters
    if fuzz:
        letters = printable
    return "".join(choice(letters) for _ in range(string_length))
Pickle
Encoding values with pickle.dumps() and converting the bytes output to a hexadecimal string.
d = SQLDict(
":memory:",
encoder=lambda x: pickle.dumps(x).hex(),
decoder=lambda x: pickle.loads(bytes.fromhex(x)),
)
gc.collect()
# %%timeit -n20000 -r10
d[random_string(8)] = random_string(50)
d.get(random_string(8), None)
# 69.2 µs ± 4.84 µs per loop (mean ± std. dev. of 10 runs, 20000 loops each)
Pickle custom Python object
d = SQLDict(
":memory:",
encoder=lambda x: pickle.dumps(x).hex(),
decoder=lambda x: pickle.loads(bytes.fromhex(x)),
)
gc.collect()
class C:
    def __init__(self, x):
        self.x = x

    def pp(self):
        return self.x

    def f(self):
        def _f(y):
            return y * self.x ** 2

        return _f
# %%timeit -n20000 -r10
d[random_string(8)] = C(random.randint(1, 200))
d.get(random_string(8), None)
# 41.1 µs ± 2.75 µs per loop (mean ± std. dev. of 10 runs, 20000 loops each)
Noop
Do not do any encoding/decoding. This requires all values to be strings before being saved.
d = SQLDict(
":memory:",
encoder=lambda x: x,
decoder=lambda x: x,
)
gc.collect()
# %%timeit -n20000 -r10
d[random_string(8)] = random_string(50)
d.get(random_string(8), None)
# 66.8 µs ± 2.41 µs per loop (mean ± std. dev. of 10 runs, 20000 loops each)
JSON
This is the default encoder.
d = SQLDict(
":memory:",
encoder=lambda x: json.dumps(x),
decoder=lambda x: json.loads(x),
)
gc.collect()
# %%timeit -n20000 -r10
d[random_string(8)] = random_string(50)
d.get(random_string(8), None)
# 68.6 µs ± 3.07 µs per loop (mean ± std. dev. of 10 runs, 20000 loops each)
Dictionary
Using a standard Python dictionary.
d = {}
gc.collect()
# %%timeit -n20000 -r10
d[random_string(8)] = random_string(50)
d.get(random_string(8), None)
# 53.1 µs ± 4.42 µs per loop (mean ± std. dev. of 10 runs, 20000 loops each)
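In these runs the SQLite + JSON dictionary is about 29% slower than the standard Python dictionary (68.6 µs vs. 53.1 µs per loop). Each timed loop also generates random keys and values, which accounts for most of the 53.1 µs measured for the plain dictionary.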
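To separate encoder cost from the key and value generation included in each timed loop above, the encoders can be timed in isolation with the standard `timeit` module. This is a sketch under the assumption that a fixed 50-character string is a fair stand-in for the values used above; it does not touch SQLite at all, and no results are claimed here since they depend on the machine.

```python
import json
import pickle
import timeit

payload = "x" * 50  # fixed 50-character value, matching the length used above

encoders = {
    "json": (json.dumps, json.loads),
    "pickle+hex": (
        lambda v: pickle.dumps(v).hex(),
        lambda s: pickle.loads(bytes.fromhex(s)),
    ),
    "noop": (lambda v: v, lambda s: s),
}

results = {}
for name, (encode, decode) in encoders.items():
    # Time one encode/decode round trip; keep the best of 5 repeats.
    seconds = min(
        timeit.repeat(lambda: decode(encode(payload)), number=20_000, repeat=5)
    )
    results[name] = seconds / 20_000 * 1e6  # microseconds per round trip

for name, micros in results.items():
    print(f"{name:>10}: {micros:.2f} µs per round trip")
```

Comparing these figures with the per-loop timings gives a rough idea of how much of each benchmark is spent in encoding versus everything else.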
Alternatives
- RaRe-Technologies/sqlitedict: This library uses a separate writing thread. Modern versions of SQLite are thread safe by default (serialized), so a separate writing thread is not strictly needed. It can be helpful to avoid DB locks, but it also adds extra complexity. That implementation is also missing some performance optimizations that are present in this library.