Configuring Sumatra for Postgres
For the last few months I’ve been using Sumatra to log the provenance data for simulations. It’s a really promising tool, and I’ve been hacking on it from time to time as I proceed with real research. Sumatra even has a web interface driven by Django and uses SQLite as the default back end database.
One of the issues with SQLite is concurrency. This issue manifests
itself with Sumatra when dozens of jobs are launched simultaneously
with each job having a similar life time. In this event most of the
jobs are not recorded and the unrecorded jobs will sign off with the
dreaded django.db.utils.DatabaseError: database is locked
. For
further discussion of this issue see the
Sumatra mailing list thread on this topic.
Quick and Dirty Solution
In the short term, I decided to use a quick and dirty solution to work around this issue using file locks. I couldn’t face learning to configure Postgres as I have little to no experience with databases. The file lock solution allowed me to proceed with my work without having to learn the ins and outs of configuring Django and Postgres.
The solution required creating a decorator class
available at Github
that encapsulated the Python function to be logged. The two Sumatra
commands that need to be protected from concurrency issues are
project.add_record
and project.save
. Here the with
statement is
used to set and release the lock:
where SMTLock
is defined as
The above class requires the lockfile module. The file locking mechanism worked well enough and might be a solution for those that wish to maintain a lightweight database solution with Sumatra. However, it does require making changes to the script or program for which the provenance data is being logged. This goes against the grain of the “don’t change the existing workflow” approach of Sumatra.
Postgres
Given that the above solution is unsatisfactory, another alternative is to use a database that properly handles concurrency. To install Postgres on Ubuntu use
and then use
to set the password. Then create a Sumatra user for Postgres using
To create a database do
Exit the Postgres shell prompts and edit
/etc/postgresql/9.1/main/pg_hba.conf
by adding
and relaunch Postgres
If Sumatra gives errors during configuration and the database has the wrong field sizes then you’ll need to repeat the process above to create a new database. You can delete the old database with
These instructions were snatched from iiilx’s blog. Now that Postgres is working we can move on to setting up Sumatra.
Configuring Sumatra
Edit the Django database configuration in
src/recordstore/django_store/__init__.py
to swap SQLite for
Postgres.
Ideally this would be the only change required, however, there is an
issue with the field sizes in Sumatra. Using Sumatra with the above
configuration will result in Postgres errors of the type
DatabaseError: value too long for type character varying(100)
. This
error is caused because the field sizes have never been checked with
anything but SQLite and SQLite has no size limits in the way that
Postgres does (see this
stackoverflow thread
for more details). Anyway, fixing the field size problem simply
requires making a number of changes like this
in src/recordstore/django_store/models.py
. There are about 10 of
these altogether (see the
full changeset
for a complete list). The above instructions are valid as of commit
d65bb4fa1f83.
Testing Sumatra for Concurrency
The following is a test to see that it works. Set up a trivial script.py
,
1
2
3
4
5
6
7
8
9
10
import time
import sys
param_file = sys.argv[1]
f = open(param_file, 'r')
exec f.read()
print 'waiting for ' + str(wait) + '(s)'
time.sleep(wait)
print 'finished'
and a parameter file (default.param
) with
1
wait=3
Set up a Git repository.
Set up a Sumatra repository.
Check the repository with smtweb
. There should be one record. Now to
test the concurrency use
Check the repository. 101 records. No concurrency issues!
What next?
In the near future I hope to submit a patch for this and include some kind of command line configuration for Sumatra to allow easy set up. Something like this
comments powered by Disqus