FiPy, Trilinos and Anaconda
There was a recent question on the FiPy mailing list regarding running FiPy in parallel.
My desired use case uses a nonuniform 3D grid (from fipy.meshes.nonUniformGrid3D import NonUniformGrid3D). Running this in parallel, it takes about the same amount of time or longer. If I switch the same case to a uniform 3D grid (Grid3D), parallel execution is faster than serial, as I expected (although not exactly optimal – see attached plot).
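For reference, the two mesh types in question are constructed like this (a minimal sketch; the grid sizes and spacings are just illustrative):

    import numpy as np
    from fipy import Grid3D
    from fipy.meshes.nonUniformGrid3D import NonUniformGrid3D

    # uniform grid: a single, constant spacing per axis
    uniform = Grid3D(nx=50, ny=50, nz=50, dx=1., dy=1., dz=1.)

    # nonuniform grid: an explicit spacing for every cell along each axis
    dx = np.linspace(0.5, 1.5, 50)
    nonuniform = NonUniformGrid3D(dx=dx, dy=dx, dz=dx)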
I decided to look into this as I haven’t been using FiPy in parallel for a while and haven’t tested it for efficiency in even longer. I used a simple test case, called kris.py, to test FiPy in parallel. The test case is just a diffusion problem on a 3D grid. It uses Gmsh to partition the grid sensibly, as the non-Gmsh grids in FiPy only use suboptimally sliced partitions.
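The actual kris.py isn’t reproduced here, but a minimal sketch of that kind of test (grid size and time step chosen arbitrarily) looks something like this:

    # sketch of a kris.py-style test case (not the actual script): transient
    # diffusion on a 3D grid that Gmsh partitions across processes, timing a
    # single step with time.time()
    import time
    from fipy import CellVariable, DiffusionTerm, TransientTerm, GmshGrid3D

    mesh = GmshGrid3D(nx=30, ny=30, nz=30, dx=1., dy=1., dz=1.)

    phi = CellVariable(mesh=mesh, value=0.)
    phi.constrain(1., where=mesh.facesLeft)

    eq = TransientTerm() == DiffusionTerm(coeff=1.)

    start = time.time()
    eq.solve(var=phi, dt=1.)
    print("one time step took %f s" % (time.time() - start))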
In serial the test case ran without issues, both with PySparse and with Trilinos.
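Switching between the two suites doesn’t require touching the script: FiPy picks the solver suite up from command-line flags such as --pysparse and --trilinos, or from the FIPY_SOLVERS environment variable.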
The script, kris.py, uses time to measure the duration of one time step. In general, it’s better to use timeit, but that’s a bit more fiddly to set up.
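For what it’s worth, a timeit version of the same measurement (reusing the eq and phi from the sketch above) would look roughly like this; having to pull the simulation objects into the setup string is what makes it fiddlier:

    import timeit

    # time a single solve with timeit; eq and phi must be importable from
    # __main__ (or whatever module defines them) for the setup string to work
    duration = timeit.timeit("eq.solve(var=phi, dt=1.)",
                             setup="from __main__ import eq, phi",
                             number=1)
    print("one time step took %f s" % duration)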
Anyway, Trilinos is quite a bit slower than PySparse, but I know that the default solver selection, along with the default tolerance and number of iterations, is not consistent between PySparse and Trilinos. This accounts for some of the discrepancy in run time, though not all. At this stage I just wanted to check that it’s actually working in parallel, so I tried running it with MPI.
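In practice that just means launching the same script under mpirun, along the lines of mpirun -np 2 python kris.py --trilinos.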
The parallel run failed with a complaint from FiPy about the Gmsh version, along with a routed:binomial error from MPI. The system version of Gmsh (installed with the package manager in Ubuntu) is 2.8.2, so FiPy’s error message is clearly inaccurate. I searched for the routed:binomial error message and found a ticket that I’d filed ages ago and never really resolved.
Looking at the ticket, it became clear that the real issue above is that MPI is messing with Python’s ability to communicate with a subprocess. FiPy calls out to Gmsh as a subprocess to gather its version number. This prompted me to reinstall Trilinos again. Sometimes this has helped resolve issues in the past when I’ve had an old version of Trilinos knocking around (maybe the system libraries get out of sync with Trilinos). Predictably, this resolved the issue as far as Trilinos was concerned. However, mpi4py generated exactly the same subprocess communication issue (FiPy requires mpi4py for parallel operation), and a subsequent reinstallation of mpi4py didn’t help.
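The call that trips up is essentially just a subprocess version query. Stripped of FiPy’s internals, it amounts to something like this (an illustration, not FiPy’s actual code):

    # ask the gmsh binary on the PATH for its version, the same kind of
    # subprocess round trip that FiPy performs and that MPI was breaking;
    # gmsh reports its version on stderr, hence the redirect
    import subprocess

    version = subprocess.check_output(["gmsh", "--version"],
                                      stderr=subprocess.STDOUT)
    print(version.strip())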
Anaconda
I’ve been interested in switching from Virtualenv to Anaconda for some time. I had a hunch that Anaconda might handle some of the issues with library incompatibilities more seamlessly than using the system installation along with Virtualenv. My understanding is that Anaconda inherits none of the system installation (unlike Virtualenv). First, I made sure that FiPy worked in serial with Anaconda. This step consisted of installing PySparse and maybe a few other packages (can’t remember exactly), but there were no issues getting FiPy set up in serial.
Of course, to get FiPy working in parallel I needed to install Trilinos. At first I installed Trilinos without realizing that Anaconda comes with its own MPI compilers and libraries. I’d compiled Trilinos against the system MPI libraries rather than Anaconda’s, so running the parallel test gave errors (which I unfortunately didn’t record anywhere). The Trilinos build recipe that eventually worked, after a few more missteps, points at Anaconda’s MPI via an ANACONDA variable that is just the base Anaconda directory. After installing Trilinos correctly, the Gmsh subprocess then failed with 2 processors.
The Python error is just due to Gmsh falling over. The fact that the error message points at the system MPI is confusing and suggests the system version of Gmsh is somehow compiled with MPI support; however, ENABLE_MPI=OFF is the default setting for Gmsh, which seems inconsistent. Anyway, I didn’t try to really resolve this problem, but just updated the version of Gmsh without using the system version. I downloaded the latest binary version of Gmsh, placed it in Anaconda’s bin directory, and then everything seemed to work.
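A quick way to confirm that the mesh really is being distributed (this snippet assumes a FiPy version that exports parallelComm; older versions called it parallel) is to print each process’s share of the cells under mpirun:

    # print each rank's share of the mesh; run under e.g. mpirun -np 2
    # assumes fipy exports parallelComm and meshes expose numberOfCells
    # and globalNumberOfCells, as in FiPy's parallel documentation
    from fipy import GmshGrid3D, parallelComm

    mesh = GmshGrid3D(nx=10, ny=10, nz=10, dx=1., dy=1., dz=1.)

    print("rank %d of %d sees %d of %d cells"
          % (parallelComm.procID, parallelComm.Nproc,
             mesh.numberOfCells, mesh.globalNumberOfCells))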
Of course, the above doesn’t answer any of the original questions on the mailing list, but at least I have a working version of FiPy in parallel.