What is mpiBLAST?
mpiBLAST is a parallelization of NCBI BLAST. It consists of a pair of programs that replace formatdb and blastall with versions that execute BLAST jobs in parallel on a cluster of computers with MPI installed. mpiBLAST has two primary advantages over traditional BLAST. First, it splits the database across the nodes in the cluster. Because each node's segment of the database is smaller, it can usually reside in the buffer cache, yielding a significant speedup by eliminating disk I/O. Second, it allows BLAST users to take advantage of efficient, low-cost Beowulf clusters, because interprocessor communication demands are low.
We provide two example setups for running mpiBLAST.
Example setup 1 (in detailed form, provided by Jan-Frode Myklebust)
- Create a file ~/.ncbirc containing
[NCBI]
Data=/home/fimm/cbu/andersl/blast-2.2.10/data
[BLAST]
BLASTDB=/work/speil/flatdb/uniprot/uniprot
BLASTMAT=/home/fimm/cbu/andersl/blast-2.2.10/data
[mpiBLAST]
Shared=/work/janfrode/blastdb
Local=/scratch/
- Create a unique local directory on each node, and record its location in the mpiBLAST configuration file:
# Need to use a unique local directory for each job:
LOCAL=/scratch/${LOGNAME}/${PBS_JOBID}
echo /work/janfrode/blastdb > /work/janfrode/blastdb/mpiblast.conf
echo $LOCAL >> /work/janfrode/blastdb/mpiblast.conf
for node in $(sort -u "$PBS_NODEFILE") ; do
    mkdir -p "/net/${node}/${LOCAL}"
done
The first line of mpiblast.conf is the shared storage directory, and the second is the
local storage directory used during the search. These should match the Shared and
Local entries in ~/.ncbirc.
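Because mpiblast.conf is just two lines, it is easy to check before submitting a job. The following is a minimal sketch, not part of mpiBLAST: read_mpiblast_conf is a hypothetical helper name, and it only assumes the two-line layout described above (shared path first, local path second).

```shell
# Hypothetical helper: print the shared and local paths from a two-line
# mpiblast.conf, and fail if either line is missing or empty.
read_mpiblast_conf() {
    conf=$1
    shared=$(sed -n '1p' "$conf")      # line 1: shared storage
    local_dir=$(sed -n '2p' "$conf")   # line 2: local storage
    if [ -z "$shared" ] || [ -z "$local_dir" ]; then
        echo "error: $conf must contain two lines (shared, local)" >&2
        return 1
    fi
    echo "shared=$shared"
    echo "local=$local_dir"
}
```

Calling read_mpiblast_conf /work/janfrode/blastdb/mpiblast.conf inside the job script, right after the two echo lines, would show which directories the search is about to use.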
- Split the database you are going to search into multiple segments, one segment
per CPU you want to run on, for example 4:
mkdir -p /work/janfrode/blastdb
cd /work/janfrode/blastdb
mpiformatdb -N 4 -i /work/speil/flatdb/algae/uniprot_sp_trembl_no_algae_virus.fasta
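As a rough sketch of the "one segment per CPU" rule, the segment count could be derived from the PBS node file instead of being hard-coded. This assumes the batch system writes one line per allocated CPU slot to the node file, which depends on the local configuration. count_slots is a hypothetical helper name, not an mpiBLAST command.

```shell
# Hypothetical helper: count the non-empty lines of a PBS node file.
# Assumption: the batch system writes one line per allocated CPU slot.
count_slots() {
    grep -c . "$1"
}

# Inside a batch job, the result could replace the hard-coded 4, e.g.:
#   nseg=$(count_slots "$PBS_NODEFILE")
#   mpiformatdb -N "$nseg" -i /work/speil/flatdb/algae/uniprot_sp_trembl_no_algae_virus.fasta
```

Whether some MPI processes are reserved for coordination rather than searching depends on the mpiBLAST version, so check its documentation before relying on this number.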
- Run parallel BLAST (this works only inside the batch system):
mpiexec mpiblast --config-file=/home/fimm/plab/janfrode/.mpiblast \
-d uniprot_sp_trembl_no_algae_virus.fasta -i ~janfrode/PBCV-1.fasta -p blastx \
--concurrent=10 --removedb
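The per-job directories created earlier are not removed automatically by the commands shown here, so a job script may want to clean them up once the search has finished. The following is a sketch under assumptions: cleanup_local is a hypothetical helper, and it assumes each node's disk is reachable under /net/<node>, as in the mkdir loop above.

```shell
# Hypothetical helper: remove the per-job scratch directory on every node.
# Arguments: node file, local path (as in LOCAL above), and the automount
# root (assumed to be /net when omitted).
cleanup_local() {
    nodefile=$1
    local_dir=$2
    net_root=${3:-/net}
    # Refuse to run with an empty path so a node's root is never removed.
    [ -n "$local_dir" ] || { echo "error: empty local path" >&2; return 1; }
    sort -u "$nodefile" | while read -r node; do
        [ -n "$node" ] || continue
        # Strip the leading slash from the local path to avoid a double slash.
        rm -rf "${net_root}/${node}/${local_dir#/}"
    done
}
```

In a job script this would be called as cleanup_local "$PBS_NODEFILE" "$LOCAL" after the mpiexec line.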
- Example batch-script based on the above: mpiblast.pbs
- Publish results!
Example setup 2 (batch-script only, provided by Tim Hughes)
mpiblast2.pbs