Grid Computing Using Chinook
This document describes grid computing using the Chinook platform.
Distributed Grid Computing for Bioinformatics
The design of Chinook facilitates bioinformatics grid computing across
heterogeneous networks. Traditionally, cluster computing and high performance
computing (HPC) resources were strictly limited to the organizations that could
afford them; Chinook intends to provide access to these resources for individuals
and organizations working toward common analytical goals. (For a review of the
differences between Grid and P2P computing, see this paper.)
The use of Chinook for distributed computing promotes the accessibility of
bioinformatics analysis within consortia and resource-limited research areas.
We have been actively promoting the usage of this system alongside standard
web servers that offer CGI access to various bioinformatics tools.
Chinook is completely open-source and is designed to be driven by a community
interested in HPC analysis across heterogeneous networks. This could be a community
that is interested in rapidly annotating transcription factors in genomes,
aligning whole genomes, or other pursuits. The ability of Chinook to integrate
ANY command-line application and database leaves the possibilities for Grid
computing completely up to the user.
Advantages of Chinook as a Grid Computing Application
Several high-profile Grid solutions are currently available. Chinook provides some unique advantages over these platforms:
- Chinook is designed for bioinformatics. Issues such as attribution, open source, and community expertise are addressed.
- Chinook allows selection of which services are available over time
- No membership or association is required. Chinook can be set up in-house or globally for free
- Cross-platform design promotes usage across operating systems
- Designed to work with existing standards (Perl code is Bioperl compliant and is being integrated into BioMoby)
- No prerequisite knowledge about endpoints or services is required (you don't have to know where the target URL resides); Chinook is self-assembling
- Software can be developed without requiring special libraries for parallel processing
Grid Computing Example: Whole Genome Alignments
This section describes how whole genomes can be aligned across a world-wide
network of Chinook servers using Perl.
THE PERL ENGINE ARCHITECTURE OF CHINOOK
There are 3 principal components to the Chinook architecture:
- Chinook Client
- Chinook P2P Agent (Node)
- Chinook Server
Both Servers and Clients talk to P2P Agents to discover and advertise services
over the Chinook network (using JXTA). For more info on the architecture, click
here. However, when jobs are distributed to multiple servers, a Perl Engine is
started on the Client. This Engine accepts socket connections, so programs
written in any language can talk to the client. We have implemented a Perl
interface to this client. It handles communication between Perl scripts and the
Chinook client; using the Perl interface, a script can submit jobs to a Chinook
server and receive the results.

The Chinook Perl Engine is easily started and stopped using either the executables
bundled in the installers or Apache Ant. To start the Perl Engine, first open the
configuration file <chinook_install_dir>/resources/batch-config.xml and change
the directories to ones that you are comfortable having Chinook read batch files
from and write batch files to. Then issue these commands from your Chinook
installation directory:
ant p2p-start
ant perl-client-start
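The startup steps above can be wrapped in a small shell function. This is only a sketch: the function name, the CHINOOK_HOME variable, and the DRY_RUN switch are hypothetical conveniences and not part of Chinook, and it assumes each Ant target returns once its component is running. Only the two Ant targets and the configuration file path come from this document.

```shell
# Sketch only: CHINOOK_HOME and DRY_RUN are hypothetical names, not Chinook settings.
start_chinook_perl_engine() {
    # Installation directory: first argument, else $CHINOOK_HOME, else the current directory.
    local home="${1:-${CHINOOK_HOME:-.}}"
    local config="$home/resources/batch-config.xml"
    local target

    # Refuse to start until the batch configuration file is in place.
    if [ ! -f "$config" ]; then
        echo "missing $config; edit it before starting the Perl Engine" >&2
        return 1
    fi

    for target in p2p-start perl-client-start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: show the command instead of running it.
            echo "ant $target"
        else
            # Run Ant from the installation directory, as the text describes.
            (cd "$home" && ant "$target") || return 1
        fi
    done
}
```

Calling the function with DRY_RUN=1 set prints the two Ant commands without running them, which is a quick way to confirm that the installation directory is the one you expect.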
At this point, Chinook is ready to accept requests from Perl scripts. There
are several example Perl scripts that talk to the Perl Engine in the directory <chinook_install_dir>/perl/t/.
In this directory, there is one script called t_distributePairwiseAlignment.pl.
If it is not there (it is not included in version 1.1 or any earlier version),
download and install these two files.
t_distributePairwiseAlignment.pl is the script that distributes pairwise
alignments to random Chinook servers. The output is placed in the /tmp directory.
If you are working in a non-Linux environment, you will need to edit this path
to match your temporary directory.
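To find the lines that need editing, one option is to search the script for the path. The helper below is a hypothetical convenience, not part of Chinook, and it assumes the output directory appears in the script as the literal string /tmp, which this document implies but does not show.

```shell
# Sketch only: assumes the output directory appears in the script as the literal /tmp.
list_tmp_references() {
    # -n prefixes each match with its line number so it is easy to find in an editor.
    grep -n '/tmp' "$1"
}
```

For example, running list_tmp_references t_distributePairwiseAlignment.pl from the <chinook_install_dir>/perl/t/ directory lists each candidate line with its line number.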
Using this script, you can issue commands such as
perl t_distributePairwiseAlignment.pl LAGAN file1.fa file2.fa
This will send an alignment to a random Chinook server somewhere on the Internet
for processing. Read through the comments in this script, the user
documentation (Perl walkthrough), and some of the other scripts to get an idea
of how to send jobs to different servers with more controlled parameters (based
on version of utility, jobs in queue on server, etc.).
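Because each invocation handles one pair of sequences, aligning several genomes against each other means issuing one command per pair. The sketch below prints those commands for every unordered pair of input files. The function name is hypothetical, and it assumes file names without spaces; only the script name and the argument order (tool, then two FASTA files) are taken from the command shown earlier.

```shell
# Sketch only: print_pairwise_jobs is a hypothetical helper, not part of Chinook.
print_pairwise_jobs() {
    local tool="$1"
    local first second
    shift
    # Stop when fewer than two files remain, since no pair can be formed.
    while [ "$#" -gt 1 ]; do
        first="$1"
        shift
        # Pair the current file with every file that comes after it.
        for second in "$@"; do
            echo "perl t_distributePairwiseAlignment.pl $tool $first $second"
        done
    done
}
```

For three input files this prints three commands, one per pair. The printed list can be reviewed first and then run from the directory that contains the script.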