Grid Computing Using Chinook

This document describes grid computing using the Chinook platform.

Distributed Grid Computing for Bioinformatics

The design of Chinook facilitates bioinformatics grid computing across heterogeneous networks. Traditionally, cluster computing and high-performance computing (HPC) resources have been limited to the organizations that could afford them (for a review of the differences between Grid and P2P computing, see this paper). Chinook aims to give individuals and organizations working toward common analytical goals access to these resources.

Using Chinook for distributed computing makes bioinformatics analysis more accessible within consortia and in resource-limited research areas. We have been actively promoting the use of this system alongside standard web servers that offer CGI access to various bioinformatics tools.

Chinook is completely open-source and is designed to be driven by a community interested in HPC analysis across heterogeneous networks. This could be, for example, a community interested in rapidly annotating transcription factors in genomes or in aligning whole genomes. Because Chinook can integrate any command-line application and database, how it is used for Grid computing is entirely up to the user.

Advantages of Chinook as a Grid Computing Application

Several high-profile Grid solutions are currently available. Chinook offers some unique advantages over these platforms:

  • Chinook is designed for bioinformatics. Issues such as attribution, open-source development, and community expertise are addressed.
  • Chinook provides control over which services are made available over time.
  • No membership or association is required. Chinook can be set up in-house or globally for free.
  • Its cross-platform design promotes use across operating systems.
  • It is designed to work with existing standards (the Perl code is BioPerl-compliant and is being integrated into BioMoby).
  • No prior knowledge of endpoints or services is required (you do not need to know where the target URL resides; Chinook is self-assembling).
  • Software can be developed without requiring special libraries for parallel processing.

Grid Computing Example: Whole Genome Alignments

This section describes how whole genomes can be aligned across a worldwide network of Chinook servers using Perl.

The Perl Engine Architecture of Chinook

There are three principal components in the Chinook architecture:

1. Chinook Client
2. Chinook P2P Agent (Node)
3. Chinook Server

Both Servers and Clients talk to P2P Agents to discover and advertise services over the Chinook network (using JXTA); for more information on the architecture, see the Chinook architecture documentation. To distribute jobs to multiple servers, however, a Perl Engine is started on the Client. Despite its name, this Engine accepts socket connections from programs written in any language, allowing them to talk to the Client. We have implemented a Perl interface to the Client that lets Perl scripts communicate with it: using this interface, a script can submit jobs to a Chinook server and receive the results.
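The advertise/discover pattern described above can be modelled in a few lines. The following is a minimal in-memory sketch, not JXTA and not Chinook's actual code: the class `P2PAgent`, its methods `advertise` and `discover`, and the server names are hypothetical, chosen only for illustration. Servers advertise the services they offer to an agent, and clients ask the agent which servers offer a given service, so neither side needs to know the other's URL in advance.

```python
from collections import defaultdict


class P2PAgent:
    """Hypothetical stand-in for a Chinook P2P Agent (node).

    Real Chinook agents exchange advertisements over JXTA; this sketch
    only keeps them in a local dictionary.
    """

    def __init__(self):
        # service name -> set of server identifiers offering it
        self._adverts = defaultdict(set)

    def advertise(self, server_id, service):
        """Record that a server offers a service."""
        self._adverts[service].add(server_id)

    def discover(self, service):
        """Return the servers advertising a service, sorted for stable output."""
        # .get() avoids creating empty entries for unknown services
        return sorted(self._adverts.get(service, ()))


agent = P2PAgent()
# Servers advertise what they can run; clients never need their URLs up front.
agent.advertise("server-a.example.org", "LAGAN")
agent.advertise("server-b.example.org", "LAGAN")
agent.advertise("server-b.example.org", "BLAST")

print(agent.discover("LAGAN"))   # ['server-a.example.org', 'server-b.example.org']
print(agent.discover("MUSCLE"))  # []
```

In a real network the registry is distributed across many agents rather than held in one object, but the client's view is the same: ask for a service by name and receive a list of servers that offer it.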

The Chinook Perl Engine is easily started and stopped, either with the executables bundled in the installers or with Apache Ant. To start the Perl Engine, first open the configuration file <chinook_install_dir>/resources/batch-config.xml and change the directories to locations where you are comfortable having Chinook read and write batch files. Then issue these commands from your Chinook installation directory:

  • ant p2p-start
  • ant perl-client-start

At this point, Chinook is ready to accept requests from Perl scripts. Several example Perl scripts that talk to the Perl Engine are in the directory <chinook_install_dir>/perl/t/. One of them is t_distributePairwiseAlignment.pl. If it is not there (it is not included in version 1.1 or earlier), download and install these two files.

The script t_distributePairwiseAlignment.pl distributes pairwise alignments to randomly chosen Chinook servers. Its output is placed in the /tmp directory; if you are working in a non-Linux environment, edit the script so that this path matches your temporary directory.
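If you adapt the script, one way to avoid hard-coding /tmp is to ask the operating system for its temporary directory. The sketch below is in Python rather than Perl and is only illustrative: the helper name `alignment_output_path` and its file-naming scheme are hypothetical and are not what t_distributePairwiseAlignment.pl actually does. (In Perl, the core `File::Spec->tmpdir` method serves the same purpose.)

```python
import os
import tempfile


def alignment_output_path(program, seq1, seq2, out_dir=None):
    """Build an output file path for one pairwise alignment job.

    Hypothetical naming scheme: <program>_<seq1 stem>_<seq2 stem>.out.
    If out_dir is not given, the platform's temporary directory is used
    (usually /tmp on Linux, a per-user folder on Windows).
    """
    if out_dir is None:
        out_dir = tempfile.gettempdir()
    stem1 = os.path.splitext(os.path.basename(seq1))[0]
    stem2 = os.path.splitext(os.path.basename(seq2))[0]
    return os.path.join(out_dir, f"{program}_{stem1}_{stem2}.out")


# Uses the system temporary directory, whatever the operating system.
print(alignment_output_path("LAGAN", "file1.fa", "file2.fa"))
```

Passing an explicit `out_dir` lets you keep results somewhere more permanent than the temporary directory.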

Using this script, you can simply issue commands such as:

    perl t_distributePairwiseAlignment.pl LAGAN file1.fa file2.fa

This sends an alignment to a random Chinook server somewhere on the Internet for processing. To learn how to send jobs to different servers with more controlled parameters (for example, based on the version of the utility or the number of jobs queued on a server), read through the comments in this script, the user documentation (Perl walkthrough), and some of the other example scripts.
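The difference between sending a job to a random server and choosing one with more controlled parameters can be sketched as two selection policies. The Python below is a hypothetical illustration, not the logic of the Chinook example scripts: the `ServerInfo` fields (`version`, `queued_jobs`), the function names, and the server names are assumptions. `pick_random` mirrors the random choice described for t_distributePairwiseAlignment.pl; `pick_controlled` keeps only servers with a minimum version of the utility and then takes the one with the fewest queued jobs.

```python
import random
from dataclasses import dataclass


@dataclass
class ServerInfo:
    """Hypothetical summary of what a Chinook server advertises."""
    name: str
    version: tuple  # version of the alignment utility, e.g. (2, 0)
    queued_jobs: int


def pick_random(servers, rng=random):
    """Default strategy: any server offering the service will do."""
    return rng.choice(servers)


def pick_controlled(servers, min_version):
    """Prefer servers with a recent enough utility and the shortest queue.

    Returns None if no server meets the version requirement.
    """
    eligible = [s for s in servers if s.version >= min_version]
    if not eligible:
        return None
    # Fewest queued jobs wins; ties are broken by name for reproducibility.
    return min(eligible, key=lambda s: (s.queued_jobs, s.name))


servers = [
    ServerInfo("server-a.example.org", (1, 1), 0),
    ServerInfo("server-b.example.org", (2, 0), 5),
    ServerInfo("server-c.example.org", (2, 1), 2),
]

print(pick_random(servers).name)              # any of the three
print(pick_controlled(servers, (2, 0)).name)  # server-c.example.org
print(pick_controlled(servers, (3, 0)))       # None
```

Random selection spreads load without any bookkeeping, while the controlled policy needs up-to-date server information but avoids outdated tools and long queues.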


(c) 2004 Stephen Montgomery, Canada's Michael Smith Genome Sciences Centre