Version 1


Canada’s Michael Smith Genome Sciences Centre, BCCA

Chinook User Guide



© Genome Sciences Centre
BC Cancer Agency
Suite 100
570 West 7th Ave

Vancouver, BC
V5Z 4S6

Phone: (604) 707-5800
First Floor Fax: (604) 876-3561
Fifth Floor Fax: (604) 733-9481

 






1.0 Introduction

Chinook is a peer-to-peer (P2P) bioinformatics platform. Its goal is to facilitate the exchange of analysis techniques within a local community or worldwide. Chinook operates by turning command-line applications into services that are broadcast over a virtual network. Multiple analysis services are currently accessible through Chinook, ranging from alignment to regulation prediction algorithms. Chinook is also designed to make adding new services easy: a new service is described in XML. (At the time of writing, a GUI to facilitate service configuration is under development.)

Chinook clients can be operated from Java, from Perl, or from within applications such as Sockeye (and soon Pegasys at the Ouellette Lab in Vancouver and the OrthoSeq project at the Wasserman Lab (CMMT) in Vancouver).

1.0.1 Feature Matrix

Server

✓ Integrates command-line applications in XML
✓ Allow multiple input/output files
✓ STDERR/STDOUT previewing
✓ RMI (remote method invocation)
✓ WSDL (web services)
✓ Ant startup
✓ Add new services (GUI supported)
✓ Add static services (GUI supported)
✓ Provide access for server-side databases (e.g. Ensembl)
✓ File storage memory allocation is user configurable

Client

✓ Run and kill services (GUI supported)
✓ Run batch files
✓ Download results
✓ Specify filters to narrow search
✓ Allow preview of service webpage
✓ Contain server information dialogs (GUI supported)

P2P

✓ Discover nodes
✓ Advertise services
✓ Multiple clients and servers supported
✓ Auto-configuration supported for JXTA

Perl

✓ Run batch job through command-line
✓ Submit batch jobs to queue
✓ Monitor service and job information
✓ Download results


1.0.2 Background / Use Cases

Bioinformatics techniques are used to identify complex, recurring relationships in biological data. Genome sequencing projects and high-throughput expression analyses have contributed large amounts of data, both complicating analysis and demanding higher-level coordination of computational resources. Furthermore, the variety of available bioinformatics tools and algorithms, and their diverse modes of usage, create a situation where most users have trouble discerning where to invest their time and resources. Chinook addresses these issues by creating a virtual network for bioinformatics analyses. A user can dynamically discover available bioinformatics services (algorithms) over the Internet or their local network. The user can then validate a server's authenticity and submit bioinformatics analyses to peers that publish their ability to perform the desired services. Information such as bandwidth, jobs in queue, and the location of Chinook services is reported to clients to aid in job submission. A user can also visit the service creator's website to find out what a particular service does. Chinook allows a service provider to create a new service by simply editing an XML file; as long as the new service has a standard output format, no additional programming is required. The Chinook server runs over the JXTA peer-to-peer network, either in Java Remote Method Invocation (RMI) mode or through Apache Axis web services. Chinook creates a virtual community where researchers can rapidly home in on applications of interest and run them across multiple service providers, while shifting the responsibility for application maintenance from the client to the developer.

Chinook is currently designed to work only with command-line applications that take sequence data as input. Chinook groups these applications together and provides a GUI so that users can run them easily. The input sequence formats currently supported are FASTA and multi-FASTA. Work is in progress to allow Chinook to access files uploaded from clients. Furthermore, Chinook has been designed to allow servers to ‘plug in’ multiple databases, giving clients access to protein, DNA, or other types of biological data.
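As a rough illustration of the two input formats, the sketch below splits FASTA text into records: a file with one record is plain FASTA, and a file with several records is multi-FASTA. It is written in Python purely for illustration and is not part of Chinook; the function name read_fasta and the sample sequences are made up for this example.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA or multi-FASTA text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample data: two records, so this is multi-FASTA.
SAMPLE = """>seq1 hypothetical example
ACGTACGT
ACGT
>seq2 hypothetical example
TTGACA
"""

records = list(read_fasta(SAMPLE))
print(len(records), "record(s) found")
```

A check like this can catch a missing header line before a job is submitted, since the data editor described in Section 5 marks invalid sequence data.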



2.0 Installation

There are three ways to install Chinook: you can use the installer, download a tarred and zipped release, or check out the source from CVS.  This section describes each of these.  For detailed information on setting up a server, see the Walkthroughs in Section 5.

2.0.1 Installer

The advantage of installing Chinook with the installer is that it is very straightforward.

Installers are available at www.bcgsc.bc.ca/chinook/install.htm.  Download the latest release from this site.  The download location need not be the directory where you want to install Chinook.

Once the download has finished, run the installer and follow the instructions to install Chinook.  (In Windows, you can double-click the downloaded installer.  In Linux, you can run the installer from a shell.)

2.0.2 Release builds

Since the installer is an interactive method of installing Chinook, a non-interactive release build is also available.  This is more suitable for setting up servers on remote nodes.

Release builds are available at www.bcgsc.bc.ca/chinook/install.htm.   Download the latest version.  Unpack it in the directory where you want to install Chinook.

2.0.3 CVS

The Chinook code is available over CVS for GSC network users.  (NOTE: For more information on what CVS is, read https://www.cvshome.org/docs/manual/.)  Currently, only release builds are available for installation.  For the latest version of the Chinook code, e-mail chinook@bcgsc.bc.ca.  (If enough interest emerges, we will move the CVS repository to an external home). 

For GSC Network users:

To log in to the CVS repository at the GSC (a user name and password are required), type the following command.

cvs -d :pserver:USERNAME@triton.bcgsc.bc.ca:/home/cvs login

 

Replace USERNAME with your user name, and type your password when prompted. Then type the following command to check out Chinook.

cvs checkout -d chinook -r chinook

 

Now you have downloaded a working copy of Chinook, and you can customize the code or compile it.

For more information on how to connect to CVS from a GUI, or from an IDE such as Eclipse, please refer to the Chinook Developer Guide.



3.0 Running Chinook

There are several ways to run Chinook. You can run Chinook as a client only. If you want to share your computer resources and services, you can run Chinook as a server. You can also run Chinook as both a server and a client. Whichever you choose, if you want to enable the P2P functionality, you need to start the ChinookP2PNode before you start the client or server.

3.0.1 Running Client

Running the Chinook Client is simple, and there are several ways to start it. As a client, you can run local services, or services provided by others if your computer is connected to the P2P network. Chinook is designed to make running services simple and intuitive, so that users can focus on the data and results rather than on the command line needed to run each service.

3.0.1.1 From Installer

This is the easiest way to start the Chinook Client. Under Windows, go to <$INSTALL_DIR>, the directory where you installed Chinook, and click Chinook Client. This will automatically start the Chinook Client.

3.0.1.2 From ANT

Ant is a Java-based build tool developed by the Apache Software Foundation (http://ant.apache.org). If you are familiar with the Make utility, then you already know the principles of Ant. The difference is that Ant configuration files are XML-based, which makes them portable. Each task is run by an object that implements a particular Task interface. If you are using an IDE to customize the Chinook code, Ant is probably already bundled with the IDE. If you are going to run Ant from the command line, you may need to install Ant yourself; to do so, visit http://ant.apache.org. To run the Chinook Client, go to the installation directory and type at the prompt:

ant client

Ant will use the default build.xml file in the installation directory and run the target client. The Client will start automatically.

3.0.1.3 From Code

If you are interested in customizing Chinook for your own purpose, you can download the Chinook code and modify it, and then run or build the Chinook Server and the Chinook Client from the code.

Note: Ensure that all the dependency jars in the Chinook/lib/ folder are added to the CLASSPATH before you compile any Chinook code. Furthermore, the Chinook/resources/ folder must be added to the CLASSPATH. The resources folder contains all the user-configurable Chinook files.

To run the Chinook Client from the code, you need to run ChinookP2PNode first if you want to connect to other computers through the peer-to-peer network. ChinookP2PNode requires no arguments. Its main method is in the class ca.bcgsc.chinook.p2p.ChinookP2PNode. The Chinook Client also requires no arguments. Its main method is in the class ca.bcgsc.chinook.client.exec.ChinookClient. The procedure depends on which IDE you are using to customize the Chinook project. For example, in Eclipse you can open the Run menu and click the Run… menu item. A window similar to the following will appear.

Figure 3.1 Run window. Select which main method to run.

Click Java Application in the Configuration window, then click the New button under it. In the Name text field, type “ChinookP2PNode”. If there is anything in the text field beside the Search button, clear it; then click the Search button and the following window will appear.

Figure 3.2 Choose Main Type window. It displays all the main methods available.

Select ChinookP2PNode and then click the OK button. This returns you to the Run window. Click the Apply button, and then the Run button. The ChinookP2PNode will start.

Figure 3.3 Run window. Running ChinookP2PNode.

Follow the same procedure to run the ChinookClient. The difference is to select ChinookClient from the Choose Main Type window.

3.0.1.4 From Webstart

Alternatively, you can run the Chinook Client from Webstart. Go to http://www.bcgsc.ca/chinook, click Latest Download in the right column, and then click the Chinook Client Web Start link to open the web page containing the Chinook Client start. Click the corresponding link to start the Chinook Client. Webstart is suitable for people who just want to run services and do not care where the services are located. One advantage of running the Chinook Client from Webstart is that the user always runs the latest version of Chinook and does not need to worry about maintaining the software.


3.0.2 Running Server

Running the Chinook Server can be a little tricky for first-time users. There are two ways to provide services to clients: through Web Services or through RMI. If you want to run the server in Web Services mode, you need to configure a Tomcat server correctly. In addition, you need to set up your VM arguments. To run the Chinook Server using Ant, edit the build.xml file and change the VM parameters in the <server> target to match your own installation. Running the server is then similar to running the Chinook Client. At the prompt, type:

ant server-start

The server will start to run.  To stop the server type:

ant server-stop

 

3.0.2.1 For Web Services

If you want to run Chinook through Web Services, you need to configure a Tomcat server correctly. You also need to edit chinookImplAdvWSDL.xml and chinookSpecAdvWSDL.xml. For an example of how to modify the XML files, see Section 3.0.3.1.2, Advertisements.

3.0.2.2 For RMI

To provide services through RMI, you need to run rmiregistry first. The rmiregistry is usually located in the bin directory under $JAVA_HOME, depending on where you installed the Java package.

If you are running the Chinook Server from code, you need to set the RMI security manager and specify the right codebase. This can be done by adding the following to your VM parameters (note that the codebase entries are separated by a space):

-Djava.rmi.server.codebase=file:///home/smontgom/jbproject/chinook/classes/ file:///home/smontgom/jbproject/chinook/lib/filewire.jar

-Djava.security.policy=/home/smontgom/jbproject/Chinook/resources/chinookRMI.policy

You need to customize these parameters to your own Chinook installation.
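One way to adapt these parameters is to generate them from your installation directory. The following Python sketch is only an illustration and is not shipped with Chinook: the function name rmi_vm_args and the path /opt/chinook are made up, and the classes/, lib/filewire.jar and resources/chinookRMI.policy layout is assumed from the example paths above.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import quote


def file_url(path: PurePosixPath, is_dir: bool = False) -> str:
    """Turn an absolute POSIX path into a file:// URL."""
    url = "file://" + quote(str(path))
    # A directory entry in an RMI codebase must end with a slash.
    return url + "/" if is_dir else url


def rmi_vm_args(install_dir: str) -> List[str]:
    """Build the two -D options for a given installation directory."""
    root = PurePosixPath(install_dir)
    classes_url = file_url(root / "classes", is_dir=True)
    jar_url = file_url(root / "lib" / "filewire.jar")
    policy = root / "resources" / "chinookRMI.policy"
    return [
        # Codebase entries are separated by a single space.
        f"-Djava.rmi.server.codebase={classes_url} {jar_url}",
        f"-Djava.security.policy={policy}",
    ]


args = rmi_vm_args("/opt/chinook")  # hypothetical installation directory
for arg in args:
    print(arg)
```

Because the codebase value contains a space, it needs to be quoted if you pass it on a shell command line rather than through an IDE's VM arguments field.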


3.0.3 P2P

Chinook is designed to be a peer-to-peer application to facilitate the exchange of bioinformatics utilities. In this way, users don’t need to have all services provided locally; they can run services advertised by other computers.

3.0.3.1 Using Chinook in a p2p Environment

Running Chinook in a p2p environment using Ant is very similar to running the Chinook Client. At the prompt, type in

ant p2p-start

The p2pNode will start automatically.  To stop the p2p node, type:

ant p2p-stop

 

3.0.3.1.1 JXTA

Chinook servers publish advertisements using the JXTA protocol (For more information about JXTA protocol, visit http://www.jxta.org/). The client peer intercepts these advertisements and displays services (for which jobs can subsequently be run). The next section describes how Chinook advertisements are made and how you can edit them for your services.

3.0.3.1.2 Advertisements

An advertisement is an XML document that describes a particular JXTA message, whether that is a peer, peer group, or service. These messages are discovered and then cached locally. (To see your cache, go to your own .jxta/ directory, which is created when you first run Chinook.) As a service provider, you are interested in only two types of messages, the ModuleSpecAdvertisement (MSA) and the ModuleImplAdvertisement (MIA). These are located in your advertisements/ directory. (Chinook handles peer group and peer advertising by itself.)

The Chinook ModuleSpecAdvertisement

Chinook has two ModuleSpecAdvertisements in its advertisements/ folder: one for the RMI protocol and another for the web services protocol. Depending on how you want to run your server, you will need to modify one of these advertisements. For the purpose of this example, we will look at the RMI advertisement.

chinookSpecAdvRMI.xml

<?xml version="1.0"?>

<!DOCTYPE jxta:MSA>

<jxta:MSA xmlns:jxta="http://jxta.org">

  <MSID>

  urn:jxta:uuid-72CE4F415C994ADBB5BCB897E6BBB3D0EB39B9952C0D4D79BAD5BDE678877F4D06

  </MSID>

  <Name>JXTASPEC:Chinook-RMI</Name>

  <Crtr>smontgom@bcgsc.bc.ca</Crtr>

  <SURI>http://www.bcgsc.ca/Chinook/</SURI>

  <Vers>1.0</Vers>

  <Desc>A Chinook DBAS RMI Server</Desc>

</jxta:MSA>

 

The ModuleSpecAdvertisement is simple in nature. The <MSID> tag holds a unique ID that identifies this service; for your purposes it can be any valid JXTA ID (see the JXTA Protocol Specification at http://spec.jxta.org/nonav/v1.0/docbook/JXTAProtocols.html). The <Name> tag should not be changed: Chinook searches for Spec advertisements based on it, so if the <Name> tag is changed, your advertisement will not be discovered. The <Crtr> tag specifies the publisher of this service (I always use my own name; it is shown in the Chinook client, and users can contact me based on this information). The <SURI> tag points to the documentation for Chinook, but it can be changed to point to any service provider’s relevant service documents. The <Vers> tag specifies which version of Chinook is being used. Finally, the <Desc> tag describes the service.
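After editing a Spec advertisement, it can be useful to confirm that it still parses and that the <Name> tag was left untouched. The Python sketch below is an illustration only, not a Chinook tool; the helper read_msa is hypothetical, and the expected name is taken from the RMI listing above (the web services advertisement may use a different name).

```python
import xml.etree.ElementTree as ET

# The RMI Spec advertisement from the listing above.
MSA_XML = """<?xml version="1.0"?>
<!DOCTYPE jxta:MSA>
<jxta:MSA xmlns:jxta="http://jxta.org">
  <MSID>
  urn:jxta:uuid-72CE4F415C994ADBB5BCB897E6BBB3D0EB39B9952C0D4D79BAD5BDE678877F4D06
  </MSID>
  <Name>JXTASPEC:Chinook-RMI</Name>
  <Crtr>smontgom@bcgsc.bc.ca</Crtr>
  <SURI>http://www.bcgsc.ca/Chinook/</SURI>
  <Vers>1.0</Vers>
  <Desc>A Chinook DBAS RMI Server</Desc>
</jxta:MSA>
"""

# Name used by the RMI advertisement shown in this guide.
EXPECTED_NAME = "JXTASPEC:Chinook-RMI"


def read_msa(xml_text: str) -> dict:
    """Return the advertisement's child tags mapped to their trimmed text."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return {child.tag: (child.text or "").strip() for child in root}


fields = read_msa(MSA_XML)
if fields.get("Name") != EXPECTED_NAME:
    print("warning: <Name> was changed; the advertisement may not be discovered")
for tag in ("MSID", "Name", "Crtr", "SURI", "Vers", "Desc"):
    print(f"{tag}: {fields.get(tag, '(missing)')}")
```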

The Chinook ModuleImplAdvertisement

Chinook has two ModuleImplAdvertisements in its advertisements/ folder. These correspond to specific implementations of the ModuleSpecAdvertisements. Any service provider MUST change these advertisements to reflect the appropriate server information (the URI). We will look at the RMI example of the ModuleImplAdvertisement.

chinookImplAdvRMI.xml

<?xml version="1.0"?>

<!DOCTYPE jxta:MIA>

<jxta:MIA xmlns:jxta="http://jxta.org">

  <MSID>

  urn:jxta:uuid-72CE4F415C994ADBB5BCB897E6BBB3D0EB39B9952C0D4D79BAD5BDE678877F4D06

  </MSID>

  <Comp>

    <Efmt> JDK1.4 </Efmt>

    <ChinookImpl> 1.0 </ChinookImpl>

  </Comp>

  <Code>//localhost:1099/ApplicationServerImpl</Code>

  <PURI>Not yet available</PURI>

  <Prov>smontgom@bcgsc.bc.ca</Prov>

  <Desc>RMI Chinook Implementation, Parm is generated with service names

  </Desc>

</jxta:MIA>

 

 

The ModuleImplAdvertisement here has only a few things that must be noted. The <MSID> tag must match that of the corresponding ModuleSpecAdvertisement. The <Comp> tag specifies compatibility information. Here it states that users must have at least JDK1.4 and the 1.0 implementation of the Chinook interface. The <Code> tag points to the relevant URI to run the Chinook service. The <PURI> tag specifies a location to download the appropriate classes; this is not yet implemented. The <Prov> tag specifies who is providing this implementation. Finally, the <Desc> tag describes the implementation. If you have seen a discovered advertisement, you may be wondering why they are different from this description. A discovered Chinook ModuleImplAdvertisement will have automatically filled in <service_type> and <service_name> elements as part of the <Param> tag. The <Param> tag specifies additional Chinook metadata. Chinook uses this to ontologically classify services. This allows versions of Chinook to discover advertisements of only the type they are interested in.
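The requirement that the two <MSID> values match can be checked mechanically. The Python sketch below is an illustration only: the helper msid_of is hypothetical, and the two advertisements are abbreviated to the tags needed for the comparison. Whitespace inside <MSID> is ignored, so an ID that has been wrapped across lines still compares equal.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the Spec and Impl advertisements shown above.
SPEC_XML = """<jxta:MSA xmlns:jxta="http://jxta.org">
  <MSID>
  urn:jxta:uuid-72CE4F415C994ADBB5BCB897E6BBB3D0EB39B9952C0D4D79BAD5BDE678877F4D06
  </MSID>
  <Name>JXTASPEC:Chinook-RMI</Name>
</jxta:MSA>"""

IMPL_XML = """<jxta:MIA xmlns:jxta="http://jxta.org">
  <MSID>
urn:jxta:uuid-
72CE4F415C994ADBB5BCB897E6BBB3D0EB39B9952C0D4D79BAD5BDE678877F4D06
  </MSID>
  <Code>//localhost:1099/ApplicationServerImpl</Code>
</jxta:MIA>"""


def msid_of(xml_text: str) -> str:
    """Return the <MSID> value with all whitespace removed."""
    root = ET.fromstring(xml_text)
    msid = root.findtext("MSID", default="")
    return "".join(msid.split())


ids_match = msid_of(SPEC_XML) == msid_of(IMPL_XML)
print("MSID values match" if ids_match else "MSID values differ")
```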


3.0.4 Perl

Chinook has been integrated with Perl to allow the automated discovery and execution of services from scripts.  The Perl code is packaged for Bioperl and will likely be made part of the Bioperl distribution when Chinook is widely released.  This section describes how to set up your environment to run Perl scripts for Chinook.  For a walkthrough of creating a Perl script for Chinook, see the Walkthroughs in Section 5.

3.0.4.1 Setting up the Chinook Bioperl Environment

Perl needs to know the location of the Chinook Perl modules.  The modules for Chinook are installed under the perl/modules/ directory in the Chinook installation directory.  To point Perl at these modules, you can set the PERL5LIB environment variable for your shell.  This can be done by issuing one of the following commands.

If you are using tcsh/csh shells:

setenv PERL5LIB ${PERL5LIB}:${CHINOOK_HOME}/perl/modules

In bash, the equivalent command is:

export PERL5LIB=${PERL5LIB}:${CHINOOK_HOME}/perl/modules

Where ${CHINOOK_HOME} is the location of your Chinook installation.

NOTE: We typically prefer to write these commands to our user .bashrc file in our user directory to prevent having to retype them every time we want to use the Chinook Perl modules.

Alternatively, you can use the Perl pragma ‘use lib’ at the top of your scripts to point to the location of the Perl modules you wish to use:

use lib '/home/chinook_install_directory/perl/modules';

 

Where /home/chinook_install_directory/ is the installation directory for Chinook. 

3.0.4.2 Starting the Client

To discover services and run analyses, an instance of the Chinook Client must be running in batch mode with a port open for communication with Perl scripts.  The Perl scripts connect to this port to determine what services are available and to send requests for execution. 

To set up the Chinook Client for execution in batch mode (for Perl):

1)      Open the batch-config.xml file in the resources/ directory of your Chinook installation directory.

2)      Set the following tags.

a.       <batch_directory> specifies the directory where information about discovered services is written (batch files).  Whenever a new service is discovered, a batch file is written to this directory describing the service, its location, and the parameters required for execution.

b.      <batch_queue_directory> specifies the directory where completed batch files are stored (batch_queue files).  A completed batch file has its parameters and data set and is ready for execution.  It also has a batch_queue id attached to it to identify downstream output files.  The Chinook Client can be notified to read and process all the files in this directory.  (Alternatively, it can be given the location of a batch_queue file directly.)

c.       <batch_reporting_directory> specifies the directory where completed report information is written (batch_reports).  The Chinook Perl code usually polls this directory for reports matching a specific batch_queue file id.

d.      <batch_machine_name> specifies the machine where the Client is executing.  Usually localhost is sufficient.  On NFS-mounted systems, an explicit machine name is required.  The Chinook Perl code needs to know this location to be able to connect to the Chinook Client.

e.       <batch_port> specifies the port on which the Chinook Client receives incoming requests from Perl scripts.  The default port is 7999.  Any valid open port number can be used, but Perl clients need to know it, in addition to the <batch_machine_name>, in order to connect to the Chinook Client.

f.        <batch_socket_conns> specifies the maximum number of open socket connections that can be made to the Chinook Client at a time.  This number should usually be greater than <batch_receiver_thread_queue_size>.

g.       <batch_receiver_thread_queue_size> specifies the maximum number of concurrent processing requests that scripts can make of the client.  Further requests block until the pending requests are finished.  This should be a low number to prevent excessive use of memory, but it should be in line with the number of concurrent requests your Chinook Client receives.

3)      Once the batch-config.xml file has been configured to your desired settings, the Chinook Client can be started with batch mode enabled.  This starts a small server inside the Client that manages incoming requests from Perl scripts.

4)      To start the Chinook Client in batch mode, start the Client as normal but with the following flag set.

./ChinookClient -batch

The batch flag ensures that batch mode is activated.  If you do not want the GUI to appear, you can call the ChinookClient with:

./ChinookClient -batch -nogui

This is ideal for running the ChinookClient on remote machines. 
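The settings described in step 2 can be sanity-checked before the Client is started. The Python sketch below is an illustration under stated assumptions, not a Chinook tool: the root element name batch-config, the directory paths, and the numeric values are all hypothetical (only the tag names, localhost, and the default port 7999 come from this guide), and the helper looks tags up anywhere in the file because the real nesting is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical file contents; only the tag names come from this guide.
SAMPLE_CONFIG = """<batch-config>
  <batch_directory>/tmp/chinook/batch</batch_directory>
  <batch_queue_directory>/tmp/chinook/queue</batch_queue_directory>
  <batch_reporting_directory>/tmp/chinook/reports</batch_reporting_directory>
  <batch_machine_name>localhost</batch_machine_name>
  <batch_port>7999</batch_port>
  <batch_socket_conns>10</batch_socket_conns>
  <batch_receiver_thread_queue_size>4</batch_receiver_thread_queue_size>
</batch-config>"""

REQUIRED_TAGS = [
    "batch_directory",
    "batch_queue_directory",
    "batch_reporting_directory",
    "batch_machine_name",
    "batch_port",
    "batch_socket_conns",
    "batch_receiver_thread_queue_size",
]


def _as_int(values: Dict[str, str], tag: str, problems: List[str]) -> Optional[int]:
    """Convert a collected value to int, recording a problem if it is not one."""
    text = values.get(tag)
    if text is None:
        return None  # already reported as missing
    try:
        return int(text)
    except ValueError:
        problems.append(f"<{tag}> is not an integer: {text!r}")
        return None


def check_batch_config(xml_text: str) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    root = ET.fromstring(xml_text)
    values: Dict[str, str] = {}
    problems: List[str] = []
    for tag in REQUIRED_TAGS:
        element = next(root.iter(tag), None)  # search at any depth
        text = (element.text or "").strip() if element is not None else ""
        if text:
            values[tag] = text
        else:
            problems.append(f"<{tag}> is missing or empty")
    port = _as_int(values, "batch_port", problems)
    if port is not None and not 1 <= port <= 65535:
        problems.append(f"<batch_port> is out of range: {port}")
    conns = _as_int(values, "batch_socket_conns", problems)
    queue = _as_int(values, "batch_receiver_thread_queue_size", problems)
    if conns is not None and queue is not None and conns <= queue:
        problems.append(
            "<batch_socket_conns> should usually be greater than "
            "<batch_receiver_thread_queue_size>"
        )
    return problems


for problem in check_batch_config(SAMPLE_CONFIG):
    print("problem:", problem)
```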

3.0.4.3 Running Perl Scripts

For examples and more information on running Perl scripts once the environment has been configured and batch-enabled client has been started, see the Perl Walkthrough in Section 5.0.4.

 



4.0 Customizing Chinook

Chinook is an open source project. This section guides you through customizing the Chinook platform for your own installation, including configuring static services, adding new services, and configuring server information.

4.0.1 Using the Resources directory

As Chinook starts, it uses the XML files under the resources directory to configure itself. You can customize Chinook by editing some of these XML files.

4.0.1.1 Configuring static services

You can add static services to Chinook by editing the static-services.xml file under the resources directory. Each time Chinook starts, it first tries to connect to the servers listed in static-services.xml instead of trying to discover services through the P2P node directly. The static-services.xml file is very simple. It may look like the following.

<?xml version="1.0" encoding="ISO-8859-1"?>

<static-services>

  <staticservice>

    <URI>//localhost:1099/ApplicationServerImpl</URI>

    <mode>RMI</mode>

  </staticservice>

  <staticservice>

    <URI>http://localhost:1099/ApplicationServerImpl</URI>

    <mode>WSDL</mode>

  </staticservice> 

</static-services>

 

The <URI> tag defines the location of the server, and the <mode> tag defines the mode of the server, either RMI (remote method invocation) or WSDL (web service). You can add more than one static server. Enclose each static service between <staticservice> and </staticservice> tags.

You can also edit the static-services.xml file through the GUI. After you start the Chinook Client, click the Tools menu, then click the Static Services… menu item. A window similar to the following will appear.
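To check a hand-edited file, the entries can be read back and validated. The Python sketch below is an illustration only and not part of Chinook; the function name read_static_services is hypothetical, and the embedded XML repeats the example listing above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The example static-services.xml from the listing above.
STATIC_SERVICES = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<static-services>
  <staticservice>
    <URI>//localhost:1099/ApplicationServerImpl</URI>
    <mode>RMI</mode>
  </staticservice>
  <staticservice>
    <URI>http://localhost:1099/ApplicationServerImpl</URI>
    <mode>WSDL</mode>
  </staticservice>
</static-services>
"""

VALID_MODES = {"RMI", "WSDL"}


def read_static_services(xml_bytes: bytes) -> List[Tuple[str, str]]:
    """Return (URI, mode) pairs, rejecting empty URIs and unknown modes."""
    root = ET.fromstring(xml_bytes)
    services = []
    for entry in root.findall("staticservice"):
        uri = (entry.findtext("URI") or "").strip()
        mode = (entry.findtext("mode") or "").strip()
        if not uri:
            raise ValueError("a <staticservice> entry has no <URI>")
        if mode not in VALID_MODES:
            raise ValueError(f"unknown <mode> {mode!r} for {uri}")
        services.append((uri, mode))
    return services


services = read_static_services(STATIC_SERVICES)
for uri, mode in services:
    print(f"{mode:5} {uri}")
```

A misspelled mode is reported by this check as an unknown <mode>, which is the kind of slip that is easy to make when editing the file by hand.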

Figure 4.1 Static Services window. It displays all static services currently available.

You can add new static services by clicking the Add button. After clicking the Add button, the following window will appear.

Figure 4.2 Static Services Editor window. It is used to add or edit static services.

Type in the server location and choose the type of the service, then click the Test Connection button. If the client is able to connect to the server, the red square will turn green and the OK button will become enabled. If it cannot connect, the red square will remain red, and you cannot add the static service to the static-services.xml file.

Figure 4.3 Static Services Editor window. How to add a new static service.

After you’ve finished, click the OK button on the Static Services Editor window, and then click the OK button on the Static Services window. All the static services you just added will be written to the static-services.xml file. Editing an existing static service is similar. You can delete a static service by selecting it in the Static Services window and then clicking the Delete button.

4.0.1.2 Adding services

Each time the Chinook Server starts, it checks all the services specified in the applications.xml file, which is under the resources directory, and then publishes them. To add a new service to your Chinook Server, you need to describe the new service using XML tags. For a more detailed example, see Section 5.0.2, Adding a new Service.

4.0.1.3 Configure server information

If you want to run a Chinook Server, you need to customize the server-info.xml file, which is under the resources directory. This allows clients to obtain information about your server. The server-info.xml file is very simple.

<server-info>

  <location>Genome Science Centre</location>

  <description>

    This is the description of the server

  </description>

  <contact>chinook@bcgsc.bc.ca</contact>

</server-info>

 

The <location> tag defines your server location; it could be the URL of your website. The <description> tag holds the description of the server. The <contact> tag defines the contact information for the server (typically a maintainer’s e-mail address).
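Because the file is edited by hand, a quick well-formedness check can save a failed server start; a single dropped closing bracket is enough to make the file unparseable. The Python sketch below is an illustration only and not part of Chinook; the function name check_server_info and the sample values are made up.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical server-info.xml contents using the three tags described above.
GOOD = """<server-info>
  <location>http://www.example.org/</location>
  <description>
    This is the description of the server
  </description>
  <contact>admin@example.org</contact>
</server-info>"""

# The same file with the final '>' of </contact> dropped.
BROKEN = GOOD.replace("</contact>", "</contact")


def check_server_info(xml_text: str) -> List[str]:
    """Return a list of problems found in a server-info document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "server-info":
        problems.append(f"unexpected root element <{root.tag}>")
    for tag in ("location", "description", "contact"):
        if not (root.findtext(tag) or "").strip():
            problems.append(f"<{tag}> is missing or empty")
    return problems


print(check_server_info(GOOD) or "server-info looks fine")
print(check_server_info(BROKEN))
```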



5.0 Walkthroughs

In this section, we are going to guide you through using several common features available in Chinook, including running a job using a GUI, adding a new service, and running a batch Perl job.

5.0.1 Running a job with Chinook (using the GUI)

Chinook is designed to facilitate a bioinformatician’s work. Most jobs can be accomplished through the GUI. In this example, we will show you how to run a job in Chinook.

Step 1: Starting Chinook Client.

After you start the Chinook Client, the following window will appear (see Figure 5.1). The Chinook Client has five main parts. The first is the Chinook Menubar (at the top of the window, under the title bar). The second is the Service Type and Filter Panel (upper-left panel), which displays all the available service types and lets you specify filters to narrow the search for the services you want. The third is the Discovered Services Panel (upper-right panel), which displays the services that match the service type chosen in the Service Type and Filter Panel or the filters you have specified. The fourth is the Job Status Panel (lower-left panel), which displays the status of the currently running jobs. The fifth is the Lightweight Web Browser (lower-right panel), which displays the original service developer’s web page.

Figure 5.1 Chinook client starts up.

Step 2: Specifying a filter.

You can select the type of service you want to run by clicking its name in the Service Types pane. For example, if you click ALIGNMENT in the left panel, all the servers providing ALIGNMENT services will appear in the Discovered Services Panel. If too many servers provide the service, you can narrow the list by specifying filters. To do so, click the Filters tab in the Service Type and Filter Panel. Type the filter you want into a text field, then check the checkbox next to that field. The services that satisfy the filters are displayed in the Discovered Services Panel immediately.

Figure 5.2 Specify filters to facilitate finding a service.

Step 3: Running a service on a server.

Running a service on Chinook is quite straightforward. Select the service you want to run in the Discovered Services panel by clicking its name, then click the Run service button. Alternatively, right-click the service you want to run and select Run job on server in the popup menu. For example, select MLAGAN in the Discovered Services panel and click the Run service button. A window like the following will appear on the screen.

Figure 5.3 Configuring Service window. It is used to edit data and supply parameters.

Step 4: Editing data.

4.1 In the above window, the red dot to the right of the Edit data button indicates that the data is currently invalid. You can specify the data you want by clicking the Edit data button. A window like the following will appear.

Figure 5.4 Enter data window. It is used to enter all the data the service needs.

4.2 In the above window, the red square to the right of each data box indicates that the data is currently invalid. To edit the data, select one of the data boxes by clicking it, then click the Edit button. A window like the following will appear.

Figure 5.5 Enter data window. It is used to enter one sequence used by the service.

4.3 The red dot to the right of each text field and the red square at the bottom of the window indicate that the data is invalid. If you point the mouse at the red square, a tool tip will tell you which part of the data is invalid. Once you have specified all the data and the data is valid, the red square changes to a green square. You can then click the OK button to return to the previous window and edit another data box, if there is one. After you finish editing all the data boxes, click the OK button to return to the Configure Service window. The red dot to the right of the Edit data button should now be green, indicating that the data is valid, and you are ready to submit the job to the server.

Figure 5.6 Enter data window. Enter the valid data.

Figure 5.7 Enter data window. Add all valid data.

4.4 You can specify the parameters needed by the service by modifying the Parameter entry panel. Click the OK button on the Configure service window to run the service on server.

Figure 5.8 Configure service window.  Specify parameters, and submit the job

Figure 5.9 Running  job. The job is running on the server.

4.5 You can run more jobs simultaneously by repeating steps 4.1 to 4.4. If you want to cancel a job you have submitted, select it by clicking it in the Job Status Panel, and then click the Discard Job button. The job will stop running on the server and will be removed from the table. You can also right-click the selected job and then click Discard job in the popup menu.

4.6 You can view the job status by right-clicking the selected job and then clicking View result files in the popup menu. A window like the following will appear.

Figure 5.10 Running job window.

4.7 After the job has completed on the server, the Job status and Result files panel will be updated like this.

Figure 5.11 Job finished window. The job is finished on the server.

4.8 One table entry in the Result files is the standard error message. If an error occurs while the server is running the job, you can download this message to see what caused the error. If the job completed successfully, you can download the result by selecting the other table entry and then clicking the download button.

Figure 5.12 Downloading report window. The report is downloaded from server to local hard drive.

4.9 After downloading the file, you can use your favorite text editor to open the report and read it.

Step 5: Exiting Chinook

You can go to the File menu and click Exit to exit the Chinook Client, or you can click the close button located on the right of the Chinook Client title bar.


5.0.2 Adding a new Service

Currently, there are over 10 analysis services that have been made "Chinook-ready". These range from DNA sequence alignment to gene regulation prediction algorithms. For a complete and up-to-date list of the services Chinook currently supports, visit the Chinook website at http://www.bcgsc.ca/gc/bomge/chinook/algorithms.

The following example shows how to add a new service to a Chinook server, using LAGAN as the example.

<application>
  <name>LAGAN</name>
  <type>ALIGNMENT</type>
  <path>/opt/mlagan</path>
  <executable>lagan.pl</executable>
  <format>exe_path/executable dna_sequence parameter</format>
  <allow_stderr_preview>true</allow_stderr_preview>
  <allow_stdout_preview>true</allow_stdout_preview>
  <results_written_to_stdout>true</results_written_to_stdout>
  <parsing_class>
    ca.bcgsc.chinook.server.runner.alignment.Lagan
  </parsing_class>
  <output_path>/tmp</output_path>
  <description>
    Lagan is developed at Stanford by Mike Brudno
  </description>
  <creator>http://lagan.stanford.edu</creator>
  <version>1</version>
  <data_entry_set>
    <name>dna_sequence</name>
    <maximum_count>2</maximum_count>
    <minimum_count>2</minimum_count>
    <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
    <data_entry_type_name>DNA_FILE</data_entry_type_name>
    <set_output_class_name>
      ca.bcgsc.chinook.parsing.setoutput.impl.DataEntrySetOutputterImpl
    </set_output_class_name>
  </data_entry_set>

  <parameter>
    <descriptor>chaos_STRING</descriptor>
    <regex_format>["]([.]+)["]</regex_format>
    <description>The contents of this string will be passed as arguments to chaos</description>
    <user_defined>true</user_defined>
  </parameter>
  <parameter>
    <descriptor>order_STRING</descriptor>
    <regex_format>"-gs ([0-9]+) -gc ([0-9]+) -mt ([0-9]+) -ms ([0-9]+)"</regex_format>
    <description>The contents of this string will be passed as arguments to order
    </description>
    <user_defined>true</user_defined>
  </parameter>
  <parameter>
    <descriptor>recurfl_STRING</descriptor>
    <regex_format>"(\([0-9]+,[0-9]+,[0-9]+,[0-9]+\),)+"
    </regex_format>
    <description>Used in recursive anchoring</description>
    <user_defined>true</user_defined>
  </parameter>
  <parameter>
    <descriptor>translate_BOOLEAN</descriptor>
    <description>Use translated anchoring</description>
    <user_defined>true</user_defined>
  </parameter>
  <parameter>
    <descriptor>bin_BOOLEAN</descriptor>
    <description>Output in binary format</description>
    <user_defined>false</user_defined>
    <on>false</on>
  </parameter>
  <parameter>
    <descriptor>mfa_BOOLEAN</descriptor>
    <description>Output in multifasta format</description>
    <user_defined>false</user_defined>
    <on>true</on>
  </parameter>
  <parameter>
    <descriptor>rc_BOOLEAN</descriptor>
    <description>
      Reverse complement the second sequence before alignment
    </description>
    <user_defined>true</user_defined>
  </parameter>
  <parameter>
    <descriptor>fastreject_BOOLEAN</descriptor>
    <description>Abandon alignment if homology looks weak
    </description>
    <user_defined>true</user_defined>
  </parameter>
</application>

 

All new application specs are defined between the <application> and </application> tags. The <name> tag names the application that is being run; it can be anything. The <type> tag is more important: it is an ontological definition that marks the class of service your application belongs to. It is planned that the website will carry a dictionary of the widely used terms. For now, the well-defined terms are ALIGNMENT, VARIATION, MOTIF DISCOVERY, and PATTERN DISCOVERY. The <path> tag simply points to the directory that the main service is in, and the <executable> tag holds the name of the application that will be run.

In the <format> tag, several terms are special and are replaced by appropriate values when the script is run.

    1) exe_path is replaced by the contents of the <path> tag.

    2) executable is replaced by the contents of the <executable> tag.

    3) dna_sequence is replaced by the location of the sequence files.

    4) parameter is replaced by the service-specific parameters.
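
The substitution rules above can be pictured with a short sketch. Python is used purely for illustration: the function expand_format, its arguments, and the sequence file paths are hypothetical and are not part of Chinook, whose actual substitution happens inside the server code.

```python
def expand_format(format_string, path, executable, data, parameters):
    """Expand a <format> template into a command line (illustrative only).

    `data` maps a data entry set name (e.g. "dna_sequence") to the list of
    file locations supplied for it; `parameters` is the already rendered
    parameter string.
    """
    tokens = []
    for token in format_string.split():
        if token == "exe_path/executable":
            tokens.append(path.rstrip("/") + "/" + executable)
        elif token == "parameter":
            if parameters:
                tokens.append(parameters)
        elif token in data:
            tokens.extend(data[token])
        else:
            tokens.append(token)  # unknown words are passed through unchanged
    return " ".join(tokens)


command = expand_format(
    "exe_path/executable dna_sequence parameter",
    path="/opt/mlagan",
    executable="lagan.pl",
    data={"dna_sequence": ["/tmp/seq1.fa", "/tmp/seq2.fa"]},  # made-up paths
    parameters="-mfa",
)
print(command)  # /opt/mlagan/lagan.pl /tmp/seq1.fa /tmp/seq2.fa -mfa
```

The point of the template is that the service provider, not the client, decides the order in which the executable, the data, and the parameters appear on the command line.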

The <allow_stderr_preview> tag defines whether previewing standard error is allowed. The <allow_stdout_preview> tag defines whether previewing standard output is allowed. The <set_output_class_name> tag defines the class used to set up the output. The <parsing_class> tag defines which class will parse the output file. Currently supported parsing classes are ca.bcgsc.chinook.server.runner.alignment.Lagan for Fasta and ca.bcgsc.chinook.server.runner.alignment.clustalw for GCG. If no parsing class is defined for your output, you will have to write one in the runner package of the Chinook server code.

The <output_path> tag points to your temporary directory for formatting files. The <description> tag holds a description of the service. The <creator> tag is the website of the original author of the service's implementation (not the service provider). So in the case of Lagan, it points to the site at Stanford. The <version> tag is the version of the application.

The <data_entry_set> tag defines the data formats. The first tag, <name>, defines the name of the data, which is used in the <format> tag. The <maximum_count> tag defines the maximum number of sequences. If there is no upper limit on the number of sequences that can be supplied to the application, this tag does not need to be defined. The <minimum_count> tag defines the minimum number of sequences that must be supplied to the application. The <data_entry_type_name> tag names a type of data entry that can be used to supply the sequences (DNA_LOCATION and DNA_FILE in the example). The <set_output_class_name> tag defines which class is used to set up the output.
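
As a rough illustration of how these limits constrain a job, the sketch below reads a trimmed copy of the LAGAN <data_entry_set> and checks a number of supplied sequences against it. It is written in Python for illustration only; the function names are hypothetical, and treating a missing <minimum_count> as "no lower limit" is an assumption of the sketch, not documented Chinook behaviour.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the LAGAN <data_entry_set> from the example.
DATA_ENTRY_SET = """
<data_entry_set>
  <name>dna_sequence</name>
  <maximum_count>2</maximum_count>
  <minimum_count>2</minimum_count>
  <data_entry_type_name>DNA_LOCATION</data_entry_type_name>
  <data_entry_type_name>DNA_FILE</data_entry_type_name>
</data_entry_set>
"""


def read_data_entry_set(xml_text):
    """Extract the name, count limits and accepted types from the XML."""
    root = ET.fromstring(xml_text.strip())

    def optional_int(tag):
        text = root.findtext(tag)
        return int(text) if text is not None else None  # tag may be absent

    return {
        "name": root.findtext("name"),
        "minimum": optional_int("minimum_count"),
        "maximum": optional_int("maximum_count"),
        "types": [e.text for e in root.findall("data_entry_type_name")],
    }


def count_is_valid(entry_set, supplied):
    """Check the number of supplied entries against the optional limits."""
    low, high = entry_set["minimum"], entry_set["maximum"]
    if low is not None and supplied < low:
        return False
    if high is not None and supplied > high:
        return False
    return True


lagan = read_data_entry_set(DATA_ENTRY_SET)
print(lagan["name"], lagan["types"])   # dna_sequence ['DNA_LOCATION', 'DNA_FILE']
print(count_is_valid(lagan, 2))        # True
print(count_is_valid(lagan, 3))        # False
```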

The <parameter> tag is another special tag in the XML description of your service. These tags define what parameters you want your client to input, what parameters you'd prefer they didn't, and what default values you'd like to maintain. The <parameter> tag offers extensive control over how your client uses your service.

The first part of the <parameter> tag is the <descriptor> tag. This tag defines what type of parameter it is. Fundamentally, there are two types: STRING and BOOLEAN (a flag that takes no value). When defining the <descriptor> tag, write the name followed by the type, i.e. tree_STRING tells Chinook that you have a parameter called tree that needs a string, whereas translate_BOOLEAN tells Chinook that you have a parameter that is either there or it isn't; for instance,

mlagan -tree "mytree" -translate
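
The naming convention can be made concrete with a small sketch that splits a descriptor into its name and type. This is illustrative Python, not Chinook code; the function name split_descriptor is hypothetical, and the handling of an empty type is an assumption.

```python
def split_descriptor(descriptor):
    """Split a descriptor such as 'tree_STRING' into (name, type)."""
    name, _, type_name = descriptor.rpartition("_")
    if not name:
        raise ValueError("descriptor has no name part: " + repr(descriptor))
    # Assumption of this sketch: an empty type (e.g. 'translate_') is
    # treated the same as BOOLEAN.
    type_name = type_name or "BOOLEAN"
    if type_name not in ("STRING", "BOOLEAN"):
        raise ValueError("unknown parameter type: " + repr(type_name))
    return name, type_name


for descriptor in ("tree_STRING", "translate_BOOLEAN", "fastreject_BOOLEAN"):
    print(split_descriptor(descriptor))
# ('tree', 'STRING')
# ('translate', 'BOOLEAN')
# ('fastreject', 'BOOLEAN')
```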

The <regex_format> tag describes the regular expression that you want your input string data to match. This is a security feature that lets you guarantee that parameters arrive in the form you expect, because users are prevented from entering parameters that don't match. This tag is not required, however.
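
To show the kind of check this enables, the sketch below tests two candidate values against the order_STRING pattern from the LAGAN example. It is illustrative Python only: the function name is hypothetical, the parameter values are made up, and requiring the whole value to match is an assumption about how the pattern is applied. Chinook itself is a Java application, so its regular expression dialect may differ in detail from Python's.

```python
import re

# Pattern copied from the order_STRING parameter of the LAGAN example;
# note that the surrounding double quotes are part of the pattern.
ORDER_FORMAT = r'"-gs ([0-9]+) -gc ([0-9]+) -mt ([0-9]+) -ms ([0-9]+)"'


def is_allowed(value, regex_format=None):
    """Return True if `value` may be passed on to the application."""
    if regex_format is None:  # <regex_format> is optional
        return True
    # Assumption: the entire value has to match, not just a part of it.
    return re.fullmatch(regex_format, value) is not None


print(is_allowed('"-gs 2 -gc 1 -mt 10 -ms 5"', ORDER_FORMAT))  # True
print(is_allowed('"-gs 2; rm -rf /"', ORDER_FORMAT))           # False
```

The second value is rejected because it does not follow the expected layout, which is how a pattern keeps arbitrary text from reaching the command line.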

The <description> tag describes to the user what this parameter does. The <user_defined> tag tells Chinook whether you want the user to be able to change this parameter; it takes one of two values, true or false. The <use_equals> tag tells Chinook whether the parameter is of the form -tree=value or -tree value; it also takes true or false. The <on> tag tells Chinook whether a boolean parameter is active by default. Finally, the <default_value> tag holds the data that you want this parameter to supply, i.e. the default string data. It is quite possible for a service provider to use these tags in a way that doesn't make sense, so be very careful about what you want users to be able to do and what the default parameters are. If you find that something is missing to fully describe your input parameters, e-mail me at smontgom@bcgsc.bc.ca
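
How these tags might combine into a parameter string is sketched below. This is illustrative Python, not the Chinook implementation: the dictionary layout, the function name, the leading dash, and the defaults used when a tag is omitted are all assumptions.

```python
def render_parameters(specs, user_input):
    """Build a parameter string from parameter specs and client input."""
    parts = []
    for spec in specs:
        name = spec["name"]
        # Only parameters marked user_defined may be changed by the client.
        overridable = spec.get("user_defined", False)
        if spec["type"] == "BOOLEAN":
            enabled = spec.get("on", False)
            if overridable and name in user_input:
                enabled = bool(user_input[name])
            if enabled:
                parts.append("-" + name)
        else:
            value = spec.get("default_value")
            if overridable and name in user_input:
                value = user_input[name]
            if value is not None:
                separator = "=" if spec.get("use_equals", False) else " "
                parts.append("-" + name + separator + value)
    return " ".join(parts)


specs = [
    {"name": "translate", "type": "BOOLEAN", "user_defined": True},
    {"name": "bin", "type": "BOOLEAN", "user_defined": False, "on": False},
    {"name": "mfa", "type": "BOOLEAN", "user_defined": False, "on": True},
    {"name": "tree", "type": "STRING", "user_defined": True, "use_equals": False},
]

# The client asks for -bin, but that parameter is not user-defined,
# so the provider's default (off) wins.
request = {"translate": True, "bin": True, "tree": '"mytree"'}
print(render_parameters(specs, request))  # -translate -mfa -tree "mytree"
```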

After you have defined your new service, you should be able to share it with the world using Chinook. Visit our online applet to see if our discovery service has picked it up (in development).


5.0.3 Setting up a Chinook server node

This walkthrough will guide you through setting up a Chinook server.

To make Chinook available to Internet users, you will need the following prerequisites:

1)      A computer where bioinformatics applications can be installed and run (e.g. you will not be able to provide ClustalW analyses if your computer cannot run ClustalW itself).

2)      The required ports open on the computer.  You will need ports 9700 and 9701 open for JXTA communication.  You will also need either port 1099 (for RMI mode) or port 8080 (for Web Services mode).  These are the default ports; other ports can be used instead if the defaults clash with existing services (you will need to edit the advertisement and resource files if you are not using a default port).

3)      A working directory.  Ideally, your computer will have at least 100MB of hard-drive space to store temporary files.
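
Before choosing between the default ports and alternatives, it can help to see whether another service on the machine already occupies them. The sketch below is an illustrative Python check and is not shipped with Chinook; it assumes TCP, and it only tests whether the port can be bound locally. It does not tell you whether a firewall allows outside connections to reach the port.

```python
import socket

# Default ports named in the prerequisites above.
DEFAULT_PORTS = {
    "JXTA": (9700, 9701),
    "RMI": (1099,),
    "Web Services": (8080,),
}


def port_is_free(port, host=""):
    """Return True if this machine can still bind a TCP socket to `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically: another service already holds the port
    return True


for protocol, ports in DEFAULT_PORTS.items():
    for port in ports:
        state = "free" if port_is_free(port) else "already in use"
        print(f"{protocol:12s} port {port}: {state}")
```

Run it before starting Chinook; once the server is running, its own ports will naturally be reported as in use.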

To set-up a Chinook server node:

1)      Ensure you have the prerequisites listed above.

2)      Install Java.

a.       Go to http://java.sun.com

b.      Look for Downloads

c.       Download the 1.4.x version of the J2SE JDK.

d.      Set the JAVA_HOME environment variable to the installation directory of Java (e.g. export JAVA_HOME=/usr/lib/java).  You may want to do this in a configuration script so that this environment variable is preserved.

3)      Install Chinook (see Section 2).

4)      If you are planning to use the Web Services version of Chinook, you will need to install Tomcat and Apache Axis.  Read the inset for instructions on how to do this.

Installing Tomcat and Apache Axis:

 

5)      For the Web Services version of Chinook, you can now deploy the Chinook code.  This is done by running the Ant task deploy from the Chinook installation directory.  If the CATALINA_HOME environment variable has not been set, this task will fail.

a.       Run the command: ant deploy

b.      If Ant is not installed, follow the installation instructions below.

Installing ANT

6)      Before you run the server, in either RMI or Web Services mode, you will need to configure the server for your machine.  Go to the Chinook installation directory.

a.       Edit the files in the resources/ directory.

    i.      The most important file to edit is applications.xml.

    ii.      Comment out the protocol block you are not using and uncomment the protocol block you are using.  For instance, in RMI mode the following would appear:

    iii.      IMPORTANT: Change the name of the uri to your machine name.  Do not use localhost.

    iv.      Change the publisher information to reflect your user information.  This will allow users of your server to contact you.  This information should also be set in more detail in the server-info.xml file in the same directory.

    v.      IMPORTANT: We have not installed any new services into Chinook.  When new services are added, they are described in the applications.xml file.  If this file already contains services, comment them out; otherwise your server will end up advertising services that do not exist at your location.

b.      Edit the files in the advertisements/ directory.

    i.      The advertisements that Chinook uses are specified in advertisement-config.xml in the resources/ directory.

    ii.      To ensure that you are advertising the right endpoint for your services, edit the advertisement implementation files.  Change the endpoint from localhost to your machine name.

7)      That is all there is to it.  You MUST still install services.  To run Chinook, run ant p2p-start, wait until the p2pNode has found a rendezvous (roughly 10-15 seconds), then run ant server-start.  NOTE: Check the Axis configuration page to ensure that your service was deployed.  If it was not, first try stopping and restarting the Tomcat server.

8)      Test out your services by running a client.

Troubleshooting:

If you are not able to see the deployed services, look in the Tomcat log/ directory to determine the source of the error.

The server-config.wsdd file is not found. 

Copy this file from elsewhere in your Tomcat installation (it will likely be in the work/ directory).

I get a 401 error from ANT and the log says: - Rejected remote access from host /0:0:0:0:0:0:0:1

The server-config.wsdd file needs to allow remote administration.  Set <parameter name="enableRemoteAdmin" value="true"/> for the AdminService.

Other error

Send the last lines of your log files and the Ant execution output to chinook@bcgsc.bc.ca.

 


5.0.4 Running a batch Perl job

The Perl interface to Chinook allows you to add analysis capabilities directly into your scripts.  This walkthrough outlines how to find services, submit jobs, and read reports using the Perl interface.

Follow these steps:

1)      Configure your Perl environment. (see 3.0.4.1)

2)      Start the Chinook Client in batching mode (see 3.0.4.2). 

3)      The example Perl scripts for Chinook batching are located in the perl/t/ directory from your Chinook installation directory.

4)      How to Find Chinook Services using Perl.  One of the first programmatic tasks is to discover what services are currently available for running over Chinook.  There are two ways to bring discovered services into your Perl script: either by parsing the batch directory using the Bio::Tools::Run::Chinook::BatchDirectory module, or by asking the Chinook Client directly using the Bio::Tools::Run::Chinook::ChinookManager module.

a.       To parse the batch directory:

# Read the service descriptions stored in the batch directory.
my $batch_directory =
      Bio::Tools::Run::Chinook::BatchDirectory->new(
         batch_directory => "/home/smontgom/batch/batch");
my $services_ref = $batch_directory->getAllServices();

 

b.      To use the Bio::Tools::Run::Chinook::ChinookManager:

# Ask the running Chinook Client for the services it currently knows about.
my $chinook_man =
      Bio::Tools::Run::Chinook::ChinookManager->new(
         machine_name => "localhost", port => "7999");
my $services_ref = $chinook_man->getServices();

 

 

Both of these methods get the services that are currently available (each returns a list of Bio::Tools::Run::Chinook::Service objects).  However, only the Bio::Tools::Run::Chinook::ChinookManager is guaranteed to be current, because service descriptions stay in the batch directory until they are removed manually.  The recommended method is to use the Bio::Tools::Run::Chinook::ChinookManager to get current information whenever possible.

5)      Once a service has been selected, you will need to access a batch file (create a Bio::Tools::Run::Chinook::Batch object) for that service to get the required parameters and supported databases.

6)      How to Get Information about Required Data and Parameters using Perl.  To create a Bio::Tools::Run::Chinook::Batch object, call the following (the filename of the batch file can be accessed from the Bio::Tools::Run::Chinook::Service object, or it will have to be discovered in the batch directory if the Bio::Tools::Run::Chinook::ChinookManager doesn't provide the information):

my $batch = Bio::Tools::Run::Chinook::Batch->new(
    batch_filename => $batch_filename);

 

 

Once you have a Bio::Tools::Run::Chinook::Batch object, you can interrogate the DEOSets (Data Entry Object Sets) that are required to run the Service.

NOTE: Data Entry Object Sets (DEOSets).  DEOSets are the data requirements for various services.  A DEOSet contains information about what types of data objects a server is expecting, how many of them it requires, and the data itself (in the form of a DEO, Data Entry Object).  Each DEOSet has a name that identifies it to the server, because the data contained in a DEOSet is specifically manipulated to allow the associated service to access it in a suitable way.  To use a DEOSet, you must first determine what DEO types are acceptable.  Examples of the types are “DNA_LOCATION” or “DNA_FILE”; each describes a specific type of data that the server knows how to interpret.  In this case, the server can take either a file containing DNA sequence or the specific coordinates of a genomic sequence (from one of the supported databases on the server).  You will need to ask the server to provide an outline of the data that is required.  This is done in Perl by using the Bio::Tools::Run::Chinook::ChinookManager to get the associated DEO for a DEO name, like so:

my $deo = $chinook_man->getDEO("DNA_LOCATION");

 

7)      Fill in the required DEOSets.  The process to do this is shown in the test scripts in the perl/t/ directory.  Essentially, first get the DEOSet objects from the Bio::Tools::Run::Chinook::Batch object.  Iterate through each DEOSet and determine how many DEOs are required and what the allowed DEO types are (i.e. DNA_LOCATION, etc).  Once you have found a DEO type that is suitable for your purposes, get the DEO object from the ChinookManager (see the Note above).  From the DEO object, you can access all the attributes of this DEO and fill them in.  (These are validated when you submit the job, so mistakes are caught at that point.)  You can set various properties like the DEO order and name to specify how and in what order the server should process the data.  (A negative number for the order means that the client doesn't care what order the data is in.)  See the testBatch.pl scripts for examples of how this is done.

8)      Once the data has been filled in and set, the required parameters need to be set.

9)      How to Set Required Parameters using Perl.  The parameters that are required for any given service are accessible from the Bio::Tools::Run::Chinook::Batch object.  An example of how to set all the Boolean parameters to false is described below.

my $parameters_ref = $batch->getParameters();
my @parameters = @$parameters_ref;
foreach my $parameter (@parameters) {
  if ($parameter->getType() eq "BOOLEAN") {
     $parameter->setValue("false");
  }
}

 

 

10)  How to Run a Chinook Job using Perl.  To run a job in Chinook using the Perl interface, you finally need to create a Bio::Tools::Run::Chinook::QueueBatch object.  Once the parameters and DEOSets have been set, this can be done as below:

my $queue_batch =
    Bio::Tools::Run::Chinook::QueueBatch->new(
       deosets => $deosets,
       parameters => \@parameters,
       batch => $batch,
       filename =>
       "/home/smontgom/batch/batch_queue/chinook.LAGAN");

 

     

The Bio::Tools::Run::Chinook::QueueBatch object requires a filename which will become the prefix for the XML file containing all the information required to run the job over Chinook.  This file is written by calling:

my $id = $queue_batch->writeQueueBatch();

 

 

Then the ChinookManager is used to point the Chinook Client at the batch queue file for processing on the server.  An example of this is below:

$chinook_man->processQueueBatch($queue_batch);

 

 

11)  Accessing Reports using Perl.  The ID that was returned when the Bio::Tools::Run::Chinook::QueueBatch object was written to file links that file to the resulting report files, which are written to the batch reporting directory.  To monitor completion of reports, you can periodically poll the reporting directory (substituting your reporting directory in place of the one provided below).  An example of this is below.

my $filename = $chinook_man->isReportReady($id,
    "/home/smontgom/batch/batch_reporting/");

 

 

12)  If the filename is defined, the report has been written to the file.

13)  Getting the Bio::Tools::Run::Chinook::Report object.   To get the Report object, once the filename has been defined, call:

my $report =
    Bio::Tools::Run::Chinook::Report->new(
       report_filename => $filename);

 

 

14)  From the report object, you can get information about warnings, errors, or the information required to download the result files from the server.

15)  Downloading Results from the Report.  To download the results from the report file, first determine the canonical output file.  This is the file that the results are written to (you can also access other files that describe how the job was run and what was available on various streams).  To download results to a sample file, follow the example below.

# The canonical report file is the one the results were written to.
my $report_file = $report->getCanonicalReportFile();

my $sample_file = "/tmp/results.out.steve.3";

$chinook_man->downloadFile(
    $report_file->getFileId(),
    $report->getServiceLocation(),
    $sample_file);

 

16)  Congratulations.  That should cover the basic process for running jobs via Chinook.  There are still many steps required, and we hope to reduce their complexity.  There is also a lot of functionality not covered here that can be explored by looking at the modules and reading the associated Perldocs.  But, with a little bit of effort, your scripts and users can access state-of-the-art algorithms without having to download them.
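
Step 11 above says to poll the reporting directory periodically but leaves the loop itself to you. The sketch below shows one way to structure such a loop, with a pause between attempts and an overall time limit. It is written in Python purely to illustrate the logic; in a Perl script the check would be the isReportReady call from step 11. The function wait_for, the stand-in check, and the file name it returns are all hypothetical.

```python
import time


def wait_for(check, interval=5.0, timeout=600.0):
    """Call check() every `interval` seconds until it returns a value.

    Returns the first non-None result, or None once `timeout` seconds
    have passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for the report check: it reports "not ready" twice, then
# returns a made-up file name.
attempts = {"count": 0}


def fake_report_check():
    attempts["count"] += 1
    return "report-123.xml" if attempts["count"] >= 3 else None


print(wait_for(fake_report_check, interval=0.01, timeout=1.0))  # report-123.xml
```

A time limit matters because a job that fails on the server may never produce a report, and a script without one would wait forever.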



6.0 Further Information

Chinook is funded by Genome Canada as part of the Bioinformatics of Mammalian Gene Regulation grant. Stephen Montgomery is a Ph.D. graduate student in Genetics at the University of British Columbia. He is funded by the Michael Smith Foundation for Health Research. His work is performed primarily at Canada's Michael Smith Genome Sciences Centre as part of the Gene Regulation Informatics team. Steven Jones is the Head of Bioinformatics for the CMSGSC. He is a Scholar of the Michael Smith Foundation for Health Research. Currently, the source code for Chinook is not available for download. The source is, however, licensed under the Creative Commons Attribution-Non-Commercial license and is freely available on request to chinook@bcgsc.bc.ca. If you have any questions or find any bugs in Chinook, please email us.

6.0.1 Mailing List

The Chinook mailing list is a low-volume regulated list that broadcasts weekly development announcements. We recommend that you sign up for the mailing list at http://www.bcgsc.ca/mailman/listinfo/chinook. You will get the latest information about Chinook (including upgrades and bug fixes). You can also view the archive at http://www.bcgsc.ca/pipermail/chinook/.


6.0.2 Authors

Montgomery SB, Fu T, Guan J, Jones SJM (in preparation).

 

Chinook Internal:

Chinook Service Developers:
Monica Sleumer, Keven Lin, Tamara Astakhova, Jun Guan, Maik Hassel, James Kennedy, Eddy Tsang, Yvonne Li, Tony Fu

Other thanks:
Asim Siddiqui, Misha Bilenky, Gordon Robertson

 

Chinook External:

Jonathan Lim, Wyeth Wasserman, David He, Sohrab Shah, Francis Ouellette

 


6.0.3 Known Problems

1. Web pages may not be displayed properly in the Lightweight Web Browser. This is because Chinook is a Java application, and JEditorPane, which is used to display web pages, currently supports only HTML 3.2. Web pages that use HTML features newer than version 3.2 will likely not be displayed properly.


6.0.4 License

Chinook is licensed under a Creative Commons License.

 
