Aggregating and analyzing whole genome alignment (WGA) data is troublesome. This site provides utilities and services for manipulating existing WGA datasets for human (Homo sapiens), mouse (Mus musculus), and rat (Rattus norvegicus). In particular, it provides:
Perl data manipulation scripts
A MySQL database
A WGA Web service over Tomcat
SOAP::Lite and Apache Axis web service examples
As of February 17th, 2004, we support both the Berkeley and UCSC Human-Mouse-Rat alignment datasets, and we will attempt to add more species as data becomes available. (In my opinion, the current schema's chromosome focus and the low coverage of the chimp data mean that adding this species would not be very fruitful right now.)
If you find this resource extremely useful, please feel free to acknowledge it and/or Stephen Montgomery.
Web service for human-mouse-rat alignments
To access the HMR data directly using Perl or Java, we have set up a web service on coast.bcgsc.bc.ca. NOTE: To run the Perl client, you must have the SOAP::Lite module installed; it is available from the Download SOAP::Lite link. We have tested this with version 0.60 Beta 1 for UNIX.
Services
This document describes the services that are exported over this web service: MGA-Spec.pdf. The spec is written in terms of Java, but it can be applied to Perl by looking at the examples below.
Download version 1.0 of the web service Java code: mga.tar.gz (Version 1). Examples are provided in the JUnit classes within the archive; run them using ant test.
MySQL service:
To access our database, connect to db02.bcgsc.bc.ca using user "ensembl" and password "ensembl". (These are named ensembl because db02.bcgsc.bc.ca also acts as an EnsEMBL mirror for Sockeye.) The UCSC data is in the hmr_ucsc database and the Berkeley data is in the hmr_berkeley database. The connection information may change soon to require users to obtain individual accounts, which would allow us to kill one user's runaway queries instead of killing everybody's.
Login
User: ensembl
Pass: ensembl
Host: db02.bcgsc.bc.ca
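As a rough illustration of using the login details above from a script, the following Python sketch shells out to the standard mysql command-line client. The function names build_mysql_command and run_query are hypothetical helpers introduced here, not part of this site's code, and the sketch assumes the mysql client is installed and that the host is reachable from your machine.

```python
import subprocess

# Connection details as listed in the Login block above.
HOST = "db02.bcgsc.bc.ca"
USER = "ensembl"
PASSWORD = "ensembl"


def build_mysql_command(database, query):
    """Build the argument list for the mysql command-line client.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    return [
        "mysql",
        f"--host={HOST}",
        f"--user={USER}",
        f"--password={PASSWORD}",
        "--batch",               # tab-separated output, one row per line
        f"--execute={query}",
        database,
    ]


def run_query(database, query):
    """Run a query and return its output as a list of rows (lists of strings).

    In --batch mode the first row holds the column names.
    Assumes the mysql client is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_mysql_command(database, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]
```

For example, run_query("hmr_ucsc", "SHOW TABLES") should list the tables in the UCSC database, and the same call with "hmr_berkeley" does so for the Berkeley data.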
Web-based
To view the contents of the HMR database on db02 and run read-only queries through a web browser, use the HMR Database viewer.
Schema Diagram
Building the Berkeley data
The Berkeley data comes in XMFA format. To build the Berkeley data, we manually downloaded it from the URL above and ran the following build steps:
Changed to the tables/ directory and ran mysqlimport -u smontgom -pMYPASS -h db02 hmr_berkeley *.txt.table
Note: You may need to check the input parameters of each script. The scripts currently contain a few absolute paths to support running jobs on a cluster; I will provide a tar.gz download in the future so that they work out of the box.
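To give a feel for the XMFA-to-table step, here is a minimal Python sketch that reads XMFA text and emits tab-delimited rows of the kind mysqlimport can load. It is not the build script used for this site: the column layout (block number, sequence number, name, start, end, strand, aligned sequence) is a hypothetical one chosen for illustration and does not reproduce the schema shown in the diagram, and the header pattern assumes the common ">N:start-end strand name" form.

```python
import re

# Assumed XMFA entry header, e.g. ">1:100-250 + chr1"
HEADER = re.compile(r">\s*(\d+):(\d+)-(\d+)\s+([+-])\s*(.*)")


def parse_xmfa(lines):
    """Yield alignment blocks; each block is a list of entry dicts."""
    block, entry = [], None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("="):          # "=" closes an alignment block
            if block:
                yield block
            block, entry = [], None
        elif line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unrecognised XMFA header: {line}")
            seq_num, start, end, strand, name = match.groups()
            entry = {"seq_num": int(seq_num), "start": int(start),
                     "end": int(end), "strand": strand,
                     "name": name, "sequence": ""}
            block.append(entry)
        elif entry is not None:
            entry["sequence"] += line     # sequence may span several lines
    if block:                             # tolerate a missing final "="
        yield block


def xmfa_to_rows(lines):
    """Yield one tab-delimited row per aligned sequence (hypothetical columns)."""
    for block_id, block in enumerate(parse_xmfa(lines), start=1):
        for e in block:
            yield "\t".join([str(block_id), str(e["seq_num"]), e["name"],
                             str(e["start"]), str(e["end"]),
                             e["strand"], e["sequence"]])
```

Writing the rows for each chromosome to its own file would produce input in the spirit of the *.txt.table files imported above, although the real files follow the site's own schema.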
Building the UCSC data
UCSC data comes in axt format, while Berkeley data comes in XMFA format. To build the UCSC data, we first converted it to XMFA and then transferred the XMFA data to the table schema above. There is an individual table for each chromosome.
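The axt-to-XMFA conversion described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than the conversion script used here: it assumes pairwise axt blocks (a nine-field summary line followed by two sequence lines), numbers the primary sequence 1 and the aligning sequence 2 in the XMFA headers, and copies coordinates through unchanged. The function names are hypothetical.

```python
def parse_axt(lines):
    """Yield (summary_fields, primary_seq, aligning_seq) for each axt block."""
    block = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines separate blocks
        block.append(line)
        if len(block) == 3:               # summary line + two sequence lines
            yield block[0].split(), block[1], block[2]
            block = []


def axt_block_to_xmfa(fields, primary_seq, aligning_seq):
    """Format one axt block as a two-sequence XMFA block.

    Caveat: for "-" strand alignments, axt gives the aligning coordinates
    relative to the reverse-complemented chromosome; this sketch does not
    convert them.
    """
    _num, chrom1, start1, end1, chrom2, start2, end2, strand, _score = fields[:9]
    return "\n".join([
        f">1:{start1}-{end1} + {chrom1}",          # primary is always "+"
        primary_seq,
        f">2:{start2}-{end2} {strand} {chrom2}",
        aligning_seq,
        "=",                                       # XMFA block terminator
    ])


def axt_to_xmfa(lines):
    """Convert an iterable of axt lines to XMFA text."""
    return "".join(axt_block_to_xmfa(*block) + "\n"
                   for block in parse_axt(lines))
```

Calling axt_to_xmfa(open(path)) on an axt file returns the XMFA text as a single string; path here stands for whichever axt file you have downloaded.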
Changed to the tables/ directory and ran mysqlimport -u smontgom -pMYPASS -h db02 hmr_ucsc *.txt.table
Note: You may need to check the input parameters of each script. The scripts currently contain a few absolute paths to support running jobs on a cluster; I will provide a tar.gz download in the future so that they work out of the box.
Contact:
Contact Stephen Montgomery (smontgom@bcgsc.bc.ca) for more information. All scripts on this page are licensed under the Mozilla Public License (MPL). The MGA Service code is licensed under the Creative Commons Attribution-NonCommercial license. If they are useful for anything, please let me know.