Peer-to-peer systems for distributed analysis

The recent explosive growth of peer-to-peer (P2P) systems is largely fueled by consumer demand for multimedia files (music, video, software). By some estimates, this activity accounts for 60-80% of global ISP traffic. However, this legitimate and successful technology is often bogged down in issues of copyright infringement and Internet security. In my experience, most people do not trust P2P systems, either to protect their data or to operate without infringing on proprietary media. This distrust is understandable, since most of the publicity for P2P technology comes from legal proceedings between copyright owners and individuals who have unlawfully reproduced their media. As a result, the term "peer-to-peer" now evokes these widely publicized, ongoing legal battles, which casts the technology in a negative light.

Yet groups at Stanford and United Devices/IBM have demonstrated that this technology, when administered by a central authority, can help answer important questions in fields ranging from molecular biology to healthcare to national defense. In the Chinook Project, we aim to remove that central authority by implementing a decentralized peer-to-peer system that enables academics to capture and use the current state of development in bioinformatics.

Why bioinformatics?

Bioinformatics is fundamentally a new science, driven by the explosion of data from high-throughput experimentation worldwide: assays ranging from sequencing the human genome to measuring transcript expression across diseases, tissues, and stages of development. The principal activity in bioinformatics is to find patterns in large datasets and use those patterns to predict outcomes in new datasets. This activity has produced an unwieldy number of software utilities. Most users choose a particular tool because of their affiliations, the tool's popularity, or simply because it was the first one they encountered. Few bioinformaticians, and even fewer biologists, keep up with emerging advances in alignment software, let alone with the state of the art across the whole field. Ultimately, this reduces competition in the field and increases the importance of good affiliations and exposure. A unified way to access and use these algorithms would improve the visibility of new tools and make it easier for diverse groups of users to compare them. Furthermore, if several computers offered the same utilities, computational analyses could be distributed across them and completed faster over public systems.
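The point about redundant utilities can be illustrated with a minimal Python sketch. It assumes that each peer offering the same utility can be modelled as a local callable; `make_peer` and `dispatch` are hypothetical names chosen for illustration and are not part of Chinook. Jobs are assigned to peers round-robin and run concurrently, and results are returned in the order the jobs were submitted.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def make_peer(name):
    """Hypothetical stand-in for a remote peer that offers a sequence utility.

    Here the "utility" only reports the sequence length; a real peer
    would run an alignment or similar tool.
    """
    def run(sequence):
        return name, len(sequence)
    return run


def dispatch(jobs, peers):
    """Run jobs concurrently across peers that offer the same utility.

    Jobs are assigned to peers round-robin; results come back in the
    same order as the input jobs.
    """
    if not peers:
        raise ValueError("no peer offers this utility")
    # Pair each job with a peer, cycling through the peer list.
    assignments = zip(jobs, cycle(peers))
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = [pool.submit(peer, job) for job, peer in assignments]
        # Collect results in submission order.
        return [future.result() for future in futures]


peers = [make_peer("peerA"), make_peer("peerB")]
print(dispatch(["ACGT", "AC", "GGGCC"], peers))
# [('peerA', 4), ('peerB', 2), ('peerA', 5)]
```

In a real deployment each callable would be replaced by a network call to a remote peer, and a job whose peer fails could be resubmitted to another peer offering the same utility; that retry logic is omitted here.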

Why the Chinook Project?

The Chinook Project aims to create a platform for any community that needs to manage distributed computation and a large number of available utilities. The platform can be integrated into a workbench application, accessed from a script or web application, or used to keep track of the state of the art in a heterogeneous environment. Decentralized peer-to-peer technology allows communities to be self-forming (for public or private use), failure-resilient, and independent of central administration. To encourage use of this technology and its application to novel problems, Chinook is freely available and open source under the LGPL.

Currently, submitting data through Chinook is as secure as submitting data to any bioinformatics web application on the Internet. Future versions are planned to include user authentication and SSL pipes.

(c) 2004 Stephen Montgomery, Canada's Michael Smith Genome Sciences Centre