Quick update before the end of the month:
The long-awaited paper from Eric Lander on missing heritability has been published and is causing a new round of discussions, following the initial debate after the ASHG announcement. Luke has a short summary over at Genomes Unzipped, and Steve Hsu delves into the supplementary material. A good addition to this is Genetic Inference’s report on the current state of complex trait sequencing via ICHG2011.
BGI has started to move its sequence analysis to GPU-based servers, though the article in Wired is unfortunately light on details about which algorithms were ported to the new architecture.
IonTorrent, meanwhile, starts supporting paired-end reads (sort of) and announces a new machine, which is a bit of a disappointment: one of their main selling points was that improvements would come through the chips, not the hardware around them. Be that as it may, we are getting close to the $1,000 genome for data generation, which doesn't take the very time-consuming data analysis into account. This is also reflected in my favorite quote from Elaine Mardis's interview with The Scientist:
“It makes me crazy to listen to people say sequencing is so cheap. Sure, generating data is cheap. But then what do you do with it? That’s where it gets expensive. ‘The $1,000 genome’ is just this throwaway phrase people use and I don’t think they stop to think about what’s involved. When we looked back after we analyzed the first tumor/normal pair, published in Nature in 2008, we figured that genome—not just generating the data, but coming up with the analytical approach, which nobody had ever done before—probably cost $1.6M. If the cost of analysis doesn’t fall over time, we’re never going to get to clinical reality for a lot of these tests.”
This is not going to get any easier as the sequencers become more and more efficient; see Illumina's announcement of the HiSeq 2500 (summarized by OmicsOmics and on the SeqAnswers forum). And though the price of reagents keeps decreasing, it is still cheaper to store the data than to re-sequence the samples, storage problems notwithstanding.
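As a back-of-envelope sketch of the store-versus-resequence trade-off, consider the following comparison. All numbers are illustrative assumptions, not figures from any of the articles above:

```python
# Back-of-envelope comparison: keep raw data on disk vs. re-run the sequencer.
# Every number below is an illustrative assumption for a single ~30x human genome.

fastq_size_gb = 200            # compressed FASTQ for ~30x coverage (assumed)
storage_cost_gb_year = 0.10    # $/GB/year for replicated disk storage (assumed)
resequencing_cost = 4000       # $ per genome in reagents and run costs (assumed)
years = 5                      # retention period

storage_total = fastq_size_gb * storage_cost_gb_year * years
print(f"Storing {fastq_size_gb} GB for {years} years: ${storage_total:.0f}")
print(f"Re-sequencing once: ${resequencing_cost}")
print("store" if storage_total < resequencing_cost else "re-sequence")
```

With these assumed numbers storage wins by a wide margin, though the balance shifts as reagent prices fall and retention periods grow.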
Michael Eisen has a comment in the NYT on the Research Works Act that’s recommended reading. If you are a member of ISCB you might want to consider signing their policy statement which strongly opposes the act.
- If you aren’t following Edge, you are missing out on some great science debates. The Guardian talks to its founder, John Brockman
- TopHat gets a new release
- Aaron Kitzmiller has a terrific commentary on the Core model in academia, and why it is incredibly difficult to get right for bioinformatics
- St Jude’s releases Explore, a portal to their pediatric cancer genome data
- Keith Bradnam summarizes why it is so difficult to evaluate genome assemblies
- Dan Koboldt provides a neat summary of the current state of dbSNP
Back in the office after the holidays and quite some catching up to do.
Growth in genomics
Coverage of computational biology and genomics in the general media continues to increase. This time the Economist covers bioinformatics and the New York Times has an article on computational biology and cancer. And while public funding for some of the current genome centers is being cut by as much as 20 percent, new centers in New York and Connecticut are hoping to benefit from the increase in demand.
Clinical grade sequencing
Some of this demand is driven by a general trend towards clinical sequencing, which is benefiting the Broad and other centers. Given the inroads made in 2011 toward identifying causal mutations (nicely summarized by MassGenomics for both disorders and cancers), this is perhaps not surprising, and even clinical trials for personal genome sequencing are kicking off.
While the technology is making rapid progress, we still have to deal with a large number of problems, among them the handling of genomic data and privacy, how to make sense of the results in the first place (something that new initiatives like openSNP are trying to address) and the discrepancies caused by differences in sequencing technologies and data processing. We will need a public assessment or competition of workflows and methods, similar to what the Assemblathon and GAGE have been doing for genome assembly approaches.
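To make the workflow-discrepancy point concrete, here is a minimal sketch of measuring concordance between the variant calls two pipelines produce for the same sample. The call sets are toy data, and a real comparison would need to parse VCFs and normalize variant representations (left-alignment, multi-allelic sites) before counting:

```python
# Toy concordance check between two variant-calling pipelines run on the
# same sample. Variants are modeled as (chrom, pos, ref, alt) tuples.

def concordance(calls_a, calls_b):
    """Jaccard-style concordance: shared calls divided by all calls made."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

pipeline_a = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
              ("chr2", 500, "G", "A")]
pipeline_b = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"),
              ("chr2", 900, "T", "C")]

# 2 calls shared, 4 distinct calls overall -> concordance 0.50
print(f"Concordance: {concordance(pipeline_a, pipeline_b):.2f}")
```

A public assessment would run exactly this kind of comparison, but across many pipelines, on reference samples with known truth sets.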
Resources and discussions
- Neil Saunders has started a lovely series on next-generation sequencing (NGS) for those familiar with good old Sanger
- A renewed discussion about public access to research funded by tax payers
- More thoughts on hypothesis- vs data-driven science, and ruminations from Daily Life on data-intensive science and workflows
- If you are new to the field of metagenomics this review is a great starting point
- New tool for quality control in RNA-seq: RNA-SeQC and an alternative workflow
- Two additional papers listing biases in RNA-seq introduced by barcoding
- OmicsOmics provides a comparison of IonTorrent and MiSeq
- Our own Brad Chapman provides a summary of BioCloudCentral, a simple way to get started with sequence analysis using Galaxy and CloudMan
- Speaking of which, CloudMan gets an update
- James Taylor wrote up a Galaxy tutorial for ChIP-seq analysis
- SomaticSniper has been released, hopefully the first of many tools made available by WashU
See you in a week or two!
A quick summary of interesting publications, articles and resources the CHB staff encountered this week:
News and publications
- Reproducible research received coverage last week, most notably from articles in Science (commentary by Roger Peng in Simply Statistics) and The Wall Street Journal. While tools like Sweave or knitr help by keeping code and results in sync, the consistent reproducibility of research is far from a solved problem. In particular, different versions of databases or bioinformatics tools increasingly cause problems in next-generation sequencing experiments. A problem closely associated with reproducible research is the handling of large-scale data storage, including issues around data security, redundancy and versioning, which is also slowly starting to make headlines.
- The WSJ also covered Open Science and Citizen Scientists in the same issue (also see commentary from Jeff Leek). Researchers, scientific journals and funding agencies are still struggling to find the best approach to foster collaboration and give data back to the public, and any awareness raised through articles like these helps
- Daniel MacArthur, who recently accepted a new position at MGH, takes down the Independent’s coverage of a sleep study. Recommended reading.
- The European Bioinformatics Institute launches a metagenomics framework
- Concise summary of the current next-generation sequencing field from The Molecular Ecologist, based on an article by Travis Glenn. Highly recommended as a resource for planning and budgeting the next sequencing experiment
- Dan Koboldt comments on the findings of a new publication from the Shendure group on filtering strategies for exome sequencing and the need for matched normals
- Yet-another-learning-R-book, but the new The Art of R Programming is getting stellar reviews
- And if you decide to delve into Python, Mir Nazim published an excellent overview of the current Python ecosystem
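One practical answer to the version-drift problem behind the reproducibility discussion above is to record the exact tool and database versions alongside every analysis. A minimal sketch follows; the tool names and version strings are hypothetical examples, not a record of any real pipeline:

```python
# Minimal provenance sketch: bundle tool and database versions with a
# timestamp so an analysis can be re-run later under the same conditions.
# All tool names and versions here are hypothetical examples.

import json
from datetime import datetime, timezone

def provenance_record(tools, databases):
    """Return a JSON-serializable record of versions plus a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tools": tools,
        "databases": databases,
    }

record = provenance_record(
    tools={"bwa": "0.5.9", "samtools": "0.1.18"},
    databases={"dbSNP": "build 135", "reference": "GRCh37"},
)
print(json.dumps(record, indent=2))
```

Dropping a file like this next to each result set is a small habit, but it is often the difference between being able to reproduce a finding a year later and not.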
See you next week!