
Training resources

While we are setting up training courses and workshops with a focus on sequence analysis, a number of local and remote resources already offer excellent training opportunities.

Local resources


The Countway Library is a great resource for learning bioinformatics, but courses fill up quickly. For scheduling and availability, check the main course site periodically for updates, and contact the course instructor directly to express your interest if a course is already booked.

Past courses have included hands-on Ensembl Genome Browser workshops, Introduction to BioConductor/R, and practical sessions on analyzing gene expression arrays.


The Broad Institute has a new initiative, BroadE workshops, open to scientists from the Harvard and MIT communities. The workshops offer insights and hands-on training in rapidly evolving technologies, high-throughput methods, and computational tools that are not typically found in conventional research labs.

Initial workshops covered GenePattern, CellProfiler, and RNA-seq methods; new courses include:

  • GATK and Variant Detection
  • Genome Assembly and Analysis
  • RNA-Seq Part II and III
  • Integrative Genomics Viewer (IGV)
  • Workshop on Metabolomics: Methods and Data Analysis
  • Cancer Genome Analysis
  • Proteomics


Periodically, Harvard Medical School offers Nanocourses on a variety of topics that may be of interest. Previous courses have covered topics such as Next Generation Sequencing and Genome-Wide Association Studies. Check their site for information and updates.

Harvard Catalyst

Harvard Catalyst offers Nanocourses as well. Past courses have addressed a variety of topics including Proteomics and Epigenetics. Check their site for information and updates. You can also explore the Harvard Catalyst Education Video Library by course, category, or keywords.

External courses and workshops

Cold Spring Harbor Laboratory

CSHL offers fantastic immersive hands-on programs, but they are not free. The yearly course on Programming for Biology costs almost $4,000 (including board and lodging), but provides a terrific experience for students and researchers with little or no prior programming experience.

The two-week course starts with introductory coding and continues with a survey of available biological libraries and practical topics in bioinformatics. Students end by learning how to construct and run powerful and extensible analysis pipelines in a straightforward manner. The course combines formal lectures with hands-on sessions in which students work to solve problem sets covering common scenarios in the acquisition, validation, integration, analysis, and visualization of biological data. As part of a final project during the second week of the course, students will pose problems using their own data and work with each other and the faculty to solve them. Final projects have formed the basis of publications as well as public biological websites (see, for example, http://bio.perl.org/wiki/Deobfuscator).

Other CSHL courses cover advanced sequence technologies and their applications, or focus on comparative genomics, but require prior experience in computational biology.

MSU Summer Course

Titus Brown organizes a great summer course on NGS that unfortunately fills up very quickly each year. Keep an eye out for the 2013 registration to open; in the meantime the course team usually makes all data and course materials freely available at the conclusion of the course.

UC Davis

The Bioinformatics Core at UC Davis offers extensive courses and workshops revolving around NGS data analysis, and they are open to all scientists. Best of all, all training is offered using the Galaxy platform (see below) and course materials are freely available for you to explore and work through.


Most advanced bioinformatic analyses will require you to learn to program at least at a basic level. A basic familiarity with the Unix command line also helps, and the Unix Primer is a great starting point.

We recommend learning at least one scripting language (e.g., Python, Ruby or Perl) and at least one statistical analysis language (we prefer the open source R language over closed source alternatives such as SPSS, SAS or Matlab).
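To give a sense of what programming "at a basic level" can look like for sequence analysis, here is a minimal Python sketch that reads FASTA-formatted records and reports the length and GC content of each. It is only an illustration, not material from any of the courses listed here: the function names `read_fasta` and `gc_content` and the example records are made up for this sketch, and in real work an established library would likely be a better choice than a hand-written parser.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Made-up example records, used here in place of a real file (assumption).
EXAMPLE_FASTA = """>seq1 hypothetical example
ATGCGC
GGAT
>seq2 hypothetical example
ATATAT
"""

if __name__ == "__main__":
    # Print one tab-separated line per record: name, length, GC fraction.
    for name, seq in read_fasta(StringIO(EXAMPLE_FASTA)):
        print(f"{name}\t{len(seq)}\t{gc_content(seq):.2f}")
```

To run the same loop on a file of your own, replace `StringIO(EXAMPLE_FASTA)` with an open file handle, for example `open("my_sequences.fasta")` (a hypothetical file name).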

We recommend running R from within the RStudio program. RStudio lets you write and run code, install packages (to extend the abilities of R), and view files and plots. Much of the R help literature can be confusing, but here are some resources we suggest examining.

Best practices
Getting started with programming can be daunting. The Software Carpentry project aims to help scientists become better programmers by providing introductory modules on best practices for writing code, managing data, keeping track of versions, and testing your algorithms. It is well worth the time investment.


Not everyone has the time or interest to get involved with programming or implementing their own algorithms. Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research. Users without programming experience can easily specify parameters and run tools and workflows, while Galaxy captures the information required to repeat previous analyses, share results, or create interactive, web-based documents that describe a complete analysis.

Getting started
The Galaxy wiki includes a lot of informative documentation, and they try very hard to keep it current. To start learning Galaxy, we recommend you go to their learning page and work through the Galaxy 101 exercise. When you feel ready, you can jump straight to the Galaxy interface itself and start exploring.

CHB is in the process of making an HSPH/HSCI Galaxy instance available (contact us for details); in the meantime, BioCloudCentral provides you with an easy way to launch your own personal Galaxy server using Amazon’s EC2 environment.


Hands-on guides are more difficult to provide, as technology and best practices change rapidly. In most cases a Google search with carefully chosen search terms will still give you the best results. For more specialized questions, try:

  • SeqAnswers, a bioinformatics community forum focused on next-generation sequencing
  • Biostar, a community forum where members vote for the best answer (so you don’t have to wade through incorrect responses)

A number of workshops also make course materials available:

For a much more exhaustive and up-to-date review of courses in computational biology, please see David Searls's article in PLoS Computational Biology, An Online Bioinformatics Curriculum.