CS6024 - Algorithmic Approaches to Computational Biology

Course Data :

Course Objectives

Invite students to tour algorithms on strings, trees and graphs that have transformed the field of biology and medicine in modern times. Importantly, offer students an interesting perspective of how algorithmic/statistical research is done in the field of computational biology and bioinformatics, and thereby inspire them to formulate their own fresh computational problems to address intriguing biological questions.


In each course section below, we start with an intriguing biological question, interactively formulate a computational problem addressing the question, present a key algorithmic technique or statistical model (e.g., dynamic programming, randomization in algorithms, evolutionary tree model) solving the problem, and end with a students-led research paper that will spur a final course project. The idea is to gradually lead students from solved problems in bioinformatics to open research problems and eventually to have them formulate their own fresh research problem.
  • Where in the Genome Does DNA Replication Begin? - Algorithmic warmup (frequent exact/inexact k-mers in a string) .
  • Which DNA Patterns Play the Role of Molecular Clocks? - Randomized Algorithms (randomized motif search, Gibbs sampling) .
  • How Do We Assemble Genomes? - Graph Algorithms (Eulerian paths, de Bruijn graphs) .
  • How Do We Compare Biological Sequences? - Dynamic Programming (edit distance, single/multiple sequence alignment) .
  • Which Animal Gave Us SARS? - Evolutionary Tree Reconstruction (distance-based phylogeny, neighbor-joining algorithm) .
  • How Did Yeast Become a Wine Maker? - Clustering Algorithms (hard and soft k-means) .
  • How Do We Locate Disease-Causing Mutations? - Combinatorial Pattern Matching (suffix trees/arrays, Burrows-Wheeler transform) .
  • Why Have Biologists Still Not Developed an HIV Vaccine? - Hidden Markov Models (Viterbi and forward-backward algorithms) .
  • Was T. rex Just a Big Chicken? - Computational Proteomics (peptide identification and spectral match) .
  • Which Motifs Are Hidden in a Biological Network? - Randomized Algorithms (color coding for long paths in graphs) .
Other new topics from latest research or from book (or cover subset of topics above depending on time) (The course contents, style and occasional code challenges would follow closely the active learning style of the referred textbook and its affiliated programming platform.)


1. Bioinformatics Algorithms: An Active Learning Approach, 2nd Edition, Vols. 1 and 2. Phillip Compeau, Pavel Pevzner. 2015.
2. Algorithms on Strings, Trees and Sequences. Dan Gusfield. 1997.


1. Biological Sequence Analysis. Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison. 1998.
2. Rosalind bioinformatics programming platform. http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track .


  • CS2800 or EE4371 or BT3051 or MA5910 or equivalent
  • None


Credits Type Date of Introduction
4-0-0-0-8-12 Elective Mar 2018

Previous Instances of the Course

© 2016 - All Rights Reserved - Dept of CSE, IIT Madras
Website Credits