Graduate Education

Term       Title                                                 Course Number             Instructor
1st        Perl for Bioinformatics                               SPH 140.636               Fernando Pineda
2nd        Genomics                                              SPH 260.605               Jonathan Pevsner
2nd        Analysis of Biological Sequences                      SPH 140.638               Sarah Wheelan
3rd & 4th  Statistics for Laboratory Scientists I & II           SPH 140.615, SPH 140.616  Ingo Ruczinski
3rd        Practical Machine Learning: Methods and Algorithmics  SPH 140.644               Hector Corrada Bravo
4th        Statistics for Genomics                               SPH 140.688               Jeff Leek, Rafael Irizarry & Hong Kai Ji

Course Descriptions

To register for courses, visit JHU's ISIS site.

Perl for Bioinformatics / SPH 140.636

Uses the Perl programming language to introduce skills and concepts needed to process and interpret data from high-throughput technologies in the biological sciences. Key concepts are introduced and reinforced through lectures with live computer demonstrations, weekly readings, and programming exercises. Exercises and examples draw heavily from biological sequence analysis as well as real-world problems in proteomics and genetics. Guest lecturers present case studies of Perl and UNIX usage in scientific investigations. Students are introduced to bioinformatics software-development resources available online and to necessary computer science fundamentals.

Student evaluation is based on homework and a programming project.

Course learning objectives: Upon successful completion of this course, students will have: 1) a working knowledge of the Perl programming language, including the ability to (a) read and write Perl scripts and (b) download and use Perl bioinformatics libraries, e.g. BioPerl; 2) an understanding of programming techniques and styles, e.g. top-down vs. bottom-up programming, debugging, and object-oriented programming; 3) an understanding of key fundamental concepts from computer science, including notions of data structures, algorithms, and computational complexity; 4) the ability to organize the processing of large amounts of data from high-throughput biology experiments; 5) the ability to write automated scripts that query local and web-based biological databases; and 6) the ability to search and use the wealth of software development resources available on the web, e.g. cpan.org, sourceforge.net, and bioperl.org.

Class Times:
Monday 1:30 - 2:20
Wednesday 1:30 - 2:20
Friday 1:30 - 2:20

Lab Times:
Friday 10:30 - 11:50

Instructor Consent Required


Genomics / SPH 260.605

Explores genomes across the tree of life, using the tools of bioinformatics. Topics include viruses; bacteria and archaea; protozoa (e.g. Plasmodium); plants (with a focus on Arabidopsis and rice); the fungi; and the metazoans (Drosophila, C. elegans, the rodents, the primates, and human). Each lecture highlights features of the relevant genome(s), key websites and bioinformatics tools, the phylogenetic context in which to understand the significance of the organism, and genomics-based approaches to human disease. Weekly computer labs introduce students to genomics software available on the internet, including tools for genome annotation, comparison, and analysis.

One midterm exam; one final exam; weekly quizzes.

Course learning objectives: After successfully completing this course, you will be able to do the following:
• Define the main features of viral, prokaryotic, and eukaryotic genomes
• Define the relevance of various genomes to human disease
• Use web-based tools for genome analysis (e.g. annotation and comparison)
• Read papers on the sequencing of the human genome and other genomes, and evaluate the quality and limitations of the analytic approaches

Class Times:
Monday 10:30 - 11:50
Wednesday 10:30 - 11:50

Lab Times:
Friday 10:30 - 11:50


Analysis of Biological Sequences / SPH 140.638

Presents an algorithmic approach to modern biological sequence analysis. Provides an overview of the core algorithms and statistical principles of bioinformatics. Topics include general probability and molecular biology background, sequence alignment (local, global, pairwise, and multiple), hidden Markov models (as powerful tools for sequence analysis), gene finding, and phylogenetic trees. Emphasizes an algorithmic perspective, although no prior programming experience is required. Covers basic probability and molecular biology in enough detail that no prior probability or advanced biology classes are required.

Homework 60%, presentation plus written critique 30%, attendance 10%

Course learning objectives: The general goal of the course is to provide students with an in-depth understanding of the algorithms and modeling ideas behind common tools used in genomic sequence research. Students are expected to develop the ability to independently construct models to address specific biological questions, and to independently carry out analyses and interpret the results. No prior programming experience is necessary; students may construct models and algorithms in pseudocode (a methodical description, in words, of the series of steps that a program would follow). The specific goals are: 1) Understand concepts in basic molecular biology and probability; 2) Be familiar with classic and modern pairwise alignment algorithms, including BLAST; 3) Understand the statistical significance of alignment scores and the interpretation of alignment algorithm output; 4) Understand the mechanism and the use of dynamic programming; 5) Be familiar with multiple alignment; 6) Understand the different assumptions about evolution made by different models and algorithms; 7) Understand the likelihood approach to phylogenetic reconstruction, and multiple alignment as applied to phylogenetic tree construction; 8) Understand Markov models and hidden Markov models (HMMs) in the genomic context, and essential algorithms for analyzing HMMs; 9) Understand HMMs as applied to gene finding, and be familiar with other gene-finding algorithms; 10) Identify from the literature important algorithmic/statistical advances in bioinformatics, and prepare an oral presentation of a recent bioinformatics publication that is important from either a biological or a mathematical perspective.

Class Times:
Tuesday 3:30 - 4:50
Thursday 3:30 - 4:50


Statistics for Laboratory Scientists I & II / SPH 140.615 & 140.616

Introduces the basic concepts and methods of statistics with applications in the experimental biological sciences. Demonstrates methods of exploring, organizing, and presenting data, and introduces the fundamentals of probability. Presents the foundations of statistical inference, including the concepts of parameters, estimates, and the use of confidence intervals and hypothesis tests. Topics include experimental design, linear regression, the analysis of two-way tables, and sample size and power calculations. Introduces and employs the freely available statistical software, R, to explore and analyze data.

Each term there are three quizzes, four computer labs and one exam.

Course learning objectives: Upon successful completion of this course, students will be able to: 1) create appropriate statistical graphics; 2) identify flaws in experimental designs and observational studies, and form appropriate simple experimental designs; 3) explain confounding and identify potential confounding factors in an observational study; 4) solve simple probability problems; 5) calculate and interpret confidence intervals for the difference between two populations' means and for a population proportion; 6) conduct simple tests of statistical hypotheses and calculate and interpret P-values from such tests; 7) calculate power and minimal sample size for simple experiments; 8) use the statistical software, R, to display and analyze data.

Class Times:
Monday 10:30 - 11:20
Wednesday 10:30 - 11:20
Friday 10:30 - 11:20

Lab Times:
Wednesday 1:30 - 2:20
Wednesday 2:30 - 3:20


Statistics for Genomics / SPH 140.688

Covers the basics of R software and the key capabilities of the Bioconductor project, including importation and preprocessing of high-throughput data from microarrays and other platforms. Bioconductor is a widely used open-source and open-development software project for the analysis and comprehension of data arising from high-throughput experimentation in genomics and molecular biology, rooted in the open-source statistical computing environment R. Also introduces statistical concepts and tools necessary to interpret and critically evaluate the bioinformatics and computational biology literature. Includes an overview of preprocessing and normalization, statistical inference, multiple comparison corrections, Bayesian inference in the context of multiple comparisons, clustering, and classification/machine learning.

Student evaluation will be based on data analysis homework assignments and a final project. Students who want to learn the concepts without programming may take the class pass/fail and perform a literature review for a final project.

Course learning objectives: Upon successful completion of this course, students will be able to: 1) Understand the basics of how microarray technology works; 2) Understand and critique existing methodology for the analysis of microarray data; 3) Write R code to import and analyze microarray data.

Class Times:
Monday 10:30 - 11:50
Wednesday 10:30 - 11:50


Practical Machine Learning: Methods and Algorithmics / SPH 140.644

Teaches students to use modern, computationally based methods for exploring and drawing inferences from data. After a brief review of probability, the central limit theorem, and inference, the course covers resampling methods, nonparametric regression, prediction, and dimension reduction and clustering. Specifically covers: Monte Carlo simulation, bootstrap cross-validation, splines, local weighted regression, CART, random forests, neural networks, support vector machines, and hierarchical clustering.

Class Times:
Monday 1:30 - 2:50
Wednesday 1:30 - 2:50

Lab Times: TBA

Prerequisites: Successful completion of 140.611-12 or 140.621-24; or working knowledge of calculus and linear algebra
