Scalable Machine Learning for Big Data Biology
Number of units (credits): 3
Grading Basis: Letter Grade
Day/Time/Location: Biomedical Science Tower 3, Room 3081, Wed–Fri 1:30–3
Enrollment Capacity: 25
Machine learning (ML) has become an integral part of computational thinking in the era of big data biology. This course will focus on understanding the statistical structure of large-scale biological datasets using ML algorithms. We will cover the basics of ML algorithms and study their scalable versions for implementation on a distributed computing framework. We will pursue distributed ML algorithms for matrix factorization, convex optimization, dimensionality reduction, clustering, classification, graph analytics and deep learning, among others.
The course will be project-driven, with 3 to 4 mini projects drawing source material from genomic sciences, structural biology, drug discovery, systems modeling and biological imaging. There will also be one final project with a presentation.
Students will be expected to design, implement and test their ML solutions in Apache Spark.
No biological background is expected. The assignments will cover the necessary biology. Experience in programming and some software engineering is preferred. Knowledge of probability, statistics, linear algebra and algorithms is a bonus.
The class is open to senior-year undergraduates and graduate students.
Prof. Chakra Chennubhotla
Prof. David Koes