
New class starting Fall 2016: Scalable Machine Learning for Big Data Biology

Scalable Machine Learning for Big Data Biology

MSCBIO 2065
Number of units (credits): 3
Grading Basis: Letter Grade
Day/Time/Location: Biomedical Science Tower 3, Room 3081, Wed and Fri, 1:30–3:00 pm
Enrollment Capacity: 25

Course Overview:
Machine learning (ML) has become an integral part of computational thinking in the era of big data biology. This course will focus on understanding the statistical structure of large-scale biological datasets using ML algorithms. We will cover the basics of ML and then study scalable versions of these algorithms that can be implemented on a distributed computing framework. Topics include distributed ML algorithms for matrix factorization, convex optimization, dimensionality reduction, clustering, classification, graph analytics and deep learning, among others.

The course will be project-driven, with three to four mini-projects drawing on source material from genomic sciences, structural biology, drug discovery, systems modeling and biological imaging. There will also be one final project, accompanied by a presentation.
Students will be expected to design, implement and test their ML solutions in Apache Spark.
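To give a flavor of what a "scalable version" of an ML algorithm means, here is a minimal sketch (not course material) of data-parallel gradient descent for a least-squares fit. It assumes the data are split into partitions the way Spark shards a dataset across workers: each partition computes a partial gradient (a "map" step), and the partial results are summed (a "reduce" step). It emulates that pattern in plain Python rather than calling Spark; the function names (`partition`, `local_gradient`, `distributed_gd`) and the synthetic data are hypothetical, chosen for illustration.

```python
from functools import reduce


def partition(data, num_partitions):
    """Split samples into roughly equal chunks, mimicking how a
    distributed framework shards a dataset across workers (hypothetical helper)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def local_gradient(chunk, w, b):
    """'Map' step: a worker computes the squared-error gradient sums
    and the sample count using only its own chunk."""
    gw = gb = 0.0
    for x, y in chunk:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(chunk)


def combine(a, b):
    """'Reduce' step: gradient sums are additive, so partial results can be
    combined in any order (up to floating-point rounding)."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def distributed_gd(data, num_partitions=4, lr=0.5, epochs=500):
    """Fit y ~ w*x + b by full-batch gradient descent on the mean squared
    error, with the gradient computed partition by partition."""
    chunks = partition(data, num_partitions)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        partials = [local_gradient(c, w, b) for c in chunks]  # map
        gw, gb, n = reduce(combine, partials)                  # reduce
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


if __name__ == "__main__":
    # Synthetic, noise-free data from y = 3x + 1 with x in [0, 1) (illustrative only).
    data = [(i / 100, 3 * (i / 100) + 1) for i in range(100)]
    w, b = distributed_gd(data)
    print(f"w={w:.3f}, b={b:.3f}")  # prints w=3.000, b=1.000
```

Because only small gradient summaries travel between workers, the same loop scales to data that no single machine could hold. In PySpark, the map and reduce steps would correspond roughly to `RDD.map` and `RDD.reduce` over a partitioned RDD.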

Prerequisites:
No biological background is expected; the assignments will cover the necessary biology. Programming experience and some familiarity with software engineering are preferred. Knowledge of probability, statistics, linear algebra and algorithms is a bonus.
The class is open to senior-year undergraduates and graduate students.

Questions:
Prof. Chakra Chennubhotla
chakracs@pitt.edu
Prof. David Koes
dkoes@pitt.edu