Beginning the Columbia University IDSE Data Science Certificate Program


I will sign up for both Fall courses but will only take one due to time constraints.

An introduction to the design and analysis of efficient algorithms, with an emphasis on data science. Topics include efficient sorting and searching, graph algorithms, dynamic programming, randomized algorithms, approximation algorithms, and NP-completeness. In addition, the course will cover material relevant to big data problems, for example models of parallelism, hashing, sketching, and sublinear-time algorithms.

A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, expectations, law of large numbers, central limit theorem, point and interval estimation, hypothesis tests, asymptotic ideas, non-parametrics, resampling, Bayesian inference, linear regression.

An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. The course will emphasize methods relevant to big data problems.

This class introduces the algorithmic skills and design principles necessary to explore and present datasets computationally and visually. These include command-line tools, the use of state-of-the-art languages and software, an algorithmic understanding of how to work with large datasets (including parallelism and the map-reduce framework), interactive visualizations, exploratory data analysis as a means to generate and test hypotheses, and the basics of data exploration and visualization.