# Beginning the Columbia University IDSE Data Science Certificate Program

This fall I am beginning the Certification of Professional Achievement in Data Sciences program at the Columbia University Institute for Data Science and Engineering (IDSE).

I will sign up for both Fall courses but will only take one due to time constraints.

The following text is from http://idse.columbia.edu/certification:

### Fall Course Offerings:

#### Algorithms for Data Science (3) CSOR W4246

- Introduction to the design and analysis of efficient algorithms, with an emphasis on data science. Topics include efficient sorting and searching, graph algorithms, dynamic programming, randomized algorithms, approximation algorithms, and NP-completeness. In addition, the course will cover material relevant to big data problems: for example, models of parallelism, hashing, sketching, and sublinear-time algorithms.

#### Probability & Statistics (3) STAT W4700

- A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, expectations, law of large numbers, central limit theorem, point and interval estimation, hypothesis tests, asymptotic ideas, non-parametrics, resampling, Bayesian inference, linear regression.

### Spring Course Offerings:

#### Machine Learning for Data Science (3) COMS W4721

- An introduction to machine learning, with an emphasis on data science. Topics will include least squares methods, Gaussian distributions, linear classification, linear regression, maximum likelihood, exponential family distributions, Bayesian networks, Bayesian inference, mixture models, the EM algorithm, graphical models, hidden Markov models, support vector machines, and kernel methods. An emphasis of the course will be on methods and problems relevant to big data problems.

#### Exploratory Data Analysis and Visualization (3) STAT W4701

- This class introduces the algorithmic skills and design principles necessary to explore and present datasets computationally and visually. These include command line tools, the use of state-of-the-art languages and software, an algorithmic understanding of how to work with large datasets (including parallelism and the map-reduce framework), interactive visualizations, exploratory data analysis as a means to generate and test hypotheses, as well as basics of data exploration and visualization.