Statistical Learning with Python

## Syllabus of Fundamental Statistical Learning in Big Data with Python

--- Practical Hands-On Data Analysis

1. Introduction to Data Analysis

·         Overview and Required Prerequisites

·         Why Learn Statistics?

·         What is Statistics?

·         Statistical Learning vs. Machine Learning

·         Two Different Focuses and Cultures

·         Summary

2. Examples of Statistics in Data Analysis

·         Overview

·         Problem Framing

o Data Exploration

o Data Mining

·         Collecting Data

·         Selecting a Good Sample

o Avoiding bias in your data

o Data Understanding

·         Data Cleansing and Data Preparation

·         Analysis Validation and Verification

·         Analysis Configuration, Method Selection and Optimization

·         Analysis Presentation and Implementation

·         Summary

3. Foundation of Data Analysis

·         Tutorial Overview

·         Common Statistical Distributions

·         Uniform Distribution

·         Normal Distribution

·         Bernoulli Distribution

·         Binomial Distribution

·         Poisson Distribution

·         Student's t-Distribution

·         F-Distribution

·         Chi-Squared (χ²) Distribution

·         Population vs Sample

·         Descriptive Statistics vs. Inferential Statistics

4. Random Numbers

·         Random Numbers with NumPy

·         When to Seed the Random Number Generator

·         How to Control for Randomness

·         Common Questions

·         Summary

5. Point Estimates and Confidence Intervals

·         Law of Large Numbers

·         Worked Example

·         Central Limit Theorem

·         What is a Confidence Interval?

·         Confidence Intervals of Mean

·         Confidence Intervals of Proportion

·         Interval for Classification Accuracy

·         Implications in Machine Learning

·         Summary

6. Hypothesis Testing

·         Tutorial Overview

·         Statistical Hypothesis Testing

·         Statistical Test Interpretation

·         Type I and Type II Error

·         Degrees of Freedom in Statistics

·         Questions

·         Summary

7. Common Statistical Distributions and Critical Values

·         Common Statistical Distributions

·         Critical Values of a Statistic

·         Application of Critical Values in Statistical Significance Tests

·         How to Calculate Critical Values

·         Questions

·         Summary

8. Resampling Methods

·         Tutorial Overview

·         Statistical Sampling and Statistical Resampling

·         Bootstrap

·         Configuration of Bootstrap

·         Estimation with Bootstrap

·         Out-of-Bag (OOB) Bias and Its Correction

·         Nonparametric Confidence Interval

·         Cross-Validation

·         Configuration of Cross-Validation

·         Variations on Cross-Validation

·         Example

·         Questions

·         Summary

9. Interval Estimation

·         Interval Estimation

o Tutorial Overview

o Problems with Hypothesis Testing

o Estimation Statistics

o Interval Estimation

o Meta-Analysis

o Summary

·         Tolerance Intervals

o Tutorial Overview

o Bounds on Data

o What Are Statistical Tolerance Intervals?

o How to Calculate Tolerance Intervals

o Tolerance Interval for Gaussian Distribution

o Questions

o Summary

·         Confidence Intervals

o Tutorial Overview

o What is a Confidence Interval?

o Interval for Classification Accuracy

o Nonparametric Confidence Interval

o Questions

o Summary

·         Prediction Intervals

o Tutorial Overview

o What Is a Prediction Interval?

o Why and How to Calculate a Prediction Interval?

o Prediction Interval for Linear Regression

o Example

o Prediction Interval for Logistic Regression

o Example

o Summary

10. Nonparametric Methods

·         Five-Number Summary

o Nonparametric Data Summarization

o Use of Five-Number Summary

o Summary

·         Rank Data

o Tutorial Overview

o Nonparametric vs. Parametric

o Ranking Data

o Working with Ranked Data

o Questions

o Summary

·         Normality Tests

o Tutorial Overview

o Normality Assumption

o Visual Normality Checks

o Statistical Normality Tests

o What Test Should Be Used?

o Questions

o Summary

·         Make Data Normal

o Tutorial Overview

o Normal and Normal-Like

o Impacts of Sample Size, Data Resolution and Heavy Tail

o Box-Cox Transformation

o Questions

o Summary

11. Covariance and Correlation

·         What are Covariance, Correlation, and Association?

·         Correlation Calculation

o Between two continuous variables

·         Rank Correlation Calculation

o Tutorial Overview

o Between two ordinal variables

o Between continuous and ordinal variables

·         Association Calculation

o Between two nominal variables

o Between nominal and ordinal variables

·         Questions

·         Summary

12. Ten Common Statistical Mistakes

·         Biased Data

·         No Margin of Error

·         Inappropriate Samples

·         Missing Sample Sizes or Effect Sizes

·         Misinterpreted Correlations

·         Confounding Variables

·         Botched Numbers

·         Data Fishing

·         The Almighty Anecdote

13. Simple Data Visualization

·         Tutorial Overview

·         Introduction to Seaborn vs. Matplotlib

·         Histograms, KDE, and densities (one/two-dimensional dataset)

·         Pair plots

·         Faceted histograms

·         Factor plots

14. Appendix

·         Statistics on Wikipedia

·         Statistics Textbooks

·         Python API Resources