A Gentle Introduction to Statistical Power and Power Analysis in Python
The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect. Power can be calculated and reported for a completed experiment to...
All of Statistics for Machine Learning
A foundation in statistics is required to be effective as a machine learning practitioner. The book “All of Statistics” was written specifically to provide a foundation in probability and statistics...
The Role of Randomization to Address Confounding Variables in Machine Learning
A large part of applied machine learning is about running controlled experiments to discover what algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that there...
How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers
The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results. In his widely cited 1998 paper, Thomas Dietterich recommended McNemar’s test in...
How to Code the Student’s t-Test from Scratch in Python
Perhaps one of the most widely used statistical hypothesis tests is the Student’s t-test. Because you may use this test yourself someday, it is important to have a deep understanding of how the test...
Statistics for Machine Learning (7-Day Mini-Course)
Statistics for Machine Learning Crash Course. Get on top of the statistics used in machine learning in 7 Days. Statistics is a field of mathematics that is universally agreed to be a prerequisite for a...
15 Statistical Hypothesis Tests in Python (Cheat Sheet)
Quick-reference guide to the 15 statistical hypothesis tests that you need in applied machine learning, with sample code in Python. Although there are hundreds of statistical hypothesis tests that you...
Arithmetic, Geometric, and Harmonic Means for Machine Learning
Calculating the average of a variable or a list of numbers is a common operation in machine learning. It is an operation you may use every day either directly, such as when summarizing data, or...
A Gentle Introduction to Degrees of Freedom in Machine Learning
Degrees of freedom is an important concept from statistics and engineering. It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or...
Hypothesis Test for Comparing Machine Learning Algorithms
Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best mean performance is expected to be better than those...