Statistical Distribution Theory
What are Probability Distributions?
A probability distribution describes how the values of a random variable are distributed. It specifies the probabilities of different outcomes or ranges of outcomes.
Continuous vs Discrete Distributions
- Continuous Distributions: Random variables can take any value within a continuous range (e.g., Normal, Beta, Gamma)
- Discrete Distributions: Random variables take only distinct, separate values (e.g., Binomial, Poisson, Geometric)
Statistical Tests & Analysis
Normality Tests
- Shapiro-Wilk Test: Generally the most powerful normality test for small to medium samples (n ≤ 50). Tests the null hypothesis that the data come from a normal distribution.
- Anderson-Darling Test: A weighted refinement of the Kolmogorov-Smirnov test, more sensitive to the tails of the distribution.
- Jarque-Bera Test: Tests normality based on skewness and kurtosis. Good for large samples.
- D'Agostino-Pearson Test: Omnibus test combining skewness and kurtosis assessments.
- Lilliefors Test: Modification of Kolmogorov-Smirnov test when parameters are estimated from data.
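As a minimal sketch, the five tests above map onto SciPy and statsmodels as follows (the sample here is synthetic and the seed arbitrary):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=40)  # illustrative small sample

w, p_sw = stats.shapiro(x)             # Shapiro-Wilk
ad = stats.anderson(x, dist="norm")    # Anderson-Darling (critical values, no p-value)
jb, p_jb = stats.jarque_bera(x)        # Jarque-Bera
k2, p_dp = stats.normaltest(x)         # D'Agostino-Pearson omnibus
d, p_lf = lilliefors(x, dist="norm")   # Lilliefors
```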
Goodness-of-Fit Tests
- Chi-Square Test: Tests if observed frequencies match expected frequencies under a distribution.
- Kolmogorov-Smirnov Test: Tests the maximum deviation between empirical and theoretical cumulative distributions.
- Anderson-Darling Test: Weighted version of KS test, more sensitive to distribution tails.
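For illustration, a KS test against a fully specified theoretical CDF looks like this in SciPy (data and parameters are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=200)  # synthetic sample

# Maximum deviation between the empirical CDF and the Gamma(k=2, theta=3) CDF
d_stat, p_value = stats.kstest(x, stats.gamma(a=2.0, scale=3.0).cdf)
```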
Statistical Distance Measures
- Hellinger Distance: Symmetric measure of the difference between two distributions, bounded in [0, 1].
- Jensen-Shannon Divergence: Symmetric version of KL divergence, always finite.
- Wasserstein Distance: "Earth mover's distance" - intuitive measure of distribution difference.
- Kullback-Leibler Divergence: Measures information loss when using one distribution to approximate another.
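A sketch of the four distance measures for two small discrete distributions (the probability vectors and support are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy, wasserstein_distance

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
support = np.array([0.0, 1.0, 2.0])

hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
js_div = jensenshannon(p, q) ** 2   # jensenshannon returns the distance; its square is the divergence
kl = entropy(p, q)                  # KL(p || q) in nats; infinite if q is zero where p is not
w = wasserstein_distance(support, support, u_weights=p, v_weights=q)
```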
Statistical Tests Implemented
Normality Tests
- Shapiro-Wilk Test: W = (Σaᵢx_(i))² / Σ(xᵢ - x̄)², where x_(i) are the order statistics and x̄ is the sample mean. Most powerful normality test for n ≤ 50. Used when sample sizes are small to medium.
- Anderson-Darling Test: A² = -n - Σ(2i-1)/n * [ln(F₀(x_(i))) + ln(1 - F₀(x_(n+1-i)))]. More sensitive to distribution tails than the KS test. Good for detecting departures from normality in extreme values.
- Jarque-Bera Test: JB = (n/6) * (S² + (K-3)²/4), where S is skewness and K is kurtosis. Tests the third and fourth moments. Good for large samples. Asymptotically chi-square with 2 df.
- D'Agostino-Pearson Omnibus Test: Combines skewness and kurtosis tests into a single omnibus normality test via normalizing transformations of both statistics. Valid for n ≥ 8.
- Lilliefors Test: Modified Kolmogorov-Smirnov test for when parameters are unknown. Uses the mean and variance estimated from the sample rather than theoretical population parameters.
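To make the formulas concrete, here is the Jarque-Bera statistic computed directly from its definition; scipy.stats.jarque_bera should agree:

```python
import numpy as np
from scipy import stats

def jarque_bera_manual(x):
    # JB = (n/6) * (S^2 + (K - 3)^2 / 4); asymptotically chi-square with 2 df
    x = np.asarray(x)
    n = x.size
    s = stats.skew(x)
    k = stats.kurtosis(x, fisher=False)   # Pearson (non-excess) kurtosis: normal -> 3
    jb = n / 6.0 * (s**2 + (k - 3.0)**2 / 4.0)
    p = stats.chi2.sf(jb, df=2)
    return jb, p
```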
Goodness-of-Fit Tests
- Kolmogorov-Smirnov Test: D = sup|Fₙ(x) - F₀(x)|, where Fₙ is the empirical CDF and F₀ the theoretical CDF. Tests the maximum deviation between the two distributions. P-values are computed with a correction when parameters are estimated from the data.
- Anderson-Darling Test (GoF): A² = -n - Σ(2i-1)/n * [ln(F₀(x_(i))) + ln(1 - F₀(x_(n+1-i)))]. Weights deviations more heavily in the distribution tails.
- Chi-Square Goodness-of-Fit: χ² = Σ(Oᵢ - Eᵢ)²/Eᵢ, where Oᵢ are observed and Eᵢ are expected frequencies. Requires binned data. Asymptotically chi-square with k-1 df (k-1-m when m parameters are estimated).
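A sketch of the chi-square recipe with binned data; the exponential model and ten quantile bins are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)           # synthetic data

edges = np.quantile(x, np.linspace(0.0, 1.0, 11))  # 10 bins with roughly equal counts
edges[0], edges[-1] = 0.0, np.inf                  # cover the full support of Exp(scale=2)
observed, _ = np.histogram(x, bins=edges)
expected = len(x) * np.diff(stats.expon(scale=2.0).cdf(edges))
chi2_stat, p = stats.chisquare(observed, f_exp=expected)  # df = k - 1 here
```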
Test Result Interpretation
Hypothesis Testing Framework:
- H₀ (Null): Data follows the specified distribution
- H₁ (Alternative): Data does not follow the specified distribution
- α (Significance Level): Usually 0.05
- If p-value ≤ α: Reject H₀ (evidence of departure from the specified distribution)
- If p-value > α: Fail to reject H₀ (no strong evidence against the specified distribution)
Normality Assessment Categories:
- Normal: p-value > 0.10, strongly consistent with normality
- Borderline: p-value between 0.01 and 0.10, marginal evidence against normality
- Non-Normal: p-value < 0.01, strong evidence that the data are not normal
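The bands above translate directly into a small helper (the function name is ours; thresholds are the ones listed):

```python
def assess_normality(p_value: float) -> str:
    # Maps a normality-test p-value onto the three categories above
    if p_value > 0.10:
        return "Normal"
    if p_value >= 0.01:
        return "Borderline"
    return "Non-Normal"
```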
Distribution Families & Applications
CONTINUOUS DISTRIBUTIONS
Normal Distribution (Gaussian)
f(x) = (1/(σ√(2π))) * exp(-(x-μ)²/(2σ²))
Applications:
- Measurement errors in physical experiments
- Height, weight, IQ scores in populations
- Stock returns under normal assumptions
- Quality control tolerances
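A brief sketch of fitting and querying a normal model with SciPy (the height data are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
heights = rng.normal(loc=170.0, scale=8.0, size=1000)  # hypothetical heights in cm

mu_hat, sigma_hat = stats.norm.fit(heights)        # MLE of mu and sigma
p_tall = stats.norm.sf(185.0, mu_hat, sigma_hat)   # P(X > 185 cm) under the fitted model
```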
Beta Distribution
f(x) = x^(α-1) * (1-x)^(β-1) / B(α, β)
Applications:
- Bayesian statistics (conjugate prior for Bernoulli)
- Proportions and rates between 0-1
- Project completion percentages
- Acceptance probabilities in quality control
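A sketch of the Beta-Bernoulli conjugacy noted above (prior parameters and counts are invented):

```python
from scipy import stats

a0, b0 = 2.0, 2.0                  # Beta(2, 2) prior on a success probability
successes, failures = 7, 3         # observed Bernoulli outcomes
posterior = stats.beta(a0 + successes, b0 + failures)  # conjugate update
print(posterior.mean(), posterior.interval(0.95))      # posterior mean, 95% credible interval
```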
Gamma Distribution
f(x) = x^(k-1) * e^(-x/θ) / (θ^k * Γ(k))
Applications:
- Waiting times between Poisson events
- Reliability engineering (component lifetimes)
- Insurance claims amounts
- Rainfall amounts, tornado intensities
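For example, fitting a gamma model to waiting-time data with SciPy (data synthetic; the location is pinned at 0 so only shape and scale are estimated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
waits = rng.gamma(shape=2.0, scale=5.0, size=300)   # synthetic waiting times

k_hat, loc, theta_hat = stats.gamma.fit(waits, floc=0)
```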
Weibull Distribution
f(x) = (k/λ) * (x/λ)^(k-1) * exp(-(x/λ)^k)
Applications:
- Failure analysis and reliability engineering
- Wind speed modeling
- Material fatigue life analysis
- Product lifetime testing
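A reliability-flavored sketch with SciPy's weibull_min (lifetimes are synthetic, the 500-hour query arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lifetimes = 1000.0 * rng.weibull(1.5, size=200)     # synthetic lifetimes in hours

k_hat, loc, lam_hat = stats.weibull_min.fit(lifetimes, floc=0)
reliability = stats.weibull_min.sf(500.0, k_hat, scale=lam_hat)  # P(lifetime > 500 h)
```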
Pareto Distribution (Power Law)
f(x) = α * xₘ^α / x^(α+1) for x ≥ xₘ, where xₘ is the minimum value
Applications:
- Income inequality analysis (80/20 rule)
- City population sizes
- File size distributions in networks
- Insurance claims with high deductibles
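When xₘ is known, the MLE of the tail index α has the closed form α̂ = n / Σln(xᵢ/xₘ), sketched below (the function name is ours):

```python
import numpy as np

def pareto_alpha_mle(x, x_m):
    # MLE of alpha for a Pareto(x_m, alpha) sample with known minimum x_m
    x = np.asarray(x)
    return x.size / np.sum(np.log(x / x_m))
```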
Log-Normal Distribution
f(x) = (1/(xσ√(2π))) * exp(-(ln(x)-μ)²/(2σ²))
Applications:
- Stock prices in Black-Scholes model
- Particle sizes in materials science
- Concentrations in environmental studies
- Survival times with multiplicative effects
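The defining relationship (X is log-normal iff ln(X) is normal) fixes SciPy's parametrization, as this sketch checks:

```python
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.25                         # parameters of ln(X), chosen arbitrarily
X = stats.lognorm(s=sigma, scale=np.exp(mu))  # SciPy: s = sigma, scale = exp(mu)
print(X.mean(), np.exp(mu + sigma**2 / 2))    # both equal exp(mu + sigma²/2)
```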
Student-t Distribution
f(x) = Γ((ν+1)/2) / (√(νπ) * Γ(ν/2)) * (1 + x²/ν)^(-(ν+1)/2)
Applications:
- Statistical inference with small samples (t-tests)
- Confidence intervals when population variance unknown
- Regression analysis residuals
- Robust statistical methods
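The classic small-sample use: a t-based confidence interval for a mean when the variance is unknown (the measurements are invented):

```python
import numpy as np
from scipy import stats

x = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))
```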
Uniform Distribution (Continuous)
f(x) = 1/(b-a) for a ≤ x ≤ b
Applications:
- Random number generation
- Round-off errors in measurements
- Physical measurements with limited precision
- Bayesian prior for lack of information
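For instance, the round-off error when rounding to the nearest integer is often modeled as Uniform(-0.5, 0.5):

```python
from scipy import stats

U = stats.uniform(loc=-0.5, scale=1.0)   # SciPy: loc = a, scale = b - a
p = U.cdf(0.25) - U.cdf(-0.25)           # P(-0.25 <= X <= 0.25) = 0.5
```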
Logistic Distribution
f(x) = (1/s) * exp(-(x-μ)/s) / (1 + exp(-(x-μ)/s))²
Applications:
- Growth models in biology and economics
- Neural network activation functions
- Logit (logistic) regression models
- Demographic population modeling
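The logistic CDF is exactly the sigmoid used as the logit link, which this sketch verifies numerically:

```python
import numpy as np
from scipy import stats

mu, s = 0.0, 1.0
x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(stats.logistic.cdf(x, loc=mu, scale=s),
                   1.0 / (1.0 + np.exp(-(x - mu) / s)))
```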
Laplace Distribution (Double Exponential)
f(x) = (1/(2b)) * exp(-|x-μ|/b)
Applications:
- Modeling absolute deviations/errors
- Robust statistics alternatives to normal
- Signal processing with heavy tails
- Financial modeling of extreme events
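A quick variance-matched tail comparison against the normal (a Laplace with b = 1/√2 has unit variance):

```python
from scipy import stats

b = 2 ** -0.5                            # Laplace variance is 2b², so this matches Normal(0, 1)
print(stats.laplace.sf(4.0, scale=b))    # ~1.7e-3
print(stats.norm.sf(4.0))                # ~3.2e-5: the Laplace tail is much heavier
```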
Chi-Squared Distribution
f(x) = (1/(2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * exp(-x/2)
Applications:
- Chi-squared tests for independence
- Goodness-of-fit tests
- Variance estimation and testing
- Quality control and process capability
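A sketch of the variance test that motivates this distribution: under H₀, (n-1)s²/σ₀² is chi-square with n-1 df (the data and σ₀² are invented):

```python
import numpy as np
from scipy import stats

x = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])
sigma0_sq = 0.04                                # hypothesized variance
stat = (len(x) - 1) * x.var(ddof=1) / sigma0_sq
p = 2 * min(stats.chi2.cdf(stat, df=len(x) - 1),
            stats.chi2.sf(stat, df=len(x) - 1))  # two-sided p-value
```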
F-Distribution (Snedecor's F)
f(x) = Γ((d₁+d₂)/2) / [Γ(d₁/2) * Γ(d₂/2)] * (d₁/d₂)^(d₁/2) * x^(d₁/2-1) * (1 + (d₁/d₂)x)^(-(d₁+d₂)/2)
Applications:
- ANOVA (Analysis of Variance)
- F-tests for equal variances
- Regression analysis significance tests
- Generalized linear models
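A sketch of the two-sample F-test for equal variances (samples invented; note this test is sensitive to non-normality):

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.2, 6.3, 5.8, 4.9, 5.5])
y = np.array([3.9, 6.8, 7.2, 4.4, 6.1, 5.0, 6.6])

f = x.var(ddof=1) / y.var(ddof=1)              # F = s1² / s2²
dfn, dfd = len(x) - 1, len(y) - 1
p = 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))  # two-sided
```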
Bivariate Normal Distribution
f(x,y) = (1/(2πσ₁σ₂√(1-ρ²))) * exp(-(1/(2(1-ρ²))) * [(x-μ₁)²/σ₁² - 2ρ(x-μ₁)(y-μ₂)/(σ₁σ₂) + (y-μ₂)²/σ₂²])
Applications:
- Multivariate analysis of correlated variables
- Asset returns in portfolio optimization
- Biological measurements (height-weight correlations)
- Economic indicators analysis
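Sampling from a correlated pair and recovering ρ, as a sanity check (all parameters arbitrary):

```python
import numpy as np
from scipy import stats

s1, s2, rho = 1.0, 2.0, 0.7
cov = [[s1**2, rho * s1 * s2],
       [rho * s1 * s2, s2**2]]
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=cov)
xy = mvn.rvs(size=1000, random_state=0)
print(np.corrcoef(xy.T)[0, 1])   # close to rho = 0.7
```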
DISCRETE DISTRIBUTIONS
Binomial Distribution
P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
Applications:
- Success/failure experiments with fixed trials
- Quality control (defective items in sample)
- Election outcomes in voting polls
- Clinical trials response counts
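A quality-control flavored example (lot size and defect rate invented):

```python
from scipy import stats

n, p = 20, 0.03                      # 20 items, 3% defect rate
print(stats.binom.pmf(0, n, p))      # P(no defects in the sample)
print(stats.binom.sf(1, n, p))       # P(more than one defect) = 1 - P(X <= 1)
```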
Poisson Distribution
P(X = k) = e^(-λ) * λ^k / k!
Applications:
- Events occurring in fixed time/space intervals
- Customer arrivals at service centers
- Radioactive decay events counting
- Traffic accidents at intersections
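For example, with an assumed arrival rate of λ = 4 per hour:

```python
from scipy import stats

lam = 4.0
print(stats.poisson.pmf(6, lam))     # P(exactly 6 arrivals in an hour)
print(stats.poisson.sf(9, lam))      # P(10 or more arrivals) = 1 - P(X <= 9)
```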
Geometric Distribution
P(X = k) = (1-p)^(k-1) * p
Applications:
- Waiting time until first success
- Reliability (trials until first failure)
- Quality control (items until first defect)
- Sales calls until first sale
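A sales-call example (success probability invented); SciPy's geom counts trials up to and including the first success:

```python
from scipy import stats

p = 0.2
print(stats.geom.pmf(5, p))          # P(first sale on the 5th call) = (1-p)^4 * p
print(stats.geom.cdf(10, p))         # P(first sale within 10 calls)
```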
Hypergeometric Distribution
P(X = k) = [C(K,k) * C(N-K,n-k)] / C(N,n)
Applications:
- Sampling without replacement from finite populations
- Quality control lot sampling
- Election auditing procedures
- Card game probability calculations
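A lot-sampling example; note SciPy's argument order hypergeom(M, n, N) = (population size, successes in population, draws):

```python
from scipy import stats

N_pop, K_def, n_draw = 100, 8, 12    # invented lot: 100 items, 8 defective, inspect 12
print(stats.hypergeom.pmf(0, N_pop, K_def, n_draw))  # P(no defectives in the sample)
print(stats.hypergeom.sf(1, N_pop, K_def, n_draw))   # P(two or more defectives)
```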
Negative Binomial Distribution
P(X = k) = C(k-1,r-1) * p^r * (1-p)^(k-r)
Applications:
- Number of trials needed for r successes
- Quality control (trials until r defects)
- Sales (attempts until r sales made)
- Epidemiology (contacts until r infections)
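SciPy's nbinom counts failures before the r-th success rather than total trials, so the trial-count formula above needs a shift:

```python
from scipy import stats

r, p = 3, 0.25
k = 10                                # trial on which the 3rd success occurs
print(stats.nbinom.pmf(k - r, r, p))  # P(3rd success on the 10th trial)
```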
Multivariate Hypergeometric Distribution
P(X₁ = k₁, X₂ = k₂, ..., X_c = k_c) = [∏ C(Kᵢ,kᵢ)] / C(N,n)
Applications:
- Elections with multiple candidate categories
- Contingency table analysis
- Multivariate sampling from categorized populations
- Genetics and population studies
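A direct implementation of the pmf above using binomial coefficients (the card-drawing numbers are illustrative):

```python
from math import comb

def mv_hypergeom_pmf(k, K, n):
    # P(X1=k1, ..., Xc=kc) when drawing n items without replacement
    # from a population with category counts K = (K1, ..., Kc)
    if sum(k) != n:
        return 0.0
    num = 1
    for ki, Ki in zip(k, K):
        num *= comb(Ki, ki)
    return num / comb(sum(K), n)

print(mv_hypergeom_pmf((2, 1, 2), (13, 13, 26), 5))  # 2 hearts, 1 spade, 2 others in 5 cards
```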
Multinomial Distribution
P(X₁ = k₁, X₂ = k₂, ..., X_c = k_c) = [n! / (k₁!k₂!...k_c!)] * p₁^k₁ * p₂^k₂ * ... * p_c^k_c
Applications:
- Election results across multiple parties
- Consumer choice modeling
- Marketing response categorization
- Genetic inheritance patterns
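An election-style example with three invented party shares:

```python
from scipy import stats

probs = [0.5, 0.3, 0.2]
print(stats.multinomial.pmf([5, 3, 2], n=10, p=probs))  # P(a 5/3/2 split among 10 voters)
```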
Key Statistical Concepts
- Parameters vs Statistics: Parameters describe population distributions, statistics describe samples
- Maximum Likelihood Estimation (MLE): Method for estimating distribution parameters from data
- Goodness-of-Fit Tests: Tests such as Kolmogorov-Smirnov assess how well data fit a hypothesized distribution
- Central Limit Theorem: The distribution of sample means approaches a normal distribution as the sample size grows, regardless of the parent distribution (provided its variance is finite)
- Law of Large Numbers: Sample statistics converge to population parameters with increasing sample size
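A two-line CLT illustration (exponential parent, n = 50 per mean; all choices arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
print(means.mean(), means.std())   # approx 1 and 1/sqrt(50) ~ 0.141; a histogram looks near-normal
```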
Choosing the Right Distribution
- Data Type: Continuous/interval data → continuous distributions; discrete/count data → discrete distributions
- Data Range: Bounded (0-1) data → Beta; positive only → Gamma/Exponential; unbounded → Normal
- Shape Characteristics: Symmetry (Normal), right-skewness (Gamma, Weibull), heavy tails (t-distribution)
- Subject Matter Knowledge: Domain expertise often guides choice (e.g., Pareto for wealth distributions)
- Statistical Tests: Use goodness-of-fit tests to validate distribution assumptions
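Putting the last point into practice: a rough screening sketch that fits several candidate families by MLE and ranks them by KS statistic (the function name and candidate list are ours; p-values after fitting on the same data are optimistic, so treat this as a screen, not a verdict):

```python
from scipy import stats

def rank_fits(data, candidates=("norm", "gamma", "lognorm", "weibull_min")):
    results = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                      # MLE fit of the candidate family
        d, _ = stats.kstest(data, name, args=params)
        results[name] = d                            # smaller D = closer fit
    return sorted(results.items(), key=lambda kv: kv[1])
```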