📚 Statistical Distribution Theory
🎯 What are Probability Distributions?
A probability distribution describes how the values of a random variable are distributed. It specifies the probabilities of different outcomes or ranges of outcomes.
📊 Continuous vs Discrete Distributions
- Continuous Distributions: Random variables can take any value within a continuous range (e.g., Normal, Beta, Gamma)
- Discrete Distributions: Random variables take only distinct, separate values (e.g., Binomial, Poisson, Geometric)
🧪 Statistical Tests & Analysis
📐 Normality Tests
- Shapiro-Wilk Test: Among the most powerful normality tests for small to medium samples (n ≤ 50). Tests the null hypothesis that the data come from a normal distribution.
- Anderson-Darling Test: Enhanced version of Kolmogorov-Smirnov test, more sensitive to tails of distribution.
- Jarque-Bera Test: Tests normality based on skewness and kurtosis. Good for large samples.
- D'Agostino-Pearson Test: Omnibus test combining skewness and kurtosis assessments.
- Lilliefors Test: Modification of Kolmogorov-Smirnov test when parameters are estimated from data.
📊 Goodness-of-Fit Tests
- Chi-Square Test: Tests if observed frequencies match expected frequencies under a distribution.
- Kolmogorov-Smirnov Test: Tests the maximum deviation between empirical and theoretical cumulative distributions.
- Anderson-Darling Test: Weighted version of KS test, more sensitive to distribution tails.
🔗 Statistical Distance Measures
- Hellinger Distance: Symmetric measure of distribution difference, bounded [0,1].
- Jensen-Shannon Divergence: Symmetric version of KL divergence, always finite.
- Wasserstein Distance: "Earth mover's distance", the minimum amount of probability mass times distance that must be moved to turn one distribution into the other.
- Kullback-Leibler Divergence: Measures information loss when using one distribution to approximate another.
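As a rough illustration of the measures above, the following sketch computes them for two discrete distributions given as probability lists. The function names and the example vectors p and q are illustrative assumptions, not part of any implementation described here. Natural logarithms are used, the Hellinger distance uses the 1/√2 normalization that keeps it in [0,1], and the Wasserstein helper assumes both distributions sit on the support points 0, 1, 2, ... with unit spacing.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q has zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger(p, q):
    """Hellinger distance, normalized to lie in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)

def wasserstein_1d(p, q):
    """Wasserstein-1 distance on support 0, 1, 2, ... (sum of |CDF gaps|)."""
    total, cp, cq = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cp += pi
        cq += qi
        total += abs(cp - cq)
    return total

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print("KL(p||q):", kl_divergence(p, q))   # equals ln 2 for these inputs
print("KL(q||p):", kl_divergence(q, p))   # inf: p has no mass on the third point
print("JS      :", js_divergence(p, q))
print("Hellinger:", hellinger(p, q))
print("Wasserstein:", wasserstein_1d(p, q))
```

The example shows why the symmetric measures are often preferred for comparison: KL divergence differs by direction and becomes infinite when supports do not match, while Jensen-Shannon and Hellinger stay finite.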
🧪 Statistical Tests Implemented
🎯 Normality Tests
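- Shapiro-Wilk Test: W = (Σ aᵢ x_(i))² / Σ (xᵢ - x̄)², where x_(i) is the i-th smallest observation, x̄ is the sample mean, and aᵢ are weights derived from normal order statistics. Among the most powerful normality tests for n ≤ 50. Used when sample sizes are small to medium.
- Anderson-Darling Test: A² = -n - (1/n) * Σ (2i-1) * [ln(F₀(x_(i))) + ln(1 - F₀(x_(n+1-i)))], summed over i = 1..n with x_(i) the ordered sample. More sensitive to distribution tails than the KS test. Good for detecting departures from normality in extreme values.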
- Jarque-Bera Test: JB = n/6 * (S² + (K-3)²/4) where S is skewness and K is kurtosis. Tests third and fourth moments. Good for large samples. Asymptotically chi-square with 2 df.
- D'Agostino-Pearson Omnibus Test: Combines the skewness and kurtosis tests into a single omnibus normality test. Unaffected by shifting or rescaling the data. Valid for n ≥ 8.
- Lilliefors Test: Modified Kolmogorov-Smirnov test when parameters are unknown. Uses mean and variance estimated from sample data rather than theoretical population parameters.
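To make the Jarque-Bera formula above concrete, here is a minimal sketch using only the standard library. The function name jarque_bera and the sample data are assumptions for illustration. The p-value uses the asymptotic chi-square distribution with 2 df, whose survival function is exp(-x/2), so it is only a rough guide for small samples like the one shown.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n, as in the usual JB definition
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5          # S
    kurtosis = m4 / m2 ** 2            # K (equals 3 for a normal distribution)
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)        # chi-square(2 df) survival function
    return jb, p_value

sample = [-2, -1, 0, 1, 2]
jb, p_value = jarque_bera(sample)
print(f"JB = {jb:.4f}, p = {p_value:.4f}")
```

For this symmetric sample the skewness term is zero, so the statistic comes entirely from the kurtosis term.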
📐 Goodness-of-Fit Tests
- Kolmogorov-Smirnov Test: D = sup|Fₙ(x) - F₀(x)| where Fₙ is the empirical CDF and F₀ is the theoretical CDF. Tests maximum deviation between distributions. P-values computed using correction for estimated parameters.
- Anderson-Darling Test (GoF): A² = -n - (1/n) * Σ (2i-1) * [ln(F₀(x_(i))) + ln(1 - F₀(x_(n+1-i)))], summed over i = 1..n with x_(i) the ordered sample. Weights deviations more heavily in distribution tails.
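- Chi-Square Goodness-of-Fit: χ² = Σ(Oᵢ - Eᵢ)²/Eᵢ where Oᵢ are observed and Eᵢ are expected frequencies. Requires binned data. Asymptotically chi-square with k-1 df for k bins when no parameters are estimated from the data (k-1-m df when m parameters are estimated).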
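The three statistics above can be sketched in a few lines. This is an illustrative, standard-library-only version that returns the statistics without p-values. The function names, the sample, and the choice of a standard normal as F₀ are assumptions made for the example.

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """D = max gap between the empirical step CDF and the theoretical CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def anderson_darling(data, cdf):
    """A² = -n - (1/n) * sum of (2i-1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    xs = sorted(data)
    n = len(xs)
    s = sum(
        (2 * i - 1) * (math.log(cdf(xs[i - 1])) + math.log(1 - cdf(xs[n - i])))
        for i in range(1, n + 1)
    )
    return -n - s / n

def chi_square_statistic(observed, expected):
    """Sum of (O - E)² / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

sample = [-1.2, -0.4, 0.1, 0.5, 1.3]
f0 = NormalDist(mu=0.0, sigma=1.0).cdf
print("KS D :", ks_statistic(sample, f0))
print("AD A²:", anderson_darling(sample, f0))
print("Chi² :", chi_square_statistic([18, 22, 20], [20, 20, 20]))
```

Note that the Anderson-Darling sum takes logarithms of F₀ and 1 - F₀, so observations where the theoretical CDF is exactly 0 or 1 would need separate handling.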
📊 Test Result Interpretation
Hypothesis Testing Framework:
- H₀ (Null): Data follows specified distribution
- H₁ (Alternative): Data does not follow specified distribution
- α (Significance Level): Usually 0.05
- If p-value ≤ α: Reject H₀ (evidence of departure from specified distribution)
- If p-value > α: Fail to reject H₀ (no strong evidence against specified distribution)
Normality Assessment Categories:
- Normal: p-value > 0.10, no meaningful evidence against normality
- Borderline: p-value between 0.01 and 0.10, marginal evidence against normality
- Non-Normal: p-value < 0.01, strong evidence that the data are not normal
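The decision rule and the three categories above translate directly into code. The sketch below uses hypothetical function names. How the exact boundary values 0.01 and 0.10 are treated is an assumption, since the categories do not say which side they fall on; here both are labeled Borderline.

```python
def reject_null(p_value, alpha=0.05):
    """True when the test rejects H0 at significance level alpha."""
    return p_value <= alpha

def assess_normality(p_value):
    """Map a normality-test p-value to one of the three categories."""
    if p_value > 0.10:
        return "Normal"
    if p_value >= 0.01:   # assumption: boundary values count as Borderline
        return "Borderline"
    return "Non-Normal"

for p in (0.45, 0.04, 0.003):
    print(p, reject_null(p), assess_normality(p))
```

A p-value of 0.04 shows how the two views can differ: the test rejects H₀ at α = 0.05, yet the category is only Borderline.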
📈 Distribution Families & Applications
CONTINUOUS DISTRIBUTIONS
🔔 Normal Distribution (Gaussian)
f(x) = (1/(σ√(2π))) * exp(-((x-μ)²)/(2σ²))
Applications:
- Measurement errors in physical experiments
- Height, weight, IQ scores in populations
- Stock returns under normal assumptions
- Quality control tolerances
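As a quick check of the density formula above, this sketch evaluates it directly and compares the result with Python's built-in statistics.NormalDist. The parameters μ = 100 and σ = 15 are arbitrary example values.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density written exactly as in the formula above."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

dist = NormalDist(mu=100, sigma=15)
print(normal_pdf(115, 100, 15), dist.pdf(115))   # the two values agree

# Probability of falling within one standard deviation of the mean
within_one_sigma = dist.cdf(115) - dist.cdf(85)
print(round(within_one_sigma, 4))                # 0.6827
```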
🎯 Beta Distribution
f(x) = (x^(α-1) * (1-x)^(β-1)) / B(α,β)
Applications:
- Bayesian statistics (conjugate prior for Bernoulli)
- Proportions and rates between 0-1
- Project completion percentages
- Acceptance probabilities in quality control
🔥 Gamma Distribution
f(x) = (x^(k-1) * e^(-x/θ)) / (θ^k * Γ(k))
Applications:
- Waiting times between Poisson events
- Reliability engineering (component lifetimes)
- Insurance claims amounts
- Rainfall amounts, tornado intensities
⚡ Weibull Distribution
f(x) = (k/λ) * (x/λ)^(k-1) * exp(-(x/λ)^k)
Applications:
- Failure analysis and reliability engineering
- Wind speed modeling
- Material fatigue life analysis
- Product lifetime testing
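The Weibull density above has a closed-form CDF, F(x) = 1 - exp(-(x/λ)^k), which is what makes it convenient for lifetime calculations. The sketch below checks the density against that CDF by numerical integration. The helper names and the parameter values k = 2, λ = 2 are assumptions chosen for the example.

```python
import math

def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam, for x >= 0."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_cdf(x, k, lam):
    """Closed-form CDF: probability of failure by time x."""
    return 1 - math.exp(-((x / lam) ** k))

def trapezoid(f, a, b, steps=10_000):
    """Plain trapezoidal rule, enough for a sanity check."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

k, lam = 2.0, 2.0
area = trapezoid(lambda x: weibull_pdf(x, k, lam), 0.0, 3.0)
print(area, weibull_cdf(3.0, k, lam))   # integral of the pdf matches the CDF
```

With k = 1 the same density reduces to the exponential distribution with mean λ, which corresponds to a constant failure rate.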
💎 Pareto Distribution (Power Law)
f(x) = α x₀^α / x^(α+1) for x ≥ x₀
Applications:
- Income inequality analysis (80/20 rule)
- City population sizes
- File size distributions in networks
- Insurance claims with high deductibles
📉 Log-Normal Distribution
f(x) = (1/(xσ√(2π))) * exp[-(ln(x)-μ)²/(2σ²)]
Applications:
- Stock prices in Black-Scholes model
- Particle sizes in materials science
- Concentrations in environmental studies
- Survival times with multiplicative effects
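The log-normal density above follows from the normal one: if ln(X) is Normal(μ, σ), the density of X is the normal density at ln(x) divided by x. The following sketch verifies that relationship with statistics.NormalDist. The function names and the parameters μ = 0.5, σ = 0.8 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x, mu, sigma):
    """Log-normal density written as in the formula above (x > 0)."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) = P(ln X <= ln x), a normal CDF on the log scale."""
    return NormalDist(mu, sigma).cdf(math.log(x))

mu, sigma = 0.5, 0.8
x = 2.0
via_normal = NormalDist(mu, sigma).pdf(math.log(x)) / x
print(lognormal_pdf(x, mu, sigma), via_normal)    # the two values agree
print(lognormal_cdf(math.exp(mu), mu, sigma))     # 0.5: the median is exp(mu)
```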
⚖️ Student-t Distribution
f(x) = Γ((ν+1)/2) / (√(νπ) * Γ(ν/2)) * (1 + x²/ν)^(-(ν+1)/2)
Applications:
- Statistical inference with small samples (t-tests)
- Confidence intervals when population variance unknown
- Regression analysis residuals
- Robust statistical methods
⏹️ Uniform Distribution (Continuous)
f(x) = 1/(b-a) for a ≤ x ≤ b
Applications:
- Random number generation
- Round-off errors in measurements
- Physical measurements with limited precision
- Bayesian prior for lack of information
📈 Logistic Distribution
f(x) = (1/s) * exp(-(x-μ)/s) / (1 + exp(-(x-μ)/s))²
Applications:
- Growth models in biology and economics
- Neural network activation functions
- Logit (logistic) regression models
- Demographic population modeling
⬩ Laplace Distribution (Double Exponential)
f(x) = (1/(2b)) * exp(-|x-μ|/b)
Applications:
- Modeling absolute deviations/errors
- Robust statistics alternatives to normal
- Signal processing with heavy tails
- Financial modeling of extreme events
✅ Chi-Squared Distribution
f(x) = (1/(2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * exp(-x/2)
Applications:
- Chi-squared tests for independence
- Goodness-of-fit tests
- Variance estimation and testing
- Quality control and process capability
📊 F-Distribution (Snedecor F)
f(x) = [Γ((d₁+d₂)/2) / (Γ(d₁/2) * Γ(d₂/2))] * (d₁/d₂)^(d₁/2) * x^(d₁/2-1) * (1 + (d₁/d₂)x)^(-(d₁+d₂)/2)
Applications:
- ANOVA (Analysis of Variance)
- F-tests for equal variances
- Regression analysis significance tests
- Generalized linear models
🔗 Bivariate Normal Distribution
f(x,y) = (1/(2πσ₁σ₂√(1-ρ²))) * exp{-(1/(2(1-ρ²))) * [(x-μ₁)²/σ₁² - 2ρ(x-μ₁)(y-μ₂)/(σ₁σ₂) + (y-μ₂)²/σ₂²]}
Applications:
- Multivariate analysis of correlated variables
- Asset returns in portfolio optimization
- Biological measurements (height-weight correlations)
- Economic indicators analysis
DISCRETE DISTRIBUTIONS
🎲 Binomial Distribution
P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Applications:
- Success/failure experiments with fixed trials
- Quality control (defective items in sample)
- Election outcomes in voting polls
- Clinical trials response counts
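The binomial formula above maps directly onto math.comb. This sketch uses a hypothetical function name and an example of 10 fair coin flips. It also checks that the probabilities sum to 1 and that the mean equals n × p.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
print(binomial_pmf(5, n, p))                                # 0.24609375 (252/1024)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(total, mean)                                          # 1.0 and n * p = 5.0
```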
🎯 Poisson Distribution
P(X = k) = e^(-λ) * λ^k / k!
Applications:
- Events occurring in fixed time/space intervals
- Customer arrivals at service centers
- Radioactive decay events counting
- Traffic accidents at intersections
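A small sketch of the Poisson formula above, assuming an example rate of λ = 3 events per interval. The function name is hypothetical, and the sums are truncated at k = 59, beyond which the terms are negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
print(poisson_pmf(0, lam))   # probability of no events, equal to exp(-3)
total = sum(poisson_pmf(k, lam) for k in range(60))
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
print(total, mean)           # approximately 1 and lam
```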
📐 Geometric Distribution
P(X = k) = (1-p)^(k-1) * p
Applications:
- Waiting time until first success
- Reliability (trials until first failure)
- Quality control (items until first defect)
- Sales calls until first sale
🔄 Hypergeometric Distribution
P(X = k) = [C(K,k) × C(N-K,n-k)] / C(N,n)
Applications:
- Sampling without replacement from finite populations
- Quality control lot sampling
- Election auditing procedures
- Card game probability calculations
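For the card game application, the hypergeometric formula above gives the chance of drawing a given number of aces in a five-card hand. The sketch below uses a hypothetical function name, with N = 52 cards, K = 4 aces, and n = 5 cards drawn as the example setup.

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Exactly two aces in a five-card hand from a standard 52-card deck
two_aces = hypergeometric_pmf(2, N=52, K=4, n=5)
print(two_aces)   # 103776 / 2598960, roughly 0.04

total = sum(hypergeometric_pmf(k, 52, 4, 5) for k in range(5))
print(total)      # probabilities over k = 0..4 sum to 1
```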
📊 Negative Binomial Distribution
P(X = k) = C(k-1,r-1) × p^r × (1-p)^(k-r)
Applications:
- Number of trials needed for r successes
- Quality control (trials until r defects)
- Sales (attempts until r sales made)
- Epidemiology (contacts until r infections)
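In the form given above, X counts the total number of trials needed to reach r successes, so k ranges over r, r+1, and so on. The sketch below uses a hypothetical function name with example values r = 3 and p = 0.5. It also shows that r = 1 recovers the geometric formula (1-p)^(k-1) * p.

```python
from math import comb

def negative_binomial_pmf(k, r, p):
    """P(X = k): the r-th success occurs on trial k (k >= r)."""
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

print(negative_binomial_pmf(5, r=3, p=0.5))   # 6 * 0.5**5 = 0.1875

# With r = 1 this is the geometric distribution
k, p = 4, 0.3
geometric = (1 - p) ** (k - 1) * p
print(negative_binomial_pmf(k, 1, p), geometric)

total = sum(negative_binomial_pmf(j, 3, 0.5) for j in range(3, 200))
print(total)                                   # approximately 1
```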
🔗 Multivariate Hypergeometric Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [∏ C(Kᵢ,kᵢ)] / C(N,n)
Applications:
- Elections with multiple candidate categories
- Contingency table analysis
- Multivariate sampling from categorized populations
- Genetics and population studies
🎲 Multinomial Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [n! / (k₁!k₂!...kₘ!)] × p₁^k₁ × p₂^k₂ × ... × pₘ^kₘ
Applications:
- Election results across multiple parties
- Consumer choice modeling
- Marketing response categorization
- Genetic inheritance patterns
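A minimal sketch of the multinomial formula above, using a hypothetical function name and a made-up three-party example with vote shares 0.5, 0.3, and 0.2. With only two categories, the same function reproduces the binomial probability.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = k1, ..., Xm = km) for n = sum(counts) independent trials."""
    coeff = factorial(sum(counts))
    for k in counts:
        coeff //= factorial(k)          # n! / (k1! k2! ... km!), exact in integers
    return coeff * prod(p ** k for p, k in zip(probs, counts))

# Four voters: two choose party A, one chooses B, one chooses C
three_party = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
print(three_party)                      # 12 * 0.25 * 0.3 * 0.2, about 0.18

# Two categories reduce to the binomial case
two_category = multinomial_pmf([3, 7], [0.4, 0.6])
binomial = comb(10, 3) * 0.4 ** 3 * 0.6 ** 7
print(two_category, binomial)
```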
📋 Key Statistical Concepts
- Parameters vs Statistics: Parameters describe population distributions, statistics describe samples
- Maximum Likelihood Estimation (MLE): Method for estimating distribution parameters from data
- Goodness-of-Fit Tests: Tests such as Kolmogorov-Smirnov assess how well data fit a distribution
- Central Limit Theorem: Means of independent samples approach a normal distribution as sample size grows, whatever the parent distribution, provided it has finite variance
- Law of Large Numbers: The sample mean converges to the population mean as sample size increases
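The Central Limit Theorem and Law of Large Numbers can be seen in a small simulation. The sketch below draws from an exponential distribution with mean 1, which is strongly right-skewed, and looks at the means of repeated samples. The sample size of 50, the 2000 repetitions, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential distribution with mean 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
print(center)   # close to the population mean, 1.0
print(spread)   # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141
```

Although individual draws are far from normal, the sample means cluster symmetrically around 1 with the spread the theorem predicts. A histogram of means would show the familiar bell shape.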
🔍 Choosing the Right Distribution
- Data Type: Continuous/interval data → continuous distributions, discrete/count data → discrete distributions
- Data Range: Bounded (0-1) data → Beta, positive only → Gamma/Exponential, unbounded (any real value) → Normal
- Shape Characteristics: Symmetry (Normal), right-skewness (Gamma, Weibull), heavy tails (t-distribution)
- Subject Matter Knowledge: Domain expertise often guides choice (e.g., Pareto for wealth distributions)
- Statistical Tests: Use goodness-of-fit tests to validate distribution assumptions
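The data type and data range guidelines above can be turned into a first-pass screening helper. The function below is purely hypothetical and deliberately crude: it treats integer-valued data as counts, which is an assumption, and it only narrows the candidate list. Shape, domain knowledge, and goodness-of-fit tests still decide the final choice.

```python
def suggest_families(values):
    """Suggest candidate distribution families from data type and range alone."""
    if all(float(v).is_integer() for v in values):
        # Assumption: whole-number data are counts
        return ["Binomial", "Poisson", "Geometric"]
    lowest, highest = min(values), max(values)
    if lowest > 0 and highest < 1:
        return ["Beta"]                      # bounded between 0 and 1
    if lowest > 0:
        return ["Gamma", "Exponential"]      # positive only
    return ["Normal"]                        # values on both sides of zero

print(suggest_families([3, 0, 2, 5, 1]))         # count data
print(suggest_families([0.12, 0.50, 0.93]))      # proportions
print(suggest_families([4.2, 0.7, 12.5]))        # positive measurements
print(suggest_families([-1.3, 0.4, 2.2]))        # unbounded measurements
```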