🎲 Dakota AI Statistical Distribution Explorer

Interactive visualization of 22 probability distributions with real-time analysis and educational insights

🎯 Select Distribution

📊 Data Input Method

📁 File Upload

🎲 Random Data Generation

✏️ Manual Data Input

⚙️ Distribution Parameters

📊 Theoretical Statistics

  • Mean (μ)
  • Variance (σ²)
  • Skewness
  • Kurtosis
  • Median
  • Mode

📈 Sample Data Analysis

  • Data Points
  • Sample Mean
  • Sample Std Dev
  • Skewness
  • Kurtosis
  • Median

📏 Percentiles & Quartiles

  • P10
  • P25 (Q1)
  • P50 (Q2)
  • P75 (Q3)
  • P90
  • P95
  • P99
  • IQR

🔍 Outlier Detection

  • Z-Score Outliers
  • IQR Outliers
  • Modified Z Outliers
  • Lower Fence
  • Upper Fence
  • MAD

🧪 Hypothesis Testing Suite

Select a hypothesis test to see results

📊 Distribution Fitting

📁 Upload data to see fitted parameters and goodness-of-fit results

📈 Probability Density Function (PDF)

🔄 Data Transformations

Upload data and apply transformations to see results

🎨 Advanced Visualizations

Select an advanced visualization option to see results

📚 Statistical Distribution Theory

🎯 What are Probability Distributions?

A probability distribution describes how the values of a random variable are distributed. It specifies the probabilities of different outcomes or ranges of outcomes.

📊 Continuous vs Discrete Distributions

  • Continuous Distributions: Random variables can take any value within a continuous range (e.g., Normal, Beta, Gamma)
  • Discrete Distributions: Random variables take only distinct, separate values (e.g., Binomial, Poisson, Geometric)

🧪 Statistical Tests & Analysis

📏 Normality Tests
  • Shapiro-Wilk Test: Among the most powerful normality tests for small to medium samples (n ≤ 50). Tests the null hypothesis that the data come from a normal distribution.
  • Anderson-Darling Test: Refinement of the Kolmogorov-Smirnov test that is more sensitive to the tails of the distribution.
  • Jarque-Bera Test: Tests normality based on skewness and kurtosis. Good for large samples.
  • D'Agostino-Pearson Test: Omnibus test combining skewness and kurtosis assessments.
  • Lilliefors Test: Modification of Kolmogorov-Smirnov test when parameters are estimated from data.
📊 Goodness-of-Fit Tests
  • Chi-Square Test: Tests if observed frequencies match expected frequencies under a distribution.
  • Kolmogorov-Smirnov Test: Tests the maximum deviation between empirical and theoretical cumulative distributions.
  • Anderson-Darling Test: Weighted version of KS test, more sensitive to distribution tails.
🔗 Statistical Distance Measures
  • Hellinger Distance: Symmetric measure of distribution difference, bounded [0,1].
  • Jensen-Shannon Divergence: Symmetric version of KL divergence, always finite.
  • Wasserstein Distance: "Earth mover's distance" - intuitive measure of distribution difference.
  • Kullback-Leibler Divergence: Measures information loss when using one distribution to approximate another.
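
As a rough illustration of the four measures above, the sketch below computes them for two discrete distributions given as probability lists over the same support. The function names are hypothetical and are not taken from the tool. The Wasserstein helper also assumes the support points are equally spaced one unit apart, and all logarithms are natural, so divergences are in nats.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger(p, q):
    """Hellinger distance, which always lies in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)

def wasserstein_1d(p, q):
    """1-Wasserstein distance on a unit-spaced grid (assumed spacing):
    the sum of absolute differences between the two CDFs."""
    cdf_p = cdf_q = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total
```

For two distributions with disjoint support, such as [1, 0] and [0, 1], the KL divergence is infinite while the Jensen-Shannon divergence stays finite at ln 2, which is the practical reason the symmetric version is often preferred.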
🧪 Statistical Tests Implemented
🎯 Normality Tests
  • Shapiro-Wilk Test: W = (Σ aᵢ x_(i))² / Σ(xᵢ - x̄)², where x_(i) is the i-th smallest observation, x̄ is the sample mean, and the aᵢ are tabulated weights. Among the most powerful normality tests for n ≤ 50. Used when sample sizes are small to medium.
  • Anderson-Darling Test: A² = -n - (1/n) * Σ(2i-1) * [ln(F₀(x_(i))) + ln(1-F₀(x_(n+1-i)))], where x_(i) are the sorted observations. More sensitive to distribution tails than the KS test. Good for detecting departures from normality in extreme values.
  • Jarque-Bera Test: JB = n/6 * (S² + (K-3)²/4) where S is skewness and K is kurtosis. Tests third and fourth moments. Good for large samples. Asymptotically chi-square with 2 df.
  • D'Agostino-Pearson Omnibus Test: Combines skewness and kurtosis tests into a single omnibus normality test. Invariant to changes of location and scale. Valid for n ≥ 8.
  • Lilliefors Test: Modified Kolmogorov-Smirnov test when parameters are unknown. Uses mean and variance estimated from sample data rather than theoretical population parameters.
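
The Jarque-Bera formula above is simple enough to compute by hand. The following is a minimal sketch, not the tool's own implementation; the function name is hypothetical. It uses population (biased) moments for S and K, and relies on the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2). The p-value is asymptotic, so it is only a rough guide for small samples.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n (biased estimators).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5          # S in the formula
    kurt = m4 / m2 ** 2            # K in the formula (normal gives 3)
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)    # chi-square survival function, 2 df
    return jb, p_value
```

Calling jarque_bera(sample) on a list of observations returns the statistic and p-value together, so the result can be compared directly against a chosen significance level.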
📏 Goodness-of-Fit Tests
  • Kolmogorov-Smirnov Test: D = sup|Fₙ(x) - F₀(x)| where Fₙ is the empirical CDF and F₀ is the theoretical CDF. Tests the maximum deviation between distributions. P-values computed using correction for estimated parameters.
  • Anderson-Darling Test (GoF): A² = -n - (1/n) * Σ(2i-1) * [ln(F₀(x_(i))) + ln(1-F₀(x_(n+1-i)))], where x_(i) are the sorted observations. Weights deviations more heavily in distribution tails.
  • Chi-Square Goodness-of-Fit: χ² = Σ(Oᵢ - Eᵢ)²/Eᵢ where Oᵢ are observed and Eᵢ are expected frequencies. Requires binned data. Asymptotically chi-square with k-1 df, less one more for each parameter estimated from the data.
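
To make the Kolmogorov-Smirnov statistic concrete, here is a small sketch that computes D for a sample against any theoretical CDF. The helper names are hypothetical. Because the empirical CDF is a step function, the supremum is reached at a data point, just before or just after a jump, so it is enough to check both sides of each step. The sketch computes only the statistic, not a p-value.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(data, cdf):
    """D = sup |F_n(x) - F_0(x)| for a sample and a theoretical CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d
```

For example, ks_statistic(sample, normal_cdf) compares a sample with the standard normal, and ks_statistic(sample, lambda x: normal_cdf(x, 5.0, 2.0)) compares it with a normal distribution of mean 5 and standard deviation 2.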
📊 Test Result Interpretation

Hypothesis Testing Framework:

  • H₀ (Null): Data follows specified distribution
  • H₁ (Alternative): Data does not follow specified distribution
  • α (Significance Level): Usually 0.05
  • If p-value ≤ α: Reject H₀ (evidence of departure from specified distribution)
  • If p-value > α: Fail to reject H₀ (no strong evidence against specified distribution)

Normality Assessment Categories:

  • Normal: p-value > 0.10, no meaningful evidence against normality
  • Borderline: p-value between 0.01-0.10, marginal evidence against normality
  • Non-Normal: p-value < 0.01, strong evidence data is not normal
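
The decision rule and the three categories above can be expressed as one small function. This is only a sketch of the thresholds as listed here; the function name and return format are assumptions, and the tool may apply its own boundary handling.

```python
def assess_normality(p_value, alpha=0.05):
    """Map a p-value to (decision, category) using the thresholds above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    if p_value > 0.10:
        category = "Normal"
    elif p_value >= 0.01:
        category = "Borderline"
    else:
        category = "Non-Normal"
    return decision, category
```

Note that the decision and the category answer different questions: a p-value of 0.03 rejects H₀ at α = 0.05 but still falls in the Borderline category.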

📈 Distribution Families & Applications

CONTINUOUS DISTRIBUTIONS
🔔 Normal Distribution (Gaussian)
f(x) = (1/(σ√(2π))) * exp(-((x-μ)²)/(2σ²))
Applications:
  • Measurement errors in physical experiments
  • Height, weight, IQ scores in populations
  • Stock returns under normal assumptions
  • Quality control tolerances
🎯 Beta Distribution
f(x) = (x^(α-1) * (1-x)^(β-1)) / B(α,β)
Applications:
  • Bayesian statistics (conjugate prior for Bernoulli)
  • Proportions and rates between 0-1
  • Project completion percentages
  • Acceptance probabilities in quality control
🔥 Gamma Distribution
f(x) = (x^(k-1) * e^(-x/θ)) / (θ^k * Γ(k))
Applications:
  • Waiting times between Poisson events
  • Reliability engineering (component lifetimes)
  • Insurance claims amounts
  • Rainfall amounts, tornado intensities
⚡ Weibull Distribution
f(x) = (k/λ) * (x/λ)^(k-1) * exp(-(x/λ)^k)
Applications:
  • Failure analysis and reliability engineering
  • Wind speed modeling
  • Material fatigue life analysis
  • Product lifetime testing
💎 Pareto Distribution (Power Law)
f(x) = α x₀^α / x^(α+1) for x ≥ x₀
Applications:
  • Income inequality analysis (80/20 rule)
  • City population sizes
  • File size distributions in networks
  • Insurance claims with high deductibles
📉 Log-Normal Distribution
f(x) = (1/(xσ√(2π))) * exp[-(ln(x)-μ)²/(2σ²)]
Applications:
  • Stock prices in Black-Scholes model
  • Particle sizes in materials science
  • Concentrations in environmental studies
  • Survival times with multiplicative effects
⚖️ Student-t Distribution
f(x) = Γ((ν+1)/2) / (√(νπ) * Γ(ν/2)) * (1 + x²/ν)^(-(ν+1)/2)
Applications:
  • Statistical inference with small samples (t-tests)
  • Confidence intervals when population variance unknown
  • Regression analysis residuals
  • Robust statistical methods
⏹️ Uniform Distribution (Continuous)
f(x) = 1/(b-a) for a ≤ x ≤ b
Applications:
  • Random number generation
  • Round-off errors in measurements
  • Physical measurements with limited precision
  • Bayesian prior for lack of information
📈 Logistic Distribution
f(x) = (1/s) * exp(-(x-μ)/s) / (1 + exp(-(x-μ)/s))²
Applications:
  • Growth models in biology and economics
  • Neural network activation functions
  • Logit (logistic) regression models
  • Demographic population modeling
⬩ Laplace Distribution (Double Exponential)
f(x) = (1/(2b)) * exp(-|x-μ|/b)
Applications:
  • Modeling absolute deviations/errors
  • Robust statistics alternatives to normal
  • Signal processing with heavy tails
  • Financial modeling of extreme events
✅ Chi-Squared Distribution
f(x) = (1/(2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * exp(-x/2)
Applications:
  • Chi-squared tests for independence
  • Goodness-of-fit tests
  • Variance estimation and testing
  • Quality control and process capability
📊 F-Distribution (Snedecor F)
f(x) = [Γ((d₁+d₂)/2) / (Γ(d₁/2) * Γ(d₂/2))] * (d₁/d₂)^(d₁/2) * x^(d₁/2-1) * (1 + (d₁/d₂)x)^(-(d₁+d₂)/2)
Applications:
  • ANOVA (Analysis of Variance)
  • F-tests for equal variances
  • Regression analysis significance tests
  • Generalized linear models
🔗 Bivariate Normal Distribution
f(x,y) = (1/(2πσ₁σ₂√(1-ρ²))) * exp{-[1/(2(1-ρ²))] * [(x-μ₁)²/σ₁² - 2ρ(x-μ₁)(y-μ₂)/(σ₁σ₂) + (y-μ₂)²/σ₂²]}
Applications:
  • Multivariate analysis of correlated variables
  • Asset returns in portfolio optimization
  • Biological measurements (height-weight correlations)
  • Economic indicators analysis
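
One way to sanity-check the density formulas in this section is to code them directly and confirm that each integrates to 1. The sketch below does this for three of them (Normal, Gamma with shape k and scale θ, Weibull with shape k and scale λ) using a simple midpoint rule. The function names, integration limits, and step count are illustrative assumptions, not part of the tool.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def gamma_pdf(x, k, theta):
    # Shape k, scale theta; defined for x > 0.
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))

def weibull_pdf(x, k, lam):
    # Shape k, scale lam; defined for x > 0.
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def integrate(f, a, b, steps=20000):
    """Midpoint rule; never evaluates f at the endpoints a or b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Each area should be very close to 1 over a wide enough interval.
area_normal = integrate(lambda x: normal_pdf(x, 1.0, 2.0), -19.0, 21.0)
area_gamma = integrate(lambda x: gamma_pdf(x, 2.0, 1.5), 0.0, 60.0)
area_weibull = integrate(lambda x: weibull_pdf(x, 1.5, 2.0), 0.0, 40.0)
# The Gamma mean should be close to k * theta = 3.
mean_gamma = integrate(lambda x: x * gamma_pdf(x, 2.0, 1.5), 0.0, 60.0)
```

The midpoint rule is used because it avoids evaluating the density exactly at x = 0, where some of these formulas are undefined or infinite for shape parameters below 1.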
DISCRETE DISTRIBUTIONS
🎲 Binomial Distribution
P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Applications:
  • Success/failure experiments with fixed trials
  • Quality control (defective items in sample)
  • Election outcomes in voting polls
  • Clinical trials response counts
🎯 Poisson Distribution
P(X = k) = e^(-λ) * λ^k / k!
Applications:
  • Events occurring in fixed time/space intervals
  • Customer arrivals at service centers
  • Radioactive decay events counting
  • Traffic accidents at intersections
📐 Geometric Distribution
P(X = k) = (1-p)^(k-1) * p
Applications:
  • Waiting time until first success
  • Reliability (trials until first failure)
  • Quality control (items until first defect)
  • Sales calls until first sale
🔄 Hypergeometric Distribution
P(X = k) = [C(K,k) × C(N-K,n-k)] / C(N,n)
Applications:
  • Sampling without replacement from finite populations
  • Quality control lot sampling
  • Election auditing procedures
  • Card game probability calculations
📊 Negative Binomial Distribution
P(X = k) = C(k-1,r-1) × p^r × (1-p)^(k-r)
Applications:
  • Number of trials needed for r successes
  • Quality control (trials until r defects)
  • Sales (attempts until r sales made)
  • Epidemiology (contacts until r infections)
🔗 Multivariate Hypergeometric Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [∏ C(Kᵢ,kᵢ)] / C(N,n)
Applications:
  • Elections with multiple candidate categories
  • Contingency table analysis
  • Multivariate sampling from categorized populations
  • Genetics and population studies
🎭 Multinomial Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [n! / (k₁!k₂!...kₘ!)] × p₁^k₁ × p₂^k₂ × ... × pₘ^kₘ
Applications:
  • Election results across multiple parties
  • Consumer choice modeling
  • Marketing response categorization
  • Genetic inheritance patterns
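
The univariate mass functions above translate almost directly into code. The following sketch uses only Python's standard library (`math.comb` requires Python 3.8 or later); the function names and argument order are illustrative choices. Note the conventions carried over from the formulas: the Geometric and Negative Binomial versions count trials, so k starts at 1 and at r respectively.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def geometric_pmf(k, p):
    # k = 1, 2, ... is the trial on which the first success occurs.
    return (1 - p) ** (k - 1) * p

def hypergeometric_pmf(k, N, K, n):
    # N items in total, K of them successes, n drawn without replacement.
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def negative_binomial_pmf(k, r, p):
    # k = r, r + 1, ... is the trial on which the r-th success occurs.
    return math.comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

# Each set of probabilities should sum to (approximately) 1.
total_binomial = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
total_poisson = sum(poisson_pmf(k, 3.0) for k in range(60))
total_hypergeometric = sum(hypergeometric_pmf(k, 20, 7, 5) for k in range(6))
```

Summing each function over its support is a quick check that a formula has been transcribed correctly. For the distributions with infinite support, the sum is truncated where the remaining tail is negligible.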

📋 Key Statistical Concepts

  • Parameters vs Statistics: Parameters describe population distributions, statistics describe samples
  • Maximum Likelihood Estimation (MLE): Method for estimating distribution parameters from data
  • Goodness-of-Fit Tests: Tests such as Kolmogorov-Smirnov assess how well data fit a hypothesized distribution
  • Central Limit Theorem: Means of independent samples approach a normal distribution as sample size grows, for any parent distribution with finite variance
  • Law of Large Numbers: Sample averages converge to the corresponding population means as sample size increases
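
The Central Limit Theorem and the Law of Large Numbers are easy to see in a small simulation. The sketch below draws repeated samples from an Exponential distribution with rate 1 (mean 1, standard deviation 1, skewness 2) and looks at the distribution of the sample means. The function names, the choice of parent distribution, and the trial count are assumptions made for illustration.

```python
import random
import statistics

def sample_means(n, trials=5000, seed=0):
    """Means of `trials` independent samples of size n from Exponential(1)."""
    rng = random.Random(seed)  # local generator, so results are repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

def skewness(values):
    """Population skewness: third standardized moment."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

means_1 = sample_means(1)    # raw exponential draws, strongly right-skewed
means_50 = sample_means(50)  # means of 50 draws, much closer to symmetric

center = statistics.fmean(means_50)   # should be near the true mean, 1
spread = statistics.pstdev(means_50)  # should be near 1 / sqrt(50)
```

As n increases, the center of the sample means stays near the population mean while their spread shrinks roughly like 1/√n, and the skewness of their distribution moves toward 0, the value for a normal distribution.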

🔍 Choosing the Right Distribution

  • Data Type: Continuous/interval data → continuous distributions, discrete/count data → discrete distributions
  • Data Range: Bounded (0-1) data → Beta, positive only → Gamma/Exponential, unbounded → Normal
  • Shape Characteristics: Symmetry (Normal), right-skewness (Gamma, Weibull), heavy tails (t-distribution)
  • Subject Matter Knowledge: Domain expertise often guides choice (e.g., Pareto for wealth distributions)
  • Statistical Tests: Use goodness-of-fit tests to validate distribution assumptions