🎲 Dakota AI Statistical Distribution Explorer

Interactive visualization of 22 probability distributions with real-time analysis and educational insights

🎯 Select Distribution

📊 Data Input Method

📁 File Upload

🎲 Random Data Generation

✏️ Manual Data Input

⚙️ Distribution Parameters

🔄 Loading distribution parameters...

📊 Theoretical Statistics

--
Mean (μ)
--
Variance (σ²)
--
Skewness
--
Kurtosis
--
Median
--
Mode

📈 Sample Data Analysis

--
Data Points
--
Sample Mean
--
Sample Std Dev
--
Skewness
--
Kurtosis
--
Median

πŸ“ Percentiles & Quartiles

--
P10
--
P25 (Q1)
--
P50 (Q2)
--
P75 (Q3)
--
P90
--
P95
--
P99
--
IQR

πŸ” Outlier Detection

--
Z-Score Outliers
--
IQR Outliers
--
Modified Z Outliers
--
Lower Fence
--
Upper Fence
--
MAD

🧪 Hypothesis Testing Suite

Select a hypothesis test to see results

📊 Distribution Fitting

📁 Upload data to see fitted parameters and goodness-of-fit results

📈 Probability Density Function (PDF)

🔄 Data Transformations

Upload data and apply transformations to see results

🎨 Advanced Visualizations

Select an advanced visualization option to see results

📚 Statistical Distribution Theory

🎯 What are Probability Distributions?

A probability distribution describes how the values of a random variable are distributed. It specifies the probabilities of different outcomes or ranges of outcomes.

📊 Continuous vs Discrete Distributions

  • Continuous Distributions: The random variable can take any value within a continuous range (e.g., Normal, Beta, Gamma); probabilities come from a density function (PDF)
  • Discrete Distributions: The random variable takes only distinct, separate values (e.g., Binomial, Poisson, Geometric); probabilities come from a mass function (PMF), as the sketch below shows
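
A minimal sketch of the distinction using SciPy (the library choice is an assumption; the explorer may compute these values differently). Continuous families expose a density via .pdf, discrete families a mass function via .pmf:

```python
# Continuous vs discrete in SciPy: densities integrate to 1, masses sum to 1.
import numpy as np
from scipy import stats

norm = stats.norm(loc=0, scale=1)         # continuous: Normal(0, 1)
pois = stats.poisson(mu=3)                # discrete: Poisson(lambda = 3)

print(norm.pdf(0.5))                      # density at a point (not a probability)
print(norm.cdf(1.0) - norm.cdf(-1.0))     # P(-1 <= X <= 1), about 0.6827

print(pois.pmf(2))                        # P(X = 2), a genuine probability
print(pois.pmf(np.arange(50)).sum())      # masses sum to about 1
```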

🧪 Statistical Tests & Analysis

📏 Normality Tests
  • Shapiro-Wilk Test: Most powerful normality test for small to medium samples (n ≤ 50). Tests the null hypothesis that the data come from a normal distribution.
  • Anderson-Darling Test: A tail-weighted EDF test, more sensitive to the tails of the distribution than the Kolmogorov-Smirnov test.
  • Jarque-Bera Test: Tests normality based on skewness and kurtosis. Good for large samples.
  • D'Agostino-Pearson Test: Omnibus test combining skewness and kurtosis assessments.
  • Lilliefors Test: Modification of the Kolmogorov-Smirnov test for when parameters are estimated from the data (see the sketch after this list).
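
A minimal sketch of how these tests can be run in Python with SciPy and statsmodels (the library choices are assumptions; the explorer's own implementation may differ):

```python
# Five normality tests applied to one sample.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=2, size=200)

print("Shapiro-Wilk:       ", stats.shapiro(x))            # (W, p-value)
print("D'Agostino-Pearson: ", stats.normaltest(x))         # omnibus skewness + kurtosis
print("Jarque-Bera:        ", stats.jarque_bera(x))
print("Lilliefors:         ", lilliefors(x, dist="norm"))  # (D, p-value)

# Anderson-Darling reports critical values rather than a p-value.
ad = stats.anderson(x, dist="norm")
print("Anderson-Darling:   ", ad.statistic, ad.critical_values)
```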
📊 Goodness-of-Fit Tests
  • Chi-Square Test: Tests whether observed frequencies match the frequencies expected under a hypothesized distribution.
  • Kolmogorov-Smirnov Test: Tests the maximum deviation between the empirical and theoretical cumulative distribution functions.
  • Anderson-Darling Test: Weighted version of the KS test, more sensitive to the distribution tails (a chi-square sketch follows this list).
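
As a sketch, a chi-square goodness-of-fit test against a hypothesized N(0, 1) could look like this in SciPy (the binning choices are illustrative assumptions):

```python
# Chi-square GoF: bin the data, then compare observed vs expected counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0, 1, size=500)

k = 10
edges = np.linspace(-4, 4, k + 1)                  # finite edges covering almost all mass
observed, _ = np.histogram(x, bins=edges)

probs = np.diff(stats.norm.cdf(edges))             # expected bin probabilities under N(0, 1)
expected = probs / probs.sum() * observed.sum()    # rescale so the totals match exactly

chi2, p = stats.chisquare(observed, expected)      # ddof=0: no parameters were estimated
print(chi2, p)
```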
🔗 Statistical Distance Measures
  • Hellinger Distance: Symmetric measure of the difference between two distributions, bounded on [0, 1].
  • Jensen-Shannon Divergence: Symmetric, smoothed version of KL divergence; always finite.
  • Wasserstein Distance: The "earth mover's distance", an intuitive measure of how much probability mass must move to turn one distribution into the other.
  • Kullback-Leibler Divergence: Measures the information lost when one distribution is used to approximate another; asymmetric (all four measures are computed in the sketch below).
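
A sketch of all four measures on two small discrete distributions, using SciPy where it provides a ready-made function (note that scipy.spatial.distance.jensenshannon returns the square root of the JS divergence):

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

p = np.array([0.1, 0.4, 0.5])      # two probability vectors on the same support
q = np.array([0.2, 0.3, 0.5])

hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
js_div = jensenshannon(p, q) ** 2  # square the returned distance to get the divergence
kl = stats.entropy(p, q)           # KL(p || q); asymmetric, infinite if q = 0 where p > 0
wass = stats.wasserstein_distance([0, 1, 2], [0, 1, 2], p, q)  # support points + weights

print(hellinger, js_div, kl, wass)
```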
🧪 Statistical Tests Implemented
🎯 Normality Tests
  • Shapiro-Wilk Test: W = (Σ aᵢ x₍ᵢ₎)² / Σ (xᵢ − x̄)², where x₍ᵢ₎ are the order statistics and x̄ is the sample mean. Most powerful normality test for n ≤ 50; used when sample sizes are small to medium.
  • Anderson-Darling Test: A² = −n − Σ ((2i − 1)/n) [ln F₀(x₍ᵢ₎) + ln(1 − F₀(x₍ₙ₊₁₋ᵢ₎))]. More sensitive to distribution tails than the KS test; good for detecting departures from normality in extreme values.
  • Jarque-Bera Test: JB = (n/6) (S² + (K − 3)²/4), where S is skewness and K is kurtosis. Tests the third and fourth moments; good for large samples. Asymptotically chi-square with 2 df (computed from scratch in the sketch below).
  • D'Agostino-Pearson Omnibus Test: Combines the skewness and kurtosis tests into a single omnibus normality test, based on transformed skewness and kurtosis statistics. Valid for n ≥ 8.
  • Lilliefors Test: Modified Kolmogorov-Smirnov test for when parameters are unknown; uses the mean and variance estimated from the sample rather than theoretical population parameters.
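
The Jarque-Bera statistic is simple enough to compute directly from the formula and check against SciPy; a sketch:

```python
# JB = (n/6) * (S^2 + (K - 3)^2 / 4), verified against scipy.stats.jarque_bera.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=1000)

n = len(x)
S = stats.skew(x)                      # population (biased) skewness, as JB uses
K = stats.kurtosis(x, fisher=False)    # Pearson kurtosis (normal -> 3)
jb_manual = n / 6 * (S**2 + (K - 3) ** 2 / 4)

jb_scipy, p = stats.jarque_bera(x)
print(jb_manual, jb_scipy, p)          # the two statistics should agree
```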
πŸ“ Goodness-of-Fit Tests
  • Kolmogorov-Smirnov Test: D = sup|Fβ‚™(x) - Fβ‚€(x)| where Fβ‚™ is empirical CDF and Fβ‚€ is theoretical CDF. Tests maximum deviation between distributions. P-values computed using correction for estimated parameters.
  • Anderson-Darling Test (GoF): AΒ² = -Ξ£(2i-1)/n * [ln(Fβ‚€(xα΅’)) + ln(1-Fβ‚€(x_{(n+1-i)}))]. Weights deviations more heavily in distribution tails.
  • Chi-Square Goodness-of-Fit: χ² = Ξ£(Oα΅’ - Eα΅’)Β²/Eα΅’ where Oα΅’ are observed and Eα΅’ are expected frequencies. Requires binned data. Asymptotically chi-square with k-1 df.
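
A sketch computing the KS statistic D directly from the order statistics and checking it against scipy.stats.kstest:

```python
# D = sup |Fn(x) - F0(x)|: the empirical CDF steps by 1/n at each sorted point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=100))
n = len(x)

F0 = stats.norm.cdf(x)                           # theoretical CDF at the sorted data
d_plus = np.max(np.arange(1, n + 1) / n - F0)    # Fn is i/n just after x_(i)
d_minus = np.max(F0 - np.arange(0, n) / n)       # Fn is (i-1)/n just before x_(i)
D = max(d_plus, d_minus)

print(D, stats.kstest(x, "norm").statistic)      # the two should agree
```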
📊 Test Result Interpretation

Hypothesis Testing Framework:

  • H₀ (Null): Data follow the specified distribution
  • H₁ (Alternative): Data do not follow the specified distribution
  • α (Significance Level): Usually 0.05
  • If p-value ≤ α: Reject H₀ (evidence of departure from the specified distribution)
  • If p-value > α: Fail to reject H₀ (no strong evidence against the specified distribution)

Normality Assessment Categories:

  • Normal: p-value > 0.10, strongly consistent with normality
  • Borderline: p-value between 0.01 and 0.10, marginal evidence against normality
  • Non-Normal: p-value < 0.01, strong evidence the data are not normal (see the helper sketched below)
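
A hypothetical helper (the function name is illustrative, not taken from the app) encoding the decision rule and the three assessment categories:

```python
def interpret_normality(p_value: float, alpha: float = 0.05) -> str:
    """Map a normality-test p-value to a decision and an assessment category."""
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    if p_value > 0.10:
        category = "Normal"
    elif p_value >= 0.01:
        category = "Borderline"
    else:
        category = "Non-Normal"
    return f"{decision}; assessment: {category}"

print(interpret_normality(0.23))   # fail to reject H0; assessment: Normal
print(interpret_normality(0.03))   # reject H0; assessment: Borderline
```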

📈 Distribution Families & Applications

CONTINUOUS DISTRIBUTIONS
🔔 Normal Distribution (Gaussian)
f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²))
Applications:
  • Measurement errors in physical experiments
  • Height, weight, IQ scores in populations
  • Stock returns under normal assumptions
  • Quality control tolerances
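
A sketch evaluating this density by hand and via scipy.stats.norm (the parameter values are arbitrary); the same loc/scale pattern applies to every continuous family listed below:

```python
import numpy as np
from scipy import stats

mu, sigma = 10.0, 2.0
x = 12.0

pdf_manual = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
pdf_scipy = stats.norm.pdf(x, loc=mu, scale=sigma)
print(pdf_manual, pdf_scipy)   # identical

# e.g. stats.gamma(a=k, scale=theta) or stats.weibull_min(c=k, scale=lam)
# follow the same pattern for the Gamma and Weibull densities below.
```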
🎯 Beta Distribution
f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β)
Applications:
  • Bayesian statistics (conjugate prior for the Bernoulli)
  • Proportions and rates between 0 and 1
  • Project completion percentages
  • Acceptance probabilities in quality control
🔥 Gamma Distribution
f(x) = (x^(k−1) e^(−x/θ)) / (θ^k Γ(k))
Applications:
  • Waiting times between Poisson events
  • Reliability engineering (component lifetimes)
  • Insurance claims amounts
  • Rainfall amounts, tornado intensities
⚡ Weibull Distribution
f(x) = (k/λ) (x/λ)^(k−1) exp(−(x/λ)^k)
Applications:
  • Failure analysis and reliability engineering
  • Wind speed modeling
  • Material fatigue life analysis
  • Product lifetime testing
💎 Pareto Distribution (Power Law)
f(x) = α x₀^α / x^(α+1) for x ≥ x₀
Applications:
  • Income inequality analysis (80/20 rule)
  • City population sizes
  • File size distributions in networks
  • Insurance claims with high deductibles
📉 Log-Normal Distribution
f(x) = (1/(xσ√(2π))) · exp(−(ln x − μ)²/(2σ²))
Applications:
  • Stock prices in Black-Scholes model
  • Particle sizes in materials science
  • Concentrations in environmental studies
  • Survival times with multiplicative effects
βš–οΈ Student-t Distribution
f(x) = Ξ“((Ξ½+1)/2) / (√(Ξ½Ο€) * Ξ“(Ξ½/2)) * (1 + xΒ²/Ξ½)^(-(Ξ½+1)/2)
Applications:
  • Statistical inference with small samples (t-tests)
  • Confidence intervals when population variance unknown
  • Regression analysis residuals
  • Robust statistical methods
⏹️ Uniform Distribution (Continuous)
f(x) = 1/(b − a) for a ≤ x ≤ b
Applications:
  • Random number generation
  • Round-off errors in measurements
  • Physical measurements with limited precision
  • Bayesian prior for lack of information
📈 Logistic Distribution
f(x) = (1/s) · e^(−(x−μ)/s) / (1 + e^(−(x−μ)/s))²
Applications:
  • Growth models in biology and economics
  • Neural network activation functions (the logistic sigmoid)
  • Logistic (logit) regression models
  • Demographic population modeling
⬩ Laplace Distribution (Double Exponential)
f(x) = (1/(2b)) · exp(−|x − μ|/b)
Applications:
  • Modeling absolute deviations/errors
  • Robust statistics alternatives to normal
  • Signal processing with heavy tails
  • Financial modeling of extreme events
✅ Chi-Squared Distribution
f(x) = (1/(2^(k/2) Γ(k/2))) · x^(k/2 − 1) · e^(−x/2)
Applications:
  • Chi-squared tests for independence
  • Goodness-of-fit tests
  • Variance estimation and testing
  • Quality control and process capability
📊 F-Distribution (Snedecor's F)
f(x) = [Γ((d₁ + d₂)/2) / (Γ(d₁/2) Γ(d₂/2))] · (d₁/d₂)^(d₁/2) · x^(d₁/2 − 1) · (1 + (d₁/d₂) x)^(−(d₁+d₂)/2)
Applications:
  • ANOVA (Analysis of Variance)
  • F-tests for equal variances
  • Regression analysis significance tests
  • Generalized linear models
🔗 Bivariate Normal Distribution
f(x, y) = (1/(2πσ₁σ₂√(1 − ρ²))) · exp{ −(1/(2(1 − ρ²))) · [(x − μ₁)²/σ₁² − 2ρ(x − μ₁)(y − μ₂)/(σ₁σ₂) + (y − μ₂)²/σ₂²] }
Applications:
  • Multivariate analysis of correlated variables
  • Asset returns in portfolio optimization
  • Biological measurements (height-weight correlations)
  • Economic indicators analysis
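
A sketch of this density via scipy.stats.multivariate_normal, building the covariance matrix from σ₁, σ₂, and ρ (the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

mu1, mu2 = 0.0, 0.0
s1, s2, rho = 1.0, 2.0, 0.6
cov = np.array([[s1**2,         rho * s1 * s2],
                [rho * s1 * s2, s2**2]])       # covariance from the sigmas and rho

bvn = stats.multivariate_normal(mean=[mu1, mu2], cov=cov)
print(bvn.pdf([0.5, -1.0]))                    # density at one (x, y) point

samples = bvn.rvs(size=1000, random_state=0)
print(np.corrcoef(samples.T)[0, 1])            # sample correlation, close to rho
```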
DISCRETE DISTRIBUTIONS
🎲 Binomial Distribution
P(X = k) = C(n, k) × p^k × (1 − p)^(n−k)
Applications:
  • Success/failure experiments with fixed trials
  • Quality control (defective items in sample)
  • Election outcomes in voting polls
  • Clinical trials response counts
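
A sketch checking the pmf formula against scipy.stats.binom, with illustrative parameters:

```python
from math import comb
from scipy import stats

n, p, k = 20, 0.3, 7
pmf_manual = comb(n, k) * p**k * (1 - p) ** (n - k)
print(pmf_manual, stats.binom.pmf(k, n, p))   # identical

# Tail probabilities, e.g. for defect counts in a sample of 20:
print(stats.binom.cdf(7, n, p))   # P(X <= 7)
print(stats.binom.sf(9, n, p))    # P(X >= 10)
```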
🎯 Poisson Distribution
P(X = k) = e^(−λ) λ^k / k!
Applications:
  • Events occurring in fixed time/space intervals
  • Customer arrivals at service centers
  • Radioactive decay events counting
  • Traffic accidents at intersections
πŸ“ Geometric Distribution
P(X = k) = (1-p)^(k-1) * p
Applications:
  • Waiting time until first success
  • Reliability (trials until first failure)
  • Quality control (items until first defect)
  • Sales calls until first sale
🔄 Hypergeometric Distribution
P(X = k) = [C(K, k) × C(N−K, n−k)] / C(N, n)
Applications:
  • Sampling without replacement from finite populations
  • Quality control lot sampling
  • Election auditing procedures
  • Card game probability calculations
📊 Negative Binomial Distribution
P(X = k) = C(k−1, r−1) × p^r × (1 − p)^(k−r)
Applications:
  • Number of trials needed for r successes
  • Quality control (trials until r defects)
  • Sales (attempts until r sales made)
  • Epidemiology (contacts until r infections)
🔗 Multivariate Hypergeometric Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [∏ C(Kᵢ, kᵢ)] / C(N, n)
Applications:
  • Elections with multiple candidate categories
  • Contingency table analysis
  • Multivariate sampling from categorized populations
  • Genetics and population studies
🎭 Multinomial Distribution
P(X₁ = k₁, X₂ = k₂, ..., Xₘ = kₘ) = [n! / (k₁! k₂! ... kₘ!)] × p₁^k₁ × p₂^k₂ × ... × pₘ^kₘ
Applications:
  • Election results across multiple parties
  • Consumer choice modeling
  • Marketing response categorization
  • Genetic inheritance patterns
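
A sketch evaluating this pmf with scipy.stats.multinomial for one illustrative outcome vector:

```python
from scipy import stats

n = 10
p = [0.5, 0.3, 0.2]    # category probabilities, must sum to 1
counts = [5, 3, 2]     # k1 + k2 + k3 must equal n

# 10!/(5! 3! 2!) * 0.5^5 * 0.3^3 * 0.2^2
print(stats.multinomial.pmf(counts, n=n, p=p))
```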

📋 Key Statistical Concepts

  • Parameters vs Statistics: Parameters describe population distributions; statistics describe samples
  • Maximum Likelihood Estimation (MLE): Method for estimating distribution parameters from data (see the sketch after this list)
  • Goodness-of-Fit Tests: Tests such as Kolmogorov-Smirnov assess how well data fit a distribution
  • Central Limit Theorem: Means of independent samples with finite variance approach a normal distribution regardless of the parent distribution
  • Law of Large Numbers: Sample statistics converge to population parameters as the sample size increases
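
As a sketch of MLE in practice, SciPy's .fit maximizes the likelihood numerically; the Gamma example and parameter values here are assumptions for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.gamma(shape=2.0, scale=3.0, size=2000)

# floc=0 pins the location parameter, so only shape and scale are estimated.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)
print(shape_hat, scale_hat)    # should land near the true (2.0, 3.0)
```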

πŸ” Choosing the Right Distribution

  • Data Type: Continuous/interval data β†’ continuous distributions, discrete/count data β†’ discrete distributions
  • Data Range: Bounded (0-1) data β†’ Beta, positive only β†’ Gamma/Exponential, unlimited β†’ Normal
  • Shape Characteristics: Symmetry (Normal), right-skewness (Gamma, Weibull), heavy tails (t-distribution)
  • Subject Matter Knowledge: Domain expertise often guides choice (e.g., Pareto for wealth distributions)
  • Statistical Tests: Use goodness-of-fit tests to validate distribution assumptions
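
Putting the last point into practice, a sketch that fits several candidate families and ranks them with a KS test (an illustrative workflow, not the explorer's exact method; note that KS p-values are optimistic when parameters are estimated from the same data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.gamma(shape=2.0, scale=3.0, size=500)   # positive, right-skewed sample

for dist in (stats.gamma, stats.lognorm, stats.norm):
    params = dist.fit(data)                        # MLE for each candidate family
    D, p = stats.kstest(data, dist.name, args=params)
    print(f"{dist.name:8s} D={D:.4f} p={p:.3f}")
```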