The lottery has long captivated the imagination of the public — a tantalizing mix of chance, dreams, and data. Behind every draw lies a rich history of numbers, waiting to be explored. While winning may be left to fate, analyzing the patterns, distributions, and correlations in lottery data reveals something just as intriguing: the structure of randomness itself.
This Quarto document briefly explores decades of Powerball data, using everything from frequency polygons and boxplots to 3D surface plots and correlation heatmaps.
Whether you’re a curious lottery fan, a statistician interested in probability, or a data viz nerd looking to see Plotly in action, this analysis offers a visually rich, data-driven perspective on the numbers that fuel billion-dollar dreams.
Summary Statistics
```r
library(tidyverse)
library(knitr)

# Compute min, max, mean and sd for every numeric column
pb_summary <- pb_df %>%
  select(-DrawDate) %>%
  summarise(across(everything(), list(
    min  = ~min(.x, na.rm = TRUE),
    max  = ~max(.x, na.rm = TRUE),
    mean = ~mean(.x, na.rm = TRUE),
    sd   = ~sd(.x, na.rm = TRUE)
  ))) %>%
  pivot_longer(everything(), names_to = c("variable", ".value"), names_sep = "_") %>%
  arrange(variable)

# Calculate the mode (most frequent value) for each variable;
# map_int() keeps the column names, so `modes` is a named vector
modes <- pb_df %>%
  select(-DrawDate) %>%
  map_int(~as.integer(names(sort(table(.x), decreasing = TRUE)[1])))

# Add mode to the summary, matching by name: arrange() sorted the rows
# alphabetically, so they no longer follow the column order of pb_df
pb_summary$mode <- unname(modes[pb_summary$variable])

# Reorder columns
pb_summary <- pb_summary %>%
  select(variable, min, max, mean, mode, sd)

# Display as a clean HTML table
kable(pb_summary, caption = "Summary Statistics of Powerball Numbers")
```
Summary Statistics of Powerball Numbers

| variable | min | max | mean | mode | sd |
|---|---|---|---|---|---|
| PB | 1 | 45 | 18.459519 | 1 | 11.238591 |
| PP | 2 | 10 | 2.786220 | 15 | 1.173435 |
| WB1 | 1 | 52 | 9.930367 | 27 | 8.123502 |
| WB2 | 2 | 61 | 19.701851 | 39 | 10.555695 |
| WB3 | 3 | 65 | 29.476927 | 45 | 11.651599 |
| WB4 | 6 | 68 | 39.326610 | 20 | 11.911665 |
| WB5 | 10 | 69 | 48.883670 | 2 | 11.014188 |
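The means climb steadily from WB1 to WB5 because the white balls are reported in ascending order. As a result, WBk is the k-th smallest of five distinct numbers. For n numbers drawn without replacement from 1 to N, the expected value of the k-th smallest is k(N + 1)/(n + 1). The sketch below checks that formula by simulation. It assumes the current 5-of-69 white-ball matrix, and `expected_order_stat()` is a hypothetical helper. Because `pb_df` may also contain draws made with smaller ball pools, the observed means in the table need not match these values exactly.

```r
# Hypothetical helper: expected value of the k-th smallest of n distinct
# numbers drawn uniformly without replacement from 1..N
expected_order_stat <- function(k, n, N) {
  k * (N + 1) / (n + 1)
}

# Assumption: the current 5-of-69 white-ball matrix
n_balls <- 5
pool <- 69
exact <- expected_order_stat(1:n_balls, n = n_balls, N = pool)

# Monte Carlo check: draw five balls, sort ascending, average each position
set.seed(42)
sims <- t(replicate(20000, sort(sample.int(pool, n_balls))))
simulated <- colMeans(sims)

round(rbind(exact, simulated), 2)
```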
Line Plot of Drawn Numbers Over Time
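```r
# Plot each white-ball position over time (DrawDate is expected to be a Date column)
plot(pb_df$DrawDate, pb_df$WB1, type = "l", col = "red", ylim = c(0, 70),
     xlab = "Date", ylab = "Value", main = "White Ball Numbers Over Time")
lines(pb_df$DrawDate, pb_df$WB2, col = "blue")
lines(pb_df$DrawDate, pb_df$WB3, col = "green")
lines(pb_df$DrawDate, pb_df$WB4, col = "purple")
lines(pb_df$DrawDate, pb_df$WB5, col = "orange")
# Legend mapping each color to its ball position
legend("topleft", legend = paste0("WB", 1:5),
       col = c("red", "blue", "green", "purple", "orange"),
       lty = 1, cex = 0.8, bg = "white")
```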
Box Plot of Powerball Components
```r
# Create a boxplot for each numeric column; boxplot() labels the boxes
# with the column names, so no positional `names` argument is needed
pb_df %>%
  select(-DrawDate) %>%
  boxplot(
    main = "Distribution of Powerball Components",
    ylab = "Values",
    col = "lightblue",
    border = "darkblue"
  )
```
Histograms and Frequency Polygons
Distribution of the first (lowest) white ball, WB1.
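```r
# Histogram and frequency polygon for WB1, one bin per ball number
# (linewidth replaces the `size` aesthetic deprecated in ggplot2 3.4.0)
ggplot(pb_df, aes(WB1)) +
  geom_histogram(fill = "skyblue", binwidth = 1) +
  geom_freqpoly(color = "red", linewidth = 1, binwidth = 1)
```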
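The ascending order also shapes this histogram. WB1 is the smallest of the five white balls, so it is not uniform even for a perfectly fair draw. For n numbers drawn without replacement from 1 to N, the probability that the smallest equals k is C(N - k, n - 1) / C(N, n). This probability falls steadily as k grows. Here is a minimal sketch of that distribution. It assumes the current 5-of-69 matrix, and `wb1_pmf()` is a hypothetical helper.

```r
# Hypothetical helper: exact distribution of the smallest of n distinct
# numbers drawn uniformly without replacement from 1..N
wb1_pmf <- function(N = 69, n = 5) {
  k <- 1:(N - n + 1)  # the smallest ball can be at most N - n + 1
  # The other n - 1 balls must all come from the N - k numbers above k
  data.frame(k = k, p = choose(N - k, n - 1) / choose(N, n))
}

pmf <- wb1_pmf()
head(pmf)    # probabilities decrease as k increases
sum(pmf$p)   # a valid distribution sums to 1
```

Multiplying `pmf$p` by the number of draws gives expected counts that could be overlaid on the histogram. This comparison is only meaningful for draws made under the same single-matrix assumption.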
Distributions of all five white-ball positions
```r
# Frequency polygons for WB1 to WB5: reshape to long format so each
# position gets its own colored line and a legend
pb_df %>%
  select(WB1:WB5) %>%
  pivot_longer(everything(), names_to = "ball", values_to = "number") %>%
  ggplot(aes(number, color = ball)) +
  geom_freqpoly(binwidth = 1, linewidth = 1) +
  scale_color_manual(values = c(WB1 = "red", WB2 = "blue", WB3 = "green",
                                WB4 = "orange", WB5 = "brown"))
```
Heatmap Interpretation
Read the correlation heatmap of white-ball positions since 2014 with the draw format in mind. Powerball reports the five white balls in ascending order, so WB1 is always the smallest number and WB5 the largest. Sorting alone makes the positions positively correlated, and most strongly for adjacent pairs such as WB2 and WB3, even when every draw is perfectly random. These correlations come from the enforced ordering rather than from any meaningful predictive relationship. The heatmap therefore offers no basis for predicting future draws from past results.
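```r
# Correlate the white-ball positions for draws on or after 2014-01-22
forcor <- pb_df %>%
  filter(DrawDate >= as.Date("2014-01-22")) %>%
  select(WB1:WB5)

# Rowv/Colv = NA keep WB1..WB5 in their natural order instead of
# letting heatmap() reorder them by a dendrogram
heatmap(cor(forcor),
        Rowv = NA, Colv = NA,
        col = colorRampPalette(c("yellow", "orange", "red"))(100),
        scale = "none", symm = TRUE,
        main = "Correlation Heatmap of White Balls")
```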
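The sketch below checks that sorting alone produces this pattern. It simulates fair draws, assuming five distinct white balls from 1 to 69, and compares correlations before and after sorting each draw into ascending order. Unsorted positions should come out essentially uncorrelated. Sorted positions should all be positively correlated, most strongly for neighbors such as WB2 and WB3. The object names are illustrative and do not come from the original analysis.

```r
set.seed(2014)
n_draws <- 5000

# Each row is one fair draw: five distinct numbers from 1..69 (assumed matrix)
raw_draws <- t(replicate(n_draws, sample.int(69, 5)))
colnames(raw_draws) <- paste0("WB", 1:5)

# The same draws reported in ascending order, as Powerball does
sorted_draws <- t(apply(raw_draws, 1, sort))
colnames(sorted_draws) <- paste0("WB", 1:5)

raw_cor <- cor(raw_draws)
sorted_cor <- cor(sorted_draws)

round(raw_cor, 2)     # off-diagonal values close to 0
round(sorted_cor, 2)  # all positive, largest next to the diagonal
```

The sorted correlations are high even though each simulated draw is independent of every other draw. A strong WB2-WB3 correlation in the real data therefore says nothing about the next draw.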
An Interactive 3D Perspective of Winning Powerball Numbers
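```r
library(plotly)

# Prepare data for the 3D surface plot: one row per draw, one column per variable
z_matrix <- pb_df %>%
  select(-DrawDate) %>%
  as.matrix()
x_values <- as.Date(pb_df$DrawDate)
y_values <- seq_len(ncol(z_matrix))

# Generate interactive 3D surface plot; a plotly surface expects z with
# one row per y value and one column per x value, hence the transpose
plot_ly(x = x_values, y = y_values, z = t(z_matrix), type = "surface") %>%
  layout(
    scene = list(
      xaxis = list(title = "Date", tickformat = "%Y-%m-%d"),
      # Label the y ticks with the variable names instead of 1..7
      yaxis = list(title = "Variable", tickmode = "array",
                   tickvals = y_values, ticktext = colnames(z_matrix)),
      zaxis = list(title = "Value", range = c(0, 70))
    )
  )
```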