EpidigiR is an R package for epidemiological analysis, modeling, and visualization, designed to offer broad functionality with minimal dependencies. Its three main functions, epi_analyze, epi_model, and epi_visualize, cover 12 epidemiological topics, including digital epidemiology, which uses real-time data integration and computational techniques to improve disease tracking and prediction.
The package includes ten datasets to support these analyses: epi_prevalence, sir_data, geno_data, ml_data, nlp_data, clinical_data, daly_data, survey_data, diagnostic_data, and survival_data.
This vignette demonstrates how to use these functions and datasets for various epidemiological tasks.
data(epi_prevalence)
result <- epi_analyze(
epi_prevalence,
outcome = "cases",
population = "population",
group = "region",
type = "summary"
)
print(result)
##   group mean_outcome population prevalence incidence_rate
## 1  East     140.0000   34000.00  0.4243056       4.117647
## 2 North     133.3333   30000.00  0.4666667       4.444444
## 3 South     120.0000   29333.33  0.4345238       4.090909
## 4  West     100.0000   24333.33  0.4333333       4.109589
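The incidence_rate column in the output above is consistent with mean cases divided by mean population, scaled per 1,000 (for East, 140 / 34000 * 1000 = 4.117647). A minimal base-R sketch of that kind of grouped summary follows. It uses a small made-up data frame rather than epi_prevalence and assumes nothing about how epi_analyze is implemented internally.

```r
# Toy data, invented for illustration; not the epi_prevalence dataset.
toy <- data.frame(
  region     = c("East", "East", "North", "North"),
  cases      = c(120, 160, 100, 140),
  population = c(30000, 38000, 25000, 35000)
)

# Mean cases and mean population per region.
grouped <- aggregate(cbind(cases, population) ~ region, data = toy, FUN = mean)

# Rate per 1,000 population, computed from the group means.
grouped$rate_per_1000 <- grouped$cases / grouped$population * 1000
print(grouped)
```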
sir_result <- epi_analyze(
data = NULL, outcome = NULL, type = "sir",
N = 1000, beta = 0.3, gamma = 0.1, days = 50
)
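For readers who want to see what an SIR simulation with these parameters involves, here is a minimal discrete-time sketch in base R. The daily Euler update, the single initial infection, and the function name simulate_sir are assumptions made for illustration; epi_analyze may use a different solver or initial conditions, so its numbers need not match.

```r
# Discrete-time (daily Euler step) SIR model.
# Hypothetical helper for illustration; not EpidigiR code.
simulate_sir <- function(N = 1000, beta = 0.3, gamma = 0.1, days = 50, I0 = 1) {
  S <- numeric(days + 1)
  I <- numeric(days + 1)
  R <- numeric(days + 1)
  S[1] <- N - I0
  I[1] <- I0
  for (t in seq_len(days)) {
    new_infections <- beta * S[t] * I[t] / N  # frequency-dependent transmission
    new_recoveries <- gamma * I[t]
    S[t + 1] <- S[t] - new_infections
    I[t + 1] <- I[t] + new_infections - new_recoveries
    R[t + 1] <- R[t] + new_recoveries
  }
  data.frame(time = 0:days, Susceptible = S, Infected = I, Recovered = R)
}

sir_sketch <- simulate_sir()
# The three compartments sum to N at every time step.
range(rowSums(sir_sketch[, c("Susceptible", "Infected", "Recovered")]))
```

With beta = 0.3 and gamma = 0.1 the basic reproduction number is beta / gamma = 3, so the infected compartment grows at first.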
epi_visualize(sir_result, x = "time", y = "Infected", type = "curve", main = "Epidemic Curve")
library(sp)  # coordinates<- comes from the sp package
data(epi_prevalence)
coordinates(epi_prevalence) <- ~lon + lat
epi_visualize(epi_prevalence, x = "prevalence", type = "map", main = "Prevalence Map")
data(clinical_data)
clinical_data$outcome <- as.factor(clinical_data$outcome)
model <- epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "logistic")
head(model$predictions)
##   lambda.min
## 1       0.41
## 2       0.41
## 3       0.41
## 4       0.41
## 5       0.41
## 6       0.41
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
## NULL
## group daly
## 1 group_1 809.1125
## 2 group_2 1171.2362
## 3 group_3 806.3073
## 4 group_4 1392.1371
## 5 group_5 1291.4882
## 6 group_6 509.8396
## 7 group_7 870.1247
## 8 group_8 1220.5410
## 9 group_9 776.4134
## 10 group_10 627.1544
## 11 group_11 1444.5109
## 12 group_12 964.0353
## 13 group_13 1070.6309
## 14 group_14 1023.3304
## 15 group_15 253.7084
## 16 group_16 1174.8507
## 17 group_17 712.7858
## 18 group_18 285.2372
## 19 group_19 588.3101
## 20 group_20 1113.2849
## statistic p_value
## X-squared 1.769353 0.4128477
## standardized_rate
## 1 33.45531
data(ml_data)
epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "logistic")
## $coefficients
## 4 x 1 sparse Matrix of class "dgCMatrix"
## lambda.min
## (Intercept) -0.4054651
## age .
## exposure .
## genetic_risk .
##
## $predictions
## lambda.min
## 1 0.4
## 2 0.4
## 3 0.4
## 4 0.4
## 5 0.4
## 6 0.4
## 7 0.4
## 8 0.4
## 9 0.4
## 10 0.4
## 11 0.4
## 12 0.4
## 13 0.4
## 14 0.4
## 15 0.4
## 16 0.4
## 17 0.4
## 18 0.4
## 19 0.4
## 20 0.4
## 21 0.4
## 22 0.4
## 23 0.4
## 24 0.4
## 25 0.4
## 26 0.4
## 27 0.4
## 28 0.4
## 29 0.4
## 30 0.4
## 31 0.4
## 32 0.4
## 33 0.4
## 34 0.4
## 35 0.4
## 36 0.4
## 37 0.4
## 38 0.4
## 39 0.4
## 40 0.4
## 41 0.4
## 42 0.4
## 43 0.4
## 44 0.4
## 45 0.4
## 46 0.4
## 47 0.4
## 48 0.4
## 49 0.4
## 50 0.4
## 51 0.4
## 52 0.4
## 53 0.4
## 54 0.4
## 55 0.4
## 56 0.4
## 57 0.4
## 58 0.4
## 59 0.4
## 60 0.4
## 61 0.4
## 62 0.4
## 63 0.4
## 64 0.4
## 65 0.4
## 66 0.4
## 67 0.4
## 68 0.4
## 69 0.4
## 70 0.4
## 71 0.4
## 72 0.4
## 73 0.4
## 74 0.4
## 75 0.4
## 76 0.4
## 77 0.4
## 78 0.4
## 79 0.4
## 80 0.4
## 81 0.4
## 82 0.4
## 83 0.4
## 84 0.4
## 85 0.4
## 86 0.4
## 87 0.4
## 88 0.4
## 89 0.4
## 90 0.4
## 91 0.4
## 92 0.4
## 93 0.4
## 94 0.4
## 95 0.4
## 96 0.4
## 97 0.4
## 98 0.4
## 99 0.4
## 100 0.4
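Every predicted probability above is 0.4 because the penalized fit shrank all three predictor coefficients to zero (printed as a dot), leaving only the intercept. An intercept-only logistic model predicts the inverse logit of the intercept for every row, which can be checked directly in base R:

```r
intercept <- -0.4054651      # (Intercept) from the coefficient output above
p_hat <- plogis(intercept)   # inverse logit: 1 / (1 + exp(-intercept))
round(p_hat, 4)

# Going the other way, the log-odds of a 0.4 probability give back the intercept.
qlogis(0.4)
```

In other words, the model reproduces the overall proportion of positive outcomes in the data, and none of age, exposure, or genetic_risk contributes to the prediction.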
Perform survival analysis using survival_data.
## $survfit
## Call: survfit(formula = surv_obj ~ 1)
##
## n events median 0.95LCL 0.95UCL
## [1,] 100 71 11 9.24 14.4
##
## $summary
## Call: survfit(formula = surv_obj ~ 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.046 100 1 0.9900 0.00995 0.97069 1.000
## 0.292 99 1 0.9800 0.01400 0.95294 1.000
## 0.316 98 1 0.9700 0.01706 0.93714 1.000
## 0.318 97 1 0.9600 0.01960 0.92235 0.999
## 0.421 96 1 0.9500 0.02179 0.90823 0.994
## 0.562 94 1 0.9399 0.02379 0.89440 0.988
## 0.674 93 1 0.9298 0.02559 0.88096 0.981
## 0.986 90 1 0.9195 0.02731 0.86745 0.975
## 1.453 89 1 0.9091 0.02889 0.85422 0.968
## 1.883 88 1 0.8988 0.03036 0.84122 0.960
## 2.161 87 1 0.8885 0.03172 0.82842 0.953
## 2.596 84 1 0.8779 0.03306 0.81543 0.945
## 2.804 82 1 0.8672 0.03434 0.80242 0.937
## 2.810 81 1 0.8565 0.03555 0.78956 0.929
## 2.847 80 1 0.8458 0.03668 0.77685 0.921
## 3.000 78 1 0.8349 0.03778 0.76407 0.912
## 3.062 77 1 0.8241 0.03881 0.75142 0.904
## 3.135 76 1 0.8132 0.03979 0.73888 0.895
## 3.165 74 1 0.8022 0.04074 0.72625 0.886
## 3.197 73 1 0.7913 0.04164 0.71372 0.877
## 3.771 72 1 0.7803 0.04249 0.70129 0.868
## 3.800 71 1 0.7693 0.04328 0.68895 0.859
## 4.204 70 1 0.7583 0.04404 0.67671 0.850
## 4.802 67 1 0.7470 0.04481 0.66411 0.840
## 4.807 66 1 0.7357 0.04554 0.65160 0.831
## 5.066 65 1 0.7243 0.04622 0.63917 0.821
## 5.646 64 1 0.7130 0.04687 0.62683 0.811
## 5.726 63 1 0.7017 0.04747 0.61457 0.801
## 5.887 60 1 0.6900 0.04810 0.60189 0.791
## 5.909 59 1 0.6783 0.04868 0.58930 0.781
## 6.293 57 1 0.6664 0.04926 0.57653 0.770
## 6.436 56 1 0.6545 0.04980 0.56383 0.760
## 6.855 55 1 0.6426 0.05030 0.55122 0.749
## 7.907 54 1 0.6307 0.05075 0.53868 0.738
## 8.457 51 1 0.6183 0.05124 0.52564 0.727
## 8.498 50 1 0.6060 0.05169 0.51268 0.716
## 9.240 49 1 0.5936 0.05209 0.49981 0.705
## 9.659 48 1 0.5812 0.05245 0.48701 0.694
## 9.724 47 1 0.5689 0.05278 0.47430 0.682
## 9.746 46 1 0.5565 0.05306 0.46166 0.671
## 10.244 44 1 0.5439 0.05334 0.44875 0.659
## 10.279 43 1 0.5312 0.05358 0.43593 0.647
## 10.477 42 1 0.5186 0.05377 0.42319 0.635
## 10.672 41 1 0.5059 0.05393 0.41053 0.623
## 11.003 40 1 0.4933 0.05404 0.39795 0.611
## 11.149 38 1 0.4803 0.05416 0.38505 0.599
## 11.293 37 1 0.4673 0.05423 0.37225 0.587
## 11.685 36 1 0.4543 0.05425 0.35952 0.574
## 11.920 35 1 0.4413 0.05423 0.34688 0.562
## 12.290 34 1 0.4284 0.05417 0.33433 0.549
## 12.546 32 1 0.4150 0.05410 0.32140 0.536
## 13.291 30 1 0.4011 0.05404 0.30806 0.522
## 13.480 29 1 0.3873 0.05392 0.29483 0.509
## 14.395 27 1 0.3730 0.05380 0.28112 0.495
## 14.967 24 1 0.3574 0.05375 0.26618 0.480
## 15.631 21 1 0.3404 0.05382 0.24970 0.464
## 15.705 19 1 0.3225 0.05389 0.23243 0.447
## 15.707 18 1 0.3046 0.05379 0.21546 0.431
## 16.059 17 1 0.2867 0.05353 0.19881 0.413
## 16.199 16 1 0.2687 0.05309 0.18246 0.396
## 16.424 15 1 0.2508 0.05249 0.16643 0.378
## 17.312 14 1 0.2329 0.05171 0.15074 0.360
## 18.569 13 1 0.2150 0.05074 0.13538 0.341
## 18.878 11 1 0.1954 0.04975 0.11868 0.322
## 21.678 10 1 0.1759 0.04846 0.10251 0.302
## 22.483 9 1 0.1564 0.04685 0.08691 0.281
## 25.361 8 1 0.1368 0.04489 0.07192 0.260
## 27.253 5 1 0.1095 0.04346 0.05026 0.238
## 40.410 3 1 0.0730 0.04155 0.02390 0.223
## 44.987 2 1 0.0365 0.03312 0.00616 0.216
## 72.110 1 1 0.0000 NaN NA NA
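The output above has the shape of a Kaplan-Meier fit from the survival package (the Call line shows survfit(formula = surv_obj ~ 1)). The sketch below reproduces that workflow on simulated data. The column names time and status and the simulated values are assumptions, since the structure of survival_data is not shown here.

```r
library(survival)

set.seed(1)
n <- 100
event_time  <- rexp(n, rate = 1 / 15)   # simulated time to event
censor_time <- rexp(n, rate = 1 / 40)   # simulated censoring time
sim <- data.frame(
  time   = pmin(event_time, censor_time),
  status = as.integer(event_time <= censor_time)  # 1 = event, 0 = censored
)

surv_obj <- Surv(sim$time, sim$status)  # survival object
km_fit   <- survfit(surv_obj ~ 1)       # Kaplan-Meier estimate, no grouping
print(km_fit)                           # n, events, median and 95% CI
head(summary(km_fit)$surv)              # survival estimates at the first event times
```

Each row of the summary multiplies the previous survival estimate by (1 - n.event / n.risk). In the output above, for example, 0.9500 * (1 - 1/94) is approximately 0.9399.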
data(nlp_data)
nlp_result <- epi_analyze(nlp_data, outcome = NULL, population = NULL, type = "nlp", n = 5)
head(nlp_result)
##                word frequency
## region       region        33
## dengue       dengue        32
## south         south        32
## influenza influenza        31
## north         north        31
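A word-frequency table like the one above can be built in base R by lowercasing the text, splitting on non-letters, and counting tokens. The example texts are invented, and this is only a sketch of the general technique; it does not claim to mirror the tokenization or stop-word handling inside epi_analyze.

```r
# Invented example reports; not the nlp_data dataset.
texts <- c(
  "Dengue cases rising in South region",
  "Influenza outbreak reported in North region",
  "Dengue alert for South region"
)

# Lowercase, split on anything that is not a letter, drop empty tokens.
tokens <- unlist(strsplit(tolower(texts), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0]

# Count tokens and keep the n most frequent words.
n <- 3
counts <- sort(table(tokens), decreasing = TRUE)
top <- head(data.frame(word = names(counts), frequency = as.integer(counts)), n)
print(top)
```

Real text usually also needs stop-word removal (here "in" would otherwise rank highly), which this sketch leaves out.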
## $clusters
## [1] 3 2 3 1 3 1 1 3 2 2 1 2 2 1 3 2 3 1 3 2 3 2 2 2 2 1 2 2 2 3 3 1 3 3 3 3 1
## [38] 1 1 3 2 2 2 2 1 2 2 2 2 3 1 3 2 2 2 2 3 2 3 2 3 2 3 2 2 3 3 2 2 2 3 2 1 3
## [75] 3 3 2 3 1 3 1 2 3 2 3 3 2 3 1 1 2 2 1 3 1 3 2 3 2 3
##
## $centers
## age exposure genetic_risk
## 1 74.00521 0.4178058 0.3936161
## 2 34.17596 0.5159618 0.4529798
## 3 55.09524 0.4881809 0.5910485
data(ml_data)
ml_data$outcome <- as.factor(ml_data$outcome)
svm_model <- epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "svmRadial")
svm_model$performance
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 60 40
## 1 0 0
##
## Accuracy : 0.6
## 95% CI : (0.4972, 0.6967)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.5433
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 6.984e-10
##
## Sensitivity : 1.0
## Specificity : 0.0
## Pos Pred Value : 0.6
## Neg Pred Value : NaN
## Prevalence : 0.6
## Detection Rate : 0.6
## Detection Prevalence : 1.0
## Balanced Accuracy : 0.5
##
## 'Positive' Class : 0
##
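The statistics reported above follow directly from the four cells of the confusion matrix, with class 0 treated as the positive class. The base-R arithmetic below recomputes them from the printed counts. It shows that the model predicts class 0 for every observation, so accuracy equals the no-information rate and kappa is 0.

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference).
tp <- 60  # predicted 0, reference 0 (class 0 is the "positive" class)
fp <- 40  # predicted 0, reference 1
fn <- 0   # predicted 1, reference 0
tn <- 0   # predicted 1, reference 1
total <- tp + fp + fn + tn

accuracy    <- (tp + tn) / total                # 0.6
sensitivity <- tp / (tp + fn)                   # 1
specificity <- tn / (tn + fp)                   # 0
ppv         <- tp / (tp + fp)                   # 0.6
npv         <- tn / (tn + fn)                   # 0 / 0, which R reports as NaN
balanced    <- (sensitivity + specificity) / 2  # 0.5

# Cohen's kappa: observed agreement versus agreement expected by chance.
expected   <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total^2
kappa_stat <- (accuracy - expected) / (1 - expected)  # 0
```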
## test_id sensitivity specificity accuracy
## 1 test_1 0.8602151 0.8585859 0.8593750
## 2 test_2 0.7619048 0.6923077 0.7257143
## 3 test_3 0.8181818 0.7857143 0.8012422
## 4 test_4 0.8253968 0.8846154 0.8622754
## 5 test_5 0.8584906 0.8380952 0.8483412
## 6 test_6 0.9108911 0.7625000 0.8453039
## 7 test_7 0.8349515 0.8000000 0.8196721
## 8 test_8 0.8596491 0.7641509 0.8136364
## 9 test_9 0.9135802 0.7583333 0.8208955
## 10 test_10 0.8823529 0.6746988 0.7797619
data(clinical_data)
epi_visualize(clinical_data, x = "arm", y = "outcome", type = "boxplot", main = "Outcome by Treatment Arm")
data(ml_data)
epi_visualize(ml_data, x = "age", y = "outcome", type = "scatter", main = "Age vs. Disease Outcome")
EpidigiR offers a compact toolkit for epidemiological analysis: three key functions (epi_analyze, epi_model, and epi_visualize) and ten bundled datasets that together cover the package's 12 epidemiological topics. These tools support a range of analyses, from SIR modeling to machine learning methods such as Random Forest and SVM. The package also includes a digital epidemiology component that uses real-time data and computational approaches to improve disease monitoring and forecasting, making it a useful resource for researchers and analysts.
EpidigiR is released under the MIT License © 2025 Esther Atsabina Wanjala.