EpidigiR: Digital Epidemiological Analysis and Visualization Tools

1 Introduction to EpidigiR: Epidemiological Analysis and Visualization


EpidigiR is an R package for epidemiological analysis, modeling, and visualization. It is designed to keep dependencies minimal while covering a broad range of methods. Three main functions span 12 epidemiological topics, including digital epidemiology, which uses real-time data integration and computational techniques to improve disease tracking and prediction.

epi_analyze: Performs summary statistics, SIR modeling, DALY calculations, age standardization, diagnostic test evaluation, and NLP keyword extraction.
epi_model: Handles clinical trial power calculation, survival analysis, SNP association, logistic regression, k-means clustering, Random Forest, and SVM.
epi_visualize: Creates visualizations for prevalence mapping, epidemic curves, scatter plots, and boxplots.

The package includes ten datasets to support these analyses: epi_prevalence, sir_data, geno_data, ml_data, nlp_data, clinical_data, daly_data, survey_data, diagnostic_data, and survival_data.

This vignette demonstrates how to use these functions and datasets for various epidemiological tasks.

3 Datasets

The package includes the following datasets:

epi_prevalence: Disease prevalence by region and age group, with spatial coordinates (12 rows).
sir_data: Simulated SIR model output (50 rows).
geno_data: Genotype and case-control data for SNP analysis (100 rows).
ml_data: Patient data for machine learning (logistic regression, clustering, Random Forest, SVM; 100 rows).
nlp_data: Epidemiological text data for NLP (100 rows).
clinical_data: Clinical trial data for power calculations and outcome analysis (200 rows).
daly_data: Data for DALY calculations (20 rows).
survey_data: Data for age standardization (20 rows).
diagnostic_data: Data for diagnostic test evaluation (10 rows).
survival_data: Data for survival analysis (100 rows).

5 Summary Statistics

library(EpidigiR)
data(epi_prevalence)
result <- epi_analyze(
  epi_prevalence,
  outcome = "cases",
  population = "population",
  group = "region",
  type = "summary"
)
print(result)

##   group mean_outcome population prevalence incidence_rate
## 1  East     140.0000   34000.00  0.4243056       4.117647
## 2 North     133.3333   30000.00  0.4666667       4.444444
## 3 South     120.0000   29333.33  0.4345238       4.090909
## 4  West     100.0000   24333.33  0.4333333       4.109589
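In this output, incidence_rate equals the group's mean cases divided by its mean population, times 1,000 (East: 140 / 34000 x 1000 = 4.117647; the other rows follow the same rule). The base-R sketch below reproduces that per-group calculation with aggregate() on a small hypothetical data frame. The column names mirror the call above, but the values are invented. The vignette does not show how epi_analyze derives its prevalence column, so the sketch leaves it out.

```r
# Hypothetical toy data; column names mirror the epi_analyze() call above
toy <- data.frame(
  region     = c("East", "East", "West", "West"),
  cases      = c(120, 160, 90, 110),
  population = c(30000, 38000, 22000, 26000)
)

# Per-group means of cases and population
grp <- aggregate(cbind(cases, population) ~ region, data = toy, FUN = mean)

# Assumed definition: mean cases per 1,000 mean population,
# which matches every incidence_rate value printed above
grp$incidence_rate <- grp$cases / grp$population * 1000
print(grp)
```

The East toy rows were chosen to average 140 cases and 34,000 people, so their rate matches the printed East value.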

6 SIR Epidemic Model

sir_result <- epi_analyze(
  data = NULL, outcome = NULL, type = "sir",
  N = 1000, beta = 0.3, gamma = 0.1, days = 50
)
epi_visualize(sir_result, x = "time", y = "Infected", type = "curve", main = "Epidemic Curve")
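With beta = 0.3 and gamma = 0.1, the basic reproduction number is R0 = beta / gamma = 3, so the epidemic grows at first. The sketch below steps the standard SIR equations forward one day at a time (a forward-Euler approximation). The single initial infection and the update scheme are assumptions, because the vignette does not show how epi_analyze seeds or integrates the model. The time and Infected column names mirror the epi_visualize() call above.

```r
# Discrete-time (daily forward-Euler) SIR sketch.
# Assumption: one initial infection; epi_analyze() may seed or integrate differently.
sir_euler <- function(N, beta, gamma, days, I0 = 1) {
  S <- numeric(days + 1); I <- numeric(days + 1); R <- numeric(days + 1)
  S[1] <- N - I0; I[1] <- I0; R[1] <- 0
  for (t in seq_len(days)) {
    new_inf <- beta * S[t] * I[t] / N   # new infections during day t
    new_rec <- gamma * I[t]             # new recoveries during day t
    S[t + 1] <- S[t] - new_inf
    I[t + 1] <- I[t] + new_inf - new_rec
    R[t + 1] <- R[t] + new_rec
  }
  data.frame(time = 0:days, Susceptible = S, Infected = I, Recovered = R)
}

sim <- sir_euler(N = 1000, beta = 0.3, gamma = 0.1, days = 50)
R0 <- 0.3 / 0.1   # basic reproduction number, about 3
head(sim)
```

Each step moves people from S to I and from I to R, so S + I + R stays equal to N throughout.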

7 Spatial Map

library(sp)  # provides the coordinates<- replacement function used below
data(epi_prevalence)
coordinates(epi_prevalence) <- ~lon + lat
epi_visualize(epi_prevalence, x = "prevalence", type = "map", main = "Prevalence Map")

8 Logistic Model

data(clinical_data)
clinical_data$outcome <- as.factor(clinical_data$outcome)
model <- epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "logistic")
head(model$predictions)

##   lambda.min
## 1       0.41
## 2       0.41
## 3       0.41
## 4       0.41
## 5       0.41
## 6       0.41

9 Random Forest with Clinical Data

rf_model <- epi_model(clinical_data, formula = outcome ~ age + health_score + dose, type = "rf")

## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .

head(rf_model$predictions)

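## NULL

Here rf_model$predictions is NULL: the Random Forest result does not store predictions under that name. Run names(rf_model) to list the components it does return.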

10 Global Health Burden (DALY)

data(daly_data)
epi_analyze(daly_data, outcome = NULL, type = "daly")

##       group      daly
## 1   group_1  809.1125
## 2   group_2 1171.2362
## 3   group_3  806.3073
## 4   group_4 1392.1371
## 5   group_5 1291.4882
## 6   group_6  509.8396
## 7   group_7  870.1247
## 8   group_8 1220.5410
## 9   group_9  776.4134
## 10 group_10  627.1544
## 11 group_11 1444.5109
## 12 group_12  964.0353
## 13 group_13 1070.6309
## 14 group_14 1023.3304
## 15 group_15  253.7084
## 16 group_16 1174.8507
## 17 group_17  712.7858
## 18 group_18  285.2372
## 19 group_19  588.3101
## 20 group_20 1113.2849
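DALYs are conventionally the sum of years of life lost (YLL) and years lived with disability (YLD). Below is a minimal sketch of the undiscounted, non-age-weighted form. The column names (deaths, life_exp_remaining, cases, disability_weight, duration) are hypothetical, because the vignette does not list daly_data's columns or say which DALY variant epi_analyze uses.

```r
# DALY = YLL + YLD (undiscounted, no age weighting).
# Column names are hypothetical, not daly_data's documented schema.
daly_simple <- function(d) {
  yll <- d$deaths * d$life_exp_remaining             # years of life lost
  yld <- d$cases * d$disability_weight * d$duration  # years lived with disability
  data.frame(group = d$group, yll = yll, yld = yld, daly = yll + yld)
}

toy <- data.frame(
  group              = c("group_a", "group_b"),
  deaths             = c(10, 4),
  life_exp_remaining = c(30, 25),
  cases              = c(200, 150),
  disability_weight  = c(0.2, 0.1),
  duration           = c(2, 1)
)
daly_simple(toy)
```

For group_a this gives 10 x 30 = 300 YLL plus 200 x 0.2 x 2 = 80 YLD, or 380 DALYs.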

11 SNP Association

data(geno_data)
epi_model(geno_data, formula = outcome ~ snp1 + snp2, type = "snp")

##           statistic   p_value
## X-squared  1.769353 0.4128477
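The reported p-value is consistent with a Pearson chi-square test on 2 degrees of freedom, as for a 2 x 3 table of case/control status by genotype (0, 1 or 2 minor alleles). For df = 2 the upper tail is exp(-X^2 / 2), and exp(-1.769353 / 2) is about 0.4128. The sketch below runs this kind of genotypic test with base R chisq.test() on hypothetical counts. The vignette does not show which SNPs epi_model tests, or how.

```r
# Genotypic association test for one SNP: rows = case/control, columns = genotype.
# Counts are hypothetical, for illustration only.
geno_tab <- matrix(
  c(20, 22, 8,    # cases:    0, 1, 2 minor alleles
    25, 18, 7),   # controls: 0, 1, 2 minor alleles
  nrow = 2, byrow = TRUE,
  dimnames = list(status = c("case", "control"), genotype = c("0", "1", "2"))
)
test <- chisq.test(geno_tab)   # no continuity correction for tables larger than 2 x 2
print(test)

# The p-value printed above follows from X-squared = 1.769353 on 2 df
p_from_output <- pchisq(1.769353, df = 2, lower.tail = FALSE)
p_from_output
```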

12 Age Standardization

data(survey_data)
epi_analyze(survey_data, outcome = NULL, type = "age_standardize")

##   standardized_rate
## 1          33.45531
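Direct age standardization applies age-specific rates to a common standard population. The result is the weighted average sum(w_i * r_i) / sum(w_i), where r_i is the rate in age group i and w_i is the standard population in that group. The sketch below assumes this direct method and a per-1,000 scale. The vignette does not state survey_data's columns or the scale epi_analyze uses, so the inputs here are hypothetical.

```r
# Direct age standardization: weight age-specific rates by a standard population.
# Inputs are hypothetical; per = 1000 is an assumed reporting scale.
age_std_direct <- function(cases, population, std_pop, per = 1000) {
  rate_i <- cases / population * per        # age-specific rates
  sum(rate_i * std_pop) / sum(std_pop)      # weighted by the standard population
}

cases      <- c(5, 12, 30)          # cases in three age groups
population <- c(1000, 1200, 1500)   # study population by age group
std_pop    <- c(3000, 2000, 1000)   # standard population by age group
age_std_direct(cases, population, std_pop)
```

The age-specific rates are 5, 10 and 20 per 1,000, and weighting them by 3000, 2000 and 1000 gives 55000 / 6000, about 9.17 per 1,000.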

13 Machine Learning: Logistic Regression

data(ml_data)
ml_fit <- epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "logistic")
ml_fit$coefficients

## 4 x 1 sparse Matrix of class "dgCMatrix"
##              lambda.min
## (Intercept)  -0.4054651
## age           .        
## exposure      .        
## genetic_risk  .        

head(ml_fit$predictions)

##   lambda.min
## 1        0.4
## 2        0.4
## 3        0.4
## 4        0.4
## 5        0.4
## 6        0.4
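All coefficients except the intercept are shrunk to zero (shown as "."), so every patient receives the same predicted probability: the inverse logit of the intercept. An intercept of -0.4054651 equals log(0.4 / 0.6), so the model simply predicts an outcome proportion of 0.4 for everyone. The check below uses base R plogis(), qlogis() and an intercept-only glm() on a hypothetical outcome with 40% events.

```r
# Inverse logit of the fitted intercept gives the constant prediction
plogis(-0.4054651)   # about 0.4

# An intercept-only logistic model estimates logit(mean(y))
y <- c(rep(1, 40), rep(0, 60))          # hypothetical outcome, 40% events
fit0 <- glm(y ~ 1, family = binomial)
coef(fit0)                              # about -0.4054651
qlogis(0.4)                             # log(0.4 / 0.6), the same value
```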

14 Survival Analysis

Fit a Kaplan-Meier survival curve (no covariates) to survival_data.

data(survival_data)
epi_model(survival_data, type = "survival")

## $survfit
## Call: survfit(formula = surv_obj ~ 1)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 100     71     11    9.24    14.4
## 
## $summary
## Call: survfit(formula = surv_obj ~ 1)
## 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.046    100       1   0.9900 0.00995      0.97069        1.000
##   0.292     99       1   0.9800 0.01400      0.95294        1.000
##   0.316     98       1   0.9700 0.01706      0.93714        1.000
##   0.318     97       1   0.9600 0.01960      0.92235        0.999
##   0.421     96       1   0.9500 0.02179      0.90823        0.994
##   0.562     94       1   0.9399 0.02379      0.89440        0.988
##   0.674     93       1   0.9298 0.02559      0.88096        0.981
##   0.986     90       1   0.9195 0.02731      0.86745        0.975
##   1.453     89       1   0.9091 0.02889      0.85422        0.968
##   1.883     88       1   0.8988 0.03036      0.84122        0.960
##   2.161     87       1   0.8885 0.03172      0.82842        0.953
##   2.596     84       1   0.8779 0.03306      0.81543        0.945
##   2.804     82       1   0.8672 0.03434      0.80242        0.937
##   2.810     81       1   0.8565 0.03555      0.78956        0.929
##   2.847     80       1   0.8458 0.03668      0.77685        0.921
##   3.000     78       1   0.8349 0.03778      0.76407        0.912
##   3.062     77       1   0.8241 0.03881      0.75142        0.904
##   3.135     76       1   0.8132 0.03979      0.73888        0.895
##   3.165     74       1   0.8022 0.04074      0.72625        0.886
##   3.197     73       1   0.7913 0.04164      0.71372        0.877
##   3.771     72       1   0.7803 0.04249      0.70129        0.868
##   3.800     71       1   0.7693 0.04328      0.68895        0.859
##   4.204     70       1   0.7583 0.04404      0.67671        0.850
##   4.802     67       1   0.7470 0.04481      0.66411        0.840
##   4.807     66       1   0.7357 0.04554      0.65160        0.831
##   5.066     65       1   0.7243 0.04622      0.63917        0.821
##   5.646     64       1   0.7130 0.04687      0.62683        0.811
##   5.726     63       1   0.7017 0.04747      0.61457        0.801
##   5.887     60       1   0.6900 0.04810      0.60189        0.791
##   5.909     59       1   0.6783 0.04868      0.58930        0.781
##   6.293     57       1   0.6664 0.04926      0.57653        0.770
##   6.436     56       1   0.6545 0.04980      0.56383        0.760
##   6.855     55       1   0.6426 0.05030      0.55122        0.749
##   7.907     54       1   0.6307 0.05075      0.53868        0.738
##   8.457     51       1   0.6183 0.05124      0.52564        0.727
##   8.498     50       1   0.6060 0.05169      0.51268        0.716
##   9.240     49       1   0.5936 0.05209      0.49981        0.705
##   9.659     48       1   0.5812 0.05245      0.48701        0.694
##   9.724     47       1   0.5689 0.05278      0.47430        0.682
##   9.746     46       1   0.5565 0.05306      0.46166        0.671
##  10.244     44       1   0.5439 0.05334      0.44875        0.659
##  10.279     43       1   0.5312 0.05358      0.43593        0.647
##  10.477     42       1   0.5186 0.05377      0.42319        0.635
##  10.672     41       1   0.5059 0.05393      0.41053        0.623
##  11.003     40       1   0.4933 0.05404      0.39795        0.611
##  11.149     38       1   0.4803 0.05416      0.38505        0.599
##  11.293     37       1   0.4673 0.05423      0.37225        0.587
##  11.685     36       1   0.4543 0.05425      0.35952        0.574
##  11.920     35       1   0.4413 0.05423      0.34688        0.562
##  12.290     34       1   0.4284 0.05417      0.33433        0.549
##  12.546     32       1   0.4150 0.05410      0.32140        0.536
##  13.291     30       1   0.4011 0.05404      0.30806        0.522
##  13.480     29       1   0.3873 0.05392      0.29483        0.509
##  14.395     27       1   0.3730 0.05380      0.28112        0.495
##  14.967     24       1   0.3574 0.05375      0.26618        0.480
##  15.631     21       1   0.3404 0.05382      0.24970        0.464
##  15.705     19       1   0.3225 0.05389      0.23243        0.447
##  15.707     18       1   0.3046 0.05379      0.21546        0.431
##  16.059     17       1   0.2867 0.05353      0.19881        0.413
##  16.199     16       1   0.2687 0.05309      0.18246        0.396
##  16.424     15       1   0.2508 0.05249      0.16643        0.378
##  17.312     14       1   0.2329 0.05171      0.15074        0.360
##  18.569     13       1   0.2150 0.05074      0.13538        0.341
##  18.878     11       1   0.1954 0.04975      0.11868        0.322
##  21.678     10       1   0.1759 0.04846      0.10251        0.302
##  22.483      9       1   0.1564 0.04685      0.08691        0.281
##  25.361      8       1   0.1368 0.04489      0.07192        0.260
##  27.253      5       1   0.1095 0.04346      0.05026        0.238
##  40.410      3       1   0.0730 0.04155      0.02390        0.223
##  44.987      2       1   0.0365 0.03312      0.00616        0.216
##  72.110      1       1   0.0000     NaN           NA           NA
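The survival column is the Kaplan-Meier product-limit estimate: S(t) is the product of (1 - d_i / n_i) over event times up to t, where n_i is n.risk and d_i is n.event. Drops in n.risk larger than the number of events (for example 96 to 94) reflect censored observations. The base-R check below reproduces the first seven survival values from the n.risk and n.event figures printed above.

```r
# Kaplan-Meier product-limit estimate from risk-set sizes and event counts
km_from_risk <- function(n_risk, n_event) {
  cumprod(1 - n_event / n_risk)
}

# First seven rows of the summary above
n_risk  <- c(100, 99, 98, 97, 96, 94, 93)
n_event <- rep(1, 7)
round(km_from_risk(n_risk, n_event), 4)   # 0.9900 ... 0.9399 0.9298
```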

15 NLP Keyword Extraction

data(nlp_data)
nlp_result <- epi_analyze(nlp_data, outcome = NULL, population = NULL, type = "nlp", n = 5)
head(nlp_result)

##                word frequency
## region       region        33
## dengue       dengue        32
## south         south        32
## influenza influenza        31
## north         north        31
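Frequency-based keyword extraction of this kind can be approximated in four steps: lower-case the text, split it on non-letters, drop stopwords, and count the remaining words. The sketch below does this in base R on two made-up sentences. The stopword list is a small hypothetical one, and the vignette does not say which tokenizer or stopword list epi_analyze uses.

```r
# Minimal word-frequency keyword extractor (base R only).
# The default stopword list is a small hypothetical example.
top_keywords <- function(text, n = 5,
                         stopwords = c("the", "in", "of", "and", "a", "were")) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))      # tokenize on non-letters
  words <- words[nzchar(words) & !(words %in% stopwords)]  # drop empties and stopwords
  freq  <- sort(table(words), decreasing = TRUE)
  head(data.frame(word = names(freq), frequency = as.integer(freq)), n)
}

docs <- c("Dengue cases rose in the south region",
          "Influenza and dengue were reported in the north region")
top_keywords(docs, n = 3)
```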

15.1 K-means Clustering

data(ml_data)
epi_model(ml_data[, c("age", "exposure", "genetic_risk")], type = "kmeans", k = 3)

## $clusters
##   [1] 3 2 3 1 3 1 1 3 2 2 1 2 2 1 3 2 3 1 3 2 3 2 2 2 2 1 2 2 2 3 3 1 3 3 3 3 1
##  [38] 1 1 3 2 2 2 2 1 2 2 2 2 3 1 3 2 2 2 2 3 2 3 2 3 2 3 2 2 3 3 2 2 2 3 2 1 3
##  [75] 3 3 2 3 1 3 1 2 3 2 3 3 2 3 1 1 2 2 1 3 1 3 2 3 2 3
## 
## $centers
##        age  exposure genetic_risk
## 1 74.00521 0.4178058    0.3936161
## 2 34.17596 0.5159618    0.4529798
## 3 55.09524 0.4881809    0.5910485
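The cluster centers differ mainly in age (about 34 to 74 years), while exposure and genetic_risk stay between roughly 0.4 and 0.6. On these raw scales, age dominates the Euclidean distances that k-means minimizes. The vignette does not say whether epi_model standardizes its inputs. If you want each variable to contribute comparably, one option is to scale the columns before clustering, as in this base-R sketch on simulated data. The sketch uses set.seed() to make the random starts reproducible.

```r
# k-means on standardized features with stats::kmeans.
# Simulated data with the same column names as ml_data; values are made up.
set.seed(42)
sim <- data.frame(
  age          = runif(100, 20, 90),
  exposure     = runif(100),
  genetic_risk = runif(100)
)
X  <- scale(sim)                           # each column: mean 0, sd 1
km <- kmeans(X, centers = 3, nstart = 25)  # best of 25 random starts

table(km$cluster)                          # cluster sizes
# Convert centers back to the original units for interpretation
centers_orig <- sweep(sweep(km$centers, 2, attr(X, "scaled:scale"), "*"),
                      2, attr(X, "scaled:center"), "+")
centers_orig
```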

16 SVM Modelling

data(ml_data)
ml_data$outcome <- as.factor(ml_data$outcome)
svm_model <- epi_model(ml_data, formula = outcome ~ age + exposure + genetic_risk, type = "svmRadial")
svm_model$performance

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 60 40
##          1  0  0
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.4972, 0.6967)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.5433          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 6.984e-10       
##                                           
##             Sensitivity : 1.0             
##             Specificity : 0.0             
##          Pos Pred Value : 0.6             
##          Neg Pred Value : NaN             
##              Prevalence : 0.6             
##          Detection Rate : 0.6             
##    Detection Prevalence : 1.0             
##       Balanced Accuracy : 0.5             
##                                           
##        'Positive' Class : 0               
##
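The classifier assigns every patient to class 0. Its accuracy (0.6) therefore equals the no-information rate, kappa is 0, and balanced accuracy is 0.5, so on these data the model does no better than always predicting the majority class. The statistics follow directly from the 2 x 2 table, as this base-R check shows. As in the output above, class 0 is treated as the positive class.

```r
# Confusion matrix from the output above (rows = prediction, columns = reference)
cm <- matrix(c(60, 0, 40, 0), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["0", "0"]   # positive class is "0"
fn <- cm["1", "0"]
fp <- cm["0", "1"]
tn <- cm["1", "1"]
n  <- sum(cm)

sensitivity <- tp / (tp + fn)                         # 60 / 60 = 1
specificity <- tn / (tn + fp)                         # 0 / 40 = 0
accuracy    <- (tp + tn) / n                          # 60 / 100 = 0.6
nir         <- max(colSums(cm)) / n                   # majority-class rate = 0.6
p_chance    <- sum(rowSums(cm) * colSums(cm)) / n^2   # expected chance agreement
kappa_stat  <- (accuracy - p_chance) / (1 - p_chance) # 0
balanced_accuracy <- (sensitivity + specificity) / 2  # 0.5
```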

17 Diagnostic Tests

data(diagnostic_data)
epi_analyze(diagnostic_data, outcome = NULL, type = "diagnostic")

##    test_id sensitivity specificity  accuracy
## 1   test_1   0.8602151   0.8585859 0.8593750
## 2   test_2   0.7619048   0.6923077 0.7257143
## 3   test_3   0.8181818   0.7857143 0.8012422
## 4   test_4   0.8253968   0.8846154 0.8622754
## 5   test_5   0.8584906   0.8380952 0.8483412
## 6   test_6   0.9108911   0.7625000 0.8453039
## 7   test_7   0.8349515   0.8000000 0.8196721
## 8   test_8   0.8596491   0.7641509 0.8136364
## 9   test_9   0.9135802   0.7583333 0.8208955
## 10 test_10   0.8823529   0.6746988 0.7797619
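Each row reports sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and accuracy = (TP + TN) / N. As a check, the test_1 row is reproduced by 80 true positives, 13 false negatives, 85 true negatives and 14 false positives. These counts are back-calculated for illustration and are one consistent possibility, not values read from diagnostic_data.

```r
# Standard diagnostic accuracy measures from a 2 x 2 table
diag_metrics <- function(tp, fn, tn, fp) {
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / (tp + fn + tn + fp))
}

# Back-calculated counts consistent with the test_1 row (illustrative only)
round(diag_metrics(tp = 80, fn = 13, tn = 85, fp = 14), 7)
```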


Esther Atsabina Wanjala

2025-11-03