If you are reading this vignette, chances are that you have requested an export from an EDC software and received a directory full of dataset files.
Wouldn’t it be so tedious if you had to load all those files one by one? Lucky you, EDCimport knows a better way!
Depending on the type of files in your export directory, you should use:
read_all_sas(), to read .sas7bdat files
read_all_xpt(), to read .xpt files
read_all_csv(), to read .csv files
read_trialmaster(), to read a TrialMaster zip archive
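If a project mixes export types, the choice above can be automated. The following is a minimal sketch, not part of EDCimport: guess_reader() is a hypothetical helper that looks at the file extensions found at a path and returns the name of the matching reader as a string. It assumes that a TrialMaster export is a single .zip file, and the precedence among extensions is an arbitrary choice.

```r
# Hypothetical helper (not part of EDCimport): guess which reader fits a path.
guess_reader = function(path) {
  # Assumption: a TrialMaster export is a single .zip archive
  if (!dir.exists(path)) {
    if (tolower(tools::file_ext(path)) == "zip") return("read_trialmaster")
    stop("`path` should be a directory or a .zip archive: ", path)
  }
  # Otherwise, look at the extensions found in the directory
  ext = tolower(tools::file_ext(list.files(path)))
  if ("sas7bdat" %in% ext) return("read_all_sas")
  if ("xpt" %in% ext) return("read_all_xpt")
  if ("csv" %in% ext) return("read_all_csv")
  stop("No .sas7bdat, .xpt or .csv file found in ", path)
}
```

The returned name only tells you which function to call; you would then call it yourself, for instance read_all_sas(path) when the result is "read_all_sas".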
Formats are imported through a metadata file, passed as the format_file argument, which can be one of:
a procformat.sas file, containing the whole PROC FORMAT
a catalog file (.sas7bcat)
or a data file (.csv or .sas7bdat) containing 3 columns: the SAS format name (repeated), each level, and its associated label. Use options(edc_var_format_name="xxx", edc_var_level="xxx", edc_var_label="xxx") to specify the names of the columns.
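To make the data file option more concrete, here is a sketch of what such a file could look like. The column names (fmt_name, level, label), the format names, and the values are all made up for illustration; only the three option names come from the text above.

```r
# Illustrative format table: one row per level, with the format name repeated.
# All names and values below are invented for this example.
formats = data.frame(
  fmt_name = c("YESNO", "YESNO", "SEX", "SEX"),
  level    = c("0", "1", "1", "2"),
  label    = c("No", "Yes", "Male", "Female")
)

# Write it to a temporary CSV file
format_path = file.path(tempdir(), "formats.csv")
write.csv(formats, format_path, row.names = FALSE)

# Tell EDCimport which column holds what
options(
  edc_var_format_name = "fmt_name",
  edc_var_level = "level",
  edc_var_label = "label"
)

# The file can then be passed as `format_file` (not run here):
# db = read_all_sas("path/to/my/files/folder", format_file = format_path)
```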
You can then load your datasets into the global environment with load_database().
library(EDCimport)
db = read_all_sas("path/to/my/files/folder", format_file="procformat.sas")
print(db)
load_database(db) #this also removes `db` to save some RAM
mean(dataset1$column5)
Explore your database
Knowing a CRF by heart is not always easy, so EDCimport provides a few useful tools:
edc_lookup(), to list the available datasets.
edc_find_column() and edc_find_value(), to search the database for a column/label or for an actual value.
db = edc_example()
load_database(db)
edc_lookup()
#> ── Lookup table - EDCimport example (extraction of 2024-01-01) - EDCimport v0.6.
#>   dataset     nrow  ncol  n_id rows_per_id crfname
#>   <chr>      <dbl> <dbl> <int>       <dbl> <chr>
#> 1 long_pure    150     4    50         3   long data
#> 2 data1        100     7    50         2   data1
#> 3 long_mixed   100     6    50         2   both short and long data
#> 4 data2         50     6    50         1   data2
#> 5 data3         50     7    50         1   data3
#> 6 enrol         50     6    50         1   enrol
#> 7 short         50     5    50         1   short data
#> 8 ae           175     7    48         3.6 Adverse events
edc_find_column("date")
#> # A tibble: 11 × 5
#>    dataset crfname names      labels            prop_na
#>    <chr>   <chr>   <chr>      <chr>               <dbl>
#>  1 data1   data1   date1      Date at visit 1         0
#>  2 data1   data1   date2      Date at visit 2         0
#>  3 data1   data1   date3      Date at visit 3         0
#>  4 data2   data2   date4      Date at visit 4         0
#>  5 data2   data2   date5      Date at visit 5         0
#>  6 data2   data2   date6      Date at visit 6         0
#>  7 data3   data3   date7      Date at visit 7         0
#>  8 data3   data3   date8      Date at visit 8         0
#>  9 data3   data3   date9      Date at visit 9         0
#> 10 data3   data3   date10     Date at visit 10        0
#> 11 enrol   enrol   enrol_date Date of enrolment       0
edc_find_value("immune")
#> # A tibble: 7 × 5
#>   subjid dataset column column_label value
#>   <chr>  <chr>   <chr>  <chr>        <chr>
#> 1 9      ae      aesoc  AE SOC       Immune system disorders
#> 2 24     ae      aesoc  AE SOC       Immune system disorders
#> 3 26     ae      aesoc  AE SOC       Immune system disorders
#> 4 31     ae      aesoc  AE SOC       Immune system disorders
#> 5 46     ae      aesoc  AE SOC       Immune system disorders
#> 6 46     ae      aesoc  AE SOC       Immune system disorders
#> 7 49     ae      aesoc  AE SOC       Immune system disorders
Shiny browser
The simplest way to explore your database is by running edc_viewer(), which launches a local Shiny application: