CiteSource can be used to examine topical overlap between databases. In this example, we are interested in the overlap among databases, both multi-disciplinary and subject-specific, for the literature on the harmful effects of gambling addiction. To assess this, we ran a narrow search for the term “gambling harm*” in the title and abstract fields of the following databases: Lens, Scopus, Criminal Justice Abstracts, PsycInfo and Medline (labeled "pubmed" in the code and referred to as PubMed below).
Users can import multiple RIS or BibTeX files into CiteSource, labeling each with source information such as the database or platform it came from.
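# list.files() returns paths in alphabetical order, so the cite_sources
# labels below must be given in that same order.
citation_files <- list.files(path = "topic_data", pattern = "\\.ris$", full.names = TRUE)
citations <- read_citations(citation_files,
                            cite_sources = c("crimjust", "lens", "psycinfo", "pubmed", "scopus"),
                            tag_naming = "best_guess")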
#> Import completed - with the following details:
#>                                       file cite_source cite_string cite_label
#> 1  20221207_gambling-harms_crimjust_41.ris    crimjust        <NA>       <NA>
#> 2      20221207_gambling-harms_lens_49.ris        lens        <NA>       <NA>
#> 3 20221207_gambling-harms_psycinfo_124.ris    psycinfo        <NA>       <NA>
#> 4   20221207_gambling-harms_pubmed_176.ris      pubmed        <NA>       <NA>
#> 5   20221207_gambling-harms_scopus_255.ris      scopus        <NA>       <NA>
#>   citations
#> 1        41
#> 2        49
#> 3       124
#> 4       176
#> 5       255

CiteSource merges duplicate records while preserving the cite_source metadata field, so the origin of each record is retained through deduplication.
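To make the idea of merging duplicates while keeping provenance concrete, here is a minimal base R sketch on toy data. It is not CiteSource's implementation: the `records` data frame and its DOIs are hypothetical, and matching on DOI alone is a simplifying assumption (real deduplication typically compares several bibliographic fields).

```r
# Toy records (hypothetical): the same DOI can appear in more than one source
records <- data.frame(
  doi = c("10.1/a", "10.1/a", "10.1/b", "10.1/c", "10.1/c", "10.1/c"),
  cite_source = c("scopus", "pubmed", "lens", "scopus", "pubmed", "psycinfo"),
  stringsAsFactors = FALSE
)

# Collapse duplicates to one row per DOI, keeping every contributing
# source as a sorted, comma-separated string
merged <- aggregate(
  cite_source ~ doi,
  data = records,
  FUN = function(x) paste(sort(unique(x)), collapse = ", ")
)

print(merged)
```

Each merged row still lists every database that returned the record, which is what makes the overlap and uniqueness comparisons in the rest of this example possible.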
A heatmap shows the total number of records from each database and the number of overlapping records for each pair. Here, Scopus yielded the highest number of records on gambling harms, and Criminal Justice Abstracts the fewest.
The percentage heatmap shows what share of each row’s records were also found in each column. Here, 67% of records in Scopus were also found in PubMed, while 97% of PubMed records were found in Scopus.
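The asymmetry in the percentages follows from dividing the same overlap count by different row totals. The sketch below works this through in base R on a toy membership matrix; the matrix and its values are hypothetical and are not the gambling harms data, and this is an illustration of the arithmetic rather than the package's own code.

```r
# Toy membership matrix (hypothetical): rows are deduplicated records,
# columns are sources, 1 means the record was found in that source
membership <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 0,
    1, 0, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("scopus", "pubmed", "lens"))
)

# Pairwise overlap counts; the diagonal holds each source's total
overlap_counts <- crossprod(membership)

# Row-wise percentages: the share of the row source's records that were
# also found in the column source (row i is divided by total i)
overlap_pct <- round(100 * overlap_counts / diag(overlap_counts))

print(overlap_counts)
print(overlap_pct)
```

In this toy data the two shared records are 50% of the four Scopus records but 100% of the two PubMed records, the same pattern as the 67% and 97% figures above.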
An upset plot provides more detail about shared and unique records across all source combinations. Scopus had the most unique records (n=35); Criminal Justice Abstracts had only one. Six records were found in every database.
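An upset plot is essentially a bar chart of how many records fall into each exact combination of sources. The base R sketch below computes those counts for a toy membership matrix; the data are hypothetical and the code only illustrates the counting, not how CiteSource draws the plot.

```r
# Toy membership matrix (hypothetical): rows are deduplicated records,
# columns are sources, 1 means the record was found in that source
membership <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 0,
    1, 0, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("scopus", "pubmed", "lens"))
)

# Label each record with the exact set of sources it was found in
combo <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row == 1], collapse = " & ")
})

# Count records per combination, largest first
combo_counts <- sort(table(combo), decreasing = TRUE)

# Combinations made up of a single source are that source's unique records
unique_only <- combo_counts[!grepl("&", names(combo_counts), fixed = TRUE)]

print(combo_counts)
print(unique_only)
```

Unlike the pairwise heatmaps, these counts are mutually exclusive: every record is counted in exactly one combination.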
To examine the records found in only one database, filter n_unique to rows where unique == TRUE, then join the result to unique_citations on duplicate_id to recover the full bibliographic data.
unique_lens <- n_unique |>
  dplyr::filter(cite_source == "lens", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_psycinfo <- n_unique |>
  dplyr::filter(cite_source == "psycinfo", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_pubmed <- n_unique |>
  dplyr::filter(cite_source == "pubmed", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_crimjust <- n_unique |>
  dplyr::filter(cite_source == "crimjust", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")
unique_scopus <- n_unique |>
  dplyr::filter(cite_source == "scopus", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")

The following code lists the journals that contributed the most Scopus records not found in any other database:
scopus_journals <- unique_scopus |>
  dplyr::group_by(journal) |>
  dplyr::summarise(count = dplyr::n()) |>
  dplyr::arrange(dplyr::desc(count))
knitr::kable(scopus_journals[1:10, ])

| journal | count |
|---|---|
| International Gambling Studies | 5 |
| Current Addiction Reports | 3 |
| International Journal of Mental Health and Addiction | 3 |
| Journal of Gambling Issues | 3 |
| Computers in Human Behavior | 2 |
| Journal of Public Health (Germany) | 2 |
| Applied Research in Quality of Life | 1 |
| Canadian Journal of Addiction | 1 |
| Cognition and Addiction: A Researcher’s Guide from Mechanisms Towards Interventions | 1 |
| Critical Public Health | 1 |
Publication year analysis can reveal whether a database’s unique contributions are concentrated in a particular time period. Here the unique records from Scopus are mostly recent, which may indicate more current coverage on gambling harms.
unique_scopus |>
  dplyr::group_by(year) |>
  dplyr::summarise(count = dplyr::n()) |>
  ggplot2::ggplot(ggplot2::aes(year, count, group = 1)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::xlab("Publication year") +
  ggplot2::ylab("Unique records")

We can also compare the publication years of unique records across databases using facet_wrap:
# unique_crimjust is not included in this comparison.
all_unique <- dplyr::bind_rows(unique_scopus, unique_lens, unique_pubmed, unique_psycinfo)
all_unique |>
  dplyr::group_by(cite_source.x, year) |>
  # .groups = "drop" returns an ungrouped result and avoids the regrouping message
  dplyr::summarise(count = dplyr::n(), .groups = "drop") |>
  ggplot2::ggplot(ggplot2::aes(year, count, group = 1)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::facet_wrap(~ cite_source.x) +
  ggplot2::xlab("Publication year") +
  ggplot2::ylab("Unique records")

CiteSource can export deduplicated results as CSV, RIS, or BibTeX files, and reimport them to resume analysis later.
#export_csv(unique_citations, filename = "unique-by-source.csv", separate = "cite_source")
#export_ris(unique_citations, filename = "unique_citations.ris", source_field = "DB", label_field = "N1")
#export_bib(unique_citations, filename = "unique_citations.bib", include = c("sources", "labels", "strings"))
#reimport_csv("unique-by-source.csv")

CiteSource can be used to evaluate how well different databases cover a specific topic. In this example, Scopus has the most content on gambling harms, including the most unique content and the best coverage for earlier years. Lens also contributes a proportionally large number of unique records, perhaps representing grey literature. Analysis of this kind can help determine which databases to include in an evidence synthesis search, or inform collection development decisions.