---
title: "Getting started with scholidonline"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with scholidonline}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

is_pkgdown <- identical(Sys.getenv("IN_PKGDOWN"), "true")
```

`scholidonline` provides online utilities for working with scholarly
identifiers. It builds on
[`scholid`](https://thomas-rauter.github.io/scholid/)
for structural detection and
normalization, and adds registry-backed functionality such as:

- Existence checks
- Identifier conversion across systems
- Metadata retrieval
- Retrieval of directly linked identifiers

This vignette introduces the interface and typical workflows when
working with registry-connected identifier data.

## Installation

```{r installation, eval = FALSE}
install.packages("scholidonline")
```


## Interface

`scholidonline` exposes a small set of user-facing functions:

- `scholidonline_types()`
- `scholidonline_capabilities()`
- `id_exists()`
- `id_convert()`
- `id_metadata()`
- `id_links()`


## Supported identifier types

You can inspect which identifier types are supported:

```{r scholidonline_types, eval = TRUE}
scholidonline::scholidonline_types()
```


## Inspecting capabilities

`scholidonline` is registry-driven. You can inspect all supported
operations, conversions, and providers:

```{r scholidonline capabilities, eval = TRUE}
out <- scholidonline::scholidonline_capabilities()
knitr::kable(out)
```

Not every supported type offers every operation. For example, ROR and
UniProt support existence checks and metadata, while DOI and PMID also
support linked identifiers and conversion. To inspect one type:

```{r capabilities by type, eval = TRUE}
out <- scholidonline::scholidonline_capabilities()
knitr::kable(subset(out, type == "openalex"))
```
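
In a script, it can help to check the capabilities table before calling an operation. The sketch below uses a small stand-in table with made-up rows. Only the `type` column is taken from the example above; the `operation` and `provider` column names and the helper `supports()` are hypothetical. Check the real column names with `names(scholidonline::scholidonline_capabilities())`.

```r
# Stand-in for the capabilities table (sample rows, not real package output).
# Only `type` is known from the vignette; `operation` and `provider` are
# assumed column names.
caps <- data.frame(
  type      = c("doi", "doi", "ror"),
  operation = c("exists", "convert", "exists"),
  provider  = c("crossref", "crossref", "ror"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if any row matches the type/operation pair.
supports <- function(caps, type, operation) {
  any(caps$type == type & caps$operation == operation)
}

supports(caps, "doi", "convert")
supports(caps, "ror", "convert")
```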


## Existence checks: `id_exists()`

`id_exists()` verifies whether identifiers exist in their respective
registries.

```{r id_exists 1, eval = is_pkgdown}
scholidonline::id_exists(
  x    = "10.1000/182",
  type = "doi"
)
```

If `type` is omitted, the type is inferred automatically for each element:

```{r id_exists 2, eval = is_pkgdown}
scholidonline::id_exists(
  x = c(
    "10.1000/182",
    "12345678"
  )
)
```

Return values:

- `TRUE`  → confirmed by registry
- `FALSE` → confirmed not found
- `NA`    → cannot be classified or normalized
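
Because the result is three-valued, code that uses it should treat `NA` separately rather than as `FALSE`. A minimal sketch in base R, using a hand-written logical vector in place of a real `id_exists()` result (the labels are illustrative, not part of the package):

```r
# Stand-in for an id_exists() result: found, not found, unclassifiable.
res <- c(TRUE, FALSE, NA)

# Map each state to a label; test for NA first so it is not lost.
status <- ifelse(
  is.na(res),
  "unclassified",
  ifelse(res, "found", "not found")
)

status

# Count each state, keeping NA visible.
table(res, useNA = "ifany")
```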


## Conversion: `id_convert()`

Many scholarly identifiers are cross-linked across systems.

Common examples:

- PMID → DOI
- PMCID → PMID
- DOI → PMCID

```{r conversion 1, eval = is_pkgdown}
scholidonline::id_convert(
  x    = "12345678",
  from = "pmid",
  to   = "doi"
)
```

If `from = NULL`, the source type is inferred per element:

```{r conversion 2, eval = is_pkgdown}
scholidonline::id_convert(
  x = c("12345678", "PMC1234567"),
  to = "doi"
)
```

Unresolvable mappings return `NA_character_`.
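
To follow up on identifiers that could not be converted, index the input by the missing positions. The sketch below assumes the result is a character vector aligned with the input and uses a hand-written stand-in for it; the values are made up for illustration.

```r
ids <- c("12345678", "PMC1234567")

# Stand-in for an id_convert() result: the second mapping is unresolved.
converted <- c("10.1000/182", NA_character_)

# Inputs that did not resolve, for logging or a retry with another route.
unresolved <- ids[is.na(converted)]

# Keep only the successfully converted pairs.
resolved <- data.frame(
  input  = ids[!is.na(converted)],
  output = converted[!is.na(converted)],
  stringsAsFactors = FALSE
)

unresolved
resolved
```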


## Metadata retrieval: `id_metadata()`

`id_metadata()` retrieves harmonized metadata from external registries.

```{r metadata 1, eval = is_pkgdown}
out <- scholidonline::id_metadata(
  x    = "10.1038/nature12373",
  type = "doi"
)
knitr::kable(out)
```

Metadata completeness depends on the registry. For NCBI accession types such
as BioProject, `title` is the short registry title from Entrez ESummary, not
the full project description on the NCBI website; use `url` for the complete
record.

You can restrict returned fields:

```{r metadata 2, eval = is_pkgdown}
out <- scholidonline::id_metadata(
  x = "10.1038/nature12373",
  type = "doi",
  fields = c("title", "year", "doi")
)
knitr::kable(out)
```


## Linked identifiers: `id_links()`

`id_links()` returns related identifiers discovered via registry
queries. It returns an empty table when the provider exposes no linked
identifiers for that record.

```{r id_links 1, eval = is_pkgdown}
out <- scholidonline::id_links(
  x    = "PMC1234567",
  type = "pmcid"
)
knitr::kable(out)
```

The result is a long-format `data.frame` with one row per link. When no links
are found, the same columns are returned with zero rows.
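
Because the zero-row case keeps the same columns, downstream code can handle both cases uniformly. The sketch below uses a stand-in table; the column names `query`, `linked_type`, and `linked_value` are hypothetical, and the rows are made-up sample data rather than real registry links.

```r
# Stand-in for an id_links() result (assumed column names, sample rows).
links <- data.frame(
  query        = c("PMC1234567", "PMC1234567"),
  linked_type  = c("pmid", "doi"),
  linked_value = c("12345678", "10.1000/182"),
  stringsAsFactors = FALSE
)

# Guard for the empty case before further processing.
has_links <- nrow(links) > 0

# Group linked values by their identifier type.
by_type <- split(links$linked_value, links$linked_type)

# A zero-row table with the same columns, as described for "no links".
empty <- links[0, ]

by_type
names(empty)
```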


## Working with mixed data

A common workflow for messy identifier columns:

1. Detect identifier types (via `scholid`)
2. Normalize identifiers
3. Check registry existence

Example:

```{r mixed data, eval = is_pkgdown}
x <- c(
  "https://doi.org/10.1000/182",
  "PMCID: PMC1234567",
  "not an id"
)

types <- scholid::detect_scholid_type(x)

x_norm <- rep(NA_character_, length(x))

for (i in seq_along(x)) {
  if (is.na(types[i])) {
    next
  }

  x_norm[i] <- scholid::normalize_scholid(
    x = x[i],
    type = types[i]
  )
}

types
x_norm
```
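
The loop above can also be wrapped in a small reusable helper. This is one possible sketch, not part of either package: `normalize_known()` is a hypothetical name, and the normalizer is passed in as a function so that the example runs without network access or `scholid` installed.

```r
# Hypothetical helper: normalize only the elements whose type was detected,
# leaving NA for the rest.
normalize_known <- function(x, types, normalize_fn) {
  out <- rep(NA_character_, length(x))
  idx <- which(!is.na(types))
  # vapply() guarantees a character result, even when idx is empty.
  out[idx] <- vapply(
    idx,
    function(i) normalize_fn(x[i], types[i]),
    character(1)
  )
  out
}

# Toy normalizer used only to demonstrate the helper.
toy_normalize <- function(x, type) paste0(type, ":", tolower(x))

res <- normalize_known(
  x            = c("ABC", "junk", "DEF"),
  types        = c("doi", NA, "pmid"),
  normalize_fn = toy_normalize
)
res
```

With `scholid`, the normalizer could be
`function(x, type) scholid::normalize_scholid(x = x, type = type)`.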

`id_exists(x)` below uses the default `type = "auto"`, so each element is
classified and normalized automatically. You do not need to pass a vector
`type` argument.

```{r mixed data exists, eval = is_pkgdown}
scholidonline::id_exists(x)
```


## Provider selection

Most functions accept a `provider` argument.

```{r provider selection, eval = is_pkgdown}
scholidonline::id_exists(
  x        = "10.1000/182",
  type     = "doi",
  provider = "crossref"
)

scholidonline::id_exists(
  x        = "10.1000/182",
  type     = "doi",
  provider = "doi.org"
)
```

If `provider = "auto"` (the default), the package selects a registry for the
given identifier type and operation, and may fall back to another provider.
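
If you prefer explicit control over the order in which providers are tried, a manual fallback can be written around any check function. The sketch below is an illustration only and does not describe how `provider = "auto"` is implemented; `try_providers()` and `stub_check()` are hypothetical names, and the stub replaces a real registry call so the example runs offline.

```r
# Hypothetical helper: try providers in order and return the first
# definitive answer (TRUE or FALSE). Errors and NA move on to the next one.
try_providers <- function(x, providers, check_fn) {
  for (p in providers) {
    res <- tryCatch(check_fn(x, p), error = function(e) NA)
    if (!is.na(res)) {
      return(list(provider = p, result = res))
    }
  }
  list(provider = NA_character_, result = NA)
}

# Stub standing in for a registry call: the first provider fails.
stub_check <- function(x, provider) {
  if (provider == "first") stop("service unavailable")
  TRUE
}

out <- try_providers("10.1000/182", c("first", "second"), stub_check)
out$provider
out$result
```

With the package, `check_fn` could be
`function(x, p) scholidonline::id_exists(x = x, type = "doi", provider = p)`.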

Available providers depend on the identifier type and operation.
Use `scholidonline_capabilities()` to inspect them.

The chosen provider can affect:

- Response speed
- Metadata richness
- Crosswalk coverage


## Scope of scholidonline

`scholidonline` focuses on identifier types with stable public registries and
accessible APIs. The package supports online operations for:

- **Bibliographic core:** DOI, PMID, PMCID, arXiv
- **Graph and people:** OpenAlex, ORCID
- **Organizations:** ROR
- **Life science:** UniProt; NCBI accessions (GEO, BioProject, RefSeq, SRA,
  genome assembly)

As noted under "Inspecting capabilities", operation support varies by type.
Use `scholidonline_capabilities()` as the authoritative summary.

Many other identifier types (e.g., ISBN, ISSN, bibcode, RRID) are
structurally supported by `scholid`, but are not covered by `scholidonline`
because they lack a stable, open registry API suitable for this package.