---
title: "Species invasions as a relational event process"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Species invasions as a relational event process}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
set.seed(2026)
```

```{r setup}
library(amorem)
```

## Why model invasions as relational events?

A non-native species establishing in a new country is a one-shot
directed event: source country `s` "invades" target country `r` at some
time `t`. Once `s` has reached `r`, the dyad `(s, r)` is removed from
the at-risk set — it cannot fire again. This is a textbook **relational
event process** with a one-shot `risk = "remove"` rule.

We sketch a minimal end-to-end workflow in `amorem`:

1. Build a synthetic invasion log with known drivers (distance and
   propagule pressure).
2. Use the `risk = "remove"` simulator with case-control sampling.
3. Recover the drivers from the case-control output with a smooth GAM.

## A synthetic invasion process

We use the 56 × 56 US-state distance matrix shipped with `amorem` as a
stand-in for inter-country distances and treat the states as the
country universe.

```{r data}
data("dist_matrix", package = "amorem")

states <- rownames(dist_matrix)
n_states <- length(states)

# Convert metres → log-units, scaled to a usable range.
dist_log <- log(dist_matrix / 1e5 + 1)
```

The true intensity for an invasion from `s` to `r` is

$$\lambda_{sr}(t) = Y_{sr}(t)\;\lambda_0\;\exp\!\bigl(\beta_d\,f(d_{sr})
+ \beta_h\,h_{sr}(t)\bigr),$$

where $Y_{sr}(t)$ is the at-risk indicator (1 until `s` invades `r`,
0 thereafter), $d_{sr}$ is the (log) distance, $f$ is the smooth shape
$\sin(-d/1.5)$, and $h_{sr}(t)$ is an endogenous "neighbour pressure"
term that grows whenever a country related to `s` has recently invaded
`r`.

For this short example we use just the distance term and a
half-life-decayed reciprocity proxy as the endogenous neighbour
pressure, then turn `risk = "remove"` on so each dyad fires at most
once.

```{r simulate}
true_dist_effect <- sin(-dist_log / 1.5)

cc <- simulate_relational_events(
  n_events            = 600,
  senders             = states,
  receivers           = states,
  contribution_logits = true_dist_effect,
  baseline_rate       = 1,
  allow_loops         = FALSE,
  n_controls          = 1,
  endogenous_stats    = "reciprocity_exp_decay",
  endogenous_effects  = c(reciprocity_exp_decay = 0.4),
  half_life           = 2,
  risk                = "remove"
)
nrow(cc)
head(cc)
```

`risk = "remove"` guarantees the realized dyads are all distinct:

```{r unique-check}
events_only <- cc[cc$event == 1L, ]
nrow(events_only)
any(duplicated(paste(events_only$sender, events_only$receiver)))
```

## Recovering the drivers

Attach the log-distance for every (sender, receiver) pair in the
case-control table, then fit a conditional logistic model via GAM on
the within-stratum differences.

```{r recovery, fig.alt = "recovered smooth distance effect"}
get_dist <- function(s, r) {
  dist_log[cbind(match(s, states), match(r, states))]
}
cc$dist_val <- mapply(get_dist, cc$sender, cc$receiver)

cases    <- cc[cc$event == 1L, ]
controls <- cc[cc$event == 0L, ]
cases    <- cases[order(cases$stratum), ]
controls <- controls[order(controls$stratum), ]

fit_df <- data.frame(
  y          = 1,
  delta_dist = cases$dist_val - controls$dist_val,
  delta_r    = cases$reciprocity_exp_decay - controls$reciprocity_exp_decay
)

if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  fit <- gam(y ~ s(delta_dist) + delta_r - 1, family = binomial, data = fit_df)
  summary(fit)

  x_grid <- seq(min(fit_df$delta_dist), max(fit_df$delta_dist), length.out = 200)
  pred   <- predict(fit,
                    newdata = data.frame(delta_dist = x_grid, delta_r = 0),
                    type = "link")
  plot(x_grid, pred, type = "l", lwd = 2,
       xlab = expression(Delta ~ "log-distance"),
       ylab = "estimated effect",
       main = "GAM smooth for distance (event - control)")
  abline(h = 0, lty = 2, col = "grey60")
}
```

The smooth recovers the underlying $\sin(-d/1.5)$ pattern, and the
linear coefficient on `delta_r` should be close to the true 0.4 used in
the simulation.

## Where to go from here

- Real data: replace `dist_matrix` with a country-by-country distance
  table and the simulated event log with a curated invasion record.
- Additional covariates: trade flows or climatic similarity enter via
  `contribution_logits` or sender/receiver covariates.
- Time-varying global covariates (policy regimes, seasonality) attach
  through `global_covariates` / `global_effects`.
- For large state spaces or high invasion rates, switch to the
  approximate `method = "tau_leap"` simulator with a small `tau` to
  cut wall-clock cost; the `risk = "remove"` rule is honoured under
  both algorithms.
