---
title: "Network analysis"
author: "Gilles Colling"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Network analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
has_sf <- requireNamespace("sf", quietly = TRUE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = has_sf
)
```

A line layer becomes a routable graph when its segments are joined at their
endpoints and weighted by a traversal cost. vectra builds that graph once and
holds it resident in a `vectra_network` object, the network counterpart of the
small resident `sf` layer the streaming verbs compare against. Origins then
stream past the resident graph: each batch of start points is routed in native
C, so a national set of origins flows past a fixed memory budget while the graph
stays in memory.

The graph and the shortest-path solver are pure C, a binary-heap Dijkstra over a
compressed adjacency, with no `igraph` dependency. The solver parallelises over a
batch of origins with OpenMP.

```{r libs, message = FALSE}
library(vectra)
library(sf)
```

## Building a network

`spatial_network()` takes a line layer and returns a `vectra_network`. Endpoints
are snapped to shared nodes (exactly coincident by default, or within
`tolerance` CRS units), and each edge is weighted by its geometry length unless a
`weight` column names another cost.

The examples use a small grid of unit streets.

```{r build}
mk <- function(x1, y1, x2, y2)
  st_linestring(rbind(c(x1, y1), c(x2, y2)))

streets <- st_sfc(
  mk(0, 0, 1, 0), mk(1, 0, 2, 0), mk(2, 0, 3, 0),   # bottom row
  mk(0, 1, 1, 1), mk(1, 1, 2, 1), mk(2, 1, 3, 1),   # middle row
  mk(0, 2, 1, 2), mk(1, 2, 2, 2), mk(2, 2, 3, 2),   # top row
  mk(0, 0, 0, 1), mk(0, 1, 0, 2),                   # left verticals
  mk(1, 0, 1, 1), mk(1, 1, 1, 2),
  mk(2, 0, 2, 1), mk(2, 1, 2, 2),
  mk(3, 0, 3, 1), mk(3, 1, 3, 2))                   # right verticals

net <- spatial_network(streets)
net
```

## Shortest routes

`spatial_route()` streams a layer of origin points past the network and returns
the shortest path from each origin to one or more destinations. With
`geometry = FALSE` it returns only the cost, so a set of destinations per origin
is an origin-destination cost matrix in long form.

```{r route-cost}
f <- tempfile(fileext = ".vtr")
write_vtr(data.frame(id = 1L, x = 0, y = 0), f)

dest <- st_sfc(st_point(c(3, 2)))

tbl(f) |>
  spatial_route(net, to = dest, coords = c("x", "y"), geometry = FALSE) |>
  collect()
```

The cost is the five unit steps from the bottom-left corner to the top-right.
With `geometry = TRUE` (the default) each row also carries the route line, ready
to materialise with `collect_sf()`:

```{r route-geom}
tbl(f) |>
  spatial_route(net, to = dest, coords = c("x", "y")) |>
  collect_sf()
```

Several destinations, named through `to_id`, give one row per origin and
destination, the cost matrix in long form:

```{r od-matrix}
g <- tempfile(fileext = ".vtr")
write_vtr(data.frame(id = 1:2, x = c(0, 3), y = c(0, 0)), g)

dests <- st_sf(
  name = c("top_left", "top_right"),
  geometry = st_sfc(st_point(c(0, 2)), st_point(c(3, 2))))

tbl(g) |>
  spatial_route(net, to = dests, to_id = "name", geometry = FALSE,
                coords = c("x", "y")) |>
  collect()
```

Each origin carries its own attributes (`id` here) onto every route row.
Unreachable pairs return an infinite cost rather than dropping the row.

## Service areas and isochrones

`spatial_service_area()` returns, per origin, the part of the network reachable
within a cost budget. A vector of budgets returns nested travel-cost bands, one
row per origin and band. The `output` argument selects the shape: the reachable
nodes as a multipoint, the reachable edges as lines, or their convex hull as a
service-area polygon.

```{r service-nodes}
tbl(f) |>
  spatial_service_area(net, cost = c(1, 2), output = "nodes",
                       coords = c("x", "y")) |>
  collect_sf()
```

The first band reaches the start node and its two immediate neighbours; the
second band reaches everything within two steps. Asking for `output = "polygon"`
wraps each band in its hull, the usual drive-time catchment:

```{r service-polygon}
tbl(f) |>
  spatial_service_area(net, cost = 2, output = "polygon",
                       coords = c("x", "y")) |>
  collect_sf() |>
  st_area()
```

## Weighted and directed networks

By default every edge is two-way and weighted by length. A `weight` column sets
a different traversal cost (a travel time, a toll), and `directed = TRUE` makes
edges one-way. The `direction` column then carries one-way codes (`"B"`
two-way, `"FT"` along the digitised direction, `"TF"` against it, `"N"` closed),
and `weight_to` gives the reverse cost on a two-way edge.

```{r weighted}
streets_df <- st_sf(
  cost = runif(length(streets), 1, 3),
  geometry = streets)

wnet <- spatial_network(streets_df, weight = "cost")

tbl(f) |>
  spatial_route(wnet, to = dest, coords = c("x", "y"), geometry = FALSE) |>
  collect()
```

The route now minimises the summed `cost` weight instead of the step count.

## The streaming model

The graph is the resident budget: it is built once and held in the
`vectra_network` object for the life of the queries against it. The origin layer
is what streams. Each batch of origins is solved against the resident graph with
one Dijkstra per origin, run in parallel across the batch, so the query side
scales by streaming while memory tracks the graph plus one batch. This is the
network analogue of the resident-`y` pattern the
[streaming spatial verbs](streaming-spatial.html) use for a small locator layer.

```{r cleanup, include = FALSE}
unlink(c(f, g))
```
