There’s a function vapour_read_attributes that returns the attributes as list of vectors.
pfile <- system.file("extdata", "point.shp", package = "vapour")
library(vapour)
vapour_read_attributes(pfile)
#> $a
#> [1] 1 2 3 4 5 6 7 8 9 10A higher level function somewhere else could wrap that function to return a data frame, but we don’t want that in vapour because it’s not aligned with the goals of being lightweight and reducing the level of interpretation applied. The data.frame function in R is actually a very primitive implememtation for data frames, so we avoid putting that interpretation on the data and leave that up to the developer / user.
mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour")
dat <- as.data.frame(vapour_read_attributes(mvfile), stringsAsFactors = FALSE)
dim(dat)
#> [1] 58 11
head(dat)
#> LOCAL_ID NAME POSTCODE PLAN_REF GAZ_DATE NOM_REG_NO
#> 1 100422 Caveside 7304 CPR5322 1970-01-01 947L
#> 2 100366 Weegena 7304 CPR5327 1970-01-01 1300M
#> 3 100337 Kimberley 7304 CPR5361 1970-01-01 1063T
#> 4 100308 Parkham 7304 CPR5327 1970-01-01 1179Y
#> 5 100263 Frankford 7275 CPR5142 1970-01-01 1003Q
#> 6 100311 Bridgenorth 7277 CPR5140 1970-01-01 925X
#> UFI CREATED_ON
#> 1 {4a5db4da-ca19-41a0-8dd4-c28a14bbee18} 2016-03-04 10:42:37
#> 2 {253b676e-2791-469c-ac5e-9cb3a95cc158} 2015-06-19 13:46:50
#> 3 {75f60a99-4c58-4d3e-911d-bbaa9a04164c} 2016-09-16 10:54:56
#> 4 {b008d456-4e80-4237-80f6-a26c03817e3c} 2014-06-06 16:50:22
#> 5 {953f4006-6397-4d03-af97-507eab170862} 2014-12-08 09:07:12
#> 6 {5cf8e2a3-631c-4d52-a79c-0ce475f76848} 2015-05-11 10:49:09
#> LIST_GUID SHAPE_AREA SHAPE_LEN
#> 1 {839edd46-01a7-4a45-9d97-499962fa952b} -9999 39785.88
#> 2 {de35ebd4-0ac0-4299-947d-87d07c69426a} -9999 31587.54
#> 3 {73ced9ad-ee9a-41d5-a5bc-c95c4ab948d9} -9999 35693.32
#> 4 {37f17d1f-d2a0-4b78-ba0d-e5f62f216658} -9999 67614.51
#> 5 {47f3a313-913f-4f83-8dc1-021907f9ee80} -9999 70140.78
#> 6 {425d01cc-223b-4348-b965-384a98fd7999} -9999 38156.70A low-level function will return a character vector of JSON, GML, KML or WKT.
vapour_read_geometry(pfile)[5:6] ## format = "WKB"
#> [[1]]
#> [1] 01 01 00 00 00 00 00 60 08 18 ad ec 3f 00 00 e0 9a ec 77 e2 3f
#>
#> [[2]]
#> [1] 01 01 00 00 00 00 00 c0 40 3c bb d0 3f 00 00 80 0e 30 25 d5 3f
vapour_read_geometry_text(pfile)[5:6] ## format = "json"
#> [[1]]
#> [1] "{ \"type\": \"Point\", \"coordinates\": [ 0.89612962375395, 0.577139189234003 ] }"
#>
#> [[2]]
#> [1] "{ \"type\": \"Point\", \"coordinates\": [ 0.261427939636633, 0.330394758377224 ] }"
vapour_read_geometry_text(pfile, textformat = "gml")[2]
#> [[1]]
#> [1] "<gml:Point><gml:coordinates>0.145755324047059,0.395469118840992</gml:coordinates></gml:Point>"
## don't do this with a non-longlat data set like cfile
vapour_read_geometry_text(pfile, textformat = "kml")[1:2]
#> [[1]]
#> [1] "<Point><coordinates>0.623376188334078,0.380098037654534</coordinates></Point>"
#>
#> [[2]]
#> [1] "<Point><coordinates>0.145755324047059,0.395469118840992</coordinates></Point>"
cfile <- system.file("extdata/sst_c.gpkg", package = "vapour")
str(vapour_read_geometry_text(cfile, textformat = "wkt")[1:2])
#> List of 2
#> $ : chr "MULTILINESTRING ((-16254.4210476553 -3269904.98849485,-48956.5880244328 -3282652.40200143,-82133.8545994558 -33"| __truncated__
#> $ : chr "MULTILINESTRING ((-18812.5003359828 -3784514.28779524,-40153.6937313319 -3789439.97415405,-37747.2881609 -38569"| __truncated__Combine these together to get a custom data set.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- as.data.frame(vapour_read_attributes(cfile), stringsAsFactors = FALSE) %>%
dplyr::mutate(wkt = vapour_read_geometry_text(cfile, textformat = "wkt"))
glimpse(dat)
#> Observations: 7
#> Variables: 3
#> $ level <chr> "275", "280", "285", "290", "295", "300", "305"
#> $ sst <dbl> 1.85, 6.85, 11.85, 16.85, 21.85, 26.85, 31.85
#> $ wkt <list> ["MULTILINESTRING ((-16254.4210476553 -3269904.98849485...There is a function vapour_read_extent to return a straightforward bounding box vector for every feature, so that we can flexibly build an index of a data set for later use.
mvfile <- system.file("extdata", "tab", "list_locality_postcode_meander_valley.tab", package="vapour")
str(head(vapour_read_extent(mvfile)))
#> List of 6
#> $ : num [1:4] 448353 457706 5386606 5397352
#> $ : num [1:4] 453544 459318 5403972 5412505
#> $ : num [1:4] 454840 461042 5411562 5417892
#> $ : num [1:4] 461505 476213 5410911 5424854
#> $ : num [1:4] 471573 483157 5417110 5424645
#> $ : num [1:4] 491638 494048 5417262 5419331This makes for a very lightweight summary data set that will scale to hundreds of large inputs.
Find raster info.
f <- system.file("extdata", "sst.tif", package = "vapour")
vapour_raster_info(f)
#> $geotransform
#> [1] 140.00000000 0.07000000 0.00000000 -39.99722207 0.00000000
#> [6] -0.07000389
#>
#> $dimXY
#> [1] 143 286
#>
#> $minmax
#> [1] 271.350 289.859
#>
#> $tilesXY
#> [1] 143 14
#>
#> $projection
#> [1] "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4326\"]]"
#>
#> $bands
#> [1] 1SQL is available for general GDAL vector data.
Note that each lower-level function accepts a sql argument, which sends a query to the GDAL library to be executed against the data source, this can create custom layers and so is independent of and ignores the layer argument. Note that the same sql statement can be passed to the geometry readers, so we get the matching sets of information. vapour_read_geometry will return NULL for each missing geometry if the statement doesn’t include geometry explicitly or implicitly, but vapour_read_geometry, vapour_read_geometry_text and vapour_read_extent all explicitly modify the statement “SELECT *“. (We are also assuming the data source hasn’t changed between accesses … let me know if this causes you problems!).
vapour_read_attributes(mvfile, sql = "SELECT NAME, PLAN_REF FROM list_locality_postcode_meander_valley WHERE POSTCODE = 7310")
#> $NAME
#> character(0)
#>
#> $PLAN_REF
#> character(0)
vapour_read_attributes(mvfile, sql = "SELECT NAME, PLAN_REF, FID FROM list_locality_postcode_meander_valley WHERE POSTCODE = 7306")
#> $NAME
#> [1] "Cradle Mountain" "Mount Roland" "Middlesex" "Lower Beulah"
#>
#> $PLAN_REF
#> [1] "CPR5363" "CPR5359" "CPR5362" "CPR5361"
#>
#> $FID
#> [1] 20 31 36 45Also note that FID is a special row number value, to be used a as general facility for selecting by structural row. This FID is driver-dependent, it can be 0- or 1-based, or completely arbitrary.
Variously, drivers (GDAL’s formats) are 0- or 1- based with the FID. Others (such as OSM) are arbitrary, and have non-sequential (and presumably persientent) FID values.
library(vapour)
file0 <- "list_locality_postcode_meander_valley.tab"
mvfile <- system.file("extdata/tab", file0, package="vapour")
layer <- gsub(".tab$", "", basename(mvfile))
## get the number of features by FID (DISTINCT should be redundant here)
vapour_read_attributes(mvfile, sql = sprintf("SELECT COUNT(DISTINCT FID) AS nfeatures FROM %s", layer))
#> $nfeatures
#> [1] 58
## note how TAB is 1-based
vapour_read_attributes(mvfile, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", layer))
#> $n
#> [1] 1
## but SHP is 0-based
shp <- system.file("extdata/point.shp", package="vapour")
vapour_read_attributes(shp, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", "point"))
#> $n
#> [1] 2See http://www.gdal.org/ogr_sql.html
There are many useful higher level operations that can be used with this. The simplest is the ability to use GDAL as a database-like connection to attribute tables.