Working with vectors in R

Today we’ll build on the introductory discussion we were having about vector operations and the sf package. We’ll build a few vectors from scratch and then move on to explore a broader suite of common vector operations implemented by the sf package.

A reminder about vector geometries in R

You’ll recall that the sf package organizes the different types of vectors (e.g., points, lines, polygons) in to a hierarchical structure organized by complexity of geometries contained within an R object. For example, a single point will be a POINT, several points will be a MULTIPOINT, and an object containing points, polygons, and lines will be a GEOMETRYCOLLECTION. We need to be aware of what types of geometries and objects we have becasue some operations are restricted to particular types of objects or geometries as indicated by errors that read:

Error in UseMethod("st_crs<-") : no applicable method for 'st_crs<-' applied to an object of class "c('XY', 'POINT', 'sfg')"

which indicates that the function (st_crs) does not have a method defined for the type of object it’s being applied to. Note that the function inside UseMethod will be replaced by whichever function you’re attempting to apply to your object and the object of class component will vary based on the function and the object class.

type description
POINT single point geometry
MULTIPOINT set of points
LINESTRING single linestring (two or more points connected by straight lines)
MULTILINESTRING set of linestrings
POLYGON exterior ring with zero or more inner rings, denoting holes
MULTIPOLYGON set of polygons
GEOMETRYCOLLECTION set of the geometries above

As is, these geometries are built on vertices with coordinates that are based on the Cartesian plane and thus are “spatial”, but not georeferenced or geographic. In order to convert these sf geometries to a geogrphic object (i.e., one with a CRS and whose location depicts and actual spot on the earth’s surface), we use st_sfc() to create a simple feature geography list column (see ?st_sfc for an example of this workflow).

Conventions in sf and the tidyverse

One of the benefits of the sf package is that it is designed to interface with the tidyverse suite of packages. One of the appealing parts of working with tidyverse packages is that they share an underlying philosophy, data structure, and grammar. This can make life a lot easier as you move from getting your data into R, constructing a set of covariates (including those derived from spatial data), analyzing, and plotting (or mapping) those data. People have strong opinions about the tidyverse, but I find it to be an (eventually) useful way for people to gain some intuition for working in R. One of the grammatical conventions used in the tidyverse suite of packages is the use _ in function calls (this is known as snake case should you ever need to know that at a dinner party). The _ is typically used to separate the verb in a function call from its predicate. For example, bind_rows() in the dplyr package “binds” (the verb) rows (the predicate) wheras bind_cols() binds columns. For the sf package it’s slightly different in that most of the functions begin with a st_ or sf_ prefix to indicate that the function is designed to work on spatial objects followed by a word (or words) describing what the operation does (e.g., st_centroid() returns a MULTIPOINT object with each point located at the centroid of a polygon). We can classify these functions based on what they are expected to return:

  • Predicates: evaluate a logical statement asserting that a property is TRUE

  • Measures: return a numeric value with units based on the units of the CRS

  • Transformations: create new geometries based on input geometries.

We can also distinguish these functions based on how many geometries that operate on:

  • Unary: operate on a single geometry at a time (meaning that if you have a MULTI* object the function works on each geometry individually)
  • Binary: operate on pairs of geometries
  • n-ary: operate on sets of geometries

We’ll focus on the unary operators for now, but the binary and n-ary operators will become more important as we move to develop databases for spatial analysis.

Unary predicates

Unary predicates are helpful ‘checks’ to make sure the object you are working with has the properties you might expect. Are the geometries valid? Is the data projected? Because we are asking a set of TRUE/FALSE questions, these functions are specified as st_is_:

predicate asks…
simple is the geometry self-intersecting (i.e., simple)?
valid is the geometry valid?
empty is the geometry column of an object empty?
longlat does the object have geographic coordinates? (FALSE if coords are projected, NA if no crs)
is(geometry, class) is the geometry of a particular class?
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.0/Resources/library/sf/shape/nc.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
st_is_longlat(nc)
## [1] TRUE
st_is_valid(nc)
##   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Unary measures

Measures return a quantity that describes the geometry

measure returns
dimension 0 for points, 1 for linear, 2 for polygons, possibly NA for empty geometries
area the area of a geometry
length the length of a linear geometry

distance is a binary measure that returns the distance between pairs of geometries either within a single object or between features in multiple objects

head(st_area(nc))
## Units: [m^2]
## [1] 1137107793  610916077 1423145355  694378925 1520366979  967504822
st_distance(nc)[1:5,1:5]
## Units: [m]
##          [,1]     [,2]     [,3]      [,4]      [,5]
## [1,]      0.0      0.0  25591.8 439493.26 299049.94
## [2,]      0.0      0.0      0.0 408416.68 268284.09
## [3,]  25591.8      0.0      0.0 366648.94 226461.23
## [4,] 439493.3 408416.7 366648.9      0.00  67066.43
## [5,] 299049.9 268284.1 226461.2  67066.43      0.00

Unary transformers

Unary transformations work on a per object basis and return a new geometry for each geometry. These are a few of the most common, we’ll encounter a few more as the semester continues.

transformer returns a geometry …
centroid of type POINT with the geometry’s centroid
buffer that is this larger (or smaller) than the input geometry, depending on the buffer size
jitter that was moved in space a certain amount, using a bivariate uniform distribution
boundary with the boundary of the input geometry
convex_hull that forms the convex hull of the input geometry
line_merge after merging connecting LINESTRING elements of a MULTILINESTRING into longer LINESTRINGs.
make_valid that is valid
node with added nodes to linear geometries at intersections without a node; only works on individual linear geometries
point_on_surface with a (arbitrary) point on a surface
polygonize of type polygon, created from lines that form a closed ring
segmentize a (linear) geometry with nodes at a given density or minimal distance
simplify simplified by removing vertices/nodes (lines or polygons)
split that has been split with a splitting linestring
transform transformed or convert to a new coordinate reference system (last week)
collection_extract with subgeometries from a GEOMETRYCOLLECTION of a particular type
cast that is converted to another type
plot(st_geometry(nc))
plot(st_geometry(st_centroid(nc)), add=TRUE, col='red')
## Warning in st_centroid.sf(nc): st_centroid assumes attributes are constant over
## geometries of x

Using sf and the tidyverse

As I mentioned, one of the benefits of using the sf package is that commands from the other tidyverse package have defined methods for spatial objects. The dplyr package has a ton of helpful functions for maniputlating data in R. For example, we might select a single row from a shapefile based on the value of its attributes by using the dplyr::filter() command:

durham.cty <- nc %>% 
  filter(., NAME == "Durham")
## We can also use the bracket approach
durham.cty2 <- nc[nc$NAME == "Durham",]

plot(st_geometry(nc))
plot(st_geometry(durham.cty), add=TRUE, col="blue")

Or perhaps we only want a few of the columns in the dataset (because shapefiles always have lots of extra stuff). We can use dplyr::select() to choose columns by name:

nc.select <- nc %>% 
  select(., c("CNTY_ID", "NAME", "FIPS"))
plot(nc.select)

Notice that the geometries are sticky, this will be important later

Previous
Next