Introduction

This workshop will provide you with an introduction to manipulating raster spatial data using the R package terra. terra and its predecessor, raster are widely used for spatial data manipulation in R. No prior experience of spatial data is assumed, but this short workshop will not have time to delve into some important aspects of spatial data such as projections.

We will cover:

  • What is a raster?
  • Creating and loading rasters in terra
  • Cropping and masking rasters
  • Exploring raster values, including simple statistical summaries and some raster math
  • Projecting
  • Multi-layer rasters

terra’s predecessor raster had many of the same functions as terra, and I will mention how functions have changed or been renamed which might be helpful for people migrating from using raster. In the words of the creator of both packages: terra is simpler, faster and can do more, so definitely switch to terra if you are still using raster!

raster users note: Boxes like this will highlight differences between the raster package and terra.

Resources:

Prerequisites

You will need the terra package installed, which can be done by running install.packages("terra"). If you have problems, there are more details about installing it here. I will also show how to make an interactive map with terra, which requires the leaflet package to be installed; install.packages("leaflet").

We can now load the terra package:

library(terra)
#> terra 1.7.78

Spatial data

There are basically two types of spatial data: vector and raster

Vector data

Can be points, lines or polygons. Useful for representing things like survey locations, rivers, and boundaries.

pts <- rbind(c(3.2,4), c(3,4.6), c(3.8,4.4), c(3.5,3.8), c(3.4,3.6), c(3.9,4.5)) |>
  vect()

lnes <- as.lines(vect(rbind(c(3,4.6), c(3.2,4), c(3.5,3.8)))) |>
  rbind(as.lines(vect(rbind(c(3.9, 4.5), c(3.8, 4.4), c(3.5,3.8), c(3.4,3.6)))))

lux <- vect(system.file("ex/lux.shp", package = "terra"))

par(mfrow = c(1,3))

plot(pts, axes = F, main = "Points")
plot(lnes, col = "blue", axes = F, main = "Lines")
plot(lux, "NAME_2", col = terrain.colors(12), las = 1, axes = F, main = "Polygons")


par(mfrow = c(1,3))

Raster data

Raster data is a grid of rectangles, normally called cells. Each cell has a value, making rasters useful for storing continuous data, such as temperature and elevation.

Here is an example of raster data, where each cell in the raster represents elevation.

elev <- system.file("ex/elev.tif", package = "terra") |>
  rast() |>
  aggregate(fact = 2)

plot(elev, las = 1, main = "Elevation map")

elev |>
  as.polygons(aggregate = FALSE, na.rm = FALSE) |>
  lines(col = "grey40", lwd = 0.2)

Getting started

Let’s start by creating our own raster. We can create rasters from scratch and load them from a file using the function rast(). We can create a simple raster by specifying the x and y limits for the raster and the resolution (how big each cell is).

raster users note: rast() replaces raster()

#create raster
ras <- rast(xmin = 0, xmax = 10, ymin = 0, ymax = 10, resolution = 2)

#see what we've created
ras
#> class       : SpatRaster 
#> dimensions  : 5, 5, 1  (nrow, ncol, nlyr)
#> resolution  : 2, 2  (x, y)
#> extent      : 0, 10, 0, 10  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84

The figure below shows what most of the terms above refer to. As you can see, you don’t need to use all the terms to define a raster. Couple of other points:

  • Every object in R has a class, such as data.frame and as you can see, rasters in terra are of class SpatRaster.
  • We did not tell rast() which coordinate reference system to use, so it defaults to using longitude latitude coordinates, also known as EPSG 4326.

raster users note: rasters are now SpatRaster class not RasterLayer

But what does the raster we created actually look like when plotted. Let’s see. All we need is plot()

plot(ras)

Why is there no plot? Because the raster we created is empty; there are no values associated with the the cells. Let’s assign some values to each cell in the raster and try again. First we will find out how many cells are in our raster using `ncell()

ncell(ras)
#> [1] 25

Ok, now we know this, we can give our raster cells values from 1 to 25:

values(ras) <- 1:25

plot(ras)

Now our raster has values, we get a plot! Each cell has an integer value between 1 and 25, with cell values increasing from left to right and top to bottom. So the values start being “filled up” in the top left, and finish in the bottom right.

Let’s have another look at our raster properties

ras
#> class       : SpatRaster 
#> dimensions  : 5, 5, 1  (nrow, ncol, nlyr)
#> resolution  : 2, 2  (x, y)
#> extent      : 0, 10, 0, 10  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source(s)   : memory
#> name        : lyr.1 
#> min value   :     1 
#> max value   :    25

We can now see a few extra pieces of information compared to last time:

  • sources(s): where is the data held on your computer? It says memory for this raster, indicating that the raster is in the computer memory. Rasters can also be held on your hard disk, in which case this will be the file name of the raster. We won’t go into details here, but terra is smart about loading data into memory, only doing so when it needs to and it thinks it will have enough space.
  • name: what is the raster called?
  • min value & max value: the minimum and maximum values in the raster

Ok, now we understand the basic structure of a raster, let’s load some data.

Exploring some real world data

Sea surface temperature can be measured by satellites, and the National Oceanic and Atmospheric Administration (NOAA) in the U.S. has been doing so since 1972! Daily sea surface temperature for the world from 1985 to the present is available via the NOAA Coral Reef Watch website.

We are going to explore sea surface temperature (SST) data for the Great Barrier Reef in Australia. All the data is in the data folder in the Github repository, and full details on how I got the data are in the data_prep.R script in that folder.

Load the data

We can use the same rast() command that we used to make our own raster to load data The data is stored in GeoTiff (.tif) format, which is widely used for raster data. You can get a full list of data formats that terra can read and write by running gdal(drivers = TRUE).

sst <- rast("../data/gbr_temp_2023_05_31.tif") #load the data

sst #view the properties
#> class       : SpatRaster 
#> dimensions  : 801, 1101, 1  (nrow, ncol, nlyr)
#> resolution  : 0.04999999, 0.05  (x, y)
#> extent      : 110, 165.05, -45, -4.95  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : gbr_temp_2023_05_31.tif 
#> name        : CRW_SST_2 
#> min value   :      9.59 
#> max value   :     30.46 
#> unit        :   Celsius

If the data loaded correctly, you should see the properties as shown above. We have all the properties I described earlier, but there are a couple of things worth noting:

  • the coordinate reference system (crs) is in lon/ lat, which means the unit of measurement is degrees. So the extent coordinates are in degrees, i.e. xmin = 110\(^\circ\), xmax = 165.05\(^\circ\), ymin = -45.00\(^\circ\) and ymax = 165.05\(^\circ\), and resolution is also in degrees, i.e 0.05\(^\circ\).
  • the source is the filename of the data we loaded. This means it is on disk, not in memory. We can double check by running inMemory(sst) which should return FALSE.
  • unit is Celsius. Most rasters don’t come with the units, so you will normally need to look at external metadata to find this information.

Plot the data

We want to plot our data to see what it looks like. This is the first thing you should do with almost any spatial data to do a quick check that it looks right

plot(sst)

The values in the legend seem about what we would expect for water temperatures, ranging from ~10 - 30\(^\circ\) Celsius.

The white areas of the plot are cells with NA values, which in this case are land; there are no sea surface temperature data for the land! We can see the land mass of Australia taking up much of the map and Papua New Guinea to the north.

We can set the NA values to a different colour if we want, which can be helpful if you want to see some NA cells that are getting lost against the other colours.

plot(sst, colNA = "black")

Note that every cell within our raster has to have a value.

Load and plot some vector data

To put our raster data in context, let’s plot the Great Barrier Reef Marine Park boundary (dowloaded from here). This is vector data. You might be used to handling vector data with the sf package, but terra can also be used for vector data manipulation. We can load a vector using vect()

raster users note: terra has its own methods for handling vector data unlike raster which used the sp package for vector data handling. Vector data in terra are SpatVector objects; different from sf objects.

#load the park boundary
gbr_boundary <- vect("../data/gbr_MPA_boundary.gpkg")

We can plot vector boundaries on top of raster data using lines():

#plot the SST data and the boundary on top
plot(sst)
lines(gbr_boundary) #this plots the vector lines over the raster data

We now see the outline of the Great Barrier Reef marine park on top of the sea surface temperature data we plotted before.

Crop and mask data

The data we have at the moment is for a much larger area than just the Great Barrier Reef marine park. To get only SST data for the marine park, we need to crop and mask the raster data using the marine park boundary.

Cropping means that we keep only the data inside the extent of the vector we are using. Mask means that all the data outside the vector is set to NA or some other value we specify. Let’s have a look how this works.

First let’s have a look at the extent of the marine park boundary. We can get the extent of a raster or vector using ext(). We need to convert this into a SpatVector object for plotting using vect(). We only need to do this for plotting; when we crop, we can just use the marine park boundary as the input.

raster users note: ext() replaces extent()

#get the marine park boundary extent and convert it into a SpatVector so we can plot it
gbr_boundary_extent <- ext(gbr_boundary) |>
  vect()

plot(sst)
lines(gbr_boundary)
lines(gbr_boundary_extent, col = "blue")
Cropping means we remove everything outside the extent (blue box) of our polygon. Masking sets all values outside our polygon to NA.

Cropping means we remove everything outside the extent (blue box) of our polygon. Masking sets all values outside our polygon to NA.

So when we crop, we get only the area within the blue box.

We crop using the crop() function, using the raster we want to crop as the first argument and the vector we are cropping with second.

#crop
sst_cropped <- crop(sst, gbr_boundary)

#plot
plot(sst_cropped)
lines(gbr_boundary)

Now we have cropped our raster, we can mask it so that we only have values for the area within the marine park boundary. We do this using mask:

#mask
sst_cropped_masked <- mask(sst_cropped, gbr_boundary)

#plot
plot(sst_cropped_masked)
lines(gbr_boundary)

Now we only see raster values for cells that are within the marine park boundary. But remember that the areas that are white, still have values, they are just NA values.

Often we want to crop and mask one after the other, and you can do this in one command using crop(sst, gbr_boundary, mask = TRUE).

For reference, here is a figure comparing what crop, mask and crop(mask = TRUE) do:

par(mfrow = c(2,2))

plot(sst, main = "Original raster")
lines(gbr_boundary)

plot(sst_cropped, main = "Cropped")
lines(gbr_boundary)

sst |>
  mask(gbr_boundary) |>
  plot(main = "Masked")
lines(gbr_boundary)

plot(crop(sst, gbr_boundary, mask = TRUE), main = "Cropped and masked")
lines(gbr_boundary)


par(mfrow = c(1,1))

Why not just mask rather than crop and mask? As we see in the figure above, this would mean we have a lot of area we are not interested in and even though most of those cells would be NA they take up space in our raster, so it is not efficient.

Raster values

Now we have cropped and masked our original raster to get only data within the area we are interested in, we can start exploring the values. terra has several functions that can help us do this easily.

Histogram

We can get a histogram of all the values in our raster using the hist() function, which is equivalent to the base R function.

hist(sst_cropped_masked)

The x-axis is in the units of our raster values; temperature in degrees Celsius in our case. The y-axis is frequency; how many cells in our raster have those values.

You can change the histogram using the same arguments you use with hist() in base R. For example, let’s increase the number of bars we have:

hist(sst_cropped_masked, breaks = 100)

Frequency table

We can get a frequency table of values in our raster using freq(). This is essentially the same information that is shown in the histogram in graphical format.

freq(sst_cropped_masked)
#>   layer value count
#> 1     1    20    31
#> 2     1    21   223
#> 3     1    22   396
#> 4     1    23   740
#> 5     1    24  2861
#> 6     1    25  2827
#> 7     1    26  2194
#> 8     1    27  2638
#> 9     1    28   444

The default is to round values to the nearest integer. To get more integers we can do:

freq(sst_cropped_masked, digits = 1) |>
  head() #this is a long table: just show the first few values
#>   layer value count
#> 1     1  19.8     1
#> 2     1  19.9     3
#> 3     1  20.0     6
#> 4     1  20.1     3
#> 5     1  20.2     3
#> 6     1  20.3     7

We can also check how many NA values are in our raster:

freq(sst_cropped_masked, value = NA)
#>   layer value count
#> 1     1    NA 50850

Statistics

We can use the same summary() command that is used in base R to get a summary of the statistical information for an entire raster.

summary(sst_cropped_masked)
#>    CRW_SST_2    
#>  Min.   :19.82  
#>  1st Qu.:24.25  
#>  Median :25.05  
#>  Mean   :25.19  
#>  3rd Qu.:26.49  
#>  Max.   :27.74  
#>  NA's   :50850

The statistics are for all raster values excluding NAs, e.g. the mean of all raster values excluding NA values.

To get individual statistical values, we need to use global(). For example to get the mean value of a raster:

global(sst_cropped_masked, "mean")
#>           mean
#> CRW_SST_2  NaN

Hmmm, what went wrong? The default is for global() to include all values in the raster, and since we have lots of NAs, the result is NaN. We need to set na.rm = TRUE to exclude the NA values.

global(sst_cropped_masked, "mean", na.rm = TRUE)
#>               mean
#> CRW_SST_2 25.18745

The object returned by global() is a data frame, so if you want just the value, you need to do:

sst_mean <- as.numeric(global(sst_cropped_masked, "mean", na.rm = TRUE))

sst_mean
#> [1] 25.18745

raster users note: global() replaces cellStats(), and the default is na.rm = FALSE, whereas the default for cellStats() was na.rm = TRUE.

Classifying

We might want to break our raster values into groups. For example, we could say that all temperatures that are below the mean temperature are classified as “cooler” and all temperatures above the mean are “warmer”. We can do this using the classify function.

raster users note: classify() replaces reclassify()

#first we create a matrix that will be used for the classification
# all values >= 0 and less than the mean become 1
# all values greater than the mean become 2
reclass_matrix <- c(0, sst_mean, 1,
                    sst_mean, Inf, 2) |>
  matrix(ncol = 3, byrow = TRUE)

#now we classify our raster using this matrix
sst_reclassed <- classify(sst_cropped_masked, reclass_matrix)

#plot the result
plot(sst_reclassed)

A gut check tells us this looks right; the warmer areas are in the north, nearer to the equator.

This plot is ok, but it would be better if the colours were more appropriate and the legend gave some useful information.

plot(sst_reclassed, col = c("blue", "red"), plg = list(legend = c("Cooler", "Warmer")), las = 1) #the las = 1 argument just rotates the y-axis labels so that they are horizontal

Raster math

The great thing about rasters is that you can do maths with them! For example, doing sst_reclassed + 1 just adds one to each raster value, and doing sst_reclassed*2 multiplies each raster value by two.

As an example, let’s convert our temperature data into Fahrenheit for our confused colleagues in the U.S. The conversion from Celsius to Fahrenheit is: Fahrenheit = (Celsius * 1.8) + 32.

#do the conversion
sst_fahrenheit <- (sst_cropped_masked*1.8) + 32

#plot our new raster
plot(sst_fahrenheit)

Projecting rasters

So far, our data have been in the same coordinate reference system (crs), but we often have to deal with spatial data that have different crs’s. But first, what is a crs? The following section provides a brief summary with links to further reading.

Coordinate reference systems

Click here to expand this section

If we want to know where things are in space, we need to use some kind of spatial reference. We can use our own arbitrary system, e.g. a sampling grid like the one shown in the photo below where we could define the location of each square relative to one of the corners. But normally we want to know where something is on Earth.

There are two types of coordinate systems we can use to represent locations on Earth:

  • Geographic coordinate systems (GCS): uses a 3-D surface (e.g. globe) to define locations on the Earth using longitude and latitude. The 3-D surface is normally an ellipsoid which approximates the Earth but cannot be exact since the Earth is not a smooth surface. This approximation of the Earth is called an datum and can be aligned with the true Earth (the geoid) in different ways depending on whether we are trying to get a good approximation at some particular location (local datum), e.g. California, or best approximation across the whole Earth (geocentric datum). The figure below (sourced from here) shows examples of these datums.

A commonly used geocentric datum is World Geodetic Survey for 1984 (WGS84). This is almost synonymous with the commonly used coordinate reference system EPSG 4326, but EPSG 4326 defines the latitude and longitude coordinates used on the WGS84 ellipsoid (Ref)

  • Projected coordinate system (projection): Unfortunately, we can’t carry around globes all the time when we want to look at a map, and doing calculations, such as distance and area, in 3-D is much more difficult than 2-D. So we need a way of getting from our 3-D globe to a piece of paper (or for younger people, a screen). To do this we need to ‘project’ from a GCS to a projected coordinate system, which is called projection because we can think of this as putting a light in the center of a globe and the light shines through the globe projecting features onto a piece of paper. The figure below (from QGIS docs) illustrates this, showing the 3 projection families:
a) Cylindrical; b) conical, and; c) planar projecions

  1. Cylindrical; b) conical, and; c) planar projecions

All projections are a compromise because they will always distort the shape, area, distances, and directions of the original GCS. A map that preserves shape is called conformal; one that preserves area is called equal-area; one that preserves distance is called equidistant; and one that preserves direction is called azimuthal. There are a huge number of projections available and choosing the correct one can be challenging. Often your choice will be to use the a local projection that is used by goverment or other authorities in the location you are working, e.g. for my work in Bermuda, much of the data was in the Bermuda 2000 National Grid projection (EPSG:3770). For global work where equal area is important, the Molleweide projection is commonly used.

We have only covered the basics of coordinate reference systems because it is a big topic to cover. The following resources are useful for understanding in more depth:

Projecting rasters (continued)

First let’s load some geospatial data that is in a different projection from the data we have been using so far. This is vector data for two marine protected areas (habitat protection zones) in the Great Barrief Reef marine park. This is just a small part of the Great Barrier Reef Marine Park zoning map. Surprisingly, habitat protection zones allow all types of fishing except trawling. You can view more details about the zones and the activities that are allowed here.

#load the zones data
zones <- vect("../data/gbr_habitat_protection_zones.gpkg")

#take a look
zones
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 2, 19  (geometries, attributes)
#>  extent      : 1488438, 2047821, -2572774, -1826852  (xmin, xmax, ymin, ymax)
#>  source      : gbr_habitat_protection_zones.gpkg
#>  coord. ref. : GDA94 / Australian Albers (EPSG:3577) 
#>  names       :        LEG_NAME          DATE SCHED_NO      SCHED_DESC PART_NO
#>  type        :           <chr>         <chr>    <int>           <chr>   <int>
#>  values      : Great Barrier ~ Gazetted 2004        1 Amalgamated Gr~       2
#>                Great Barrier ~ Gazetted 2004        1 Amalgamated Gr~       2
#>             TYPE       NAME GROUP_ID  CODE       PERMIT_DESC (and 9 more)
#>            <chr>      <chr>    <chr> <int>             <chr>             
#>  Habitat Protec~ HP-16-5126      HPZ    10   Arlington Reef~             
#>  Habitat Protec~ HP-21-5296      HPZ    10 Capricorn Channel

We can see the crs (coord. ref.) is GDA94 / Australian Albers (EPSG:3577). Out of interest, let’s see what happens if we try and plot the raster data and this new data:

plot(sst_cropped_masked)
lines(zones)

We see our raster, but no zones data because the zones are in a different crs.

Let’s get some more detail about the zones crs using the crs() function:

crs(zones)
#> [1] "PROJCRS[\"GDA94 / Australian Albers\",\n    BASEGEOGCRS[\"GDA94\",\n        DATUM[\"Geocentric Datum of Australia 1994\",\n            ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4283]],\n    CONVERSION[\"Australian Albers\",\n        METHOD[\"Albers Equal Area\",\n            ID[\"EPSG\",9822]],\n        PARAMETER[\"Latitude of false origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8821]],\n        PARAMETER[\"Longitude of false origin\",132,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8822]],\n        PARAMETER[\"Latitude of 1st standard parallel\",-18,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8823]],\n        PARAMETER[\"Latitude of 2nd standard parallel\",-36,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8824]],\n        PARAMETER[\"Easting at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8826]],\n        PARAMETER[\"Northing at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8827]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"Statistical analysis.\"],\n        AREA[\"Australia - Australian Capital Territory; New South Wales; Northern Territory; Queensland; South Australia; Tasmania; Western Australia; Victoria.\"],\n        BBOX[-43.7,112.85,-9.86,153.69]],\n    ID[\"EPSG\",3577]]"

That’s a lot of info, and not very easy to read. This is the well-known text (wkt) representation of the crs. We can see that GDA94 stands for Geocentric Datum of Australia 1994, but we can get a more basic version of the crs using:

crs(zones, describe = TRUE)
#>                        name authority code
#> 1 GDA94 / Australian Albers      EPSG 3577
#>                                                                                                                                                area
#> 1 Australia - Australian Capital Territory; New South Wales; Northern Territory; Queensland; South Australia; Tasmania; Western Australia; Victoria
#>                          extent
#> 1 112.85, 153.69, -43.70, -9.86

This gives just key information we often need to make sense of a crs. We can see that this crs is intended for use for Australia, including all states. We also get the EPSG code: 3577. EPSG stands for European Petroleum Survey Group who developed a database of ‘official’ crs’s, but now people use EPSG to refer to the database itself (see here for more on the history of EPSG).

Let’s check the crs of the data we have been using

crs(sst_cropped_masked, describe = TRUE)
#>     name authority code area         extent
#> 1 WGS 84      EPSG 4326 <NA> NA, NA, NA, NA

The raster is in a crs with EPSG code 4326. This is widely used for global data, and is ‘unprojected’ data, measured in degrees longitude and latitude.

We want to get our rasters into the same crs. In this case we will project the raster into the local crs. Projecting can be done using project() and providing the data we want to project and a crs. In this case we can use the crs of the zones object, which we already know we can get using crs(zones), but we could also just provide the EPSG code:

raster users note: project() replaces projectRaster()

sst_cropped_masked_projected <- project(sst_cropped_masked, crs(zones))

#as an alternative, we can use the EPSG that we went to project into like this:
#project(sst_cropped_masked, "epsg:3577")

It is really important to understand that when we project a raster, we create a new raster in the crs we are projecting into.

Let’s compare our original raster and the projected version:

sst_cropped_masked
#> class       : SpatRaster 
#> dimensions  : 276, 229, 1  (nrow, ncol, nlyr)
#> resolution  : 0.04999999, 0.05  (x, y)
#> extent      : 142.55, 154, -24.5, -10.7  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : gbr_temp_2023_05_31 
#> name        : CRW_SST_2 
#> min value   :     19.82 
#> max value   :     27.74 
#> unit        :   Celsius
sst_cropped_masked_projected
#> class       : SpatRaster 
#> dimensions  : 306, 258, 1  (nrow, ncol, nlyr)
#> resolution  : 5429.925, 5429.925  (x, y)
#> extent      : 1056277, 2457197, -2834019, -1172462  (xmin, xmax, ymin, ymax)
#> coord. ref. : GDA94 / Australian Albers (EPSG:3577) 
#> source(s)   : memory
#> name        : CRW_SST_2 
#> min value   :  19.82000 
#> max value   :  27.73348 
#> unit        :   Celsius

We can see that the projected raster is in the Australian Albers crs, and the extent, resolution, and dimensions have all changed. This crs has units of meters, which means the size of each raser cells is 5430m x 5430m.

Let’s compare how the two rasters now look.

par(mfrow = c(1,2)) #so we get plots beside each other: 1 row, 2 columns
plot(sst_cropped_masked)
plot(sst_cropped_masked_projected)


par(mfrow = c(1,1)) #revert back to one plot per row

The maps look very similar, but we can see that the shape is changed and the axes are now in different units, reflecting the projection.

Let’s have a look at a summary of the values for each raster.

summary(sst_cropped_masked)
#>    CRW_SST_2    
#>  Min.   :19.82  
#>  1st Qu.:24.25  
#>  Median :25.05  
#>  Mean   :25.19  
#>  3rd Qu.:26.49  
#>  Max.   :27.74  
#>  NA's   :50850
summary(sst_cropped_masked_projected)
#>    CRW_SST_2    
#>  Min.   :19.82  
#>  1st Qu.:24.27  
#>  Median :25.09  
#>  Mean   :25.21  
#>  3rd Qu.:26.53  
#>  Max.   :27.73  
#>  NA's   :66743

The statistics are similar, but not exactly the same. There are also a lot more NA values in the projected raster, largely because there are more cells: 63204 in the original and 78948 in the projected raster.

These changes in the raster are really important to think about when making decisions about projecting rasters. You should ideally project rasters as little as possible. It is normally better to project vector data into the same crs as a raster since vector can be transformed without loss of precision, but this will depend on the requirements of your project.

Click here for more details on projecting rasters

Let’s take a look at some different methods that can be used to estimate cell values when projecting a raster. We will start with a simple 3x3 raster and project it into Molleweide. The red lined grid in the maps is the original raster grid in vector format projected, and there are maps of the original raster projected using 3 common methods to obtain the new cells values: nearest neighbour, bilinear weighted mean, and area weighted average. Each method gives slightly different values.

proj_moll <- "ESRI:54009"

ras2 <- rast(xmin = 0, xmax = 1, ymin = 0, ymax = 1, res = 1/3) 

values(ras2) <- 1:ncell(ras2)

polys <- ras2 |>
  as.polygons(dissolve = F) |>
  project(proj_moll)

ras_near <- project(ras2, proj_moll, meth = "near")
ras_bilinear <- project(ras2, proj_moll)
ras_average <- project(ras2, proj_moll, method = "average")

pal <- hcl.colors(n = 50, "RdYlGn")

padding <- 0

plot_func <- function(input_raster, input_polygon, title_string){
plot(input_raster, type = "classes", col = pal, xlim = c(ext(input_polygon)$xmin-padding, ext(input_polygon)$xmax+padding), ylim = c(ext(input_polygon)$ymin-padding, ext(input_polygon)$ymax+padding), main = title_string, fun = function() lines(input_polygon, col = "red", lwd = 2))
}

par(mfrow = c(2,2))
plot(ras2, main = "Original raster", col = pal)
plot_func(ras_near, polys, "Nearest neighbour")
plot_func(ras_bilinear, polys, "Bilinear interpolation")
plot_func(ras_average, polys, "Area weighted mean")


par(mfrow = c(1,1))

terra defaults to using bilinear interpolation for continuous values. For categorical values it uses the nearest neighbour method because this avoids the resulting raster ending up with values that were not in the original. Other projection methods available are detailed in the help file ?project.

If you want to know why the projected rasters don’t align with the projected vector grid see this Q&A on Stackoverflow

Rasterlayers

A very useful feature of rasters is that they can have many layers. These layers often represent different time periods, such as days or months. Let’s look at a multi-layer raster which has similar sea surface temperature data to that which we have been looking at, but each layer represents the mean temperature in a month. We load the multi-layer raster in the same way as any other raster, using rast():

sst_monthly <- rast("../data/gbr_monthly_temp.tif")

sst_monthly
#> class       : SpatRaster 
#> dimensions  : 301, 261, 36  (nrow, ncol, nlyr)
#> resolution  : 0.05, 0.05  (x, y)
#> extent      : 142, 155.05, -25, -9.95  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : gbr_monthly_temp.tif 
#> names       : ym_202006, ym_202007, ym_202008, ym_202009, ym_202010, ym_202011, ... 
#> min values  :  19.80000,  18.51452,  19.00613,    21.096,  23.04516,    24.715, ... 
#> max values  :  29.47767,  28.65097,  28.51323,    28.864,  29.41645,    29.452, ...

raster users note: there is no need for stack() anymore. You can load multi-layer rasters (previously called RasterStack) using rast() and create multi-layer rasters using c(), e.g. c(first_raster_layer, second_raster_layer).

We can see that the raster has 36 layers. Let’s plot it to see what we get.

plot(sst_monthly)

We get one map for each raster layer, but not all the data, because you wouldn’t be able to see the maps if they were much smaller. We can plot a specific subset of the data using double square brackets [[]] to select a set of layers. For example, if we wanted the last 4 layers in the raster, which are layers 33 to 36:

plot(sst_monthly[[33:36]])

Happily, we can use the same functions we have learnt on this multi-layer raster. For example, we can convert all the rasters to Fahrenheit:

sst_monthly_fahrenheit <- (sst_monthly * 1.8) + 32

plot(sst_monthly_fahrenheit)

We can also crop and mask our data:

sst_monthly_cropped_masked <- crop(sst_monthly, gbr_boundary, mask = TRUE)

plot(sst_monthly_cropped_masked)

Zonal statistics

We might want to get data for certain areas of our raster, for example the 2 habitat protection zones we loaded earlier.

Let’s remind ourselves what these look like.

plot(zones)

Remember, the zones are in a different crs to our SST data. We will project the zones into the same crs as our raster, this way we do not change our raster data values.

zones_projected <- project(zones, crs(sst_monthly))

Let’s plot the SST data with the zones. We will plot just the first 4 rasters so we can see the zones more clearly. To get the zones on each of the raster maps, we have to add the lines() function that we have used before as an argument to the plot() function:

plot(sst_monthly_cropped_masked[[1:4]], fun = function()lines(zones_projected))

Now everything is in the same projection, we can get the monthly temperature in each of our zones. There is a function called zonal that provides us with summary statistics for any “zones” that we provide; in our case the two habitat protection zones. The default statistic that is returned is the mean value of raster cells in each zone, but you can get other statistics using the fun = argument, e.g. zonal(sst_monthly, zones_projected, fun = "sum"). See ?zonal for more details.

zones_mean_temp <- zonal(sst_monthly, zones_projected)

zones_mean_temp
#>   ym_202006 ym_202007 ym_202008 ym_202009 ym_202010 ym_202011 ym_202012
#> 1  25.05453  24.83376  24.74528  25.20465  26.03271  27.48973  28.85805
#> 2  23.45174  22.09304  21.80063  22.42598  23.92429  25.79377  27.19416
#>   ym_202101 ym_202102 ym_202103 ym_202104 ym_202105 ym_202106 ym_202107
#> 1  29.19292  29.29952  28.63410  27.81125  26.09139  25.24531  25.02013
#> 2  27.55193  27.93409  28.21229  27.03324  25.38172  23.46953  22.41834
#>   ym_202108 ym_202109 ym_202110 ym_202111 ym_202112 ym_202201 ym_202202
#> 1  24.94499  25.15853  27.02342  28.19143  29.33896  29.36065  29.37086
#> 2  22.41714  22.56429  24.54539  26.41297  27.85243  28.45053  28.05780
#>   ym_202203 ym_202204 ym_202205 ym_202206 ym_202207 ym_202208 ym_202209
#> 1  29.45380  28.45853  27.02372  25.72339  24.63791  24.83121  26.06624
#> 2  28.39454  27.26121  25.76123  24.17066  22.05470  21.20344  22.45951
#>   ym_202210 ym_202211 ym_202212 ym_202301 ym_202302 ym_202303 ym_202304
#> 1  27.66313  28.60063  29.12178  29.03708  29.14737  29.03370  29.02902
#> 2  24.46297  26.16570  27.11324  27.54031  28.11492  28.32108  27.63397
#>   ym_202305
#> 1  27.07731
#> 2  25.35108

We’ve got a data frame with each column showing the mean temperature in each zone for one month. However, this format is not great for plotting. Let’s make it into a “long” data frame and add the date as a column:

#transpose the data so that the rows become columns, then make it into a data frame
zones_mean_temp_t <- t(zones_mean_temp) |> 
as.data.frame() 

#add date column in proper format - use 1st day of month as the day for all values
zones_mean_temp_t$ym <- gsub("ym_", "",  rownames(zones_mean_temp_t)) |> 
paste0("01") |>
as.Date(format = "%Y%m%d")

We can now plot a time series of temperature in our zones. The most northerly zone (red line) has the highest temperatures because it is closer to the equator.

plot(zones_mean_temp_t$ym, zones_mean_temp_t$V1, type = "l", 
      ylim = c(min(zones_mean_temp_t$V2), max(zones_mean_temp_t$V1)), 
      col = "red", 
      xlab = "Date",
      ylab = "Sea surface temperature (Celsius)")
lines(zones_mean_temp_t$ym, zones_mean_temp_t$V2, type = "l", col = "blue")

Nice maps

Everyone likes a nice map. So far we have just created fairly basic maps using the plot function in terra. There are many great plotting packages such as tmap and ggplot that can be used to make maps, but we are not going to cover those here. The Geocomputation with R website is an excellent resource on map making and geospatial data in R in general.

We can make pretty good maps just using terra though! Here we will make a nicer map of our sea surface temperature raster data, using a more appropriate colour scheme and adding a scalebar. There are huge number of options for plotting, see the help file ?plot for details - make sure you choose the help file for terra!

plot(sst_cropped_masked, col = hcl.colors(50, palette = "RdYlBu", rev = TRUE), las = 1) #see ?hcl.colors for more info on colour palettes. rev= TRUE because we want to reverse the palette: red colours are the highest values and blue the lowest
lines(gbr_boundary)
sbar(d = 400, type = "bar", divs = 2, below = "km") #400km scale bar with 2 divisions and "km" written below

If you have the leaflet package installed (you need version > 2.1.1), then you can create interactive maps just by using plet() rather than plot():

plet(sst_cropped_masked, col = hcl.colors(50, palette = "RdYlBu", rev = TRUE), tiles = "Streets")
Click to see a range of colour palettes

The swatches below show all the colour scales available via hcl.colors(). The code is directly from ?hcl.colors().

## color swatches for HCL palettes
hcl.swatch <- function(type = NULL, n = 5, nrow = 11,
  border = if (n < 15) "black" else NA) {
    palette <- hcl.pals(type)
    cols <- sapply(palette, hcl.colors, n = n)
    ncol <- ncol(cols)
    nswatch <- min(ncol, nrow)

    par(mar = rep(0.1, 4),
        mfrow = c(1, min(5, ceiling(ncol/nrow))),
        pin = c(1, 0.5 * nswatch),
        cex = 0.7)

    while (length(palette)) {
        subset <- 1:min(nrow, ncol(cols))
        plot.new()
        plot.window(c(0, n), c(0, nrow + 1))
        text(0, rev(subset) + 0.1, palette[subset], adj = c(0, 0))
        y <- rep(subset, each = n)
        rect(rep(0:(n-1), n), rev(y), rep(1:n, n), rev(y) - 0.5,
             col = cols[, subset], border = border)
        palette <- palette[-subset]
        cols <- cols[, -subset, drop = FALSE]
    }

    par(mfrow = c(1, 1), mar = c(5.1, 4.1, 4.1, 2.1), cex = 1)
}
hcl.swatch("qualitative")

hcl.swatch("sequential")

hcl.swatch("diverging")

hcl.swatch("divergingx")