Introduction to Global Fishing Watch data in R

Author

Jason Flower

This workshop will provide you with an introduction to downloading and visualizing Global Fishing Watch data in R. Global Fishing Watch provides data on apparent fishing effort, vessel tracks and events, and more; see the website for details.

Prerequisites

We will use the gfwr package to access Global Fishing Watch data. You will also need packages for manipulating and visualizing the data. You can install all these packages using the following code:

install.packages(c("terra", "sf", "dplyr", "tidyr", "ggplot2", "rnaturalearth", "usethis", "remotes"))

remotes::install_github("GlobalFishingWatch/gfwr")

If you have used gfwr before, make sure you are using v2 or higher as there were some major changes.

If you have problems installing terra, there are more details about installing it here.

Global Fishing Watch API token

To access Global Fishing Watch data, you will need an API token stored on your computer. The token is like a code that you need to able to access the data. This is free, but you need to register for a Global Fishing Watch account if you don’t already have one. Then you can request an API access token. Here is an example of how I filled in the details requested:

Once you have created the token, you need to copy it into your .Renviron file. You can do this by:

  1. Running usethis::edit_r_environ(), which should open your .Renviron file.
  2. Add the words GFW_TOKEN="PASTE_YOUR_TOKEN_HERE" to the file, replacing “PASTE_YOUR_TOKEN_HERE” with your API token.
  3. Save the .Renviron file, and restart your R session, e.g. close and re-open RStudio.

That should be it. If you have any problems, drop me a line before the workshop and we will sort it out!

Introduction

Global Fishing Watch (GFW) provide access to their database of apparent fishing effort and vessel tracks via the gfwr package. This workshop will show you how to download, summarize and visualize fishing effort data in R. GFW use machine learning to predict when vessels identified as fishing vessels are fishing. The model is not 100% accurate and another important caveat is that only vessels that use the automatic vessel identification system (AIS) are present in the data: principally larger vessel (> 24m in length); less than 1% of vessels <12 m length are represented in the data (see GFW website for detailed information).

There is a great tutorial on making maps with gfwr on their website. Especially useful if you want to make nice ggplot maps.

#load the libraries we will need
library(terra) #raster and vector geospatial data handling
library(dplyr) #data manipulation and summarizing
library(tidyr) #making "long" data "wide"
library(sf) #vector geospatial data handling
library(ggplot2) #plots
library(gfwr) #downloading GFW data

Downloading fishing effort data

We will use the function get_raster() from gfwr to download some fishing effort data. We need to give get_raster() several pieces of information, as documented on the gfwr website:

  • The spatial resolution, which can be LOW (0.1 degree) or HIGH (0.01 degree)
  • The temporal resolution, which can be HOURLY, DAILY, MONTHLY, YEARLY or ENTIRE.
  • The variable to group by: FLAG, GEARTYPE, FLAGANDGEARTYPE, MMSI or VESSEL_ID
  • The date range note: this must be 366 days or less
  • The region polygon in sf format or the region code (such as an EEZ code) to filter the raster
  • The source for the specified region. Currently, EEZ, MPA, RFMO or USER_SHAPEFILE (for sf shapefiles).

In this example, we will download 2022 fishing effort data for Vanuatu’s EEZ.

We need a polygon of Vanuatu’s EEZ to include in our query to GFW. I generally get EEZ and other marine boundaries from the Marine Regions website, which also has an R package, mregions2, for downloading data within R (though see emLab’s spatialgridr package for a simpler function get_boundary() for downloading both marine and terrestrial boundaries). For now, we can download Vanuatu’s EEZ direct from the Marine Regions website:

vanuatu_eez <- st_read("https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E") %>%
  st_set_crs(4326) # we have to set the coordinate reference system since non is included in the web format we downloaded
Reading layer `eez' from data source 
  `https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E' 
  using driver `GML'
Simple feature collection with 1 feature and 32 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 163.1667 ymin: -21.64288 xmax: 173.6089 ymax: -12.16266
CRS:           NA

Using only the exact boundary of the EEZ can result in some data close to the boundary not being included, so I will use the bounding box of the EEZ polygon. We might also want to see the fishing effort in the area surrounding the EEZ.

vanuatu_bbox <- vanuatu_eez %>% 
  st_bbox() %>%
  st_as_sfc() %>%
  st_sf()

Now we can build our query to get our fishing effort data:

vanuatu_fishing_effort <- get_raster(spatial_resolution = 'LOW', 
                                   temporal_resolution = 'YEARLY',
                                   group_by = 'FLAGANDGEARTYPE',
                                   start_date = "2022-01-01",
                                   end_date = "2022-12-31", 
                                   region = vanuatu_bbox, 
                                   region_source = 'USER_SHAPEFILE')
Rows: 12532 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Flag, Geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s have a look at what we got:

head(vanuatu_fishing_effort)
# A tibble: 6 × 7
    Lat   Lon `Time Range` Flag  Geartype    `Vessel IDs` Apparent Fishing Hou…¹
  <dbl> <dbl>        <dbl> <chr> <chr>              <dbl>                  <dbl>
1 -17.6  170.         2022 CHN   drifting_l…           14                  41.2 
2 -12.7  165.         2022 CHN   drifting_l…            3                  20.0 
3 -16.6  171          2022 CHN   drifting_l…           15                  74.2 
4 -15.4  170.         2022 <NA>  drifting_l…            1                   0.71
5 -14.9  170.         2022 <NA>  drifting_l…            1                   1.96
6 -16.9  166.         2022 FJI   drifting_l…            1                   1.87
# ℹ abbreviated name: ¹​`Apparent Fishing Hours`

This is a tibble of Apparent Fishing Hours within Vanuatu’s EEZ for the year 2022, and we have the coordinates (Lat and Lon) where the fishing took place, as well as the the type of fishing gear, vessel flag, and vessel ID.

Summarizing the data

We have spatial data, so the first thing I want to do is make a map! But at the moment, there may be more than one value in each grid cell (Latitude and Longitude), because more than one vessel might have fished in each cell. We can get the total fishing effort in each grid cell:

vanuatu_total_fishing_effort <- vanuatu_fishing_effort %>% 
  group_by(Lon, Lat) %>% 
  summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>% 
  ungroup()

It is important that you do group_by(Lon, Lat) and not group_by(Lat, Lon) because in the next step, the coordinate columns need to be in the order longitude followed by latitude.

Rasterizing the data

Ok, so now we have the total fishing effort in each cell. I am going to turn this data into a raster so that we can map it easily. We use the terra package for this:

vanuatu_effort_2022_raster <- vanuatu_total_fishing_effort %>%
    rast(type = "xyz", crs = "epsg:4326") # we need to have data in xyz format for this to work: first column is "Longitude", next "Latitude" and finally the values we want, in our case fishing effort.

Mapping

We now have a raster that we can plot:

plot(vanuatu_effort_2022_raster)

We have a map! There is on cell, outside the EEZ in the south west corner, that has a much higher value than all others. This results in most cells having the same colour on the colour scale. We can set any value outside the EEZ polygon to NA, called masking, and this will remove the pesky outlier:

vanuatu_effort_2022_raster <- vanuatu_effort_2022_raster %>%
  mask(vanuatu_eez) #this sets any values outside the EEZ polygon to NA

Ok, lets map this again:

plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40") #add the EEZ boundary as grey line

Much nicer! Why do you think there is no fishing effort around the islands? Is there really no fishing effort?

Click for answers!

Lets get the 24nm boundary for Vanuatu from Marine Regions and plot that:

vanuatu_24nm <- st_read("https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez_24nm&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E") %>%
  st_set_crs(4326)
Reading layer `eez_24nm' from data source 
  `https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez_24nm&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E' 
  using driver `GML'
Simple feature collection with 1 feature and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 166.1282 ymin: -20.66389 xmax: 170.6639 ymax: -12.67013
CRS:           NA
plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40")
lines(vanuatu_24nm, col = "red")

Now we can see that there is minimal fishing effort within the 24nm contiguous zone. This is because foreign based tuna fishing vessels are not allowed to fish within this zone.

Although our map shows minimal fishing effort, it is important to remember that only vessels with AIS are represented in this data. Smaller, vessels with no AIS might be fishing within the 24nm zone, but they won’t show up in GFW data.

Fishing effort by flag country

Several countries fish within Vanuatu’s EEZ. Lets summarize our original GFW data, but this time include Flag:

vanuatu_total_flag_fishing_effort <- vanuatu_fishing_effort %>% 
  group_by(Lon, Lat, Flag) %>% 
  summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>% 
  ungroup()

head(vanuatu_total_flag_fishing_effort)
# A tibble: 6 × 4
    Lon   Lat Flag  total_effort
  <dbl> <dbl> <chr>        <dbl>
1  163. -21.6 NCL           24.6
2  163. -21.5 NCL           25.5
3  163. -21.4 NCL           43.9
4  163. -21.3 NCL           19.1
5  163. -21.2 NCL           15.3
6  163. -21.1 NCL           34.2

We can turn this into a raster with one layer for each country flag, but we will need to make the data “wide”, so that each country’s effort is in its own column, This is a job for the pivot_wider() function from the tidyr package:

vanuatu_total_flag_fishing_effort_wide <- vanuatu_total_flag_fishing_effort %>%
  pivot_wider(names_from = Flag, 
              values_from = total_effort)

head(vanuatu_total_flag_fishing_effort_wide)
# A tibble: 6 × 13
    Lon   Lat   NCL   CHN   TWN  `NA`   FJI   JPN   STP   VUT   KOR   MYS   COK
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  163. -21.6  24.6    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
2  163. -21.5  25.5    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
3  163. -21.4  43.9    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
4  163. -21.3  19.1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
5  163. -21.2  15.3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
6  163. -21.1  34.2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA

We can now make this into a multi-layer raster. Each column after the “Lon” and “Lat” columns will be made into a separate raster layer, i.e. we will have one layer per country flag

vanuatu_effort_flag_raster <- vanuatu_total_flag_fishing_effort_wide %>%
  rast(type = "xyz", crs = "epsg:4326") 

Lets try plotting and see what we get!

plot(vanuatu_effort_flag_raster, fun = function(x)lines(vanuatu_eez))

From this we can see that most of the fishing is done by Fiji (FJI) and China (CHN). The only problem if we want to do a quantitative analysis is that we have fishing effort outside the EEZ as well. We can easily mask out values outside the EEZ, but lets see what happens when we then plot:

vanuatu_effort_flag_raster_masked <- mask(vanuatu_effort_flag_raster, vanuatu_eez)

plot(vanuatu_effort_flag_raster_masked)

Several country flags have no fishing effort within the EEZ, but there is still a space for them in the plot since they are NAs. We can remove these using the following code, which only selects layers that sum to more than zero:

vanuatu_effort_flag_raster_masked_eez <- vanuatu_effort_flag_raster_masked %>%
  subset(which(global(vanuatu_effort_flag_raster_masked, "sum", na.rm = TRUE) > 0))

plot(vanuatu_effort_flag_raster_masked_eez, fun = function(x)lines(vanuatu_eez))

Great, we now have only the data within the EEZ. Lets make a plot of fishing effort by flag:

vanuatu_effort_flag_raster_masked_eez %>%
  global("sum", na.rm = TRUE) %>% #get the sum of values in each layer
  tibble::rownames_to_column("Flag") %>% #create a column with the Flag (it got converted to rownames)
  ggplot() +
    geom_col(aes(x= Flag, y = sum, fill = Flag)) + 
    ylab("Total fishing effort in 2022") +
    theme_bw()

So we now see clearly the relative contributions of each flag state to the fishing effort in Vanuatu in 2022.

Fishing effort by gear type

Lets repeat the same analysis we did for flag state, but this time for gear type. We already have gear type in the data we downloaded, so we just need to do the same summarizing and rasterizing we did before:

vanuatu_effort_gear_raster <- vanuatu_fishing_effort %>% 
  group_by(Lon, Lat, Geartype) %>% 
  summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%  #total fishing effort for each geartype
  ungroup() %>%
  pivot_wider(names_from = Geartype,   #each geartype fishing effort becomes a column
              values_from = total_effort) %>% 
  rast(type = "xyz", crs = "epsg:4326") %>% #create the raster
  mask(vanuatu_eez) %>% #mask values outside the EEZ
  subset(., which(global(., "sum", na.rm = TRUE) > 0)) #remove any layers that are NA
`summarise()` has grouped output by 'Lon', 'Lat'. You can override using the
`.groups` argument.
plot(vanuatu_effort_gear_raster)

So we can see that drifting longlines are the predominant fishing gear. The ‘fishing’ category refers to “fishing vessels that could not be more specifically classified” by GFW, so don’t know what type of gear they are using, but they are fishing.

We can create a graph, same as before:

vanuatu_effort_gear_raster %>%
  global("sum", na.rm = TRUE) %>% #get the sum of values in each layer
  tibble::rownames_to_column("Gear") %>% #create a column with the Gear (it got converted to rownames)
  ggplot() +
    geom_col(aes(x= Gear, y = sum, fill = Gear)) + 
    ylab("Total fishing effort in 2022") +
    theme_bw()

Multiple years of GFW data

Unfortunately you can only get data for one year at a time from Global Fishing Watch using the gfwr package. If we want more than one year we can use a loop. Here is some code that you can use if you want to get several years of data:

start_year <- 2020
end_year <- 2022

#get GFW fishing effort data for each year. lapply creates a list with each years data as an element in the list
vanuatu_fishing_2020_2022 <- lapply(start_year:end_year, function(yr) {
  get_raster(
    spatial_resolution = 'LOW',
    temporal_resolution = "YEARLY",
    group_by = "GEARTYPE",
    start_date = paste0(yr, "-01-01"),
    end_date = paste0(yr, "-12-31"),
    region = vanuatu_bbox,
    region_source = "USER_SHAPEFILE"
  )
}) %>% 
   do.call(bind_rows, .)
Rows: 6491 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 5820 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 7318 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We can now use similar code as before to get the annual total fishing effort.

vanuatu_fishing_2020_2022_raster <- vanuatu_fishing_2020_2022 %>% 
  group_by(Lon, Lat, `Time Range`) %>% #time range just has the year in it
  summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = `Time Range`, values_from = total_effort) %>%
  rast(type = "xyz", crs = "epsg:4326") %>%
  mask(vanuatu_eez)
`summarise()` has grouped output by 'Lon', 'Lat'. You can override using the
`.groups` argument.
plot(vanuatu_fishing_2020_2022_raster, fun = function(x)lines(vanuatu_eez))

Vessel information

What if we want to find info for a particular vessel? Lets find which vessel did the most fishing (greatest fishing effort) within Vanuatu and the surrounding area. First we retrieve fishing effort data for 2022, including the vessel ID in the data:

vessel_fishing_effort <- get_raster(spatial_resolution = 'LOW', 
                                   temporal_resolution = 'YEARLY',
                                   group_by = 'VESSEL_ID',
                                   start_date = "2022-01-01",
                                   end_date = "2022-12-31", 
                                   region = vanuatu_bbox, 
                                   region_source = 'USER_SHAPEFILE')
Rows: 48129 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): Vessel ID, Flag, Vessel Name, Gear Type, Vessel Type, CallSign
dbl  (6): Lat, Lon, Time Range, MMSI, IMO, Apparent Fishing Hours
dttm (4): Entry Timestamp, Exit Timestamp, First Transmission Date, Last Tra...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Next, we find the total fishing effort for each vessel ID, and arrange the data in order of highest to lowest fishing effort:

vessel_total_effort <- vessel_fishing_effort %>%
  group_by(`Vessel ID`) %>%
  summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_effort))

head(vessel_total_effort)
# A tibble: 6 × 2
  `Vessel ID`                           total_effort
  <chr>                                        <dbl>
1 09b87f274-4055-8e05-ca10-5b8aaa41fcae        6487.
2 efeff1f75-5c37-654d-385d-92a453a61205        5151.
3 877bf9bef-fbd7-57c8-0664-265200a52a25        4970.
4 831651a2d-d0c0-691a-299a-693d01bf73aa        4916.
5 baa426976-6f0f-35e3-3ee5-ffce9b5cb264        4849.
6 b1b73b1b4-4694-813b-09f1-7f721bc67a52        4641.

The first row is the vessel with the highest total fishing effort. This vessel ID is a code that GFW uses to identify vessels within its own databases. A lot of work goes into matching a vessel to an ID, see here for the GFW vignette devoted to this! We can get a bit more info about our top fishing vessel using the get_vessel_info()

top_fisher_info <- get_vessel_info(vessel_fishing_effort$`Vessel ID`[1])
1 total vessels

This gives us lots of info in list format. We can see the self reported name and other details:

top_fisher_info$selfReportedInfo
# A tibble: 1 × 14
  index vesselId   ssvid shipname nShipname flag  callsign imo   messagesCounter
  <dbl> <chr>      <chr> <chr>    <chr>     <chr> <chr>    <lgl>           <int>
1     1 2f11a3a2c… 4125… ZHONGSH… ZHONGSHU… CHN   BZYZ5    NA           10658380
# ℹ 5 more variables: positionsCounter <int>, sourceCode <list>,
#   matchFields <chr>, transmissionDateFrom <chr>, transmissionDateTo <chr>

We can get vessel event data from GFW using get_event(). We can get, there are several categories of event you can query (see ?get_event), but we are going to retrieve port visits:

port_visits <- get_event(event_type = "PORT_VISIT",
            vessels = vessel_total_effort$`Vessel ID`[1],
            start_date = "2022-01-01",
            end_date = "2022-12-31")
[1] "Downloading 7 events from GFW"

This is a tibble with 7 port visit events in it. We can turn this into a spatial object and plot it against our other data to see which port it is going to:

port_visits_vec <- vect(port_visits,
                       geom = c("lon", "lat"),
                       crs = "epsg:4326")

plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40")
points(port_visits_vec)

I don’t see the port visit points? Problem is, they are off our mapping area. We can adjust the extent of the map to include them, but first we need to know the full extent of our data:

#extent of the ports visit data
ports_extent <- ext(port_visits_vec)

#extent of the fishing effort raster
effort_raster_extent <- ext(vanuatu_effort_2022_raster) 

plot(vanuatu_effort_2022_raster, ext = c(effort_raster_extent[1], ports_extent[2], effort_raster_extent[3], effort_raster_extent[4]))
lines(vanuatu_eez, col = "grey40")
points(port_visits_vec)

We can see the points now, but where are they? We can add in Natural Earth country boundaries so that we have a reference for where the port visit points are.

plot(vanuatu_effort_2022_raster, ext = c(effort_raster_extent[1], ports_extent[2], effort_raster_extent[3], effort_raster_extent[4]))
lines(vanuatu_eez, col = "grey40")
lines(rnaturalearth::ne_countries(scale = "large", continent = "oceania", returnclass = "sv")) #add in Oceania countries
points(port_visits_vec)

We can see that the top fishing vessel visits ports in Fiji, probably to unload the catch.