install.packages(c("terra", "sf", "dplyr", "tidyr", "ggplot2", "rnaturalearth", "usethis", "remotes"))
::install_github("GlobalFishingWatch/gfwr") remotes
Introduction to Global Fishing Watch data in R
This workshop will provide you with an introduction to downloading and visualizing Global Fishing Watch data in R. Global Fishing Watch provides data on apparent fishing effort, vessel tracks and events, and more; see the website for details.
Prerequisites
We will use the gfwr
package to access Global Fishing Watch data. You will also need packages for manipulating and visualizing the data. You can install all these packages using the following code:
If you have used gfwr
before, make sure you are using v2
or higher as there were some major changes.
If you have problems installing terra
, there are more details about installing it here.
Global Fishing Watch API token
To access Global Fishing Watch data, you will need an API token stored on your computer. The token is like a code that you need to able to access the data. This is free, but you need to register for a Global Fishing Watch account if you don’t already have one. Then you can request an API access token. Here is an example of how I filled in the details requested:
Once you have created the token, you need to copy it into your
.Renviron
file. You can do this by:
- Running
usethis::edit_r_environ()
, which should open your.Renviron
file. - Add the words
GFW_TOKEN="PASTE_YOUR_TOKEN_HERE"
to the file, replacing “PASTE_YOUR_TOKEN_HERE” with your API token. - Save the
.Renviron
file, and restart your R session, e.g. close and re-open RStudio.
That should be it. If you have any problems, drop me a line before the workshop and we will sort it out!
Introduction
Global Fishing Watch (GFW) provide access to their database of apparent fishing effort and vessel tracks via the gfwr
package. This workshop will show you how to download, summarize and visualize fishing effort data in R. GFW use machine learning to predict when vessels identified as fishing vessels are fishing. The model is not 100% accurate and another important caveat is that only vessels that use the automatic vessel identification system (AIS) are present in the data: principally larger vessel (> 24m in length); less than 1% of vessels <12 m length are represented in the data (see GFW website for detailed information).
There is a great tutorial on making maps with gfwr
on their website. Especially useful if you want to make nice ggplot maps.
#load the libraries we will need
library(terra) #raster and vector geospatial data handling
library(dplyr) #data manipulation and summarizing
library(tidyr) #making "long" data "wide"
library(sf) #vector geospatial data handling
library(ggplot2) #plots
library(gfwr) #downloading GFW data
Downloading fishing effort data
We will use the function get_raster()
from gfwr
to download some fishing effort data. We need to give get_raster()
several pieces of information, as documented on the gfwr
website:
- The spatial resolution, which can be
LOW
(0.1 degree) orHIGH
(0.01 degree) - The temporal resolution, which can be
HOURLY
,DAILY
,MONTHLY
,YEARLY
orENTIRE
. - The variable to group by:
FLAG
,GEARTYPE
,FLAGANDGEARTYPE
,MMSI
orVESSEL_ID
- The date range
note: this must be 366 days or less
- The region polygon in
sf
format or the region code (such as an EEZ code) to filter the raster - The source for the specified region. Currently,
EEZ
,MPA
,RFMO
orUSER_SHAPEFILE
(forsf
shapefiles).
In this example, we will download 2022 fishing effort data for Vanuatu’s EEZ.
We need a polygon of Vanuatu’s EEZ to include in our query to GFW. I generally get EEZ and other marine boundaries from the Marine Regions website, which also has an R package, mregions2
, for downloading data within R (though see emLab’s spatialgridr
package for a simpler function get_boundary()
for downloading both marine and terrestrial boundaries). For now, we can download Vanuatu’s EEZ direct from the Marine Regions website:
<- st_read("https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E") %>%
vanuatu_eez st_set_crs(4326) # we have to set the coordinate reference system since non is included in the web format we downloaded
Reading layer `eez' from data source
`https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E'
using driver `GML'
Simple feature collection with 1 feature and 32 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 163.1667 ymin: -21.64288 xmax: 173.6089 ymax: -12.16266
CRS: NA
Using only the exact boundary of the EEZ can result in some data close to the boundary not being included, so I will use the bounding box of the EEZ polygon. We might also want to see the fishing effort in the area surrounding the EEZ.
<- vanuatu_eez %>%
vanuatu_bbox st_bbox() %>%
st_as_sfc() %>%
st_sf()
Now we can build our query to get our fishing effort data:
<- get_raster(spatial_resolution = 'LOW',
vanuatu_fishing_effort temporal_resolution = 'YEARLY',
group_by = 'FLAGANDGEARTYPE',
start_date = "2022-01-01",
end_date = "2022-12-31",
region = vanuatu_bbox,
region_source = 'USER_SHAPEFILE')
Rows: 12532 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Flag, Geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s have a look at what we got:
head(vanuatu_fishing_effort)
# A tibble: 6 × 7
Lat Lon `Time Range` Flag Geartype `Vessel IDs` Apparent Fishing Hou…¹
<dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 -17.6 170. 2022 CHN drifting_l… 14 41.2
2 -12.7 165. 2022 CHN drifting_l… 3 20.0
3 -16.6 171 2022 CHN drifting_l… 15 74.2
4 -15.4 170. 2022 <NA> drifting_l… 1 0.71
5 -14.9 170. 2022 <NA> drifting_l… 1 1.96
6 -16.9 166. 2022 FJI drifting_l… 1 1.87
# ℹ abbreviated name: ¹`Apparent Fishing Hours`
This is a tibble of Apparent Fishing Hours
within Vanuatu’s EEZ for the year 2022, and we have the coordinates (Lat
and Lon
) where the fishing took place, as well as the the type of fishing gear, vessel flag, and vessel ID.
Summarizing the data
We have spatial data, so the first thing I want to do is make a map! But at the moment, there may be more than one value in each grid cell (Latitude and Longitude), because more than one vessel might have fished in each cell. We can get the total fishing effort in each grid cell:
<- vanuatu_fishing_effort %>%
vanuatu_total_fishing_effort group_by(Lon, Lat) %>%
summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
ungroup()
It is important that you do group_by(Lon, Lat)
and not group_by(Lat, Lon)
because in the next step, the coordinate columns need to be in the order longitude followed by latitude.
Rasterizing the data
Ok, so now we have the total fishing effort in each cell. I am going to turn this data into a raster so that we can map it easily. We use the terra
package for this:
<- vanuatu_total_fishing_effort %>%
vanuatu_effort_2022_raster rast(type = "xyz", crs = "epsg:4326") # we need to have data in xyz format for this to work: first column is "Longitude", next "Latitude" and finally the values we want, in our case fishing effort.
Mapping
We now have a raster that we can plot:
plot(vanuatu_effort_2022_raster)
We have a map! There is on cell, outside the EEZ in the south west corner, that has a much higher value than all others. This results in most cells having the same colour on the colour scale. We can set any value outside the EEZ polygon to NA, called masking, and this will remove the pesky outlier:
<- vanuatu_effort_2022_raster %>%
vanuatu_effort_2022_raster mask(vanuatu_eez) #this sets any values outside the EEZ polygon to NA
Ok, lets map this again:
plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40") #add the EEZ boundary as grey line
Much nicer! Why do you think there is no fishing effort around the islands? Is there really no fishing effort?
Click for answers!
Lets get the 24nm boundary for Vanuatu from Marine Regions and plot that:
<- st_read("https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez_24nm&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E") %>%
vanuatu_24nm st_set_crs(4326)
Reading layer `eez_24nm' from data source
`https://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.1.0&typename=MarineRegions:eez_24nm&filter=%3CFilter%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3Emrgid_eez%3C/PropertyName%3E%3CLiteral%3E8313%3C/Literal%3E%3C/PropertyIsEqualTo%3E%3C/Filter%3E'
using driver `GML'
Simple feature collection with 1 feature and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 166.1282 ymin: -20.66389 xmax: 170.6639 ymax: -12.67013
CRS: NA
plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40")
lines(vanuatu_24nm, col = "red")
Now we can see that there is minimal fishing effort within the 24nm contiguous zone. This is because foreign based tuna fishing vessels are not allowed to fish within this zone.
Although our map shows minimal fishing effort, it is important to remember that only vessels with AIS are represented in this data. Smaller, vessels with no AIS might be fishing within the 24nm zone, but they won’t show up in GFW data.Fishing effort by flag country
Several countries fish within Vanuatu’s EEZ. Lets summarize our original GFW data, but this time include Flag:
<- vanuatu_fishing_effort %>%
vanuatu_total_flag_fishing_effort group_by(Lon, Lat, Flag) %>%
summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
ungroup()
head(vanuatu_total_flag_fishing_effort)
# A tibble: 6 × 4
Lon Lat Flag total_effort
<dbl> <dbl> <chr> <dbl>
1 163. -21.6 NCL 24.6
2 163. -21.5 NCL 25.5
3 163. -21.4 NCL 43.9
4 163. -21.3 NCL 19.1
5 163. -21.2 NCL 15.3
6 163. -21.1 NCL 34.2
We can turn this into a raster with one layer for each country flag, but we will need to make the data “wide”, so that each country’s effort is in its own column, This is a job for the pivot_wider()
function from the tidyr
package:
<- vanuatu_total_flag_fishing_effort %>%
vanuatu_total_flag_fishing_effort_wide pivot_wider(names_from = Flag,
values_from = total_effort)
head(vanuatu_total_flag_fishing_effort_wide)
# A tibble: 6 × 13
Lon Lat NCL CHN TWN `NA` FJI JPN STP VUT KOR MYS COK
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 163. -21.6 24.6 NA NA NA NA NA NA NA NA NA NA
2 163. -21.5 25.5 NA NA NA NA NA NA NA NA NA NA
3 163. -21.4 43.9 NA NA NA NA NA NA NA NA NA NA
4 163. -21.3 19.1 NA NA NA NA NA NA NA NA NA NA
5 163. -21.2 15.3 NA NA NA NA NA NA NA NA NA NA
6 163. -21.1 34.2 NA NA NA NA NA NA NA NA NA NA
We can now make this into a multi-layer raster. Each column after the “Lon” and “Lat” columns will be made into a separate raster layer, i.e. we will have one layer per country flag
<- vanuatu_total_flag_fishing_effort_wide %>%
vanuatu_effort_flag_raster rast(type = "xyz", crs = "epsg:4326")
Lets try plotting and see what we get!
plot(vanuatu_effort_flag_raster, fun = function(x)lines(vanuatu_eez))
From this we can see that most of the fishing is done by Fiji (FJI) and China (CHN). The only problem if we want to do a quantitative analysis is that we have fishing effort outside the EEZ as well. We can easily mask out values outside the EEZ, but lets see what happens when we then plot:
<- mask(vanuatu_effort_flag_raster, vanuatu_eez)
vanuatu_effort_flag_raster_masked
plot(vanuatu_effort_flag_raster_masked)
Several country flags have no fishing effort within the EEZ, but there is still a space for them in the plot since they are NAs. We can remove these using the following code, which only selects layers that sum to more than zero:
<- vanuatu_effort_flag_raster_masked %>%
vanuatu_effort_flag_raster_masked_eez subset(which(global(vanuatu_effort_flag_raster_masked, "sum", na.rm = TRUE) > 0))
plot(vanuatu_effort_flag_raster_masked_eez, fun = function(x)lines(vanuatu_eez))
Great, we now have only the data within the EEZ. Lets make a plot of fishing effort by flag:
%>%
vanuatu_effort_flag_raster_masked_eez global("sum", na.rm = TRUE) %>% #get the sum of values in each layer
::rownames_to_column("Flag") %>% #create a column with the Flag (it got converted to rownames)
tibbleggplot() +
geom_col(aes(x= Flag, y = sum, fill = Flag)) +
ylab("Total fishing effort in 2022") +
theme_bw()
So we now see clearly the relative contributions of each flag state to the fishing effort in Vanuatu in 2022.
Fishing effort by gear type
Lets repeat the same analysis we did for flag state, but this time for gear type. We already have gear type in the data we downloaded, so we just need to do the same summarizing and rasterizing we did before:
<- vanuatu_fishing_effort %>%
vanuatu_effort_gear_raster group_by(Lon, Lat, Geartype) %>%
summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>% #total fishing effort for each geartype
ungroup() %>%
pivot_wider(names_from = Geartype, #each geartype fishing effort becomes a column
values_from = total_effort) %>%
rast(type = "xyz", crs = "epsg:4326") %>% #create the raster
mask(vanuatu_eez) %>% #mask values outside the EEZ
subset(., which(global(., "sum", na.rm = TRUE) > 0)) #remove any layers that are NA
`summarise()` has grouped output by 'Lon', 'Lat'. You can override using the
`.groups` argument.
plot(vanuatu_effort_gear_raster)
So we can see that drifting longlines are the predominant fishing gear. The ‘fishing’ category refers to “fishing vessels that could not be more specifically classified” by GFW, so don’t know what type of gear they are using, but they are fishing.
We can create a graph, same as before:
%>%
vanuatu_effort_gear_raster global("sum", na.rm = TRUE) %>% #get the sum of values in each layer
::rownames_to_column("Gear") %>% #create a column with the Gear (it got converted to rownames)
tibbleggplot() +
geom_col(aes(x= Gear, y = sum, fill = Gear)) +
ylab("Total fishing effort in 2022") +
theme_bw()
Multiple years of GFW data
Unfortunately you can only get data for one year at a time from Global Fishing Watch using the gfwr
package. If we want more than one year we can use a loop. Here is some code that you can use if you want to get several years of data:
<- 2020
start_year <- 2022
end_year
#get GFW fishing effort data for each year. lapply creates a list with each years data as an element in the list
<- lapply(start_year:end_year, function(yr) {
vanuatu_fishing_2020_2022 get_raster(
spatial_resolution = 'LOW',
temporal_resolution = "YEARLY",
group_by = "GEARTYPE",
start_date = paste0(yr, "-01-01"),
end_date = paste0(yr, "-12-31"),
region = vanuatu_bbox,
region_source = "USER_SHAPEFILE"
)%>%
}) do.call(bind_rows, .)
Rows: 6491 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 5820 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 7318 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): geartype
dbl (5): Lat, Lon, Time Range, Vessel IDs, Apparent Fishing Hours
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We can now use similar code as before to get the annual total fishing effort.
<- vanuatu_fishing_2020_2022 %>%
vanuatu_fishing_2020_2022_raster group_by(Lon, Lat, `Time Range`) %>% #time range just has the year in it
summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = `Time Range`, values_from = total_effort) %>%
rast(type = "xyz", crs = "epsg:4326") %>%
mask(vanuatu_eez)
`summarise()` has grouped output by 'Lon', 'Lat'. You can override using the
`.groups` argument.
plot(vanuatu_fishing_2020_2022_raster, fun = function(x)lines(vanuatu_eez))
Vessel information
What if we want to find info for a particular vessel? Lets find which vessel did the most fishing (greatest fishing effort) within Vanuatu and the surrounding area. First we retrieve fishing effort data for 2022, including the vessel ID in the data:
<- get_raster(spatial_resolution = 'LOW',
vessel_fishing_effort temporal_resolution = 'YEARLY',
group_by = 'VESSEL_ID',
start_date = "2022-01-01",
end_date = "2022-12-31",
region = vanuatu_bbox,
region_source = 'USER_SHAPEFILE')
Rows: 48129 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Vessel ID, Flag, Vessel Name, Gear Type, Vessel Type, CallSign
dbl (6): Lat, Lon, Time Range, MMSI, IMO, Apparent Fishing Hours
dttm (4): Entry Timestamp, Exit Timestamp, First Transmission Date, Last Tra...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Next, we find the total fishing effort for each vessel ID, and arrange the data in order of highest to lowest fishing effort:
<- vessel_fishing_effort %>%
vessel_total_effort group_by(`Vessel ID`) %>%
summarise(total_effort = sum(`Apparent Fishing Hours`, na.rm = TRUE)) %>%
ungroup() %>%
arrange(desc(total_effort))
head(vessel_total_effort)
# A tibble: 6 × 2
`Vessel ID` total_effort
<chr> <dbl>
1 09b87f274-4055-8e05-ca10-5b8aaa41fcae 6487.
2 efeff1f75-5c37-654d-385d-92a453a61205 5151.
3 877bf9bef-fbd7-57c8-0664-265200a52a25 4970.
4 831651a2d-d0c0-691a-299a-693d01bf73aa 4916.
5 baa426976-6f0f-35e3-3ee5-ffce9b5cb264 4849.
6 b1b73b1b4-4694-813b-09f1-7f721bc67a52 4641.
The first row is the vessel with the highest total fishing effort. This vessel ID is a code that GFW uses to identify vessels within its own databases. A lot of work goes into matching a vessel to an ID, see here for the GFW vignette devoted to this! We can get a bit more info about our top fishing vessel using the get_vessel_info()
<- get_vessel_info(vessel_fishing_effort$`Vessel ID`[1]) top_fisher_info
1 total vessels
This gives us lots of info in list
format. We can see the self reported name and other details:
$selfReportedInfo top_fisher_info
# A tibble: 1 × 14
index vesselId ssvid shipname nShipname flag callsign imo messagesCounter
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <int>
1 1 2f11a3a2c… 4125… ZHONGSH… ZHONGSHU… CHN BZYZ5 NA 10658380
# ℹ 5 more variables: positionsCounter <int>, sourceCode <list>,
# matchFields <chr>, transmissionDateFrom <chr>, transmissionDateTo <chr>
We can get vessel event data from GFW using get_event()
. We can get, there are several categories of event you can query (see ?get_event
), but we are going to retrieve port visits:
<- get_event(event_type = "PORT_VISIT",
port_visits vessels = vessel_total_effort$`Vessel ID`[1],
start_date = "2022-01-01",
end_date = "2022-12-31")
[1] "Downloading 7 events from GFW"
This is a tibble with 7 port visit events in it. We can turn this into a spatial object and plot it against our other data to see which port it is going to:
<- vect(port_visits,
port_visits_vec geom = c("lon", "lat"),
crs = "epsg:4326")
plot(vanuatu_effort_2022_raster)
lines(vanuatu_eez, col = "grey40")
points(port_visits_vec)
I don’t see the port visit points? Problem is, they are off our mapping area. We can adjust the extent of the map to include them, but first we need to know the full extent of our data:
#extent of the ports visit data
<- ext(port_visits_vec)
ports_extent
#extent of the fishing effort raster
<- ext(vanuatu_effort_2022_raster)
effort_raster_extent
plot(vanuatu_effort_2022_raster, ext = c(effort_raster_extent[1], ports_extent[2], effort_raster_extent[3], effort_raster_extent[4]))
lines(vanuatu_eez, col = "grey40")
points(port_visits_vec)
We can see the points now, but where are they? We can add in Natural Earth country boundaries so that we have a reference for where the port visit points are.
plot(vanuatu_effort_2022_raster, ext = c(effort_raster_extent[1], ports_extent[2], effort_raster_extent[3], effort_raster_extent[4]))
lines(vanuatu_eez, col = "grey40")
lines(rnaturalearth::ne_countries(scale = "large", continent = "oceania", returnclass = "sv")) #add in Oceania countries
points(port_visits_vec)
We can see that the top fishing vessel visits ports in Fiji, probably to unload the catch.