WoSIS Graphql API Masterclass
This master-class explains and demonstrates the use of the WoSIS GraphQL API.
Table of contents
- WoSIS Graphql API Masterclass
- Table of contents
- Introduction
- What is GraphQL?
- Requirements
- API root endpoint and web interfaces
- Explore current schema
- Explore the documentation
- First queries
- Filtering
- Using variables
- Spatial queries
- Pagination concepts
- Scripting
- Soil data validation and ingest into WoSIS
- Get the values of a property within a polygon
Introduction
WoSIS stands for 'World Soil Information Service', a large database based on PostgreSQL + APIs, workflows, dashboards etc., developed and maintained by ISRIC, WDC-Soils. It provides a growing range of quality-assessed and standardised soil profile data for the world. For this, it draws on voluntary contributions of data holders/providers worldwide.
The source data come from different types of surveys ranging from systematic soil surveys (i.e., full profile descriptions) to soil fertility surveys (i.e., mainly top 20 to 30 cm). Further, depending on the nature of the original surveys the range of soil properties can vary greatly (see https://doi.org/10.5194/essd-16-4735-2024/).
Upon their standardisation, the quality-assessed data are made available freely to the international community through several web services, this in compliance with the conditions (licences) specified by the various data providers. This means that we can only serve data with a so-called 'free' licence to the international community (https://data.isric.org/geonetwork/srv/eng/catalog.search#/search?any=wosis_latest). A larger complement of geo-referenced data with a more restrictive licence can only be used by ISRIC itself for producing SoilGrids maps and similar products (i.e. output as a result of advanced data processing). The latter map layers are made freely available to the international community (https://data.isric.org/geonetwork/srv/eng/catalog.search#/search?resultType=details&sortBy=relevance&any=soilgrids250m%202.0&fast=index&_content_type=json&from=1&to=20).
WoSIS workflow for ingesting, processing and disseminating data.
During this master class, you will first learn what GraphQL and API (application programming interface) are. Next, using guided steps, we will explore the basics of WoSIS and GraphQL via a graphical interface. From that point onwards we will slowly increase complexity and use WoSIS data. Building on this, we will show you how to create code that uses soil data from WoSIS.
The workshop requires no previous knowledge of WoSIS or GraphQL. However, it is advisable to have basic coding knowledge of the Python or R languages.
The aim of this master-class is to provide clear instructions and documentation on how to use the WoSIS Graphql API.
WoSIS public products
WoSIS data can be accessed via OGC web services and a GraphQL API.
Until recently, OGC web services provided the main entry point to download and access WoSIS. You can find more information on how to access WoSIS using the SOAP-based OGC web services at https://www.isric.org/explore/wosis/accessing-wosis-derived-datasets.
In 2023, we developed a GraphQL API tool to easily access the data. The aim of this master-class is to show and describe how this tool can be used to explore and download WoSIS data.
What is GraphQL?
GraphQL is a query language for APIs. GraphQL isn't tied to any specific database or storage engine. Instead it is backed by your existing code and data.
If you are new to GraphQL it might be good to check the official documentation: https://graphql.org/learn/.
GraphQL works as an abstraction layer between application and database, allowing direct queries to the database using web technologies (HTTP requests) and JSON objects. It is often used as an alternative to REST.
For other good introduction documents on GraphQL see:
- Digital Ocean - Introduction to GraphQL.
- Kadaster GraphQL.
- Workshop spatial graphql.
- Learn GraphQL queries.
- GraphQL queries.
- GraphQL cheatsheet
Requirements
In order to move forwards you do not need to have any extra tools apart from a web browser.
However, if your aim is to use this API in scripting then it is advisable to have knowledge of at least one of the following languages:
- Python
- R
API root endpoint and web interfaces
Root endpoint
The WoSIS GraphQL API root endpoint can be found at:
https://graphql.isric.org/wosis/graphql
This is the main GraphQL root endpoint. This is the endpoint to be used directly by applications and/or code scripts. If you are an advanced GraphQL user and you use a custom script or a GraphQL client this is what you should use.
Nonetheless, if you click on the above link using a web browser you will probably get the following error message:
{"errors":[{"message":"Only `POST` requests are allowed."}]}
This is expected because this GraphQL endpoint accepts POST requests and not GET requests, which means that it cannot be opened directly in a web browser.
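For scripts, the practical consequence is that the query must travel in the body of a POST request. The following is a minimal sketch using only the Python standard library; the helper name build_post_request and the trivial { __typename } query are illustrative assumptions, not part of the WoSIS documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Root endpoint given in this section
WOSIS_URL = "https://graphql.isric.org/wosis/graphql"


def build_post_request(query, url=WOSIS_URL):
    """Wrap a GraphQL query string in an HTTP POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "__typename" is a GraphQL meta field, used here only as a placeholder query
request = build_post_request("query MyQuery { __typename }")
print(request.get_method())  # POST
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

The same payload shape ({"query": ...}) is what the requests-based scripts later in this master-class send.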
To allow use from a web browser, we provide two web-based IDEs that can be used to explore and access data graphically.
Web interface IDEs
We provide the following interactive in-browser GraphQL IDEs:
- https://graphql.isric.org/wosis/graphiql using graphiql web interface "interactive in-browser IDE"
- https://graphql.isric.org/wosis/playground using playground web interface IDE
For the exercises in this master-class we will use graphiql, but you are free to use the one you prefer.
Explore current schema
The current WoSIS GraphQL schema is composed of Sites that contain Profiles that have Layers and for each layer several measurementValues can be found per soil observation (e.g., pH assessed in aqueous solution). For a given property, each layer can have one or more measurements (e.g., one layer with several samples.)
- Site A
  - Profile H
    - Layer X
      - measurementValues E
      - measurementValues R
- Site B
  - Profile J
    - Layer Y
      - measurementValues E
      - measurementValues R
      - measurementValues T
For more information on the WoSIS data model please check this paper in Earth Syst Sci. Data (2024).
Please explore the current schema using graphiql IDE. For this, follow this link https://graphql.isric.org/wosis/graphiql.
You will be at the root:
- wosisLatestObservations - All current observations served from WoSIS (i.e., wosis_latest) with the total number of sites; profiles and respective layers.
- wosisLatestLayers - WoSIS layers, at this level you will get all layers and respective measurements.
- wosisLatestProfiles - WoSIS profiles, contains all Profiles and respective 'lower' levels of WoSIS products (Profiles, Layers and measurements).
- wosisLatestSites - WoSIS sites, this is probably where you want to start since it contains all levels of WoSIS products (Sites, Profiles, Layers and measurements).
Use the graphiql interface to spend some time exploring the WoSIS schema.
While expanding wosisLatestProfiles we will get the following:
Please note the objects with the right arrow marked in red. Expand one object and check its contents.
Explore the documentation
One of the advantages of GraphQL is the automatically generated documentation. In order to access the documentation in GraphQL click on the DOCS button marked in red in the image below.
Please spend some time exploring the documentation and try to familiarise yourself with the structure.
The image below shows documentation auto-generated for wosisLatestProfiles.
First queries
It is now time to start exploring WoSIS data using queries.
- Get all WoSIS Latest Observations
query MyQuery {
wosisLatestObservations {
layers
profiles
code
property
procedure
}
}
The query above returns the following error:
{
"errors": [
{
"message": "You must provide a 'first' or 'last' argument to properly paginate the 'wosisLatestObservations' field.",
"locations": [
{
"line": 2,
"column": 3
}
]
}
]
}
In order to avoid overloading the WoSIS API we must always use the parameter first in all our queries.
The correct way to write our query is:
- Get the first 100 records of WoSIS Latest Observations
query MyQuery {
wosisLatestObservations(first: 100) {
property
procedure
code
layers
profiles
}
}
In practice this query will return all WoSIS Latest Observations because currently we have fewer than 100 observations.
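In a script, the pagination error shown earlier in this section arrives as an ordinary JSON body with an errors list rather than as an exception. A minimal sketch of how a script might detect it is given below; the names GraphQLError and extract_data are hypothetical helpers introduced for illustration, and the sample body is copied from the error message above.

```python
class GraphQLError(Exception):
    """Raised when a GraphQL response body contains an 'errors' list."""


def extract_data(response_body):
    """Return the 'data' member, or raise if the server reported errors."""
    errors = response_body.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    return response_body.get("data", {})


# The error body shown earlier in this section, as a Python dict
pagination_error = {
    "errors": [
        {
            "message": "You must provide a 'first' or 'last' argument to "
                       "properly paginate the 'wosisLatestObservations' field.",
            "locations": [{"line": 2, "column": 3}],
        }
    ]
}

try:
    extract_data(pagination_error)
except GraphQLError as exc:
    print("Query rejected:", exc)
```

Calling extract_data on every parsed response before reading the records is one way to avoid a confusing KeyError later in a script.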
- Get the first 10 sites from wosisLatestSites
query MyQuery {
wosisLatestSites(first: 10) {
continent
countryName
positionalUncertainty
region
geom {
x
y
geojson
srid
}
}
}
Please note that sites contain mainly spatial data.
- Get the first 10 wosisLatestProfiles profiles without any classification record.
query MyQuery {
wosisLatestProfiles(first: 10) {
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
}
}
- Get the first 10 wosisLatestProfiles profiles with all available classification records (i.e., FAO, USDA and WRB).
query MyQuery {
wosisLatestProfiles(first: 10) {
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
faoMajorGroup
faoMajorGroupCode
faoPublicationYear
faoSoilUnit
faoSoilUnitCode
usdaGreatGroup
usdaOrderName
usdaPublicationYear
usdaSubgroup
usdaSuborder
wrbPrefixQualifiers
wrbPrincipalQualifiers
wrbPublicationYear
wrbReferenceSoilGroup
wrbReferenceSoilGroupCode
wrbSuffixQualifiers
wrbSupplementaryQualifiers
}
}
Please note that you can use the graphiql IDE to easily create your queries. If you are a beginner, it is recommended that you generate your queries via the user interface.
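When queries are produced from a script instead of the IDE, the same flat queries can be assembled from a list of field names. The sketch below is only an illustration: build_query is a hypothetical helper, it handles a single level of fields (no nested profiles or layers), and the field names are taken from the queries above.

```python
def build_query(root_field, fields, first=10):
    """Assemble a flat GraphQL query string (hypothetical helper, no nesting)."""
    if first < 1:
        raise ValueError("'first' must be a positive integer")
    selection = "\n".join(f"    {name}" for name in fields)
    return (
        "query MyQuery {\n"
        f"  {root_field}(first: {first}) {{\n"
        f"{selection}\n"
        "  }\n"
        "}"
    )


query = build_query(
    "wosisLatestProfiles",
    ["continent", "countryName", "profileCode", "wrbReferenceSoilGroup"],
    first=10,
)
print(query)
```

Because the helper always emits a first argument, the generated query cannot trigger the pagination error described earlier.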
- Get first 10 sites and for each site get also the first 10 profiles:
query MyQuery {
wosisLatestSites(first: 10) {
continent
countryName
positionalUncertainty
region
geom {
x
y
}
profiles(first: 10) {
profileId
profileCode
datasetCode
year
month
faoMajorGroup
usdaGreatGroup
wrbReferenceSoilGroup
}
}
}
Please note that the following parameters are associated with the profile and not with the site: dataset_code, year, month and day.
- Get first 10 sites and for each site the first 10 profiles and for each profile get also the first 10 layers:
query MyQuery {
wosisLatestSites(first: 10) {
continent
countryName
positionalUncertainty
region
geom {
x
y
}
profiles(first: 10) {
profileId
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
layers(first: 10) {
layerId
layerNumber
lowerDepth
upperDepth
organicSurface
}
}
}
}
Note that the deeper you go in the dataset structure the slower query execution will be.
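A nested response of this kind is often easier to analyse once it is flattened to one row per layer. The following sketch assumes the response has already been parsed and that the list under data, wosisLatestSites is passed in; flatten_sites is a hypothetical helper and the sample records are made up for illustration, not real WoSIS data.

```python
def flatten_sites(sites):
    """Yield one flat dict per layer from a sites -> profiles -> layers response."""
    for site in sites:
        for profile in site.get("profiles") or []:
            for layer in profile.get("layers") or []:
                yield {
                    "countryName": site.get("countryName"),
                    "profileId": profile.get("profileId"),
                    "layerId": layer.get("layerId"),
                    "upperDepth": layer.get("upperDepth"),
                    "lowerDepth": layer.get("lowerDepth"),
                }


# Made-up sample shaped like parsed["data"]["wosisLatestSites"]
sample = [
    {
        "countryName": "Exampleland",
        "profiles": [
            {
                "profileId": 1,
                "layers": [
                    {"layerId": 10, "upperDepth": 0, "lowerDepth": 20},
                    {"layerId": 11, "upperDepth": 20, "lowerDepth": 50},
                ],
            },
            {"profileId": 2, "layers": []},  # a profile without layers yields no rows
        ],
    }
]

rows = list(flatten_sites(sample))
print(len(rows))  # 2
```

The resulting list of dicts can be handed directly to a table library, at the cost of repeating the site and profile attributes on every layer row.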
Note that if we need to retrieve profiles we are not forced to start with the sites. We can retrieve profiles without querying sites. The same applies for layers: if we only need specific layers we can retrieve these layers without querying profiles. In the next queries we will show how this is done.
- Get first 10 profiles and for each profile get also the first 10 layers:
query MyQuery {
wosisLatestProfiles(first: 10) {
profileId
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
layers(first: 10) {
layerId
layerNumber
lowerDepth
upperDepth
organicSurface
}
}
}
- Get first 10 profiles and for each profile get also the first 10 layers and also the first 10 values for silt:
query MyQuery {
wosisLatestProfiles(first: 10) {
profileId
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
layers(first: 10) {
layerId
date
layerNumber
lowerDepth
upperDepth
organicSurface
siltValues(first: 10) {
valueAvg
value
}
}
}
}
- Get first 10 profiles and for each profile get also the first 10 layers and for each layer also get the first 10 values for silt and the first 10 values for organic carbon:
query MyQuery {
wosisLatestProfiles(first: 10) {
profileId
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
layers(first: 10) {
layerId
date
layerNumber
lowerDepth
upperDepth
organicSurface
siltValues(first: 10) {
valueAvg
value
}
orgcValues(first: 10) {
valueAvg
value
}
}
}
}
At this point you will probably see some empty results in the orgcValues field. This is due to the fact that for some layers there are no organic carbon measurements in the source datasets.
As exemplified, we can request all types of values (Silt; Sand; Organic carbon; pH etc.) but the more data we request the slower the query will be.
Exploratory queries without any filtering can be useful to get acquainted with the data, but at some point it is recommended to apply filters.
Filtering
Perhaps the main advantage of this GraphQL API is the ability to easily filter and explore data. In the majority of cases, however, a user may want to extract specific data. For this, we will make use of Filtering capabilities.
Before we start performing queries please spend some time exploring the filter object inside wosisLatestProfiles as shown in the image below:
Let's now try some queries with filtering:
- Get first 10 profiles from continent Europe
query MyQuery {
wosisLatestProfiles(
first: 10
filter: { continent: { likeInsensitive: "europe" } }
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
}
}
- Get first 10 profiles from continent Europe or Africa
query MyQuery {
wosisLatestProfiles(
first: 10
filter: { continent: { in: ["Europe", "Africa"] } }
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
}
}
OR & AND
In the previous example we used the in operator, but the same query can be made using the OR operator:
query MyQuery {
wosisLatestProfiles(
first: 10
filter: {
or: [
{ continent: { includesInsensitive: "europe" } }
{ continent: { includesInsensitive: "africa" } }
]
}
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
}
}
Please note that some operators (AND, OR etc.) expect an array as input ([]).
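When filters are built in a script, it can help to keep them as Python dicts and lists and render them to GraphQL input syntax only when the query string is assembled. The sketch below is a hypothetical helper, not part of the WoSIS API. It covers dicts, lists, strings, numbers, booleans and null, and it deliberately does not handle GraphQL enum values.

```python
import json


def to_graphql_literal(value):
    """Render a Python value as a GraphQL input literal (hypothetical helper)."""
    if isinstance(value, dict):
        inner = ", ".join(
            f"{key}: {to_graphql_literal(val)}" for key, val in value.items()
        )
        return "{" + inner + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(item) for item in value) + "]"
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    # json.dumps quotes and escapes strings and prints numbers unchanged
    return json.dumps(value)


# The OR filter from the query above: the operator receives a list of conditions
continent_filter = {
    "or": [
        {"continent": {"includesInsensitive": "europe"}},
        {"continent": {"includesInsensitive": "africa"}},
    ]
}
print(to_graphql_literal(continent_filter))
# {or: [{continent: {includesInsensitive: "europe"}}, {continent: {includesInsensitive: "africa"}}]}
```

Note how the Python list maps directly onto the array ([]) that the OR and AND operators expect.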
- Get first 5 profiles with the respective first 10 layers from country Netherlands AND with at least one layer. In other words, we do not want any profiles without layers in this query.
query MyQuery {
wosisLatestProfiles(
first: 5
filter: {
and: [
{ countryName: { includesInsensitive: "netherlands" } }
{ layersExist: true }
]
}
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
layers(first: 10) {
layerNumber
lowerDepth
upperDepth
}
}
}
- Get first 10 profiles with WRB classification.
query MyQuery {
wosisLatestProfiles(
first: 10
filter: {
or: [
{ wrbReferenceSoilGroup: { isNull: false } }
{ wrbReferenceSoilGroupCode: { isNull: false } }
]
}
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
wrbPrefixQualifiers
wrbPrincipalQualifiers
wrbPublicationYear
wrbReferenceSoilGroup
wrbReferenceSoilGroupCode
wrbSuffixQualifiers
wrbSupplementaryQualifiers
}
}
- Get first 3 profiles and respective layers that have at least one Organic Carbon measurement:
query MyQuery {
wosisLatestProfiles(
first: 3
filter: { layersExist: true, layers: { every: { orgcValuesExist: true } } }
) {
continent
region
profileId
datasetCode
layers(first: 10, filter: { orgcValuesExist: true }) {
layerId
layerName
lowerDepth
upperDepth
orgcValues(first: 10) {
value
valueAvg
}
}
}
}
- Get the first profile and its first 2 layers that have at least one value for Bulk density fine earth - 33 kPa and at least one Organic Carbon value:
query MyQuery {
wosisLatestProfiles(
first: 1
filter: {
layersExist: true
layers: { some: { bdfi33lValuesExist: true, orgcValuesExist: true } }
}
) {
continent
region
profileId
datasetCode
layers(
first: 2
filter: { bdfi33lValuesExist: true, orgcValuesExist: true }
) {
layerId
layerName
lowerDepth
upperDepth
bdfi33lValues(first:3){
value
valueAvg
}
orgcValues(first:3){
value
valueAvg
}
}
}
}
Using variables
In GraphQL we can also use variables in our queries. Variables are important for:
- Scripting, in order to be able to interact with our script variables
- Ingest complex JSON objects into our query
- Make sure the query is easy to read
When using graphiql we have a query variables box. Inside this box we can add our variables in JSON format.
Let us demonstrate the usage of variables in the following queries:
- Get first 10 profiles from continent Europe
Inside Query variables add the first and continent variables:
{
"first": 10,
"continent": "Europe"
}
The GraphQL query will be:
query MyQuery($first:Int, $continent:String) {
wosisLatestProfiles(
first: $first
filter: { continent: { likeInsensitive: $continent } }
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
}
}
In your graphiql you should have something as shown below:
Using arrays [ ]
- Using an array [ ] get first 10 profiles from continent Europe or Africa
Inside the Query variables box:
{
"first": 10,
"continent": ["Europe","Africa"]
}
The GraphQL query will be:
query MyQuery($first:Int, $continent:[String!]) {
wosisLatestProfiles(
first: $first
filter: { continent: { in: $continent } }
) {
continent
countryName
region
datasetCode
latitude
longitude
profileId
}
}
In the next chapter we will make use of variables to better provide JSON components to our queries.
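Outside graphiql, the content of the Query variables box is sent alongside the query in the same JSON body, under a variables key. The following sketch shows that body being assembled; build_payload is a hypothetical helper, and the query is a shortened version of the array example above.

```python
import json

QUERY = """
query MyQuery($first: Int, $continent: [String!]) {
  wosisLatestProfiles(first: $first, filter: { continent: { in: $continent } }) {
    continent
    countryName
    profileId
  }
}
"""


def build_payload(query, variables=None):
    """Return the JSON text for a GraphQL POST body (hypothetical helper)."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return json.dumps(payload)


body = build_payload(QUERY, {"first": 10, "continent": ["Europe", "Africa"]})
print(json.loads(body)["variables"])
```

Keeping the query text constant and changing only the variables makes it straightforward to reuse one query for several continents or page sizes.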
Spatial queries
This API has spatial capabilities. It is possible to perform several spatial queries and apply spatial filters. Spatial components are GeoJSON-based.
In order to use spatial queries, we will use two geometries of Gelderland, a province in the Netherlands, in GeoJSON format as examples.
You can use https://geojson.io to visualise, create and update GeoJSON geometries.
- Simplified geometry of the Gelderland region in GeoJSON format:
{
"type": "FeatureCollection",
"name": "Gelderland MultiPolygon",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "prov_name": [ "Gelderland" ] }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 
51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] } }
]
}
- Points (3) in Gelderland in GeoJSON format:
{
"type": "FeatureCollection",
"name": "Gelderland points",
"features": [
{
"type": "Feature",
"properties": { "prov_name": [ "Gelderland" ] },
"geometry": {
"type": "MultiPoint",
"coordinates": [
[6.025363925650851,52.501157816882994],
[5.158391536033605,51.775118267397204],
[6.742439219867151,51.96023476075487]
]
}
}
]
}
In order to make the query simpler and easier to read, we will make use of variables in our spatial queries.
- Get first 3 profiles that fall inside Gelderland using the MultiPolygon geometry. In this query we also make sure all profiles have at least one layer.
query MyQuery($geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: 3
filter: {layersExist: true, geom: {intersects: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
geom{
geojson
x
y
}
}
}
Inside Query variables add the geomGelderland variable:
{
"geomGelderland": {
"type": "MultiPolygon",
"coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] }
}
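Note that the variable above holds only the geometry member, whereas the GeoJSON examples listed earlier are complete FeatureCollections. A minimal sketch of extracting the geometry from such a collection in Python is given below; geometries_from_feature_collection is a hypothetical helper, and the smaller points collection is used as input to keep the example short.

```python
def geometries_from_feature_collection(feature_collection):
    """Return the geometry objects of all features (hypothetical helper)."""
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return [
        feature["geometry"]
        for feature in feature_collection.get("features", [])
        if feature.get("geometry")
    ]


# The three Gelderland points listed earlier in this section
points_fc = {
    "type": "FeatureCollection",
    "name": "Gelderland points",
    "features": [
        {
            "type": "Feature",
            "properties": {"prov_name": ["Gelderland"]},
            "geometry": {
                "type": "MultiPoint",
                "coordinates": [
                    [6.025363925650851, 52.501157816882994],
                    [5.158391536033605, 51.775118267397204],
                    [6.742439219867151, 51.96023476075487],
                ],
            },
        }
    ],
}

geometry = geometries_from_feature_collection(points_fc)[0]
variables = {"geomGelderland": geometry}
print(geometry["type"])  # MultiPoint
```

The variables dict has the same shape as the content of the Query variables box and can be sent with the query from a script.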
Example of what you should see in graphiql:
The GEOM object corresponds to the geometry. Please spend some time exploring this object in the graphiql interface. Make sure you explore the Filter capabilities too.
- Using the previous query, change the query variables to the points geometry:
{
"geomGelderland": {
"type": "MultiPoint",
"coordinates": [
[6.025363925650851,52.501157816882994],
[5.158391536033605,51.775118267397204],
[6.742439219867151,51.96023476075487]
]
}
}
You will see that the same query now produces no results. This is because we are searching for WoSIS profiles that intersect the provided geometry, and a profile location is unlikely to coincide exactly with one of the three points. In this case we must use a different spatial filter.
- Get first 3 profiles that fall inside the BBOX of the points in our MultiPoint geometry. In this query we also make sure all profiles have at least one layer.
query MyQuery($geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: 3
filter: {layersExist: true, geom: {bboxIntersects2D: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
geom {
geojson
x
y
}
}
}
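To see why a bounding-box filter can match where the exact intersection did not, the sketch below computes the 2D bounding box of the three points and tests whether a location falls inside it. This assumes that bboxIntersects2D compares 2D bounding boxes, as its name suggests. The helper names and the test location are illustrative assumptions, not WoSIS records.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(bbox, point):
    """True when the point lies inside or on the edge of the bounding box."""
    min_x, min_y, max_x, max_y = bbox
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


# The three MultiPoint coordinates used in the query variables (longitude, latitude)
gelderland_points = [
    (6.025363925650851, 52.501157816882994),
    (5.158391536033605, 51.775118267397204),
    (6.742439219867151, 51.96023476075487),
]

bbox = bounding_box(gelderland_points)
print(bbox)

# A hypothetical location between the three points
print(bbox_contains(bbox, (5.9, 52.0)))  # True
```

Any profile inside this rectangle matches the bounding-box filter, including profiles that lie outside the actual Gelderland boundary, so the result is coarser than the polygon query.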
Pagination concepts
Depending on how you write your query, it can require considerable computational resources. Besides, without pagination you could easily create a query that returns a huge number of records, with all the problems that brings.
To solve this issue we enforce pagination in this GraphQL API.
For the moment, in order to make things easier, we propose a simpler list interface for the connections based on Offset-based Pagination. This means we have temporarily disabled Relay Cursor Connections.
If you are an advanced user and would like to have access to Relay Cursor Connections please contact us.
The first argument
All queries must have a first argument in the connections. So far we have used it in all our queries. This argument indicates the maximum number of items to return.
The offset argument
offset is an optional argument that indicates where in the list the server should start when returning items for a particular query.
The arguments first and offset are extremely important when you need to extract and download data.
We will make use of pagination in our scripts. We will show how to use pagination and extract a considerable amount of data from WoSIS using this GraphQL API.
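The interplay of first and offset can be sketched independently of the network: request a page, stop when it is empty, otherwise advance offset by first. In the sketch below, fetch_all and fake_fetch_page are hypothetical names, and the fake records stand in for API responses so that the loop can be run offline.

```python
def fetch_all(fetch_page, first=100):
    """Collect every record by calling fetch_page(first, offset) until a page is empty."""
    results = []
    offset = 0
    while True:
        page = fetch_page(first, offset)
        if not page:
            break
        results.extend(page)
        offset += first
    return results


# Stand-in for the API: serves slices of 250 fake records instead of sending requests
fake_records = [{"profileId": i} for i in range(250)]


def fake_fetch_page(first, offset):
    return fake_records[offset:offset + first]


print(len(fetch_all(fake_fetch_page, first=100)))  # 250
```

With 250 records and first set to 100, the loop issues four calls: three pages with data (100, 100 and 50 records) and a final empty page that ends the loop.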
Scripting
Python examples
The simplest way to perform a GraphQL request in Python is to use the requests package.
- Get the first 5 profiles and add them to a Pandas dataframe:
import requests
import json
import pandas as pd
# GraphQL query
query = """
query MyQuery {
wosisLatestProfiles(first: 5) {
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
}
}
"""
# GraphQL endpoint
url='https://graphql.isric.org/wosis/graphql'
# Send POST request
r = requests.post(url, json={'query': query})
# Print status_code
print(r.status_code)
# Parse JSON
parsed = json.loads(r.text)
# Convert to pandas dataframe
df = pd.json_normalize(parsed['data']['wosisLatestProfiles'])
# print dataframe
print(df)
The result will be:
Using variables in our script:
- Get the first 3 profiles that are inside Gelderland region and add them to a Pandas dataframe:
import requests
import json
import pandas as pd
# GeoJSON geometry
geomGelderland = {
"type": "MultiPolygon",
"coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
}
# GraphQL query
query = """
query MyQuery($geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: 3
filter: {layersExist: true, geom: {intersects: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
geom{
geojson
x
y
}
}
}
"""
# GraphQL endpoint
url='https://graphql.isric.org/wosis/graphql'
# Send POST request
r = requests.post(url, json={'query': query, 'variables': {'geomGelderland': geomGelderland}})
# Print status_code
print(r.status_code)
# Parse JSON
parsed = json.loads(r.text)
# Convert to pandas dataframe
df = pd.json_normalize(parsed['data']['wosisLatestProfiles'])
# print dataframe
print(df)
The result will be:
- Get all WoSIS profiles with layers that exist in Gelderland and also export it to CSV.
import requests
import json
import pandas as pd
# GeoJSON geometry
geomGelderland = {
"type": "MultiPolygon",
"coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
}
# GraphQL query
query = """
query MyQuery($first: Int, $offset: Int, $geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: $first,
offset: $offset,
filter: {layersExist: true, geom: {intersects: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
}
}
"""
# GraphQL endpoint
url='https://graphql.isric.org/wosis/graphql'
new_results = True
first = 100
offset = 0
all_results = []
while new_results:
    # Send POST request
    r = requests.post(url, json={'query': query, 'variables': {'first': first, 'offset': offset, 'geomGelderland': geomGelderland}})
    # Parse JSON
    parsed = json.loads(r.text)
    # Add results to all_results object
    all_results.extend(parsed['data']['wosisLatestProfiles'])
    # for debugging
    # print(json.dumps(parsed, indent=4, sort_keys=True))
    # print(len(parsed['data']['wosisLatestProfiles']))
    if 'wosisLatestProfiles' not in parsed['data'] or len(parsed['data']['wosisLatestProfiles']) == 0:
        print('No more results')
        # update new_results
        new_results = False
    else:
        print('We have more results')
        # update offset
        offset = offset+first
df = pd.json_normalize(all_results)
# print dataframe
print('There are {} WoSIS profiles with layers inside Gelderland region'.format(df.shape[0]))
# Export dataframe to CSV
df.to_csv('wosis_gelderland.csv', index=False)
The result will be:
There are 136 WoSIS profiles with layers inside Gelderland region
The CSV result file can be found here
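The offset-based loop above can also be wrapped in a small generator. The following is only a minimal sketch: `iter_profiles` and `fetch_page` are hypothetical names that do not come from the WoSIS API, and the sketch assumes that a failed query is reported in a top-level `errors` list, as is usual for GraphQL servers. The HTTP call is passed in as a function so that the paging logic can be tried without a network connection.

```python
from typing import Callable, Dict, Iterator


def iter_profiles(fetch_page: Callable[[int, int], Dict], page_size: int = 100) -> Iterator[Dict]:
    """Yield profiles one by one, requesting a new page until one comes back empty.

    fetch_page(first, offset) is a hypothetical callable that returns the
    decoded JSON body of a single GraphQL response.
    """
    offset = 0
    while True:
        body = fetch_page(page_size, offset)
        # Assumption: query problems are reported in a top-level "errors" list
        if body.get("errors"):
            raise RuntimeError(body["errors"])
        # "data" can be missing or null, so fall back to an empty page
        page = (body.get("data") or {}).get("wosisLatestProfiles") or []
        if not page:
            break
        yield from page
        offset += page_size


# Possible use with the objects defined in the script above:
# fetch = lambda first, offset: requests.post(url, json={
#     'query': query,
#     'variables': {'first': first, 'offset': offset, 'geomGelderland': geomGelderland},
# }).json()
# df = pd.json_normalize(list(iter_profiles(fetch)))
```

Like the original loop, this stops only after receiving an empty page, so it sends one extra request after the last page that contains data.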
R examples
The simplest way to perform a GraphQL request in R is to use {httr}.
- Get the first 5 profiles and add them to an R data frame:
library(httr)
library(jsonlite)
# GraphQL query
query <- '
query MyQuery {
wosisLatestProfiles(first: 5) {
continent
region
countryName
datasetCode
latitude
longitude
positionalUncertainty
profileCode
}
}
'
# GraphQL endpoint
url <- 'https://graphql.isric.org/wosis/graphql'
# Send POST request
response <- POST(url, body = list(query = query), encode = "json")
# Print status_code
print(status_code(response))
# Parse JSON
parsed <- fromJSON(content(response, "text"), flatten = TRUE)
# Convert the parsed JSON to a data frame
df <- as.data.frame(parsed$data$wosisLatestProfiles)
head(df)
The result will be:
Using variables in our script:
- Get the first 3 profiles that are inside the Gelderland region and add them to an R data frame:
library(httr)
library(jsonlite)
geomGelderland <- fromJSON('{
"type": "MultiPolygon",
"coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
}
')
# GraphQL query
query <- "
query MyQuery($geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: 3
filter: {layersExist: true, geom: {intersects: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
geom {
geojson
x
y
}
}
}
"
# GraphQL endpoint
url <- "https://graphql.isric.org/wosis/graphql"
# Send POST request
response <- POST(url, body = list(query = query, variables = list(geomGelderland = geomGelderland)), encode = "json")
# Print status_code
print(status_code(response))
# Parse JSON
parsed <- fromJSON(content(response, "text"), flatten = TRUE)
# Convert to data frame
df <- as.data.frame(parsed$data$wosisLatestProfiles)
# Print data frame
head(df)
The result will be:
- Get all WoSIS profiles with layers that exist in Gelderland and also export these to CSV.
library(httr)
library(jsonlite)
library(dplyr)
# GeoJSON geometry
geomGelderland <- fromJSON('{
"type": "MultiPolygon",
"coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
}
')
# GraphQL query
query <- "
query MyQuery($first: Int, $offset: Int, $geomGelderland: GeoJSON!) {
wosisLatestProfiles(
first: $first,
offset: $offset,
filter: {layersExist: true, geom: {intersects: $geomGelderland}}
) {
continent
region
profileId
datasetCode
latitude
longitude
}
}
"
# GraphQL endpoint
url <- "https://graphql.isric.org/wosis/graphql"
new_results <- TRUE
first <- 100
offset <- 0
all_results <- list()
while (new_results) {
# Send POST request
response <- POST(url, body = list(query = query, variables = list(
first = first,
offset = offset, geomGelderland = geomGelderland
)), encode = "json")
# Parse JSON
parsed <- fromJSON(content(response, "text"), flatten = TRUE)
# Add results to all_results list
all_results <- append(all_results, list(parsed$data$wosisLatestProfiles))
if (!"wosisLatestProfiles" %in% names(parsed$data) || length(parsed$data$wosisLatestProfiles) == 0) {
print("No more results")
# update new_results
new_results <- FALSE
} else {
print("We have more results")
# update offset
offset <- offset + first
}
}
df <- bind_rows(all_results) %>% as_tibble()
# print dataframe
cat("There are", nrow(df), "WoSIS profiles with layers inside Gelderland region\n")
# Export dataframe to CSV
write.csv(df, "wosis_gelderland.csv", row.names = FALSE, quote = FALSE)
The result will be:
There are 136 WoSIS profiles with layers inside Gelderland region
The CSV result file can be found here
Soil data validation and ingest into WoSIS
Ingesting data into WoSIS follows an Extract, Transform and Load (ETL) workflow: a standardised, semi-automatic process that guides the data processor through the ingestion of new datasets.
This API assists the process. The first step is to map the attributes of the original source data to WoSIS elements such as observation measurements, site, profile and layer data.
The etlMappingFeatures endpoint lists the features that are available for this mapping.
- Get the first 10 features (in this case observations) that have the property pH and are distributed in WoSIS products.
query MyQuery {
etlMappingFeatures(
first: 10
filter: {distribute: {equalTo: true}, propertyName: {like: "pH"}}
) {
code
category
distribute
featureType
maximum
minimum
name
procedureName
propertyName
unit {
description
symbol
}
}
}
Note that in the example above the API returns only 4 results, because the dataset contains no more matching features.
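The same request can be sent from a script. The sketch below uses only the Python standard library instead of `requests`. It assumes that the `like` filter accepts a `String` variable and that the response has the shape `data.etlMappingFeatures`. The names `ETL_QUERY`, `build_payload`, `fetch_features` and the `$pattern` variable are illustrative and not part of the API.

```python
import json
import urllib.request

# Same endpoint as in the examples above
URL = "https://graphql.isric.org/wosis/graphql"

# Assumption: the `like` operator accepts a String variable
ETL_QUERY = """
query EtlFeatures($first: Int, $pattern: String) {
  etlMappingFeatures(
    first: $first
    filter: {distribute: {equalTo: true}, propertyName: {like: $pattern}}
  ) {
    code
    name
    propertyName
    procedureName
    minimum
    maximum
    unit {
      symbol
    }
  }
}
"""


def build_payload(pattern, first=10):
    """Return the JSON body of the request: query text plus variables."""
    return {"query": ETL_QUERY, "variables": {"first": first, "pattern": pattern}}


def fetch_features(pattern, first=10, url=URL):
    """POST the query and return the list of matching features."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(pattern, first)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # Assumed response shape: {"data": {"etlMappingFeatures": [...]}}
    return body["data"]["etlMappingFeatures"]
```

Calling `fetch_features("pH")` would then send the query shown above. Passing the pattern as a variable keeps the query text unchanged when you look up other properties.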
Get the values of a property within a polygon
query MyQuery($first: Int, $offset: Int) {
wosisLatestProfiles(
filter: {
layers: { some: { sandValuesExist: true } }
geom: {
within: {
type: "Polygon"
coordinates: [
[
[-5.51345387228184, 9.59126476678042]
[-5.51345387228184, 11.0451128553676]
[-3.45410758209379, 11.0451128553676]
[-3.45410758209379, 9.59126476678042]
[-5.51345387228184, 9.59126476678042]
]
]
}
}
and: {
continent: { likeInsensitive: "Africa" }
countryName: { likeInsensitive: "burkina faso" }
}
}
first: $first
offset: $offset
) {
latitude
longitude
layers(first: $first, filter: { sandValuesExist: true }, offset: $offset) {
sandValues(first: 6) {
profileId
profileCode
layerId
datasetId
continent
region
countryName
date
upperDepth
lowerDepth
valueAvg
licence
methodOptions
}
}
}
}
Variables:
{
"first": 10,
"offset": 0
}
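The response to this query is nested: each profile holds a list of layers, and each layer holds a list of sand values. The sketch below shows one way to build the rectangular polygon used in the filter from a bounding box, and to flatten the nested response into one row per value. Both helpers (`bbox_to_polygon`, `flatten_sand_values`) are hypothetical. The sketch assumes that nested collections are returned as plain lists, as `wosisLatestProfiles` is in the earlier scripts.

```python
def bbox_to_polygon(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoJSON Polygon for a bounding box.

    The ring repeats its first position at the end, as GeoJSON requires.
    """
    ring = [
        [min_lon, min_lat],
        [min_lon, max_lat],
        [max_lon, max_lat],
        [max_lon, min_lat],
        [min_lon, min_lat],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def flatten_sand_values(profiles):
    """Return one flat dict per sand value, with the profile coordinates attached.

    Assumption: `profiles` is the list found under data.wosisLatestProfiles.
    """
    rows = []
    for profile in profiles:
        for layer in profile.get("layers") or []:
            for value in layer.get("sandValues") or []:
                row = {
                    "latitude": profile.get("latitude"),
                    "longitude": profile.get("longitude"),
                }
                row.update(value)
                rows.append(row)
    return rows
```

The rows can then be passed to `pd.DataFrame(rows)` and exported in the same way as in the scripting examples. To use the polygon as a variable, the query would need to declare a `GeoJSON!` variable, as the Gelderland examples do.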