Calisto, Luis authored
added example GraphQL Query for retrieving the values of a property within a polygon


WoSIS GraphQL API Masterclass

This master-class explains the WoSIS GraphQL API and demonstrates its use through worked examples.




Introduction

WoSIS stands for 'World Soil Information Service', a large PostgreSQL-based database with associated APIs, workflows, dashboards, etc., developed and maintained by ISRIC, WDC-Soils. It provides a growing range of quality-assessed and standardised soil profile data for the world. For this, it draws on voluntary contributions from data holders/providers worldwide.

The source data come from different types of surveys ranging from systematic soil surveys (i.e., full profile descriptions) to soil fertility surveys (i.e., mainly top 20 to 30 cm). Further, depending on the nature of the original surveys the range of soil properties can vary greatly (see https://doi.org/10.5194/essd-16-4735-2024/).

Upon standardisation, the quality-assessed data are made freely available to the international community through several web services, in compliance with the conditions (licences) specified by the various data providers. This means that we can only serve data with a so-called 'free' licence to the international community (https://data.isric.org/geonetwork/srv/eng/catalog.search#/search?any=wosis_latest). A larger complement of geo-referenced data with a more restrictive licence can only be used by ISRIC itself for producing SoilGrids maps and similar products (i.e., output resulting from advanced data processing). These map layers are made freely available to the international community (https://data.isric.org/geonetwork/srv/eng/catalog.search#/search?resultType=details&sortBy=relevance&any=soilgrids250m%202.0&fast=index&_content_type=json&from=1&to=20).

Figure: WoSIS workflow for ingesting, processing and disseminating data.

During this master class, you will first learn what GraphQL and API (application programming interface) are. Next, using guided steps, we will explore the basics of WoSIS and GraphQL via a graphical interface. From that point onwards we will slowly increase complexity and use WoSIS data. Building on this, we will show you how to create code that uses soil data from WoSIS.

The workshop requires no previous knowledge of WoSIS or GraphQL. However, it is advisable to have basic coding knowledge of the Python or R languages.

The aim of this master-class is to provide clear instructions and documentation on how to use the WoSIS GraphQL API.

WoSIS public products

WoSIS data can be accessed via OGC web services and a GraphQL API.

Until recently, OGC web services provided the main entry point to download and access WoSIS. You can find more information on how to access WoSIS using the SOAP-based OGC web services at https://www.isric.org/explore/wosis/accessing-wosis-derived-datasets.

In 2023, we developed a GraphQL API tool to easily access the data. The aim of this master-class is to show and describe how this tool can be used to explore and download WoSIS data.


What is GraphQL?

GraphQL is a query language for APIs. GraphQL isn't tied to any specific database or storage engine. Instead, it is backed by your existing code and data.

If you are new to GraphQL it might be good to check the official documentation: https://graphql.org/learn/.

GraphQL works as an abstraction layer between application and database, allowing the database to be queried using web technologies (HTTP requests) and JSON objects. GraphQL is an alternative to REST.

For other good introduction documents on GraphQL see:


Requirements

To follow along you do not need any tools other than a web browser.

However, if you intend to use this API in scripts, it is advisable to know at least one of the following languages:

  • Python
  • R

API root endpoint and web interfaces

Root endpoint

The WoSIS GraphQL API root endpoint can be found at:

https://graphql.isric.org/wosis/graphql

This is the main GraphQL root endpoint. This is the endpoint to be used directly by applications and/or code scripts. If you are an advanced GraphQL user and you use a custom script or a GraphQL client this is what you should use.

Nonetheless, if you click on the above link using a web browser you will probably get the following error message:

{"errors":[{"message":"Only `POST` requests are allowed."}]}

This is expected because this GraphQL endpoint accepts POST requests and not GET requests, which means that it cannot be opened directly from a web browser.
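As a minimal sketch of what "expects POST" means for a script, the snippet below builds (but does not send) a POST request with Python's standard library. The helper name `build_request` is hypothetical and chosen for illustration; only the endpoint URL comes from the text above.

```python
import json
import urllib.request

# Root endpoint quoted in the text above.
WOSIS_URL = "https://graphql.isric.org/wosis/graphql"


def build_request(query, variables=None, url=WOSIS_URL):
    """Build a POST request carrying a GraphQL query as a JSON body.

    build_request is a hypothetical helper name used for illustration.
    """
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    body = json.dumps(payload).encode("utf-8")
    # Passing data= makes urllib issue a POST; method= states it explicitly.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("query MyQuery { __typename }")
print(req.get_method())  # POST
```

To actually send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.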

To allow use from a web browser, we provide two web-based IDEs that let you explore and access the data graphically.

Web interface IDEs

We provide the following interactive in-browser GraphQL IDEs:

For the exercises in this master-class we will use graphiql, but you are free to use the one you prefer.

Explore current schema

The current WoSIS GraphQL schema is hierarchical: Sites contain Profiles, Profiles contain Layers, and each Layer holds measurement values for one or more soil observations (e.g., pH assessed in aqueous solution). For a given property, a layer can have one or more measurements (e.g., one layer with several samples).

  1. Site A
    1. Profile H
      1. Layer X
        1. measurementValues E
        2. measurementValues R
  2. Site B
    1. Profile J
      1. Layer Y
        1. measurementValues E
        2. measurementValues R
        3. measurementValues T

For more information on the WoSIS data model, please see the paper in Earth Syst. Sci. Data (2024): https://doi.org/10.5194/essd-16-4735-2024.

Please explore the current schema using the graphiql IDE. To do so, follow this link: https://graphql.isric.org/wosis/graphiql.

At the root you will find:

  • wosisLatestObservations - All current observations served from WoSIS (i.e., wosis_latest), with the total number of sites, profiles and respective layers.
  • wosisLatestLayers - WoSIS layers; at this level you get all layers and their measurements.
  • wosisLatestProfiles - WoSIS profiles; contains all profiles and the respective 'lower' levels of the WoSIS products (Profiles, Layers and measurements).
  • wosisLatestSites - WoSIS sites; this is probably where you want to start, since it contains all levels of the WoSIS products (Sites, Profiles, Layers and measurements).

Use the graphiql interface to spend some time exploring the WoSIS schema.

While expanding wosisLatestProfiles we will get the following:

wosisLatestProfiles

Please note the objects with the right arrow marked in red. Expand one object and check its contents.

Explore the documentation

One of the advantages of GraphQL is the automatically generated documentation. To access the documentation in graphiql, click on the DOCS button marked in red in the image below.

wosisLatestProfiles

Please spend some time exploring the documentation and try to familiarise yourself with the structure.

The image below shows documentation auto-generated for wosisLatestProfiles.

wosisLatestProfiles

First queries

It is now time to start exploring WoSIS data using queries.

  • Get all WoSIS Latest Observations
query MyQuery {
  wosisLatestObservations {
    layers
    profiles
    code
    property
    procedure
  }
}

The query above returns the following error:

{
  "errors": [
    {
      "message": "You must provide a 'first' or 'last' argument to properly paginate the 'wosisLatestObservations' field.",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ]
    }
  ]
}

To avoid overloading the WoSIS API, every query must limit the number of records it returns by using the first argument.
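A script can detect this situation by looking for the top-level `errors` key before reading `data`. The sketch below applies that check to the error response shown above; `error_messages` is a hypothetical helper name, not part of any library.

```python
import json

# The error response returned by the unpaginated query above.
response_text = """
{
  "errors": [
    {
      "message": "You must provide a 'first' or 'last' argument to properly paginate the 'wosisLatestObservations' field.",
      "locations": [{"line": 2, "column": 3}]
    }
  ]
}
"""


def error_messages(response):
    """Return the list of error messages in a parsed GraphQL response."""
    return [err.get("message", "") for err in response.get("errors", [])]


parsed = json.loads(response_text)
messages = error_messages(parsed)
if messages:
    # Report the first problem instead of trying to read parsed["data"].
    print("Query failed:", messages[0])
```

A response without an `errors` key yields an empty list, so the same check can be placed in front of any code that reads `data`.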

The correct way to write our query is:

  • Get the first 100 records of WoSIS Latest Observations
query MyQuery {
  wosisLatestObservations(first: 100) {
    property
    procedure
    code
    layers
    profiles
  }
}

In practice this query returns all WoSIS Latest Observations, because there are currently fewer than 100 observations.

  • Get the first 10 sites from wosisLatestSites
query MyQuery {
  wosisLatestSites(first: 10) {
    continent
    countryName
    positionalUncertainty
    region
    geom {
      x
      y
      geojson
      srid
    }
  }
}

Please note that sites contain mainly spatial data.

  • Get the first 10 profiles from wosisLatestProfiles, without requesting any classification fields.
query MyQuery {
  wosisLatestProfiles(first: 10) {
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
  }
}
  • Get the first 10 profiles from wosisLatestProfiles, including all available classification fields (i.e., FAO, USDA and WRB).
query MyQuery {
  wosisLatestProfiles(first: 10) {
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
    faoMajorGroup
    faoMajorGroupCode
    faoPublicationYear
    faoSoilUnit
    faoSoilUnitCode
    usdaGreatGroup
    usdaOrderName
    usdaPublicationYear
    usdaSubgroup
    usdaSuborder
    wrbPrefixQualifiers
    wrbPrincipalQualifiers
    wrbPublicationYear
    wrbReferenceSoilGroup
    wrbReferenceSoilGroupCode
    wrbSuffixQualifiers
    wrbSupplementaryQualifiers
  }
}

Please note that you can use the graphiql IDE to easily create your queries. If you are a beginner, it is recommended that you generate your queries via the user interface.

  • Get first 10 sites and for each site get also the first 10 profiles:
query MyQuery {
  wosisLatestSites(first: 10) {
    continent
    countryName
    positionalUncertainty
    region
    geom {
      x
      y
    }
    profiles(first: 10) {
      profileId
      profileCode
      datasetCode
      year
      month
      faoMajorGroup
      usdaGreatGroup
      wrbReferenceSoilGroup
    }
  }
}

Please note that the following fields belong to the profile and not to the site: datasetCode, year, month and day.

  • Get first 10 sites and for each site the first 10 profiles and for each profile get also the first 10 layers:
query MyQuery {
  wosisLatestSites(first: 10) {
    continent
    countryName
    positionalUncertainty
    region
    geom {
      x
      y
    }
    profiles(first: 10) {
      profileId
      continent
      region
      countryName
      datasetCode
      latitude
      longitude
      positionalUncertainty
      profileCode
      layers(first: 10) {
        layerId
        layerNumber
        lowerDepth
        upperDepth
        organicSurface
      }
    }
  }
}

Note that the deeper you go in the dataset structure the slower query execution will be.

Note that to retrieve profiles we are not forced to start with the sites: profiles can be queried directly. The same applies to layers: if we only need specific layers, we can retrieve them without querying profiles. The next queries show how this is done.

  • Get first 10 profiles and for each profile get also the first 10 layers:
query MyQuery {
  wosisLatestProfiles(first: 10) {
    profileId
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
    layers(first: 10) {
      layerId
      layerNumber
      lowerDepth
      upperDepth
      organicSurface
    }
  }
}
  • Get first 10 profiles and for each profile get also the first 10 layers and also the first 10 values for silt:
query MyQuery {
  wosisLatestProfiles(first: 10) {
    profileId
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
    layers(first: 10) {
      layerId
      date
      layerNumber
      lowerDepth
      upperDepth
      organicSurface
      siltValues(first: 10) {
        valueAvg
        value
      }
    }
  }
}
  • Get first 10 profiles and for each profile get also the first 10 layers and for each layer also get the first 10 values for silt and the first 10 values for organic carbon:
query MyQuery {
  wosisLatestProfiles(first: 10) {
    profileId
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
    layers(first: 10) {
      layerId
      date
      layerNumber
      lowerDepth
      upperDepth
      organicSurface
      siltValues(first: 10) {
        valueAvg
        value
      }
      orgcValues(first: 10) {
        valueAvg
        value
      }
    }
  }
}

At this point you will probably see some empty results in the orgcValues field. This is because for some layers there are no organic carbon measurements in the source datasets.

As exemplified, we can request all types of values (Silt; Sand; Organic carbon; pH etc.) but the more data we request the slower the query will be.
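Nested results like these are often easier to work with as flat rows. The following is a minimal sketch, assuming the response for `wosisLatestProfiles` is a list of profiles shaped like the queries above; `flatten_layers` is a hypothetical helper and the sample uses made-up placeholder values, not real WoSIS data.

```python
def flatten_layers(profiles, values_field="siltValues"):
    """Return one row per measurement value, carrying its profile and layer ids."""
    rows = []
    for profile in profiles:
        for layer in profile.get("layers") or []:
            # Layers without measurements for this property contribute no rows.
            for measurement in layer.get(values_field) or []:
                rows.append({
                    "profileId": profile.get("profileId"),
                    "layerId": layer.get("layerId"),
                    "upperDepth": layer.get("upperDepth"),
                    "lowerDepth": layer.get("lowerDepth"),
                    "value": measurement.get("value"),
                })
    return rows


# Placeholder response fragment (made-up ids and values, NOT real WoSIS data).
sample = [
    {
        "profileId": 1,
        "layers": [
            {"layerId": 10, "upperDepth": 0, "lowerDepth": 20,
             "siltValues": [{"value": 30.0}], "orgcValues": []},
            {"layerId": 11, "upperDepth": 20, "lowerDepth": 50,
             "siltValues": [], "orgcValues": []},
        ],
    }
]

rows = flatten_layers(sample)
print(len(rows))  # 1
```

Passing `values_field="orgcValues"` flattens the organic carbon measurements instead; layers with empty results simply add nothing to the output.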

Exploratory queries without any filtering can be useful to get acquainted with the data, but at some point it is recommended to apply filters.

Filtering

Perhaps the main advantage of this GraphQL API is the ability to easily filter and explore data. In the majority of cases a user wants to extract specific data rather than everything. For this, we will make use of the filtering capabilities.

Before we start performing queries please spend some time exploring the filter object inside wosisLatestProfiles as shown in the image below:

wosisLatestProfilesFilter

Let's now try some queries with filtering:

  • Get first 10 profiles from continent Europe
query MyQuery {
  wosisLatestProfiles(
    first: 10
    filter: { continent: { likeInsensitive: "europe" } }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
  }
}
  • Get first 10 profiles from continent Europe or Africa
query MyQuery {
  wosisLatestProfiles(
    first: 10
    filter: { continent: { in: ["Europe", "Africa"] } }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
  }
}

OR & AND

In the previous example we used the in operator, but the same query can be made using the OR operator:

query MyQuery {
  wosisLatestProfiles(
    first: 10
    filter: {
      or: [
        { continent: { includesInsensitive: "europe" } }
        { continent: { includesInsensitive: "africa" } }
      ]
    }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
  }
}

Please note that some operators (AND, OR etc.) expect an array as input ([]).
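When queries are assembled in a script, such a filter can be built as a Python structure and rendered as GraphQL input text. The sketch below is one possible approach, not the API's own tooling; `to_graphql` and `any_of` are hypothetical helper names, and the renderer handles only dicts, lists, strings, numbers, booleans and None.

```python
import json


def to_graphql(value):
    """Render a Python value as a GraphQL input literal (keys are unquoted)."""
    if isinstance(value, dict):
        fields = ", ".join(f"{key}: {to_graphql(val)}" for key, val in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql(item) for item in value) + "]"
    # json.dumps gives usable literals for str, int, float, bool and None.
    return json.dumps(value)


def any_of(field, operator, values):
    """Build an `or` filter: one condition per value, wrapped in a list."""
    return {"or": [{field: {operator: v}} for v in values]}


flt = any_of("continent", "includesInsensitive", ["europe", "africa"])
print(to_graphql(flt))
# {or: [{continent: {includesInsensitive: "europe"}}, {continent: {includesInsensitive: "africa"}}]}
```

The printed text can be placed after `filter:` in a query string. The list passed to `or` is what the note above refers to as array input.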

  • Get first 5 profiles with the respective first 10 layers from country Netherlands AND with at least one layer. In other words, we do not want any profiles without layers in this query.
query MyQuery {
  wosisLatestProfiles(
    first: 5
    filter: {
      and: [
        { countryName: { includesInsensitive: "netherlands" } }
        { layersExist: true }
      ]
    }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
    layers(first: 10) {
      layerNumber
      lowerDepth
      upperDepth
    }
  }
}
  • Get first 10 profiles with WRB classification.
query MyQuery {
  wosisLatestProfiles(
    first: 10
    filter: {
      or: [
        { wrbReferenceSoilGroup: { isNull: false } }
        { wrbReferenceSoilGroupCode: { isNull: false } }
      ]
    }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
    wrbPrefixQualifiers
    wrbPrincipalQualifiers
    wrbPublicationYear
    wrbReferenceSoilGroup
    wrbReferenceSoilGroupCode
    wrbSuffixQualifiers
    wrbSupplementaryQualifiers
  }
}
  • Get first 3 profiles in which every layer has at least one Organic Carbon measurement, together with their layers:
query MyQuery {
  wosisLatestProfiles(
    first: 3
    filter: { layersExist: true, layers: { every: { orgcValuesExist: true } } }
  ) {
    continent
    region
    profileId
    datasetCode
    layers(first: 10, filter: { orgcValuesExist: true }) {
      layerId
      layerName
      lowerDepth
      upperDepth
      orgcValues(first: 10) {
        value
        valueAvg
      }
    }
  }
}
  • Get the first profile, with its first 2 layers, in which some layers have at least one value for both Bulk density fine earth - 33 kPa and Organic Carbon:
query MyQuery {
  wosisLatestProfiles(
    first: 1
    filter: {
      layersExist: true
      layers: { some: { bdfi33lValuesExist: true, orgcValuesExist: true } }
    }
  ) {
    continent
    region
    profileId
    datasetCode
    layers(
      first: 2
      filter: { bdfi33lValuesExist: true, orgcValuesExist: true }
    ) {
      layerId
      layerName
      lowerDepth
      upperDepth
      bdfi33lValues(first: 3) {
        value
        valueAvg
      }
      orgcValues(first: 3) {
        value
        valueAvg
      }
    }
  }
}

Using variables

In GraphQL we can also use variables in our queries. Variables are important for:

  • Scripting, so that queries can interact with the variables in our scripts
  • Passing complex JSON objects into a query
  • Keeping the query easy to read

When using graphiql there is a Query variables box in which we can add our variables in JSON format.

Let us demonstrate the usage of variables in the following queries:

  • Get first 10 profiles from continent Europe

Inside Query variables add first and continent variables:

{
  "first": 10,
  "continent": "Europe"
}

The GraphQL query will be:

query MyQuery($first:Int, $continent:String) {
  wosisLatestProfiles(
    first: $first
    filter: { continent: { likeInsensitive: $continent } }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
  }
}

In your graphiql you should have something as shown below:

variables

Using arrays [ ]

  • Using an array [ ] get first 10 profiles from continent Europe or Africa

Inside Query variables box:

{
  "first": 10,
  "continent": ["Europe","Africa"]
}

The GraphQL query will be:

query MyQuery($first:Int, $continent:[String!]) {
  wosisLatestProfiles(
    first: $first
    filter: { continent: { in: $continent } }
  ) {
    continent
    countryName
    region
    datasetCode
    latitude
    longitude
    profileId
  }
}

In the next chapter we will make use of variables to better provide JSON components to our queries.

Spatial queries

This API has spatial capabilities. It is possible to perform several spatial queries and apply spatial filters. Spatial components are GeoJSON-based.

In order to use spatial queries, we will use two geometries of Gelderland, a province in the Netherlands, in GeoJSON format as examples.

You can use https://geojson.io to visualise, create and update GeoJSON geometries.

  1. Simplified geometry of the Gelderland region in GeoJSON format:
{
"type": "FeatureCollection",
"name": "Gelderland MultiPolygon",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "prov_name": [ "Gelderland" ] }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 
51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] } }
]
}

gelderlandPolygon

  2. Three points in Gelderland in GeoJSON format:
{
  "type": "FeatureCollection",
  "name": "Gelderland points",
  "features": [
    {
      "type": "Feature",
      "properties": { "prov_name": [ "Gelderland" ] },
      "geometry": {
        "type": "MultiPoint",
        "coordinates": [
          [6.025363925650851,52.501157816882994],
          [5.158391536033605,51.775118267397204],
          [6.742439219867151,51.96023476075487]
        ]
      }
    }
  ]
}

gelderlandPoints

To keep the spatial queries short and easy to read, we will pass the geometries as variables.

  • Get first 3 profiles that fall inside Gelderland using the MultiPolygon geometry. In this query we also make sure all profiles have at least one layer.
query MyQuery($geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: 3
    filter: {layersExist: true, geom: {intersects: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
    geom{
      geojson
      x
      y
    }
  }
}

Inside Query variables add the geomGelderland variable:

{
  "geomGelderland": { 
    "type": "MultiPolygon", 
    "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] }
}
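Note that the variable holds only the geometry object, not the surrounding FeatureCollection. A minimal sketch of that extraction in Python, using the shorter points collection from above; `feature_geometry` is a hypothetical helper name.

```python
def feature_geometry(feature_collection, index=0):
    """Return the bare geometry of one feature in a GeoJSON FeatureCollection."""
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return feature_collection["features"][index]["geometry"]


# The "Gelderland points" FeatureCollection from the text above.
gelderland_points = {
    "type": "FeatureCollection",
    "name": "Gelderland points",
    "features": [
        {
            "type": "Feature",
            "properties": {"prov_name": ["Gelderland"]},
            "geometry": {
                "type": "MultiPoint",
                "coordinates": [
                    [6.025363925650851, 52.501157816882994],
                    [5.158391536033605, 51.775118267397204],
                    [6.742439219867151, 51.96023476075487],
                ],
            },
        }
    ],
}

# Only the geometry goes into the query variables.
variables = {"geomGelderland": feature_geometry(gelderland_points)}
print(variables["geomGelderland"]["type"])  # MultiPoint
```

The same helper applies to the MultiPolygon collection; a collection with several features would need one call per feature index.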

Example of what you should see in graphiql:

geomGelderlandExample1

The geom object corresponds to the geometry. Please spend some time exploring this object in the graphiql interface. Make sure you explore the filter capabilities too.

  • Using the previous query change the query variables to the points geometry:
{
  "geomGelderland": {
        "type": "MultiPoint",
        "coordinates": [
          [6.025363925650851,52.501157816882994],
          [5.158391536033605,51.775118267397204],
          [6.742439219867151,51.96023476075487]
        ]
    }
}

You will see that the same query now produces no results. This is because the intersects filter only returns WoSIS profiles whose geometry intersects the provided geometry, and none of the three points coincides exactly with a profile location. In this case we must use a different spatial filter.

  • Get first 3 profiles that fall inside the BBOX of the points in our MultiPoint geometry. In this query we also make sure all profiles have at least one layer.
query MyQuery($geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: 3
    filter: {layersExist: true, geom: {bboxIntersects2D: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
    geom {
      geojson
      x
      y
    }
  }
}
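To see which area this filter covers, the bounding box of the MultiPoint can be computed locally. This is a minimal sketch assuming, as the description above states, that the filter compares against the 2D bounding box of the supplied geometry; `bounding_box` is a hypothetical helper name.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of [x, y] positions."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# The MultiPoint geometry used as the geomGelderland variable above.
multipoint = {
    "type": "MultiPoint",
    "coordinates": [
        [6.025363925650851, 52.501157816882994],
        [5.158391536033605, 51.775118267397204],
        [6.742439219867151, 51.96023476075487],
    ],
}

min_x, min_y, max_x, max_y = bounding_box(multipoint["coordinates"])
print("longitude range:", min_x, "to", max_x)
print("latitude range:", min_y, "to", max_y)
```

The three points span a rectangle covering a large part of Gelderland, which is why this query returns profiles while the exact intersects test on the same points did not.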

Pagination concepts

Depending on how a query is written, it can demand considerable computational resources. Moreover, without pagination you could easily write a query that returns a huge number of records, with all the problems that brings.

To solve this issue we enforce pagination in this GraphQL API.

For the moment, to keep things simple, we expose a simpler list interface for the connections, based on offset-based pagination. This means we have temporarily disabled Relay Cursor Connections.

If you are an advanced user and would like to have access to Relay Cursor Connections please contact us.

The First: argument

All queries must have a first argument in the connections. So far we have used it in all our queries. This argument indicates the maximum number of items to return.

The Offset: argument

Offset is an optional argument that indicates where in the list the server should start when returning items for a particular query.

The arguments first and offset are extremely important when you need to extract and download data.

We will make use of pagination in our scripts. We will show how to use pagination and extract a considerable amount of data from WoSIS using this GraphQL API.
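The idea of combining first and offset can be sketched as a loop that requests one page after another until a page comes back shorter than requested. In this sketch the API call is replaced by a local stand-in so the logic can be followed without network access; `fetch_all`, `fetch_page` and the fake records are hypothetical names introduced for illustration.

```python
def fetch_all(fetch_page, page_size=100, max_pages=1000):
    """Collect records page by page until a short (or empty) page is returned.

    fetch_page(first, offset) must return a list of records; in a real script
    it would POST a query that uses the `first` and `offset` arguments.
    """
    records = []
    offset = 0
    for _ in range(max_pages):  # hard cap as a safety net against endless loops
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
    return records


# Stand-in for the API: 250 fake records served from a local list.
fake_records = list(range(250))


def fake_fetch_page(first, offset):
    return fake_records[offset:offset + first]


result = fetch_all(fake_fetch_page, page_size=100)
print(len(result))  # 250
```

In a real script, `fetch_page` would send the query with the current first and offset values and return the list found under the queried field in the response.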

Scripting

Python examples

The simplest way to perform a GraphQL request in Python is to use the requests package.

  • Get the first 5 profiles and add them to a pandas DataFrame:
import requests
import json
import pandas as pd

# GraphQL query
query = """
query MyQuery {
  wosisLatestProfiles(first: 5) {
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
  }
}
"""
# GraphQL endpoint
url = 'https://graphql.isric.org/wosis/graphql'
# Send POST request (the endpoint rejects GET requests)
r = requests.post(url, json={'query': query}, timeout=60)
# Print status_code
print(r.status_code)
# Stop here on an HTTP error (4xx/5xx)
r.raise_for_status()
# Parse JSON
parsed = json.loads(r.text)
# Convert to pandas dataframe
df = pd.json_normalize(parsed['data']['wosisLatestProfiles'])
# print dataframe
print(df)

The result will be:

python_q1_result

Using variables in our script:

  • Get the first 3 profiles that are inside Gelderland region and add them to a Pandas dataframe:
import requests
import json
import pandas as pd

# GeoJSON geometry
geomGelderland = { 
    "type": "MultiPolygon", 
    "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] 
    }

# GraphQL query
query = """
query MyQuery($geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: 3
    filter: {layersExist: true, geom: {intersects: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
    geom{
      geojson
      x
      y
    }
  }
}
"""
# GraphQL endpoint
url = 'https://graphql.isric.org/wosis/graphql'
# Send POST request, passing the geometry as a GraphQL variable
r = requests.post(url, json={'query': query, 'variables': {'geomGelderland': geomGelderland}}, timeout=60)
# Print status_code
print(r.status_code)
# Stop here on an HTTP error (4xx/5xx)
r.raise_for_status()
# Parse JSON
parsed = json.loads(r.text)
# Convert to pandas dataframe
df = pd.json_normalize(parsed['data']['wosisLatestProfiles'])
# print dataframe
print(df)

The result will be:

python_q2_result

  • Get all WoSIS profiles in Gelderland that have layers, and export them to CSV:
import requests
import json
import pandas as pd

# GeoJSON geometry
geomGelderland = { 
    "type": "MultiPolygon", 
    "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ] 
    }

# GraphQL query
query = """
query MyQuery($first: Int, $offset: Int, $geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: $first, 
    offset: $offset,
    filter: {layersExist: true, geom: {intersects: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
  }
}
"""
# GraphQL endpoint
url='https://graphql.isric.org/wosis/graphql'

new_results = True
first = 100
offset = 0
all_results = []

while new_results:
    # Send POST request
    r = requests.post(url, json={'query': query, 'variables': {'first': first, 'offset': offset, 'geomGelderland': geomGelderland}})
    # Parse JSON
    parsed = json.loads(r.text)
    # Get the current page of results (empty list if the key is missing)
    page = parsed['data'].get('wosisLatestProfiles') or []
    # for debugging
    # print(json.dumps(parsed, indent=4, sort_keys=True))
    # print(len(page))
    if len(page) == 0:
        print('No more results')
        # update new_results
        new_results = False
    else:
        print('We have more results')
        # Add results to all_results object
        all_results.extend(page)
        # update offset
        offset = offset+first

df = pd.json_normalize(all_results) 
# print dataframe
print('There are {} WoSIS profiles with layers inside Gelderland region'.format(df.shape[0]))
# Export dataframe to CSV
df.to_csv('wosis_gelderland.csv', index=False)

The result will be:

There are 136 WoSIS profiles with layers inside Gelderland region

The CSV result file can be found here
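
The loop above assumes every response contains a `data` key. A GraphQL server can instead return an `errors` list, in which case `data` may be missing or null. The following is a minimal sketch of the same first/offset pagination with that check added. The names `fetch_all` and `fetch_page` are hypothetical, and `fetch_page(first, offset)` is assumed to return the parsed JSON response as a dict.

```python
def fetch_all(fetch_page, field, page_size=100):
    """Collect all records of `field` using first/offset pagination.

    fetch_page(first, offset) must return the parsed JSON response (a dict).
    """
    results = []
    offset = 0
    while True:
        parsed = fetch_page(page_size, offset)
        # GraphQL reports query problems in an "errors" list
        if parsed.get("errors"):
            raise RuntimeError(f"GraphQL errors: {parsed['errors']}")
        page = (parsed.get("data") or {}).get(field) or []
        if not page:
            break  # an empty page means there is nothing left to fetch
        results.extend(page)
        offset += page_size
    return results


# Usage with the script above (requests, url, query and geomGelderland
# are the objects defined there):
# def fetch_page(first, offset):
#     variables = {"first": first, "offset": offset,
#                  "geomGelderland": geomGelderland}
#     r = requests.post(url, json={"query": query, "variables": variables})
#     r.raise_for_status()
#     return r.json()
#
# all_results = fetch_all(fetch_page, "wosisLatestProfiles")
```

Passing the page fetcher in as a function keeps the pagination logic independent of the HTTP client, so it can be checked without network access.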

R examples

The simplest way to perform a GraphQL request in R is to use {httr}.

  • Get the first 5 profiles and add them to a data frame:
library(httr)
library(jsonlite)

# GraphQL query
query <- '
query MyQuery {
  wosisLatestProfiles(first: 5) {
    continent
    region
    countryName
    datasetCode
    latitude
    longitude
    positionalUncertainty
    profileCode
  }
}
'

# GraphQL endpoint
url <- 'https://graphql.isric.org/wosis/graphql'

# Send POST request
response <- POST(url, body = list(query = query), encode = "json")

# Print status_code
print(status_code(response))

# Parse JSON
parsed <- fromJSON(content(response, "text"), flatten = TRUE)

# Convert from JSON to a data frame
df <- as.data.frame(parsed$data$wosisLatestProfiles)

head(df)

The result will be:

r_q1_result

Using variables in our script:

  • Get the first 3 profiles that are inside the Gelderland region and add them to a data frame:
library(httr)
library(jsonlite)


geomGelderland <- fromJSON('{
    "type": "MultiPolygon",
    "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
    }
')

# GraphQL query
query <- "
query MyQuery($geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: 3
    filter: {layersExist: true, geom: {intersects: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
    geom {
      geojson
      x
      y
    }
  }
}
"

# GraphQL endpoint
url <- "https://graphql.isric.org/wosis/graphql"

# Send POST request
response <- POST(url, body = list(query = query, variables = list(geomGelderland = geomGelderland)), encode = "json")

# Print status_code
print(status_code(response))

# Parse JSON
parsed <- fromJSON(content(response, "text"), flatten = TRUE)

# Convert to data frame
df <- as.data.frame(parsed$data$wosisLatestProfiles)

# Print data frame
head(df)

The result will be:

r_q2_result

  • Get all WoSIS profiles in Gelderland that have layers, and export them to CSV:
library(httr)
library(jsonlite)
library(dplyr)

# GeoJSON geometry
geomGelderland <- fromJSON('{
    "type": "MultiPolygon",
    "coordinates": [ [ [ [ 5.177260142422514, 51.74291774914947 ], [ 5.126747881386732, 51.737828850498403 ], [ 5.137580867932065, 51.772905259431077 ], [ 5.014540023249575, 51.808984680959583 ], [ 5.031415073146523, 51.841084802107702 ], [ 4.993967909252922, 51.861222725420994 ], [ 5.062358224116345, 51.859362053527242 ], [ 5.180226727863164, 51.96744832651509 ], [ 5.236867149255078, 51.978757478459428 ], [ 5.321611332014112, 51.954919171164796 ], [ 5.486214078473083, 51.98382644510454 ], [ 5.627223829712356, 51.952386168324438 ], [ 5.550342661060417, 52.10541954546126 ], [ 5.459242995490565, 52.080225755481266 ], [ 5.514079463312799, 52.135923065932062 ], [ 5.439875615559026, 52.171197458274222 ], [ 5.44103943147957, 52.205693438951691 ], [ 5.393219147822698, 52.220626892173925 ], [ 5.404643399611359, 52.249630480909225 ], [ 5.533281176545358, 52.27274084169683 ], [ 5.587707385036856, 52.361454261431376 ], [ 5.787257137970521, 52.422573287061603 ], [ 5.876205471530124, 52.522025026941051 ], [ 5.925559518063968, 52.474057592745915 ], [ 6.027857569808684, 52.509606205409327 ], [ 6.099483437203417, 52.469970896552461 ], [ 6.130552948323514, 52.399978162269164 ], [ 6.078506385563601, 52.369523051161245 ], [ 6.066224466859907, 52.318839289847247 ], [ 6.163909067147507, 52.21749619292715 ], [ 6.38185154627214, 52.246112812566473 ], [ 6.492401220236633, 52.177371870181403 ], [ 6.671338986248984, 52.165683203635673 ], [ 6.662399005672591, 52.130167439615931 ], [ 6.760572413121598, 52.118779940206082 ], [ 6.687853003658449, 52.039856158091141 ], [ 6.832754328999235, 51.972938087693585 ], [ 6.721969582522561, 51.89606334135938 ], [ 6.683993990179909, 51.91757645733221 ], [ 6.472507886098918, 51.853823023864017 ], [ 6.390566170881016, 51.87396806966867 ], [ 6.401818441765064, 51.827262656663407 ], [ 6.117889496603739, 51.901659142837225 ], [ 6.166559884993931, 51.840721643435401 ], [ 6.063485632339608, 51.86545122678897 ], [ 5.962978284523374, 51.836913960582471 ], [ 
5.946569966406273, 51.813479919592751 ], [ 5.992067051189349, 51.770245909123908 ], [ 5.943962150919553, 51.741816814422592 ], [ 5.893409336802974, 51.777852926426895 ], [ 5.765188291802036, 51.752789880063702 ], [ 5.638112608999517, 51.819025176083443 ], [ 5.493105254357093, 51.830750957327069 ], [ 5.403157084105017, 51.821611677731141 ], [ 5.357568231054432, 51.757890339715857 ], [ 5.300338754648935, 51.737287437014395 ], [ 5.177260142422514, 51.74291774914947 ] ] ] ]
    }
')

# GraphQL query
query <- "
query MyQuery($first: Int, $offset: Int, $geomGelderland: GeoJSON!) {
  wosisLatestProfiles(
    first: $first,
    offset: $offset,
    filter: {layersExist: true, geom: {intersects: $geomGelderland}}
  ) {
    continent
    region
    profileId
    datasetCode
    latitude
    longitude
  }
}
"

# GraphQL endpoint
url <- "https://graphql.isric.org/wosis/graphql"

new_results <- TRUE
first <- 100
offset <- 0
all_results <- list()

while (new_results) {
    # Send POST request
    response <- POST(url, body = list(query = query, variables = list(
        first = first,
        offset = offset, geomGelderland = geomGelderland
    )), encode = "json")
    # Parse JSON
    parsed <- fromJSON(content(response, "text"), flatten = TRUE)
    # Add results to all_results list
    all_results <- append(all_results, list(parsed$data$wosisLatestProfiles))

    if (!"wosisLatestProfiles" %in% names(parsed$data) || length(parsed$data$wosisLatestProfiles) == 0) {
        print("No more results")
        # update new_results
        new_results <- FALSE
    } else {
        print("We have more results")
        # update offset
        offset <- offset + first
    }
}

df <- bind_rows(all_results) %>% as_tibble()
# print dataframe
cat("There are", nrow(df), "WoSIS profiles with layers inside Gelderland region\n")
# Export dataframe to CSV
write.csv(df, "wosis_gelderland.csv", row.names = FALSE, quote = FALSE)

The result will be:

There are 136 WoSIS profiles with layers inside Gelderland region

The CSV result file can be found here


Soil data validation and ingest into WoSIS

Ingesting data into WoSIS follows an Extract, Transform and Load (ETL) procedure: a standardised, semi-automatic process that guides the data processor through the ingestion of new datasets.

This process is assisted by this API. The first step is mapping the attributes of the original source data to WoSIS elements such as observation measurements, site, profile and layer data.

The etlMappingFeatures endpoint lists the features that are available for this mapping.

  • Get the first 10 features (in this case observations) that have the property pH and are distributed in WoSIS products:
query MyQuery {
  etlMappingFeatures(
    first: 10
    filter: {distribute: {equalTo: true}, propertyName: {like: "pH"}}
  ) {
    code
    category
    distribute
    featureType
    maximum
    minimum
    name
    procedureName
    propertyName
    unit {
      description
      symbol
    }
  }
}

Note that in the above example the API returns only 4 results because the dataset contains no more matching features.
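
Each record returned by the etlMappingFeatures query above nests the `unit` object, which is awkward for tabular export. The following is a minimal sketch, assuming the response has the usual `{"data": {"etlMappingFeatures": [...]}}` shape, that flattens nested objects into keys such as `unit.symbol`, similar to what `pd.json_normalize` does in the Python examples. The helper name `flatten_record` is hypothetical.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts, e.g. {"unit": {"symbol": s}} -> {"unit.symbol": s}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Usage (parsed is the decoded JSON response of the query above):
# rows = [flatten_record(f) for f in parsed["data"]["etlMappingFeatures"]]
```

The resulting list of flat dicts can be written with `csv.DictWriter` or passed to `pd.DataFrame`.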

Get the values of a property within a polygon

query MyQuery($first: Int, $offset: Int) {
  wosisLatestProfiles(
    filter: {
      layers: { some: { sandValuesExist: true } }
      geom: {
        within: {
          type: "Polygon"
          coordinates: [
            [
              [-5.51345387228184, 9.59126476678042]
              [-5.51345387228184, 11.0451128553676]
              [-3.45410758209379, 11.0451128553676]
              [-3.45410758209379, 9.59126476678042]
              [-5.51345387228184, 9.59126476678042]
            ]
          ]
        }
      }
      and: {
        continent: { likeInsensitive: "Africa" }
        countryName: { likeInsensitive: "burkina faso" }
      }
    }
    first: $first
    offset: $offset
  ) {
    latitude
    longitude
    layers(first: $first, filter: { sandValuesExist: true }, offset: $offset) {
      sandValues(first: 6) {
        profileId
        profileCode
        layerId
        datasetId
        continent
        region
        countryName
        date
        upperDepth
        lowerDepth
        valueAvg
        licence
        methodOptions
      }
    }
  }
}

Variables:

{
  "first": 10,
  "offset": 0
}
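
The polygon in this query is an axis-aligned rectangle written out by hand. The following is a minimal sketch of a hypothetical helper, `bbox_polygon`, that builds the same closed GeoJSON Polygon ring from its corner coordinates in (longitude, latitude) order. Whether the `within` filter accepts the geometry as a `GeoJSON` query variable, as `intersects` does in the Gelderland examples, is an assumption to verify against the schema, and the variable name `geom` is made up for illustration.

```python
def bbox_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon for a bounding box (ring closed on its first point)."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min values must be smaller than max values")
    ring = [
        [min_lon, min_lat],
        [min_lon, max_lat],
        [max_lon, max_lat],
        [max_lon, min_lat],
        [min_lon, min_lat],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Same rectangle as in the query above
polygon = bbox_polygon(-5.51345387228184, 9.59126476678042,
                       -3.45410758209379, 11.0451128553676)

# Hypothetical: assumes the query declares $geom: GeoJSON! and uses
# geom: {within: $geom} instead of the inline polygon
variables = {"first": 10, "offset": 0, "geom": polygon}
```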