
grompy

Tools for working with parcel-based satellite time-series observations from groenmonitor.nl

Introduction

Grompy is a tool to process and access parcel-based satellite observations from GroenMonitor.nl. It was developed because accessing the satellite time-series for a parcel was cumbersome with the existing data structure. Moreover, querying groenmonitor data from AgroDataCube is slow and inflexible for researchers who just want to explore the data. Instead, grompy allows fast and easy access to all parcel observations and can provide simultaneous access to parcel info, optical observations and radar observations.

The grompy package consists of two components:

  1. A command-line tool (grompy) to define and check groenmonitor CSV files and finally load them into (SQLite) database tables.
  2. The Python package grompy, which provides grompy.DataAccessProvider for efficient access to the time-series data stored in the database.

Command line tool

initializing

The grompy command can be used to load parcel information and groenmonitor CSV files with parcel observations into a database structure. For this purpose a file grompy.yaml is required, which provides the information needed to process all inputs. This includes the paths to the different CSV files, the path to the shapefile with parcel information and the URI for the database where the data have to be written. The grompy.yaml file is the entry point for all other grompy operations as well as for the DataAccessProvider.

The grompy.yaml file can be generated with the command grompy init <year> <data path>. For this to work, grompy assumes a folder structure which looks like this:

<data path> /BRP/gewaspercelen_<year>.shp
            /Optisch/ - CSV files with sentinel2 data
            /Radar/ - CSV with radar data 
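Given a year and a data path, the expected input locations follow mechanically from this layout. The following Python sketch illustrates the convention; the helper function is hypothetical and not part of grompy itself:

```python
from pathlib import Path

def expected_layout(data_path, year):
    """Return the input locations that `grompy init <year> <data path>`
    expects to find. This helper is illustrative, not part of grompy."""
    root = Path(data_path)
    return {
        "shape_file": root / "BRP" / f"gewaspercelen_{year}.shp",
        "optical_dir": root / "Optisch",  # CSV files with Sentinel-2 data
        "radar_dir": root / "Radar",      # CSV files with radar data
    }

print(expected_layout(".", 2019)["shape_file"].name)  # gewaspercelen_2019.shp
```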

In practice it is most convenient to keep the grompy.yaml file together with the data and use paths relative to the location of the grompy.yaml file. In this way you can copy the grompy.yaml file and the corresponding database to another location without having to edit the grompy.yaml file. So change directory to the data folder and execute:

cd <data path>
grompy init 2019 .

The init command creates the grompy.yaml and sets the paths to the inputs/outputs based on the value given for <data path>, in this case the current directory (.). The grompy.yaml now looks like this:

grompy:
  version: 1.0
parcel_info:
  dsn: sqlite:///./parcel_info.db3
  counts_file: ./Optisch/perceelscount.csv
  shape_file: ./BRP/gewaspercelen_2019.shp
  table_name: parcel_info
datasets:
  sentinel2_reflectance_values:
    dsn: sqlite:///./sentinel2_reflectance_values.db3
    bands:
      NDVI: ./Optisch/zonal_stats_mean_2019_ADC.csv
      B02: ./Optisch/zonal_stats_mean_B02_2019_ADC.csv
      B03: ./Optisch/zonal_stats_mean_B03_2019_ADC.csv
      B04: ./Optisch/zonal_stats_mean_B04_2019_ADC.csv
      B05: ./Optisch/zonal_stats_mean_B05_2019_ADC.csv
      B06: ./Optisch/zonal_stats_mean_B06_2019_ADC.csv
      B07: ./Optisch/zonal_stats_mean_B07_2019_ADC.csv
      B08: ./Optisch/zonal_stats_mean_B08_2019_ADC.csv
      B11: ./Optisch/zonal_stats_mean_B11_2019_ADC.csv
      B12: ./Optisch/zonal_stats_mean_B12_2019_ADC.csv
      B8A: ./Optisch/zonal_stats_mean_B8A_2019_ADC.csv
  sentinel2_reflectance_std:
    dsn: sqlite:///./sentinel2_reflectance_std.db3
    bands:
      NDVI: ./Optisch/zonal_stats_std_2019_ADC.csv
      B02: ./Optisch/zonal_stats_std_B02_2019_ADC.csv
...

The grompy.yaml file specifies several sections:

  • A first section describing the grompy version
  • The 'parcel_info' section, providing the path to the shapefile with parcel information as well as a 'counts_file' (see below) which provides the satellite pixel count for each parcel. Finally, it specifies the data source name (dsn) for the database to write to, and the name of the output table.
  • The 'datasets' section, which provides the database dsn and the paths to the different CSV files belonging to each dataset. The name of a dataset is used as the name of the output table in the database, while the names of the CSV file entries are used as the table column names. Note that both the number of datasets and the number of CSV files within a dataset can vary: you can simply add datasets, or add paths to CSV files within a dataset. This provides a lot of flexibility.

After the grompy.yaml has been created you can go to the next step.

note: the counts file is a CSV file with two columns: a fieldID column and a count column which represents the number of usable pixels in the parcel (excluding border pixels). The counts file can be most easily generated by taking one of the input CSV files (which also include the pixel count, but this is ignored during loading) and generating the counts file with awk:

cat <CSV file> | awk 'BEGIN{FS=OFS=","}{print $1, $2}' > <counts_file.csv>
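An equivalent that also handles quoted fields correctly can be written with Python's csv module. The function below is a hypothetical helper, not part of grompy; it assumes, like the awk example, that fieldID and count are the first two columns:

```python
import csv

def write_counts_file(src_csv, counts_csv):
    """Copy the first two columns (fieldID and pixel count) of a
    groenmonitor CSV into a two-column counts file."""
    with open(src_csv, newline="") as src, open(counts_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row[:2])  # keep only fieldID and count
```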

checking

The grompy.yaml is a relatively complex input structure and manually checking all paths is rather cumbersome. Therefore, grompy can verify that the YAML file is OK by executing:

grompy check <grompy_yaml>

Grompy will now read the YAML and carry out several checks, including:

  • Whether all referenced files exist.
  • Whether connections to the databases can be opened.
  • Whether the CSV files of each dataset all have the same number of lines.
  • Whether the shapefile with parcel info has the required attributes.
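The first and third checks above could be approximated in plain Python as follows. This is an illustrative sketch, not grompy's actual implementation, and the function names are hypothetical:

```python
from pathlib import Path

def count_lines(path):
    """Count the number of lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)

def check_dataset(csv_paths):
    """Approximate two of grompy's checks: all CSV files of a dataset must
    exist and must have the same number of lines. Returns problem messages."""
    problems = [f"missing file: {p}" for p in csv_paths if not Path(p).exists()]
    if not problems:
        counts = {str(p): count_lines(p) for p in csv_paths}
        if len(set(counts.values())) > 1:
            problems.append(f"line counts differ: {counts}")
    return problems
```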

Grompy will display a lot of output on the screen. If everything is fine, the last line will show:

OK! All inputs seem fine.

If not, open the YAML file with a text editor and manually correct any problems that were found. Next, rerun grompy check to see if all errors are gone. Now we are ready for the final step.

note: You cannot skip the grompy check step, because it modifies the YAML file and adds some additional information to it. Running grompy load on an unchecked grompy.yaml will result in grompy asking you to run grompy check first.

loading

See information below for converting grompy 1.1 to 1.2 databases

The final step is to load the parcel information and satellite observations into the database tables. This can be done with the grompy load <grompy_yaml> command. Grompy will now show the following output:

Start loading parcel information. This will take some time...
Starting loading of: sentinel1_backscatter
Starting loading of: sentinel1_coherence
Starting loading of: sentinel2_reflectance_std
Starting loading of: sentinel2_reflectance_values
 |--------------------------------------------------| 0.01% 

In the first stage, grompy loads the parcel information. It uses geopandas to read the shapefile, and it will take some time to complete this operation. Next, it starts loading the datasets. Loading data into the database can easily take several hours, depending on the speed of the underlying hardware.

Loading of datasets is done in parallel: grompy starts as many parallel processes as there are datasets defined in the grompy.yaml file. Therefore, grompy should only be run on machines with sufficient cores, and writing should be done to different database files (in the case of SQLite) or to a database server with sufficient capacity to handle multiple streams of data. Note that grompy can write all information into one SQLite database, but write locks on the database will cause delays in processing, so this is not recommended.
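Conceptually, each dataset ends up as one database table holding one row per parcel per observation date, with one column per band. Assuming such a layout (the table and column names below are assumptions based on the grompy.yaml shown earlier, and the toy data is invented), a time series for one parcel can be pulled out with plain sqlite3:

```python
import sqlite3

# Build a toy table mimicking the assumed layout of a grompy dataset table:
# one row per (fieldID, day), one column per band.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sentinel2_reflectance_values "
    "(fieldID INTEGER, day TEXT, NDVI REAL, B02 REAL)"
)
con.executemany(
    "INSERT INTO sentinel2_reflectance_values VALUES (?, ?, ?, ?)",
    [(1, "2019-04-01", 0.35, 0.04),
     (1, "2019-04-11", 0.52, 0.05),
     (2, "2019-04-01", 0.41, 0.03)],
)

# Retrieve the NDVI time series for parcel 1, ordered by date.
rows = con.execute(
    "SELECT day, NDVI FROM sentinel2_reflectance_values "
    "WHERE fieldID = ? ORDER BY day",
    (1,),
).fetchall()
print(rows)  # [('2019-04-01', 0.35), ('2019-04-11', 0.52)]
```

In practice the DataAccessProvider wraps this kind of query for you, so direct SQL is only needed for custom analyses.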

Important: grompy 1.2 offers additional parcel selection options compared to grompy 1.1. To convert a grompy 1.1 database to a grompy 1.2 database it is only necessary to reload the parcel information. The structure of the Sentinel-1/2 parcel observations did not change from 1.1 to 1.2. A special option has been added to only reload the parcel info in a grompy database: grompy load --parcel_info_only <grompy_yaml>

Accessing data processed by grompy

See the jupyter notebooks in the notebooks/ directory in the grompy git repository.