# grompy

Tools for working with parcel-based satellite time-series observations from groenmonitor.nl

## Introduction
Grompy is a tool to process and access parcel-based satellite observations from GroenMonitor.nl. It was developed because accessing the satellite time-series for a parcel was cumbersome with the existing data structure. Moreover, querying GroenMonitor data from AgroDataCube is slow and inflexible for researchers who just want to explore the data. Instead, grompy allows fast and easy access to all parcel observations and can provide simultaneous access to parcel info, optical observations and radar observations.
The grompy package consists of two components:

- A command-line tool (`grompy`) to define and check GroenMonitor CSV files and finally load them into (SQLite) database tables.
- The Python package `grompy`, which provides `grompy.DataAccessProvider` for efficient access to the time-series data stored in the database.
## Command line tool

### initializing
The `grompy` command can be used to load parcel information and GroenMonitor CSV files with parcel observations into a database structure. For this purpose a file `grompy.yaml` is required, which provides the information needed to process all inputs. This includes the paths to the different CSV files, the path to the shapefile with parcel information, and the URI of the database that the data should be written to. The `grompy.yaml` file is the entry point for all other grompy operations as well as for the `DataAccessProvider`.
The `grompy.yaml` file can be generated with the command `grompy init <year> <data path>`. For this, grompy assumes a folder structure which looks like this:

```
<data path>/BRP/gewaspercelen_<year>.shp
           /Optisch/  - CSV files with Sentinel-2 data
           /Radar/    - CSV files with radar data
```
In practice it is most convenient to keep the `grompy.yaml` file together with the data and use paths relative to the location of the `grompy.yaml` file. This way you can copy the `grompy.yaml` file and the corresponding database to another location without having to edit the `grompy.yaml` file. So change directory to the data folder and execute:

```shell
cd <data path>
grompy init 2019 .
```
The init command creates the `grompy.yaml` file and sets the paths to the inputs/outputs based on the value given for `<data path>`, in this case the current directory (`.`). The `grompy.yaml` file now looks like this:
```yaml
grompy:
  version: 1.0
  parcel_info:
    dsn: sqlite:///./parcel_info.db3
    counts_file: ./Optisch/perceelscount.csv
    shape_file: ./BRP/gewaspercelen_2019.shp
    table_name: parcel_info
  datasets:
    sentinel2_reflectance_values:
      dsn: sqlite:///./sentinel2_reflectance_values.db3
      bands:
        NDVI: ./Optisch/zonal_stats_mean_2019_ADC.csv
        B02: ./Optisch/zonal_stats_mean_B02_2019_ADC.csv
        B03: ./Optisch/zonal_stats_mean_B03_2019_ADC.csv
        B04: ./Optisch/zonal_stats_mean_B04_2019_ADC.csv
        B05: ./Optisch/zonal_stats_mean_B05_2019_ADC.csv
        B06: ./Optisch/zonal_stats_mean_B06_2019_ADC.csv
        B07: ./Optisch/zonal_stats_mean_B07_2019_ADC.csv
        B08: ./Optisch/zonal_stats_mean_B08_2019_ADC.csv
        B11: ./Optisch/zonal_stats_mean_B11_2019_ADC.csv
        B12: ./Optisch/zonal_stats_mean_B12_2019_ADC.csv
        B8A: ./Optisch/zonal_stats_mean_B8A_2019_ADC.csv
    sentinel2_reflectance_std:
      dsn: sqlite:///./sentinel2_reflectance_std.db3
      bands:
        NDVI: ./Optisch/zonal_stats_std_2019_ADC.csv
        B02: ./Optisch/zonal_stats_std_B02_2019_ADC.csv
        ...
```
The `grompy.yaml` file specifies several sections:

- A first section describing the grompy version.
- The `parcel_info` section, providing the path to the shapefile with parcel information as well as a `counts_file` (see below), which provides the satellite pixel count for each parcel. Finally, it gives the data source name (dsn) of the database to write to and the name of the output table.
- The `datasets` section, which provides for each dataset the database dsn and the paths to the different CSV files belonging to that dataset. The name of the dataset will be used as the name of the output table in the database, while the band names (the keys pointing at the CSV files) will be used as the table column names. Note that both the number of datasets and the number of CSV files within a dataset can vary: you can simply add datasets, or add paths to CSV files within a dataset. This provides a lot of flexibility.
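To illustrate this layout, the snippet below parses a trimmed-down `grompy.yaml` (embedded here as a string, mirroring the example above) with PyYAML and lists each dataset with its band columns; this is only a sketch of how the configuration maps onto tables and columns, not grompy's own loading code.

```python
import yaml  # PyYAML

config_text = """
grompy:
  version: 1.0
  parcel_info:
    dsn: sqlite:///./parcel_info.db3
    counts_file: ./Optisch/perceelscount.csv
    shape_file: ./BRP/gewaspercelen_2019.shp
    table_name: parcel_info
  datasets:
    sentinel2_reflectance_values:
      dsn: sqlite:///./sentinel2_reflectance_values.db3
      bands:
        NDVI: ./Optisch/zonal_stats_mean_2019_ADC.csv
        B02: ./Optisch/zonal_stats_mean_B02_2019_ADC.csv
"""

cfg = yaml.safe_load(config_text)["grompy"]
# Each dataset becomes a table; each band key becomes a column.
for name, dataset in cfg["datasets"].items():
    print(name, "->", list(dataset["bands"]))
```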
After the `grompy.yaml` file has been created you can go to the next step.
Note: the counts file is a CSV file with two columns: the `fieldID` column and the `count` column, which represents the number of usable pixels in the parcel (excluding border pixels). The counts file can most easily be generated by taking one of the input CSV files (which also include the pixel count, although it is ignored during loading) and extracting the first two columns with `awk`:

```shell
cat <CSV file> | awk 'BEGIN{FS=OFS=","}{print $1, $2}' > <counts_file.csv>
```
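If `awk` is not at hand, the same two-column extract can be made with a few lines of Python. Only the `fieldID`/`count` layout comes from the note above; `write_counts_file` and the sample data are illustrative.

```python
import csv
import io

def write_counts_file(src, dst):
    """Copy only the first two columns (fieldID, count) of an input CSV."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row[:2])

# Demo with an in-memory CSV; real usage would open file paths instead.
sample = io.StringIO("fieldID,count,20190101,20190106\n1,42,0.31,0.35\n2,17,0.28,0.30\n")
out = io.StringIO()
write_counts_file(sample, out)
print(out.getvalue())
```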
### checking
The `grompy.yaml` file is a relatively complex input structure and manually checking all paths is rather cumbersome. Therefore, grompy can check whether the YAML file is OK by executing:

```shell
grompy check <grompy_yaml>
```
Grompy will now read the YAML file and carry out several checks, including:

- whether all files exist;
- whether the database connections can be opened;
- whether the CSV files of the different datasets all have the same number of lines;
- whether the shapefile with parcel info has the required attributes.
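The line-count check, for instance, can be pictured with a small helper; this is an illustrative sketch of the idea, not grompy's own implementation.

```python
import os
import tempfile

def same_line_count(paths):
    # All CSV files of a dataset must have the same number of lines,
    # since line i of every file describes the same parcel.
    counts = set()
    for path in paths:
        with open(path) as f:
            counts.add(sum(1 for _ in f))
    return len(counts) <= 1

# Demo with two throwaway files of equal length.
with tempfile.TemporaryDirectory() as d:
    paths = []
    for name in ("a.csv", "b.csv"):
        p = os.path.join(d, name)
        with open(p, "w") as f:
            f.write("fieldID,count\n1,42\n")
        paths.append(p)
    print(same_line_count(paths))
```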
Grompy will display a lot of output on the screen. If everything is fine, the last line will show:

```
OK! All inputs seem fine.
```

If not, open the YAML file with a text editor and manually correct any problems that were found. Next, rerun `grompy check` to see if all errors are gone. Now we are ready for the final step.
Note: you cannot skip the `grompy check` step because it modifies the YAML file and adds some additional information to it. Running `grompy load` on an unchecked `grompy.yaml` will result in grompy asking you to run `grompy check` first.
### loading

See the information below for converting grompy 1.1 databases to grompy 1.2.
The final step is to load the parcel information and satellite observations into the database tables. This can be done with the `grompy load <grompy_yaml>` command. Grompy will now show the following output:

```
Start loading parcel information. This will take some time...
Starting loading of: sentinel1_backscatter
Starting loading of: sentinel1_coherence
Starting loading of: sentinel2_reflectance_std
Starting loading of: sentinel2_reflectance_values
|--------------------------------------------------| 0.01%
```
In the first stage, grompy loads the parcel information. It uses geopandas to load the shapefile, and this operation will take some time to complete. Next, it starts loading the datasets. Loading the data into the database can easily take several hours, depending on the speed of the underlying hardware. Moreover, loading of datasets is done in parallel: grompy starts as many parallel processes as there are datasets defined in the `grompy.yaml` file. Therefore, grompy should only be run on machines with sufficient cores, and writing should go either to separate database files (in the case of SQLite) or to a database server with sufficient capacity to handle multiple streams of data. Note that grompy can write all information into one SQLite database, but write locks on that database will cause delays in processing, so this is not recommended.
Important: grompy 1.2 offers additional parcel selection options compared to grompy 1.1. To convert a grompy 1.1 database to a grompy 1.2 database it is only necessary to reload the parcel information; the structure of the Sentinel-1/2 parcel observations did not change from 1.1 to 1.2. A special option has been added to reload only the parcel info in a grompy database:

```shell
grompy load --parcel_info_only <grompy_yaml>
```
## Accessing data processed by grompy

See the Jupyter notebooks in the `notebooks/` directory of the grompy git repository.
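Since the tables written by `grompy load` are ordinary SQLite tables, they can also be inspected directly with the standard library. The sketch below builds a toy `parcel_info` table in memory to show the idea; the table name follows the example configuration above, while the column names and data are made up for illustration.

```python
import sqlite3

# Toy stand-in for the parcel_info.db3 file written by `grompy load`;
# for a real database you would use sqlite3.connect("parcel_info.db3").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel_info (fieldID INTEGER, count INTEGER)")
conn.executemany("INSERT INTO parcel_info VALUES (?, ?)", [(1, 42), (2, 17)])

rows = conn.execute(
    "SELECT fieldID, count FROM parcel_info ORDER BY fieldID"
).fetchall()
print(rows)
```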