From 2417b84b2257d078cc1eb1bfc599f4d8210635b5 Mon Sep 17 00:00:00 2001
From: Hans van den Heuvel <hans1.vandenheuvel@wur.nl>
Date: Thu, 26 Mar 2020 15:08:13 +0100
Subject: [PATCH] Readme in Convert-EUProcessingFactorsDB updated.

---
 Convert-EUProcessingFactorsDB/README.md | 102 ++++++++++--------------
 1 file changed, 44 insertions(+), 58 deletions(-)

diff --git a/Convert-EUProcessingFactorsDB/README.md b/Convert-EUProcessingFactorsDB/README.md
index 6263274..ec339eb 100644
--- a/Convert-EUProcessingFactorsDB/README.md
+++ b/Convert-EUProcessingFactorsDB/README.md
@@ -38,7 +38,7 @@ These are the input and output files of the script. All names are defaults, and
   * A small markdown report is also created, usally called [Report.md](Report.md), but within the zip file is called Readme.md.
   * A csv file with a summary (and counts) of *the remaining data* of the EU sheet, called [Mismatches.csv](Mismatches.csv).
 
-The following is happening in the script, essentially ([more details here](#detailed-workings))
+In essence, the script performs the following steps:
 * The script wil try to match the first column (``FromFC``) of [ProcTypeTranslations.csv](ProcTypeTranslations.csv) to the column ``KeyFacets Code`` of the EU sheet. If a match is found, then the second column (``FCToProcType``) of [ProcTypeTranslations.csv](ProcTypeTranslations.csv) will become the field ``idProcessingType``.
 * Then the script will try to match both the ``FromFX`` and ``FXToRpc`` column of [FoodTranslations.csv](FoodTranslations.csv) with the columns ``Matrix FoodEx2 Code`` and ``Matrix Code`` from the EU sheet, *for all rows that didn't already match in the previous step*. If a match was found, then the value of ``FXToProcType`` will be copied to ``idProcessingType``.
 * If no substance file was given, then just copy the field ``ParamCode Active Substance`` to ``idSubstance``. But if a substance was given, then strip the dash from the ``'CASNumber`` column in the substance file, and match the column ``ParamCode Active Substance`` in the EFSA sheet to ``code`` in the substances sheet. If a match was found then copy the modified (without dash) ``CASNumber`` to ``idSubstance``.
@@ -69,61 +69,49 @@ python.exe convert-script -h
 ```
 
 Theses are command line options that are supported.
-  * ``-h`` : shows help, use this to see which default file names are used.
-  * ``-e EFSA_FILE_OR_URL`` : uses ``EFSA_FILE_OR_URL`` as input Excel sheet. This may be a filename, or the URL, but it should be the format as in the [EU Processing Factors file](https://zenodo.org/record/1488653/files/EU_Processing_Factors_db_P.xlsx.xlsx?download=1)
-  * ``-f FOOD_TRANSLATION_FILE`` : uses ``FOOD_TRANSLATION_FILE`` as a food translation file (``.csv``). This file should have the following format
-    * ``FromFX,FXToRpc,FXToProcType``; this line should be the first line (a header). Values are read as string. Values separated by comma.
-    * Lines starting with \# will be ignored and this can be used to insert comments.
-  * ``-m MISMATCH_FILE`` : this uses ``MISMATCH_FILE`` as an output file for the mismatches. Supported format: ``.csv``.
-  * ``-o PROCESSING_FACTOR_FILE`` : this uses ``PROCESSING_FACTOR_FILE`` as an output file for the MCRA formatted Processing Factors file. Supported formats: ``.zip``, ``.xlsx``, ``.csv``. If ``.zip`` is chosen as a format (default) then within the zipfile a ``.csv`` will be written with the MCRA conforming filename. Also a ``Readme.md`` file will be placed, which is just a copy of the report file (see option ``-r``)
-  * ``-p PROCESSING_TRANSLATION_FILE`` : Uses ``PROCESSING_TRANSLATION_FILE`` as a processing type translation file (``.csv``). This file should have the following format
-    * ``FromFC,FCToProcType``; this line should be the first line (a header). Values are read as string. Values seperated by comma.
-    * Lines starting with \# will be ignored and this can be used to insert comments.
-  * ``-r REPORT_FILE`` : this uses ``REPORT_FILE`` as an output report file (a Markdown file). A copy will be placed in the ``PROCESSING_FACTOR_FILE`` (option ``-o``) as ``Readme.md`` *if a zip file was chosen there* as an output file.
-  * ``-t PROCESSING_TYPE_FILE``] : this uses ``PROCESSING_TYPE_FILE`` as input to augment the data in the output file. The format is defined by MCRA.
-  * ``-v`` : writes verbose output. Multiple levels (1-3) of verbosity are possible, by using more ``v``'s. E.g. ``-vv`` or ``-vvv``.
-
-## Detailed workings
-
-The script is basically one long file, with sequential actions happening. No iteration is used, because the data processing is handed over to the ``pandas`` library. The script is diveded into five phases. If the ``-vv`` verbosity is used, these phases will be displayed as output. This is also extensively documented (commented) within the python file itself.
-
-The pandas dataprocessing can be thought of here as an SQL database. The script will read the EU Excel sheet into a database. Using left joins, and copying of columns the sheet/database is extended. Finally a selection of the newly created columns will be exported.
-
-Below a detailed description.
-
-* **PHASE 0. Initialization**
-  * Libraries are imported
-  * Command line arguments are parsed
-  * Objects created/adjusted
-* **PHASE 1. Read input files**
-  * Script reads the [EU Processing Factors file](https://zenodo.org/record/1488653/files/EU_Processing_Factors_db_P.xlsx.xlsx?download=1)
-  * Script reads the (MCRA formatted) files:
-    * A food translation file, [Foodtranslations.csv](Foodtranslations.csv)
-    * A processing translation file, [ProcTypeTranslations.csv](ProcTypeTranslations.csv)
-    * Only for information, a processing translation file, [ProcessingTypes.csv](ProcessingTypes.csv)
-* **PHASE 2. Processing data**
-  * Script will ``left join`` column ``KeyFacets Code`` from the EU sheet with the ``FromFC`` column of [ProcTypeTranslations.csv](ProcTypeTranslations.csv).
-  * The result will ``left join`` column ``Matrix FoodEx2 Code`` from the EU sheet with the ``FromFX`` column of [Foodtranslations.csv](Foodtranslations.csv).
-  * Copy existing columns
-
-|From                      |To                 |
-|:-------------------------|:------------------|
-|ParamCode Active Substance|idSubstance        |
-|ParamName Active Substance|SubstanceName      |
-|Matrix Code               |idFoodUnProcessed  |
-|Raw Primary Commodity     |FoodUnprocessedName|
-|Median PF                 |Nominal            |
-
-*
-  * Add empty columns: ``Upper``,``NominalUncertaintyUpper``,``UpperUncertaintyUpper``
-  * Next, if the first ``left join`` was succesfull (i.e ``FCToProcType`` contains a value), then make a copy of ``FCToProcType`` to a new field, ``idProcessingType``
-  * Next, if the second ``left join`` was succesfull (i.e ``FCToProcType`` does NOT contain a value, and ``FXToProcType`` does), then make a copy of ``FXToProcType`` to ``idProcessingType``
-  * Do a ``left join`` on column ``idProcessingType`` from the sheet with column ``idProcessingType`` from the file [ProcessingTypes.csv](ProcessingTypes.csv)
-  * Now, if column ``idProcessingType`` has an entry, ``idFoodUnProcessed`` will be concatenated with a dash ``-`` and with ``idProcessingType`` and the result will be placed into ``idFoodProcessed``
-* **PHASE 3. Exporting data**
-  * The columns ``idProcessingType``, ``idSubstance``, ``SubstanceName``, ``idFoodProcessed``, ``idFoodUnProcessed``, ``FoodUnprocessedName``, ``Nominal``, ``Upper``, ``NominalUncertaintyUpper``, ``UpperUncertaintyUpper``, ``KeyFacets Interpreted``, ``Matrix Code Interpreted``, ``MCRA_ProcessingType_Description`` are exported, for all rows in which either ``FCToProcType`` or ``FXToProcType`` has an entry.
-* **PHASE 4. Analysing data and creating report**
-  * This has to be expanded further in this readme file.
+```
+usage: Convert-EUProcessingFactorsDB.py [-h] [-v] [-x] [-e [EFSA_FILE]]
+                                        [-t [PROCESSING_TYPE_FILE]]
+                                        [-p [PROCESSING_TRANSLATION_FILE]]
+                                        [-f [FOOD_TRANSLATION_FILE]]
+                                        [-s [SUBSTANCE_TRANSLATION_FILE]]
+                                        [-g [FOOD_COMPOSITION_FILE]]
+                                        [-o [PROCESSING_FACTOR_FILE]]
+
+Converts the EFSA Zendono Excel sheet into an MCRA conforming format, using
+some external translation files.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbosity       Show verbose output
+  -x, --example         Uses input files from the Example subdir.
+  -e [EFSA_FILE], --efsa_file [EFSA_FILE]
+                        The EFSA Zendono Excel sheet (.xlsx); either file or
+                        URL. (default: https://zenodo.org/record/1488653/files
+                        /EU_Processing_Factors_db_P.xlsx.xlsx?download=1)
+  -t [PROCESSING_TYPE_FILE], --processing_type_file [PROCESSING_TYPE_FILE]
+                        The (input) processing type file - format: csv (Comma
+                        Seperated). (default: ProcessingTypes.csv)
+  -p [PROCESSING_TRANSLATION_FILE], --processing_translation_file [PROCESSING_TRANSLATION_FILE]
+                        The (input) processing translation file - format: csv
+                        (Comma Seperated). (default: ProcTypeTranslations.csv)
+  -f [FOOD_TRANSLATION_FILE], --food_translation_file [FOOD_TRANSLATION_FILE]
+                        The (input) food translation file - format: csv (Comma
+                        Seperated). (default: FoodTranslations.csv)
+  -s [SUBSTANCE_TRANSLATION_FILE], --substance_translation_file [SUBSTANCE_TRANSLATION_FILE]
+                        The (input) substance translation file - format: tsv
+                        (Tab Seperated), file not required. (default:
+                        SubstanceTranslations.tsv)
+  -g [FOOD_COMPOSITION_FILE], --food_composition_file [FOOD_COMPOSITION_FILE]
+                        The (input) food composition file - format: xlsx
+                        (Excel), file not required. (default:
+                        FoodComposition.xlsx)
+  -o [PROCESSING_FACTOR_FILE], --processing_factor_file [PROCESSING_FACTOR_FILE]
+                        The (output) processing factor file - format: csv
+                        (Comma Seperated). (default: ProcessingFactors.zip)
+
+For example: use Convert-EUProcessingFactorsDB.py -v -x for a verbose example.
+```
 
 ## Coding
 
@@ -134,5 +122,3 @@ Check your changes using ``pycodestyle`` for example.
 pip install pycodestyle  # To install the programm
 pycodestyle .\Convert-EUProcessingFactorsDB.py  # To check whether the code complies.
 ```
-
-At the moment only one line is not according to the guidelines, a commented line with the URL of the EU website. This one execption is allowed.
\ No newline at end of file
-- 
GitLab