Commit 012f05f2 authored by Sven Warris

ran codebaseai

parent 5a083235
Showing with 2355 additions and 322 deletions
# F500 Data Analytics

## Project Description
The F500 Data Analytics project provides a suite of tools and scripts for processing, analyzing, and visualizing point cloud data, particularly from Phenospex PLY PointCount files. The project leverages libraries such as Open3D and NumPy to handle 3D data and perform operations like NDVI computation and visualization. Additionally, it includes functionalities for interacting with the Fairdom SEEK API to manage data resources.
## Table of Contents
- [Installation Instructions](#installation-instructions)
- [Usage Guide](#usage-guide)
- [Features](#features)
- [Modules Overview](#modules-overview)
- [Configuration & Customization](#configuration--customization)
- [Testing & Debugging](#testing--debugging)
- [Contributing Guide](#contributing-guide)
- [License & Author Information](#license--author-information)
## Installation Instructions
To set up the project, ensure you have Python installed on your system. Then, install the required dependencies using pip:
```bash
pip install open3d numpy requests pandas isatools azure-storage-blob
```
## Usage Guide
### Visualizing Point Cloud Data
To visualize point cloud data and compute NDVI, use the `visualization_ply.py` script:
```bash
python analytics/visualizations/visualization_ply.py <path_to_ply_file>
```
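Under the hood, this kind of workflow can also be driven from Python using the `PointCloud` helper class included in this commit. A minimal sketch, assuming a local Phenospex PLY file (the file and sample names below are placeholders):

```python
# Minimal sketch: compute NDVI from a PlantEye PLY file and write a histogram.
# Assumes the PLY carries the Phenospex wavelength channels used by PointCloud.
from PointCloud import PointCloud

pc = PointCloud("example_plant.ply")              # wraps o3d.io.read_point_cloud
ndvi = pc.get_ndvi()                              # NDVI = (NIR - RED) / (NIR + RED)
pc.writeHistogram(ndvi, "ndvi_histogram.csv",     # CSV with bin edges and counts
                  timepoint="2024-01-01", sampleName="pot_A1",
                  bins=50, dataRange=(-1.0, 1.0))
```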
### Deleting Resources via Fairdom SEEK API
To delete resources from a Fairdom SEEK server, use the `deleteFAIRObject.py` script:
```bash
python analytics/f500/collecting/deleteFAIRObject.py <token>
```
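The script authenticates with a token passed in the request headers, in the same style as the `Fairdom` upload class later in this commit. A minimal sketch of such a call; the server URL and resource path are placeholders, and the exact endpoint depends on the resource type being deleted:

```python
# Minimal sketch: token-authenticated DELETE against a SEEK-style JSON API.
import requests

session = requests.Session()
session.headers.update({
    "Content-type": "application/vnd.api+json",
    "Accept": "application/vnd.api+json",
    "Authorization": "Token {}".format("<token>"),
})
response = session.delete("https://seek.example.org/data_files/123")  # hypothetical resource
response.raise_for_status()
```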
### F500 Toolkit
The `toolkit.py` script provides a command-line interface for various data processing tasks:
```bash
python analytics/f500/collecting/toolkit.py <command>
```
Available commands include (a sketch of the subcommand wiring follows this list):
- `restructure`
- `pointclouds`
- `verify`
- `histogram`
- `upload`
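Each command is implemented as a separate argparse subparser in `F500.commandLineInterface`, which then dispatches to the corresponding method. A minimal sketch of that pattern, limited to the `restructure` options visible in this commit:

```python
# Minimal sketch of the subcommand dispatch used by toolkit.py.
import argparse

parser = argparse.ArgumentParser(description='F500 PlantEye data processing tool.')
sub_parsers = parser.add_subparsers(dest="command")
restructure = sub_parsers.add_parser("restructure")
restructure.add_argument('--loglevel', default="INFO", help="Application log level (INFO/WARN/ERROR)")
restructure.add_argument('--logfile', help="Application log file")

args = parser.parse_args(["restructure", "--loglevel", "DEBUG"])
print(args.command, args.loglevel)  # -> restructure DEBUG
```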
## Features
- **Point Cloud Visualization**: Visualize and process 3D point cloud data.
- **NDVI Computation**: Compute and visualize NDVI from point cloud data.
- **Resource Management**: Interact with Fairdom SEEK API to manage data resources.
- **Data Processing**: Restructure, verify, and upload data using the F500 toolkit.
## Modules Overview
- **visualization_ply.py**: Handles point cloud visualization and NDVI computation.
- **clearWhites.py**: Loads point cloud data for further processing.
- **deleteFAIRObject.py**: Deletes resources from Fairdom SEEK server.
- **toolkit.py**: Command-line interface for F500 data processing tasks.
## Configuration & Customization
- **API Token**: Ensure you have a valid authorization token for accessing the Fairdom SEEK API.
- **File Paths**: Provide correct paths to PLY files when using visualization scripts.
## Testing & Debugging
- **Error Handling**: Scripts include basic error handling for file paths and API requests.
- **Future Work**: Consider adding more robust error handling and automated tests for critical functionalities.
## Contributing Guide
Contributions are welcome! Please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Commit your changes with clear messages.
4. Push your changes to your fork.
5. Submit a pull request with a detailed description of your changes.
## License & Author Information
This project is licensed under the MIT License. For more information, see the LICENSE file.
Author: Sven Warris
---
This README provides a comprehensive overview of the F500 Data Analytics project, including installation, usage, and contribution guidelines. Feel free to update sections as the project evolves.
""" """
ISA & isamodel This script is a data processing tool for F500 PlantEye data. It provides functionalities to restructure raw data, process point clouds, combine histograms, and upload data to a specified platform. The script uses the ISA model for data representation and supports command-line interfaces for different operations.
https://isa-specs.readthedocs.io/en/latest/isamodel.html
Classes:
F500: A class to handle the processing of F500 PlantEye data, including restructuring, point cloud processing, histogram combination, and data upload.
Functions:
commandLineInterface: Sets up the command-line interface for the script.
setLogger: Configures the logging for the script.
removeAfterSpaceFromDataMatrix: Static method to clean up the 'DataMatrix' column in a DataFrame.
createISA: Initializes an ISA investigation object.
writeISAJSON: Writes the ISA investigation object to a JSON file.
copyPots: Static method to copy pot information from a reference DataFrame to a row.
measurementsToFile: Writes the measurements DataFrame to a file.
rawMeasurementsToFile: Static method to write raw measurements to a file.
addPointClouds: Static method to add point cloud file names to a row.
copyPointcloudFile: Static method to copy point cloud files to a specified location.
copyPlotPointcloudFile: Static method to copy plot point cloud files to a specified location.
createSample: Static method to create a sample object.
createAssay: Static method to create an assay object.
createAssayPlot: Static method to create an assay plot object.
correctDataMatrix: Corrects the 'DataMatrix' column in a row based on a reference DataFrame.
finalize: Finalizes the processing of measurements and creates assays.
getDirectoryListing: Returns a directory listing for a given root folder.
restructure: Restructures the raw data into an ISA-compliant format.
processPointclouds: Processes point cloud files and generates derived data.
combineHistograms: Combines histogram data from multiple assays into a single file.
upload: Uploads the processed data to a specified platform.
""" """
import argparse
import sys
import os
...@@ -28,6 +53,24 @@ import datetime
import string
class F500:
"""
A class to handle the processing of F500 PlantEye data, including restructuring, point cloud processing, histogram combination, and data upload.
Attributes:
description (defaultdict): A dictionary to store descriptions.
columnsToDrop (list): A list of columns to drop from the data.
ISA (dict): A dictionary to store ISA-related data.
datamatrix (list): A list to store data matrix information.
investigation (Investigation): An ISA investigation object.
checkAssayName (re.Pattern): A regex pattern to check assay names.
measurements (DataFrame): A DataFrame to store measurements.
currentFile (str): The current file being processed.
currentRoot (str): The current root directory being processed.
command (str): The command to execute.
assaysDone (set): A set to store completed assays.
samples (dict): A dictionary to store sample objects.
"""
description = defaultdict(str)
columnsToDrop = []
ISA = {}
...@@ -42,6 +85,9 @@ class F500:
samples = None
def __init__(self):
"""
Initializes the F500 object with default values and configurations.
"""
# Some columns contain the wrong data, remove those:
self.columnsToDrop = ["ndvi_aver","ndvi_bin0","ndvi_bin1","ndvi_bin2","ndvi_bin3","ndvi_bin4","ndvi_bin5",
"greenness_aver","greenness_bin0","greenness_bin1","greenness_bin2","greenness_bin3","greenness_bin4","greenness_bin5",
...@@ -55,11 +101,12 @@ class F500:
self.samples = {}
def commandLineInterface(self):
"""
Sets up the command-line interface for the script, defining arguments and subcommands.
"""
my_parser = argparse.ArgumentParser(description='F500 PlantEye data processing tool.')
sub_parsers = my_parser.add_subparsers(dest="command")
my_parser_restructure = sub_parsers.add_parser("restructure")
my_parser_restructure.add_argument('--loglevel', help="Application log level (INFO/WARN/ERROR)", default="INFO")
my_parser_restructure.add_argument('--logfile', help="Application log file")
...@@ -160,6 +207,9 @@ class F500:
def setLogger(self):
"""
Configures the logging for the script based on command-line arguments.
"""
self.logger = logging.getLogger("F500")
self.logger.setLevel(self.args.loglevel)
if len(str(self.args.logfile)) > 0:
...@@ -167,6 +217,15 @@ class F500:
@staticmethod
def removeAfterSpaceFromDataMatrix(row):
"""
Cleans up the 'DataMatrix' column in a DataFrame row by removing text after a space.
Args:
row (Series): A row from a DataFrame.
Returns:
Series: The modified row with cleaned 'DataMatrix' column.
"""
try:
row["DataMatrix"] = row["DataMatrix"].strip().split(" ")[0]
except:
...@@ -174,6 +233,9 @@ class F500:
return row
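# Example (hypothetical data): the cleaner is applied row-wise with pandas, e.g.
#   df = pandas.DataFrame({"DataMatrix": ["POT001 extra text", "POT002"]})
#   df = df.apply(F500.removeAfterSpaceFromDataMatrix, axis=1)
#   # df["DataMatrix"] is now ["POT001", "POT002"]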
def createISA(self):
"""
Initializes an ISA investigation object and sets up the study and metadata.
"""
# Create investigation
self.investigation = Investigation()
self.investigation.title = "_".join([self.datamatrix[4], self.datamatrix[3], self.datamatrix[2]])
...@@ -181,7 +243,6 @@ class F500:
self.investigation.measurements = pandas.DataFrame()
self.investigation.plots = set()
# Create study, title comes from the datamatrix file (ID...)
self.investigation.studies.append(Study())
if self.studyName != None:
...@@ -200,6 +261,9 @@ class F500:
def writeISAJSON(self):
"""
Writes the ISA investigation object to a JSON file.
"""
jsonOutput = open(self.args.json, "w")
jsonOutput.write(json.dumps(self.investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': ')))
jsonOutput.close()
...@@ -207,6 +271,17 @@ class F500:
@staticmethod
def copyPots(row, pots, f500):
"""
Copies pot information from a reference DataFrame to a row.
Args:
row (Series): A row from a DataFrame.
pots (DataFrame): A DataFrame containing pot information.
f500 (F500): An instance of the F500 class.
Returns:
Series: The modified row with pot information.
"""
try:
row["Pot"] = pots[ (pots["x"] == row["x"]) & (pots["y"] == row["y"]) ]["Pot"].iloc[0]
if "Treatment" in pots.columns:
...@@ -219,6 +294,9 @@ class F500:
return row
def measurementsToFile(self):
"""
Writes the measurements DataFrame to a file.
"""
path = "/".join([self.investigationPath, self.investigation.title, self.investigation.studies[0].title]) path = "/".join([self.investigationPath, self.investigation.title, self.investigation.studies[0].title])
filename = "derived/" + self.investigation.studies[0].title + ".csv" filename = "derived/" + self.investigation.studies[0].title + ".csv"
os.makedirs(path + "/derived", exist_ok=True) os.makedirs(path + "/derived", exist_ok=True)
...@@ -226,13 +304,31 @@ class F500: ...@@ -226,13 +304,31 @@ class F500:
@staticmethod @staticmethod
def rawMeasurementsToFile(path, filename, measurements): def rawMeasurementsToFile(path, filename, measurements):
"""
Writes raw measurements to a file.
Args:
path (str): The directory path to save the file.
filename (str): The name of the file.
measurements (list): A list of measurements to write.
"""
os.makedirs(path + "/derived", exist_ok=True)
df = pandas.DataFrame(measurements)
df = df.transpose()
df.to_csv(path + "/" + filename, sep=";", index=False)
@staticmethod
def addPointClouds(row, title):
"""
Adds point cloud file names to a row.
Args:
row (Series): A row from a DataFrame.
title (str): The title to use in the file names.
Returns:
Series: The modified row with point cloud file names.
"""
filename = "{}_{}_full_sx{:03d}_sy{:03d}.ply.gz".format( filename = "{}_{}_full_sx{:03d}_sy{:03d}.ply.gz".format(
title, row["timestamp_file"], title, row["timestamp_file"],
int(row["x"]), int(row["x"]),
...@@ -259,6 +355,14 @@ class F500: ...@@ -259,6 +355,14 @@ class F500:
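# Example (illustrative values): with title="EXP1_A_2024", timestamp_file="2024-05-01_10-00-00",
# x=3 and y=12, the format string above yields
# "EXP1_A_2024_2024-05-01_10-00-00_full_sx003_sy012.ply.gz".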
@staticmethod
def copyPointcloudFile(row, f500, fullPath):
"""
Copies point cloud files to a specified location.
Args:
row (Series): A row from a DataFrame.
f500 (F500): An instance of the F500 class.
fullPath (str): The destination path for the point cloud files.
"""
if f500.args.copyPointcloud == "True":
AB = f500.root.split("/")[-1]
pointcloudPath = "/".join(f500.root.split("/")[:-3]) + "/current/" + AB +'/I/'
...@@ -299,6 +403,15 @@ class F500:
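# Example (hypothetical layout): if f500.root is "/data/exp1/raw/day1/A", then AB is "A" and
# pointcloudPath becomes "/data/exp1/current/A/I/", i.e. the last three path components of
# root are replaced by "current/<AB>/I/".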
@staticmethod
def copyPlotPointcloudFile(row, f500, fullPath, title):
"""
Copies plot point cloud files to a specified location.
Args:
row (Series): A row from a DataFrame.
f500 (F500): An instance of the F500 class.
fullPath (str): The destination path for the plot point cloud files.
title (str): The title to use in the file names.
"""
if f500.args.copyPointcloud == "True":
f500.logger.warn("The copy plot point cloud will copy a lot of data. However, users are generally not interested in these plot files.")
...@@ -359,6 +472,20 @@ class F500:
@staticmethod
def createSample(samples, name, source, organism, taxon, term_source):
"""
Creates a sample object if it doesn't already exist.
Args:
samples (dict): A dictionary to store sample objects.
name (str): The name of the sample.
source (Source): The source object for the sample.
organism (str): The organism name.
taxon (str): The taxon ID.
term_source (OntologySourceReference): The ontology source reference.
Returns:
Sample: The created or existing sample object.
"""
if str(name) not in samples:
sample = Sample(name=str(name), derives_from=[source])
characteristic_organism = Characteristic(category=OntologyAnnotation(term="Organism"),
...@@ -372,6 +499,15 @@ class F500:
@staticmethod
def createAssay(row, f500, path, source):
"""
Creates an assay object and adds it to the investigation.
Args:
row (Series): A row from a DataFrame.
f500 (F500): An instance of the F500 class.
path (str): The directory path for the assay.
source (Source): The source object for the assay.
"""
assay = Assay()
assay.title = row["timestamp_file"]
assay.filename = row["timestamp_file"]
...@@ -451,6 +587,16 @@ class F500:
@staticmethod
def createAssayPlot(row, f500, path, source, title):
"""
Creates an assay plot object and adds it to the investigation.
Args:
row (Series): A row from a DataFrame.
f500 (F500): An instance of the F500 class.
path (str): The directory path for the assay plot.
source (Source): The source object for the assay plot.
title (str): The title to use in the file names.
"""
assay = Assay()
assay.title = row["timestamp_file"]
assay.filename = row["timestamp_file"]
...@@ -484,6 +630,16 @@ class F500:
f500.investigation.studies[0].assays.append(assay)
def correctDataMatrix(row, pots):
"""
Corrects the 'DataMatrix' column in a row based on a reference DataFrame.
Args:
row (Series): A row from a DataFrame.
pots (DataFrame): A DataFrame containing pot information.
Returns:
Series: The modified row with corrected 'DataMatrix' column.
"""
result = pots.loc[(pots['x'] == row["x"]) & (pots['y'] == row['y']), 'Pot']
# Access the result
...@@ -493,6 +649,12 @@ class F500:
return row
def finalize(self, title):
"""
Finalizes the processing of measurements and creates assays.
Args:
title (str): The title to use in the file names.
"""
# CSV will be combined data file (with corrected pot names) and ply file names
# Do this, if the data matrix contains pot names (otherwise it either went wrong or data is from a different project)
# Then list the ply files as Image File
...@@ -533,9 +695,21 @@ class F500:
self.logger.info("No pots in main measurement file")
def getDirectoryListing(self, rootFolder):
"""
Returns a directory listing for a given root folder.
Args:
rootFolder (str): The root folder to list.
Returns:
generator: A generator yielding directory listings.
"""
return os.walk(rootFolder)
def restructure(self):
"""
Restructures the raw data into an ISA-compliant format.
"""
self.source = Source(name=self.args.source)
self.sourceContainer = Source(name=self.args.sourceContainer)
self.datamatrix = os.path.basename(self.args.datamatrix_file).split(".")[0].split("_")
...@@ -618,6 +792,9 @@ class F500:
def processPointclouds(self):
"""
Processes point cloud files and generates derived data.
"""
from PointCloud import PointCloud
self.logger.info("Reading project ISA {}".format(self.args.json))
...@@ -691,6 +868,9 @@ class F500:
def combineHistograms(self):
"""
Combines histogram data from multiple assays into a single file.
"""
self.logger.info("Reading project ISA {}".format(self.args.json)) self.logger.info("Reading project ISA {}".format(self.args.json))
self.logger.info("Creating combined histogram of {}".format(self.args.histogram)) self.logger.info("Creating combined histogram of {}".format(self.args.histogram))
self.investigation = isajson.load(open(self.args.json, "r")) self.investigation = isajson.load(open(self.args.json, "r"))
...@@ -728,10 +908,11 @@ class F500: ...@@ -728,10 +908,11 @@ class F500:
self.logger.warning("Could not combine data for {}, exception: {}".format(hLabel, e)) self.logger.warning("Could not combine data for {}, exception: {}".format(hLabel, e))
def upload(self): def upload(self):
"""
Uploads the processed data to a specified platform.
"""
self.logger.info("Reading project ISA {}".format(self.args.json)) self.logger.info("Reading project ISA {}".format(self.args.json))
self.logger.info("Uploading data to {}".format(self.args.URL)) self.logger.info("Uploading data to {}".format(self.args.URL))
self.investigation = isajson.load(open(self.args.json, "r")) self.investigation = isajson.load(open(self.args.json, "r"))
fairdom = Fairdom(self.investigation, self.args, self.logger) fairdom = Fairdom(self.investigation, self.args, self.logger)
fairdom.upload() fairdom.upload()
from azure.storage.blob import BlobServiceClient
from F500 import F500
import os
import json
import pandas
import shutil
"""
This script provides functionality to interact with Azure Blob Storage for managing
and processing data related to plant imaging experiments. It extends the F500 class
to include methods for initializing Azure connections, transferring data, and handling
experiment metadata.
"""
class F500Azure(F500):
"""
A class to manage Azure Blob Storage interactions for plant imaging experiments.
This class extends the F500 class and provides additional methods to initialize
Azure connections, transfer data between source and target containers, and handle
experiment metadata.
"""
def __init__(self, experimentID):
"""
Initialize the F500Azure class with a specific experiment ID.
Args:
experimentID (str): The unique identifier for the experiment.
"""
super().__init__()
self.experimentID = experimentID
def initAzure(self, environment, metadata, logger):
"""
Initialize Azure-related settings and metadata for the experiment.
Args:
environment (dict): A dictionary containing environment-specific settings.
metadata (dict): Metadata related to the experiment.
logger (Logger): Logger instance for logging information.
Side Effects:
Sets various attributes related to the experiment and Azure configuration.
"""
self.logger = logger
self.args.technologyType = "Imaging"
self.args.technologyPlatform = "PlantEye"
self.args.sampleType = "Pot"
self.args.sampleTypeContainer = "Plot"
self.args.source = "Plant"
self.args.sourceContainer = "Plot"
self.args.copyPointcloud = "True"
self.args.investigationPath = str(self.experimentID)
self.args.datamatrix_file = environment["datamatrix_file"]
self.args.json = environment["json"]
self.args.organism = environment["organism"]
self.args.taxon = environment["taxon"]
self.args.start = environment["start"]
self.metadata = metadata
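# Example (hypothetical values), matching the keys read above:
#   environment = {"datamatrix_file": "ID_..._datamatrix.csv", "json": "investigation.json",
#                  "organism": "<organism or SEEK organism id>", "taxon": "<taxon id>",
#                  "start": "<start date>"}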
def connectToSource(self, sourceConnectionString, sourceContainerName, sourceBlobName):
"""
Connect to the source Azure Blob Storage container.
Args:
sourceConnectionString (str): Connection string for the source Azure Blob Storage.
sourceContainerName (str): Name of the source container.
sourceBlobName (str): Name of the source blob.
Side Effects:
Initializes the source blob service and container clients.
"""
self.sourceConnectionString = sourceConnectionString
self.sourceContainerName = sourceContainerName
self.sourceBlobName = sourceBlobName
self.sourceBlobServiceClient = BlobServiceClient.from_connection_string(sourceConnectionString)
self.sourceContainerClient = self.sourceBlobServiceClient.get_container_client(sourceContainerName)
def connectToTarget(self, targetConnectionString, targetContainerName, targetBlobName):
"""
Connect to the target Azure Blob Storage container.
Args:
targetConnectionString (str): Connection string for the target Azure Blob Storage.
targetContainerName (str): Name of the target container.
targetBlobName (str): Name of the target blob.
Side Effects:
Initializes the target blob service and container clients.
"""
self.targetConnectionString = targetConnectionString
self.targetContainerName = targetContainerName
self.targetBlobName = targetBlobName
...@@ -43,18 +97,41 @@ class F500Azure (F500):
self.targetContainerClient = self.targetBlobServiceClient.get_container_client(targetContainerName)
def writeISAJSON(self):
"""
Write the investigation data to a JSON file.
Side Effects:
Creates a JSON file with the investigation data.
"""
jsonOutput = open(self.args.json, "w")
jsonOutput.write(json.dumps(self.investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': ')))
jsonOutput.close()
def measurementsToFile(self):
"""
Write the measurements data to a CSV file.
Side Effects:
Creates directories and a CSV file with the measurements data.
"""
path = "/".join([self.investigationPath, self.investigation.title, self.investigation.studies[0].title])
filename = "derived/" + self.investigation.studies[0].title + ".csv" filename = "derived/" + self.investigation.studies[0].title + ".csv"
os.makedirs(path + "/derived", exist_ok=True) os.makedirs(path + "/derived", exist_ok=True)
self.investigation.measurements.to_csv(path + "/" + filename, sep=";") self.investigation.measurements.to_csv(path + "/" + filename, sep=";")
@staticmethod @staticmethod
def rawMeasurementsToFile(path, filename, measurements): def rawMeasurementsToFile(path, filename, measurements):
"""
Write raw measurements data to a CSV file.
Args:
path (str): The directory path where the file will be saved.
filename (str): The name of the file.
measurements (dict): The measurements data to be written.
Side Effects:
Creates directories and a CSV file with the raw measurements data.
"""
os.makedirs(path + "/derived", exist_ok=True)
df = pandas.DataFrame(measurements)
df = df.transpose()
...@@ -62,24 +139,46 @@ class F500Azure (F500):
@staticmethod
def copyPointcloudFile(row, f500, fullPath):
"""
Copy pointcloud files to a specified directory.
Args:
row (dict): A dictionary containing pointcloud file names.
f500 (F500): An instance of the F500 class.
fullPath (str): The destination directory path.
Side Effects:
Copies pointcloud files to the specified directory.
Exceptions:
Raises an exception if file copying fails.
"""
if f500.args.copyPointcloud == "True":
AB = f500.root.split("/")[-1]
pointcloudPath = "/".join(f500.root.split("/")[:-3]) + "/current/" + AB + '/I/'
f500.logger.info("Copying pointclouds from {}{} to {}".format(pointcloudPath, [row["pointcloud_full"], row["pointcloud_mr"], row["pointcloud_sl"], row["pointcloud_mg"]], fullPath))
try:
os.makedirs(fullPath, exist_ok=True)
shutil.copy(pointcloudPath + row["pointcloud_full"], fullPath)
shutil.copy(pointcloudPath + row["pointcloud_mr"], fullPath)
shutil.copy(pointcloudPath + row["pointcloud_sl"], fullPath)
shutil.copy(pointcloudPath + row["pointcloud_mg"], fullPath)
except Exception as e:
f500.logger.warn("Exception in copying files:\n{}".format(e))
# if f500.args.loglevel == "DEBUG":
# raise e
else:
f500.logger.info("Skipping copy (defined in command line)")
@staticmethod
def getDirectoryListing(rootFolder):
"""
Get a directory listing for the specified root folder.
Args:
rootFolder (str): The root folder path.
Returns:
generator: A generator yielding directory paths, directory names, and file names.
"""
return os.walk(rootFolder)
"""
This script is designed to interact with the FAIRDOM platform to create and manage investigations, studies, assays, samples, and data files.
It uses the ISA-Tools library to handle ISA-JSON data structures and the requests library to communicate with the FAIRDOM API.
The script is intended to facilitate the upload of structured experimental data to the FAIRDOM repository.
Classes:
Fairdom: Handles the creation and management of investigations, studies, assays, samples, and data files in FAIRDOM.
Functions:
__init__: Initializes the Fairdom class with investigation data, arguments, and a logger.
createInvestigationJSON: Creates a JSON structure for an investigation.
createStudyJSON: Creates a JSON structure for a study.
createAssayJSON: Creates a JSON structure for an assay.
createDataFileJSON: Creates a JSON structure for a data file.
addSampleToAssayJSON: Adds a sample to an assay JSON structure.
addDataFileToAssayJSON: Adds a data file to an assay JSON structure.
addDataFilesToSampleJSON: Adds data files from an assay to a sample JSON structure.
createSampleJSON: Creates a JSON structure for a sample.
upload: Uploads the investigation, studies, assays, samples, and data files to FAIRDOM.
Note: The script assumes that the user has a valid token for authentication with the FAIRDOM API.
"""
from isatools.isajson import ISAJSONEncoder
import isatools
from isatools.model import *
...@@ -10,22 +33,51 @@ import time
class Fairdom:
"""
A class to manage the creation and upload of investigations, studies, assays, samples, and data files to the FAIRDOM platform.
Attributes:
investigation: An ISA-Tools investigation object containing the data to be uploaded.
args: Command-line arguments or configuration settings for the upload process.
logger: A logging object to record the process of uploading data.
session: A requests session object configured with headers for authentication with the FAIRDOM API.
"""
def __init__(self, investigation, args, logger):
"""
Initializes the Fairdom class with the given investigation, arguments, and logger.
Args:
investigation: An ISA-Tools investigation object.
args: An object containing command-line arguments or configuration settings.
logger: A logging object for recording the upload process.
Side Effects:
Updates the session headers with authentication information.
"""
self.investigation = investigation
self.args = args
self.args.project = int(self.args.project)
self.args.organism = int(self.args.organism)
self.logger = logger
headers = {
"Content-type": "application/vnd.api+json",
"Accept": "application/vnd.api+json",
"Accept-Charset": "ISO-8859-1",
"Authorization": "Token {}".format(self.args.token)
}
self.session = requests.Session()
self.session.headers.update(headers)
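# Typical use (as in F500.upload): load an ISA-JSON investigation and push it to a SEEK server,
# e.g. (paths and argument values are illustrative)
#   investigation = isajson.load(open("investigation.json", "r"))
#   Fairdom(investigation, args, logger).upload()
# where args provides URL, token, project, organism and sample_type.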
def createInvestigationJSON(self):
"""
Creates a JSON structure for an investigation.
Returns:
A dictionary representing the JSON structure of the investigation.
"""
investigationJSON = {}
investigationJSON['data'] = {}
investigationJSON['data']['type'] = 'investigations'
...@@ -34,88 +86,156 @@ class Fairdom:
investigationJSON['data']['attributes']['description'] = "PlantEye data from NPEC"
investigationJSON['data']['relationships'] = {}
investigationJSON['data']['relationships']['projects'] = {}
investigationJSON['data']['relationships']['projects']['data'] = [{'id': str(self.args.project), 'type': 'projects'}]
return investigationJSON
def createStudyJSON(self, study, investigationID):
"""
Creates a JSON structure for a study.
Args:
study: An ISA-Tools study object.
investigationID: The ID of the investigation to which the study belongs.
Returns:
A dictionary representing the JSON structure of the study.
"""
studyJSON = {}
studyJSON['data'] = {}
studyJSON['data']['type'] = 'studies'
studyJSON['data']['attributes'] = {}
studyJSON['data']['attributes']['title'] = study.name
studyJSON['data']['attributes']['description'] = "F500 pot data"
studyJSON['data']['relationships'] = {}
studyJSON['data']['relationships']['investigation'] = {}
studyJSON['data']['relationships']['investigation']['data'] = {'id': str(investigationID), 'type': 'investigations'}
return studyJSON
def createAssayJSON(self, assay, studyID):
"""
Creates a JSON structure for an assay.
Args:
assay: An ISA-Tools assay object.
studyID: The ID of the study to which the assay belongs.
Returns:
A dictionary representing the JSON structure of the assay.
"""
assayJSON = {}
assayJSON['data'] = {}
assayJSON['data']['type'] = 'assays'
assayJSON['data']['attributes'] = {}
assayJSON['data']['attributes']['title'] = assay.filename
assayJSON['data']['attributes']['description'] = 'NPEC F500 measurement assay'
assayJSON['data']['attributes']['assay_class'] = {'key': 'EXP'}
assayJSON['data']['attributes']['assay_type'] = {'uri': "http://jermontology.org/ontology/JERMOntology#Metabolomics"}
assayJSON['data']['relationships'] = {}
assayJSON['data']['relationships']['study'] = {}
assayJSON['data']['relationships']['study']['data'] = {'id': str(studyID), 'type': 'studies'}
assayJSON['data']['relationships']['organisms'] = {}
assayJSON['data']['relationships']['organisms']['data'] = [{'id': str(self.args.organism), 'type': 'organisms'}]
return assayJSON
def createDataFileJSON(self, data_file):
"""
Creates a JSON structure for a data file.
Args:
data_file: An object representing a data file.
Returns:
A dictionary representing the JSON structure of the data file.
"""
data_fileJSON = {}
data_fileJSON['data'] = {}
data_fileJSON['data']['type'] = 'data_files'
data_fileJSON['data']['attributes'] = {}
data_fileJSON['data']['attributes']['title'] = data_file.filename
data_fileJSON['data']['attributes']['content_blobs'] = [{
'url': 'https://www.wur.nl/upload/854757ab-168f-46d7-b415-f8b501eebaa5_WUR_RGB_standard_2021-site.svg',
'original_filename': data_file.filename,
'content-type': 'image/svg+xml'
}]
data_fileJSON['data']['relationships'] = {}
data_fileJSON['data']['relationships']['projects'] = {}
data_fileJSON['data']['relationships']['projects']['data'] = [{'id': str(self.args.project), 'type': 'projects'}]
return data_fileJSON
def addSampleToAssayJSON(self, sampleID, assayJSON):
"""
Adds a sample to an assay JSON structure.
Args:
sampleID: The ID of the sample to be added.
assayJSON: The JSON structure of the assay to which the sample will be added.
"""
if 'samples' not in assayJSON['data']['relationships']:
assayJSON['data']['relationships']['samples'] = {}
assayJSON['data']['relationships']['samples']['data'] = []
assayJSON['data']['relationships']['samples']['data'].append({'id': str(sampleID), 'type': 'samples'})
def addDataFileToAssayJSON(self, data_fileID, assayJSON):
"""
Adds a data file to an assay JSON structure.
Args:
data_fileID: The ID of the data file to be added.
assayJSON: The JSON structure of the assay to which the data file will be added.
"""
if 'data_files' not in assayJSON['data']['relationships']:
assayJSON['data']['relationships']['data_files'] = {}
assayJSON['data']['relationships']['data_files']['data'] = []
assayJSON['data']['relationships']['data_files']['data'].append({'id': str(data_fileID), 'type': 'data_files'})
def addDataFilesToSampleJSON(self, assayJSON, sampleJSON):
"""
Adds data files from an assay to a sample JSON structure.
Args:
assayJSON: The JSON structure of the assay containing the data files.
sampleJSON: The JSON structure of the sample to which the data files will be added.
"""
if 'data_files' not in sampleJSON['data']['relationships']:
sampleJSON['data']['relationships']['data_files'] = {}
sampleJSON['data']['relationships']['data_files']['data'] = []
if 'data_files' in assayJSON['data']['relationships']:
sampleJSON['data']['relationships']['data_files']['data'].extend(assayJSON['data']['relationships']['data_files']['data'])
def createSampleJSON(self, sample):
"""
Creates a JSON structure for a sample.
Args:
sample: An ISA-Tools sample object.
Returns:
A dictionary representing the JSON structure of the sample.
"""
sampleJSON = {}
sampleJSON['data'] = {}
sampleJSON['data']['type'] = 'samples'
sampleJSON['data']['attributes'] = {}
sampleJSON['data']['attributes']['title'] = sample.name
sampleJSON['data']['attributes']['attribute_map'] = {'PotID': sample.name}
sampleJSON['data']['relationships'] = {}
sampleJSON['data']['relationships']['projects'] = {}
sampleJSON['data']['relationships']['projects']['data'] = [{'id': str(self.args.project), 'type': 'projects'}]
sampleJSON['data']['relationships']['sample_type'] = {}
sampleJSON['data']['relationships']['sample_type']['data'] = {'id': str(self.args.sample_type), 'type': 'sample_types'}
return sampleJSON
def upload(self):
"""
Uploads the investigation, studies, assays, samples, and data files to the FAIRDOM platform.
Side Effects:
Communicates with the FAIRDOM API to create and upload data structures.
Logs the process and any errors encountered.
Raises:
SystemExit: If an error occurs during the upload process that prevents continuation.
"""
# create investigation
investigationJSON = self.createInvestigationJSON()
self.logger.info("Creating investigation in FAIRDOM at {}".format(self.args.URL))
...@@ -123,7 +243,7 @@ class Fairdom:
if r.status_code == 201 or r.status_code == 200:
investigationID = r.json()['data']['id']
self.logger.info("Investigation id {} created. Status: {}".format(investigationID, r.status_code))
else:
self.logger.error("Could not create new investigation, error code {}".format(r.status_code))
exit(1)
...@@ -147,7 +267,7 @@ class Fairdom:
studyID = r.json()['data']['id']
self.currentStudies[sample.name]["id"] = studyID
self.logger.info("Study id {} with ({}) created. Status: {}".format(studyID, sample.name, r.status_code))
else:
self.logger.error("Could not create new study, error code {}".format(r.status_code))
exit(1)
...@@ -155,13 +275,13 @@ class Fairdom:
assayJSON = self.createAssayJSON(assay, studyJSON['id'])
# create and add data files
for data_file in assay.data_files:
if "derived" in data_file.filename or ".ply.gz" in data_file.filename or "ndvi" in data_file.filename: # for now, only upload phenotypic data
data_fileJSON = self.createDataFileJSON(data_file)
r = self.session.post(self.args.URL + '/data_files', json=data_fileJSON)
if r.status_code == 201 or r.status_code == 200:
data_fileID = r.json()['data']['id']
self.logger.info("Data file id {} created ({}). Status: {}".format(data_fileID, data_file.filename, r.status_code))
else:
self.logger.error("Could not create new data file, error code {}".format(r.status_code))
exit(1)
data_fileJSON['id'] = data_fileID
...@@ -174,20 +294,20 @@ class Fairdom:
sampleID = r.json()['data']['id']
self.samples[sample.name]['id'] = sampleID
self.logger.info("Sample id {} created ({}). Status: {}".format(sampleID, sample.name, r.status_code))
else:
self.logger.error("Could not create new sample, error code {}".format(r.status_code))
if r.status_code == 422:
self.logger.info(self.samples[sample.name])
self.logger.info(r.json())
exit(1)
sampleID = self.samples[sample.name]['id']
self.addSampleToAssayJSON(sampleID, assayJSON)
sampleJSON = self.samples[sample.name]
r = self.session.post(self.args.URL + '/assays', json=assayJSON)
if r.status_code == 201 or r.status_code == 200:
assayID = r.json()['data']['id']
self.logger.info("Assay id {} created. Status: {}".format(assayID, r.status_code))
else:
self.logger.error("Could not create new assay, error code {}".format(r.status_code))
if r.status_code == 422:
self.logger.info(assayJSON)
...@@ -198,8 +318,4 @@ class Fairdom:
r = self.session.post(self.args.URL + '/assays', json=assayJSON)
r.raise_for_status()
else:
exit(1)
...@@ -2,52 +2,114 @@ import open3d as o3d
import numpy
import os
"""
This script provides a class for handling point cloud data using the Open3D library.
It includes functionalities for reading point cloud data from a file, calculating various
spectral indices, trimming the point cloud based on z-values, and rendering images of the
point cloud with or without color rescaling.
"""
class PointCloud:
"""
A class to represent and manipulate a point cloud using Open3D.
Attributes:
----------
pcd : open3d.geometry.PointCloud
The point cloud data.
trimmed : bool
A flag indicating whether the point cloud has been trimmed.
"""
pcd = None
trimmed = False
def __init__(self, filename):
"""
Initializes the PointCloud object by reading point cloud data from a file.
Parameters:
----------
filename : str
The path to the point cloud file in PLY format.
"""
self.pcd = o3d.io.read_point_cloud(filename, format="ply")
self.trimmed = False
def writeHistogram(self, data, filename, timepoint, sampleName, bins, dataRange=None):
"""
Writes a histogram of the given data to a file.
Parameters:
----------
data : numpy.ndarray
The data for which the histogram is to be calculated.
filename : str
The path to the file where the histogram will be written.
timepoint : str
The timepoint associated with the data.
sampleName : str
The name of the sample.
bins : int
The number of bins for the histogram.
dataRange : tuple, optional
The lower and upper range of the bins. If not provided, range is (data.min(), data.max()).
Side Effects:
------------
Writes the histogram data to the specified file.
"""
data = data[numpy.isfinite(data)]
hist, bin_edges = numpy.histogram(data, bins=bins, range=dataRange)
with open(filename, "w") as f:
f.write("timepoint;sample;{}\n".format(";".join(["bin" + str(x) for x in range(0, len(bin_edges))])))
f.write("{};{};{}\n".format(timepoint, "edges", ";".join([str(x) for x in bin_edges])))
f.write("{};{};{}\n".format(timepoint, sampleName, ";".join([str(x) for x in hist])))
def getWavelengths(self):
"""
Retrieves the wavelengths from the point cloud.
Returns:
-------
numpy.ndarray
The wavelengths as a numpy array. If the point cloud is trimmed, returns a vertically stacked array.
"""
if self.trimmed:
return numpy.vstack(self.pcd.wavelengths)
else:
return numpy.asarray(self.pcd.wavelengths)
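# Note (derived from the index methods below): the wavelength columns are assumed to be
# ordered RED, GREEN, BLUE, NIR (columns 0-3), as used by get_psri, get_hue, get_ndvi and get_npci.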
def get_psri(self):
"""
Calculates the Plant Senescence Reflectance Index (PSRI).
Returns:
-------
numpy.ndarray
The PSRI values calculated as (RED - GREEN) / NIR.
"""
numpy.seterr(divide='ignore', invalid='ignore')
wavelengths = self.getWavelengths()
red = wavelengths[:, 0]
green = wavelengths[:, 1]
nir = wavelengths[:, 3]
return ((red - green) / nir)
def get_hue(self):
"""
Calculates the hue from the RGB wavelengths.
Returns:
-------
numpy.ndarray
The hue values calculated from the RGB wavelengths.
"""
numpy.seterr(divide='ignore', invalid='ignore')
wavelengths = self.getWavelengths()
red = wavelengths[:, 0]
green = wavelengths[:, 1]
blue = wavelengths[:, 2]
hue = numpy.zeros(len(red))
for c in range(len(hue)):
minColor = min([red[c], green[c], blue[c]])
...@@ -59,137 +121,167 @@ class PointCloud:
hue[c] = 2.0 + (blue[c] - red[c]) / (maxColor - minColor)
else:
hue[c] = 4.0 + (red[c] - green[c]) / (maxColor - minColor)
hue[c] = hue[c] * 60.0
if hue[c] < 0:
hue[c] = hue[c] + 360.0
return hue
def get_greenness(self):
numpy.seterr(divide='ignore',invalid='ignore') """
wavelengths=self.getWavelengths() Calculates the greenness index.
# (2*G-R-B)/(2*R+G+B)
#print(wavelengths) Returns:
return ((2.0*wavelengths[:,1]-wavelengths[:,0] -wavelengths[:,2]) / -------
(2.0*wavelengths[:,1]+wavelengths[:,0] +wavelengths[:,2])) numpy.ndarray
The greenness values calculated as (2*G - R - B) / (2*R + G + B).
"""
numpy.seterr(divide='ignore', invalid='ignore')
wavelengths = self.getWavelengths()
return ((2.0 * wavelengths[:, 1] - wavelengths[:, 0] - wavelengths[:, 2]) /
(2.0 * wavelengths[:, 1] + wavelengths[:, 0] + wavelengths[:, 2]))
    def get_ndvi(self):
        """
        Calculates the Normalized Difference Vegetation Index (NDVI).

        Returns:
        -------
        numpy.ndarray
            The NDVI values calculated as (NIR - RED) / (NIR + RED).
        """
        numpy.seterr(divide='ignore', invalid='ignore')
        wavelengths = self.getWavelengths()
        return (wavelengths[:, 3] - wavelengths[:, 0]) / (wavelengths[:, 3] + wavelengths[:, 0])
    def get_npci(self):
        """
        Calculates the Normalized Pigment Chlorophyll Index (NPCI).

        Returns:
        -------
        numpy.ndarray
            The NPCI values calculated as (RED - BLUE) / (RED + BLUE).
        """
        numpy.seterr(divide='ignore', invalid='ignore')
        wavelengths = self.getWavelengths()
        return ((wavelengths[:, 0] - wavelengths[:, 2]) / (wavelengths[:, 0] + wavelengths[:, 2]))
    def setColors(self, colors):
        """
        Sets the colors of the point cloud.

        Parameters:
        ----------
        colors : numpy.ndarray
            The colors to be set for the point cloud.
        """
        self.pcd.colors = o3d.utility.Vector3dVector(colors)
    def render_image(self, filename, image_width, image_height, rescale=True):
        """
        Renders an image of the point cloud.

        Parameters:
        ----------
        filename : str
            The path to the file where the image will be saved.
        image_width : int
            The width of the image.
        image_height : int
            The height of the image.
        rescale : bool, optional
            Whether to rescale the colors before rendering. Default is True.
        """
        if rescale:
            self.render_image_rescale(filename, image_width, image_height)
        else:
            self.render_image_no_rescale(filename, image_width, image_height)
    def trim(self, zIndex):
        """
        Trims the point cloud based on the z-values.

        Parameters:
        ----------
        zIndex : float
            The z-value threshold for trimming the point cloud.

        Side Effects:
        ------------
        Modifies the point cloud to only include points with z-values greater than or equal to zIndex.
        """
        if zIndex == 0:
            return
        self.untrimmedPCD = self.pcd
        points = numpy.asarray(self.pcd.points)
        mask = points[:, 2] >= zIndex
        filtered_points = points[mask]
        filtered_pcd = o3d.geometry.PointCloud()
        filtered_pcd.points = o3d.utility.Vector3dVector(filtered_points)
        if self.pcd.has_colors():
            colors = numpy.asarray(self.pcd.colors)
            filtered_colors = colors[mask]
            filtered_pcd.colors = o3d.utility.Vector3dVector(filtered_colors)
        wavelengths = numpy.asarray(self.pcd.wavelengths)
        filtered_wavelengths = wavelengths[mask]
        filtered_pcd.wavelengths = filtered_wavelengths
        self.pcd = filtered_pcd
        self.trimmed = True
    def render_image_no_rescale(self, filename, image_width, image_height):
        """
        Renders an image of the point cloud without rescaling the colors.

        Parameters:
        ----------
        filename : str
            The path to the file where the image will be saved.
        image_width : int
            The width of the image.
        image_height : int
            The height of the image.

        Side Effects:
        ------------
        Saves the rendered image to the specified file.
        """
        vis = o3d.visualization.Visualizer()
        vis.create_window(width=image_width, height=image_height, visible=False)
        vis.add_geometry(self.pcd)
        vis.update_geometry(self.pcd)
        vis.capture_screen_image(filename, do_render=True)
        vis.destroy_window()
    def render_image_rescale(self, filename, image_width, image_height):
        """
        Renders an image of the point cloud with rescaled colors.

        Parameters:
        ----------
        filename : str
            The path to the file where the image will be saved.
        image_width : int
            The width of the image.
        image_height : int
            The height of the image.

        Side Effects:
        ------------
        Saves the rendered image to the specified file.
        """
        colors = self.getWavelengths()
        colors = colors[:, :3]
        p01 = numpy.percentile(colors, 1)
        p99 = numpy.percentile(colors, 99)
        scaled_colors = ((colors - p01) / (p99 - p01) * 255)
        scaled_colors = numpy.clip(scaled_colors, 0, 255).astype(numpy.uint8)
        scaled_colors = scaled_colors.astype(numpy.float64) / 255
        if len(scaled_colors.shape) < 2:
            scaled_colors = numpy.reshape(scaled_colors, (-1, 3))
        self.pcd.colors = o3d.utility.Vector3dVector(scaled_colors)
        self.render_image_no_rescale(filename, image_width, image_height)
"""
This script is designed to delete various types of resources from a specified host using the Fairdom SEEK API.
It utilizes the requests library to send HTTP DELETE requests to remove data files, samples, assays, studies,
and investigations from the server. The script requires an authorization token to authenticate the requests.
Usage:
python script_name.py <token>
Where <token> is the authorization token required for accessing the API.
Note: This script performs destructive actions by deleting resources. Use with caution.
"""
import requests import requests
import sys import sys
token = sys.argv[1] def main():
"""Main function to execute the deletion of resources.
This function sets up the session with the necessary headers and iterates over predefined ranges
to delete resources from the server. It deletes data files, samples, assays, studies, and investigations.
Raises:
requests.exceptions.RequestException: If a network-related error occurs during the requests.
"""
token = sys.argv[1]
headers = {
"Content-type": "application/vnd.api+json",
"Accept": "application/vnd.api+json",
"Accept-Charset": "ISO-8859-1",
"Authorization": "Token {}".format(token)
}
session = requests.Session()
session.headers.update(headers)
r = 32000
host = "https://test.fairdom-seek.bif.containers.wurnet.nl/"
headers = {"Content-type": "application/vnd.api+json", # Delete data files
"Accept": "application/vnd.api+json", for i in range(1000, r):
"Accept-Charset": "ISO-8859-1", session.delete(host + "data_files/{}".format(i))
"Authorization": "Token {}".format(token)}
session = requests.Session() # Delete samples
session.headers.update(headers) for i in range(0, 500):
r = 32000 session.delete(host + "samples/{}".format(i))
host = "https://test.fairdom-seek.bif.containers.wurnet.nl/"
for i in range(1000,r):
session.delete(host + "data_files/{}".format(i))
for i in range(0,500):
session.delete(host + "samples/{}".format(i))
for i in range(0,1300): # Delete assays
session.delete(host + "assays/{}".format(i)) for i in range(0, 1300):
for i in range(0,50): session.delete(host + "assays/{}".format(i))
session.delete(host + "studies/{}".format(i))
for i in range(0,20): # Delete studies
session.delete(host + "investigations/{}".format(i)) for i in range(0, 50):
session.delete(host + "studies/{}".format(i))
# Delete investigations
for i in range(0, 20):
session.delete(host + "investigations/{}".format(i))
if __name__ == "__main__":
main()
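The loops above discard the HTTP responses, so failed deletions pass silently. A minimal sketch of a variant that reports anything other than success or a missing ID; the helper name and the decision to treat 404 as harmless are assumptions, not part of the script:

```python
import requests


def delete_range(session: requests.Session, host: str, resource: str, start: int, stop: int) -> None:
    """Send DELETE requests for a range of resource IDs and report unexpected failures."""
    for i in range(start, stop):
        response = session.delete(f"{host}{resource}/{i}")
        # A 404 only means the ID does not exist; other errors are worth surfacing.
        if not response.ok and response.status_code != 404:
            print(f"Could not delete {resource} {i}: HTTP {response.status_code}")
```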
""" """
SEEK FAIRDOM automatic upload This script automates the process of uploading data to the SEEK FAIRDOM platform. It reads metadata and measurement data from specified files, processes the data, and uploads it to the SEEK platform, creating the necessary structure of investigations, studies, and assays. The script is designed to work with PlantEye data from NPEC and assumes a specific data structure and naming convention.
The script requires the following command-line arguments:
1. Path to the datamatrix file.
2. Path to the investigation directory.
3. Paths to the CSV files containing measurement data.
The script uses the requests library to interact with the SEEK API and pandas for data manipulation.
""" """
import sys import sys
...@@ -13,12 +19,10 @@ import requests ...@@ -13,12 +19,10 @@ import requests
import json import json
import string import string
datamatrix_file = sys.argv[1]
investigationPath = sys.argv[2]
csvs = sys.argv[3:]

base_url = 'http://localhost:3000'

headers = {"Content-type": "application/vnd.api+json",
@@ -29,19 +33,8 @@ session = requests.Session()
session.headers.update(headers)
session.auth = ("capsicum.upload@wur.nl", "3#7B&GNC</yp2{k(")

containing_project_id = 2

columnsToDrop = ["ndvi_aver","ndvi_bin0","ndvi_bin1","ndvi_bin2","ndvi_bin3","ndvi_bin4","ndvi_bin5",
                 "greenness_aver","greenness_bin0","greenness_bin1","greenness_bin2","greenness_bin3","greenness_bin4","greenness_bin5",
                 "hue_aver","hue_bin0","hue_bin1","hue_bin2","hue_bin3","hue_bin4","hue_bin5",
@@ -51,15 +44,21 @@ columnsToDrop = ["ndvi_aver","ndvi_bin0","ndvi_bin1","ndvi_bin2","ndvi_bin3","nd

metadata = pandas.read_csv(datamatrix_file, sep=";")
def removeAfterSpaceFromDataMatrix(row):
    """
    Removes any text after a space in the 'DataMatrix' column of a row.

    Args:
        row (pandas.Series): A row from a DataFrame.

    Returns:
        pandas.Series: The modified row with updated 'DataMatrix' value.
    """
    row["DataMatrix"] = row["DataMatrix"].strip().split(" ")[0]
    return row

metadata = metadata.apply(removeAfterSpaceFromDataMatrix , axis=1)
datamatrix = os.path.basename(datamatrix_file).split(".")[0].split("_")

investigation = {}
investigation['data'] = {}
investigation['data']['type'] = 'investigations'
@@ -78,7 +77,6 @@ r = session.post(base_url + '/investigations', json=investigation)
investigation_id = r.json()['data']['id']
r.raise_for_status()

study = {}
study['data'] = {}
study['data']['type'] = 'studies'
@@ -94,10 +92,6 @@ study['data']['relationships']['investigation']['data'] = {'id' : investigation_
r = session.post(base_url + '/studies', json=study)
study_id = r.json()['data']['id']

os.makedirs("/".join([investigationPath, investigation['data']['attributes']['title']]), exist_ok=True)
metadata_csv = "/".join([investigationPath, investigation['data']['attributes']['title'], os.path.basename(datamatrix_file)])
metadata.to_csv(metadata_csv, sep="\t")
@@ -123,27 +117,28 @@ r.raise_for_status()
populated_data_file = r.json()

data_file_id = populated_data_file['data']['id']
data_file_url = populated_data_file['data']['links']['self']

blob_url = populated_data_file['data']['attributes']['content_blobs'][0]['link']

upload = session.put(blob_url, data=open(metadata_csv,"r").read(), headers={'Content-Type': 'application/octet-stream'})
upload.raise_for_status()

checkAssayName = re.compile(r"f[0-9]+")
measurements = pandas.DataFrame()
def copyPots(row, pots):
    """
    Copies pot information from a pots DataFrame to a row based on matching coordinates.

    Args:
        row (pandas.Series): A row from a DataFrame.
        pots (pandas.DataFrame): A DataFrame containing pot information.

    Returns:
        pandas.Series: The modified row with updated pot information.
    """
    row["Pot"] = pots[ (pots["x"] == row["x"]) & (pots["y"] == row["y"]) ]["Pot"].iloc[0]
    if "Treatment" in pots.columns:
        row["Treatment"] = pots[ (pots["x"] == row["x"]) & (pots["y"] == row["y"]) ]["Treatment"].iloc[0]
@@ -151,20 +146,55 @@ def copyPots(row, pots):
        row["Experiment"] = pots[ (pots["x"] == row["x"]) & (pots["y"] == row["y"]) ]["Experiment"].iloc[0]
    return row
def measurementsToFile(investigation, path, filename, measurements):
    """
    Saves the measurements DataFrame to a CSV file.

    Args:
        investigation (dict): The investigation dictionary.
        path (str): The directory path where the file will be saved.
        filename (str): The name of the file.
        measurements (pandas.DataFrame): The DataFrame containing measurements.

    Side Effects:
        Creates directories and writes a CSV file to the specified path.
    """
    os.makedirs(path + "/derived", exist_ok=True)
    measurements.to_csv(path + "/" + filename, sep=";")
def rawMeasurementsToFile(investigation, path, filename, measurements):
    """
    Saves the raw measurements to a CSV file.

    Args:
        investigation (dict): The investigation dictionary.
        path (str): The directory path where the file will be saved.
        filename (str): The name of the file.
        measurements (pandas.DataFrame): The DataFrame containing raw measurements.

    Returns:
        str: The full path to the saved file.

    Side Effects:
        Creates directories and writes a CSV file to the specified path.
    """
    os.makedirs(path + "/derived", exist_ok=True)
    df = pandas.DataFrame(measurements)
    df = df.transpose()
    df.to_csv(path + "/" + filename, sep="\t")
    return(path + "/" + filename)
def addPointClouds(row, title):
    """
    Adds a point cloud filename to a row based on its coordinates and timestamp.

    Args:
        row (pandas.Series): A row from a DataFrame.
        title (str): The title used in the filename.

    Returns:
        pandas.Series: The modified row with the point cloud filename added.
    """
    filename = "pointcloud/{}_{}_full_sx{:03d}_sy{:03d}.ply.gz".format(
        title, row["timestamp_file"],
        row["x"],
@@ -173,6 +203,18 @@ def addPointClouds(row, title) :
    return row
def createAssay(row, investigation, path, study_id):
    """
    Creates an assay and uploads the associated data file to the SEEK platform.

    Args:
        row (pandas.Series): A row from a DataFrame containing assay data.
        investigation (dict): The investigation dictionary.
        path (str): The directory path for saving files.
        study_id (str): The ID of the study to which the assay belongs.

    Side Effects:
        Creates directories, writes files, and uploads data to the SEEK platform.
    """
    data_file = {}
    filename = "derived/" + assay.title + ".csv"
@@ -199,17 +241,11 @@ def createAssay(row, investigation, path, study_id):
    populated_data_file = r.json()

    data_file_id = populated_data_file['data']['id']
    data_file_url = populated_data_file['data']['links']['self']

    blob_url = populated_data_file['data']['attributes']['content_blobs'][0]['link']

    upload = session.put(blob_url, data=open(fullFilename,"r").read(), headers={'Content-Type': 'application/octet-stream'})
    upload.raise_for_status()
@@ -226,16 +262,22 @@ def createAssay(row, investigation, path, study_id):
    assay['data']['relationships']['study']['data'] = {'id' : study_id, 'type' : 'studies'}
    assay['data']['relationships']['organism'] = {}
    assay['data']['relationships']['organism']['data'] = {'id' : 1, 'type' : 'organisms'}
def finalize(investigation, measurements, investigationPath, title, metadata, study_id):
    """
    Finalizes the processing of measurements by creating assays and saving data files.

    Args:
        investigation (dict): The investigation dictionary.
        measurements (pandas.DataFrame): The DataFrame containing measurements.
        investigationPath (str): The directory path for saving files.
        title (str): The title used in filenames.
        metadata (pandas.DataFrame): The DataFrame containing metadata.
        study_id (str): The ID of the study to which the assays belong.

    Side Effects:
        Creates directories, writes files, and uploads data to the SEEK platform.
    """
    if "Pot" in measurements.columns:
        pots = measurements.dropna(axis=0, subset=["Pot"])
        if len(pots) > 0 and "Pot" in pots.columns:
@@ -244,52 +286,35 @@ def finalize(investigation, measurements, investigationPath, title, metadata, st
            measurements = measurements.drop(columnsToDrop, axis=1)
            measurements = measurements.apply(copyPots , axis=1, pots=pots)
            measurements = measurements.apply(addPointClouds, axis=1, title=title)
            measurements.apply(createAssay, axis=1, investigation = investigation, path = investigationPath, study_id = study_id)
    investigation.measurements = pandas.concat([investigation.measurements, measurements], axis=0, ignore_index=True)
previousAssay = "" previousAssay = ""
for csv in csvs: for csv in csvs:
assayName = os.path.basename(csv).split("_")[0] assayName = os.path.basename(csv).split("_")[0]
timestamp = os.path.basename(csv).split("_")[1] timestamp = os.path.basename(csv).split("_")[1]
#print("Reading: {}".format(assayName))
if checkAssayName.match(assayName) != None: if checkAssayName.match(assayName) != None:
try: try:
currentMeasurements = pandas.read_csv(csv, sep="\t", skiprows=[1]) currentMeasurements = pandas.read_csv(csv, sep="\t", skiprows=[1])
currentMeasurements["timestamp_file"] = timestamp currentMeasurements["timestamp_file"] = timestamp
if previousAssay == assayName: if previousAssay == assayName:
# same assay
if len(measurements) == 0: if len(measurements) == 0:
measurements = currentMeasurements measurements = currentMeasurements
else: else:
measurements = pandas.concat([measurements, currentMeasurements], axis=0, ignore_index=True) measurements = pandas.concat([measurements, currentMeasurements], axis=0, ignore_index=True)
else: else:
if len(measurements) > 0: if len(measurements) > 0:
# new assay, process all
finalize(investigation, measurements, investigationPath, previousAssay, metadata, study_id) finalize(investigation, measurements, investigationPath, previousAssay, metadata, study_id)
measurements = currentMeasurements measurements = currentMeasurements
previousAssay = assayName previousAssay = assayName
except: except:
# No data?
pass pass
else: else:
#CSV file is not an assay file
pass pass
if len(measurements) > 0: if len(measurements) > 0:
finalize(investigation, measurements, investigationPath, previousAssay, metadata, study_id) finalize(investigation, measurements, investigationPath, previousAssay, metadata, study_id)
measurementsToFile(investigation, "/".join([investigationPath, investigation.title, investigation.studies[0].title]), "derived/" + investigation.studies[0].title + ".csv", investigation.measurements) measurementsToFile(investigation, "/".join([investigationPath, investigation.title, investigation.studies[0].title]), "derived/" + investigation.studies[0].title + ".csv", investigation.measurements)
#print(json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))) \ No newline at end of file
@@ -13,30 +13,69 @@ import open3d as o3d
import tempfile
import numpy

"""
This script processes ISA-JSON files and associated point cloud data files.
It extracts greenness information from point clouds and writes histograms
of the greenness values to specified output files. The script requires an
ISA-JSON file and a list of point cloud files as input.
"""

investigation = isajson.load(open(sys.argv[1], "r"))
study = investigation.studies[0]

BINS = 256
def writeHistogram(data, filename):
    """
    Writes a histogram of the given data to a file.

    Parameters:
        data (numpy.ndarray): The data for which the histogram is to be computed.
        filename (str): The name of the file where the histogram will be written.

    Outputs:
        A file containing the histogram data. The bin edges and histogram counts
        are written in separate lines, separated by semicolons.

    Side Effects:
        Creates or overwrites the specified file with histogram data.
    """
    hist, bin_edges = numpy.histogram(data, bins=BINS)
    with open(filename, "w") as f:
        f.write(";".join(map(str, bin_edges)))
        f.write("\n")
        f.write(";".join(map(str, hist)))
def get_greenness(pcd):
    """
    Calculates the greenness index for a point cloud.

    Parameters:
        pcd (open3d.geometry.PointCloud): The point cloud object containing wavelength data.

    Returns:
        numpy.ndarray: An array of greenness values for each point in the point cloud.

    Exceptions:
        May raise an exception if the point cloud does not contain wavelength data.

    Notes:
        The greenness index is calculated using the formula:
        (R - B + 2G) / (R + G + B), where R, G, and B are the red, green, and blue
        wavelength values, respectively.
    """
    np.seterr(divide='ignore', invalid='ignore')
    return ((np.asarray(pcd.wavelengths)[:,0] - np.asarray(pcd.wavelengths)[:,2] +
             2.0 * np.asarray(pcd.wavelengths)[:,1]) /
            (np.asarray(pcd.wavelengths)[:,0] + np.asarray(pcd.wavelengths)[:,1] +
             np.asarray(pcd.wavelengths)[:,2]))
# Find each point cloud in the file list
pointclouds = defaultdict(str)
for pcd in sys.argv[2:]:
    filename = os.path.basename(pcd)
    pointclouds[filename] = pcd

# Process ISA file
for a in study.assays:
    print(a.data_files)
    for df in a.data_files:
@@ -45,11 +84,10 @@ for a in study.assays:
            print(com.value)
            print(os.path.basename(df.filename))
            if ".ply" in df.filename and os.path.basename(df.filename) in pointclouds:
                # Copy ply
                shutil.copy2(pointclouds[os.path.basename(df.filename)], com.value)
                a.pointcloud = com.value

for a in study.assays:
    print(a.data_files)
    if a.pointcloud:
@@ -64,7 +102,3 @@ for a in study.assays:
            if "greenness.csv" in com.value:
                greenness = get_greenness(pcd)
                writeHistogram(greenness, com.value)
""" """
ISA & isamodel This script serves as a command-line interface for the F500 class, which provides various functionalities such as restructuring data, processing point clouds, verifying data, combining histograms, and uploading data. The script determines the command to execute based on user input.
https://isa-specs.readthedocs.io/en/latest/isamodel.html
Modules:
- sys: Provides access to some variables used or maintained by the interpreter and to functions that interact with the interpreter.
- os: Provides a portable way of using operating system-dependent functionality.
- pandas: A data manipulation and analysis library for Python.
- F500: A custom module that contains the F500 class with methods for different data processing tasks.
Usage:
Run the script with the desired command to execute the corresponding functionality.
""" """
import sys import sys
import os import os
import pandas import pandas
from F500 import F500 from F500 import F500
if __name__ == '__main__': if __name__ == '__main__':
f500 = F500() f500 = F500()
f500.commandLineInterface() f500.commandLineInterface()
if f500.args.command == "restructure": if f500.args.command == "restructure":
...@@ -25,8 +30,6 @@ if __name__ == '__main__': ...@@ -25,8 +30,6 @@ if __name__ == '__main__':
f500.combineHistograms() f500.combineHistograms()
elif f500.args.command == "upload": elif f500.args.command == "upload":
f500.upload() f500.upload()
"""
This script provides functions to calculate various vegetation indices from point cloud data (PCD) using specific wavelength channels.
These indices include the Normalized Difference Vegetation Index (NDVI) for visualization, NDVI, the Normalized Pigment Chlorophyll Index (NPCI),
and a greenness index. The calculations are based on the wavelengths corresponding to different spectral bands.

Functions:
    - get_ndvi_for_visualization: Computes NDVI for visualization purposes, scaling the result between 0 and 1.
    - get_ndvi: Computes the standard NDVI, with values ranging from -1 to 1.
    - get_npci: Computes the NPCI using the red and blue channels.
    - get_greenness: Computes a greenness index using the red, green, and blue channels.
"""
import numpy as np
def get_ndvi_for_visualization(pcd):
    """
    Calculate the NDVI for visualization purposes, scaling the result between 0 and 1.

    Parameters:
        pcd (object): A point cloud data object containing 'wavelengths' and 'ndvi' attributes.
            'wavelengths' is expected to be a 2D array where the columns correspond to different spectral bands.

    Returns:
        ndarray: A 1D array of NDVI values scaled between 0 and 1.

    Side Effects:
        - Modifies the 'ndvi' attribute of the input 'pcd' object.

    Notes:
        - This function ignores division and invalid operation warnings using numpy's seterr function.
    """
    np.seterr(divide='ignore', invalid='ignore')
    np.asarray(pcd.ndvi)[:, 0] = (((np.asarray(pcd.wavelengths)[:, 3] - np.asarray(pcd.wavelengths)[:, 0]) /
                                   (np.asarray(pcd.wavelengths)[:, 3] + np.asarray(pcd.wavelengths)[:, 0])) + 1) / 2
    return pcd.ndvi[:, 0]
def get_ndvi(pcd):
    """
    Calculate the standard NDVI, with values ranging from -1 to 1.

    Parameters:
        pcd (object): A point cloud data object containing 'wavelengths' and 'ndvi' attributes.
            'wavelengths' is expected to be a 2D array where the columns correspond to different spectral bands.

    Returns:
        ndarray: A 1D array of NDVI values ranging from -1 to 1.

    Side Effects:
        - Modifies the 'ndvi' attribute of the input 'pcd' object.

    Notes:
        - This function ignores division and invalid operation warnings using numpy's seterr function.
    """
    np.seterr(divide='ignore', invalid='ignore')
    np.asarray(pcd.ndvi)[:, 0] = (np.asarray(pcd.wavelengths)[:, 3] - np.asarray(pcd.wavelengths)[:, 0]) / \
                                 (np.asarray(pcd.wavelengths)[:, 3] + np.asarray(pcd.wavelengths)[:, 0])
    return pcd.ndvi[:, 0]
def get_npci(pcd):
    """
    Calculate the Normalized Pigment Chlorophyll Index (NPCI) using the red and blue channels.

    Parameters:
        pcd (object): A point cloud data object containing 'wavelengths' attribute.
            'wavelengths' is expected to be a 2D array where the columns correspond to different spectral bands.

    Returns:
        ndarray: A 1D array of NPCI values.

    Notes:
        - This function ignores division and invalid operation warnings using numpy's seterr function.
    """
    np.seterr(divide='ignore', invalid='ignore')
    return ((np.asarray(pcd.wavelengths)[:, 0] - np.asarray(pcd.wavelengths)[:, 2]) /
            (np.asarray(pcd.wavelengths)[:, 0] + np.asarray(pcd.wavelengths)[:, 2]))
def get_greenness(pcd):
    """
    Calculate a greenness index using the red, green, and blue channels.

    Parameters:
        pcd (object): A point cloud data object containing 'wavelengths' attribute.
            'wavelengths' is expected to be a 2D array where the columns correspond to different spectral bands.

    Returns:
        ndarray: A 1D array of greenness index values.

    Notes:
        - This function ignores division and invalid operation warnings using numpy's seterr function.
    """
    np.seterr(divide='ignore', invalid='ignore')
    return ((np.asarray(pcd.wavelengths)[:, 0] - np.asarray(pcd.wavelengths)[:, 2] +
             2.0 * np.asarray(pcd.wavelengths)[:, 1]) /
            (np.asarray(pcd.wavelengths)[:, 0] + np.asarray(pcd.wavelengths)[:, 1] + np.asarray(pcd.wavelengths)[:, 2]))
import numpy as np

"""
This script provides functionality to rescale the wavelengths of a point cloud data (PCD) object.
The script modifies the color and NIR (Near-Infrared) attributes of the PCD based on the provided scale.
"""
def rescale_wavelengths(pcd, scale):
    """
    Rescales the wavelengths of a point cloud data (PCD) object by a given scale factor.

    This function takes a PCD object with attributes for wavelengths, colors, and NIR values.
    It rescales the wavelengths by dividing them by the provided scale factor and updates the
    color and NIR attributes of the PCD accordingly.

    Parameters:
        pcd : object
            A point cloud data object that contains 'wavelengths', 'colors', and 'nir' attributes.
            The 'wavelengths' attribute is expected to be a 2D array with columns representing
            R, G, B, and NIR wavelengths.
        scale : float
            The scale factor by which to divide the wavelengths.

    Returns:
        object
            The modified PCD object with rescaled color and NIR attributes.

    Side Effects:
        Modifies the 'colors' and 'nir' attributes of the input PCD object in place.

    Exceptions:
        This function assumes that the input PCD object has the required attributes and that they
        are in the expected format. If not, it may raise AttributeError or IndexError.

    Future Work:
        Consider adding input validation to ensure the PCD object has the required attributes and
        that they are in the expected format. Additionally, handle potential exceptions more gracefully.
    """
    tmp_wvl = np.asarray(pcd.wavelengths) / scale
    np.asarray(pcd.colors)[:, 0] = tmp_wvl[:, 0]
    np.asarray(pcd.colors)[:, 1] = tmp_wvl[:, 1]
    np.asarray(pcd.colors)[:, 2] = tmp_wvl[:, 2]
    np.asarray(pcd.nir)[:, 0] = tmp_wvl[:, 3]
    return pcd
{
"title": "F500 analytics for NPEC",
"description": "Several Python and R scripts for processing the raw F500 data. Uses the ISA-JSON for metadata",
"developer": "Sven Warris",
"mail": "sven.warris@wur.nl",
"link": "https://git.wur.nl/NPEC/analytics"
}
### Summary of Key Findings from Each Report
#### Radon MI Report
- **Maintainability Index Scores**: The codebase has a mix of maintainability scores, with some files scoring low due to high complexity, lack of comments, large file sizes, poor modularization, and code duplication.
- **Improvement Suggestions**: Refactor complex functions, add documentation, modularize code, reduce duplication, simplify logic, and use descriptive naming.
#### Radon CC Report
- **Complexity Scores**: Some functions have high cyclomatic complexity, making them difficult to understand and maintain.
- **Refactoring Suggestions**: Break down complex functions, reduce nested logic, use design patterns, modularize code, and improve error handling.
#### Pylint Report
- **Linting Issues**: The codebase has convention violations, warnings, errors, and refactor suggestions, impacting readability, maintainability, and potential for bugs.
- **Improvement Strategies**: Adopt PEP 8 standards, improve documentation, optimize imports, refactor complex functions, use context managers, update string formatting, and resolve errors.
#### Vulture Report
- **Unused Code**: The codebase contains unused imports, variables, attributes, classes, and methods, which can be removed to streamline the project.
- **Critical Unused Code**: Some unused code might be critical if certain functionalities are expected, requiring careful review before removal.
### Common Issues Across Reports and High-Level Strategies for Improvement
1. **Complexity and Maintainability**: High complexity and low maintainability scores are common issues. Strategies include refactoring complex functions, simplifying logic, and improving modularization.
2. **Documentation and Naming**: Lack of comments and poor naming conventions are prevalent. Strategies include adding comprehensive docstrings and adhering to PEP 8 naming conventions.
3. **Code Duplication and Unused Code**: Code duplication and unused code are identified across reports. Strategies include removing unused code and refactoring duplicated code into reusable components (a sketch of one such reusable helper follows this list).
4. **Error Handling and Testing**: Potential runtime errors and lack of automated testing are concerns. Strategies include improving error handling and increasing test coverage.
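For example, the content-blob upload steps that appear in more than one place in the upload scripts could be collected into a single helper. A minimal sketch, assuming a `requests` session configured as in the existing scripts (the helper name is illustrative, not part of the codebase):

```python
import requests


def upload_blob(session: requests.Session, blob_url: str, local_path: str) -> None:
    """Upload a local file to a SEEK content-blob URL and fail loudly on HTTP errors."""
    with open(local_path, "rb") as handle:
        response = session.put(
            blob_url,
            data=handle.read(),
            headers={"Content-Type": "application/octet-stream"},
        )
    response.raise_for_status()
```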
### Overall Assessment of the Codebase’s Quality, Complexity, and Maintainability
- **Quality**: The codebase currently has a low quality score, primarily due to poor adherence to coding standards, lack of documentation, and potential runtime errors. Addressing these issues will significantly enhance the code's quality.
- **Complexity**: While the average complexity score is relatively low, there are outliers with high complexity that need attention. Regular refactoring and code reviews can help manage complexity.
- **Maintainability**: The codebase has a mix of maintainability scores, with some files requiring significant improvement. By focusing on refactoring, documentation, and modularization, maintainability can be improved.
### Recommendations for Improvement
1. **Refactor and Simplify**: Focus on refactoring complex functions and simplifying logic to improve readability and maintainability.
2. **Enhance Documentation**: Add comprehensive docstrings and inline comments to clarify code functionality and usage.
3. **Adopt Coding Standards**: Ensure adherence to PEP 8 standards for naming conventions, line length, and overall code style.
4. **Remove Unused Code**: Safely remove unused imports, variables, and functions to streamline the codebase.
5. **Improve Testing and Error Handling**: Increase automated test coverage and implement consistent error-handling strategies (a minimal test sketch is given at the end of this section).
6. **Regular Code Reviews**: Conduct regular code reviews to catch potential issues early and promote best practices.
By implementing these strategies, the codebase's quality, complexity, and maintainability can be significantly improved, making it easier to understand, modify, and extend in the future.
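To illustrate recommendation 5, here is a minimal pytest sketch for the NDVI formula used throughout the scripts; it tests a standalone copy of the formula rather than importing the project's own modules, so the function below is illustrative only:

```python
import pytest


def ndvi(nir: float, red: float) -> float:
    """Standalone copy of the NDVI formula used in the scripts: (NIR - RED) / (NIR + RED)."""
    return (nir - red) / (nir + red)


def test_ndvi_known_values():
    # Strong vegetation signal: NIR well above RED.
    assert ndvi(0.8, 0.2) == pytest.approx(0.6)
    # Equal NIR and RED reflectance gives an NDVI of zero.
    assert ndvi(0.5, 0.5) == pytest.approx(0.0)
```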
### Summary of Linting Issues
The pylint report highlights several types of issues across the codebase. These can be grouped into the following categories:
1. **Convention Violations (C):**
- **Trailing Whitespace and Newlines:** Frequent occurrences of trailing whitespace and newlines across multiple files.
- **Naming Conventions:** Many variables, functions, and module names do not conform to PEP 8 naming conventions (e.g., snake_case for variables and functions, UPPER_CASE for constants).
- **Missing Docstrings:** Many modules, classes, and functions lack docstrings, which are essential for understanding the purpose and usage of the code.
- **Line Length:** Numerous lines exceed the recommended maximum line length of 100 characters.
2. **Warnings (W):**
- **Unused Imports and Variables:** Several imports and variables are declared but not used, leading to unnecessary clutter.
- **Redefining Names:** Some variables are redefined from an outer scope, which can lead to confusion and potential errors.
- **Deprecated and Unused Methods:** Usage of deprecated methods and methods that are defined but not used.
3. **Errors (E):**
- **Import Errors:** Some modules are unable to import certain packages, indicating potential issues with dependencies or incorrect paths.
- **Undefined Variables:** Usage of variables that are not defined within the scope.
- **No-Member Errors:** Attempting to access non-existent members of modules, which could indicate incorrect usage or outdated libraries.
4. **Refactor Suggestions (R):**
- **Too Many Arguments/Attributes:** Functions and classes with too many arguments or attributes, suggesting a need for refactoring to improve readability and maintainability.
- **Too Many Branches/Statements:** Functions with excessive branching or statements, indicating complex logic that could be simplified.
5. **Specific Issues:**
- **Consider Using 'with' for Resource Management:** Several instances where resource-allocating operations like file handling should use context managers for better resource management.
- **Consider Using f-Strings:** Recommendations to use f-strings for string formatting for better readability and performance.
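As a minimal illustration of the last two points, the manual open/close and `.format()` pattern used in several scripts can be rewritten with a context manager and an f-string (the file name and variables here are illustrative):

```python
label, value = "edges", "0.1;0.2;0.3"

# Before: the file stays open if an exception is raised between open() and close().
f = open("histogram.csv", "w")
f.write("{};{}\n".format(label, value))
f.close()

# After: the context manager closes the file in all cases, and the f-string reads more directly.
with open("histogram.csv", "w") as f:
    f.write(f"{label};{value}\n")
```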
### Impact on Code Quality
- **Readability and Maintainability:** The lack of adherence to naming conventions and missing docstrings significantly impacts the readability and maintainability of the code. Developers may find it challenging to understand and modify the code.
- **Potential Bugs:** Undefined variables, import errors, and no-member errors can lead to runtime errors, affecting the reliability of the software.
- **Performance and Efficiency:** Unused imports and variables, along with inefficient string formatting, can lead to unnecessary memory usage and reduced performance.
### Suggested Fixes and Refactoring Strategies
1. **Adopt PEP 8 Standards:**
- Rename variables, functions, and modules to conform to PEP 8 naming conventions.
- Ensure all lines are within the recommended length.
2. **Improve Documentation:**
- Add docstrings to all modules, classes, and functions to describe their purpose and usage.
3. **Optimize Imports:**
- Remove unused imports and variables to clean up the code.
4. **Refactor Complex Functions:**
- Break down functions with too many arguments or complex logic into smaller, more manageable functions.
5. **Use Context Managers:**
- Implement context managers for file operations to ensure proper resource management.
6. **Update String Formatting:**
- Replace old string formatting methods with f-strings for improved readability and performance.
7. **Resolve Errors:**
- Investigate and resolve import errors and undefined variables to ensure the code runs correctly.
### Overall Quality Assessment
Based on the pylint report, the codebase currently has a low quality score of 0.92/10. The code suffers from poor adherence to coding standards, lack of documentation, and potential runtime errors. Addressing the highlighted issues will significantly improve the code's readability, maintainability, and reliability. Prioritizing the most critical issues, such as errors and convention violations, will be essential in enhancing the overall quality of the codebase.
analytics/lib/rescaleWavelength.py
F 8:0 rescale_wavelengths - A (1)
analytics/lib/computePhenotypes.py
F 6:0 get_ndvi_for_visualization - A (1)
F 13:0 get_ndvi - A (1)
F 19:0 get_npci - A (1)
F 24:0 get_greenness - A (1)
analytics/visualizations/histograms_ply.py
F 23:0 createPNG - A (1)
analytics/visualizations/animate_ply.py
F 28:0 play_motion - A (1)
analytics/f500/collecting/PointCloud.py
M 45:4 PointCloud.get_hue - B (6)
M 15:4 PointCloud.writeHistogram - A (4)
C 5:0 PointCloud - A (3)
M 98:4 PointCloud.trim - A (3)
M 25:4 PointCloud.getWavelengths - A (2)
M 92:4 PointCloud.render_image - A (2)
M 165:4 PointCloud.render_image_rescale - A (2)
M 11:4 PointCloud.__init__ - A (1)
M 33:4 PointCloud.get_psri - A (1)
M 69:4 PointCloud.get_greenness - A (1)
M 77:4 PointCloud.get_ndvi - A (1)
M 83:4 PointCloud.get_npci - A (1)
M 88:4 PointCloud.setColors - A (1)
M 130:4 PointCloud.render_image_no_rescale - A (1)
analytics/f500/collecting/F500.py
M 620:4 F500.processPointclouds - E (31)
M 538:4 F500.restructure - D (21)
M 261:4 F500.copyPointcloudFile - C (11)
M 693:4 F500.combineHistograms - B (10)
M 301:4 F500.copyPlotPointcloudFile - B (9)
M 495:4 F500.finalize - B (9)
C 30:0 F500 - B (6)
M 453:4 F500.createAssayPlot - A (5)
M 209:4 F500.copyPots - A (4)
M 162:4 F500.setLogger - A (2)
M 169:4 F500.removeAfterSpaceFromDataMatrix - A (2)
M 176:4 F500.createISA - A (2)
M 361:4 F500.createSample - A (2)
M 486:4 F500.correctDataMatrix - A (2)
M 44:4 F500.__init__ - A (1)
M 57:4 F500.commandLineInterface - A (1)
M 202:4 F500.writeISAJSON - A (1)
M 221:4 F500.measurementsToFile - A (1)
M 228:4 F500.rawMeasurementsToFile - A (1)
M 235:4 F500.addPointClouds - A (1)
M 374:4 F500.createAssay - A (1)
M 535:4 F500.getDirectoryListing - A (1)
M 730:4 F500.upload - A (1)
analytics/f500/collecting/processPointClouds.py
F 20:0 writeHistogram - A (1)
F 28:0 get_greenness - A (1)
analytics/f500/collecting/Fairdom.py
M 118:4 Fairdom.upload - D (24)
C 12:0 Fairdom - A (4)
M 96:4 Fairdom.addDataFilesToSampleJSON - A (3)
M 84:4 Fairdom.addSampleToAssayJSON - A (2)
M 90:4 Fairdom.addDataFileToAssayJSON - A (2)
M 13:4 Fairdom.__init__ - A (1)
M 28:4 Fairdom.createInvestigationJSON - A (1)
M 40:4 Fairdom.createStudyJSON - A (1)
M 53:4 Fairdom.createAssayJSON - A (1)
M 70:4 Fairdom.createDataFileJSON - A (1)
M 104:4 Fairdom.createSampleJSON - A (1)
analytics/f500/collecting/F500Azure.py
M 64:4 F500Azure.copyPointcloudFile - A (3)
C 5:0 F500Azure - A (2)
M 7:4 F500Azure.__init__ - A (1)
M 12:4 F500Azure.initAzure - A (1)
M 30:4 F500Azure.connectToSource - A (1)
M 38:4 F500Azure.connectToTarget - A (1)
M 45:4 F500Azure.writeISAJSON - A (1)
M 50:4 F500Azure.measurementsToFile - A (1)
M 57:4 F500Azure.rawMeasurementsToFile - A (1)
M 82:4 F500Azure.getDirectoryListing - A (1)
analytics/f500/collecting/fairdom.py
F 235:0 finalize - B (7)
F 146:0 copyPots - A (3)
F 53:0 removeAfterSpaceFromDataMatrix - A (1)
F 155:0 measurementsToFile - A (1)
F 160:0 rawMeasurementsToFile - A (1)
F 167:0 addPointClouds - A (1)
F 175:0 createAssay - A (1)
74 blocks (classes, functions, methods) analyzed.
Average complexity: A (3.135135135135135)
### Highlight of Functions/Methods with Highest Complexity Scores
1. **F500.processPointclouds - E (31)**
- **Impact on Maintainability**: This method has the highest cyclomatic complexity score of 31, indicating a very high level of complexity. Such a high score suggests that the method likely contains numerous conditional statements and branches, making it difficult to understand, test, and maintain. This complexity can lead to increased chances of bugs and errors, as well as making future modifications challenging.
2. **Fairdom.upload - D (24)**
- **Impact on Maintainability**: With a complexity score of 24, this method is also quite complex. It may contain multiple decision points and nested logic, which can obscure the flow of the code and make it harder to follow. This complexity can hinder the ability to quickly identify and fix issues or to extend the functionality.
3. **F500.restructure - D (21)**
- **Impact on Maintainability**: This method's complexity score of 21 suggests it is also quite intricate. Similar to the above methods, it likely involves numerous branches and conditions, which can complicate understanding and maintenance.
### Suggestions for Refactoring or Simplifying the Most Complex Functions/Methods
1. **F500.processPointclouds**
- **Break Down into Smaller Functions**: Identify logical sections within the method and extract them into smaller, well-named functions. This will help isolate different functionalities and make the code more readable.
- **Reduce Nested Logic**: Simplify nested if-else statements by using guard clauses or early returns where possible (see the sketch after this list).
- **Use Design Patterns**: Consider using design patterns like Strategy or Command to encapsulate varying behaviors and reduce complexity.
2. **Fairdom.upload**
- **Modularize Code**: Break down the method into smaller, single-responsibility functions. Each function should handle a specific part of the upload process.
- **Simplify Conditionals**: Use polymorphism or a configuration-driven approach to handle complex conditional logic.
- **Improve Error Handling**: Implement a consistent error-handling strategy to manage exceptions and edge cases more effectively.
3. **F500.restructure**
- **Refactor for Clarity**: Extract complex logic into helper functions with descriptive names to clarify the purpose of each code block.
- **Use Data Structures**: Consider using more appropriate data structures to simplify data manipulation and reduce the number of operations.
- **Optimize Loops**: Review loops for opportunities to simplify or combine them, reducing the overall complexity.
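As a sketch of the guard-clause and extract-function suggestions above (all names are hypothetical and not taken from `F500.processPointclouds` itself):

```python
from typing import List, Optional


def load_values(path: str) -> List[float]:
    """Stand-in for loading a point cloud; the real code reads Phenospex PLY files."""
    return [0.2, 0.5, 0.8]


def mean_index(values: List[float]) -> float:
    """Stand-in for an index computation such as NDVI or greenness."""
    return sum(values) / len(values)


def process_pointcloud(path: Optional[str]) -> Optional[float]:
    # Guard clauses keep the happy path flat instead of nesting it several levels deep.
    if path is None:
        return None
    if not path.endswith(".ply.gz"):
        return None
    # Each step is a small, separately testable helper extracted from the long method.
    return mean_index(load_values(path))


print(process_pointcloud("plant_full_sx001_sy001.ply.gz"))
```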
### General Summary of the Codebase’s Complexity and Recommendations for Improvement
- **Overall Complexity**: The average complexity score of the codebase is A (3.135), which indicates that most of the code is relatively simple and maintainable. However, there are a few outliers with significantly higher complexity scores that need attention.
- **Recommendations**:
- **Regular Code Reviews**: Implement regular code reviews focusing on complexity and maintainability to catch potential issues early.
- **Automated Testing**: Increase the coverage of automated tests, especially for complex methods, to ensure that changes do not introduce new bugs.
- **Continuous Refactoring**: Encourage continuous refactoring practices to gradually improve the codebase's structure and readability.
- **Documentation**: Improve documentation for complex methods to aid understanding and future maintenance efforts.
- **Training and Best Practices**: Provide training on best practices for writing clean, maintainable code, and encourage the use of design patterns where appropriate.
By addressing the most complex areas and promoting a culture of clean code, the maintainability and quality of the codebase can be significantly improved.