
Add Arches ETL scripts

Harry Minsky requested to merge add-python-scripts into main

This commit adds the Arches ETL script to the repository. It also adds a configuration file for running the script (`importConfig.json.example`), a `requirements.txt` file listing the Python packages the script needs, and a `.gitignore` file that excludes artifacts generated by the script. The `wac-arches-etl-script.py` file covers most of the Arches ETL process for the Wabash Arches Corridor dataset. As documented in the code, the script reads the source data from a CSV file and splits it into four separate ingestible CSV files. It then runs the Arches shell commands that import the resource models, and finally imports the CSVs themselves.
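The splitting step described above could look roughly like the sketch below. It is a minimal illustration and not the code in `wac-arches-etl-script.py`. The split specification, the source column names (`arch_id`, `bridge_id`, ...), and the output columns (`ResourceID`, `Name`) are all hypothetical placeholders. Only two outputs are shown, where the real script produces four. Each output CSV receives a renamed subset of the source columns, and rows with no data for that output are skipped.

```python
import csv
import io
from typing import Dict, Mapping, TextIO

# Hypothetical split specification: output file name -> {source column: output column}.
# The MR does not show the real column names or the four output files; these are placeholders.
SPLIT_SPEC: Dict[str, Dict[str, str]] = {
    "arches.csv": {"arch_id": "ResourceID", "arch_name": "Name"},
    "bridges.csv": {"bridge_id": "ResourceID", "bridge_name": "Name"},
}


def split_csv(source: TextIO, spec: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Split one source CSV into several CSVs, returned as text keyed by output name."""
    reader = csv.DictReader(source)
    buffers: Dict[str, io.StringIO] = {}
    writers: Dict[str, csv.DictWriter] = {}
    for name, columns in spec.items():
        buffers[name] = io.StringIO()
        writers[name] = csv.DictWriter(buffers[name], fieldnames=list(columns.values()))
        writers[name].writeheader()

    for row in reader:
        for name, columns in spec.items():
            # Rename the selected source columns; missing cells become empty strings.
            projected = {dst: (row.get(src) or "") for src, dst in columns.items()}
            # Skip rows that carry no data for this output (e.g. a bridge-only row).
            if any(value.strip() for value in projected.values()):
                writers[name].writerow(projected)

    return {name: buf.getvalue() for name, buf in buffers.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "arch_id,arch_name,bridge_id,bridge_name\n"
        "1,North Arch,,\n"
        ",,7,Main St Bridge\n"
    )
    for name, text in split_csv(sample, SPLIT_SPEC).items():
        # In a real run, write each output with open(name, "w", newline="").
        print(f"--- {name}\n{text}")
```

Returning the CSVs as text keeps the splitting logic testable without touching the filesystem. Writing the files to disk is then a separate, trivial step.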

This script is meant to serve as a template for future Arches ETL processes. It is not designed to be configurable for every use case. However, the steps it takes should apply to any Arches resource import: processing a source dataset into separate CSVs for resource import, then running the graph and CSV imports through the CLI.
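The CLI half of that template can be sketched as follows: import every resource model (graph) first, then each CSV with its mapping file. This is a hedged sketch and not the script's actual code. It assumes the Arches `packages` management command with the `import_graphs` and `import_business_data` operations. Verify these operations and flags against the Arches version in use. The paths and the `run_import` helper are hypothetical. Using `sys.executable` assumes the ETL script runs inside the Arches project's Python environment.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Assumed Arches CLI usage: `manage.py packages -o import_graphs -s <path>` for resource
# models and `manage.py packages -o import_business_data -s <csv> -c <mapping>` for data.
# Check these operations and flags against the Arches version you target.


def graph_import_cmd(manage_py: str, graph_path: str) -> List[str]:
    """Command to import resource model (graph) JSON from a file or directory."""
    return [sys.executable, manage_py, "packages", "-o", "import_graphs", "-s", graph_path]


def data_import_cmd(manage_py: str, csv_path: str, mapping_path: str) -> List[str]:
    """Command to import one business-data CSV using its .mapping file."""
    return [sys.executable, manage_py, "packages", "-o", "import_business_data",
            "-s", csv_path, "-c", mapping_path]


def run_import(manage_py: str, graphs: Sequence[str],
               datasets: Sequence[Tuple[str, str]], dry_run: bool = True) -> List[List[str]]:
    """Import all graphs first, then each (csv, mapping) pair in the order given."""
    commands = [graph_import_cmd(manage_py, g) for g in graphs]
    commands += [data_import_cmd(manage_py, csv_path, mapping) for csv_path, mapping in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would run without executing it
        else:
            # check=True stops the pipeline at the first failing import.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths; replace with the values from your import configuration.
    run_import("manage.py", ["graphs/resource_models"],
               [("arches.csv", "arches.mapping"), ("bridges.csv", "bridges.mapping")])
```

Building the commands as argument lists (rather than shell strings) avoids quoting problems with paths. The `dry_run` default lets a new dataset's import plan be reviewed before anything is written to Arches.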

Furthermore, the process does not yet account for 1) circular resource references or 2) concept lists. These will be handled in subsequent iterations.
