Getting Started
Cole Herman edited this page Feb 5, 2026 · 3 revisions
Run the following command in your terminal:
python makeconfig.py
Follow the prompts in the terminal to set up your config file.
- Set up your own filepaths.yaml file. Copy the existing filepaths_example.yaml file and rename it to 'filepaths.yaml'.
- Replace the values in filepaths.yaml with the paths to your own files.
- Run the terminal command:
python process.py [config file name]
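The copy-and-rename step for filepaths.yaml can also be done from Python. The following is a minimal sketch, not part of the project's scripts: the helper name `copy_example` is hypothetical, and it assumes both files live in the current working directory. It refuses to overwrite an existing filepaths.yaml so that paths you have already filled in are not lost.

```python
import shutil
from pathlib import Path


def copy_example(example="filepaths_example.yaml", target="filepaths.yaml"):
    """Copy the example file to the target unless the target already exists.

    Returns True if a copy was made, False if the target was left untouched.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    target_path = Path(target)
    if target_path.exists():
        # Keep any values that were already edited by hand.
        return False
    shutil.copyfile(example, target_path)
    return True
```

Calling `copy_example()` once from the repository folder should leave you with a filepaths.yaml ready to edit; calling it again changes nothing.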
- Create a file named "[collection to fix]-fixes.yaml"
- Add YAML entries that select the incorrect values, each giving the fix type, the column, and any extra parameters the fix needs.
For example:
fixes:
  # fix institution having extra space at the end
  - type: strip
    column: institution
  # fix license being wrong type
  - type: enforce_string
    column: license
  # fix location using http over https
  - type: regex_replace
    column: location
    pattern: '^http://(sws\.geonames\.org/)'
    # replace http:// with https://
    replacement: 'https://\1'
  # fix file column missing file extension (append .tif to the end of the ID)
  - type: regex_replace
    column: file
    pattern: '^(.*?)(?<!\.tif)$'
    replacement: '\1.tif'

- Warning: if you use regex_replace to append to the end of a value, make sure your pattern excludes data that is already valid. Otherwise, you could end up with too much appended (in the example above, the result would be .tif.tif if the pattern didn't specifically exclude values ending in .tif).
- Run the terminal command:
python autofixcsv.py [collection name]
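Before running autofixcsv.py on a whole collection, it can help to try a regex_replace pattern on a few sample values. The sketch below is an illustration only: `apply_fix` is a hypothetical helper, and it assumes the fix types behave like Python's `str.strip`, `str`, and `re.sub`, which may not match the script's actual implementation. It contrasts the pattern from the example with an unguarded pattern that appends .tif to every value.

```python
import re


def apply_fix(value, fix):
    """Apply one fix entry (a dict shaped like the YAML entries) to one cell value.

    Hypothetical helper; the mapping of fix types to Python calls is an assumption.
    """
    kind = fix["type"]
    if kind == "strip":
        return value.strip()
    if kind == "enforce_string":
        return str(value)
    if kind == "regex_replace":
        return re.sub(fix["pattern"], fix["replacement"], value)
    raise ValueError(f"unknown fix type: {kind}")


# Pattern from the example: the lookbehind skips values already ending in .tif.
safe = {"type": "regex_replace", "column": "file",
        "pattern": r"^(.*?)(?<!\.tif)$", "replacement": r"\1.tif"}

# Unguarded pattern: matches every value, including already valid ones.
unsafe = {"type": "regex_replace", "column": "file",
          "pattern": r"^(.*)$", "replacement": r"\1.tif"}

for value in ["img_001", "img_001.tif"]:
    print(value, "->", apply_fix(value, safe), "|", apply_fix(value, unsafe))
# img_001 -> img_001.tif | img_001.tif
# img_001.tif -> img_001.tif | img_001.tif.tif
```

The second output line shows the failure the warning describes: the guarded pattern leaves a valid value alone, while the unguarded one appends the extension a second time.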
- Run the terminal command:
python importer-solr.py [importer id] [number of works in importer]