Getting Started

Cole Herman edited this page Feb 5, 2026 · 3 revisions

Validating Metadata

Making a Config File

Run the following command in a terminal:

python makeconfig.py

Follow the steps shown in the terminal to set up your config file.

Validating Spreadsheet Data

  1. Set up your own filepaths.yaml file: copy the existing filepaths_example.yaml file and rename the copy to 'filepaths.yaml'.
  2. Replace the values in filepaths.yaml with the paths to your own files.
  3. Run the terminal command:

python process.py [config file name]
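
Step 1 above can also be done from Python. This is a minimal sketch, not part of the project's scripts: the helper name copy_example_config is hypothetical, and it assumes both files live in the directory you run it from. It refuses to overwrite an existing filepaths.yaml so that paths you have already edited are not lost.

```python
import shutil
from pathlib import Path


def copy_example_config(example="filepaths_example.yaml", target="filepaths.yaml"):
    """Copy the example file to the target name unless the target already exists.

    Returns True if a new file was created, False if the target was left alone.
    """
    example_path, target_path = Path(example), Path(target)
    if target_path.exists():
        # Keep an existing filepaths.yaml so edited paths are not overwritten.
        return False
    shutil.copyfile(example_path, target_path)
    return True
```

Calling copy_example_config() with no arguments from the repository root creates filepaths.yaml on the first run and does nothing on later runs. You still need to edit the values by hand as described in step 2.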

Automatically Fixing Metadata

Making a Fix File

  1. Create a file named "[collection to fix]-fixes.yaml".
  2. Add YAML entries that select the incorrect values. Each entry gives the fix type, the column, and any extra info that fix type needs.

For example:

fixes:
  # fix institution having extra space at the end
  - type: strip
    column: institution
  
  # fix license being wrong type
  - type: enforce_string
    column: license
  
  # fix location using http over https
  - type: regex_replace
    column: location
    pattern: '^http://(sws\.geonames\.org/)'
    # replace http:// with https://
    replacement: 'https://\1'

  # fix file column missing file extension (append .tif to the end of the ID)
  - type: regex_replace
    column: file
    pattern: '^(.*?)(?<!\.tif)$'
    replacement: '\1.tif'

Warning: if you use regex_replace to append to the end of a value, make sure your pattern excludes data that is already valid. Otherwise the suffix is appended a second time. In the example above, valid file names would end up as .tif.tif if the pattern did not specifically exclude values ending in .tif.
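
The warning above can be checked on a few sample values before a fix file is run. This is a minimal sketch, assuming regex_replace behaves like Python's re.sub (the wiki does not state how the fix is implemented). The sample values, the preview helper, and NAIVE_PATTERN are hypothetical and exist only for illustration; the other patterns and replacements are copied from the example fix file.

```python
import re

# Patterns and replacements copied from the example fix file.
HTTP_PATTERN = r'^http://(sws\.geonames\.org/)'
HTTP_REPLACEMENT = r'https://\1'
TIF_PATTERN = r'^(.*?)(?<!\.tif)$'
TIF_REPLACEMENT = r'\1.tif'

# Hypothetical pattern WITHOUT the exclusion, to show what the warning is about.
NAIVE_PATTERN = r'^(.*)$'


def preview(pattern, replacement, values):
    """Return (before, after) pairs so a pattern can be checked by eye."""
    return [(value, re.sub(pattern, replacement, value)) for value in values]


# Made-up sample values: one that needs the fix and one that is already valid.
location_preview = preview(HTTP_PATTERN, HTTP_REPLACEMENT, [
    "http://sws.geonames.org/5128581/",   # http, gets upgraded to https
    "https://sws.geonames.org/5128581/",  # already https, stays unchanged
])

file_preview = preview(TIF_PATTERN, TIF_REPLACEMENT, [
    "img_0001",      # missing extension, gains .tif
    "img_0001.tif",  # already valid, stays unchanged
])

naive_preview = preview(NAIVE_PATTERN, TIF_REPLACEMENT, [
    "img_0001.tif",  # already valid, but becomes img_0001.tif.tif
])

for before, after in location_preview + file_preview + naive_preview:
    print(f"{before!r} -> {after!r}")
```

If an already valid value changes in the preview, as it does with the naive pattern, the pattern needs an exclusion such as the negative lookbehind (?<!\.tif) used in the example.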

Running Fixes

Run the terminal command:

python autofixcsv.py [collection name]

Checking Import Status

Run the terminal command:

python importer-solr.py [importer id] [number of works in importer]
