28-Ability to run each stage independently #29

Open
jigglepuff wants to merge 20 commits into master from 28-test-stages

@jigglepuff commented Apr 14, 2020

Closes: #28

New files:
tests/test_fetcher.py - Fetcher unit tests, and entry point to run Fetch stage by itself.

tests/test_parser.py - Parser unit tests, and entry point to run Parse stage by itself.

tests/test_extractor.py - Extractor unit tests, and entry point to run Extract stage by itself.

tests/test_transformer.py - Transformer unit tests, and entry point to run Transform stage by itself.

tests/test_loader.py - Loader unit tests, and entry point to run Load stage by itself.
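
As an illustration of how one file can serve as both a pytest module and a stage entry point, here is a minimal sketch. It is not the code in this PR: `build_parser`, `run_transform`, and `main` are hypothetical names, and the stage body is a placeholder. Only the `--local-sources` flag is taken from the README command added in this PR.

```python
import argparse


def build_parser(local_sources=False):
    """Hypothetical parser factory: register only the arguments a stage needs."""
    parser = argparse.ArgumentParser(description="Run a single ETL stage")
    if local_sources:
        # Flag name taken from the README command in this PR.
        parser.add_argument("--local-sources", nargs="+", default=[],
                            help="input files produced by an earlier stage")
    return parser


def run_transform(paths):
    """Placeholder for the real Transform stage: return the inputs in sorted order."""
    return sorted(paths)


def test_run_transform_sorts_inputs():
    # pytest collects this function because of the test_ prefix.
    assert run_transform(["b.csv", "a.csv"]) == ["a.csv", "b.csv"]


def main(argv=None):
    # argv=None makes argparse read sys.argv; tests can pass a list instead.
    args = build_parser(local_sources=True).parse_args(argv)
    return run_transform(args.local_sources)


if __name__ == "__main__":
    # Running the file directly executes the stage instead of the tests.
    print(main())
```

With this layout, `pytest` runs only the `test_` functions, while `python3 <file> --local-sources ...` runs the stage itself.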

Changes:
app.py - changed `from etl import fetcher` to `from etl.fetcher import Fetcher` to avoid a collision between the module name and a variable name. This lets us copy code from app.py into test fixtures.

etl/command_line_args.py - added function parameters that select which command-line arguments argparse should expect. This lets test fixtures reuse the module.

etl/fetcher.py - removed imports that are not used. This also avoids import errors when running from directories other than the project root directory.

etl/utils.py - added custom to_csv() and read_csv() to preserve data types when exporting and importing CSVs between the Extract and Transform stages.

requirements.txt - added dependencies for pytest (Python testing framework).

README.md - added instructions to run standalone stages and unit tests

.gitignore - added test YAML files: data/sources/test_sources.yml and data/transform_tasks/test_transform_tasks.yml. We do not need to track these files, since they are for developers to test and customize (for now).
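
To show why type-preserving CSV helpers matter between stages, here is a minimal sketch of one possible approach: store each column's type name in a second header row and cast values back on read. This is an assumption for illustration, not the implementation in etl/utils.py. It uses only the standard library, and the `CASTERS` table and the sample row are hypothetical.

```python
import csv
import io

# Casters used when reading values back; extend as needed (dates, etc.).
CASTERS = {"int": int, "float": float, "str": str,
           "bool": lambda s: s == "True"}


def to_csv(rows, stream):
    """Write a list of dicts, adding a second row that names each column's type."""
    columns = list(rows[0])
    writer = csv.writer(stream)
    writer.writerow(columns)
    # Types are inferred from the first row only (a simplification).
    writer.writerow([type(rows[0][c]).__name__ for c in columns])
    for row in rows:
        writer.writerow([row[c] for c in columns])


def read_csv(stream):
    """Read a file written by to_csv and cast every value to its recorded type."""
    reader = csv.reader(stream)
    columns = next(reader)
    types = next(reader)
    return [
        {c: CASTERS[t](v) for c, t, v in zip(columns, types, record)}
        for record in reader
    ]


# Round trip through an in-memory buffer instead of a real file.
rows = [{"parcel_id": "00012", "units": 3, "area": 12.5}]
buffer = io.StringIO()
to_csv(rows, buffer)
buffer.seek(0)
restored = read_csv(buffer)
```

Without the type row, a reader that infers types could turn an identifier such as `"00012"` into the number 12 and lose the leading zeros.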

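The module/variable name collision behind the app.py import change can be reproduced in isolation. The sketch below builds a stand-in `etl.fetcher` module in memory so it runs on its own. The `Fetcher` class body and the `fetcher = fetcher.Fetcher()` assignment are assumptions about what the collision looked like, not code copied from app.py.

```python
import sys
import types

# Build an in-memory stand-in for the etl package and its fetcher module.
etl_package = types.ModuleType("etl")
fetcher_module = types.ModuleType("etl.fetcher")


class Fetcher:
    """Hypothetical stand-in for the real Fetcher class."""

    def fetch(self):
        return "fetched"


fetcher_module.Fetcher = Fetcher
etl_package.fetcher = fetcher_module
sys.modules["etl"] = etl_package
sys.modules["etl.fetcher"] = fetcher_module

# Old style: the name `fetcher` refers to the module...
from etl import fetcher
fetcher = fetcher.Fetcher()  # ...until a variable of the same name replaces it.
# From here on, fetcher.Fetcher no longer exists: `fetcher` is an instance.
shadowed = not hasattr(fetcher, "Fetcher")

# New style: import the class, so the variable name cannot hide the module.
from etl.fetcher import Fetcher
fetcher = Fetcher()
result = fetcher.fetch()
```

After the change, code that creates a `fetcher` variable can be pasted into a test fixture without caring whether a module of the same name was imported earlier.
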
@jigglepuff marked this pull request as ready for review on April 22, 2020, 02:35

@mrpetrocket (Collaborator) left a comment

Started reviewing. Haven't finished yet; ran into an error in the extractor. The error happens on master too so I don't think it's from this PR.

vacancy.sqlite

# config files for unit tests
data/source/test_sources.yml

The database config files seem to be named config_{environment}.yml, but these files are named {env}_sources.yml or similar. I would say we should put the environment name in a consistent place in the filename, either at the front or at the back.


4. Run `Transformer` only:
```bash
python3 tests/test_transformer.py --local-sources src/BldgCom.csv src/BldgRes.csv src/par.dbf.csv src/prcl.shp.csv src/Prcl.csv
```

Do those CSV files come from running one of the other stages in isolation? If so, we should document that here.
