28-Ability to run each stage independently #29
Open
jigglepuff wants to merge 20 commits into master
…to 24-progress_bar
…s to help import and export csv preserving datatype metadata
mrpetrocket
reviewed
May 3, 2020
Collaborator
mrpetrocket
left a comment
Started reviewing. Haven't finished yet; ran into an error in the extractor. The error happens on master too so I don't think it's from this PR.
vacancy.sqlite

# config files for unit tests
data/source/test_sources.yml
Collaborator
the database config files seem to be named config_{environment}.yml but these files are named {env}_sources.yml or similar. I would say we should put the environment name in a consistent place in the filename, either front or back.
4. Run `Transformer` only:
```bash
python3 tests/test_transformer.py --local-sources src/BldgCom.csv src/BldgRes.csv src/par.dbf.csv src/prcl.shp.csv src/Prcl.csv
```
Collaborator
Do those csv files come from running one of the other stages in isolation? If so we should document that here.
Closes: #28
New files:
- `tests/test_fetcher.py` - `Fetcher` unit tests, and entry point to run `Fetch` stage by itself.
- `tests/test_parser.py` - `Parser` unit tests, and entry point to run `Parse` stage by itself.
- `tests/test_extractor.py` - `Extractor` unit tests, and entry point to run `Extract` stage by itself.
- `tests/test_transformer.py` - `Transformer` unit tests, and entry point to run `Transform` stage by itself.
- `tests/test_loader.py` - `Loader` unit tests, and entry point to run `Load` stage by itself.

Changes:
- `app.py` - change `from etl import fetcher` to `from etl.fetcher import Fetcher` to avoid module name and variable name collision. This allows us to copy code from `app.py` to test fixtures.
- `etl/command_line_args.py` - added function arguments to indicate which arguments argparse should expect. This allows test fixtures to reuse this module.
- `etl/fetcher.py` - removed imports that are not used. This also avoids import errors when running from directories other than the project root directory.
- `etl/utils.py` - added custom `to_csv()` and `read_csv()` to preserve datatype when exporting and importing CSVs between stages `extractor` and `transformer`.
- `requirements.txt` - added dependencies for `pytest` (python testing framework)
- `README.md` - added instructions to run standalone stages and unit tests
- `.gitignore` - added test yaml files: `data/sources/test_sources.yml` and `data/transform_tasks/test_transform_tasks.yml`. We do not need to track these files, since they are for developers to test and customize (for now).
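To illustrate why the `to_csv()` and `read_csv()` helpers matter when CSVs are passed between stages, here is a minimal sketch of a datatype-preserving round trip. It is not the code in `etl/utils.py`: the signatures, the `name:type` header encoding, and the column names (`parcel_id`, `units`, `sale_date`) are all assumptions made for illustration, and it uses only the standard library. Missing values are not handled.

```python
import csv
import os
import tempfile
from datetime import date

# Hypothetical type registry; the real helpers in etl/utils.py may work differently.
_CASTS = {
    "int": int,
    "float": float,
    "str": str,
    "date": date.fromisoformat,
}


def to_csv(path, columns, rows):
    """Write rows to CSV, storing each column's type in the header as name:type."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"{name}:{type_name}" for name, type_name in columns])
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV written by to_csv() and cast each cell back to its declared type."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = [cell.rsplit(":", 1) for cell in next(reader)]
        names = [name for name, _ in header]
        casts = [_CASTS[type_name] for _, type_name in header]
        rows = [[cast(cell) for cast, cell in zip(casts, row)] for row in reader]
    return names, rows


if __name__ == "__main__":
    # Assumed example data: an ID with leading zeros must stay a string.
    path = os.path.join(tempfile.mkdtemp(), "extract_output.csv")
    columns = [("parcel_id", "str"), ("units", "int"), ("sale_date", "date")]
    to_csv(path, columns, [["0012", 3, date(2020, 5, 3)]])
    print(read_csv(path))
```

Without the type information, a plain CSV read would return every cell as a string, or a type-guessing reader might turn `0012` into the integer 12, so a standalone `Transform` run could see different data than the full pipeline does.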