forked from jigglepuff/StlOpenDataEtl
28-Ability to run each stage independently #29
Open: jigglepuff wants to merge 20 commits into master from 28-test-stages
20 commits
099973b Progress Bar: initial implementation using progressbar2 package (jigglepuff)
dfc37c2 Progress Bar: refactor using enlighten package for stdout workaround (jigglepuff)
20b77ae Progress Bar: implemented detailed progress increments for Extractor (jigglepuff)
7401541 Progress Bar: Fixed comment (jigglepuff)
40d99eb Merge branch 'master' of https://github.com/OpenSTL/StlOpenDataEtl in… (jigglepuff)
a4dbb77 changes to ensure merge is working (jigglepuff)
0e5e837 loader.py: fix debug message argument error (jigglepuff)
18aa332 Progress bar: change to singleton implementation (jigglepuff)
62ea3f3 Progress bar manager: added function to get child progress bars (jigglepuff)
bc86195 loader.py: added load_all() function (jigglepuff)
e7accfd test_fetcher.py: script to run fetcher standalone (jigglepuff)
274cb13 test_parser.py, test_extractor.py: run parser and extractor standalone (jigglepuff)
2d03e9b app.py: changed import calls such that library and variable name don'… (jigglepuff)
23ec6dc test_transformer.py: run transformer standalone, added utils function… (jigglepuff)
07f6dd0 vacant_table.py: changed to_csv to custom utils.to_csv to preserve da… (jigglepuff)
b7e0627 test_loader: run loader standalone (jigglepuff)
9138b3c data/test_*.yml: added test config files to go with test scripts (jigglepuff)
c94330b vacant_table.py: revert change (jigglepuff)
914fdc8 .gitignore: ignore test_transfom_tasks.yaml (jigglepuff)
a000cec vacant_table.py: revert more changes (jigglepuff)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -59,7 +59,44 @@ python3 ./app.py --db prod | |
| ``` | ||
| :warning: Example 3. will not work if you don't have the database admin credentials. For more details, [Go to Running with Production Database](#running-with-production-database). | ||
|
|
||
| #### Running individual stages | ||
| To run individual stages (e.g. Fetcher only or Transformer only) without running the entire application, use the following commands: | ||
| 1. Run `Fetcher` only: | ||
| Run with default `test_sources.yml`: | ||
| ```bash | ||
| python3 tests/test_fetcher.py | ||
| ``` | ||
| To run with a specific sources YAML file, replace the last argument with the path to your custom YAML: | ||
| ```bash | ||
| python3 tests/test_fetcher.py ./data/sources/sources.yml | ||
| ``` | ||
|
|
||
| 2. Run `Parser` only: | ||
| Use `--local-sources` to specify the local files to parse: | ||
| ```bash | ||
| python3 tests/test_parser.py --local-sources ./src/prcl.mdb ./src/par.dbf ./src/prcl_shape.zip | ||
| ``` | ||
|
|
||
| 3. Run `Extractor` only: | ||
| ```bash | ||
| python3 tests/test_extractor.py --local-sources ./src/prcl.mdb ./src/par.dbf ./src/prcl_shape.zip | ||
| ``` | ||
|
|
||
| 4. Run `Transformer` only: | ||
| ```bash | ||
| python3 tests/test_transformer.py --local-sources src/BldgCom.csv src/BldgRes.csv src/par.dbf.csv src/prcl.shp.csv src/Prcl.csv | ||
|
Collaborator (review comment): Do those CSV files come from running one of the other stages in isolation? If so, we should document that here. |
||
| ``` | ||
|
|
||
| 5. Run `Loader` only: | ||
| ```bash | ||
| python3 tests/test_loader.py --local-sources ./src/vacant_table.csv | ||
| ``` | ||
|
|
||
| #### Running unit tests | ||
| To run unit tests, run the following command from project root directory: | ||
| ```bash | ||
| pytest | ||
| ``` | ||
|
|
||
| #### Deactivating Virtual Environment | ||
|
|
||
|
|
||
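The README addition above says `tests/test_fetcher.py` uses a default `test_sources.yml` unless a path is passed as the last argument. A minimal sketch of how a standalone stage script could resolve that path is shown here. The function name `resolve_sources_path`, its `argv` parameter, and the `./data/` location of the default file are assumptions for illustration, not code taken from this PR.

```python
import argparse

# Assumption: the default test config sits under ./data/ with the other YAML files.
DEFAULT_SOURCES_YAML = './data/test_sources.yml'


def resolve_sources_path(argv=None):
    """Return the sources YAML path given on the command line, or the default."""
    parser = argparse.ArgumentParser(description='Run the Fetcher stage on its own.')
    # nargs='?' makes the positional argument optional, so the default applies
    # when the script is started without any arguments.
    parser.add_argument('sources_yaml', nargs='?', default=DEFAULT_SOURCES_YAML,
                        help='path to a sources YAML file')
    return parser.parse_args(argv).sources_yaml


print(resolve_sources_path([]))
print(resolve_sources_path(['./data/sources/sources.yml']))
```

Passing `argv` explicitly keeps the helper testable; a real script would call it with no arguments so that `argparse` reads `sys.argv`.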
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| [loggers] | ||
| keys=root | ||
|
|
||
| [handlers] | ||
| keys=consoleHandler | ||
|
|
||
| [formatters] | ||
| keys=simpleFormatter | ||
|
|
||
| [logger_root] | ||
| level=DEBUG | ||
| handlers=consoleHandler | ||
|
|
||
| [handler_consoleHandler] | ||
| class=StreamHandler | ||
| level=DEBUG | ||
| disable_existing_loggers=False | ||
| formatter=simpleFormatter | ||
| args=(sys.stdout,) | ||
|
|
||
| [formatter_simpleFormatter] | ||
| format=%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s | ||
| datefmt=%Y-%m-%d:%H:%M:%S |
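A note on the config above: in the standard library, `disable_existing_loggers` is a keyword argument of `logging.config.fileConfig()`, and it is not among the documented handler-section keys, so the line inside `[handler_consoleHandler]` likely has no effect. The sketch below shows one way the same settings could be applied with the flag passed at load time. The `configure_logging` helper and the `fetcher` logger name are hypothetical, and the config is inlined as a string only to keep the example self-contained.

```python
import io
import logging
import logging.config

# Same sections as the config file above, minus the handler-level
# disable_existing_loggers key.
LOGGING_INI = """\
[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s
datefmt=%Y-%m-%d:%H:%M:%S
"""


def configure_logging(ini_text=LOGGING_INI):
    """Apply the INI config without disabling loggers created earlier."""
    logging.config.fileConfig(io.StringIO(ini_text), disable_existing_loggers=False)


# Hypothetical module-level logger created before configuration, as happens
# when a stage module is imported before logging is set up.
early_logger = logging.getLogger('fetcher')
configure_logging()
early_logger.debug('still enabled after configuration')
```

With a real file on disk, the same call would take the file path in place of the `io.StringIO` object.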
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,20 +1,27 @@ | ||
| prcl_shape: | ||
| ESRI_Parcels_Shapefile: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=84 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/prcl_shape.zip | ||
|
|
||
| prcl: | ||
| Parcels_Key: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=83 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/prcl.zip | ||
|
|
||
| par: | ||
| Parcel_Data: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=85 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/par.zip | ||
|
|
||
| lra_public: | ||
| LRA_Iventory_Records: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=65 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/lra_public.zip | ||
|
|
||
| bldginsp: | ||
| Building_Inspections: | ||
| info: https://www.stlouis-mo.gov/data/datasets/dataset.cfm?id=11 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/bldginsp.zip | ||
|
|
||
| prmbdo: | ||
| Building_Permits: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=3 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/prmbdo.zip | ||
|
|
||
| forestry_maintenance_properties: | ||
| Forestry_Property_Maintenance_Data: | ||
| Info: https://www.stlouis-mo.gov/data/datasets/dataset.cfm?id=64 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv |
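One detail in the sources file above: the `Forestry_Property_Maintenance_Data` entry spells its key `Info`, while every other entry uses `info`. A small check like the following could flag that kind of drift. It is a sketch that assumes each entry maps a dataset name to `info` and `url` keys and that the file has already been loaded into a dict; `EXPECTED_KEYS` and `find_key_problems` are hypothetical names, and only two entries are copied here for brevity.

```python
# Two entries copied from the sources file, shaped as the dict a YAML loader
# would return (assumed structure: dataset name -> {info, url}).
SOURCES = {
    'ESRI_Parcels_Shapefile': {
        'info': 'https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=84',
        'url': 'https://www.stlouis-mo.gov/data/upload/data-files/prcl_shape.zip',
    },
    'Forestry_Property_Maintenance_Data': {
        'Info': 'https://www.stlouis-mo.gov/data/datasets/dataset.cfm?id=64',
        'url': 'https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv',
    },
}

# Assumption: these are the only keys downstream code looks up.
EXPECTED_KEYS = {'info', 'url'}


def find_key_problems(sources):
    """Return {entry_name: [unexpected keys]} for entries with keys outside EXPECTED_KEYS."""
    problems = {}
    for name, fields in sources.items():
        unexpected = sorted(set(fields) - EXPECTED_KEYS)
        if unexpected:
            problems[name] = unexpected
    return problems


print(find_key_problems(SOURCES))
# {'Forestry_Property_Maintenance_Data': ['Info']}
```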
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| ESRI_Parcels_Shapefile: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=84 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/prcl_shape.zip | ||
|
|
||
| Parcels_Key: | ||
| info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=83 | ||
| url: https://www.stlouis-mo.gov/data/upload/data-files/prcl.zip | ||
|
|
||
| # Parcel_Data: | ||
| # info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=85 | ||
| # url: https://www.stlouis-mo.gov/data/upload/data-files/par.zip | ||
| # | ||
| # LRA_Iventory_Records: | ||
| # info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=65 | ||
| # url: https://www.stlouis-mo.gov/data/upload/data-files/lra_public.zip | ||
| # | ||
| # Building_Inspections: | ||
| # info: https://www.stlouis-mo.gov/data/datasets/dataset.cfm?id=11 | ||
| # url: https://www.stlouis-mo.gov/data/upload/data-files/bldginsp.zip | ||
| # | ||
| # Building_Permits: | ||
| # info: https://www.stlouis-mo.gov/data/datasets/distribution.cfm?id=3 | ||
| # url: https://www.stlouis-mo.gov/data/upload/data-files/prmbdo.zip | ||
| # | ||
| # Forestry_Property_Maintenance_Data: | ||
| # Info: https://www.stlouis-mo.gov/data/datasets/dataset.cfm?id=64 | ||
| # url: https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| - vacant_table |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,10 @@ | ||
| import argparse | ||
|
|
||
| - def getCommandLineArgs(): | ||
| + def getCommandLineArgs(local_source=True, db=True): | ||
| parser = argparse.ArgumentParser() | ||
| - parser.add_argument('--db', nargs='?', type=str, choices=['dev','prod'], default='dev', help='dev: use local database; prod: use production database') | ||
| - parser.add_argument('--local-sources', nargs='+', type=str, help='local data files to use in place of internet sources.') | ||
| + if db: | ||
| + parser.add_argument('--db', nargs='?', type=str, choices=['dev','prod'], default='dev', help='dev: use local database; prod: use production database') | ||
| + if local_source: | ||
| + parser.add_argument('--local-sources', nargs='+', type=str, help='local data files to use in place of internet sources.') | ||
| args = parser.parse_args() | ||
| return args |
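The change above lets each stage script request only the flags it needs. The sketch below mirrors the new signature to show the effect of `db=False`. The snake_case name and the extra `argv` parameter are additions made here so the example can run without touching `sys.argv`; they are not part of the PR.

```python
import argparse


def get_command_line_args(local_source=True, db=True, argv=None):
    """Mirror of getCommandLineArgs; argv is a hypothetical parameter added for testing."""
    parser = argparse.ArgumentParser()
    if db:
        parser.add_argument('--db', nargs='?', type=str, choices=['dev', 'prod'],
                            default='dev',
                            help='dev: use local database; prod: use production database')
    if local_source:
        parser.add_argument('--local-sources', nargs='+', type=str,
                            help='local data files to use in place of internet sources.')
    return parser.parse_args(argv)


# A parser-only stage needs local files but no database connection.
args = get_command_line_args(db=False,
                             argv=['--local-sources', './src/prcl.mdb', './src/par.dbf'])
print(args.local_sources)   # ['./src/prcl.mdb', './src/par.dbf']
print(hasattr(args, 'db'))  # False
```

Because `--db` is never registered when `db=False`, passing it to such a script makes `argparse` exit with an "unrecognized arguments" error, which makes misuse visible early.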
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
|
|
||
| # Declare global variables (callable across files) | ||
| TOTAL_STAGES = 5 | ||
| CSV = '.csv' # comma separated values | ||
| DBF = '.dbf' # dbase | ||
| MDB = '.mdb' # microsoft access database (jet, access, etc.) | ||
| PRJ = '.prj' # .shp support file | ||
| SBN = '.sbn' # .shp support file | ||
| SBX = '.sbx' # .shp support file | ||
| SHP = '.shp' # shapes | ||
| SHX = '.shx' # .shp support file | ||
| SUPPORTED_FILE_EXT = [CSV, DBF, MDB, PRJ, SBN, SBX, SHP, SHX] |
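As an illustration of how `SUPPORTED_FILE_EXT` might be consumed, the sketch below partitions a list of local paths by extension. The helper `split_by_support` is hypothetical, and lowercasing the extension before the lookup is an assumption; the PR does not show how the constant is actually used.

```python
import os

# Constants as declared in the diff above.
CSV = '.csv'
DBF = '.dbf'
MDB = '.mdb'
PRJ = '.prj'
SBN = '.sbn'
SBX = '.sbx'
SHP = '.shp'
SHX = '.shx'
SUPPORTED_FILE_EXT = [CSV, DBF, MDB, PRJ, SBN, SBX, SHP, SHX]


def split_by_support(paths):
    """Partition paths into (supported, unsupported) by lowercased file extension."""
    supported, unsupported = [], []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext in SUPPORTED_FILE_EXT:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


print(split_by_support(['./src/prcl.mdb', './src/PAR.DBF', './src/notes.txt']))
# (['./src/prcl.mdb', './src/PAR.DBF'], ['./src/notes.txt'])
```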
Review comment: The database config files seem to be named `config_{environment}.yml`, but these files are named `{env}_sources.yml` or similar. We should put the environment name in a consistent place in the filename, either front or back.