Upt raw_fullacs to raw_acs, raw_acs to raw_spm_acs#57
Kklu78 wants to merge 5 commits into PolicyEngine:master
MaxGhenis
left a comment
Running the black code formatter will fix the linting error. You can integrate it into VSCode with these instructions, and you'll also need to set the line length to 79.
| url = f"https://www2.census.gov/programs-surveys/acs/data/pums/{year}/1-Year/csv_pus.zip" | ||
| request = requests.get(url) | ||
| file = ZipFile(BytesIO(request.content)) | ||
| file.extractall(f'{year}_pus') |
@nikhilwoodruff feel free to suggest otherwise given these are large files (larger than others), but to be consistent with other generate functions I think we'll want to avoid writing the source files to disk, and instead load from the zip file directly. This might make most sense as a function, something like this (not sure if it'll work):
    from io import BytesIO
    from zipfile import ZipFile

    import pandas as pd
    import requests

    def concat_zipped_csvs(url: str, prefix: str) -> pd.DataFrame:
        # Creates a DataFrame from the two CSVs inside a zip file at a URL.
        # ZipFile needs the response body (bytes), not the Response object.
        zf = ZipFile(BytesIO(requests.get(url).content))
        a = pd.read_csv(zf.open(prefix + "a.csv"))
        b = pd.read_csv(zf.open(prefix + "b.csv"))
        res = pd.concat([a, b]).fillna(0)
        res.columns = res.columns.str.lower()
        return res

Then called as:
    person_df = concat_zipped_csvs(
        f"https://www2.census.gov/programs-surveys/acs/data/pums/{year}/1-Year/csv_pus.zip",
        "psam_pus",
    )

And similarly for household.
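Since the suggestion above is flagged as untested, here is a minimal sketch for checking the parsing logic offline, without downloading the large Census file. It assumes the zip contains exactly two CSVs named `<prefix>a.csv` and `<prefix>b.csv`, as the suggestion does. The helper names `concat_csvs_from_zip_bytes` and `make_fake_zip` and the toy rows are hypothetical, not part of this PR. It also passes `ignore_index=True` to `pd.concat`, which the original suggestion does not.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def concat_csvs_from_zip_bytes(zip_bytes: bytes, prefix: str) -> pd.DataFrame:
    """Concatenate <prefix>a.csv and <prefix>b.csv from an in-memory zip."""
    with ZipFile(BytesIO(zip_bytes)) as zf:
        # Read each CSV straight from the archive; nothing is written to disk.
        parts = [
            pd.read_csv(zf.open(f"{prefix}{suffix}.csv"))
            for suffix in ("a", "b")
        ]
    res = pd.concat(parts, ignore_index=True).fillna(0)
    res.columns = res.columns.str.lower()
    return res


def make_fake_zip() -> bytes:
    """Build a tiny stand-in for csv_pus.zip (hypothetical toy data)."""
    buffer = BytesIO()
    with ZipFile(buffer, "w") as zf:
        zf.writestr("psam_pusa.csv", "SERIALNO,AGEP\nH1,40\nH1,12\n")
        zf.writestr("psam_pusb.csv", "SERIALNO,AGEP\nH2,67\n")
    return buffer.getvalue()


person_df = concat_csvs_from_zip_bytes(make_fake_zip(), "psam_pus")
print(person_df)
```

Against the real file, the bytes would come from `requests.get(url).content`. Separating the download from the parsing keeps the parsing step testable without network access.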
I think that's fine as a general approach for large files, though I did envisage something similar to openfisca-uk-data's download() function (either from GCP or GitHub) being useful here.
| def create_household_table(person: pd.DataFrame) -> pd.DataFrame: | ||
| return person[["SERIALNO", "ST", "PUMA"]].groupby(person.SERIALNO).first() | ||
| ) No newline at end of file |
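To illustrate what the `create_household_table` line in the diff above does, here is a small sketch on toy data. The rows and the `AGEP` column are made up for illustration. Note one deliberate difference: this version groups by the column name `"SERIALNO"` rather than by the `person.SERIALNO` Series used in the diff, so that `SERIALNO` becomes the index only.

```python
import pandas as pd

# Hypothetical person-level rows: two people in household H1, one in H2.
person = pd.DataFrame(
    {
        "SERIALNO": ["H1", "H1", "H2"],
        "ST": [6, 6, 36],
        "PUMA": [101, 101, 3801],
        "AGEP": [40, 12, 67],
    }
)


def create_household_table(person: pd.DataFrame) -> pd.DataFrame:
    # Keep one row per household: the first ST and PUMA seen per SERIALNO.
    return person[["SERIALNO", "ST", "PUMA"]].groupby("SERIALNO").first()


household = create_household_table(person)
print(household)
```

This relies on state and PUMA being constant within a household, so taking the first person's values loses nothing.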
MaxGhenis
left a comment
Couple other small things.
Also for posterity, note that raw_spm_acs.py is essentially just renamed from raw_acs.py. Unfortunately the diff isn't showing it as such, maybe because there's now a new raw_acs.py, which is the full ACS.
| ### ACS | ||
| - OpenFisca-US-compatible | ||
| - Contains OpenFisca-US-compatible input arrays. | ||
| - Contains OpenFisca-US-compatible input arrays from the spm research file. |
| - Contains OpenFisca-US-compatible input arrays from the spm research file. | |
| - Contains OpenFisca-US-compatible input arrays from the SPM research file. |
| REPO = Path(__file__).parent | ||
| DATASETS = (RawCPS, CPS, RawACS, ACS) | ||
| DATASETS = (RawCPS, CPS, RawACS, ACS, RawSPMACS) |
| DATASETS = (RawCPS, CPS, RawACS, ACS, RawSPMACS) | |
| DATASETS = (RawCPS, CPS, RawACS, RawSPMACS, ACS) |
No description provided.