Commit 6ed68a8

Version 0.3.0 (#19)
* API updates (values becomes summary, remove periods, add keywords)
* standardize get_() urls
* 98% test coverage
* update README and version
* documentation updates
* fix docstrings and add codes to list_industries
* adds reading time function, allows for jurisdiction names, and adds jurisdiction name to agency list (solves #13 and #14)
* documentMetadata fixes and allow for jurisdiction names (direct to API)
* add jurisdictionID to list_document_types
1 parent 1ae5ac3 commit 6ed68a8

5 files changed

Lines changed: 410 additions & 166 deletions

README.md

Lines changed: 40 additions & 22 deletions
@@ -29,29 +29,25 @@ The API organizes data around __document types__, which are then divided into __
2929

3030
A fundamental concept in RegData is the "document." In RegData, a set of documents represents a body of regulations for which we have produced regulatory restriction counts. For example, to produce data on regulatory restrictions imposed by the US Federal government, RegData uses the Code of Federal Regulations (CFR) as the source documents. Within the CFR, RegData identifies a unit of regulation as the title-part combination. The CFR is organized into 50 titles, and within each title are parts, which may or may not have subparts. Under the parts are sections. Determining this unit of analysis is critical for the context of the data produced by RegData. Producing regulatory restriction data for US states follows the same strategy but uses the state-specific regulatory code.
3131

32-
In requesting data through the API, you must specify the document type and then indicate a preference for *summary* or *document-level*. By default, RegCensus API returns summarized data for the period of interest. This means that if you do not specify the *summary* preference, you will receive the summarized data for a period. The __get_periods__ helper function (described below) returns the periods available for each series.
32+
In requesting data through the API, you must specify the document type and then indicate a preference for *summary* or *document-level*. By default, RegCensus API returns summarized data for the date of interest. This means that if you do not specify the *summary* preference, you will receive the summarized data for a date. The __get_series__ helper function (described below) returns the dates available for each series.
3333

34-
RegCensus API defines a number of periods depending on the series. For example, the total restrictions series of Federal regulations uses two main periods: daily and annual. The daily data produces the number of regulatory restrictions issued on a particular date by the US Federal government. The same data are available on an annual basis.
34+
RegCensus API defines a number of dates depending on the series. For example, the total restrictions series of Federal regulations uses two main dates: daily and annual. The daily data produces the number of regulatory restrictions issued on a particular date by the US Federal government. The same data are available on an annual basis.
3535

36-
There are five helper functions to retrieve information about these key components of regdata. These functions provide the following information: document types, jurisdictions, series, agencies, and periods with data. The list functions begin with __list__.
36+
There are five helper functions to retrieve information about these key components of regdata. These functions provide the following information: document types, jurisdictions, series, agencies, and dates with data. The list functions begin with __list__.
3737

3838
Each document type comprises one or more *series*. The __list_series__ function returns the list of all series when no series id is provided.
3939

4040
```
41-
rc.list_jurisdictions(jurisdictionID = 38)
41+
rc.list_series(jurisdictionID = 38)
4242
```
4343

4444
Listing the jurisdictions is another great place to start. If you are looking for data for a specific jurisdiction(s), this function
4545
will return the jurisdiction_id for all jurisdictions, which is key for retrieving data on any individual jurisdiction.
4646

47-
The __get_periods__ function returns a list of all series and the years with data available for each jurisdiction.
47+
The __get_series__ function returns a list of all series and the years with data available for each jurisdiction.
4848

4949
The output from this function can serve as a reference for the valid values that can be passed to parameters in the __get_values__ function. The number of records returned is the unique combination of series and jurisdictions that are available in RegData. The function takes the optional argument jurisdiction id.
5050

51-
```
52-
rc.get_periods(jurisdictionID = 38)
53-
```
54-
5551
## Metadata
5652
The __get_*__ functions return the details about RegData metadata. These metadata are not included in the __get_values__ functions that will be described later.
5753

@@ -65,30 +61,52 @@ rc.get_jurisdictions()
6561

6662
### Agencies
6763

68-
The __get_agencies__ function returns a data frame of all agencies with data in RegData. If an ID is supplied, the data frame returns the details about a single agency specified by the id. The data frame includes characteristics of the agencies. Currently, agency data are only available for federal RegData.
64+
The __get_agencies__ function returns a data frame of agencies with data in RegData. Either the `jurisdictionID` or `keyword` arguments must be supplied. If `jurisdictionID` is passed, the data frame will include information for all agencies in that jurisdiction. If `keyword` is supplied, the data frame will include information for all agencies whose name contains the keyword.
65+
66+
The following code snippet will return data for all agencies in the Federal United States:
6967

7068
```
71-
rc.get_agencies()
69+
rc.get_agencies(jurisdictionID = 38)
70+
```
71+
72+
Likewise, this code snippet will return data for all agencies (in any jurisdiction) containing the word "education" (not case sensitive):
73+
74+
```
75+
rc.get_agencies(keyword = 'education')
7276
```
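The "contains the keyword, not case sensitive" behavior described above can be illustrated without calling the API. The following is a minimal sketch on made-up records; the field names, the agency rows, and the `filter_by_keyword` helper are assumptions for illustration and are not part of regcensus, which performs this matching on the server side.

```python
# Hypothetical agency records; the real data frame has more columns
# and different IDs.
agencies = [
    {"agency_id": 1, "agency_name": "Department of Education"},
    {"agency_id": 2, "agency_name": "Office of Postsecondary EDUCATION"},
    {"agency_id": 3, "agency_name": "Department of Energy"},
]


def filter_by_keyword(records, keyword, field="agency_name"):
    """Return the records whose `field` contains `keyword`, ignoring case."""
    needle = keyword.casefold()
    return [r for r in records if needle in r[field].casefold()]


matches = filter_by_keyword(agencies, "education")
print([r["agency_id"] for r in matches])  # → [1, 2]
```

Using `casefold()` on both sides is one simple way to make the comparison case-insensitive; whether the API normalizes text the same way is not stated in the README.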
7377

7478
Use the value of the agency_id field when pulling values with the __get_values__ function.
7579

7680
### Industries
7781

78-
The __get_industries__ function returns a data frame of industries with data in the API. Presently the only classification system available is the North American Industry Classification System (NAICS). NAICS is used for both countries in North America and Australia, even the latter uses the Australia and New Zealand Standard Industrial Classification (ANZSIC) system. Presently, industry regulations for Australia are based on the NAICS. RegData expands to other countries, the industry codes will be country specific as well as contain mapping to the Standard Industry Codes (SIC) system.
82+
The __get_industries__ function returns a data frame of industries with data in the API. The available standards include the North American Industry Classification System (NAICS), the Bureau of Economic Analysis system (BEA), and the Standard Occupational Classification System (SOC). By default, the function only returns a data frame with 3-digit NAICS industries. The `codeLevel` and `standard` arguments can be used to select from other classifications.
83+
84+
The following line will get you industry information for all 4-digit NAICS industries:
85+
86+
```
87+
rc.get_industries(codeLevel = 4)
88+
```
89+
90+
This line will get you information for the BEA industries:
91+
92+
```
93+
rc.get_industries(standard = 'BEA')
94+
```
95+
96+
Like the __get_agencies__ function, the `keyword` argument may also be used. The following code snippet will return information for all 6-digit NAICS industries with the word "fishing" in the name:
7997

8098
```
81-
rc.get_industries(38)
99+
rc.get_industries(keyword = 'fishing', codeLevel = 6)
82100
```
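To make the interaction of `codeLevel` and `keyword` concrete, here is a small offline sketch that selects NAICS-style records by the number of digits in the code and then by a case-insensitive keyword. The record layout, the field names, and the `select_industries` helper are assumptions for illustration; the actual selection is done by the API.

```python
# Hypothetical industry records keyed by NAICS code (stored as strings so
# the number of digits is simply the string length).
industries = [
    {"industry_code": "114", "industry_name": "Fishing, Hunting and Trapping"},
    {"industry_code": "1141", "industry_name": "Fishing"},
    {"industry_code": "114111", "industry_name": "Finfish Fishing"},
    {"industry_code": "114112", "industry_name": "Shellfish Fishing"},
    {"industry_code": "111110", "industry_name": "Soybean Farming"},
]


def select_industries(records, code_level=3, keyword=None):
    """Keep records with `code_level` digits, optionally matching `keyword`."""
    selected = [r for r in records if len(r["industry_code"]) == code_level]
    if keyword is not None:
        needle = keyword.casefold()
        selected = [
            r for r in selected if needle in r["industry_name"].casefold()
        ]
    return selected


six_digit_fishing = select_industries(industries, code_level=6, keyword="fishing")
print([r["industry_code"] for r in six_digit_fishing])  # → ['114111', '114112']
```

The default of `code_level=3` mirrors the README's statement that 3-digit NAICS industries are returned unless another level is requested.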
83101

84102
### Documents
85103

86-
The __get_documents__ function returns a data frame with metadata for document-level data. The function takes two parameters, jurisdictionID (required) and documentType (default value of 3, which is "all regulations").
104+
The __get_documents__ function returns a data frame with metadata for document-level data. The function takes two parameters, jurisdictionID (required) and documentType (default value of 1, which is "all regulations").
87105

88106
The following line will get metadata for documents associated with U.S. Federal healthcare regulations.
89107

90108
```
91-
rc.get_documents(jurisdictionID = 38, documentType = 1)
109+
rc.get_documents(jurisdictionID = 38, documentType = 3)
92110
```
93111

94112
## Values
@@ -104,12 +122,12 @@ The __get_values__ function is the primary function for obtaining RegData from t
104122
* filtered (optional) - specify if poorly-performing industry results should be excluded. Default is True.
105123
* summary (optional) - specify if summary results should be returned, instead of document-level results. Default is True.
106124
* country (optional) - specify if all values for a country's jurisdiction ID should be returned. Default is False.
107-
* industryType (optional): level of NAICS industries to include. Default is '3-Digit'.
125+
* industryLevel (optional): level of NAICS industries to include. Default is 3.
108126
* version (optional): Version ID for datasets with multiple versions, if no ID is given, API returns most recent version
109127
* download (optional): if not False, a path location for a downloaded csv of the results.
110128
* verbose (optional) - value specifying how much debugging information should be printed for each function call. Higher number specifies more information, default is 0.
111129

112-
In the example below, we are interested in the total number of restrictions and total number of words for the US (get_jurisdictions(38)) for the period 2010 to 2019.
130+
In the example below, we are interested in the total number of restrictions and total number of words for the US (get_jurisdictions(38)) for the dates 2010 to 2019.
113131

114132
```
115133
rc.get_values(series = [1,2], jurisdiction = 38, date = [2010, 2019])
@@ -133,7 +151,7 @@ To obtain the restrictions for a specific agency (or agencies), the series id su
133151

134152
```
135153
# Identify all agencies
136-
rc.list_agencies()
154+
rc.list_agencies(jurisdictionID)
137155
138156
# Call the get_values() for this agency and series 91
139157
rc.get_values(series = 91, jurisdiction = 38, date = [1990, 2018], agency = [81, 84])
@@ -167,7 +185,7 @@ Alternatively, we can use the __get_document_values__ function as in the follow
167185
rc.get_document_values(series = [1,2], jurisdiction = 38, date = ['2010-01-01', '2019-01-01'])
168186
```
169187

170-
Note that for document-level queries, a full date (not just the year) is often required. See the __get_periods__ function for specifics by jurisdiction.
188+
Note that for document-level queries, a full date (not just the year) is often required. See the __get_series__ function for specifics by jurisdiction.
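Because a year-only value and a full date are easy to mix up, it may help to check the `date` list before issuing a document-level query. The sketch below is one possible pre-flight check; the `is_full_date` and `year_only_values` helpers are hypothetical and not part of regcensus, and the `YYYY-MM-DD` format is inferred from the example call above.

```python
from datetime import datetime


def is_full_date(value):
    """Return True if `value` is a string that parses as year-month-day."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def year_only_values(dates):
    """Return the entries of `dates` that are not full dates."""
    return [d for d in dates if not is_full_date(d)]


print(year_only_values(["2010-01-01", "2019-01-01"]))  # → []
print(year_only_values([2010, "2019-01-01"]))          # → [2010]
```

An empty result suggests the list is safe to pass as `date` to a document-level call; whether a given jurisdiction actually requires full dates still has to be looked up as the note describes.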
171189

172190
### Version
173191

@@ -188,12 +206,12 @@ Suppose we want to attach the agency names and other agency characteristics to t
188206
We can merge the agency data with the values data as in the code snippet below.
189207

190208
```
191-
agencies = rc.get_agencies()
209+
agencies = rc.get_agencies(jurisdictionID = 38)
192210
agency_by_industry = rc.get_values(
193211
series = 92,
194212
jurisdiction = 38,
195-
time = [1990, 2000],
196-
industry = [111, 33],
213+
time = [1990, 2000],
214+
industry = [111, 33],
197215
agency = [66, 111])
198216
agency_restrictions_ind = agency_by_industry.merge(
199217
agencies, on='agency_id')

regcensus/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -1,14 +1,15 @@
11
__all__ = [
22
'get_values',
33
'get_document_values',
4+
'get_reading_time',
45
'get_series',
56
'get_agencies',
67
'get_jurisdictions',
7-
'get_periods',
88
'get_industries',
99
'get_documents',
1010
'get_versions',
1111
'list_series',
12+
'list_dates',
1213
'list_document_types',
1314
'list_agencies',
1415
'list_jurisdictions',
@@ -18,14 +19,15 @@
1819
from . api import (
1920
get_values,
2021
get_document_values,
22+
get_reading_time,
2123
get_series,
2224
get_agencies,
2325
get_jurisdictions,
24-
get_periods,
2526
get_industries,
2627
get_documents,
2728
get_versions,
2829
list_series,
30+
list_dates,
2931
list_document_types,
3032
list_agencies,
3133
list_jurisdictions,
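Both hunks in this file edit two hand-maintained lists, `__all__` and the `from . api import (...)` statement, which have to stay in step: here `get_periods` is dropped from both and `get_reading_time` and `list_dates` are added to both. A quick way to catch a mismatch is to compare `__all__` against what the module actually defines. The sketch below does this on a stand-in module; `missing_exports` is a hypothetical helper and `demo` is not the real regcensus package.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Stand-in module for illustration only: 'get_periods' is still listed in
# __all__ but is no longer defined, as if only the import had been updated.
demo = types.ModuleType("demo")
demo.__all__ = ["get_series", "list_dates", "get_periods"]
demo.get_series = lambda: None
demo.list_dates = lambda: None

print(missing_exports(demo))  # → ['get_periods']
```

Run against a real package, an empty list would indicate that every exported name is importable.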
