Data Quality Assessment Tool

Purpose

dq_tool.py is a command-line tool for basic data quality assessment of CSV and Excel files. It checks each dataset for completeness and type accuracy, and recognizes common data patterns such as email addresses, phone numbers, and postal codes. Results are written to a JSON report and can also be pretty-printed in the terminal.
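This README does not spell out how each check is computed. As a rough illustration, a minimal pandas sketch of per-column completeness and a simple type check, serialized to JSON, might look like the block below. The function name `assess_columns` and the "share of values that parse as numbers" definition of type accuracy are assumptions for illustration, not dq_tool.py's actual implementation.

```python
import json

import pandas as pd


def assess_columns(df: pd.DataFrame) -> dict:
    """Hypothetical per-column summary: completeness plus a simple numeric-type check."""
    report = {}
    for column in df.columns:
        series = df[column]
        non_null = series.dropna()
        # Completeness: share of cells that are not missing (NaN/None).
        completeness = float(series.notna().mean()) if len(series) else 0.0
        # Type accuracy (assumed definition): share of non-null values that parse
        # as numbers; errors="coerce" turns unparseable values into NaN.
        numeric_ratio = (
            float(pd.to_numeric(non_null, errors="coerce").notna().mean())
            if len(non_null)
            else 0.0
        )
        report[str(column)] = {
            "completeness": round(completeness, 3),
            "numeric_ratio": round(numeric_ratio, 3),
        }
    return report


if __name__ == "__main__":
    df = pd.DataFrame(
        {"age": [31, None, 45, "n/a"], "email": ["a@b.com", None, None, "c@d.org"]}
    )
    # Write the summary as JSON, the same format the tool's report uses.
    print(json.dumps(assess_columns(df), indent=2))
```

In this sketch a placeholder such as "n/a" counts as present for completeness but fails the numeric check, which is one way a type problem can surface even in a fully populated column.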

Setup

Clone or download this repository.

Create and activate a virtual environment (recommended):

```
python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
```

Install dependencies:
```
pip install pandas pyarrow tabulate
```
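- tabulate is optional but recommended for pretty terminal output.
- pandas needs openpyxl to read .xlsx files; install it as well if you work with Excel input (pip install openpyxl).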
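Because tabulate is optional, a tool like this has to keep working when it is missing. One common pattern, shown here as a sketch and not necessarily how dq_tool.py handles it, is to attempt the import and fall back to a plain layout built from the standard library. The helper name `format_table` is hypothetical.

```python
# Hypothetical helper: use tabulate when it is installed, otherwise fall back
# to a plain, space-aligned layout built with the standard library only.
try:
    from tabulate import tabulate
except ImportError:  # tabulate is an optional dependency
    tabulate = None


def format_table(rows, headers):
    """Return a printable table string for a list of row sequences."""
    if tabulate is not None:
        return tabulate(rows, headers=headers, tablefmt="github")
    # Fallback: stringify every cell and pad each column to its widest entry.
    cells = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in cells
    )


if __name__ == "__main__":
    print(format_table([["email", 0.5], ["age", 0.75]], headers=["column", "completeness"]))
```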

Usage

Run the tool from the command line, specifying your input file:

```
python dq_tool.py --input sample.csv
```

- --input: path to your CSV or Excel file.
- --report: optional path for the output report file (default: dq_report.json).

Example:

```
python dq_tool.py --input data.xlsx --report results/my_report.json
```
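The two flags above map naturally onto Python's argparse. The sketch below is an assumption about how such a CLI could be wired, not the tool's actual code: `parse_args`, `load_table`, `main`, and the choice to pick the pandas reader by file extension are all illustrative.

```python
import argparse
from pathlib import Path

# Extensions this sketch accepts (an assumption; the README only says "CSV or Excel").
SUPPORTED = {".csv", ".xlsx", ".xls"}


def parse_args(argv=None):
    """Parse the two flags documented above (a sketch; dq_tool.py may differ)."""
    parser = argparse.ArgumentParser(description="Basic data quality assessment")
    parser.add_argument("--input", required=True, help="Path to a CSV or Excel file")
    parser.add_argument(
        "--report", default="dq_report.json", help="Output JSON report path"
    )
    return parser.parse_args(argv)


def load_table(path):
    """Load CSV or Excel into a DataFrame, choosing the reader by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    import pandas as pd  # imported lazily so argument parsing works without pandas

    if suffix == ".csv":
        return pd.read_csv(path)
    # .xlsx needs openpyxl installed; legacy .xls needs xlrd.
    return pd.read_excel(path)


def main(argv=None):
    args = parse_args(argv)
    df = load_table(args.input)
    # This sketch creates the report's parent folder (e.g. results/) if needed.
    Path(args.report).parent.mkdir(parents=True, exist_ok=True)
    print(f"Loaded {len(df)} rows from {args.input}; report -> {args.report}")
```

In a script you would call it with `if __name__ == "__main__": main()`. Passing `argv` explicitly, as in `parse_args(["--input", "sample.csv"])`, makes the parser easy to test without touching `sys.argv`.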

Extending Pattern Rules

To add or modify pattern recognition rules (e.g., to detect new data types):

Open dq_tool.py and locate the check_pattern_recognition function.

Update the pattern_rules dictionary to add new rules or adjust existing ones. For example:

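```python
# In raw strings (r'...') use single backslashes: \. is a literal dot, \d a digit.
pattern_rules = {
    'email': r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$',
    'phone': r'^\+?\d{10,15}$',              # optional leading +, then 10-15 digits
    'postal_code': r'^\d{5}(-\d{4})?$',      # 5-digit ZIP or ZIP+4
    'custom_rule': r'^your-regex-here$',     # add your own
}
```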

Save the file and re-run the tool. The new pattern will be automatically included in the analysis.
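How the rules are scored is not described here. A minimal sketch of one plausible approach is to measure, for each rule, the share of non-empty values in a column that fully match it, and to label the column only when the best rule clears a threshold. The names `match_rates` and `detect_pattern` and the 0.8 threshold are assumptions, not dq_tool.py's actual logic.

```python
import re

# Rules mirroring the pattern_rules example (single backslashes in raw strings).
PATTERN_RULES = {
    "email": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$",
    "phone": r"^\+?\d{10,15}$",
    "postal_code": r"^\d{5}(-\d{4})?$",
}


def match_rates(values, rules=PATTERN_RULES):
    """Return, per rule, the share of non-empty values that fully match it."""
    compiled = {name: re.compile(rx) for name, rx in rules.items()}
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return {name: 0.0 for name in compiled}
    return {
        name: sum(1 for v in cleaned if rx.fullmatch(v)) / len(cleaned)
        for name, rx in compiled.items()
    }


def detect_pattern(values, threshold=0.8, rules=PATTERN_RULES):
    """Label a column with its best-matching rule if that rule clears the threshold."""
    rates = match_rates(values, rules)
    best = max(rates, key=rates.get)
    return best if rates[best] >= threshold else None
```

For example, `detect_pattern(["12345", "12345-6789", "90210"])` returns `"postal_code"`, while a column where only two of three values are emails stays unlabeled at the assumed 0.8 threshold. With pandas input, pass `series.dropna()` so missing values are not stringified as "nan".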

For questions or contributions, please open an issue or pull request.
