VigyanShaala-Tech/assignment-context-extraction

How to Run the Student Context Generator

This project processes student documents (PDFs, PowerPoint files) and uses OpenAI models to generate structured "context" reports for each student. The output is saved as JSON with phone numbers as keys.

Overview

What it does:

  1. Takes student files (resumes, SWOT analyses, career plans) as PDF or PPTX files
  2. Extracts text content using PyMuPDF and python-pptx
  3. Sends extracted content to OpenAI (GPT-3.5-turbo or GPT-4o-mini)
  4. Generates a structured profile with skills, experiences, strengths, weaknesses, goals, etc.
  5. Saves output as JSON with phone number as the key

Prerequisites

1. Install Dependencies

pip install -r requirements.txt

Key dependencies:

  • openai - OpenAI API client
  • pymupdf (fitz) - PDF extraction
  • pdfplumber - PDF fallback extraction
  • python-pptx - PowerPoint extraction
  • sqlalchemy - Database ORM
  • pydantic - Data validation
  • click - CLI framework
  • rich - Pretty terminal output
  • python-dotenv - Environment variables

2. Set Up Environment Variables

Create a .env file in the project root:

cp .env.example .env

Edit .env and add your OpenAI API key:

# Required
OPENAI_API_KEY=your_openai_api_key_here

# Optional (defaults shown)
OPENAI_MODEL=gpt-3.5-turbo
DATABASE_URL=sqlite:///data/db/guidance.db
LOG_LEVEL=INFO
MAX_FILE_SIZE_MB=100
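
In application code, these settings might be read with the defaults shown above. The sketch below is illustrative (the `load_settings` helper is an assumption, not the project's actual API) and uses only the standard library; in the real project, python-dotenv loads the .env file into the environment first:

```python
import os

def load_settings() -> dict:
    """Read configuration from environment variables, applying the defaults above."""
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY not set; create a .env file first")
    return {
        "api_key": api_key,
        "model": os.getenv("OPENAI_MODEL", "gpt-3.5-turbo"),
        "database_url": os.getenv("DATABASE_URL", "sqlite:///data/db/guidance.db"),
        "log_level": os.getenv("LOG_LEVEL", "INFO"),
        "max_file_size_mb": int(os.getenv("MAX_FILE_SIZE_MB", "100")),
    }
```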

Quick Start

Step 1: Initialize the Database

python -m src.cli init-db

To reset an existing database:

python -m src.cli init-db --reset

Step 2: Add a Student

python -m src.cli add-student --name "John Doe" --grade "9-12"

Options:

  • --name (required): Student's full name
  • --grade: Grade band (e.g., "9-12", "college")
  • --email: Student's email address

Step 3: Process Student Documents

python -m src.cli process-document --student-id 1 --file /path/to/resume.pdf
python -m src.cli process-document --student-id 1 --file /path/to/swot.pdf
python -m src.cli process-document --student-id 1 --file /path/to/career_plan.pptx

Supported formats: .pdf, .pptx, .ppt
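
Internally, documents are routed to an extractor by file extension. A hypothetical dispatch could look like this (the extractor names here are assumptions for illustration, not the project's actual function names):

```python
from pathlib import Path

# Map each supported extension to an extractor name (illustrative only).
EXTRACTORS = {
    ".pdf": "pdf_extractor",
    ".pptx": "pptx_extractor",
    ".ppt": "pptx_extractor",
}

def pick_extractor(file_path: str) -> str:
    """Return the extractor responsible for this file, or raise for unsupported types."""
    suffix = Path(file_path).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix}") from None
```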

Step 4: Generate Context (AI Analysis)

python -m src.cli generate-context --student-id 1

This sends the extracted content to OpenAI and generates a structured profile.

Step 5: View the Generated Context

# View as formatted table
python -m src.cli show-context --student-id 1

# View as JSON
python -m src.cli show-context --student-id 1 --format json

# View as YAML
python -m src.cli show-context --student-id 1 --format yaml

Step 6: Export Contexts

Export single student:

python -m src.cli export-student --student-id 1 --format json --output student_1.json

Export all students to Excel:

python -m src.cli export-contexts --output all_contexts.xlsx

Export all students to individual files:

python -m src.cli export-all-students --output-dir exports/

Generating Phone-Keyed JSON

The main output format uses phone numbers as keys. To generate this:

Method 1: Using the Script

python create_phone_json.py

This script:

  1. Reads student contexts from all_students.json
  2. Maps email prefixes to phone numbers from emails_names_phones.txt
  3. Outputs all_students_ph.json with phone numbers as keys
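
The mapping step can be sketched as follows. This is a simplified illustration, not the real create_phone_json.py: the 'email,name,phone' line format for emails_names_phones.txt is an assumption, and the actual script may differ in file handling and error checking:

```python
def rekey_by_phone(contexts: dict, mapping_lines: list) -> dict:
    """Re-key email-prefix-keyed contexts by phone number.

    Assumes each mapping line looks like 'email,name,phone' (an assumed
    format for emails_names_phones.txt, not a documented one).
    """
    prefix_to_phone = {}
    for line in mapping_lines:
        email, _name, phone = [part.strip() for part in line.split(",")]
        prefix_to_phone[email.split("@")[0]] = phone

    # Keep only students whose email prefix has a known phone number.
    return {
        prefix_to_phone[prefix]: context
        for prefix, context in contexts.items()
        if prefix in prefix_to_phone
    }
```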

Method 2: Manual Export

After generating contexts for all students:

# First export all to JSON
python -m src.cli export-all-students --output-dir exports/ --format json

# Then run the phone mapping script
python create_phone_json.py

Output Format

The final JSON output (all_students_ph.json) has this structure:

{
  "9876543210": {
    "student_id": 1,
    "profile": {
      "name": "John Doe",
      "skills": ["Python", "Data Analysis", "Communication"],
      "experiences": ["Internship at XYZ Corp", "Research Project"],
      "education": ["B.Sc Computer Science"],
      "strengths": ["Problem-solving", "Team collaboration"],
      "weaknesses": ["Time management"],
      "opportunities": ["Industry certifications", "Graduate studies"],
      "threats": ["Competitive job market"],
      "career_goals": ["Software Engineer at tech company"],
      "interests": ["AI/ML", "Web Development"],
      "achievements": ["Dean's List", "Hackathon Winner"]
    },
    "artifacts_summary": {
      "total_artifacts": 3,
      "artifact_types": ["resume", "swot_analysis", "career_plan"],
      "files": ["resume.pdf", "swot.pdf", "career_plan.pptx"]
    },
    "generated_at": "2025-01-07T12:00:00.000000",
    "model_used": "gpt-4o-mini"
  }
}
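
Consumers of this file can look a student up directly by phone number; for example (standard library only, with `load_profile` as a hypothetical helper name):

```python
import json

def load_profile(path, phone):
    """Return the profile dict for a given phone number, or None if absent."""
    with open(path, encoding="utf-8") as f:
        students = json.load(f)
    entry = students.get(phone)
    return entry["profile"] if entry else None
```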

Batch Processing

For processing many students at once, use:

python process_all_students.py

This script:

  1. Reads student info from a source file
  2. Creates student records in the database
  3. Processes all associated documents
  4. Generates contexts for each student
  5. Exports to JSON

CLI Command Reference

Command              Description
-------              -----------
init-db              Initialize or reset the database
add-student          Register a new student
list-students        View all registered students
process-document     Extract text from PDF/PPTX
show-artifacts       List processed documents for a student
show-content         View extracted text elements
generate-context     Generate AI profile from documents
show-context         Display generated context
export-contexts      Export all contexts to Excel
export-student       Export single student context
export-all-students  Batch export individual files

Add --help to any command for detailed options:

python -m src.cli generate-context --help

Directory Structure

guidance-agent/
├── src/
│   ├── cli.py                 # Main CLI interface
│   ├── context_generator.py   # OpenAI integration
│   ├── query_handler.py       # Query processing
│   ├── database/
│   │   ├── connection.py      # DB connection
│   │   └── models.py          # Database models
│   └── extractors/
│       ├── pdf_extractor.py   # PDF text extraction
│       └── pptx_extractor.py  # PowerPoint extraction
├── data/
│   ├── db/                    # SQLite database
│   ├── contexts/              # Generated YAML contexts
│   ├── uploads/               # Original documents
│   └── artifacts/             # Processed copies
├── exports/                   # Exported files
├── .env                       # Environment variables (create this)
├── .env.example               # Environment template
└── requirements.txt           # Python dependencies

Troubleshooting

"OPENAI_API_KEY not set"

Create a .env file with your API key (see Prerequisites).

"No artifacts found for student"

Run process-document before generate-context.

PDF extraction issues

The system uses PyMuPDF first, then falls back to pdfplumber. For scanned PDFs (images), OCR is not currently supported.
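The fallback chain described above can be modeled generically: try each backend in order and return the first non-empty result. This sketch takes the extractor functions as arguments so it stays library-agnostic; the real code calls PyMuPDF and pdfplumber directly:

```python
def extract_with_fallback(path, extractors):
    """Try each extractor in turn; return the first non-empty text it yields.

    `extractors` is an ordered list of callables, e.g. a PyMuPDF-based
    function followed by a pdfplumber-based one.
    """
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # extraction error: fall through to the next backend
        if text and text.strip():
            return text
    raise ValueError(f"No extractor produced text for {path} (scanned PDF? OCR is unsupported)")
```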

Database locked errors

Close any other processes using the database, or use --reset with init-db.


Example Workflow

# 1. Setup
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add OPENAI_API_KEY

# 2. Initialize
python -m src.cli init-db

# 3. Add student
python -m src.cli add-student --name "Jane Smith" --email "jane@example.com"

# 4. Process documents
python -m src.cli process-document --student-id 1 --file docs/jane_resume.pdf
python -m src.cli process-document --student-id 1 --file docs/jane_swot.pdf

# 5. Generate context
python -m src.cli generate-context --student-id 1

# 6. View result
python -m src.cli show-context --student-id 1 --format json

# 7. Export
python -m src.cli export-student --student-id 1 --format json --output jane_context.json
