feat: Add SMK evaluator to TypeScript SDK #21
Merged
2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
|
|
||
| To perform the task of evaluating text complexity based on Subject Matter Knowledge (SMK), strictly adhere to the following instructions. | ||
| Role | ||
| You are an expert K-12 Literacy Pedagogue and Text Complexity Evaluator. Your specific focus is analyzing Subject Matter Knowledge (SMK) demands according to the Common Core Qualitative Text Complexity Rubric. | ||
| Objective | ||
| Analyze a provided text relative to a target grade_level. You must determine the extent of background knowledge required to comprehend the text. You must distinguish between Common/Standard knowledge (generally lower/moderate complexity) and Specialized/Theoretical knowledge (generally higher complexity). | ||
| Input Data | ||
| text: The passage to analyze. | ||
| grade_level: The target student grade (integer). | ||
| fk_score: Flesch-Kincaid Grade Level. Note: Use this only as a loose proxy for sentence structure. Do not let a high FK score artificially inflate the Subject Matter Knowledge score if the concepts remain simple. | ||
|
|
||
| 1. The Rubric: Subject Matter Knowledge (SMK) | ||
| 1. Slightly Complex | ||
| Scope: Everyday, practical knowledge, and Introduction to Skills. | ||
| Concept Type: Concrete, directly observable, and familiar. | ||
| Key Indicator: "How-to" texts involving familiar objects (e.g., drawing a cupboard, playing a game, family life). Even if specific terms (like "scale" or "measure") are used, if the application is on a common object, it remains Slightly Complex. | ||
| 2. Moderately Complex | ||
| Scope: Common Discipline-Specific Knowledge or Narrative History. | ||
| Definition: Topics widely introduced in K-8 curricula (Basic American History, Geography, Earth Science, Biology). | ||
| Key Characteristic: The text bridges concrete descriptions with abstract themes (e.g., using farming to discuss justice), OR narrates historical events via sensory details. | ||
| Spatial Reasoning: Texts requiring mental manipulation of maps/routes are generally Moderate, unless the object is a familiar household item (see Slightly Complex). | ||
| 3. Very Complex | ||
| Scope: Specialized Discipline-Specific, Engineering Mechanics, or Political Theory. | ||
| Definition: Topics characteristic of High School (9-12) curricula requiring abstract mental models. | ||
| Key Characteristic: Requires understanding mechanisms (how physics works/propulsion), chemical composition, or undefined political stakes (specific treaties, alliances, or secularization without context). | ||
| 4. Exceedingly Complex | ||
| Scope: Professional or Academic knowledge. | ||
|
|
||
| 2. The Expert Mental Model (Decision Logic) | ||
| Use these refined rules to categorize cases. | ||
| Rule A: The "Layers of Meaning" Check | ||
| Concrete -> Abstract (Moderate): The text describes concrete things (farming) to argue an abstract point (justice, rights). | ||
| Concrete -> Concrete (Slightly): The text describes concrete things (lines, paper) to achieve a concrete result (drawing a cupboard). Do not over-rank practical instructions. | ||
| Rule B: The Science & Engineering Boundary | ||
| Observational (Moderate): Habitats, Water Cycle, observable traits, simple definitions. | ||
| Mechanistic/Theoretical (Very): Engineering mechanics (how propulsion works via reaction), Instrumentation (using a spectroscope), or Chemical/Atomic theory. | ||
| Test: Does the text explain how a machine functions using physical principles? If yes, it is Very Complex. | ||
| Rule C: The History/Social Studies Boundary | ||
| General/Narrative (Moderate): | ||
| Sensory: Battle descriptions focusing on sights/sounds (flashes, smoke). | ||
| Standard Topics: Immigration, Slavery, Government, Geography. Lists of nationalities or religions are "Common Knowledge" for Grades 6-8. | ||
| Political/Contextual (Very): | ||
| Implicit Context: Texts assuming knowledge of specific political factions, treaties, or the causes of events without explanation (e.g., "The Allies," "The Front," "The secularization of the clergy"). | ||
| Test: If the reader must know why two groups are fighting or the specific political history of a revolution to understand the text, it is Very Complex. | ||
| Rule D: The "Technical vs. Practical" Trap | ||
| Scenario: A text teaches a technical skill (e.g., Technical Drawing/Technology) but applies it to a familiar object (a cupboard). | ||
| Decision: Slightly Complex. | ||
| Reasoning: Do not confuse "Technical Vocabulary" (scale, thick lines) with "Theoretical Complexity." If the underlying concept is familiar (furniture), the SMK load is low. | ||
|
|
||
| 3. Critical Calibration Examples | ||
| Text: "Make a rough sketch... How many shelves should the cupboard have?" (Grade 2) -> Slightly Complex. | ||
| Reasoning: (Rule D/Rule A) Although it mentions "scale" and "technology," the task is concrete and relies on everyday knowledge. | ||
| Text: "Hydraulic propulsion works by sucking water at the bow and forcing it sternward." (Grade 10) -> Very Complex. | ||
| Reasoning: (Rule B) Explains a mechanism using physics principles. | ||
| Text: "The Allies fight the enemy's cavalry; we remember the hospitality to priests during the Revolution." (Grade 6) -> Very Complex. | ||
| Reasoning: (Rule C) Assumes undefined knowledge of WWI alliances and the specific political history of the French Revolution. | ||
| Text: "Immigrants from Poland, Italy, and Russia arrived. Most were Catholic or Orthodox." (Grade 7) -> Moderately Complex. | ||
| Reasoning: (Rule C) Standard K-8 topic. Lists of nationalities are content vocabulary, not specialized theory. | ||
|
|
||
| 4. Output Format | ||
| Return your analysis in a valid JSON object. Do not include markdown formatting. | ||
| Keys: | ||
| - identified_topics: List[str] identifying the core subjects. | ||
| - curriculum_check: String explaining if the topics are "Standard/General" (typical for K-8) or "Specialized/High School" (typical for 9-12). | ||
| - assumptions_and_scaffolding: String analyzing what the author assumes the reader knows vs what is explained. | ||
| - friction_analysis: String discussing the gap between Concrete description and Abstract meaning. | ||
| - complexity_score: String (One of: slightly_complex, moderately_complex, very_complex, exceedingly_complex). | ||
|
adnanrhussain marked this conversation as resolved.
|
||
| - reasoning: String synthesizing the decision. | ||
|
|
||
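Reviewer sketch for the output contract in section 4 of the prompt above: a minimal validator for the JSON object the model is asked to return. The key names and the four `complexity_score` values come from the prompt; `SmkModelOutput`, `parseSmkOutput`, and `toDisplayLabel` are hypothetical names, and the snake_case to display-label mapping is an assumption about how the SDK arrives at labels such as 'Very complex'. It is not taken from the SDK source.

```typescript
// Hypothetical helper names; the SDK's real types and parsing may differ.
const SMK_LEVELS = [
  'slightly_complex',
  'moderately_complex',
  'very_complex',
  'exceedingly_complex',
] as const;

type SmkLevel = (typeof SMK_LEVELS)[number];

// Mirrors the keys listed under "4. Output Format" in the prompt.
interface SmkModelOutput {
  identified_topics: string[];
  curriculum_check: string;
  assumptions_and_scaffolding: string;
  friction_analysis: string;
  complexity_score: SmkLevel;
  reasoning: string;
}

const STRING_KEYS = [
  'curriculum_check',
  'assumptions_and_scaffolding',
  'friction_analysis',
  'reasoning',
] as const;

// Parses the raw model reply and throws if it does not match the contract.
function parseSmkOutput(raw: string): SmkModelOutput {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    throw new Error('SMK output must be a JSON object');
  }
  const obj = data as Record<string, unknown>;

  const topics = obj['identified_topics'];
  if (!Array.isArray(topics) || !topics.every((t) => typeof t === 'string')) {
    throw new Error('identified_topics must be a list of strings');
  }
  for (const key of STRING_KEYS) {
    if (typeof obj[key] !== 'string') {
      throw new Error(`${key} must be a string`);
    }
  }
  const score = obj['complexity_score'];
  if (typeof score !== 'string' || (SMK_LEVELS as readonly string[]).indexOf(score) === -1) {
    throw new Error(`complexity_score must be one of: ${SMK_LEVELS.join(', ')}`);
  }
  return obj as unknown as SmkModelOutput;
}

// Assumed mapping: replace underscores with spaces and capitalize the first letter.
function toDisplayLabel(level: SmkLevel): string {
  const spaced = level.replace(/_/g, ' ');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}
```

Validating before mapping keeps a malformed or off-rubric reply from reaching callers as a typed result; whether such a failure should trigger a retry is left open here.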
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Analyze: | ||
| Text: {text} | ||
| Grade: {grade} | ||
| FK Score: {fk_score} |
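Reviewer sketch of how the four-line user prompt template above might be filled in. The placeholder names (`{text}`, `{grade}`, `{fk_score}`) are taken from the template; `renderUserPrompt` and `SmkPromptValues` are hypothetical, and the SDK may well use a prompt library instead of plain string replacement.

```typescript
// Template text copied from the user prompt file in this diff.
const USER_PROMPT_TEMPLATE = [
  'Analyze:',
  'Text: {text}',
  'Grade: {grade}',
  'FK Score: {fk_score}',
].join('\n');

// Hypothetical input shape; field names follow the template placeholders.
interface SmkPromptValues {
  text: string;
  grade: string;
  fk_score: number;
}

function renderUserPrompt(values: SmkPromptValues): string {
  const lookup: Record<string, string> = {
    text: values.text,
    grade: values.grade,
    fk_score: String(values.fk_score),
  };
  // One pass over the template: inserted values are never rescanned, so a
  // passage that happens to contain "{grade}" is left untouched. Using a
  // replacer function also avoids "$&"-style patterns being interpreted.
  return USER_PROMPT_TEMPLATE.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    const value = Object.prototype.hasOwnProperty.call(lookup, key) ? lookup[key] : undefined;
    return value === undefined ? match : value;
  });
}
```

Note that the template uses `{grade}` while the system prompt calls the same input `grade_level`; the sketch follows the template.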
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -117,19 +117,18 @@ await evaluator.evaluate(text: string, grade: string) | |
|
|
||
| --- | ||
|
|
||
| ### 3. Text Complexity Evaluator | ||
| ### 3. Subject Matter Knowledge (SMK) Evaluator | ||
|
|
||
| Composite evaluator that analyzes both vocabulary and sentence structure complexity in parallel. | ||
| Evaluates the background knowledge demands of educational texts relative to grade level. Determines how much prior subject knowledge a student needs to comprehend the text, based on the Common Core Qualitative Text Complexity Rubric. | ||
|
Fix please |
||
|
|
||
| **Supported Grades:** 3-12 | ||
|
|
||
| **Uses:** Google Gemini 2.5 Pro + OpenAI GPT-4o (composite) | ||
| **Uses:** Google Gemini 3 Flash Preview | ||
|
|
||
| **Constructor:** | ||
| ```typescript | ||
| const evaluator = new TextComplexityEvaluator({ | ||
| const evaluator = new SmkEvaluator({ | ||
| googleApiKey?: string; // Google API key (required by this evaluator) | ||
| openaiApiKey?: string; // OpenAI API key (required by this evaluator) | ||
| maxRetries?: number; // Optional - Max retry attempts (default: 2) | ||
| telemetry?: boolean | TelemetryOptions; // Optional (default: true) | ||
| logger?: Logger; // Optional - Custom logger | ||
|
|
@@ -145,23 +144,103 @@ await evaluator.evaluate(text: string, grade: string) | |
| **Returns:** | ||
| ```typescript | ||
| { | ||
| score: { | ||
| overall: string; // Overall complexity (highest of the two) | ||
| vocabulary: string; // Vocabulary complexity score | ||
| sentenceStructure: string; // Sentence structure complexity score | ||
| score: 'Slightly complex' | 'Moderately complex' | 'Very complex' | 'Exceedingly complex'; | ||
| reasoning: string; | ||
| metadata: { | ||
| model: string; | ||
| processingTimeMs: number; | ||
| }; | ||
| reasoning: string; // Combined reasoning from both evaluators | ||
| metadata: EvaluationMetadata; | ||
| _internal: { | ||
| vocabulary: EvaluationResult | { error: Error }; | ||
| sentenceStructure: EvaluationResult | { error: Error }; | ||
| identified_topics: string[]; | ||
| curriculum_check: string; | ||
| assumptions_and_scaffolding: string; | ||
| friction_analysis: string; | ||
| complexity_score: 'Slightly complex' | 'Moderately complex' | 'Very complex' | 'Exceedingly complex'; | ||
| reasoning: string; | ||
| }; | ||
| } | ||
| ``` | ||
|
|
||
| **Example:** | ||
| ```typescript | ||
| import { SmkEvaluator } from '@learning-commons/evaluators'; | ||
|
|
||
| const evaluator = new SmkEvaluator({ | ||
| googleApiKey: process.env.GOOGLE_API_KEY, | ||
| }); | ||
|
|
||
| const result = await evaluator.evaluate( | ||
| "Hydraulic propulsion works by sucking water at the bow and forcing it sternward.", | ||
| "10" | ||
| ); | ||
| console.log(result.score); // "Very complex" | ||
| console.log(result.reasoning); | ||
| console.log(result._internal.identified_topics); // ["hydraulics", "propulsion", "physics"] | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ### 4. Text Complexity Evaluator | ||
|
|
||
| Composite evaluator that analyzes vocabulary, sentence structure, and subject matter knowledge complexity in parallel. | ||
|
|
||
| **Supported Grades:** 3-12 | ||
|
|
||
| **Uses:** Google Gemini 2.5 Pro + Google Gemini 3 Flash Preview + OpenAI GPT-4o (composite) | ||
|
|
||
| **Constructor:** | ||
| ```typescript | ||
| const evaluator = new TextComplexityEvaluator({ | ||
| googleApiKey?: string; // Google API key (required by this evaluator) | ||
| openaiApiKey?: string; // OpenAI API key (required by this evaluator) | ||
| maxRetries?: number; // Optional - Max retry attempts (default: 2) | ||
| telemetry?: boolean | TelemetryOptions; // Optional (default: true) | ||
| logger?: Logger; // Optional - Custom logger | ||
| logLevel?: LogLevel; // Optional - Logging verbosity (default: WARN) | ||
| }); | ||
| ``` | ||
|
|
||
| **API:** | ||
| ```typescript | ||
| await evaluator.evaluate(text: string, grade: string) | ||
| ``` | ||
|
|
||
| **Returns:** | ||
| ```typescript | ||
| { | ||
| vocabulary: EvaluationResult<TextComplexityLevel> | { error: Error }; | ||
| sentenceStructure: EvaluationResult<TextComplexityLevel> | { error: Error }; | ||
| subjectMatterKnowledge: EvaluationResult<TextComplexityLevel> | { error: Error }; | ||
| } | ||
| ``` | ||
|
|
||
| Each sub-evaluator result is either a full `EvaluationResult` or `{ error: Error }` if that evaluator failed. An error is only thrown if all three fail. | ||
|
|
||
| **Example:** | ||
| ```typescript | ||
| import { TextComplexityEvaluator } from '@learning-commons/evaluators'; | ||
|
|
||
| const evaluator = new TextComplexityEvaluator({ | ||
| googleApiKey: process.env.GOOGLE_API_KEY, | ||
| openaiApiKey: process.env.OPENAI_API_KEY, | ||
| }); | ||
|
|
||
| const result = await evaluator.evaluate("Your text here", "6"); | ||
|
|
||
| if (!('error' in result.vocabulary)) { | ||
| console.log('Vocabulary:', result.vocabulary.score); | ||
| } | ||
| if (!('error' in result.sentenceStructure)) { | ||
| console.log('Sentence structure:', result.sentenceStructure.score); | ||
| } | ||
| if (!('error' in result.subjectMatterKnowledge)) { | ||
| console.log('Subject matter knowledge:', result.subjectMatterKnowledge.score); | ||
| } | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ### 4. Grade Level Appropriateness Evaluator | ||
| ### 5. Grade Level Appropriateness Evaluator | ||
|
|
||
| Determines appropriate grade level for text. | ||
|
|
||
|
|
@@ -308,6 +387,7 @@ interface BaseEvaluatorConfig { | |
| **Note:** Which API keys are required depends on the evaluator. The SDK validates required keys at runtime based on the evaluator's metadata: | ||
| - **Vocabulary**: Requires both `googleApiKey` and `openaiApiKey` | ||
| - **Sentence Structure**: Requires `openaiApiKey` only | ||
| - **Subject Matter Knowledge**: Requires `googleApiKey` only | ||
| - **Text Complexity**: Requires both `googleApiKey` and `openaiApiKey` | ||
| - **Grade Level Appropriateness**: Requires `googleApiKey` only | ||
|
|
||
|
|
||
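Reviewer sketch of the per-evaluator key requirements listed in the note above, written as a lookup table plus a check. The requirements themselves are copied from the README bullets; `REQUIRED_KEYS`, `EvaluatorName`, `missingKeys`, and `assertKeys` are hypothetical names, and the SDK's actual metadata-driven validation may be structured differently.

```typescript
type ApiKeyName = 'googleApiKey' | 'openaiApiKey';

// Hypothetical identifiers for the five evaluators named in the README note.
type EvaluatorName =
  | 'vocabulary'
  | 'sentenceStructure'
  | 'subjectMatterKnowledge'
  | 'textComplexity'
  | 'gradeLevelAppropriateness';

// Requirements as stated in the README note.
const REQUIRED_KEYS: Record<EvaluatorName, readonly ApiKeyName[]> = {
  vocabulary: ['googleApiKey', 'openaiApiKey'],
  sentenceStructure: ['openaiApiKey'],
  subjectMatterKnowledge: ['googleApiKey'],
  textComplexity: ['googleApiKey', 'openaiApiKey'],
  gradeLevelAppropriateness: ['googleApiKey'],
};

interface KeyConfig {
  googleApiKey?: string;
  openaiApiKey?: string;
}

// Returns the required keys that are absent or blank for the given evaluator.
function missingKeys(evaluator: EvaluatorName, config: KeyConfig): ApiKeyName[] {
  return REQUIRED_KEYS[evaluator].filter((name) => {
    const value = config[name];
    return value === undefined || value.trim() === '';
  });
}

// Throws a single error naming every missing key, so callers can fix all at once.
function assertKeys(evaluator: EvaluatorName, config: KeyConfig): void {
  const missing = missingKeys(evaluator, config);
  if (missing.length > 0) {
    throw new Error(`${evaluator} requires: ${missing.join(', ')}`);
  }
}
```

Keeping the requirements in one table means a new evaluator, such as the SMK evaluator added in this PR, only needs one extra entry.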
@gary-mu - Is this "Common Core"?
This is verbatim from the optimized prompt, but I think we can simplify it to just "Qualitative Text Complexity Rubric" without "Common Core".
In effect, qualitative text complexity is a Common Core concept: https://www.thecorestandards.org/ELA-Literacy/standard-10-range-quality-complexity/measuring-text-complexity-three-factors/
Let's review and update this separately; we will have to update it in a few different places.