`model_file = os.path.join(shared_model_path, "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")` uses `shared_model_path`, which points to the read-write dir (not the read dir). This could confuse students with similar setups down the line if taken out of context.
The text says "This notebook assumes that at least one 'Small model' file ending in .gguf has already been downloaded into a directory (see GPT4All_Download_gguf.ipynb for more)." Should this be "see `1-2-HuggingFace_Hub_Download_gguf.ipynb` for more"?
`chat_format="chatml" # Qwen uses ChatML format` suggests that the chat format is determined by the model, but otherwise this wasn't apparent to me. It does end up being addressed later, so that discussion could be moved up (probably not important) and/or a note could be added here about how to determine the proper format.
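For readers wondering what "ChatML format" actually means, a minimal sketch of the prompt layout may help (illustrative only; when `chat_format="chatml"` is set, llama-cpp-python builds this string for you from the messages list):

```python
def to_chatml(messages):
    # ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers;
    # Qwen (and several other models) were trained on this layout.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # The trailing assistant header tells the model it is its turn to speak.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

print(to_chatml([{"role": "user", "content": "Hi"}]))
```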
Suggest changing "how a language model actually works with numbers" to "how a language model actually works with numbers (embeddings)".
Cells with `## 1. Environment Setup`, `## 5. Inside a GGUF File`, `## 7. How Concepts Map to Numbers: Token Embeddings`, and `## 8. Putting It All Together: The Full Pipeline` have `---` at the beginning, so they don't render correctly.
The first time you see a tokenizer break a word into smaller pieces is mid-way through the notebook, with "comparative." Is there a quick way to introduce this sooner? Even just showing that word tokenized by itself early on would help.
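For an early illustration, even a toy longest-match tokenizer makes the idea concrete (this is not real BPE — real tokenizers learn merge rules from data — but the output shape is similar; the vocabulary here is invented for the example):

```python
def greedy_subword_tokenize(word, vocab):
    # Toy subword tokenizer: repeatedly take the longest vocabulary
    # match, falling back to a single character when nothing matches.
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"compar", "ative", "comp", "ar", "a", "tive"}
print(greedy_subword_tokenize("comparative", vocab))  # ['compar', 'ative']
```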
I don't think I'm getting correct numbers printed in section 5.1 Reading GGUF Metadata when I run the notebook... the context length can't be 3, right?!
Similarly, the output just below that doesn't feel right either (and if it is in fact correct, it could use additional explanatory text).
Is it worth mentioning "size (bytes on disk — much smaller than the raw float32 equivalent)" in 5.3 if this isn't actually printed out?
The cell in 7.1 doesn't run for me. Also, you may want to silence the deprecation warning.
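If you do want to suppress the warning, the standard-library `warnings` context manager is one option (the wrapper function here is my own suggestion, not the notebook's code; you could also just wrap the one offending call):

```python
import warnings

def call_quietly(fn, *args, **kwargs):
    # Run fn with DeprecationWarnings suppressed; other warning
    # categories still get through.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return fn(*args, **kwargs)
```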
The next steps at the end of the notebook don't make sense: `Inside_Small_Model.ipynb` doesn't exist in the cloned repo, and `LlamaCpp_SmallLM_Demo.ipynb` is numbered 2.1 (i.e., before this notebook, 2.2).
The Experiment 1 haiku cell doesn't run for me (sonnet is OK): `NotFoundError: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-haiku-4-20250514'}, 'request_id': 'req_011CYsz6HZhhv1XJuj5STKya'}`. Changing to `haiku_model_id = "claude-3-haiku-20240307"` worked for me to run the rest of the notebook.
The discussion of temperature slightly conflicts with notebook 2.1, at least when it comes to setting the temperature to 1.
The cell that goes with Putting It All Together doesn't run for me: `BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'stop_sequences: each stop sequence must contain non-whitespace'}, 'request_id': 'req_011CYszyuv9rRMfYwWhotTnS'}`. Removing the `stop_sequences` parameter makes the cell run.
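If the stop sequences are meant to stay, one defensive fix is to filter out empty or whitespace-only entries before the request, since the API rejects those (the function name is my suggestion, not from the notebook):

```python
def clean_stop_sequences(stops):
    # The API requires every stop sequence to contain non-whitespace;
    # drop entries like "\n\n" or "  " that would trigger a 400.
    return [s for s in stops if s.strip()]

print(clean_stop_sequences(["\n\n", "END", "  "]))  # ['END']
```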
The cell that begins with `# Send a chat message to GPT-4o-mini` should be removed or moved down, because it doesn't belong in the Checking Available Models section.
The two cells after the interactive widget are incredibly similar. Do you need both? Can you provide more context for both/either?
Reflection checkpoints are mentioned at the beginning, and the framing is odd out of context / on first read. It also appears that only Checkpoint 3 actually writes to an external file...
The tables of SAT questions include the columns `visuals.type` and `visuals.svg_content`, which are all NaNs. I'd exclude these from the output.
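Dropping all-NaN columns before display is a one-liner with pandas (the toy frame below just mimics the columns mentioned; the real DataFrame would come from the notebook's data load):

```python
import pandas as pd

df = pd.DataFrame({
    "question": ["Q1", "Q2"],
    "visuals.type": [float("nan")] * 2,
    "visuals.svg_content": [float("nan")] * 2,
})
# Drop any column whose values are entirely NaN.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['question']
```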
The dataframes created with `pd.concat()` seem to contain each question twice. What's going on there? Either more explanation is needed or the duplicates should be removed. I don't think the dataframes are even used...
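If the duplication is unintentional, de-duplicating right after the concat is straightforward (toy frame below; the real `subset` column would be whatever uniquely identifies a question):

```python
import pandas as pd

qs = pd.DataFrame({"question": ["Q1", "Q2"]})
combined = pd.concat([qs, qs], ignore_index=True)   # each row now appears twice
deduped = combined.drop_duplicates(subset="question", ignore_index=True)
print(len(combined), len(deduped))  # 4 2
```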
The Ask the model - English code cell doesn't define `random_para_text`. It should include `random_para_text = random_entry["question"]["paragraph"]`.
Cells with `## Our Knowledge Base`, `## Finding the Right Document`, `### The Real Test: Different Words, Same Meaning`, `### Visualizing Meaning Space`, `## The Full Pipeline: Retrieve + Generate`, `## Try It Yourself!`, and `## Key Takeaways` have `---` at the beginning, so they don't render correctly.
`ls` lists all the files in a given dir in Step 1. You're just trying to show that the dir exists, right?

The keys in `.env` are capitalized, but later you use `openai_API_KEY`. It still loads, but it's an odd inconsistency.