Add tokenizer_config support to LoRA YAML configuration #688

breitburg · 2025-12-17T17:00:01Z

This PR adds support for specifying custom tokenizer configuration, including custom chat templates, directly in LoRA YAML config files.

Changes

Add tokenizer_config to CONFIG_DEFAULTS with trust_remote_code default
Support chat_template_file to load Jinja2 templates from file paths
Support inline chat_template strings
Update documentation with usage examples
Add comprehensive unit tests

Usage

Users can now specify tokenizer configuration in their YAML config:

# Option 1: Load chat template from file
tokenizer_config:
  trust_remote_code: true
  chat_template_file: "path/to/chat_template.jinja"

# Option 2: Inline chat template string
tokenizer_config:
  trust_remote_code: true
  chat_template: "{% for message in messages %}..."
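
As a rough illustration of how the two options could be resolved into a single tokenizer configuration, here is a minimal sketch. It is not the code from this PR: `resolve_tokenizer_config` and `TOKENIZER_CONFIG_DEFAULTS` are hypothetical names, the default value of `trust_remote_code` is assumed, and rejecting a config that sets both keys is an assumption about behaviour the description does not specify.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical default. The PR only states that a trust_remote_code
# default exists, not what its value is.
TOKENIZER_CONFIG_DEFAULTS: Dict[str, Any] = {"trust_remote_code": True}


def resolve_tokenizer_config(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge user settings over defaults and inline chat_template_file."""
    config = {**TOKENIZER_CONFIG_DEFAULTS, **(raw or {})}

    # chat_template_file is a convenience key: read the Jinja2 source
    # and hand it on as a plain chat_template string.
    template_file = config.pop("chat_template_file", None)
    if template_file is not None:
        if "chat_template" in config:
            # Assumed policy: the two options are mutually exclusive.
            raise ValueError(
                "Set either chat_template or chat_template_file, not both"
            )
        config["chat_template"] = Path(template_file).read_text(encoding="utf-8")

    return config
```

Under these assumptions, both YAML options end up as the same shape of dictionary (a `trust_remote_code` flag plus a `chat_template` string), so the tokenizer-loading code only has to handle one case.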

Testing

Added unit tests for both file-based and inline chat templates
All existing tests pass
Code formatted with black via pre-commit

With this change, users no longer need to modify source code to use custom chat templates during LoRA fine-tuning.

