
@breitburg

This PR adds support for specifying custom tokenizer configuration, including custom chat templates, directly in LoRA YAML config files.

Changes

  • Add tokenizer_config to CONFIG_DEFAULTS, with a default value for trust_remote_code
  • Support chat_template_file to load Jinja2 templates from file paths
  • Support inline chat_template strings
  • Update documentation with usage examples
  • Add unit tests covering file-based and inline chat templates
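A minimal sketch of how a tokenizer_config entry in CONFIG_DEFAULTS could be overlaid with the section a user writes in their YAML file. The default value `False` for trust_remote_code and the helper name `merge_tokenizer_config` are assumptions for illustration, not the code in this PR.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Hypothetical defaults: the real CONFIG_DEFAULTS may use a different value.
CONFIG_DEFAULTS: Dict[str, Any] = {
    "tokenizer_config": {"trust_remote_code": False},
}


def merge_tokenizer_config(user_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the default tokenizer_config overlaid with user-supplied keys.

    deepcopy keeps CONFIG_DEFAULTS itself from being mutated by the update.
    """
    merged = deepcopy(CONFIG_DEFAULTS["tokenizer_config"])
    if user_config:
        merged.update(user_config)
    return merged


# A config that only sets an inline template keeps the default trust_remote_code.
print(merge_tokenizer_config({"chat_template": "{{ messages }}"}))
```

Copying the defaults before updating means repeated calls never leak one run's settings into the next.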

Usage

Users can now specify tokenizer configuration in their YAML config:

```yaml
# Option 1: Load chat template from file
tokenizer_config:
  trust_remote_code: true
  chat_template_file: "path/to/chat_template.jinja"
```

```yaml
# Option 2: Inline chat template string
tokenizer_config:
  trust_remote_code: true
  chat_template: "{% for message in messages %}..."
```
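One way the two options could be reduced to a single template string before the tokenizer is loaded. The function name `resolve_chat_template` and the choice to raise an error when both keys are set are assumptions; the PR description does not say how that case is handled.

```python
from pathlib import Path
from typing import Any, Dict


def resolve_chat_template(tokenizer_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace chat_template_file with the file's text.

    Returns a new dict; the input dict is left untouched.
    """
    config = dict(tokenizer_config)
    template_file = config.pop("chat_template_file", None)
    if template_file is not None:
        if "chat_template" in config:
            # Assumption: setting both keys is treated as a configuration error.
            raise ValueError("Set either chat_template or chat_template_file, not both")
        # Read the Jinja2 template from disk and store it as an inline string.
        config["chat_template"] = Path(template_file).read_text(encoding="utf-8")
    # An inline chat_template (Option 2) passes through unchanged.
    return config
```

For example, `resolve_chat_template({"trust_remote_code": True, "chat_template_file": "path/to/chat_template.jinja"})` would return a dict whose `chat_template` holds the contents of that file, so downstream code only has to handle one key.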

Testing

  • Added unit tests for both file-based and inline chat templates
  • All existing tests pass
  • Code formatted with black via pre-commit

Removes the need to modify source code in order to use custom chat templates during LoRA fine-tuning.
