Port ModernBERT's Flash Attention Implementation for Training #26

Description

@vishalbakshi

For our own implementation of Llama/Qwen, we'll want to follow the ModernBERT Flash Attention code to handle inputs that are not fp16 or bf16, since the Flash Attention kernels only accept those two dtypes.
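
A rough sketch of the dtype handling this refers to, as I understand the pattern: if the inputs are not fp16/bf16, cast them to a supported dtype, run attention, then cast the output back to the original dtype. This is not ModernBERT's actual code. `stand_in_flash_attention`, `attention_with_dtype_guard`, and the `target_dtype` default are hypothetical names and choices for illustration, and the stand-in is plain PyTorch attention so the snippet runs on CPU without `flash_attn` installed.

```python
import math
import torch

# Dtypes the Flash Attention kernels accept.
SUPPORTED_DTYPES = (torch.float16, torch.bfloat16)


def stand_in_flash_attention(q, k, v):
    """Hypothetical stand-in for a Flash Attention kernel call.

    Plain softmax attention, but it rejects unsupported dtypes
    the same way the real kernels would.
    """
    if q.dtype not in SUPPORTED_DTYPES:
        raise TypeError(f"unsupported dtype: {q.dtype}")
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


def attention_with_dtype_guard(q, k, v, attn_fn=stand_in_flash_attention,
                               target_dtype=torch.bfloat16):
    """Cast to a supported dtype only when needed, then cast back."""
    orig_dtype = q.dtype
    if orig_dtype in SUPPORTED_DTYPES:
        # Already supported: call the kernel directly, no extra copies.
        return attn_fn(q, k, v)
    # Assumption: bf16 is an acceptable compute dtype for the model.
    out = attn_fn(q.to(target_dtype), k.to(target_dtype), v.to(target_dtype))
    return out.to(orig_dtype)


# fp32 inputs, e.g. when the model runs outside autocast.
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
out = attention_with_dtype_guard(q, k, v)
print(out.dtype, tuple(out.shape))
```

In a real port, `attn_fn` would wrap the actual `flash_attn` call for Llama/Qwen. The round trip through bf16 loses precision relative to fp32 attention, so the output will not match an fp32 reference exactly.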
