Name and Version
v0.3.0
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 3090
Models
No response
Problem description & steps to reproduce
Token throughput (tok/s) is low and GPU utilization stays at only 70-80%, which suggests significant bottlenecks elsewhere in the pipeline.
First Bad Commit
No response
Relevant log output
Logs