LlamaFIM is a VS Code extension that provides real‑time inline completions powered by a local Llama.CPP server. It works great for developers who want an on‑premises LLM assistant without sending data to the cloud.
The extension implements the inline completion provider API introduced in VS Code 1.78 and forwards user input to a running Llama.CPP endpoint. The server returns an infill response containing the next chunk of text, which the extension then displays as an inline suggestion.
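As a rough sketch of the exchange, the client sends the text before and after the cursor to the server's `/infill` endpoint. The field names below follow the llama.cpp server API; the helper itself is illustrative, not the extension's actual code:

```typescript
// Illustrative helper (not the extension's actual code) that builds the
// request body for llama.cpp's /infill endpoint.
interface InfillRequest {
  input_prefix: string; // text before the cursor
  input_suffix: string; // text after the cursor
  n_predict: number;    // cap on the number of generated tokens
}

function buildInfillRequest(
  prefix: string,
  suffix: string,
  nPredict = 64
): InfillRequest {
  return { input_prefix: prefix, input_suffix: suffix, n_predict: nPredict };
}
```

The extension would POST this body as JSON; in the llama.cpp server API the generated text comes back in the response's `content` field.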
- Modern inline completion experience (no separate suggestion list).
- Lightweight client – all heavy lifting occurs on the local Llama.CPP server.
- Configurable debounce delay, timeout, and server URL.
- Automatic request cancellation when the cursor moves.
- Built‑in request timeout to avoid hanging requests.
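The cancellation behaviour can be pictured with a small sketch (an assumed pattern, not the extension's actual implementation): each new completion request aborts the previous in-flight one via an `AbortController`.

```typescript
// Sketch of the cancel-on-new-request pattern: whenever the cursor moves
// and a fresh completion is requested, the previous request is aborted.
class RequestGate {
  private controller: AbortController | null = null;

  // Returns a signal for the new request, aborting any earlier one.
  next(): AbortSignal {
    this.controller?.abort();            // cancel the previous in-flight request
    this.controller = new AbortController();
    return this.controller.signal;       // pass to fetch(url, { signal })
  }
}
```

Passing the returned signal to `fetch` causes the stale request to reject promptly instead of racing the newer one.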
- Install the extension from the VS Code Marketplace, or clone the repository and open it with `code .`.
- Ensure a Llama.CPP server is running and reachable. By default the extension expects the endpoint at `http://localhost:8080`. The server can be started with:

```shell
./llama.cpp/main -m <model.gguf> --port 8080
```
- Reload VS Code or run Reload Window.
The extension exposes a handful of workspace settings under the `llamafim` namespace. Open `settings.json` and add any of the following options:
| Setting | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Set to `false` to disable the extension entirely. |
| `debouncedelay` | number | `250` | Milliseconds to debounce inline completion requests. |
| `url` | string | `http://localhost:8080` | Base URL of the Llama.CPP server (without `/infill`). |
| `timeout` | number | `3500` | Request timeout in milliseconds. |
| `contextsize` | number | `4096` | Maximum context size (in tokens) sent to the Llama.CPP server. |
Example:

```json
{
  "llamafim.enabled": true,
  "llamafim.debouncedelay": 200,
  "llamafim.url": "http://127.0.0.1:8080",
  "llamafim.timeout": 5000,
  "llamafim.contextsize": 4096
}
```
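To illustrate how such settings might be validated before use, here is a sketch of a normalisation step with the defaults from the table above (an assumption about what `src/config.ts` does, not its actual code):

```typescript
// Illustrative normalisation of the llamafim.* settings: fill in defaults
// and clamp values to sane ranges before the provider uses them.
interface FimConfig {
  enabled: boolean;
  debouncedelay: number;
  url: string;
  timeout: number;
  contextsize: number;
}

function normalise(raw: Partial<FimConfig>): FimConfig {
  return {
    enabled: raw.enabled ?? true,
    debouncedelay: Math.max(0, raw.debouncedelay ?? 250),
    url: (raw.url ?? "http://localhost:8080").replace(/\/+$/, ""), // strip trailing slashes
    timeout: Math.max(1, raw.timeout ?? 3500),
    contextsize: Math.max(1, raw.contextsize ?? 4096),
  };
}
```

Stripping trailing slashes from `url` keeps the later concatenation with `/infill` from producing a double slash.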
Once configured, simply type in any file. After you pause for the debounce delay, the extension sends the surrounding context to the server and displays the returned text as an inline suggestion. Accept the suggestion with `Tab`, or reject it by continuing to type.
The provider is registered for all languages ({ pattern: '**' }).
When the extension is active a status bar item appears on the right.
Clicking it toggles the provider's enabled state; the next inline suggestion will be shown or suppressed accordingly. The toggle is runtime-only: the `llamafim.enabled` setting only determines the initial state when VS Code starts.
A quick start guide for contributing:
```shell
# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Run tests (if available)
npm test

# Start a watch build while you edit
npm run watch
```

The project uses ESBuild for bundling and TSLint/ESLint for linting.
- `src/extension.ts` – Entry point; registers the provider.
- `src/provider.ts` – Implements request logic and cancellation.
- `src/config.ts` – Reads and normalises VS Code settings.
- `src/defs.ts` – Type definitions for the Llama.CPP response.
The test suite lives under `test/`. It uses Mocha and Chai. To run the tests:

```shell
npm test
```

The current tests cover configuration parsing and inline completion logic with mocked `fetch` responses.
Pull requests are welcome! Please:
- Fork the repository.
- Create a feature branch.
- Run the test suite and ensure all tests pass.
- Submit a pull request.
Before submitting, run the linter:
```shell
npm run lint
```

This project is licensed under the MIT License. See the LICENSE file for details.
- Llama.CPP – the lightweight inference engine.
- VS Code Extension API
Tip – If you experience performance issues, consider lowering `n_predict` or increasing `debouncedelay` in the settings.