LMLocal is a local AI chat assistant for Visual Studio 2022/2026. It integrates with LM Studio, Ollama, Jan and OpenAI-compatible APIs to provide context-aware assistance.
Note
Safe & Controlled: LMLocal is strictly read-only by default. Writing tools are completely optional, must be turned on in Settings, and can be rolled back in one click.
⚡ Preview Release: This extension is currently in preview; features and UI are actively evolving.
Interface & User Experience
- ☁️ In-IDE Chat UI – Tool window for LLM interaction without switching applications.
- 🌊 Streaming Responses – Real-time token delivery for low-latency feedback.
- 🤖 Model Selection – Quick access to switch between available AI models directly from the chat interface.
- 🎨 Visual Themes – Multi-theme support (Dark, Mid-Dark, Mid-Light, Light).
- 📋 Quick Copy – A button above code blocks that copies the code to your clipboard.
↕️ Collapse Large Code Blocks – Limits the height of long code snippets with a scrollbar and an expand option.- 🎭 Role-Based Presets (Instructions) – A window with pre-defined AI presets. You can customize each preset's system prompt and temperature, or toggle them on/off.
Context & Solution Awareness
- 🛠️ Advanced AI Tool Integration – Allows the AI to deeply analyze your open solution, read file contents, and execute actions like building the solution, formatting documents, or running unit tests.
- 📝 Automated Code Editing – Enabled tools can automatically create, delete, or modify code files directly inside Visual Studio.
- 🛡️ Changes & Rollback Manager – Shows all file modifications in a dedicated real-time panel above the chat, allowing you to review diffs, accept changes, or roll them back in one click.
- ➕ Active Window Context – Dedicated "+" button to instantly include active editor content in the request.
- 🧠 Thought/Reasoning Support – Support for reasoning models; "thoughts" are displayed in expandable blocks.
- 🛡️ Smart History Buffering – Automatically hides messages beyond the 200-entry limit to keep the UI responsive.
Efficiency & Token Management
- 📉 Conversation Summarization – Condenses older messages into a concise overview when the conversation grows long.
- 🧹 History Optimization – Strips markdown formatting and trims extra whitespace to reduce token usage.
- 📊 Live Stats – Status bar metrics: real-time speed (tokens/sec) and total token count.
Infrastructure & Settings
- ⚙️ Persistent Settings – Centralized configuration for API URLs, stream timeouts, and history management.
- 🔌 Connect on Startup – Automatically connects to the LLM server on extension startup.
- ⏳ Customizable Timeout – Adjustable streaming inactivity limit for slower local models (0 = never timeout).
- 📂 Local Chat Logging – Saves all conversations to disk in
%LOCALAPPDATA%\LMLocalChat\ChatHistory\for future reference in.jsonlformat. - 🌐 Streamable MCP Support – Integrates with the Model Context Protocol to dynamically scale the AI's toolkit via both local process-based (
stdio) and remote network-based (http) transports.
To use LMLocal, ensure you have:
- Visual Studio 2022 or 2026
- One of the following backends (installed and running):
- LM Studio with local server at
http://127.0.0.1:1234 - Ollama with server at
http://127.0.0.1:11434and a loaded model - Jan with server at
http://127.0.0.1:1337 - Any OpenAI-compatible API (custom URL and optional key)
- LM Studio with local server at
- A chat-capable LLM loaded
- Open Visual Studio.
- Go to
Extensions>Manage Extensions. - Search for LM Local and click Download.
- Restart Visual Studio to complete the installation.
- Download the
.vsixfile from the Marketplace. - Double-click the file and follow the VSIX Installer prompts.
- Launch: Open the LM Local Chat tool window using one of the following methods:
- Method A: Open it directly from the top Extensions menu.
- Method B: In the top menu, go to View ➔ Other Windows ➔ LM Local Chat.
- Position the Window (Optional): Click and drag the opened window to dock it wherever is most convenient for your workflow—for example, right next to the Solution Explorer.
- Configure Your Provider:
- Click the menu icon (
…) and open Settings.... - Under the AI Provider section, select your preferred backend from the dropdown menu:
- LM Studio (local) – Automatically targets
http://127.0.0.1:1234 - Ollama (local) – Automatically targets
http://127.0.0.1:11434 - Jan (local) – Automatically targets
http://127.0.0.1:1337 - OpenAI compatible (custom) – Allows you to supply a custom base URL and authorization keys for remote endpoints or custom gateways.
- LM Studio (local) – Automatically targets
- Note: Choosing a local provider automatically configures the correct default port and endpoint structure.
- Tip: If you have multiple providers, it is recommended to set them up first via the "Providers..." menu option.
- Click the menu icon (
- Verify the Connection:
- Click the "Test" button located directly to the right of the API Base URL input field.
- This instantly pings the specified endpoint to verify if the server is active, accessible, and correctly responding.
- Select an Instruction Preset (Optional): Open the AI Instructions... window from the menu to select from pre-defined AI presets.
- Each preset has its own pre-configured system prompt and temperature.
- You can toggle individual presets or parameters on/off. If a custom instruction or preset is disabled, it will be automatically hidden in the main chat selection dropdown.
- Context (Optional): Click the
+button to include the entire content of the active document into the conversation. - Chat: Type your message and click Send or hit
Enter⌨️.
- Keyboard Shortcuts: Standard hotkeys work perfectly inside the chat window—use
Ctrl + Cto copy text andCtrl + Vto paste your messages (the right-click context menu is disabled). - Copying AI Code: To copy code blocks generated by the model, click the
Copybutton located in the top-right corner of the code block. - Model Reasoning: The model's internal thinking process is neatly hidden inside the collapsible
Thoughtsblock at the beginning of the response. Click it anytime to expand and view the full logic. - Token & Context Tracking: Hover your mouse over the top connection bar (where the model name is shown). If supported by your provider (like LM Studio), a tooltip will appear showing exactly how many tokens have been consumed out of the maximum available context limit.
- Model Selection: Click the model name in the top header to open the Select model window, where you can search, filter, and quickly switch between available models.
- ➕ Active Window Context: Click the
+button to instantly include the entire content of the file currently open within your active Visual Studio solution.- Auto-turn off: The button automatically deactivates after the request is sent, as the document becomes part of the active chat history.
- UI & Logs: The attached file content is kept hidden to avoid cluttering the chat UI, but it is tracked and visible in the extension logs.
- ⏹️ Stop – Cancel an active generation.
- 🗑️ Clear chat... – Click the menu icon (
…) to wipe the current session history and start fresh. Use this to clear the chat context if the history is consuming too many tokens.
The "AI Instructions..." window allows you to define specialized System Prompts (roles) and creativity levels (temperature) for different development tasks. The extension comes with pre-configured behavior templates like Default, Improve, Review, Plan, Bugfix, Explain, and Tests.
Tip
Performance & Accuracy Tip: > For the best results, always select your desired mode (e.g., Bugfix, Review, Explain) before sending your message.
Once configured, you can instantly switch between these system roles using the dropdown menu directly in the main chat bar.
- Click the menu icon (
…) and select "AI Instructions...". - Select a target mode/role from the left panel (e.g.,
RevieworBugfix). - Configure its behavior in the right panel:
- Mode Toggle Checkbox: Check or uncheck this box to show or hide this specific mode in your main chat bar dropdown.
- System Prompt: Enter the base instructions that define the AI's role, processing rules, and operational constraints (e.g., telling the
Testsmode to act as a QA Engineer and strictly generate xUnit tests in C#). - Temperature: Set the randomness/creativity threshold. Use values closer to
0(e.g.,0.1or0.2) for rigid, deterministic tasks like compiling and bug fixing, and closer to1for architectural planning or brainstorming.
💡 Note: Always check your specific model's official documentation for recommended temperature settings, as some local models require strict defaults or a value of
0to function properly without breaking formatting or structure. - Click Save to apply the changes to your chat environment.
The "Providers..." dialog allows you to create and save multiple provider profiles (servers) so you don't have to re-enter your API keys and base URLs every time. You can store as many profiles as you need, including both local servers (like Ollama running on your machine) and cloud remote services (like Groq, OpenAI, or Gemini).
Once configured, you can seamlessly switch between your saved profiles via the main settings.
🔒 Privacy & Data Usage Note: Unlike local servers which keep 100% of your data offline on your machine, cloud remote providers process your requests on external servers. Data retention policies vary significantly by provider - some services may use your prompt history and codebase context for model training by default. Always verify the provider's privacy policy and terms of service before transmitting proprietary or sensitive source code.
Here is a quick end-to-end example of how to configure a custom remote endpoint and activate it inside the extension.
- Click the menu icon (
…) and select "Providers...". - Click "+ Add Profile" and fill in the fields:
- Profile name:
Ollama cloud - Provider type: Select OpenAI compatible from the dropdown.
- API base URL:
https://ollama.com/ - API key: Enter your cloud provider API key.
-
💡 Note: The extension allows any profile names, but if you create multiple profiles with completely identical fields, the system will always use the first one.*
- Profile name:
- Click Apply, then click Save Changes to close the window.
- Open the menu (
…) again and select "Settings...". - Under the AI Provider dropdown, select your newly created
Ollama cloudprofile. - Save settings, and you are ready to chat!
- Click the model name (or "Select model..." placeholder) in the top header.
- Search, filter, and select your desired model from the window to activate it.
| Provider | Provider Type | API Base URL |
|---|---|---|
| Ollama | OpenAI compatible | https://ollama.com/ |
| Groq | OpenAI compatible | https://api.groq.com/openai/ |
| Mistral | OpenAI compatible | https://api.mistral.ai/ |
| Cohere | OpenAI compatible | https://api.cohere.ai/compatibility/ |
| OpenRouter | OpenAI compatible | https://openrouter.ai/api/ |
| Google AI Studio | Gemini (cloud) | https://generativelanguage.googleapis.com |
| GitHub Models | Github Models via Azure (cloud) | https://models.inference.ai.azure.com/ |
| Provider | Provider Type | API Base URL |
|---|---|---|
| OpenAI | OpenAI compatible | https://api.openai.com |
| DeepSeek | DeepSeek (cloud) | https://api.deepseek.com |
These tools let the AI read your code, search, edit files, build, and run tests. You control what it can do.
In Settings (two checkboxes):
Enable built‑in AI tools (read‑only)– The AI can open and read files, but cannot change anything.Enable built‑in AI tools (write/modify)– The AI can create, change, or delete files.
Tip: Turn on write/modify only for projects that are under version control (e.g., Git). That way you can always see what changed and revert if needed.
In the Built‑in Tools… dialog (list of built‑in tools):
Open this from the extension menu. You’ll see all built‑in tools (for example, delete_file, replace_file_content). You can enable or disable each tool separately. Even if the global write/modify checkbox is on, you can still turn off specific tools like delete_file. Use “Enable All” or “Disable All” to change many at once, then click Save.
When a tool edits a file, the changes are applied immediately to the actual files in your solution. LMLocal tracks all modified files and shows them in a collapsible Changes panel inside the chat window. This list persists across solution reloads and Visual Studio restarts, so you can always review what the AI did.
The panel lets you:
- Click any file to see a diff of the changes.
- See labels:
New,Modified, orDeletednext to each file. - Switch between List view and Tree view.
- Click
Review all– opens a side‑by‑side diff window for all changed files. - Click
Open all– opens all changed files in Visual Studio editor tabs. - Click
Discard all– reverts all changes using internal backups (files are restored to their state before the AI edits). - Click
Accept all– confirms the changes, removes the internal backups, and clears the list (you can no longer revert them afterward).
create_file– Creates a new file with initial content.delete_file– Deletes a file from the solution.find_files– Searches for files by name.list_directory– Lists files and folders in a given path.get_solution_overview– Returns a summary of projects, folders, and files.set_file_project_status– Includes or excludes a file from a project.
read_file_lines– Reads a specific range of lines.search_file_content– Searches for a text string (case‑insensitive) inside solution files.get_active_document– Returns the path and full text of the currently open document.
replace_file_content– Replaces the entire file with new text.replace_file_lines– Replaces a range of lines (by numbers) with new content.insert_file_lines– Inserts lines at a specific position.format_document– Applies Visual Studio’s code formatting to the file.optimize_usings– Removes unusedusingstatements and sorts the rest in C# files.
inspect_type– Shows members, base types, interfaces, and dependencies of a class/struct/interface.find_symbol_references– Finds all references to a symbol (class, method, etc.) across the solution, with line numbers and context (uses Roslyn).
build_solution– Builds the whole solution (runs asynchronously).run_tests– Runsdotnet testfor a specific.csprojand shows live output.
When the "Strip formatting from history" option is enabled in the extension settings, LMLocal automatically runs a cleanup pass on previous conversation turns before forwarding the payload to your AI backend. This reduces token overhead for local models by flattening structural Markdown syntax into lightweight plain text.
Note
Under the Hood Only: This optimization is invisible in the user interface. Your active chat window will always display responses with full Markdown rendering, code highlighting, and structural styling. The stripping process only alters the raw background history array sent to the model to save context tokens.
- Code Block Enclosures: Triple backticks (```) are stripped; code contents remain as plain text.
- Headers: Heading markers (
#,##, etc.) are removed, keeping only the text content. - Text Emphasis: Bold (
**text**→text), italics (*text*→text), and strikethroughs (~~text~~→text) are flattened. - Inline Code: Inline backticks (
code→code) are dropped. - Links & Media: Hyperlinks (
[label](url)→label) and images (→alt) discard their URLs/paths, preserving only their descriptive text labels. - List & Structural Layouts: Bullets (
-,*,+,1.,2.), blockquote symbols (>), and horizontal rules (---,***,___) are erased. - Whitespace Compaction: Extra whitespace is trimmed, and redundant blank lines are clamped down (any sequence of 3 or more consecutive newlines is compressed into exactly 2 newlines).
LMLocal supports external tool integration via the Model Context Protocol (MCP). This allows you to hook up custom or third-party servers to give your local AI even more capabilities.
- Protocol Version: Compatible with the MCP
2025-11-25specification standard. - Tools-Only Support: LMLocal exclusively loads and registers Tools exposed by your MCP servers. Other MCP features like custom Prompts or Resources are currently ignored and will not be utilized by the assistant.
- Transports: Supports HTTP-based streamable protocols (
http) and (stdio). - NOT Supported: Legacy
sse(Server-Sent Events) transports are unsupported. - No Execution Restrictions: Currently, the extension does not restrict, sandbox, or prompt for manual confirmation when the AI invokes an MCP tool. Connected tools execute automatically when called by the model.
Warning
Security Notice & Trusted Sources Only
- Trust Infrastructure: Only connect to MCP servers and URLs that you fully trust or host locally yourself.
- Review Third-Party Tools: Before enabling a public or third-party MCP endpoint, review its exposed tools and documentation to ensure it does not execute unauthorized commands or compromise sensitive project data.
You can set up and manage connections to external MCP servers directly inside the configuration dialog:
- Open the LM Local Chat tool window.
- Click the menu icon (
…) in the top-right corner. - Select "MCP Extensions..." from the dropdown menu.
- In the dialog:
- Check "Enable Model Context Protocol (MCP)" to turn the feature on.
- Paste or edit your JSON configuration directly into the built-in text editor.
- Click the "Discover Tools" button to validate your settings and instantly verify connection availability.
The extension saves your settings locally to %LOCALAPPDATA%\LMLocalChat\mcp.json.
You can organize your configuration using either the servers or mcpServers root keys.
Note
LMLocal Custom Extensions The following parameters are custom LMLocal properties and are not part of the official MCP specification:
"disabled"(boolean): Temporarily deactivates an entire server process or HTTP connection without deleting its configuration block."permissions"(object): Used to mute specific tools discovered on the server.
Example 1: Public HTTP Server (with Tool Permissions)
{
"mcpServers": {
"microsoft-learn": {
"type": "http",
"url": "https://learn.microsoft.com/api/mcp",
"permissions": {
"microsoft_code_sample_search": "disable"
}
}
}
}Example 2: Demonstrates how to configure endpoints requiring a GitHub Personal Access Token (PAT).
{
"servers": {
"github-copilot": {
"type": "http",
"url": "https://api.githubcopilot.com/mcp/",
"headers": {
"Authorization": "Bearer ghp_your_personal_access_token_here"
},
"disabled": false
}
}
}Example 3: Illustrates the required schema structure for connecting local executable-based MCP servers
{
"servers": {
"OmniToolBox": {
"type": "stdio",
"command": "C:\\MyMCP\\OmniToolBox.exe"
}
}
}Model Context Protocol .NET SDK — Use this official Microsoft SDK to build and compile your own custom MCP servers compatible with LMLocal.
https://github.com/modelcontextprotocol/csharp-sdk
| Issue | Solution |
|---|---|
| No model shown | Ensure a model is fully loaded in the LM Studio "Server" tab. |
| Connection Error | Check if the LM Studio Server is ON at http://127.0.0.1:1234. Click ↻ to retry. |
| UI Lag | Restart the tool window or check your local machine resources (CPU/GPU). |
LMLocal keeps things simple and stores your preferences locally. Configuration files are maintained in:
%LOCALAPPDATA%\LMLocalChat\
- License: MIT License. See LICENSE.txt for details.
- Components:
markedv15.0.12 (MIT)highlight.jsv11.9.0 (BSD-3-Clause)