fix(memos-local-plugin): system message ordering + local endpoint apiKey relaxation#1921

Open
redashes1984 wants to merge 1 commit into MemTensor:main from redashes1984:fix/system-message-ordering

@redashes1984 redashes1984 commented Jun 15, 2026

Changes

This PR fixes two issues in memOS-local-plugin core/llm providers that block self-hosted deployments using local inference servers (e.g., vLLM, llama.cpp) with Qwen3.6 and similar models.


Fix 1: System message ordering (HTTP 400 from vLLM)

File: apps/memos-local-plugin/core/llm/client.ts

Problem: When using Qwen3.6 via vLLM, skill.crystallize and reward.r_human LLM calls fail with:

jinja2.exceptions.TemplateError: System message must be at the beginning.

Root cause: normalizeMessages() passed the input array through unchanged, so dynamically constructed message lists could place a role: "system" message after user/assistant turns. Qwen3.6's Jinja2 chat template requires system messages to come first.
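
To make the failing shape concrete, here is a minimal sketch of a check for misplaced system messages. The ChatMessage type and the firstMisplacedSystemIndex helper are hypothetical names used for illustration and are not part of the plugin. The sketch assumes the chat template rejects any system message that is not the first element of the array.

```typescript
// Hypothetical message shape; the plugin's real type may carry more fields.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Returns the index of the first system message found after position 0,
// or -1 when the ordering is acceptable under the stated assumption.
function firstMisplacedSystemIndex(messages: ChatMessage[]): number {
  return messages.findIndex((m, i) => m.role === "system" && i > 0);
}

// A list built by appending instructions after the conversation turns:
const lateSystem: ChatMessage[] = [
  { role: "user", content: "Summarize this trace." },
  { role: "system", content: "You are a skill extractor." },
];

// The same content with the system message leading:
const leadingSystem: ChatMessage[] = [
  { role: "system", content: "You are a skill extractor." },
  { role: "user", content: "Summarize this trace." },
];
```

Under that assumption, lateSystem is the kind of payload that produces the TemplateError above, while leadingSystem is accepted.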

Impact: ~2,300 HTTP 400 errors per day on a single deployment.

Fix: normalizeMessages() now:

  1. Extracts all system messages regardless of position
  2. Merges them into a single leading system message
  3. Appends all non-system messages in original order

Includes a fast path: if there is exactly one system message already at position 0 with no later systems, the original array is returned unchanged (zero allocation overhead).
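
The three steps and the fast path can be sketched roughly as follows. This is an illustration of the described behavior, not the code in client.ts: the ChatMessage type is hypothetical, joining merged system content with a blank line is an assumption, and returning the input unchanged when there are no system messages at all is also an assumption (the description only mentions the single-leading-system case).

```typescript
// Hypothetical message shape; the plugin's real type may carry more fields.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

function normalizeMessages(messages: ChatMessage[]): ChatMessage[] {
  // Count system messages and remember where the last one sits,
  // without allocating any intermediate arrays.
  let systemCount = 0;
  let lastSystemIndex = -1;
  messages.forEach((m, i) => {
    if (m.role === "system") {
      systemCount++;
      lastSystemIndex = i;
    }
  });

  // Fast path: exactly one system message, already at position 0.
  // Assumption: a list with no system messages is also returned as-is.
  if (systemCount === 0 || (systemCount === 1 && lastSystemIndex === 0)) {
    return messages;
  }

  // Steps 1 and 3: split system content from the remaining turns,
  // keeping the non-system messages in their original order.
  const systemParts: string[] = [];
  const rest: ChatMessage[] = [];
  for (const m of messages) {
    if (m.role === "system") {
      systemParts.push(m.content);
    } else {
      rest.push(m);
    }
  }

  // Step 2: merge into a single leading system message.
  // Assumption: parts are joined with a blank line.
  return [{ role: "system", content: systemParts.join("\n\n") }, ...rest];
}
```

Because the fast path returns the same array reference, callers can compare the result to the input with === to tell whether any rewriting happened.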


Fix 2: apiKey is required even for local endpoints

File: apps/memos-local-plugin/core/llm/providers/openai.ts

Problem: The openai_compatible provider throws "LLM_UNAVAILABLE: openai_compatible provider requires config.llm.apiKey" even when the endpoint is a local, self-hosted inference server (e.g., http://10.10.4.8:8000/v1) that does not require authentication.

Impact: Users cannot use the openai_compatible provider with local vLLM/llama.cpp deployments without setting a dummy apiKey.

Fix: Added isLocalhostOrPrivateUrl() helper that detects:

  • localhost, 127.0.0.1, ::1
  • Private IP ranges: 10.x.x.x, 172.16-31.x.x, 192.168.x.x

When the endpoint is a private/local address, apiKey becomes optional. The Authorization header is only sent when apiKey is configured. Cloud endpoints (e.g., api.openai.com) still require apiKey as before.
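
A minimal sketch of how such a check and the conditional header might look is below. Only the helper name isLocalhostOrPrivateUrl() and the listed address ranges come from this PR. The body, the buildHeaders function, and the treatment of unparseable URLs as non-local are assumptions made for illustration. The Bearer scheme follows the usual OpenAI-style convention.

```typescript
function isLocalhostOrPrivateUrl(endpoint: string): boolean {
  let host: string;
  try {
    host = new URL(endpoint).hostname;
  } catch {
    // Assumption: an unparseable endpoint is treated as non-local,
    // so the apiKey requirement stays in force.
    return false;
  }

  // URL.hostname keeps the brackets around IPv6 literals, e.g. "[::1]".
  if (host === "localhost" || host === "127.0.0.1" || host === "[::1]" || host === "::1") {
    return true;
  }

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) {
    return false;
  }
  const a = Number(m[1]);
  const b = Number(m[2]);

  if (a === 10) return true;                        // 10.x.x.x
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16-31.x.x
  if (a === 192 && b === 168) return true;          // 192.168.x.x
  return false;
}

// Hypothetical helper: builds request headers, requiring apiKey only
// for endpoints that are not local or private.
function buildHeaders(endpoint: string, apiKey?: string): Record<string, string> {
  if (!apiKey && !isLocalhostOrPrivateUrl(endpoint)) {
    throw new Error("LLM_UNAVAILABLE: openai_compatible provider requires config.llm.apiKey");
  }
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

With this shape, buildHeaders("http://10.10.4.8:8000/v1") succeeds without an Authorization header, while buildHeaders("https://api.openai.com/v1") without a key still throws.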


Testing

Verified on a live deployment with Qwen3.6-27B-FP8 via vLLM:

| Metric | Before | After |
|---|---|---|
| HTTP 400 errors (system message ordering) | ~2,300/day | 0 |
| LLM_UNAVAILABLE errors (apiKey) | 100% | 0 |
| skill.crystallize failures | 100% | 0 |
| reward.r_human failures | 100% | 0 |

@redashes1984 redashes1984 force-pushed the fix/system-message-ordering branch from f7172b5 to 63ff5dc Compare June 15, 2026 03:53
@redashes1984 redashes1984 changed the title fix(memos-local-plugin): ensure system messages always appear first in chat completions fix(memos-local-plugin): system message ordering + local endpoint apiKey relaxation Jun 15, 2026
…hat messages

Fixes HTTP 400 errors with Qwen3.6/vLLM which enforces 'system message must
be at the beginning' in its Jinja2 chat template.

When skill crystallization or reward scoring constructs messages with the
system role appearing after user/assistant turns, vLLM's apply_chat_template
throws:

  jinja2.exceptions.TemplateError: System message must be at the beginning.

normalizeMessages() now:
1. Extracts all system messages regardless of position
2. Merges them into a single leading system message
3. Appends all non-system messages in original order

Includes a fast path: if there's exactly one system already at position 0
with no later systems, returns the original array unchanged.
@redashes1984 redashes1984 force-pushed the fix/system-message-ordering branch from 63ff5dc to d7955e1 Compare June 15, 2026 07:56