Conversation
…crashes
Three locations in store.c called sqlite3_prepare_v2 without checking the return code. If the statement fails to prepare (DB corruption, malformed SQL), subsequent bind_text and sqlite3_step calls dereference NULL, crashing the server.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t access
The watcher's projects hash table was written by the main thread (watch/unwatch) and iterated by the watcher thread (poll_once) with no synchronization. Added cbm_mutex_t to the watcher struct and wrapped all hash table operations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Decorators previously only created DECORATES edges. A @login_required decorator was invisible to "find all references" queries because those look for CALLS and USAGE edges. Now resolve_decorator emits both DECORATES and CALLS edges.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…row memory helpers
Four new inline helpers in platform.h alongside existing safe_realloc:
- safe_free(ptr): frees and NULLs any pointer (prevents double-free)
- safe_str_free(&str): frees const char* and NULLs (replaces free((void*)str))
- safe_buf_free(buf, &count): frees array and zeros its count
- safe_grow(arr, n, cap, factor): one-line capacity-doubling realloc
Pull request overview
This PR introduces new “safe” memory helper utilities and includes several robustness/concurrency fixes across the watcher, UI logging/threading, Cypher lexing/parsing, and SQLite statement preparation.
Changes:
- Add `safe_free`/`safe_str_free`/`safe_buf_free`/`safe_grow` helpers in `platform.h`.
- Improve thread-safety and lifecycle management (watcher project table mutex, UI log mutex init API, thread detach helper + usage).
- Harden parsing/lexing and SQLite calls (truncate long string literals safely, add lexer overflow test, check `sqlite3_prepare_v2` return values, emit extra semantic edges).
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/test_cypher.c | Adds regression test for long string literal truncation behavior. |
| src/cypher/cypher.c | Adds truncation guard in string lexing; adds several OOM checks in parser. |
| src/foundation/platform.h | Adds new memory helper macros/inlines. |
| src/foundation/compat_thread.h | Exposes cbm_thread_detach. |
| src/foundation/compat_thread.c | Implements cbm_thread_detach for Win32/POSIX. |
| src/watcher/watcher.c | Adds mutex protecting projects hash table and wraps some accesses. |
| src/ui/http_server.h | Exposes cbm_ui_log_init API. |
| src/ui/http_server.c | Requires explicit log mutex initialization; detaches background indexing thread handle. |
| src/main.c | Calls cbm_ui_log_init() during startup. |
| src/store/store.c | Checks sqlite3_prepare_v2 results before proceeding. |
| src/pipeline/pass_semantic.c | Adds a CALLS edge for decorators to improve reference queries. |
| .gitignore | Ignores additional local/Claude-related files. |
| .claude/skills/, .claude/settings.json, .claude/agents/ | Adds Claude workflow/skill configuration files. |
```c
const int max_blen = CBM_SZ_4K - 1;
while (*pos < len && input[*pos] != quote) {
    if (blen >= max_blen) { (*pos)++; continue; }
    if (input[*pos] == '\\' && *pos + SKIP_ONE < len) {
        (*pos)++;
```
In lex_string_literal, the overflow truncation check runs before the escape-handling branch. If the string is already at max length and the next bytes are an escape like \", the code will skip only the backslash, then see " on the next iteration and treat it as the closing quote (even though it was escaped), prematurely terminating the string. To preserve correct lexing, handle escape sequences even when truncating (e.g., move the truncation logic after the escape handling, or when truncating and seeing \\ skip both bytes when possible).
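A minimal sketch of the suggested fix, with escape handling ordered before the truncation check: once the output buffer is full, the loop still consumes a full escape sequence (two input bytes), so an escaped quote is never mistaken for the closing quote. The function name, signature, and tiny buffer limit are illustrative, not the project's real `lex_string_literal`, and it copies escape bytes verbatim rather than decoding them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded string lexer: consumes up to the closing quote,
 * copying at most max_blen bytes into buf (which must hold max_blen + 1). */
static size_t lex_quoted(const char *input, size_t len, size_t *pos,
                         char quote, char *buf, size_t max_blen) {
    size_t blen = 0;
    while (*pos < len && input[*pos] != quote) {
        size_t step = 1;
        if (input[*pos] == '\\' && *pos + 1 < len)
            step = 2;                      /* escape: backslash + escaped byte */
        if (blen + step <= max_blen) {     /* room left: copy (verbatim here) */
            memcpy(buf + blen, input + *pos, step);
            blen += step;
        }
        *pos += step;                      /* full or not: skip the whole escape */
    }
    buf[blen] = '\0';
    return blen;
}
```

With a 2-byte limit and input `ab\"cd"`, the truncated token is `ab` and the lexer stops at the real closing quote after `cd`, whereas a truncate-first loop would stop at the escaped quote.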
```c
/* Safe free: frees and NULLs a pointer to prevent double-free / use-after-free.
 * Accepts void** so it works with any pointer type via the macro. */
static inline void safe_free_impl(void **pp) {
    if (pp && *pp) {
        free(*pp);
        *pp = NULL;
    }
}
#define safe_free(ptr) safe_free_impl((void **)(void *)&(ptr))

/* Safe string free: frees a const char* and NULLs it.
 * Casts away const so callers don't need the (void*) dance. */
static inline void safe_str_free(const char **sp) {
    if (sp && *sp) {
        free((void *)*sp);
        *sp = NULL;
    }
}

/* Safe buffer free: frees a heap array and zeros its element count.
 * Use for dynamic arrays paired with a size_t count. */
static inline void safe_buf_free_impl(void **buf, size_t *count) {
    if (buf && *buf) {
        free(*buf);
        *buf = NULL;
    }
    if (count) {
        *count = 0;
    }
}
#define safe_buf_free(buf, countp) safe_buf_free_impl((void **)(void *)&(buf), (countp))
```
safe_free/safe_buf_free pass &(ptr) cast to void ** into safe_free_impl/safe_buf_free_impl. Converting a T ** to void ** and then writing through it is undefined behavior in C (the pointer-to-pointer types are not compatible). Prefer a macro-based implementation that frees and assigns NULL directly to the lvalue (or use _Generic to dispatch to type-correct helper functions) so you never write through a mis-typed void **.
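A minimal sketch of the macro-based alternative the review suggests: free and NULL the lvalue directly, so no write ever goes through a mis-typed `void **`. This is an illustration, not the project's code; note that `p` is evaluated twice, which is fine for plain lvalues but not for expressions with side effects.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the pointee and NULL the lvalue itself; the (void *) cast also
 * accepts const-qualified pointers such as const char *. */
#define safe_free(p) do { \
        free((void *)(p)); \
        (p) = NULL;        \
    } while (0)
```

An `_Generic`-based dispatch to type-correct helper functions would avoid the double evaluation at the cost of enumerating the pointer types used in the codebase.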
```c
/* Safe grow: doubles capacity and reallocs when count reaches cap.
 * Usage: safe_grow(arr, count, cap, growth_factor)
 * Evaluates to the new arr (NULL on OOM — old memory freed by safe_realloc). */
#define safe_grow(arr, n, cap, factor) do { \
    if ((size_t)(n) >= (size_t)(cap)) { \
        (cap) *= (factor); \
        (arr) = safe_realloc((arr), (size_t)(cap) * sizeof(*(arr))); \
```
safe_grow updates cap before verifying that safe_realloc succeeded. On OOM, safe_realloc frees the old buffer and returns NULL, so after the macro runs you can end up with (arr)==NULL but an enlarged (cap), preventing future growth attempts and making later code prone to NULL deref. Consider computing a new_cap first, reallocating, and only committing cap=new_cap when the reallocation succeeded (also consider guarding cap==0 and checking for cap*sizeof(*arr) overflow).
Suggested change:

```diff
-/* Safe grow: doubles capacity and reallocs when count reaches cap.
- * Usage: safe_grow(arr, count, cap, growth_factor)
- * Evaluates to the new arr (NULL on OOM — old memory freed by safe_realloc). */
-#define safe_grow(arr, n, cap, factor) do { \
-    if ((size_t)(n) >= (size_t)(cap)) { \
-        (cap) *= (factor); \
-        (arr) = safe_realloc((arr), (size_t)(cap) * sizeof(*(arr))); \
+/* Safe grow: grows capacity and reallocs when count reaches cap.
+ * Usage: safe_grow(arr, count, cap, growth_factor)
+ * On success, updates arr and cap together. On OOM/overflow, arr becomes
+ * NULL only if safe_realloc freed the old buffer; cap is left unchanged. */
+#define safe_grow(arr, n, cap, factor) do { \
+    if ((size_t)(n) >= (size_t)(cap)) { \
+        size_t _safe_old_cap = (size_t)(cap); \
+        size_t _safe_new_cap = _safe_old_cap ? (_safe_old_cap * (size_t)(factor)) : 1u; \
+        if (_safe_new_cap <= _safe_old_cap) { \
+            _safe_new_cap = _safe_old_cap + 1u; \
+        } \
+        if (sizeof(*(arr)) != 0 && _safe_new_cap <= (SIZE_MAX / sizeof(*(arr)))) { \
+            void *_safe_tmp = safe_realloc((arr), _safe_new_cap * sizeof(*(arr))); \
+            if (_safe_tmp) { \
+                (arr) = _safe_tmp; \
+                (cap) = _safe_new_cap; \
+            } else { \
+                (arr) = NULL; \
+            } \
+        } else { \
+            (arr) = NULL; \
+        } \
```
```diff
 const char *sql = "SELECT type, COUNT(*) FROM edges WHERE project = ?1 GROUP BY type ORDER "
                   "BY COUNT(*) DESC;";
 sqlite3_stmt *stmt = NULL;
-sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL);
+if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {
+    return CBM_NOT_FOUND;
+}
```
If the second sqlite3_prepare_v2 fails, this function now returns immediately, but at that point out->node_labels may already have been allocated/populated by the first query. That causes a leak unless the caller explicitly frees out on failure (current callers don't check the return value). Suggest adding a single cleanup path (e.g., goto fail) that finalizes statements and calls cbm_store_schema_free(out) (or equivalent) before returning an error.
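A sqlite-free sketch of the single-cleanup-path ("goto fail") shape the review suggests: every early error jumps to one label that releases whatever has been acquired so far, so a failure in the second step cannot leak what the first step allocated. The type and function names are hypothetical stand-ins for the store's schema structures, and `malloc` stands in for the two prepare/step queries.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int *node_labels; int *edge_counts; } schema_t;

static int load_schema(schema_t *out, int fail_second_step) {
    out->node_labels = NULL;
    out->edge_counts = NULL;

    out->node_labels = malloc(16 * sizeof *out->node_labels);  /* "first query" */
    if (!out->node_labels) goto fail;

    if (fail_second_step) goto fail;          /* "second prepare fails" */

    out->edge_counts = malloc(16 * sizeof *out->edge_counts);  /* "second query" */
    if (!out->edge_counts) goto fail;

    return 0;                                 /* success: caller owns *out */

fail:
    free(out->node_labels); out->node_labels = NULL;  /* release partial results */
    free(out->edge_counts); out->edge_counts = NULL;
    return -1;
}
```

In the real function the `fail:` block would also call `sqlite3_finalize` on any prepared statements before returning the error code.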
```c
@@ -272,17 +279,22 @@
    project_state_t *s = state_new(project_name, root_path);
    cbm_ht_set(w->projects, s->project_name, s);
    cbm_mutex_unlock(&w->projects_lock);
```
cbm_watcher_watch calls state_new(...) under the new lock but doesn’t handle OOM: if state_new returns NULL, cbm_ht_set(w->projects, s->project_name, s) will dereference NULL, and the mutex will remain locked on any early return you add later. Add a NULL check for s and ensure the mutex is always unlocked on all exit paths.
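A sketch of the control flow being asked for: check the `state_new` result and funnel every exit through a single unlock. To stay standalone, a counter-based stub replaces `cbm_mutex_t` and the hash-table insert is elided; only the lock/NULL-check shape is the point, not the project's real API.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int locked; } fake_mutex_t;           /* stand-in for cbm_mutex_t */
static void fake_lock(fake_mutex_t *m)   { m->locked++; }
static void fake_unlock(fake_mutex_t *m) { m->locked--; }

typedef struct { char *name; } project_state_t;

static int watcher_watch(fake_mutex_t *lock, int simulate_oom) {
    int rc = -1;
    fake_lock(lock);

    project_state_t *s = simulate_oom ? NULL
                                      : calloc(1, sizeof *s);  /* "state_new" */
    if (!s) goto out;        /* OOM: fall through to the single unlock */

    /* ... cbm_ht_set(w->projects, s->project_name, s) would go here ... */
    free(s);                 /* placeholder: real code transfers ownership */
    rc = 0;

out:
    fake_unlock(lock);       /* released on success and failure alike */
    return rc;
}
```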
```c
poll_ctx_t ctx = {
    .w = w,
    .now = now_ns(),
    .reindexed = 0,
};
cbm_mutex_lock(&w->projects_lock);
cbm_ht_foreach(w->projects, poll_project, &ctx);
cbm_mutex_unlock(&w->projects_lock);
```
cbm_watcher_poll_once holds projects_lock while invoking poll_project, which can call index_fn and run a full pipeline/index (potentially long-running). This blocks cbm_watcher_watch/unwatch (called from MCP session handling) for the duration of indexing and risks lock contention/dead time. A more robust approach is to snapshot the current project list under the lock (or increment refcounts), release the lock, and then perform the expensive index_fn work without holding projects_lock.
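A sketch of the snapshot approach: copy the project pointers out under the lock, release it, then run the slow per-project work (the review's `index_fn`) with no lock held. A stub mutex and a plain pointer array stand in for `cbm_mutex_t` and the hash table; in the real watcher, entry lifetime would still need guaranteeing (e.g. refcounts), since a project could be unwatched mid-scan.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int locked; } fake_mutex_t;           /* stand-in for cbm_mutex_t */
static void fake_lock(fake_mutex_t *m)   { m->locked++; }
static void fake_unlock(fake_mutex_t *m) { m->locked--; }

static void bump_project(int *p) { (*p)++; }           /* stand-in for index_fn */

static int poll_once(fake_mutex_t *lock, int **projects, size_t n,
                     void (*slow_index_fn)(int *)) {
    fake_lock(lock);
    int **snap = malloc(n * sizeof *snap);             /* snapshot under the lock */
    if (!snap) { fake_unlock(lock); return -1; }
    memcpy(snap, projects, n * sizeof *snap);
    fake_unlock(lock);                                 /* dropped before slow work */

    for (size_t i = 0; i < n; i++)
        slow_index_fn(snap[i]);                        /* long-running, lock-free */
    free(snap);
    return 0;
}
```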
```c
/* Create and start watcher in background thread */
/* Initialize log mutex before any threads are created */
cbm_ui_log_init();
```
cbm_ui_log_append now drops all lines until cbm_ui_log_init() runs, but main installs the log sink and immediately emits logs (server.start) before calling cbm_ui_log_init(). That means early startup logs will never reach the UI ring buffer. Call cbm_ui_log_init() before cbm_log_set_sink(cbm_ui_log_append) / before the first log emission (or restore a thread-safe init-on-first-use pattern).
```c
int cbm_thread_detach(cbm_thread_t *t) {
    return pthread_detach(t->handle);
}
```
On POSIX, cbm_thread_detach calls pthread_detach(t->handle) but leaves t->handle unchanged. That makes it easy to accidentally cbm_thread_join/cbm_thread_detach again on the same cbm_thread_t and get undefined/erroneous behavior. Consider clearing/poisoning the handle after a successful detach (to match the Win32 path) or tracking a detached state in cbm_thread_t.
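A sketch of the detached-state tracking option: record the detach in the struct and refuse a second detach instead of reusing a stale handle. To keep it standalone, an `int` stands in for the OS thread handle and the actual `pthread_detach` call is shown as a comment; only the guard/poison logic is demonstrated, not the project's real `cbm_thread_t`.

```c
#include <assert.h>
#include <errno.h>

typedef struct {
    int handle;      /* stand-in for pthread_t / Win32 HANDLE */
    int detached;    /* set once a detach has succeeded */
} cbm_thread_t;

static int cbm_thread_detach(cbm_thread_t *t) {
    if (!t || t->detached)
        return EINVAL;           /* already detached: refuse, don't reuse handle */
    /* real code: int rc = pthread_detach(t->handle); if (rc) return rc; */
    t->detached = 1;
    t->handle = -1;              /* poison the handle, matching the Win32 path */
    return 0;
}
```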
```c
/* The string should be truncated to CBM_SZ_4K - 1 (4095) characters. */
ASSERT_EQ((int)strlen(r.tokens[0].text), 4095);
```
This test hardcodes the expected truncated length as 4095, which assumes CBM_SZ_4K == 4096. To keep the test consistent if the buffer size constant ever changes, prefer asserting against (CBM_SZ_4K - 1) (or another exported constant) rather than a literal.
Suggested change:

```diff
-/* The string should be truncated to CBM_SZ_4K - 1 (4095) characters. */
-ASSERT_EQ((int)strlen(r.tokens[0].text), 4095);
+/* The string should be truncated to CBM_SZ_4K - 1 characters. */
+ASSERT_EQ((int)strlen(r.tokens[0].text), (CBM_SZ_4K - 1));
```
The PR description focuses on adding memory helper functions, but this PR also changes watcher concurrency behavior, Cypher lexing/parsing, SQLite error handling, pipeline edge emission, HTTP server threading, and adds multiple .claude/* workflow files. Consider splitting these into separate PRs (or updating the description) so reviewers can assess each behavioral change independently and reduce merge risk.
Adds helper functions which encapsulate repeated patterns throughout the codebase. I have a larger fix available with all the call sites modified, but this needs to be checked out first, because it's the basis of all that. It's four helper functions, similar to the existing `safe_realloc` function.