Skip to content

Bugfix/mpc parse segfault#177

Open
Memnoc wants to merge 12 commits intoorangeduck:masterfrom
Memnoc:bugfix/mpc-parse-segfault
Open

Bugfix/mpc parse segfault#177
Memnoc wants to merge 12 commits intoorangeduck:masterfrom
Memnoc:bugfix/mpc-parse-segfault

Conversation

@Memnoc
Copy link

@Memnoc Memnoc commented Jan 9, 2026

Hey Daniel,
Sorry about the long PR, this fix took me places 🛝 I have started with a much more naive fix that worked in my follow along repo, but that did not work for main, so here we are 😄
What I am proposing here is a change that spans various aspects of the library.
Full transparency, I fixed things as I went, AI helped a lot in double-checking the logic and suggesting useful changes, and as it stands all the suite of tests passes with the changes proposed.
I have made very intentionally heavy use of comments where I have changed, please feel free to not like that, but it helped me (and hopefully you) double checking the fix as a whole.

Problem

When a parser name in the grammar string doesn't match any supplied parser (e.g., <Lispy> vs mpc_new("lispy")), mpca_lang segfaults.

The root cause: va_arg returns garbage when called past the end of the argument list, not NULL. The existing code assumes NULL would indicate exhaustion, but instead it dereferences garbage pointers.

Example that crashes:

mpc_parser_t *Lispy = mpc_new("lispy");  // lowercase
mpca_lang(MPCA_LANG_DEFAULT, " rule : <Lispy> ; ", Lispy);  // uppercase — segfault

Solution

  1. NULL addition via macro - mpca_lang now auto-appends NULL to the argument list, guaranteeing a terminator exists

  2. Exhaustion tracking - Added va_exhausted flag to prevent reading past the sentinel on subsequent calls

  3. Error capture - Added error_msg field to preserve the first error for proper reporting

  4. Error behavior - Errors now caught at definition time instead of silently deferring to parse time

  5. Helpful suggestions - When parser not found, suggests similar names (e.g., "Did you mean 'lispy'?")

Here is an overview of the changes

mpc.h

  • Added mpca_lang_internal declaration
  • Added mpca_lang macro wrapper that appends NULL sentinel

mpc.c

  • Added va_exhausted and error_msg fields to mpca_grammar_st_t
  • Modified mpca_grammar_find_parser to check exhaustion before calling va_arg, capture errors, and provide suggestions
  • Modified mpca_grammar, mpca_lang_file, mpca_lang_pipe, mpca_lang_internal, mpca_lang_contents to initialize new fields and surface captured errors

tests/grammar.c

  • Updated test_missingrule to expect fail-fast error behavior

tests/issue_184.c (new)

  • Added test for case-insensitive parser name suggestion

tests/test.c

  • Registered new test suite

Makefile

  • Added -Wno-variadic-macros flag

Behavior Change

Errors are now caught at mpca_lang call time rather than during mpc_parse. This is intentional, and the idea is that failing fast is more desirable than silent corruption followed by a crash.

Before:

err = mpca_lang(...);  // returns NULL (appears successful)
mpc_parse(...);        // segfault or delayed error

After:

err = mpca_lang(...);  // returns error immediately
// can handle error before attempting to parse

Testing

  • All 31 tests pass (file, static, dynamic builds)
  • Valgrind clean (no new leaks introduced as far as I could verify)
  • New test to specifically validates issue #184 scenario

Fixes #184

- Add mpca_lang macro wrapper that auto-appends NULL sentinel
- Add va_exhausted and error_msg fields to mpca_grammar_st_t
- Add -Wno-variadic-macros to suppress warnings
- Check va_exhausted before reading va_arg
- Set va_exhausted when NULL sentinel reached
- Capture error_msg for proper reporting
- Add case-insensitive suggestion for mismatched names
- Initialize va_exhausted=0 and error_msg=NULL at all 5 sites
- Check error_msg after mpca_lang_st and convert to proper error
- Rename mpca_lang to mpca_lang_internal
Error now caught at definition time instead of parse time
Tests case-insensitive suggestion when parser name differs by case
…grammar, mpca_lang_file, mpca_lang_pipe, mpca_lang_contents
@orangeduck
Copy link
Owner

Hi @Memnoc,

Thanks for the hard work on this.

This looks like a much more complicated change than I was expecting.

Before this change, if you NULL terminate the list of parsers in mpca_lang don't you already get an error when the parser is not found? If so shouldn't the only change required be the macro wrapper you use (and maybe a change to the error message?).

I was aware this automatic NULL termination could be achieved my wrapping the function in a macro but opted against it to avoid the use of macros.

Please tell me if you interpretation is not correct.

Thanks,

Dan

@Memnoc
Copy link
Author

Memnoc commented Jan 11, 2026

Hi @Memnoc,

Thanks for the hard work on this.

This looks like a much more complicated change than I was expecting.

Before this change, if you NULL terminate the list of parsers in mpca_lang don't you already get an error when the parser is not found? If so shouldn't the only change required be the macro wrapper you use (and maybe a change to the error message?).

I was aware this automatic NULL termination could be achieved my wrapping the function in a macro but opted against it to avoid the use of macros.

Please tell me if you interpretation is not correct.

Thanks,

Dan

Hey Daniel,

Thank you for taking the time to respond. You raise an important point, one that I have initially pondered myself.

You are correct in saying that you get an error on the first failed look up. The problem is when mcp continues parsing and makes a second look up, which causes the crash.

While the macro adds NULL, the va_exhausted flag prevents the second lookup from calling va_arg again (and gives me the chance to intercept with a useful error message).

I have considered other approaches, for example, do we need to do two lookups? My understanding is yes, we do, because the structure of the library wants a Rule: <Parser> relationship, so define Rule as: whatever Lispy matches.

So ok, we need both; that said, a better strategy than my fix (Lazy Loading) would be Eager Loading - TL;DR: we call all data at once, at initialization, and thus we would not have any issue.

Initialization:
  → call `va_arg` in loop until NULL
  → store ALL parsers in array upfront

Parse grammar:
  → encounter <number>, search array
  → encounter <operator>, search array
  → encounter <Lispy>, search array — not found, return error
  → encounter "rule", search array — not found, return error

But, this would have been a much bigger refactoring, so I have excluded it. Also, more philosophically, seems to me mpc is all about being scrappy and efficient, sacrificing maybe some formality, so I thought the fix embraced that spirit well 😄

That being said, please do feel free to correct my understanding and proposing a different approach, I'd be more than happy to help if I can!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants