CVE-2026-56412: How an Autonomous AI Found an Incomplete Fix in libexpat

Jun 20, 2026 · Medium severity · CWE-416

Background

CVE-2026-50219 covered a class of vulnerabilities in libexpat in which five API functions could be called from within an active XML handler, leading to heap use-after-free conditions. The fix, merged via PR #1246, added an isCalledFromInsideHandler() guard to all five functions: XML_GetBuffer, XML_Parse, XML_ParseBuffer, XML_ParserFree, and XML_ParserReset. The security community accepted the fix. The CVE was closed. Vorthix disagreed.

How the AI Found It

Phase 1 — Understanding the fix's assumption

The agent read PR #1246 and identified exactly what the fix assumed: that m_handlerCallDepth is incremented by beforeHandler() and decremented by afterHandler() around every handler invocation. The guard isCalledFromInsideHandler() reads this counter and returns true only when it is greater than zero. If the counter stays at zero during a handler call, the guard silently returns false, and the check protecting all five functions is bypassed.

Phase 2 — Finding every call site

Rather than reading only the fixed code path, the agent mapped every call site in xmlparse.c that can invoke m_characterDataHandler. In doContent(), the normal text processing path, the handler call is correctly wrapped in beforeHandler() / afterHandler(). In doCdataSection(), the XML_TOK_DATA_CHARS case invokes the same handler with neither beforeHandler() nor afterHandler(). The depth counter stays at zero, so the guard returns false while a handler is actively executing inside a CDATA section.

Phase 3 — Dynamic confirmation

A minimal proof of concept was constructed in under seventeen lines of C: a parser, a character data handler that calls XML_ParserFree on the global parser instance, and a single XML input containing a CDATA section. It was compiled with AddressSanitizer and run against the patched library at commit 429059e, the latest upstream master, which includes the full merged PR #1246 fix.

vorthix-agent · live
18:43:40    INFO      ITERATION 42/50
18:43:40    INFO      Calling XOR-1...
18:48:31    INFO      XOR-1 responded in 291.2s
18:48:31    THINK     Reading PR #1246 fix as a claim...
18:48:31    THINK     Assumption: m_handlerCallDepth incremented before every handler fires
18:55:09    PLAN      Mapping all call sites of m_characterDataHandler in xmlparse.c...
18:55:09    THINK     doContent() — beforeHandler() present ✓
18:55:09    THINK     doCdataSection() XML_TOK_DATA_CHARS — NO beforeHandler() ✗
19:01:52    TOOL      Compiling PoC against commit 429059e with AddressSanitizer...
19:05:54    RESULT    heap-use-after-free confirmed · xmlparse.c:4622
19:05:54    RESULT    CVE-2026-56412 — PROVEN

The Vulnerable Path

xmlparse.c — doCdataSection() (vulnerable)
/* XML_TOK_DATA_CHARS — vulnerable path, no handler depth tracking */
case XML_TOK_DATA_CHARS:
    if (parser->m_characterDataHandler) {
        if (MUST_CONVERT(enc, s)) {
            /* ... conversion ... */
            charDataHandler(parser->m_handlerArg, (XML_Char *)buf, (int)(bufEnd - buf));
        } else {
            charDataHandler(parser->m_handlerArg, (const XML_Char *)s,
                            (int)((const char *)next - s) / sizeof(XML_Char));
        }
    }

No beforeHandler(parser) call. No afterHandler(parser) call. m_handlerCallDepth stays at zero.

The Fix

xmlparse.c — doCdataSection() (fixed)
/* XML_TOK_DATA_CHARS — fixed path, handler depth correctly tracked */
case XML_TOK_DATA_CHARS:
    if (parser->m_characterDataHandler) {
        if (MUST_CONVERT(enc, s)) {
            /* ... conversion ... */
            beforeHandler(parser);
            charDataHandler(parser->m_handlerArg, (XML_Char *)buf, (int)(bufEnd - buf));
            afterHandler(parser);
        } else {
            beforeHandler(parser);
            charDataHandler(parser->m_handlerArg, (const XML_Char *)s,
                            (int)((const char *)next - s) / sizeof(XML_Char));
            afterHandler(parser);
        }
    }

AddressSanitizer Output

asan-output.txt
=================================================================
==AddressSanitizer: heap-use-after-free on address 0x...
READ of size 4 at 0x... thread T0
    #0 0x... in doCdataSection xmlparse.c:4622
    #1 0x... in XML_ParseBuffer xmlparse.c:2103
    #2 0x... in main poc.c:14
SUMMARY: AddressSanitizer: heap-use-after-free xmlparse.c:4622 in doCdataSection

Timeline

Date          Event
Jun 15, 2026  Latest upstream commit (429059e); PR #1246 fix present in master
Jun 20, 2026  Finding reported to maintainer via private disclosure
Jun 20, 2026  Maintainer confirmed finding valid (same day)
Jun 20, 2026  PR #1278 opened with fix and green CI
Jun 20, 2026  PR approved by maintainer
Jun 20, 2026  CVE-2026-56412 requested and assigned
Jun 20, 2026  PR merged into libexpat master

Impact

libexpat is a dependency of Python (pyexpat), git, cmake, and many other widely deployed tools. Any application that parses untrusted XML containing CDATA sections and calls one of the five protected functions from within a character data handler is affected. The triggering input is standard, valid XML; no malformed input is required. The result is a silent heap use-after-free in production, exactly the class of bug that is most dangerous in a widely deployed library.

“A fix is a claim. Every claim has an assumption. Find the assumption.”