Parallel-parseable structured documents
HSV enables fully parallel processing of HTML-like structured documents. Represent HTML as HSV, split at any delimiter, parse chunks on separate cores. No speculation, no state synchronization, linear scaling.
Proof-of-concept: go-html-parallel — parallel HTML parsing in ~50 lines of Go.
HTML and XML parsing has been sequential for 30 years. Research into parallel HTML parsing achieved limited results:
| Project | Year | Approach | Result |
|---|---|---|---|
| HPar | 2013 | Speculative data-parallel | 2.4x on 4 cores |
| ZOOMM | 2013 | Qualcomm parallel browser | 2x (whole engine) |
| Servo | 2017 | Off-main-thread parsing | Tokenization only |
Why so little progress? The inherent complexity of HTML's stateful parsing model:
Characters such as &, <, >, and " must be escaped, and < means different things in different places. You cannot split an HTML file in the middle and parse the chunks independently. The parser must process the input byte by byte from the start.
HSV represents HTML-like structures using its own delimiters:
<div class="container">
<p>Hello world</p>
<a href="https://example.com">
Click here
</a>
</div>
[STX] html:div [US] [SSA]
html:class [US] container [RS]
html:p [US] Hello world [RS]
html:a [US] [SSA]
html:href [US] https://example.com [RS]
html:text [US] Click here
[ESA]
[ESA] [ETX]
The structure uses HSV control characters:
- SSA / ESA (0x86 / 0x87): nesting (like opening/closing tags)
- RS (0x1E): sibling elements
- US (0x1F): attribute/value pairs
- The html: prefix is a namespace convention

HTML-in-HSV inherits HSV's parallel-parseable structure.
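The delimiters can be written down as Go constants to make the notation concrete. This is a minimal sketch, not part of any HSV library: the constant names and the `Visible` helper are hypothetical, and encoding the delimiters as UTF-8 code points (so SSA is U+0086 rather than a raw 0x86 byte) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Delimiters named in the text. Storing them as UTF-8 encoded code points
// is an assumption of this sketch, not something the text specifies.
const (
	STX = "\u0002" // start of document
	ETX = "\u0003" // end of document
	FS  = "\u001C" // also appears between siblings in the examples
	RS  = "\u001E" // separates sibling elements
	US  = "\u001F" // separates a key from its value
	SSA = "\u0086" // opens a nested group
	ESA = "\u0087" // closes a nested group
)

// visible rewrites each control character into the bracketed
// notation used in the examples ([US], [SSA], ...).
var visible = strings.NewReplacer(
	STX, "[STX]", ETX, "[ETX]", FS, "[FS]", RS, "[RS]",
	US, "[US]", SSA, "[SSA]", ESA, "[ESA]",
)

// Visible returns s with every delimiter shown in bracket form.
func Visible(s string) string { return visible.Replace(s) }

func main() {
	// An html:a element with one attribute and one text child.
	link := "html:a" + US + SSA +
		"html:href" + US + "https://example.com" + RS +
		"html:text" + US + "Click here" + ESA
	fmt.Println(Visible(STX + link + ETX))
	// prints: [STX]html:a[US][SSA]html:href[US]https://example.com[RS]html:text[US]Click here[ESA][ETX]
}
```

The whitespace around delimiters in the examples is there for readability; this sketch emits none.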
| Aspect | Traditional HTML | HTML in HSV |
|---|---|---|
| Parsing | Sequential state machine | Parallel split |
| Escaping | Required (<, &) | Never |
| Split point | Cannot split safely | Any delimiter |
| Multi-core | Single-threaded | Trivially parallel |
Find a delimiter (SSA, ESA, RS, US), split, parse chunks on separate cores. Same as any HSV data.
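The split-and-parse step might look like the following in Go. This is a sketch under stated assumptions rather than the code of go-html-parallel: the `Token` type and all function names are hypothetical, delimiters are assumed to be UTF-8 encoded, and only tokenization runs in parallel. Rebuilding the SSA/ESA nesting from the ordered token list is left out.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
	"unicode/utf8"
)

// Delims lists STX, ETX, FS, RS, US, SSA and ESA
// (assumed to be UTF-8 encoded code points).
const Delims = "\u0002\u0003\u001C\u001E\u001F\u0086\u0087"

// Token is a run of text plus the delimiter that ended it (0 if none).
type Token struct {
	Text  string
	Delim rune
}

// Tokenize scans one chunk sequentially. It needs no state from other chunks.
func Tokenize(chunk string) []Token {
	var toks []Token
	start := 0
	for i, r := range chunk {
		if strings.ContainsRune(Delims, r) {
			toks = append(toks, Token{Text: chunk[start:i], Delim: r})
			start = i + utf8.RuneLen(r)
		}
	}
	if start < len(chunk) {
		toks = append(toks, Token{Text: chunk[start:]})
	}
	return toks
}

// SplitAtDelimiters cuts s into at most n chunks. Each cut is moved forward
// to just after the next delimiter, so no field is ever split in two.
func SplitAtDelimiters(s string, n int) []string {
	var chunks []string
	start := 0
	for i := 1; i < n; i++ {
		target := len(s) * i / n
		if target < start {
			target = start
		}
		j := strings.IndexAny(s[target:], Delims)
		if j < 0 {
			break
		}
		_, w := utf8.DecodeRuneInString(s[target+j:])
		end := target + j + w
		chunks = append(chunks, s[start:end])
		start = end
	}
	if start < len(s) {
		chunks = append(chunks, s[start:])
	}
	return chunks
}

// ParallelTokenize tokenizes each chunk in its own goroutine and
// concatenates the results in document order.
func ParallelTokenize(s string, workers int) []Token {
	chunks := SplitAtDelimiters(s, workers)
	results := make([][]Token, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			results[i] = Tokenize(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	var all []Token
	for _, r := range results {
		all = append(all, r...)
	}
	return all
}

func main() {
	doc := "\u0002" + strings.Repeat("html:p\u001FA & B <b>\u001E", 1000) + "\u0003"
	toks := ParallelTokenize(doc, runtime.NumCPU())
	fmt.Println(len(toks), "tokens")
}
```

Because every chunk ends on a delimiter, the concatenated output matches what a single sequential `Tokenize` pass over the whole input would produce.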
Quotes, angles, and ampersands are just content:
<p>Use &lt;div&gt; for containers</p>
<p>A &amp; B &amp; C</p>
<a href="?a=1&amp;b=2">Link</a>
[STX] html:p [US] Use <div> for containers [FS]
html:p [US] A & B & C [FS]
html:a [US] [SSA] html:href [US] ?a=1&b=2 [RS] html:text [US] Link [ESA] [ETX]
The control characters (0x86, 0x87, 0x1E, 0x1F) never appear in normal text, so no escaping is ever needed.
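A writer therefore only has to copy values through unchanged. The sketch below illustrates this with a hypothetical `Field` helper. Rejecting values that already contain a reserved control character is an assumption of this example, not a rule stated by the text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	US = "\u001F"
	// Reserved holds STX, ETX, FS, RS, US, SSA and ESA
	// (assumed to be UTF-8 encoded code points).
	Reserved = "\u0002\u0003\u001C\u001E\u001F\u0086\u0087"
)

// ErrReserved reports a key or value that contains a delimiter.
var ErrReserved = errors.New("field contains a reserved HSV control character")

// Field joins key and value with US. The value is copied as-is:
// <, >, &, quotes and newlines need no escaping.
func Field(key, value string) (string, error) {
	if strings.ContainsAny(key+value, Reserved) {
		return "", ErrReserved
	}
	return key + US + value, nil
}

func main() {
	values := []string{
		"Use <div> for containers",
		"A & B & C",
		"?a=1&b=2",
		"bad\u001Evalue", // contains RS, so it is rejected
	}
	for _, v := range values {
		f, err := Field("html:p", v)
		fmt.Printf("%q %v\n", f, err)
	}
}
```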
Multi-line content with formatting:
[STX] html:article [US] [SSA]
html:h1 [US] The Title [RS]
html:p [US] First paragraph with "quotes" and <angles>. [RS]
html:p [US] Second paragraph.
This continues on a new line.
And another. [RS]
html:blockquote [US] [SSA]
html:p [US] A nested quote with special chars: <>&"'
[ESA]
[ESA] [ETX]
Newlines are literal. Quotes are literal. Everything except the reserved control characters is data.
HTML-in-HSV is essentially an AST (Abstract Syntax Tree) format:
[STX] tag [US] div [RS]
attr:class [US] container [RS]
children [US] [SSA]
tag [US] p [RS] text [US] Hello [RS]
tag [US] a [RS] attr:href [US] /link [RS] text [US] Click
[ESA] [ETX]
This makes it ideal for:
The mapping is straightforward:
| HTML | HSV |
|---|---|
| <tag> | html:tag [US] [SSA] |
| </tag> | [ESA] |
| attr="value" | html:attr [US] value |
| Text content | Value after [US] |
| Sibling elements | Separated by [RS] or [FS] |
| &lt; &gt; &amp; | < > & (literal) |
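The mapping can be sketched as a small encoder over an in-memory element tree. The `Node` and `Attr` types and the `Encode` and `Document` functions are hypothetical names for illustration. The sketch assumes UTF-8 encoded delimiters, separates siblings with RS only, and does not handle text interleaved with child elements.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	STX, ETX = "\u0002", "\u0003"
	RS, US   = "\u001E", "\u001F"
	SSA, ESA = "\u0086", "\u0087"
)

// Attr is one attribute. A slice of Attr keeps source order.
type Attr struct{ Name, Value string }

// Node is a hypothetical in-memory element.
type Node struct {
	Tag      string
	Attrs    []Attr
	Text     string
	Children []*Node
}

// Encode applies the mapping: a leaf element becomes html:tag US text,
// and an element with attributes or children becomes html:tag US SSA ... ESA
// with the items inside separated by RS.
func Encode(n *Node) string {
	head := "html:" + n.Tag + US
	if len(n.Attrs) == 0 && len(n.Children) == 0 {
		return head + n.Text // text content is the value after US
	}
	var items []string
	for _, a := range n.Attrs {
		items = append(items, "html:"+a.Name+US+a.Value)
	}
	if n.Text != "" {
		items = append(items, "html:text"+US+n.Text)
	}
	for _, c := range n.Children {
		items = append(items, Encode(c))
	}
	return head + SSA + strings.Join(items, RS) + ESA
}

// Document wraps the encoded root element in STX ... ETX.
func Document(root *Node) string { return STX + Encode(root) + ETX }

func main() {
	root := &Node{
		Tag:   "div",
		Attrs: []Attr{{"class", "container"}},
		Children: []*Node{
			{Tag: "p", Text: "Hello world"},
			{Tag: "a", Attrs: []Attr{{"href", "https://example.com"}}, Text: "Click here"},
		},
	}
	fmt.Printf("%q\n", Document(root))
}
```

The tree built in `main` is the div/p/a example from earlier in this document, and the output has the same delimiter sequence as its HSV form, without the readability whitespace.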
HTML-in-HSV gives you:
30 years of sequential HTML parsing. HSV changes that.