HSV vs JSON, CSV, XML, and others
| Format | Hierarchy | Escaping | Binary | Human Readable | Streaming | Parsing |
|---|---|---|---|---|---|---|
| HSV | Unlimited | Never* | Native (DLE) | Data only | Native | Parallel |
| JSON | Unlimited | Required | Base64 | Yes | Awkward | Sequential |
| CSV | 1 level | Quoting | Base64 | Yes | Line-based | Sequential |
| XML | Unlimited | Entities | Base64/CDATA | Verbose | SAX parsers | Sequential |
| MessagePack | Unlimited | Never | Native | Binary | Native | Sequential |
| Protocol Buffers | Schema | Never | Native | Binary | Native | Sequential |
* Text mode never needs escaping. Binary mode uses DLE transparency (~0.4% overhead).
{"name": "Alice", "age": 30, "city": "NYC"}
43 bytes
name␟Alice␞age␟30␞city␟NYC
26 bytes (about 40% smaller)
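The size comparison can be reproduced with a minimal sketch. It assumes flat, text-mode records and that ␟ and ␞ are the ASCII US (0x1F) and RS (0x1E) bytes. `encode_flat` is a hypothetical helper name for illustration, not part of any HSV library.

```python
import json

US = "\x1f"  # Unit Separator: sits between a key and its value
RS = "\x1e"  # Record Separator: sits between properties


def encode_flat(record: dict) -> str:
    """Encode a flat dict as key US value pairs joined by RS (text mode only)."""
    return RS.join(f"{key}{US}{value}" for key, value in record.items())


record = {"name": "Alice", "age": 30, "city": "NYC"}
hsv_text = encode_flat(record)
json_text = json.dumps(record)  # default separators: ", " and ": "

hsv_size = len(hsv_text.encode("utf-8"))
json_size = len(json_text.encode("utf-8"))

print(hsv_size)   # 26
print(json_size)  # size of the JSON text with default separators
print(f"{100 * (1 - hsv_size / json_size):.0f}% smaller")
```

Values are written as text in this sketch, so the number 30 comes back as the string "30" unless the reader applies its own typing rules.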
{"msg": "He said \"hello\""}
msg␟He said "hello"
{"path": "C:\\Users\\data"}
path␟C:\Users\data
{"text": "line 1\nline 2\nline 3"}
text␟line 1
line 2
line 3
{"name": "Alice"}
{"name": "Bob"}
{"name": "Carol"}
Raw newlines in data break the framing, so they must be escaped as \n
␂name␟Alice␜name␟Bob␜name␟Carol␃
Newlines in data are fine. ␂=start ␜=separator ␃=end
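A framed block like the one above can be split with plain string operations. This is a sketch under stated assumptions: flat records only, text mode only, and the standard ASCII values for STX (0x02), ETX (0x03), FS (0x1C), RS (0x1E), and US (0x1F). `parse_block` is a hypothetical name.

```python
STX, ETX = "\x02", "\x03"  # start / end of a block
FS = "\x1c"                # separates records inside a block
RS, US = "\x1e", "\x1f"    # separate properties / a key from its value


def parse_block(block: str) -> list[dict[str, str]]:
    """Parse one STX...ETX framed block of flat records (text mode only)."""
    if not (block.startswith(STX) and block.endswith(ETX)):
        raise ValueError("block must start with STX and end with ETX")
    records = []
    for chunk in block[1:-1].split(FS):
        if not chunk:
            continue  # tolerate an empty block or empty record
        pairs = (prop.split(US, 1) for prop in chunk.split(RS))
        records.append({key: value for key, value in pairs})
    return records


# The third record carries a raw newline in its value; no escaping is involved.
stream = "\x02name\x1fAlice\x1cname\x1fBob\x1cnote\x1fline 1\nline 2\x03"
print(parse_block(stream))
```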
{"user": {"name": "Alice", "email": "a@b.com"}}
user␟␎name␟Alice␞email␟a@b.com␏
Legend: ␟ = key:value · ␞ = properties · ␝ = array · ␎/␏ = nested (SO/SI) · ␜ = records · ␂/␃ = start/end
Both JSON and HSV support unlimited nesting. HSV uses SO/SI (Shift Out/Shift In) characters for nesting depth.
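One way to read SO/SI nesting is a small recursive-descent parser. The sketch below is an illustration, not a reference implementation: it assumes SO is 0x0E and SI is 0x0F, handles only key-value objects (no ␝ arrays, no binary sections), and `parse_object` is a hypothetical name.

```python
SO, SI = "\x0e", "\x0f"  # open / close one nesting level
RS, US = "\x1e", "\x1f"  # separate properties / a key from its value


def parse_object(text: str, pos: int = 0) -> tuple[dict, int]:
    """Parse properties from text[pos:] until SI or end of input.

    Returns the parsed dict and the index where parsing stopped.
    """
    result = {}
    while pos < len(text) and text[pos] != SI:
        sep = text.index(US, pos)  # raises ValueError on a missing US
        key = text[pos:sep]
        pos = sep + 1
        if pos < len(text) and text[pos] == SO:
            # Nested value: recurse, then require the closing SI.
            value, pos = parse_object(text, pos + 1)
            if pos >= len(text) or text[pos] != SI:
                raise ValueError("unterminated nested value")
            pos += 1
        else:
            # Plain value: runs up to the next RS, SI, or end of input.
            end = pos
            while end < len(text) and text[end] not in (RS, SI):
                end += 1
            value, pos = text[pos:end], end
        result[key] = value
        if pos < len(text) and text[pos] == RS:
            pos += 1
    return result, pos


doc = "user\x1f\x0ename\x1fAlice\x1eemail\x1fa@b.com\x0f"
parsed, _ = parse_object(doc)
print(parsed)
```

The parser tracks depth through recursion, so each SI closes the most recently opened SO.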
| Aspect | CSV | HSV |
|---|---|---|
| Hierarchy | 1 level (rows and columns) | Unlimited (SO/SI nesting) |
| Delimiter | Comma (printable, common in data) | Control codes (never in data) |
| Quoting | Required for special chars | Never |
| Newlines in data | Requires quoting | Just works |
| Named fields | Header row convention | Built-in key-value |
| Nested data | Not supported | Supported |
name,bio
Alice,"Software engineer, loves coding"
Bob,"Said ""hello"" yesterday"
␂name␟Alice␞bio␟Software engineer, loves coding␜name␟Bob␞bio␟Said "hello" yesterday␃
CSV requires quoting when data contains commas or quotes. HSV never needs quoting.
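The difference shows up when both formats are generated from the same rows. The sketch uses Python's standard `csv` module for the CSV side. The HSV side is a hypothetical join over the ASCII separator bytes, assuming flat text-mode records.

```python
import csv
import io

STX, ETX = "\x02", "\x03"
FS, RS, US = "\x1c", "\x1e", "\x1f"

rows = [
    {"name": "Alice", "bio": "Software engineer, loves coding"},
    {"name": "Bob", "bio": 'Said "hello" yesterday'},
]

# CSV: the writer quotes fields containing commas and doubles embedded quotes.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "bio"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# HSV (text mode): values are copied through unchanged between separators.
hsv_block = STX + FS.join(
    RS.join(f"{key}{US}{value}" for key, value in row.items()) for row in rows
) + ETX

print(csv_text)
print(repr(hsv_block))
```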
<user>
<name>Alice</name>
<age>30</age>
</user>
56 bytes
name␟Alice␞age␟30
17 bytes (70% smaller)
<msg>x < y & a > b</msg>
msg␟x < y & a > b
HSV can represent document trees. Angle brackets are just visible nesting delimiters—SO/SI are invisible ones:
<div class="box">
<p>Hello <b>world</b></p>
</div>
tag␟div␞class␟box␞children␟␎
tag␟p␞children␟␎
text␟Hello
␝tag␟b␞text␟world
␏
␏
Where HSV wins: No entity escaping. Literal <, &, " in text content.
Where HTML wins: Human authoring, view source, 30 years of browser tooling.
Abstract syntax trees are nested structures with node types and properties—a natural fit for HSV:
{"type": "BinaryExpr",
"op": "+",
"left": {"type": "Num", "value": 1},
"right": {"type": "Num", "value": 2}}
type␟BinaryExpr␞op␟+␞left␟␎type␟Num␞value␟1␏␞right␟␎type␟Num␞value␟2␏
Compiler toolchains, linters, and code formatters can exchange ASTs without escaping operators like \, ", or &&.
Templates mix literal text with structure. HSV keeps them separate without escape sequences:
Hello {{name}},
Your balance is ${{amount}}.
Click <a href="{{url}}">here</a>.
Escaping needed for quotes, angles, curlies
text␟Hello ␝var␟name␝text␟,
Your balance is $␝var␟amount␝text␟.
Click ␝tag␟a␞href␟␎var␟url␏␞text␟here␝text␟.
Structure is separate from content
Template compilation becomes tree manipulation, not string surgery.
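As a rough illustration of that idea, the sketch below renders a template made of ␝-separated segments. It assumes GS is 0x1D and US is 0x1F, and it supports only the `text` and `var` segment kinds from the example above. `render` is a hypothetical name, and tag segments with nested attributes are left out.

```python
GS, US = "\x1d", "\x1f"  # separate segments / a segment kind from its payload


def render(template: str, values: dict[str, str]) -> str:
    """Render text and var segments; literal text is never re-scanned."""
    out = []
    for segment in template.split(GS):
        kind, _, payload = segment.partition(US)
        if kind == "text":
            out.append(payload)          # literal text, copied as-is
        elif kind == "var":
            out.append(values[payload])  # variable lookup by name
        else:
            raise ValueError(f"unsupported segment kind: {kind!r}")
    return "".join(out)


template = (
    "text\x1fHello \x1dvar\x1fname\x1dtext\x1f,\n"
    "Your balance is $\x1dvar\x1famount\x1dtext\x1f."
)
print(render(template, {"name": "Alice", "amount": "12.50"}))
```

Because substituted values are appended rather than parsed, a value containing `{{` or `<` cannot be mistaken for template structure.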
| Aspect | MessagePack / Protobuf | HSV |
|---|---|---|
| Size | Smallest | Slightly larger |
| Human readable | No (binary) | Yes (data visible) |
| Debuggable | Needs tools | Any text viewer |
| Escaping | Never | Never |
| Schema | Optional/required | Not needed |
| Text tools | grep, sed, awk fail | Work fine |
| Embed binary blobs | Native | DLE transparency |
HSV fills the gap: it is more compact than JSON/XML, more debuggable than binary formats, and can still embed raw bytes when needed.
HSV uses DLE (Data Link Escape) for binary transparency—the same technique BISYNC used in 1967:
{"image": "iVBORw0KGgo..."}
Base64: 33% size overhead
image␟␐␂<raw PNG bytes>␐␃
DLE: ~0.4% overhead on uniformly distributed bytes (only 0x10 is escaped, about 1 byte in 256)
How it works:
- ␐␂ (DLE STX) = start binary section
- ␐␃ (DLE ETX) = end binary section
- ␐␐ (DLE DLE) = literal DLE byte

Inside binary mode, only DLE needs escaping. All 17 other control codes become literal data.
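The three rules above amount to classic byte stuffing, which can be sketched as follows. `wrap_binary` and `unwrap_binary` are hypothetical names, and the sketch handles a single binary section in isolation rather than one embedded in a larger HSV document.

```python
DLE, STX, ETX = b"\x10", b"\x02", b"\x03"


def wrap_binary(payload: bytes) -> bytes:
    """Frame raw bytes as DLE STX ... DLE ETX, doubling every DLE in the payload."""
    return DLE + STX + payload.replace(DLE, DLE + DLE) + DLE + ETX


def unwrap_binary(framed: bytes) -> bytes:
    """Reverse wrap_binary, rejecting malformed input."""
    if not framed.startswith(DLE + STX):
        raise ValueError("missing DLE STX")
    out = bytearray()
    i = 2
    while i < len(framed):
        byte = framed[i]
        if byte != DLE[0]:
            out.append(byte)  # every non-DLE byte is literal data
            i += 1
            continue
        if i + 1 >= len(framed):
            raise ValueError("dangling DLE")
        following = framed[i + 1]
        if following == DLE[0]:
            out.append(DLE[0])  # DLE DLE is one literal DLE byte
            i += 2
        elif following == ETX[0]:
            if i + 2 != len(framed):
                raise ValueError("data after DLE ETX")
            return bytes(out)
        else:
            raise ValueError("unexpected byte after DLE")
    raise ValueError("missing DLE ETX")


payload = bytes(range(256))  # every byte value once, including one 0x10
framed = wrap_binary(payload)
print(len(payload), len(framed))
```

With one DLE in 256 payload bytes, stuffing adds a single byte, plus four bytes of framing.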
| Method | Delimiter | Newlines in data | Framing |
|---|---|---|---|
| NDJSON | Newline | Must escape | Implicit |
| JSON with length prefix | Byte count | OK | Binary header |
| Server-Sent Events | data: prefix | Must escape | Text protocol |
| HSV | ␂/␃ + ␜ | OK | Native |
HSV is the only format here that supports parallel parsing at every data level.
Why other formats require sequential parsing:
| Format | Why Sequential |
|---|---|
| JSON | Escape state (\") spans tokens—can't split safely |
| CSV | Quote state spans cells—newline might be inside quotes |
| XML | Entity state (&) and CDATA sections span boundaries |
| MessagePack | Length-prefixed—must decode header to find next item |
| Protobuf | Varint lengths—must decode sequentially to find boundaries |
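In text mode, HSV has no escape state: every separator byte (FS, GS, RS, US) is always a separator. Split on any of them and parse the chunks independently. Binary sections (DLE STX to DLE ETX) are the one place where those bytes can be literal data, so a splitter has to treat each section as opaque.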
- File level: Scan for STX/ETX → split into blocks → parse blocks in parallel
- Record level: Scan for FS (␜) → split into records → parse records in parallel
- Property level: Scan for RS (␞) → split into pairs → parse pairs in parallel
- Array level: Scan for GS (␝) → split into items → parse items in parallel
Every split is just "find byte, cut." No state machine. No lookahead. No backtracking.
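Record-level splitting can be sketched with the standard library. The example assumes flat, text-mode records with no SO/SI nesting and no binary sections, and the function names are hypothetical. Splitting UTF-8 bytes on 0x1C is safe because multi-byte UTF-8 sequences never contain bytes below 0x80.

```python
from concurrent.futures import ThreadPoolExecutor

FS, RS, US = b"\x1c", b"\x1e", b"\x1f"


def parse_record(chunk: bytes) -> dict[str, str]:
    """Parse one flat record; needs no context from neighbouring chunks."""
    pairs = (prop.split(US, 1) for prop in chunk.split(RS))
    return {key.decode("utf-8"): value.decode("utf-8") for key, value in pairs}


def parse_records_parallel(body: bytes, workers: int = 4) -> list[dict[str, str]]:
    """Cut the body at every FS byte and parse the chunks concurrently."""
    chunks = body.split(FS)  # no quote or escape state to track while cutting
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_record, chunks))  # map preserves input order


body = (
    b'name\x1fAlice\x1ebio\x1fSaid "hello"'
    b"\x1cname\x1fBob\x1ebio\x1fline 1\nline 2"
)
print(parse_records_parallel(body))
```

Threads are used here only to show that the chunks are independent. In standard CPython, CPU-bound parsing typically needs a process pool or native code to see a real speedup.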
| Use Case | Recommended | Why |
|---|---|---|
| Config files | JSON, TOML, YAML | Human editing matters |
| API responses (simple) | HSV or JSON | HSV smaller, JSON more tooling |
| Streaming data | HSV | Native framing, no escaping |
| Log files | HSV | Multiline content, greppable |
| Data export | HSV or CSV | HSV for nested/complex data |
| High-performance | Protobuf, FlatBuffers | Binary is fastest |
| Deep nesting (10+ levels) | HSV, JSON, or XML | All support unlimited nesting |
HSV is ideal when:
Stick with JSON when: