Hierarchical Separated Values
HSV is both a file format (like CSV) and an RFC 20 conformant streaming protocol. A carefully curated set of ASCII control codes standardized for modern use. We forgot about control codes and spent 60 years escaping quotes instead. No escaping. No quotes. All of Unicode is valid data. Parsing parallelizes trivially.
CSV → 1 level (comma, quoting required) JSON → unlimited (escaping required) HSV → unlimited + framing + protocol (no escaping)
Parallel parsing: HSV is the only format that supports parallel parsing at every data level. No escape state spans record boundaries. Scan for STX/ETX, split, parse chunks on separate cores. CSV, JSON, MsgPack, Protobuf — all stuck in single-threaded processing. No escape state at any level. Every split is just "find byte, cut."
Save .hsv files and grep them, or stream records over a socket with acknowledgments. Binary formats like MessagePack and Protobuf are serialization-only.
| HSV | CSV | JSON | MsgPack | Protobuf | |
|---|---|---|---|---|---|
| File format | ✓ | ✓ | ✓ | ✗ | ✗ |
| Hierarchy | Unlimited | 1 level | Unlimited | Unlimited | Unlimited |
| Text tools work | ✓ | ✓ | ✓ | ✗ | ✗ |
| No escaping | ✓ | Quoting | ✗ | ✓ | ✓ |
| Streaming | ✓ | Lines | ✗ | ✓ | ✓ |
| Framing (STX/ETX) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Protocol (ACK/NAK) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Parallel parsing | ✓ | ✗ | ✗ | ✗ | ✗ |
HSV is CSV with hierarchy, JSON without escaping, and unlike binary-only formats, you can debug with cat and grep.
key [US] value key-value pair pair [RS] pair [RS] pair object item [GS] item [GS] item array key [US] [SSA] nested [ESA] nested value [STX] [SSA] children [ESA] [ETX] nested body (container) [STX] record [FS] record [ETX] stream
STX/ETX framing solves the streaming problem:
[STX] name [US] Alice [RS] role [US] admin [FS] name [US] Bob [RS] role [US] user [ETX]
Unlike NDJSON, newlines in data don't break parsing. Unlike length-prefix, it's human-inspectable.
SOH (Start of Header) adds control/metadata before the data block. Headers are a form of control information—routing, formatting, protocol metadata—distinct from content.
[SOH] hsv [US] 1.0 [RS] content-type [US] users [STX] name [US] Alice [RS] role [US] admin [ETX]
Structure: [SOH] header [STX] data [ETX] — header properties use the same format as data. When present, hsv [US] 1.0 is the recommended first property for version identification.
This mirrors the original ASCII design: SOH marks control information (header), STX marks content (text). A message can be header-only (pure control), content-only (pure data), or both.
Everything outside STX…ETX is ignored—no special comment syntax needed:
This text is ignored [STX] name [US] Alice [RS] age [US] 30 [ETX] So is this
SSA/ESA nest anywhere—in values or in bodies. A container (directory, archive, document) uses SSA/ESA to wrap its children:
[SOH] name [US] file.txt [STX] content here [ETX] file (body is content) [SOH] name [US] folder [STX] [SSA] directory (body is children) [SOH] name [US] a.txt [STX] hello [ETX] [FS] [SOH] name [US] b.txt [STX] world [ETX] [ESA] [ETX]
After STX: if SSA follows, it's a container. Otherwise, it's content.
Recursive nesting:
[SOH] name [US] project [STX] [SSA]
[SOH] name [US] README.md [STX] # Hello [ETX]
[FS]
[SOH] name [US] src [STX] [SSA]
[SOH] name [US] main.rs [STX] fn main() {} [ETX]
[FS]
[SOH] name [US] lib.rs [STX] pub fn lib() {} [ETX]
[ESA] [ETX]
[ESA] [ETX]
Files, directories, archives, documents—same structure. [SOH] header [STX] content [ETX] for leaves, [SOH] header [STX] [SSA] children [ESA] [ETX] for containers. FS separates siblings.
Two modes for binary data:
[SO] shifted-bytes [SI] shift encoding (default) [DLE] [SPA] raw-bytes [DLE] [EPA] binary transparency (BISYNC-style)
Shift encoding: Control codes (0x00-0x06, 0x0E-0x1F, 0x7F-0x9F) are shifted to ASCII (0x21-0x61). Output is valid UTF-8. Pass-through: BEL, BS, TAB, LF, VT, FF, CR (0x07-0x0D).
DLE transparency: For raw binary blobs. Inside [DLE][SPA]...[DLE][EPA], all bytes pass through literally. Only DLE needs escaping ([DLE][DLE] = literal DLE).
type [US] image [RS] data [US] [DLE] [SPA] <raw PNG> [DLE] [EPA] [RS] width [US] 800
Uses original ASCII semantics: SO/SI (1963), DLE/BISYNC (1960s), SPA/EPA (ISO 6429, 1983).
Reference implementations in Rust and Python:
Twenty-six bytes reserved (22 C0 + 4 C1). Three bytes forbidden. Everything else is data.
| Code | Hex | Name | Purpose |
|---|---|---|---|
| Framing | |||
| 1 | 0x01 | SOH | Start header (control/metadata) |
| 2 | 0x02 | STX | Start text (content/data) |
| 3 | 0x03 | ETX | End data block |
| 4 | 0x04 | EOT | End stream |
| Protocol | |||
| 5 | 0x05 | ENQ | Enquiry (request ACK) |
| 6 | 0x06 | ACK | Acknowledge (success) |
| 21 | 0x15 | NAK | Negative acknowledge (error) |
| 24 | 0x18 | CAN | Cancel operation |
| Flow Control | |||
| 17 | 0x11 | DC1/XON | Resume transmission |
| 19 | 0x13 | DC3/XOFF | Pause transmission |
| 22 | 0x16 | SYN | Sync/keepalive |
| Device Control | |||
| 18 | 0x12 | DC2 | Connect to device/service |
| 20 | 0x14 | DC4 | Disconnect (preferred stop) |
| Chunked Transfer | |||
| 23 | 0x17 | ETB | End block (more coming) |
| 25 | 0x19 | EM | End of medium (data exhausted) |
| Nesting (values or bodies) | |||
| 134 | 0x86 | SSA | Start nesting (values, containers) |
| 135 | 0x87 | ESA | End nesting |
| Binary | |||
| 14 | 0x0E | SO | Shift Out (enter shift mode) |
| 15 | 0x0F | SI | Shift In (exit shift mode) |
| 16 | 0x10 | DLE | Data Link Escape (transparency modifier) |
| 150 | 0x96 | SPA | Start of Protected Area (binary block) |
| 151 | 0x97 | EPA | End of Protected Area (binary block) |
| Structure | |||
| 28 | 0x1C | FS | Record separator |
| 29 | 0x1D | GS | Array element separator |
| 30 | 0x1E | RS | Property separator |
| 31 | 0x1F | US | Key-value separator |
Three control characters are forbidden in HSV streams:
| Code | Hex | Name | Reason |
|---|---|---|---|
| 0 | 0x00 | NUL | String terminator in C; causes truncation |
| 26 | 0x1A | SUB | Ctrl+Z EOF on Windows; corrupts streams |
| 27 | 0x1B | ESC | Terminal escape sequences; security risk |
If these bytes appear in data, use binary mode (shift encoding or DLE transparency).
Seven control characters are allowed as literal data (not reserved for protocol):
| Code | Hex | Name | Common Use |
|---|---|---|---|
| 7 | 0x07 | BEL | Audible/visual alert |
| 8 | 0x08 | BS | Backspace |
| 9 | 0x09 | TAB | Horizontal tab |
| 10 | 0x0A | LF | Line feed (newline) |
| 11 | 0x0B | VT | Vertical tab |
| 12 | 0x0C | FF | Form feed (page break) |
| 13 | 0x0D | CR | Carriage return |
These are common whitespace and formatting characters. They pass through HSV unchanged.
Control codes are written as [CODE] with the standard abbreviation. Examples: [STX], [ETX], [SSA], [ESA].
Unicode's Control Pictures block (U+2400–U+243F) provides visible glyphs for C0 controls, but C1 controls (SSA, ESA) have no standard pictures. The [CODE] notation is consistent across all control characters.
| Notation | Hex | Name |
|---|---|---|
| [US] | 0x1F | Unit Separator |
| [RS] | 0x1E | Record Separator |
| [GS] | 0x1D | Group Separator |
| [FS] | 0x1C | File Separator |
| [SOH] | 0x01 | Start of Header |
| [STX] | 0x02 | Start of Text |
| [ETX] | 0x03 | End of Text |
| [SO] | 0x0E | Shift Out |
| [SI] | 0x0F | Shift In |
| [SSA] | 0x86 | Start of Selected Area |
| [ESA] | 0x87 | End of Selected Area |
In actual data, these are single bytes. The notation makes them visible in documentation.
HSV defines two MIME types based on binary content:
| MIME Type | Use When | Encoding |
|---|---|---|
text/hsv; charset=utf-8 |
Text-safe HSV (no DLE blocks) | Valid UTF-8 |
application/hsv |
Binary HSV (may contain DLE blocks) | Arbitrary bytes |
text/hsv: Use when content is guaranteed UTF-8 safe. This includes:
[SO]...[SI]) — shifted bytes map to ASCII 0x21-0x60application/hsv: Use when content may contain raw binary. This includes:
[DLE][SPA]...[DLE][EPA]) — arbitrary bytes pass throughThe C1 control codes (SSA, ESA, SPA, EPA at 0x86-0x87, 0x96-0x97) are valid Unicode code points with proper UTF-8 encodings (e.g., U+0086 → C2 86). They do not break UTF-8 validity.
Reserved: 0x01-0x06 0x0E-0x0F 0x11-0x19 0x1C-0x1F 0x86-0x87 0x96-0x97 (25 bytes)
Forbidden: 0x00 0x1A 0x1B (NUL, SUB, ESC)
Data: Everything else (all of Unicode + BEL BS TAB LF VT FF CR)
Structure:
[US] key:value
[RS] properties
[GS] array items
[FS] records
Framing: [SOH] ... [STX] ... [ETX] (header, data, end)
Nesting: [SSA] ... [ESA] (0x86/0x87, in values or bodies)
Container: [SOH] ... [STX] [SSA] ... [ESA] [ETX] (directory, archive, document)
Binary: [SO] shifted-byte [SI] (shift encoding for control codes)
[DLE] [SPA] raw-bytes [DLE] [EPA] (DLE transparency for blobs)
Protocol: [ENQ] / [ACK] / [NAK] / [CAN]
Flow: [DC1] / [DC3] / [SYN] (XON / XOFF / sync)
Device: [DC2] / [DC4] (connect / disconnect)
Chunked: [ETB] / [EM] (block done / data exhausted)
Stream: [EOT] (end of stream)
Extension: .hsv
MIME types: text/hsv; charset=utf-8 (text-safe, no DLE)
application/hsv (binary, may have DLE)