Hierarchical Separated Values
HSV is both a file format (like CSV) and a streaming protocol. HSV is a carefully curated set of primary control codes standardized for modern use. In 1963, ASCII defined these characters for structured data (like JSON)—then we forgot about them and spent 60 years escaping quotes instead. No escaping. No quotes. All of Unicode is valid data, and binary mode handles raw bytes. It streams, and parsing parallelizes trivially.
CSV → 1 level (comma, quoting required) JSON → unlimited (escaping required) HSV → unlimited + framing + protocol (no escaping)
Save .hsv files and grep them, or stream records over a socket with acknowledgments. Binary formats like MessagePack and Protobuf are serialization-only.
| HSV | CSV | JSON | MsgPack | Protobuf | |
|---|---|---|---|---|---|
| File format | ✓ | ✓ | ✓ | ✗ | ✗ |
| Hierarchy | Unlimited | 1 level | Unlimited | Unlimited | Unlimited |
| Text tools work | ✓ | ✓ | ✓ | ✗ | ✗ |
| No escaping | ✓ | Quoting | ✗ | ✓ | ✓ |
| Streaming | ✓ | Lines | ✗ | ✓ | ✓ |
| Framing (STX/ETX) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Protocol (ACK/NAK) | ✓ | ✗ | ✗ | ✗ | ✗ |
HSV is CSV with hierarchy, JSON without escaping, and unlike binary-only formats, you can debug with cat and grep.
key ␟ value key-value pair pair ␞ pair ␞ pair object item ␝ item ␝ item array key ␟ ␎ nested ␏ unlimited nesting (SO/SI) ␂ record ␜ record ␃ stream
{"name": "Alice", "age": "30", "city": "NYC"}
name␟Alice␞age␟30␞city␟NYC
40% smaller. No quotes, braces, colons, or commas.
{"msg": "He said \"hello\""}
msg␟He said "hello"
{"text": "line 1\nline 2"}
text␟line 1
line 2
STX/ETX framing solves the streaming problem:
␂name␟Alice␞role␟admin␜name␟Bob␞role␟user␃
Unlike NDJSON, newlines in data don't break parsing. Unlike length-prefix, it's human-inspectable.
SOH (Start of Header) adds metadata before the data block:
␁hsv␟1.0␞content-type␟users␂name␟Alice␞role␟admin␃
Structure: ␁ header ␂ data ␃ — header properties use the same format as data. When present, hsv␟1.0 is the recommended first property for version identification.
Everything outside STX…ETX is ignored—no special comment syntax needed:
This text is ignored ␂name␟Alice␞age␟30␃ So is this
DLE (Data Link Escape) enables raw binary in values:
␐␂ [any bytes] ␐␃ binary section ␐␐ literal DLE byte
Inside binary mode, only DLE needs escaping. All other bytes—including the 7 structure separators—are literal data. ~0.4% overhead for random binary.
type␟image␞data␟␐␂<raw PNG>␐␃␞width␟800
This is BISYNC-style transparency from 1967, finally applied to data serialization.
Reference implementations in Rust and Python:
Eighteen invisible bytes reserved. Everything else is data.
| Code | Hex | Name | Purpose |
|---|---|---|---|
| Framing | |||
| 1 | 0x01 | SOH | Start header |
| 2 | 0x02 | STX | Start data block |
| 3 | 0x03 | ETX | End data block |
| 4 | 0x04 | EOT | End stream |
| Protocol | |||
| 5 | 0x05 | ENQ | Enquiry (request ACK) |
| 6 | 0x06 | ACK | Acknowledge (success) |
| 21 | 0x15 | NAK | Negative acknowledge (error) |
| 24 | 0x18 | CAN | Cancel operation |
| Flow Control | |||
| 17 | 0x11 | XON | Resume transmission |
| 19 | 0x13 | XOFF | Pause transmission |
| 22 | 0x16 | SYN | Sync/keepalive |
| Nesting | |||
| 14 | 0x0E | SO | Start nested structure |
| 15 | 0x0F | SI | End nested structure |
| 16 | 0x10 | DLE | Binary mode escape |
| Structure | |||
| 28 | 0x1C | FS | Record separator |
| 29 | 0x1D | GS | Array element separator |
| 30 | 0x1E | RS | Property separator |
| 31 | 0x1F | US | Key-value separator |
Tools may display control codes as: ␁ ␂ ␃ ␄ ␅ ␆ ␎ ␏ ␐ ␑ ␓ ␕ ␖ ␘ ␜ ␝ ␞ ␟
Reserved: 0x01-0x06 0x0E-0x11 0x13 0x15-0x16 0x18 0x1C-0x1F (18 bytes) Data: Everything else (all of Unicode) Structure: ␟ (US) key:value ␞ (RS) properties ␝ (GS) array items ␜ (FS) records Framing: ␁...␂...␃ (SOH header, STX data, ETX end) Nesting: ␎...␏ (SO/SI, unlimited depth) Binary: ␐␂...␐␃ (DLE transparency) Protocol: ␅/␆/␕ (ENQ/ACK/NAK) Flow: ␑/␓/␖ (XON/XOFF/SYN) Stream: ␄ (EOT end of stream) Extension: .hsv MIME type: application/hsv