「‍」 Lingenic

History: Hierarchical Data Formats

The forgotten hierarchy in every computer

1963: A Hierarchy Built In

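When the American Standards Association published the first ASCII standard (ASA X3.4-1963), it set aside control codes as information separators. By the 1967 revision, the last four had the names and roles they still carry, designed specifically for hierarchical data: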

Code  Hex   Name                   Intended Use
28    0x1C  File Separator (FS)    Separate files or major sections
29    0x1D  Group Separator (GS)   Separate groups within a file
30    0x1E  Record Separator (RS)  Separate records within a group
31    0x1F  Unit Separator (US)    Separate fields within a record

The codepoints were deliberately ordered: as the number decreases, the scope increases.

31 (US)  <  30 (RS)  <  29 (GS)  <  28 (FS)
 field      record      group       file
smallest ────────────────────► largest scope
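The four-level scheme above can be sketched as a nested split, working from the largest scope inward. This is a minimal illustration, not part of any standard: the `parse` function and the sample data are made up for this example, and the sketch assumes no field contains a separator byte.

```python
# ASCII information separators, largest scope first.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"


def parse(text: str) -> list:
    """Split text into files -> groups -> records -> units (hypothetical helper)."""
    return [
        [
            [record.split(US) for record in group.split(RS)]
            for group in file.split(GS)
        ]
        for file in text.split(FS)
    ]


# Two files: the first holds two groups, the second a single unit.
data = "a" + US + "b" + RS + "c" + GS + "d" + FS + "e"
tree = parse(data)
print(tree)
```

Each level of list nesting corresponds to one separator, so the depth of the result is always four regardless of how much data each level holds.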

The Original Vision

The ASCII committee envisioned hierarchical data storage decades before JSON, XML, or even CSV became widespread:

FILE
  GROUP
    RECORD abc (units)
    RECORD xyz
  GROUP
    RECORD ...
"These four can be used to subdivide data into structured groupings... The specific meaning of each separator is left to the application."
— Paraphrased from early ASCII documentation

Transmission Control

ASCII also defined transmission control characters for framing data:

Code  Hex   Name  Purpose
1     0x01  SOH   Start of Heading
2     0x02  STX   Start of Text
3     0x03  ETX   End of Text
4     0x04  EOT   End of Transmission
16    0x10  DLE   Data Link Escape

These were used in serial communication to frame messages. A typical transmission:

␁ [header info] ␂ [actual data content] ␃
│               │                       │
│               │                       └── data ends
│               └── data starts
└── header starts
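The framing shown above can be sketched in a few lines. This is a simplified illustration only: `build_frame` and `parse_frame` are hypothetical names, and the sketch assumes neither header nor body contains SOH, STX, or ETX bytes. Real link protocols add synchronization and checksums, which are omitted here.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"


def build_frame(header: bytes, body: bytes) -> bytes:
    """Wrap a header and body as SOH header STX body ETX."""
    return SOH + header + STX + body + ETX


def parse_frame(frame: bytes) -> tuple:
    """Return (header, body) from a frame produced by build_frame."""
    if not (frame.startswith(SOH) and frame.endswith(ETX)):
        raise ValueError("frame must start with SOH and end with ETX")
    # Strip SOH and ETX, then split at the first STX.
    header, stx, body = frame[1:-1].partition(STX)
    if not stx:
        raise ValueError("frame is missing STX")
    return header, body


frame = build_frame(b"id=7", b"hello")
header, body = parse_frame(frame)
```

Because the control bytes themselves mark the boundaries, this non-transparent form breaks as soon as the body contains an ETX byte.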

BISYNC and Binary Transparency (1967)

IBM's Binary Synchronous Communications protocol faced a problem: what if the data contains control characters?

Their solution: DLE (Data Link Escape). When you need to send arbitrary binary:

Normal mode:  ␂ [text data] ␃
              ↑ STX/ETX have meaning

Binary mode:  ␐␂ [any bytes including ␂␃] ␐␃
              ↑ DLE escapes the control chars

Literal DLE:  ␐␐
              ↑ DLE DLE = one DLE byte

Inside a DLE-transparent section, only DLE itself needs escaping. All other bytes—including STX, ETX, and the separators—are literal data.

This elegant solution has been available since 1967. HSV uses it unchanged.
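The DLE rule described above (open with DLE STX, close with DLE ETX, double any DLE in the payload) can be sketched as a pair of functions. This is a simplified model under stated assumptions, not a BISYNC implementation: it leaves out SYN padding, block check characters, and the other DLE sequences the real protocol defines, and the function names are hypothetical.

```python
DLE, STX, ETX = 0x10, 0x02, 0x03


def encode_transparent(payload: bytes) -> bytes:
    """Frame arbitrary bytes as DLE STX ... DLE ETX, doubling each DLE."""
    out = bytearray([DLE, STX])
    for byte in payload:
        if byte == DLE:
            out.append(DLE)  # stuff an extra DLE so the receiver sees DLE DLE
        out.append(byte)
    out += bytes([DLE, ETX])
    return bytes(out)


def decode_transparent(frame: bytes) -> bytes:
    """Recover the payload from a frame made by encode_transparent."""
    if frame[:2] != bytes([DLE, STX]):
        raise ValueError("frame must start with DLE STX")
    out = bytearray()
    i = 2
    while i < len(frame):
        byte = frame[i]
        if byte != DLE:
            out.append(byte)  # STX, ETX, separators: all literal here
            i += 1
            continue
        if i + 1 >= len(frame):
            raise ValueError("dangling DLE at end of frame")
        following = frame[i + 1]
        if following == DLE:
            out.append(DLE)  # DLE DLE means one literal DLE
            i += 2
        elif following == ETX:
            return bytes(out)  # DLE ETX closes the transparent section
        else:
            raise ValueError("unexpected byte after DLE")
    raise ValueError("frame is missing the closing DLE ETX")


payload = bytes([0x02, 0x10, 0x03, 0x41])
encoded = encode_transparent(payload)
```

The decoder only ever inspects the byte after a DLE, which is why a payload that happens to contain STX, ETX, or even the sequence DLE ETX survives the round trip.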

What Went Wrong

These characters were designed for hierarchical data, but the computing industry forgot them.

Every format reinvented hierarchy using printable characters—requiring escaping, quoting, and complex parsers.

The Cost of Forgetting

Using printable characters as delimiters means they can appear in data:

CSV problem:
  name,city                  → OK
  "hello, world"             → needs quoting
  "she said ""hi"""          → needs escape escaping

JSON problem:
  {"msg": "hello"}           → OK
  {"msg": "he said \"hi\""}  → needs escaping
  nested escaping hell...    → 🔥

The ASCII separators (0x1C–0x1F) are non-printable and practically never appear in normal text, so ordinary text fields need no escaping. They were designed for exactly this purpose.
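The contrast can be seen by writing the same rows both ways. The sketch below uses Python's standard `csv` module for the CSV side and plain `join`/`split` for the separator side; the sample rows are invented for illustration, and the separator version is only safe under the assumption that no field contains 0x1E or 0x1F.

```python
import csv
import io

RS, US = "\x1e", "\x1f"

rows = [["hello, world", 'she said "hi"'], ["plain", "text"]]

# CSV: commas and quotes inside fields force quoting and quote doubling.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()

# Separators: fields are joined as-is, with no quoting layer at all.
sep_text = RS.join(US.join(fields) for fields in rows)
decoded = [record.split(US) for record in sep_text.split(RS)]

print(repr(csv_text))
print(repr(sep_text))
```

Decoding the separator form needs no state machine: two `split` calls recover the original rows.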

HSV: Returning to the Original Design

HSV uses ASCII control characters as they were intended—all 18 of them:

ASCII              HSV Purpose

Framing
SOH (0x01)         Start header metadata
STX (0x02)         Start data block
ETX (0x03)         End data block
EOT (0x04)         End of stream

Protocol
ENQ (0x05)         Enquiry (request acknowledgment)
ACK (0x06)         Acknowledge (success)
NAK (0x15)         Negative acknowledge (error)
CAN (0x18)         Cancel operation

Flow Control
DC1/XON (0x11)     Resume transmission
DC3/XOFF (0x13)    Pause transmission
SYN (0x16)         Sync/keepalive

Nesting & Binary
SO (0x0E)          Start nested structure
SI (0x0F)          End nested structure
DLE (0x10)         Binary transparency (escape to raw bytes)

Structure
FS (0x1C)          Separate records
GS (0x1D)          Separate array elements
RS (0x1E)          Separate properties
US (0x1F)          Separate key from value

This isn't a new invention. It's a return to what the ASCII committee designed some 60 years ago, combined with the DLE transparency that BISYNC introduced for binary data and full protocol support for bidirectional communication.
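To make the Structure rows of the table concrete, here is a rough sketch of how records might be written and read using only STX/ETX, FS, GS, RS, and US. It is inferred from the table alone and is not the HSV reference implementation: the function names are hypothetical, values are assumed to be strings or lists of strings, and header, nesting (SO/SI), and DLE handling are left out.

```python
STX, ETX = "\x02", "\x03"
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"


def encode_value(value) -> str:
    """Join list values with GS; pass plain strings through (assumed rule)."""
    return GS.join(value) if isinstance(value, list) else value


def encode_records(records: list) -> str:
    """Encode a list of dicts as STX rec FS rec ... ETX."""
    body = FS.join(
        RS.join(key + US + encode_value(value) for key, value in record.items())
        for record in records
    )
    return STX + body + ETX


def decode_records(text: str) -> list:
    """Inverse of encode_records for values that are strings or multi-item lists."""
    if not (text.startswith(STX) and text.endswith(ETX)):
        raise ValueError("data block must be wrapped in STX ... ETX")
    records = []
    for chunk in text[1:-1].split(FS):
        record = {}
        for prop in chunk.split(RS):
            key, _, value = prop.partition(US)
            # A GS inside the value marks it as an array in this sketch.
            record[key] = value.split(GS) if GS in value else value
        records.append(record)
    return records


records = [
    {"name": "Ada", "tags": ["math", "code"]},
    {"name": "Bob", "tags": ["x", "y"]},
]
encoded = encode_records(records)
```

One limitation of this sketch is that a one-element array is indistinguishable from a plain string after decoding; a real format would need a rule for that case.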

Why It Works

Non-printable: Control codes do not occur in ordinary text or typed user input, so text fields need no escaping. Arbitrary binary content is handled by DLE transparency.

Universal: Every computer system supports ASCII. These bytes work everywhere.

Invisible: The structure doesn't pollute the data. You see content, not syntax.

Hierarchical: Unlimited nesting via SO/SI, plus four structure levels and framing. Covers all data structures.

The Irony

Every computer already has these characters. Every programming language can use them. They have been part of ASCII, and of every encoding built on it, since the 1960s.

We just forgot to use them.