「‍」 Lingenic

History: Hierarchical Data Formats

The forgotten hierarchy in every computer

1963: A Hierarchy Built In

When the American Standards Association published ASCII (ASA X3.4-1963, the standard later maintained under the ANSI name), it included four separator characters specifically designed for hierarchical data:

Code  Hex   Symbol  Name              Intended Use
28    0x1C  [FS]    File Separator    Separate files or major sections
29    0x1D  [GS]    Group Separator   Separate groups within a file
30    0x1E  [RS]    Record Separator  Separate records within a group
31    0x1F  [US]    Unit Separator    Separate fields within a record

The codepoints were deliberately ordered: as the number decreases, the scope increases.

US        <  RS      <  GS     <  FS
field        record     group     file
smallest                          largest

The Original Vision

The ASCII committee envisioned hierarchical data storage decades before JSON, XML, or even CSV became widespread:

FILE
  FS
  GROUP
    GS
    RECORD
      RS
      a US b US c   (units)
    RECORD
      x US y US z
  GROUP
    RECORD
    ...
"These four can be used to subdivide data into structured groupings... The specific meaning of each separator is left to the application."
— Paraphrased from ANSI X3.4-1963; see RFC 20 (1969)
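The four-level layout above can be sketched in a few lines of Python. This is a minimal illustration, not part of any standard: the function names `encode` and `decode` and the sample data are assumptions made for this example, and it treats each separator as a plain delimiter between siblings at its level.

```python
# ASCII information separators, largest scope first.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def encode(files):
    """Join a files -> groups -> records -> units nesting into one string."""
    return FS.join(
        GS.join(
            RS.join(US.join(record) for record in group)
            for group in file_
        )
        for file_ in files
    )

def decode(text):
    """Split a string back into the same four-level nesting."""
    return [
        [
            [record.split(US) for record in group.split(RS)]
            for group in file_.split(GS)
        ]
        for file_ in text.split(FS)
    ]

# One file holding two groups; the first group has two records.
data = [
    [
        [["a", "b", "c"], ["x", "y", "z"]],
        [["1", "2"]],
    ]
]

encoded = encode(data)
assert decode(encoded) == data
```

Because none of the units contains a separator byte, no quoting step is needed. One limitation of this sketch: an empty list at any level does not round-trip, since splitting an empty string yields one empty unit.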

Transmission Control

ASCII also defined transmission control characters for framing data:

Code  Hex   Name  Purpose
1     0x01  SOH   Start of Header (control/metadata)
2     0x02  STX   Start of Text (content/data)
3     0x03  ETX   End of Text
4     0x04  EOT   End of Transmission
16    0x10  DLE   Data Link Escape

The distinction is key: header = control, text = content. Headers carry routing, metadata, and protocol information. Text carries the actual payload. This is the same pattern used in email headers, HTTP headers, and packet headers today.

These were used in serial communication to frame messages. A typical transmission:

SOH [control/metadata] STX [content/data] ETX
    control section        content section
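A minimal sketch of this framing in Python follows. The helper names `build_frame` and `parse_frame` and the sample header are hypothetical, chosen for illustration; the sketch assumes neither section contains a framing byte, which is the case the transparency mechanism in the next section addresses.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"

def build_frame(header: bytes, text: bytes) -> bytes:
    """Wrap a header and a payload as SOH header STX text ETX."""
    for part in (header, text):
        if any(ctrl in part for ctrl in (SOH, STX, ETX)):
            raise ValueError("framing byte inside data; needs transparency")
    return SOH + header + STX + text + ETX

def parse_frame(frame: bytes) -> tuple:
    """Split a frame back into (header, text)."""
    if not (frame.startswith(SOH) and frame.endswith(ETX)):
        raise ValueError("frame must start with SOH and end with ETX")
    header, stx, text = frame[1:-1].partition(STX)
    if not stx:
        raise ValueError("missing STX between header and text")
    return header, text

frame = build_frame(b"to=printer", b"hello")
header, text = parse_frame(frame)
```

The header and text come back as two separate byte strings, mirroring the control/content split described above.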

BISYNC and Binary Transparency (1967)

IBM's Binary Synchronous Communications protocol faced a problem: what if the data contains control characters?

Their solution: DLE (Data Link Escape). When you need to send arbitrary binary:

Normal mode:  STX [text data] ETX
              ^ STX/ETX have meaning

Binary mode:  DLE STX [any bytes] DLE ETX
              ^ DLE escapes the control chars

Literal DLE:  DLE DLE
              ^ DLE DLE = one DLE byte

Inside a DLE-transparent section, only DLE itself needs escaping. All other bytes—including STX, ETX, and the separators—are literal data.

This elegant solution has been available since 1967. HSV uses it unchanged.
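The escaping rule is small enough to sketch directly. The following Python is a simplified illustration of DLE byte stuffing as described above, not a full BISYNC implementation (it ignores sync characters and block checks); the function names are assumptions for this example.

```python
DLE, STX, ETX = 0x10, 0x02, 0x03

def encode_transparent(payload: bytes) -> bytes:
    """Wrap arbitrary bytes as DLE STX ... DLE ETX, doubling each DLE."""
    out = bytearray([DLE, STX])
    for byte in payload:
        if byte == DLE:
            out.append(DLE)  # a literal DLE is sent as DLE DLE
        out.append(byte)
    out += bytes([DLE, ETX])
    return bytes(out)

def decode_transparent(frame: bytes) -> bytes:
    """Recover the payload from a DLE STX ... DLE ETX frame."""
    if frame[:2] != bytes([DLE, STX]):
        raise ValueError("frame must start with DLE STX")
    out = bytearray()
    i = 2
    while i < len(frame):
        byte = frame[i]
        if byte != DLE:
            out.append(byte)  # STX, ETX, separators: all literal here
            i += 1
            continue
        if i + 1 >= len(frame):
            raise ValueError("dangling DLE at end of frame")
        following = frame[i + 1]
        if following == DLE:
            out.append(DLE)
            i += 2
        elif following == ETX:
            if i + 2 != len(frame):
                raise ValueError("bytes after DLE ETX")
            return bytes(out)
        else:
            raise ValueError("unexpected byte after DLE")
    raise ValueError("missing DLE ETX terminator")

payload = bytes([0x02, 0x10, 0x03, 0x1F, 0x41])
frame = encode_transparent(payload)
```

The payload deliberately contains STX, DLE, ETX, and US; only the DLE is doubled on the wire, and decoding returns the original bytes.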

What Went Wrong

These characters were designed for hierarchical data, but the computing industry forgot them.

Later formats such as CSV, XML, and JSON reinvented hierarchy using printable characters, which requires escaping, quoting, and more complex parsers.

The Cost of Forgetting

Using printable characters as delimiters means they can appear in data:

CSV problem:
  name,city                   -> OK
  "hello, world"              -> needs quoting
  "she said ""hi"""           -> needs the quotes themselves escaped

JSON problem:
  {"msg": "hello"}            -> OK
  {"msg": "he said \"hi\""}   -> needs escaping
  nested escaping hell...     -> :(

The ASCII separators (0x1C–0x1F) almost never appear in ordinary text, so text fields delimited by them need no quoting or escaping. They were designed for exactly this purpose.
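The difference is easy to see side by side. The sketch below writes the same two rows with Python's standard `csv` module and with RS/US as delimiters; the sample rows are made up for illustration, and the separator version assumes the fields contain no control characters.

```python
import csv
import io

rows = [["name", "city"], ["hello, world", 'she said "hi"']]

# CSV: the writer has to quote fields and double embedded quotes.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()

# Separators: RS between records, US between fields, data left untouched.
RS, US = "\x1e", "\x1f"
sep_text = RS.join(US.join(row) for row in rows)

# Parsing the separator form is two splits, with no quoting state machine.
parsed = [record.split(US) for record in sep_text.split(RS)]
```

Here `csv_text` carries the quoting shown in the CSV example above, while `sep_text` contains the field values byte for byte.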

HSV: Returning to the Original Design

HSV uses ASCII control characters as they were intended: 22 of them, plus two C1 control codes (SSA and ESA) for nesting.

Note: SOH (header) and STX (text) reflect the original ASCII distinction between control and content. Headers carry control information—routing, metadata, protocol ops. Text carries the actual data. A message can be header-only (pure control), text-only (pure content), or both.

ASCII         HSV Purpose

Framing
  SOH (0x01)    Start header (control/metadata)
  STX (0x02)    Start text (content/data)
  ETX (0x03)    End block
  EOT (0x04)    End stream

Protocol
  ENQ (0x05)    Enquiry (request acknowledgment)
  ACK (0x06)    Acknowledge (success)
  NAK (0x15)    Negative acknowledge (error)
  CAN (0x18)    Cancel operation

Flow Control
  XON (0x11)    Resume transmission
  XOFF (0x13)   Pause transmission
  SYN (0x16)    Sync/keepalive

Device Control
  DC2 (0x12)    Connect to device/service
  DC4 (0x14)    Disconnect (preferred stop)

Chunked Transfer
  ETB (0x17)    End block (more coming)
  EM (0x19)     End of medium (data exhausted)

Nesting
  SSA (0x86)    Start nested structure (C1: Start of Selected Area)
  ESA (0x87)    End nested structure (C1: End of Selected Area)

Binary
  DLE (0x10)    Binary transparency (escape to raw bytes)
  SO (0x0E)     Shift Out (binary: enter shifted mode)
  SI (0x0F)     Shift In (binary: exit shifted mode)

Structure
  FS (0x1C)     Separate records
  GS (0x1D)     Separate array elements
  RS (0x1E)     Separate properties
  US (0x1F)     Separate key from value

This isn't a new invention. It's a return to what the ASCII committee designed 60 years ago, combined with BISYNC's DLE transparency for binary data and full protocol support for bidirectional communication.
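To make the Structure rows of the table concrete, here is a rough Python sketch that applies them to a list of key/value records inside one STX ... ETX block. It is an illustration built only from the table, not the HSV reference implementation: the function names, the treatment of list values, and the sample data are all assumptions, and the real HSV grammar may differ.

```python
STX, ETX = "\x02", "\x03"
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def encode_records(records):
    """Encode dicts: FS between records, RS between properties,
    US between key and value, GS between array elements."""
    def encode_value(value):
        return GS.join(value) if isinstance(value, list) else value

    body = FS.join(
        RS.join(key + US + encode_value(value) for key, value in record.items())
        for record in records
    )
    return STX + body + ETX

def decode_records(text):
    """Invert encode_records (assumed layout, see the lead-in)."""
    if not (text.startswith(STX) and text.endswith(ETX)):
        raise ValueError("expected an STX ... ETX block")
    records = []
    for chunk in text[1:-1].split(FS):
        record = {}
        for prop in chunk.split(RS):
            key, _, value = prop.partition(US)
            # A value containing GS is treated as an array.
            record[key] = value.split(GS) if GS in value else value
        records.append(record)
    return records

records = [
    {"name": "Alice", "tags": ["admin", "dev"]},
    {"name": "Bob", "tags": ["guest", "ops"]},
]
encoded = encode_records(records)
```

One consequence of this simple layout is that a single-element array is indistinguishable from a plain string after decoding; a real format would need a rule for that case.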

Why It Works

Non-printable: The control codes HSV uses as delimiters do not occur in ordinary text or typed user input, so text needs no escaping. Arbitrary binary content goes through DLE transparency instead.

Universal: Every computer system supports ASCII. These bytes work everywhere.

Invisible: The structure doesn't pollute the data. You see content, not syntax.

Hierarchical: Unlimited nesting via SSA/ESA, plus four structure levels and framing. Covers all data structures.

Control Pictures

Unicode 1.0 (1991) added Control Pictures (U+2400–U+243F)—visible glyphs for control characters like [FS] [GS] [RS] [US]. These make control characters displayable in documentation and editors, yet they too remain largely forgotten.
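The Control Pictures block is laid out so that the picture for a C0 control code sits at U+2400 plus that code, with U+2421 reserved for DEL. A small Python helper (the name `visible` is an assumption for this example) can use that to make separator-delimited data readable in logs or documentation:

```python
def visible(text: str) -> str:
    """Replace C0 control characters and DEL with their Control Pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            out.append(chr(0x2400 + code))  # U+2400..U+241F mirror 0x00..0x1F
        elif code == 0x7F:
            out.append("\u2421")            # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)

sample = "name\x1fAlice\x1ecity\x1fParis"
shown = visible(sample)
```

Printable characters pass through unchanged, so only the structure becomes visible.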

The Irony

Every computer already has these characters. Every programming language can use them. They've been in every text file specification since 1963.

We just forgot to use them.