「‍」 Lingenic

HSV v1.0

Hierarchical Separated Values

HSV is both a file format (like CSV) and an RFC 20 conformant streaming protocol, built on a carefully curated set of ASCII control codes standardized for modern use. We forgot about control codes and spent 60 years escaping quotes instead. No escaping. No quotes. All of Unicode is valid data. Parsing parallelizes trivially.

CSV  →  1 level    (comma, quoting required)
JSON →  unlimited  (escaping required)
HSV  →  unlimited  + framing + protocol (no escaping)

Parallel parsing: Among the formats compared here, HSV is the only one that supports parallel parsing at every data level. There is no escape state at any level, so none spans a record boundary. Scan for STX/ETX, split, and parse the chunks on separate cores; every split is just "find byte, cut." CSV, JSON, MsgPack, and Protobuf all have to be scanned sequentially to locate their boundaries.
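The "find byte, cut" idea can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: `parse_record` and `parse_block_parallel` are hypothetical names, and the sketch assumes flat records with no nesting and no binary blocks.

```python
from concurrent.futures import ThreadPoolExecutor

STX, ETX, FS, RS, US = "\x02", "\x03", "\x1c", "\x1e", "\x1f"


def parse_record(chunk: str) -> dict[str, str]:
    """Parse one flat record: pairs are split on RS, key and value on US."""
    record = {}
    for pair in chunk.split(RS):
        if pair:
            key, _, value = pair.partition(US)
            record[key] = value
    return record


def parse_block_parallel(block: str, workers: int = 4) -> list[dict[str, str]]:
    """Cut one STX...ETX block on FS and parse every record independently."""
    start = block.index(STX) + 1
    end = block.index(ETX, start)
    chunks = block[start:end].split(FS)
    # Records share no parser state, so they can be handed to any worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_record, chunks))
```

Threads keep the sketch short. In CPython, a real speedup on CPU-bound parsing would more likely come from `ProcessPoolExecutor` or a native implementation.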

Where It Fits

Save .hsv files and grep them, or stream records over a socket with acknowledgments. Binary formats like MessagePack and Protobuf are serialization-only.

HSV CSV JSON MsgPack Protobuf
File format
Hierarchy Unlimited 1 level Unlimited Unlimited Unlimited
Text tools work
No escaping Quoting
Streaming Lines
Framing (STX/ETX)
Protocol (ACK/NAK)
Parallel parsing

HSV is CSV with hierarchy and JSON without escaping, and, unlike binary-only formats, it can be debugged with cat and grep.

Structure

key [US] value                       key-value pair
pair [RS] pair [RS] pair             object
item [GS] item [GS] item             array
key [US] [SSA] nested [ESA]          nested value
[STX] [SSA] children [ESA] [ETX]     nested body (container)
[STX] record [FS] record [ETX]       stream
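As a rough illustration of the layout above, the following Python sketch serializes dictionaries and lists into HSV text. The function names are hypothetical, and two choices are assumptions rather than rules taken from the spec: scalars are converted with `str()`, and array items that are objects are wrapped in SSA/ESA.

```python
US, RS, GS, FS = "\x1f", "\x1e", "\x1d", "\x1c"
STX, ETX = "\x02", "\x03"
SSA, ESA = "\x86", "\x87"


def encode_value(value) -> str:
    """Encode a scalar, an array (GS-separated), or a nested object."""
    if isinstance(value, dict):
        # key [US] [SSA] nested [ESA]
        return SSA + encode_object(value) + ESA
    if isinstance(value, (list, tuple)):
        # item [GS] item [GS] item
        return GS.join(encode_value(item) for item in value)
    return str(value)


def encode_object(obj: dict) -> str:
    """pair [RS] pair [RS] pair, where each pair is key [US] value."""
    return RS.join(f"{key}{US}{encode_value(val)}" for key, val in obj.items())


def encode_stream(records) -> str:
    """[STX] record [FS] record [ETX]"""
    return STX + FS.join(encode_object(rec) for rec in records) + ETX
```

For example, `encode_stream([{"name": "Alice"}, {"name": "Bob"}])` yields one framed block holding two records separated by a single FS.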

Streaming

STX/ETX framing solves the streaming problem:

[STX] name [US] Alice [RS] role [US] admin [FS] name [US] Bob [RS] role [US] user [ETX]

Unlike NDJSON, newlines in data don't break parsing. Unlike length-prefixed framing, it's human-inspectable.
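To show how the framing behaves when data arrives in pieces, here is a small incremental reader in Python. `StreamReader` and `parse_record` are hypothetical names for this sketch, which assumes flat records and ignores headers, nesting, and binary blocks.

```python
STX, ETX, FS, RS, US = "\x02", "\x03", "\x1c", "\x1e", "\x1f"


def parse_record(chunk: str) -> dict[str, str]:
    """Split a flat record into key/value pairs."""
    record = {}
    for pair in chunk.split(RS):
        if pair:
            key, _, value = pair.partition(US)
            record[key] = value
    return record


class StreamReader:
    """Collect records from a framed stream, one chunk at a time."""

    def __init__(self) -> None:
        self._buf = ""
        self._in_block = False

    def feed(self, chunk: str) -> list[dict[str, str]]:
        """Consume a chunk and return every record it completes."""
        records = []
        for ch in chunk:
            if not self._in_block:
                if ch == STX:  # text outside a block is skipped
                    self._in_block = True
                    self._buf = ""
            elif ch in (FS, ETX):  # a record ends at FS or at ETX
                if self._buf:
                    records.append(parse_record(self._buf))
                self._buf = ""
                if ch == ETX:
                    self._in_block = False
            else:
                self._buf += ch
        return records
```

A record split across two `feed` calls is returned only once its terminating FS or ETX arrives, and a line feed inside a value is kept as ordinary data.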

Headers (Control)

SOH (Start of Header) adds control/metadata before the data block. Headers are a form of control information—routing, formatting, protocol metadata—distinct from content.

[SOH] hsv [US] 1.0 [RS] content-type [US] users [STX] name [US] Alice [RS] role [US] admin [ETX]

Structure: [SOH] header [STX] data [ETX] — header properties use the same format as data. When present, hsv [US] 1.0 is the recommended first property for version identification.

This mirrors the original ASCII design: SOH marks control information (header), STX marks content (text). A message can be header-only (pure control), content-only (pure data), or both.
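The header/content split can be sketched as follows. `parse_message` and `parse_props` are hypothetical names, and the sketch makes one assumption the text does not settle: a header-only message is taken to be SOH followed by properties up to the end of the input.

```python
SOH, STX, ETX = "\x01", "\x02", "\x03"
FS, RS, US = "\x1c", "\x1e", "\x1f"


def parse_props(text: str) -> dict[str, str]:
    """Parse RS-separated properties, each written as key [US] value."""
    props = {}
    for pair in text.split(RS):
        if pair:
            key, _, value = pair.partition(US)
            props[key] = value
    return props


def parse_message(msg: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Return (header, records) for a header-only, content-only, or full message."""
    header: dict[str, str] = {}
    records: list[dict[str, str]] = []
    if msg.startswith(SOH):
        head, stx, rest = msg[1:].partition(STX)
        header = parse_props(head)
        msg = stx + rest  # empty when the message is header-only
    if msg.startswith(STX):
        body = msg[1 : msg.index(ETX)]
        records = [parse_props(rec) for rec in body.split(FS) if rec]
    return header, records
```

Because header properties use the same format as data, one helper serves both halves of the message.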

Mixed Content

Everything outside STX…ETX is ignored—no special comment syntax needed:

This text is ignored
[STX] name [US] Alice [RS] age [US] 30 [ETX]
So is this
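A short Python sketch of this rule: pull out every STX...ETX span and drop whatever surrounds it. `extract_blocks` is a hypothetical helper, and the pattern assumes flat blocks, since container bodies with nested STX/ETX would need a real parser.

```python
import re

STX, ETX = "\x02", "\x03"

# One block: STX, then any characters except STX/ETX, then ETX.
BLOCK = re.compile(f"{STX}([^{STX}{ETX}]*){ETX}")


def extract_blocks(text: str) -> list[str]:
    """Return the body of each flat STX...ETX block; other text is ignored."""
    return BLOCK.findall(text)
```

The character class also matches line feeds, so a block may span several lines of the surrounding file.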

Hierarchical Containers

SSA/ESA nest anywhere—in values or in bodies. A container (directory, archive, document) uses SSA/ESA to wrap its children:

[SOH] name [US] file.txt [STX] content here [ETX]           file (body is content)
[SOH] name [US] folder [STX] [SSA]                          directory (body is children)
  [SOH] name [US] a.txt [STX] hello [ETX]
  [FS]
  [SOH] name [US] b.txt [STX] world [ETX]
[ESA] [ETX]

After STX: if SSA follows, it's a container. Otherwise, it's content.

Recursive nesting:

[SOH] name [US] project [STX] [SSA]
  [SOH] name [US] README.md [STX] # Hello [ETX]
  [FS]
  [SOH] name [US] src [STX] [SSA]
    [SOH] name [US] main.rs [STX] fn main() {} [ETX]
    [FS]
    [SOH] name [US] lib.rs [STX] pub fn lib() {} [ETX]
  [ESA] [ETX]
[ESA] [ETX]

Files, directories, archives, documents—same structure. [SOH] header [STX] content [ETX] for leaves, [SOH] header [STX] [SSA] children [ESA] [ETX] for containers. FS separates siblings.
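The leaf/container rule lends itself to a small recursive-descent parser. The Python sketch below is one possible reading of the structure, not the reference implementation: `parse_node`, `leaf`, and `folder` are hypothetical names, and the whitespace shown in the notation is treated as documentation only, so the test data contains none.

```python
SOH, STX, ETX = "\x01", "\x02", "\x03"
FS, RS, US = "\x1c", "\x1e", "\x1f"
SSA, ESA = "\x86", "\x87"


def parse_props(text: str) -> dict[str, str]:
    """Parse RS-separated properties, each written as key [US] value."""
    return dict(pair.partition(US)[::2] for pair in text.split(RS) if pair)


def parse_node(data: str, pos: int = 0) -> tuple[dict, int]:
    """Parse one [SOH] header [STX] body [ETX] node; return (node, next position)."""
    if data[pos] != SOH:
        raise ValueError(f"expected SOH at offset {pos}")
    stx = data.index(STX, pos)
    node: dict = {"header": parse_props(data[pos + 1 : stx])}
    pos = stx + 1
    if data[pos] == SSA:  # SSA right after STX marks a container
        pos += 1
        children = []
        while data[pos] != ESA:
            if data[pos] == FS:  # FS separates siblings
                pos += 1
                continue
            child, pos = parse_node(data, pos)
            children.append(child)
        node["children"] = children
        pos += 1  # step over ESA
        if data[pos] != ETX:
            raise ValueError(f"expected ETX at offset {pos}")
        return node, pos + 1
    etx = data.index(ETX, pos)  # otherwise the body is plain content
    node["content"] = data[pos:etx]
    return node, etx + 1


def leaf(name: str, content: str) -> str:
    """Build a leaf node for testing."""
    return f"{SOH}name{US}{name}{STX}{content}{ETX}"


def folder(name: str, *children: str) -> str:
    """Build a container node whose children are separated by FS."""
    return f"{SOH}name{US}{name}{STX}{SSA}{FS.join(children)}{ESA}{ETX}"
```

Parsing `folder("src", leaf("main.rs", "fn main() {}"))` returns a node whose `children` list holds a single leaf with that content.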

Binary Mode

Two modes for binary data:

[SO] shifted-bytes [SI]           shift encoding (default)
[DLE] [SPA] raw-bytes [DLE] [EPA] binary transparency (BISYNC-style)

Shift encoding: Control codes (0x00-0x06, 0x0E-0x1F, 0x7F-0x9F) are shifted to ASCII (0x21-0x61). Output is valid UTF-8. Pass-through: BEL, BS, TAB, LF, VT, FF, CR (0x07-0x0D).

DLE transparency: For raw binary blobs. Inside [DLE][SPA]...[DLE][EPA], all bytes pass through literally. Only DLE needs escaping ([DLE][DLE] = literal DLE).

type [US] image [RS] data [US] [DLE] [SPA] <raw PNG> [DLE] [EPA] [RS] width [US] 800
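DLE transparency amounts to classic byte stuffing, which can be sketched like this in Python. `dle_encode` and `dle_decode` are hypothetical names, and the sketch assumes SPA and EPA appear as the single bytes 0x96 and 0x97 inside a binary (application/hsv) stream.

```python
DLE, SPA, EPA = 0x10, 0x96, 0x97


def dle_encode(raw: bytes) -> bytes:
    """Wrap raw bytes as [DLE][SPA] ... [DLE][EPA], doubling every DLE."""
    stuffed = raw.replace(bytes([DLE]), bytes([DLE, DLE]))
    return bytes([DLE, SPA]) + stuffed + bytes([DLE, EPA])


def dle_decode(data: bytes, start: int = 0) -> tuple[bytes, int]:
    """Decode one block beginning at `start`; return (raw bytes, next offset)."""
    if data[start : start + 2] != bytes([DLE, SPA]):
        raise ValueError("block must begin with DLE SPA")
    out = bytearray()
    i = start + 2
    while i < len(data):
        byte = data[i]
        if byte != DLE:  # ordinary byte passes through literally
            out.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break  # a lone DLE at the end means the block is truncated
        nxt = data[i + 1]
        if nxt == DLE:  # [DLE][DLE] is a literal DLE
            out.append(DLE)
            i += 2
        elif nxt == EPA:  # [DLE][EPA] closes the block
            return bytes(out), i + 2
        else:
            raise ValueError(f"unexpected byte after DLE at offset {i}")
    raise ValueError("unterminated DLE block")
```

Returning the next offset lets a caller resume ordinary parsing right after the block, for example at the `[RS]` that follows the image data above.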

Uses original ASCII semantics: SO/SI (1963), DLE/BISYNC (1960s), SPA/EPA (ISO 6429, 1983).

Implementations

Reference implementations in Rust and Python:

github.com/LingenicLLC/HSV

The Control Codes

Twenty-six bytes reserved (22 C0 + 4 C1). Three bytes forbidden. Everything else is data.

Code  Hex   Name      Purpose

Framing
1     0x01  SOH       Start header (control/metadata)
2     0x02  STX       Start text (content/data)
3     0x03  ETX       End data block
4     0x04  EOT       End stream

Protocol
5     0x05  ENQ       Enquiry (request ACK)
6     0x06  ACK       Acknowledge (success)
21    0x15  NAK       Negative acknowledge (error)
24    0x18  CAN       Cancel operation

Flow Control
17    0x11  DC1/XON   Resume transmission
19    0x13  DC3/XOFF  Pause transmission
22    0x16  SYN       Sync/keepalive

Device Control
18    0x12  DC2       Connect to device/service
20    0x14  DC4       Disconnect (preferred stop)

Chunked Transfer
23    0x17  ETB       End block (more coming)
25    0x19  EM        End of medium (data exhausted)

Nesting (values or bodies)
134   0x86  SSA       Start nesting (values, containers)
135   0x87  ESA       End nesting

Binary
14    0x0E  SO        Shift Out (enter shift mode)
15    0x0F  SI        Shift In (exit shift mode)
16    0x10  DLE       Data Link Escape (transparency modifier)
150   0x96  SPA       Start of Protected Area (binary block)
151   0x97  EPA       End of Protected Area (binary block)

Structure
28    0x1C  FS        Record separator
29    0x1D  GS        Array element separator
30    0x1E  RS        Property separator
31    0x1F  US        Key-value separator

Forbidden Bytes

Three control characters are forbidden in HSV streams:

Code  Hex   Name  Reason
0     0x00  NUL   String terminator in C; causes truncation
26    0x1A  SUB   Ctrl+Z EOF on Windows; corrupts streams
27    0x1B  ESC   Terminal escape sequences; security risk

If these bytes appear in data, use binary mode (shift encoding or DLE transparency).
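A writer could check a payload before emitting it as plain HSV data. The Python sketch below uses a hypothetical helper, `find_forbidden`, to report where forbidden bytes occur so the caller can switch that value to binary mode.

```python
FORBIDDEN = {0x00: "NUL", 0x1A: "SUB", 0x1B: "ESC"}


def find_forbidden(payload: bytes) -> list[tuple[int, str]]:
    """Return (offset, name) for every forbidden byte found in the payload."""
    return [(i, FORBIDDEN[b]) for i, b in enumerate(payload) if b in FORBIDDEN]
```

An empty result means the payload contains none of the three forbidden bytes. It does not check for reserved control codes, which are a separate concern.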

Allowed as Data

Seven control characters are allowed as literal data (not reserved for protocol):

Code  Hex   Name  Common Use
7     0x07  BEL   Audible/visual alert
8     0x08  BS    Backspace
9     0x09  TAB   Horizontal tab
10    0x0A  LF    Line feed (newline)
11    0x0B  VT    Vertical tab
12    0x0C  FF    Form feed (page break)
13    0x0D  CR    Carriage return

These are common whitespace and formatting characters. They pass through HSV unchanged.
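The three classes (reserved, forbidden, data) can be written down as sets, which also makes the byte counts easy to check. This is an illustrative Python sketch built from the tables above, and `classify` is a hypothetical helper name.

```python
# Reserved control codes: 22 in C0 plus 4 in C1, taken from the tables above.
RESERVED = frozenset(
    [
        *range(0x01, 0x07),  # SOH STX ETX EOT ENQ ACK
        *range(0x0E, 0x1A),  # SO SI DLE DC1-DC4 NAK SYN ETB CAN EM
        *range(0x1C, 0x20),  # FS GS RS US
        0x86, 0x87,          # SSA ESA
        0x96, 0x97,          # SPA EPA
    ]
)
FORBIDDEN = frozenset({0x00, 0x1A, 0x1B})        # NUL SUB ESC
ALLOWED_CONTROLS = frozenset(range(0x07, 0x0E))  # BEL BS TAB LF VT FF CR


def classify(code: int) -> str:
    """Classify a code value as 'reserved', 'forbidden', or 'data'."""
    if code in FORBIDDEN:
        return "forbidden"
    if code in RESERVED:
        return "reserved"
    return "data"
```

The sets account for every C0 code: 22 reserved, 3 forbidden, and 7 allowed as data add up to all 32 values from 0x00 to 0x1F.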

Notation

Control codes are written as [CODE] with the standard abbreviation. Examples: [STX], [ETX], [SSA], [ESA].

Unicode's Control Pictures block (U+2400–U+243F) provides visible glyphs for C0 controls, but C1 controls (SSA, ESA) have no standard pictures. The [CODE] notation is consistent across all control characters.

Notation  Hex   Name
[US]      0x1F  Unit Separator
[RS]      0x1E  Record Separator
[GS]      0x1D  Group Separator
[FS]      0x1C  File Separator
[SOH]     0x01  Start of Header
[STX]     0x02  Start of Text
[ETX]     0x03  End of Text
[SO]      0x0E  Shift Out
[SI]      0x0F  Shift In
[SSA]     0x86  Start of Selected Area
[ESA]     0x87  End of Selected Area

In actual data, these are single bytes. The notation makes them visible in documentation.
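For documentation or debugging output, a small helper can render real control codes in the [CODE] notation. The Python sketch below is illustrative only: `to_notation` is a hypothetical name, the input is assumed to be decoded text, and the spaces it inserts around each code are for readability and are not part of the data.

```python
NAMES = {
    0x01: "SOH", 0x02: "STX", 0x03: "ETX", 0x04: "EOT", 0x05: "ENQ",
    0x06: "ACK", 0x0E: "SO", 0x0F: "SI", 0x10: "DLE", 0x11: "DC1",
    0x12: "DC2", 0x13: "DC3", 0x14: "DC4", 0x15: "NAK", 0x16: "SYN",
    0x17: "ETB", 0x18: "CAN", 0x19: "EM", 0x1C: "FS", 0x1D: "GS",
    0x1E: "RS", 0x1F: "US", 0x86: "SSA", 0x87: "ESA", 0x96: "SPA",
    0x97: "EPA",
}


def to_notation(data: str) -> str:
    """Render reserved control codes as [CODE] tokens separated by spaces."""
    tokens: list[str] = []
    run = ""
    for ch in data:
        name = NAMES.get(ord(ch))
        if name is None:
            run += ch  # ordinary data accumulates into one token
            continue
        if run:
            tokens.append(run)
            run = ""
        tokens.append(f"[{name}]")
    if run:
        tokens.append(run)
    return " ".join(tokens)
```

The rendering is one-way: because of the added spaces, the output is meant for reading, not for parsing back into data.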

MIME Types

HSV defines two MIME types based on binary content:

MIME Type                 Use When                             Encoding
text/hsv; charset=utf-8   Text-safe HSV (no DLE blocks)        Valid UTF-8
application/hsv           Binary HSV (may contain DLE blocks)  Arbitrary bytes

When to Use Each

text/hsv: Use when content is guaranteed UTF-8 safe, meaning the stream contains no DLE transparency blocks.

application/hsv: Use when content may contain raw binary, meaning the stream may contain DLE transparency blocks.

The C1 control codes (SSA, ESA, SPA, EPA at 0x86-0x87, 0x96-0x97) are valid Unicode code points with proper UTF-8 encodings (e.g., U+0086 → C2 86). They do not break UTF-8 validity.
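A sender could choose between the two MIME types with a check like the one below. This is a hedged Python sketch: `pick_mime_type` is a hypothetical helper, and treating any DLE byte as a sign of a DLE block is an assumption of the sketch, not a rule stated by the spec.

```python
DLE = 0x10


def pick_mime_type(payload: bytes) -> str:
    """Pick a MIME type: text/hsv only for DLE-free, valid UTF-8 payloads."""
    if DLE in payload:
        return "application/hsv"
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return "application/hsv"
    return "text/hsv; charset=utf-8"
```

A payload that uses SSA/ESA nesting still qualifies as text/hsv, because U+0086 and U+0087 encode to the valid UTF-8 sequences C2 86 and C2 87.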

Quick Reference

Reserved:  0x01-0x06 0x0E-0x19 0x1C-0x1F 0x86-0x87 0x96-0x97  (26 bytes)
Forbidden: 0x00 0x1A 0x1B  (NUL, SUB, ESC)
Data:      Everything else (all of Unicode + BEL BS TAB LF VT FF CR)

Structure:
  [US]   key:value
  [RS]   properties
  [GS]   array items
  [FS]   records

Framing:   [SOH] ... [STX] ... [ETX]  (header, data, end)
Nesting:   [SSA] ... [ESA]  (0x86/0x87, in values or bodies)
Container: [SOH] ... [STX] [SSA] ... [ESA] [ETX]  (directory, archive, document)
Binary:    [SO] shifted-byte [SI]  (shift encoding for control codes)
           [DLE] [SPA] raw-bytes [DLE] [EPA]  (DLE transparency for blobs)
Protocol:  [ENQ] / [ACK] / [NAK] / [CAN]
Flow:      [DC1] / [DC3] / [SYN]  (XON / XOFF / sync)
Device:    [DC2] / [DC4]  (connect / disconnect)
Chunked:   [ETB] / [EM]  (block done / data exhausted)
Stream:    [EOT]  (end of stream)

Extension:   .hsv
MIME types:  text/hsv; charset=utf-8  (text-safe, no DLE)
             application/hsv          (binary, may have DLE)