「‍」 Lingenic

Shift Encoding

Binary-safe encoding using [SO] and [SI]

Shift Encoding uses two ASCII control characters to make any binary data safe for text transport. Zero overhead for text. Efficient batching for binary. No classification needed.

The Idea

Control codes (0x00-0x1F, 0x7F-0x9F) break text systems. [NUL] terminates strings. [ETX] ends transmissions. [ESC] triggers escape sequences.

Shift encoding moves control codes to safe ranges:

[SO] <shifted bytes> [SI]

Inside [SO] ... [SI], control codes are shifted to non-control ranges. Outside, bytes pass through unchanged.

What Gets Shifted

Pass through (7 bytes):

Byte   Name    Purpose
0x07   [BEL]   Bell
0x08   [BS]    Backspace
0x09   [TAB]   Horizontal tab
0x0A   [LF]    Line feed
0x0B   [VT]    Vertical tab
0x0C   [FF]    Form feed
0x0D   [CR]    Carriage return

SPACE (0x20) and NBSP (0xA0) are outside the control ranges — they pass through naturally.

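Plain ASCII text passes through unchanged, with zero overhead. Bytes 0x7F-0x9F are shifted wherever they occur, including as UTF-8 continuation bytes in non-ASCII text.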
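The rule above fits in a single predicate. The following is a minimal sketch; `needs_shift` is a hypothetical name chosen for illustration and is not part of the implementation shown later.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true if byte b must be shifted inside [SO] ... [SI].
// BEL through CR (0x07-0x0D) are control codes but pass through unchanged.
static bool needs_shift(uint8_t b) {
    bool pass = (b >= 0x07 && b <= 0x0D);
    bool control = (b < 0x20) || (b >= 0x7F && b <= 0x9F);
    return control && !pass;
}
```

SPACE (0x20) and NBSP (0xA0) fall outside both ranges, so the predicate returns false for them without a special case.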

Complete Shift Mapping

C0 Controls → Printable ASCII (+0x21)

From   Name    To     Char
0x00   [NUL]   0x21   !
0x01   [SOH]   0x22   "
0x02   [STX]   0x23   #
0x03   [ETX]   0x24   $
0x04   [EOT]   0x25   %
0x05   [ENQ]   0x26   &
0x06   [ACK]   0x27   '
0x07-0x0D pass through (see above)
0x0E   [SO]    0x2F   /
0x0F   [SI]    0x30   0
0x10   [DLE]   0x31   1
0x11   [DC1]   0x32   2
0x12   [DC2]   0x33   3
0x13   [DC3]   0x34   4
0x14   [DC4]   0x35   5
0x15   [NAK]   0x36   6
0x16   [SYN]   0x37   7
0x17   [ETB]   0x38   8
0x18   [CAN]   0x39   9
0x19   [EM]    0x3A   :
0x1A   [SUB]   0x3B   ;
0x1B   [ESC]   0x3C   <
0x1C   [FS]    0x3D   =
0x1D   [GS]    0x3E   >
0x1E   [RS]    0x3F   ?
0x1F   [US]    0x40   @

C0 shifted: 25 bytes (7 from 0x00-0x06, 18 from 0x0E-0x1F)

DEL + C1 Controls → Printable ASCII (−0x3E)

From   Name     To     Char
0x7F   [DEL]    0x41   A
0x80   [PAD]    0x42   B
0x81   [HOP]    0x43   C
0x82   [BPH]    0x44   D
0x83   [NBH]    0x45   E
0x84   [IND]    0x46   F
0x85   [NEL]    0x47   G
0x86   [SSA]    0x48   H
0x87   [ESA]    0x49   I
0x88   [HTS]    0x4A   J
0x89   [HTJ]    0x4B   K
0x8A   [VTS]    0x4C   L
0x8B   [PLD]    0x4D   M
0x8C   [PLU]    0x4E   N
0x8D   [RI]     0x4F   O
0x8E   [SS2]    0x50   P
0x8F   [SS3]    0x51   Q
0x90   [DCS]    0x52   R
0x91   [PU1]    0x53   S
0x92   [PU2]    0x54   T
0x93   [STS]    0x55   U
0x94   [CCH]    0x56   V
0x95   [MW]     0x57   W
0x96   [SPA]    0x58   X
0x97   [EPA]    0x59   Y
0x98   [SOS]    0x5A   Z
0x99   [SGCI]   0x5B   [
0x9A   [SCI]    0x5C   \
0x9B   [CSI]    0x5D   ]
0x9C   [ST]     0x5E   ^
0x9D   [OSC]    0x5F   _
0x9E   [PM]     0x60   `
0x9F   [APC]    0x61   a

DEL+C1 shifted: 33 bytes (0x7F-0x9F)

Total shifted: 58 control codes. All targets are printable ASCII (0x21-0x61), so the shifted bytes are valid UTF-8 on their own.
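Both tables reduce to two offsets, so the mapping needs no lookup table. The sketch below uses hypothetical helper names (`shift_byte`, `unshift_byte`) and assumes the caller has already established that the byte is one of the 58 shifted codes (or, for the inverse, one of their targets).

```c
#include <stdint.h>

// Hypothetical helper: map a shifted control code to its printable target.
// Assumes b is in 0x00-0x06, 0x0E-0x1F, or 0x7F-0x9F.
static uint8_t shift_byte(uint8_t b) {
    // C0 controls move up by 0x21; DEL and C1 controls move down by 0x3E.
    return (uint8_t)(b < 0x20 ? b + 0x21 : b - 0x3E);
}

// Hypothetical helper: inverse mapping.
// Assumes c is a shift target: 0x21-0x27 or 0x2F-0x61.
static uint8_t unshift_byte(uint8_t c) {
    // Targets below 0x41 ('A') came from C0; the rest came from DEL/C1.
    return (uint8_t)(c < 0x41 ? c - 0x21 : c + 0x3E);
}
```

The two target ranges meet at 0x40/0x41 without overlapping, which is why a single comparison against 0x41 is enough to invert the mapping. Targets 0x28-0x2E are never produced, because the bytes that would map there (0x07-0x0D) pass through instead.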

Binary Transparency Mode

For raw binary blobs, use DLE transparency (BISYNC-style) instead of shift encoding:

[DLE] [SPA] raw bytes [DLE] [EPA]

Inside the block, all bytes pass through literally. Only [DLE] (0x10) itself needs escaping, which is done by doubling it.

Two modes, different jobs:

Mode               Use case                             Overhead
SO/SI shift        Text with occasional control codes   2 bytes per control run
DLE transparency   Raw binary blobs                     4 bytes framing + 1 byte per DLE in data

Shift encoding is the default. DLE transparency is for when you have dense binary and don't want per-byte shifting overhead.
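To make the DLE overhead concrete, the size of a wrapped block can be computed up front. This is a rough sketch with a hypothetical helper name, `dle_wrapped_len`; it assumes [DLE] is escaped by doubling, as in the implementation section.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: size of a DLE transparency block for this payload.
// Four framing bytes ([DLE][SPA] ... [DLE][EPA]) plus one escape byte
// for every 0x10 in the data.
static size_t dle_wrapped_len(const uint8_t *in, size_t len) {
    size_t n = len + 4;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x10) n++;
    }
    return n;
}
```

Comparing this figure against the shift-encoded size of the same payload is one possible way to choose a mode per blob; the document itself does not prescribe a selection rule.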

Batching

Consecutive control codes share one [SO] / [SI] pair:

Input:  0x00 0x00 0x00 (3 [NUL] bytes)
Output: [SO] 0x21 0x21 0x21 [SI] (5 bytes)
              !    !    !

A run of 100 [NUL] bytes encodes to 102 bytes: [SO] + 100 × 0x21 + [SI].
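Because each run costs exactly two extra bytes, the encoded size can be computed by counting runs. A minimal sketch, using a hypothetical helper name `shift_encoded_len`:

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of bytes shift encoding would produce.
// Every input byte yields one output byte; each run of shifted control
// codes adds two more, one [SO] and one [SI].
static size_t shift_encoded_len(const uint8_t *in, size_t len) {
    size_t n = len;
    int in_run = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        int pass = (b >= 0x07 && b <= 0x0D);
        int ctrl = !pass && (b < 0x20 || (b >= 0x7F && b < 0xA0));
        if (ctrl && !in_run) n += 2;  // a new run starts here
        in_run = ctrl;
    }
    return n;
}
```

This gives 5 for three [NUL] bytes and 102 for a hundred, matching the figures above. It can also be used to size an output buffer before encoding.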

Efficiency

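Content type               Overhead
Pure text (no controls)    0%
Text with newlines/tabs    0%
Mixed text/binary          2 bytes per control run
Pure binary                ~15% typical

The worst case is input that alternates shifted and unshifted bytes: every control byte then costs two extra bytes, for 100% overhead.

Compare to base64: 33% overhead always.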

Why Not Base64?

Base64

Is this text or binary?
If binary, encode whole chunk
If mixed, need delimiters
33% overhead, always

Shift Encoding

Don't classify. Just encode.
Adapts byte-by-byte.
Text passes through.
Binary gets shifted.

Base64 is block-oriented — you encode a classified thing.
Shift is byte-oriented — you encode a stream.

Example

PNG header bytes: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A

[SO] K [SI] P N G [CR] [LF] [SO] ; [SI] [LF]
     ↑                           ↑
   0x89                        0x1A
   −0x3E                       +0x21

Implementation

Shift Encode

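#include <stddef.h>
#include <stdint.h>

// Encodes len bytes from in; returns the number of bytes written to out.
// out must have room for 2 * len + 1 bytes (worst case: alternating bytes).
size_t shift_encode(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    int shifted = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        // Pass through: BEL-CR (0x07-0x0D)
        int pass = (b >= 0x07 && b <= 0x0D);
        int ctrl = !pass && (b < 0x20 || (b >= 0x7F && b < 0xA0));
        if (ctrl && !shifted) { *out++ = 0x0E; shifted = 1; }  // [SO]
        if (!ctrl && shifted) { *out++ = 0x0F; shifted = 0; }  // [SI]
        // C0 controls shift up by 0x21; DEL and C1 shift down by 0x3E
        *out++ = ctrl ? (uint8_t)(b < 0x20 ? b + 0x21 : b - 0x3E) : b;
    }
    if (shifted) *out++ = 0x0F;  // [SI] closes a run that reaches end of input
    return (size_t)(out - start);
}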

Shift Decode

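// Decodes len bytes from in; returns the number of bytes written to out.
// out needs at most len bytes. Assumes well-formed shift-encoded input.
size_t shift_decode(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    int shifted = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == 0x0E) shifted = 1;        // [SO]
        else if (b == 0x0F) shifted = 0;   // [SI]
        // Targets below 0x41 came from C0 (+0x21); the rest from DEL/C1 (-0x3E)
        else *out++ = shifted ? (uint8_t)(b < 0x41 ? b - 0x21 : b + 0x3E) : b;
    }
    return (size_t)(out - start);
}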
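The decoder's only state is the shift flag, so it extends naturally to input that arrives in chunks. The following is a sketch under that assumption; the `shift_decoder` struct and `shift_decode_chunk` function are hypothetical names, not part of the original implementation.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical decoder state: nonzero while inside [SO] ... [SI].
typedef struct {
    int shifted;
} shift_decoder;

// Decode one chunk, carrying the shift flag across calls.
// Returns the number of bytes written to out (at most len).
static size_t shift_decode_chunk(shift_decoder *st, const uint8_t *in,
                                 size_t len, uint8_t *out) {
    uint8_t *start = out;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == 0x0E) st->shifted = 1;        // [SO]
        else if (b == 0x0F) st->shifted = 0;   // [SI]
        else if (st->shifted) *out++ = (uint8_t)(b < 0x41 ? b - 0x21 : b + 0x3E);
        else *out++ = b;
    }
    return (size_t)(out - start);
}
```

A caller would zero-initialize one `shift_decoder` per stream and pass it to every call. A chunk boundary may then fall anywhere, including in the middle of a shifted run.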

DLE Transparency (raw binary)

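// Wrap raw binary in DLE transparency; returns bytes written.
// out must have room for 2 * len + 4 bytes (every byte a DLE, plus framing).
size_t dle_wrap(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    *out++ = 0x10; *out++ = 0x96;  // [DLE] [SPA]
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x10) *out++ = 0x10;  // escape DLE by doubling it
        *out++ = in[i];
    }
    *out++ = 0x10; *out++ = 0x97;  // [DLE] [EPA]
    return (size_t)(out - start);
}

// Unwrap DLE transparency block; returns bytes written.
// in must be one complete block, framing included.
size_t dle_unwrap(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    if (len < 4) return 0;  // no room for framing; also avoids underflow in len - 2
    // Skip [DLE] [SPA] at start, [DLE] [EPA] at end
    for (size_t i = 2; i < len - 2; i++) {
        if (in[i] == 0x10) i++;  // skip escape, take next byte
        *out++ = in[i];
    }
    return (size_t)(out - start);
}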

Properties

Etymology

[SO] (Shift Out) and [SI] (Shift In) were defined in ASCII (1963) for modal character set switching.

[DLE] (Data Link Escape) was defined for binary transparency in BISYNC (1960s). [SPA] (Start of Protected Area) and [EPA] (End of Protected Area) are C1 controls from ISO 6429 (1983).

The solutions were always there. We just forgot to use them.