Binary-safe encoding using [SO] and [SI]
Shift Encoding uses two ASCII control characters to make any binary data safe for text transport. Zero overhead for text. Efficient batching for binary. No classification needed.
Control codes (0x00-0x1F, 0x7F-0x9F) break text systems. [NUL] terminates strings. [ETX] ends transmissions. [ESC] triggers escape sequences.
Shift encoding moves control codes to safe ranges:
[SO] <shifted bytes> [SI]
Inside [SO] ... [SI], control codes are shifted to non-control ranges. Outside, bytes pass through unchanged.
Pass through (7 bytes):
| Byte | Name | Purpose |
|---|---|---|
| 0x07 | [BEL] | Bell |
| 0x08 | [BS] | Backspace |
| 0x09 | [TAB] | Horizontal tab |
| 0x0A | [LF] | Line feed |
| 0x0B | [VT] | Vertical tab |
| 0x0C | [FF] | Form feed |
| 0x0D | [CR] | Carriage return |
SPACE (0x20) and NBSP (0xA0) are outside the control ranges — they pass through naturally.
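Text made of printable ASCII, Latin-1 bytes (0xA0-0xFF), and the seven controls above passes through unchanged. Zero overhead. UTF-8 text is the exception: continuation bytes in 0x80-0x9F fall in the shifted range.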
| From | Name | To | Char |
|---|---|---|---|
| 0x00 | [NUL] | 0x21 | ! |
| 0x01 | [SOH] | 0x22 | " |
| 0x02 | [STX] | 0x23 | # |
| 0x03 | [ETX] | 0x24 | $ |
| 0x04 | [EOT] | 0x25 | % |
| 0x05 | [ENQ] | 0x26 | & |
| 0x06 | [ACK] | 0x27 | ' |
| 0x07-0x0D | (pass through, see above) | | |
| 0x0E | [SO] | 0x2F | / |
| 0x0F | [SI] | 0x30 | 0 |
| 0x10 | [DLE] | 0x31 | 1 |
| 0x11 | [DC1] | 0x32 | 2 |
| 0x12 | [DC2] | 0x33 | 3 |
| 0x13 | [DC3] | 0x34 | 4 |
| 0x14 | [DC4] | 0x35 | 5 |
| 0x15 | [NAK] | 0x36 | 6 |
| 0x16 | [SYN] | 0x37 | 7 |
| 0x17 | [ETB] | 0x38 | 8 |
| 0x18 | [CAN] | 0x39 | 9 |
| 0x19 | [EM] | 0x3A | : |
| 0x1A | [SUB] | 0x3B | ; |
| 0x1B | [ESC] | 0x3C | < |
| 0x1C | [FS] | 0x3D | = |
| 0x1D | [GS] | 0x3E | > |
| 0x1E | [RS] | 0x3F | ? |
| 0x1F | [US] | 0x40 | @ |
C0 shifted: 25 bytes (7 from 0x00-0x06, 3 from 0x0E-0x10, 15 from 0x11-0x1F)
| From | Name | To | Char |
|---|---|---|---|
| 0x7F | [DEL] | 0x41 | A |
| 0x80 | [PAD] | 0x42 | B |
| 0x81 | [HOP] | 0x43 | C |
| 0x82 | [BPH] | 0x44 | D |
| 0x83 | [NBH] | 0x45 | E |
| 0x84 | [IND] | 0x46 | F |
| 0x85 | [NEL] | 0x47 | G |
| 0x86 | [SSA] | 0x48 | H |
| 0x87 | [ESA] | 0x49 | I |
| 0x88 | [HTS] | 0x4A | J |
| 0x89 | [HTJ] | 0x4B | K |
| 0x8A | [VTS] | 0x4C | L |
| 0x8B | [PLD] | 0x4D | M |
| 0x8C | [PLU] | 0x4E | N |
| 0x8D | [RI] | 0x4F | O |
| 0x8E | [SS2] | 0x50 | P |
| 0x8F | [SS3] | 0x51 | Q |
| 0x90 | [DCS] | 0x52 | R |
| 0x91 | [PU1] | 0x53 | S |
| 0x92 | [PU2] | 0x54 | T |
| 0x93 | [STS] | 0x55 | U |
| 0x94 | [CCH] | 0x56 | V |
| 0x95 | [MW] | 0x57 | W |
| 0x96 | [SPA] | 0x58 | X |
| 0x97 | [EPA] | 0x59 | Y |
| 0x98 | [SOS] | 0x5A | Z |
| 0x99 | [SGCI] | 0x5B | [ |
| 0x9A | [SCI] | 0x5C | \ |
| 0x9B | [CSI] | 0x5D | ] |
| 0x9C | [ST] | 0x5E | ^ |
| 0x9D | [OSC] | 0x5F | _ |
| 0x9E | [PM] | 0x60 | ` |
| 0x9F | [APC] | 0x61 | a |
DEL+C1 shifted: 33 bytes (0x7F-0x9F)
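Total shifted: 58 control codes. All targets are printable ASCII (0x21-0x61). For 7-bit input the output is pure ASCII and therefore valid UTF-8. Bytes 0xA0-0xFF pass through unchanged, so output from 8-bit input is not guaranteed to be valid UTF-8.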
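The two tables reduce to two fixed offsets: +0x21 for the shifted C0 codes and -0x3E for DEL and C1. The following is a minimal sketch of that mapping, not a reference implementation. The helper names (`needs_shift`, `shift_byte`, `unshift_byte`, `count_shifted`) are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper names; they do not appear in the original code.

// True for the bytes the tables shift: C0 without 0x07-0x0D, plus DEL and C1.
static bool needs_shift(uint8_t b) {
    if (b >= 0x07 && b <= 0x0D) return false;   // BEL..CR pass through
    return b < 0x20 || (b >= 0x7F && b <= 0x9F);
}

// Map a control byte to its printable target. Only valid if needs_shift(b).
static uint8_t shift_byte(uint8_t b) {
    return (uint8_t)(b < 0x20 ? b + 0x21 : b - 0x3E);
}

// Inverse mapping for a byte read between [SO] and [SI].
static uint8_t unshift_byte(uint8_t c) {
    return (uint8_t)(c < 0x41 ? c - 0x21 : c + 0x3E);
}

// Count how many of the 256 byte values are shifted.
static int count_shifted(void) {
    int n = 0;
    for (int b = 0; b < 256; b++)
        if (needs_shift((uint8_t)b)) n++;
    return n;
}
```

Looping over all 256 byte values with these helpers is a quick way to confirm the totals stated above: 58 shifted codes, every target inside 0x21-0x61, and every target mapping back to its source.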
For raw binary blobs, use DLE transparency (BISYNC-style) instead of shift encoding:
[DLE] [SPA] raw bytes [DLE] [EPA]
Inside the block, all bytes pass through literally. Only DLE needs escaping, by doubling it ([DLE] [DLE]).
Two modes, different jobs:
| Mode | Use case | Overhead |
|---|---|---|
| SO/SI shift | Text with occasional control codes | 2 bytes per control run |
| DLE transparency | Raw binary blobs | 4 framing bytes + 1 byte per DLE in data |
Shift encoding is the default. DLE transparency is for dense binary, where an [SO] / [SI] pair around every control run would add up.
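One way to compare the two modes for a given buffer is to compute both encoded sizes without encoding anything. The sketch below assumes one [SO] / [SI] pair per run of consecutive shifted bytes and four framing bytes for a DLE block. The function names are hypothetical, and size is only one criterion: a DLE block still carries raw control bytes.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper names; they do not appear in the original code.

// Same classification the shift tables use.
static int is_shifted_ctrl(uint8_t b) {
    if (b >= 0x07 && b <= 0x0D) return 0;       // BEL..CR pass through
    return b < 0x20 || (b >= 0x7F && b <= 0x9F);
}

// Size after SO/SI shift encoding: one byte per input byte,
// plus 2 bytes per run of consecutive shifted bytes.
static size_t shift_encoded_len(const uint8_t *in, size_t len) {
    size_t runs = 0;
    int in_run = 0;
    for (size_t i = 0; i < len; i++) {
        int ctrl = is_shifted_ctrl(in[i]);
        if (ctrl && !in_run) runs++;            // a new run starts here
        in_run = ctrl;
    }
    return len + 2 * runs;
}

// Size after DLE transparency: 4 framing bytes plus 1 extra byte per DLE.
static size_t dle_wrapped_len(const uint8_t *in, size_t len) {
    size_t dles = 0;
    for (size_t i = 0; i < len; i++)
        if (in[i] == 0x10) dles++;
    return len + 4 + dles;
}

// Returns 1 if DLE transparency is strictly smaller, else 0 (shift stays the default).
static int prefer_dle(const uint8_t *in, size_t len) {
    return dle_wrapped_len(in, len) < shift_encoded_len(in, len);
}
```

For 100 [NUL] bytes this gives 102 bytes for shift encoding against 104 for a DLE block, so shift wins. For 8 bytes that alternate control and printable values it gives 16 against 12, so the DLE block wins.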
Consecutive control codes share one [SO] / [SI] pair:
Input: 0x00 0x00 0x00 (3 [NUL] bytes)
Output: [SO] 0x21 0x21 0x21 [SI] (5 bytes; 0x21 is "!")
Run of 100 [NUL] bytes: 102 bytes ( [SO] + 100 × 0x21 + [SI] ).
| Content Type | Overhead |
|---|---|
| Pure text (no controls) | 0% |
| Text with newlines/tabs | 0% |
| Mixed text/binary | 2 bytes per control run |
| Pure binary | ~15% typical, up to about 100% when control and non-control bytes alternate |
Compare to base64: 33% overhead always.
| Base64 | Shift encoding |
|---|---|
| Is this text or binary? | Don't classify. Just encode. |
| If binary, encode whole chunk | Adapts byte-by-byte. |
| If mixed, need delimiters | Text passes through. |
| 33% overhead, always | Binary gets shifted. |
Base64 is block-oriented — you encode a classified thing.
Shift is byte-oriented — you encode a stream.
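Because the encoder carries only one bit of state (whether an [SO] is currently open), the byte-oriented claim can be illustrated with a chunked encoder. This is a sketch under that assumption. The `shift_stream` type and its functions are hypothetical names, not part of any existing API.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical streaming wrapper; type and function names are illustrative.
typedef struct {
    int shifted;   // 1 while an [SO] is open and still awaiting its [SI]
} shift_stream;

static void shift_stream_init(shift_stream *s) { s->shifted = 0; }

// Encode one chunk. out needs room for up to 2 * len bytes.
// Returns the number of bytes written.
static size_t shift_stream_feed(shift_stream *s, const uint8_t *in, size_t len,
                                uint8_t *out) {
    uint8_t *p = out;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        int pass = (b >= 0x07 && b <= 0x0D);
        int ctrl = !pass && (b < 0x20 || (b >= 0x7F && b <= 0x9F));
        if (ctrl && !s->shifted) { *p++ = 0x0E; s->shifted = 1; }  // [SO]
        if (!ctrl && s->shifted) { *p++ = 0x0F; s->shifted = 0; }  // [SI]
        *p++ = (uint8_t)(ctrl ? (b < 0x20 ? b + 0x21 : b - 0x3E) : b);
    }
    return (size_t)(p - out);
}

// Close a run left open at end of stream. Writes 0 or 1 byte.
static size_t shift_stream_finish(shift_stream *s, uint8_t *out) {
    if (!s->shifted) return 0;
    out[0] = 0x0F;                                                 // [SI]
    s->shifted = 0;
    return 1;
}
```

A control run that straddles a chunk boundary stays inside a single [SO] / [SI] pair, because the open-run flag survives between calls. Feeding the 8-byte PNG signature as a 3-byte chunk followed by a 5-byte chunk produces the same 12 bytes as encoding it in one call.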
PNG header bytes: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
Encoded: [SO] K [SI] P N G [CR] [LF] [SO] ; [SI] [LF]
Here 0x89 - 0x3E = 0x4B (K) and 0x1A + 0x21 = 0x3B (;).
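#include <stddef.h>
#include <stdint.h>

// Encode len bytes from in to out. out needs room for up to 2 * len + 1 bytes.
// Returns the number of bytes written.
size_t shift_encode(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    int shifted = 0;                       // 1 while inside [SO] ... [SI]
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        // Pass through: BEL-CR (0x07-0x0D)
        int pass = (b >= 0x07 && b <= 0x0D);
        int ctrl = !pass && (b < 0x20 || (b >= 0x7F && b < 0xA0));
        if (ctrl && !shifted) { *out++ = 0x0E; shifted = 1; }  // [SO]
        if (!ctrl && shifted) { *out++ = 0x0F; shifted = 0; }  // [SI]
        // C0 shifts up by 0x21; DEL and C1 shift down by 0x3E
        *out++ = (uint8_t)(ctrl ? (b < 0x20 ? b + 0x21 : b - 0x3E) : b);
    }
    if (shifted) *out++ = 0x0F;            // close a trailing run with [SI]
    return (size_t)(out - start);
}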
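// Decode len bytes from in to out. out needs room for up to len bytes.
// Returns the number of bytes written. Input is assumed to be well-formed.
size_t shift_decode(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    int shifted = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == 0x0E) shifted = 1;        // [SO]
        else if (b == 0x0F) shifted = 0;   // [SI]
        // Targets below 0x41 came from C0; 0x41 and above came from DEL/C1
        else *out++ = (uint8_t)(shifted ? (b < 0x41 ? b - 0x21 : b + 0x3E) : b);
    }
    return (size_t)(out - start);
}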
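The decoder above trusts its input. Where the encoded stream comes from an untrusted source, a stricter variant may be useful. The following is a sketch of one possible validation policy for pure shift-encoded input (no DLE blocks). The name `shift_decode_checked` and the -1 error convention are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical strict decoder; the name and the -1 error return are assumptions.
// Returns the number of decoded bytes, or -1 if the input is malformed.
static ptrdiff_t shift_decode_checked(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *p = out;
    int shifted = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == 0x0E) {                   // [SO]
            if (shifted) return -1;        // [SO] inside an open run
            shifted = 1;
        } else if (b == 0x0F) {            // [SI]
            if (!shifted) return -1;       // [SI] without a matching [SO]
            shifted = 0;
        } else if (shifted) {
            // Valid targets: 0x21-0x27 and 0x2F-0x40 (C0), 0x41-0x61 (DEL and C1)
            int ok = (b >= 0x21 && b <= 0x27) || (b >= 0x2F && b <= 0x61);
            if (!ok) return -1;
            *p++ = (uint8_t)(b < 0x41 ? b - 0x21 : b + 0x3E);
        } else {
            // Outside a run, a raw byte from the shifted ranges should never appear
            int pass = (b >= 0x07 && b <= 0x0D);
            if (!pass && (b < 0x20 || (b >= 0x7F && b <= 0x9F))) return -1;
            *p++ = b;
        }
    }
    return shifted ? -1 : (ptrdiff_t)(p - out);   // an unterminated run is an error
}
```

This rejects an unterminated run, a stray [SI], a byte inside a run that is not a valid shift target, and a raw control byte outside a run. It accepts everything `shift_encode` produces.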
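// Wrap raw binary in DLE transparency. out needs room for up to 2 * len + 4 bytes.
// Returns the number of bytes written.
size_t dle_wrap(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    *out++ = 0x10; *out++ = 0x96;              // [DLE] [SPA]
    for (size_t i = 0; i < len; i++) {
        if (in[i] == 0x10) *out++ = 0x10;      // escape DLE by doubling it
        *out++ = in[i];
    }
    *out++ = 0x10; *out++ = 0x97;              // [DLE] [EPA]
    return (size_t)(out - start);
}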
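// Unwrap one DLE transparency block that spans exactly in[0..len).
// Returns the number of bytes written (0 if the block is too short to hold its framing).
size_t dle_unwrap(const uint8_t *in, size_t len, uint8_t *out) {
    uint8_t *start = out;
    if (len < 4) return 0;                     // avoids size_t underflow in len - 2
    // Skip [DLE] [SPA] at start, [DLE] [EPA] at end
    for (size_t i = 2; i < len - 2; i++) {
        if (in[i] == 0x10) i++;                // skip escape, take next byte
        *out++ = in[i];
    }
    return (size_t)(out - start);
}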
[SO] (Shift Out) and [SI] (Shift In) were defined in ASCII (1963) for modal character set switching.
[DLE] (Data Link Escape) was defined for binary transparency in BISYNC (1960s). [SPA] (Start of Protected Area) and [EPA] (End of Protected Area) are C1 controls from ISO 6429 (1983).
The solutions were always there. We just forgot to use them.