Type Encoding Rules



Overview

This document specifies the exact encoding and decoding rules for each supported type in the Sigilaris byte codec. These rules serve as the technical specification for the codec implementation.

Key Principles:

Primitive Types

Unit

Encoding Rule:

Unit → empty byte sequence (ByteVector.empty)

Decoding Rule:

Consumes no bytes, returns Unit and original byte sequence as remainder

Examples:

import org.sigilaris.core.codec.byte.*
import scodec.bits.ByteVector

val unitEncoded = ByteEncoder[Unit].encode(())
// Result: ByteVector(empty)

val unitDecoded = ByteDecoder[Unit].decode(ByteVector(0x01, 0x02))
// Result: Right(DecodeResult((), ByteVector(0x01, 0x02)))

Use Case: Unit is useful as a marker or flag in product types where a field's presence is significant but it carries no data.

Byte

Encoding Rule:

Byte → single byte

Decoding Rule:

Read 1 byte, return as Byte with remainder

Examples:

val b: Byte = 0x42
val byteEncoded = ByteEncoder[Byte].encode(b)
// Result: ByteVector(0x42)

val byteDecoded = ByteDecoder[Byte].decode(byteEncoded)
// Result: Right(DecodeResult(0x42, ByteVector(empty)))

Long

Encoding Rule:

Long → 8-byte big-endian representation

Decoding Rule:

Read 8 bytes, interpret as big-endian Long

Examples:

val n: Long = 42L
val longEncoded = ByteEncoder[Long].encode(n)
// Result: ByteVector(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2a)

val longDecoded = ByteDecoder[Long].decode(longEncoded)
// Result: Right(DecodeResult(42L, ByteVector(empty)))

Note: Long uses fixed 8-byte encoding for simplicity and consistency. For space-efficient integer encoding, use BigInt instead.
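
The fixed-width rule can be reproduced with the JDK alone. The following is a minimal sketch, not the codec's implementation: `longToBytes` and `bytesToLong` are hypothetical helper names, and plain `Array[Byte]` stands in for `ByteVector` so the snippet runs without dependencies.

```scala
import java.nio.ByteBuffer

// Encode a Long as 8 bytes; ByteBuffer writes in big-endian order by default.
def longToBytes(n: Long): Array[Byte] =
  ByteBuffer.allocate(8).putLong(n).array()

// Decode the first 8 bytes as a big-endian Long and return the unread remainder.
// Returns None when fewer than 8 bytes are available.
def bytesToLong(bytes: Array[Byte]): Option[(Long, Array[Byte])] =
  if bytes.length < 8 then None
  else Some((ByteBuffer.wrap(bytes, 0, 8).getLong(), bytes.drop(8)))
```

Every value, including negative ones, occupies exactly 8 bytes in this scheme, which is the trade-off the note above describes.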

Instant

Encoding Rule:

Instant → epoch milliseconds as Long (8 bytes)

Decoding Rule:

Read 8 bytes as Long, convert to Instant via Instant.ofEpochMilli

Examples:

import java.time.Instant

val timestamp = Instant.parse("2024-01-01T00:00:00Z")
val instantEncoded = ByteEncoder[Instant].encode(timestamp)
// Result: epoch milliseconds encoded as Long

val instantDecoded = ByteDecoder[Instant].decode(instantEncoded)
// Result: Right(DecodeResult(timestamp, ByteVector(empty)))
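
The conversion behind this rule can be checked with `java.time` directly. The sketch below does not call the codec; it only illustrates what an epoch-millisecond representation implies, assuming the rule is applied literally: an Instant with sub-millisecond precision would not survive a roundtrip unchanged.

```scala
import java.time.Instant

val newYear = Instant.parse("2024-01-01T00:00:00Z")
val millis: Long = newYear.toEpochMilli        // the value that would be stored as a Long
val restored = Instant.ofEpochMilli(millis)    // identical to newYear

// Nanosecond detail is dropped when converting to epoch milliseconds.
val precise = Instant.parse("2024-01-01T00:00:00.123456789Z")
val truncated = Instant.ofEpochMilli(precise.toEpochMilli)
```

Here `restored == newYear` holds, while `truncated` equals `2024-01-01T00:00:00.123Z` rather than `precise`.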

Numeric Types

BigNat (Natural Numbers)

BigNat represents non-negative integers (0, 1, 2, ...) with variable-length encoding.

Type Definition:

type BigNat = BigInt :| Positive0  // non-negative BigInt

Encoding Rules:

The encoding uses three ranges for space efficiency:

  1. Single-byte range (0x00 ~ 0x80): Values 0-128

    value n (0 ≤ n ≤ 128) → single byte 0xnn
  2. Short data range (0x81 ~ 0xf7): Data length 1-119 bytes

    [0x80 + data_length][data_bytes]
    data_length: 1 to 119 (0xf7 - 0x80)
  3. Long data range (0xf8 ~ 0xff): Data length 120+ bytes

    [0xf8 + (length_byte_count - 1)][length_bytes][data_bytes]
    length_byte_count: 1 to 8

Encoding Examples:

Value | Encoded Bytes | Explanation
0 | 0x00 | Single byte
1 | 0x01 | Single byte
128 | 0x80 | Single byte (inclusive)
129 | 0x81 81 | Length 1, data 0x81
255 | 0x81 ff | Length 1, data 0xff
256 | 0x82 01 00 | Length 2, data 0x0100
65535 | 0x82 ff ff | Length 2, data 0xffff
65536 | 0x83 01 00 00 | Length 3, data 0x010000
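
The three ranges can be sketched as an encoder over plain byte arrays. This is an illustration derived from the rules above, not the library's code: `encodeNat` and `hex` are hypothetical names, and `Array[Byte]` replaces `ByteVector` to keep the snippet dependency-free.

```scala
// Encode a non-negative BigInt following the three ranges described above.
def encodeNat(n: BigInt): Array[Byte] =
  require(n >= 0, "BigNat must be non-negative")
  if n <= 0x80 then
    Array(n.toInt.toByte)                     // single-byte range: 0-128
  else
    // Big-endian magnitude; toByteArray may prepend a zero sign byte, so strip it.
    val data = n.toByteArray.dropWhile(_ == 0)
    if data.length <= 119 then
      (0x80 + data.length).toByte +: data     // short data range
    else
      // Long data range: the data length itself is written as big-endian bytes.
      val lengthBytes = BigInt(data.length).toByteArray.dropWhile(_ == 0)
      (0xf8 + lengthBytes.length - 1).toByte +: (lengthBytes ++ data)

// Render bytes as space-separated hex for comparison with the table.
def hex(bytes: Array[Byte]): String =
  bytes.map(b => f"${b & 0xff}%02x").mkString(" ")
```

For example, `hex(encodeNat(BigInt(256)))` yields `82 01 00`, matching the table row for 256.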

Decoding Algorithm:

import scodec.bits.ByteVector

// Illustrative only: assumes `bytes` is non-empty and contains enough data.
// The result is non-negative by construction and can be refined to BigNat.
def decodeBigNat(bytes: ByteVector): (BigInt, ByteVector) =
  val head = bytes.head & 0xff  // first byte as an unsigned value

  if head <= 0x80 then
    // Single byte: value is 0-128
    (BigInt(head), bytes.tail)

  else if head <= 0xf7 then
    // Short data: 1-119 byte data
    val dataLength = head - 0x80
    val (dataBytes, remainder) = bytes.tail.splitAt(dataLength)
    // signum 1: interpret the bytes as an unsigned big-endian magnitude
    (BigInt(1, dataBytes.toArray), remainder)

  else
    // Long data: 120+ byte data
    val lengthByteCount = head - 0xf7
    val (lengthBytes, afterLength) = bytes.tail.splitAt(lengthByteCount)
    val dataLength = BigInt(1, lengthBytes.toArray).toLong
    val (dataBytes, remainder) = afterLength.splitAt(dataLength)
    (BigInt(1, dataBytes.toArray), remainder)
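
The algorithm above assumes well-formed input. A variant that reports malformed input instead of throwing might look like the following sketch. It works on plain `Array[Byte]`, the name `decodeNatSafe` and the error strings are illustrative assumptions, and the real decoder's failure type and messages may differ.

```scala
// Same three branches as above, but empty or truncated input yields Left.
def decodeNatSafe(bytes: Array[Byte]): Either[String, (BigInt, Array[Byte])] =
  if bytes.isEmpty then Left("Empty bytes")
  else
    val head = bytes.head & 0xff
    val rest = bytes.tail

    // Read `len` bytes as an unsigned big-endian number, or fail if too few remain.
    def take(len: Int, from: Array[Byte]): Either[String, (BigInt, Array[Byte])] =
      if from.length < len then
        Left(s"Insufficient bytes: need $len, have ${from.length}")
      else Right((BigInt(1, from.take(len)), from.drop(len)))

    if head <= 0x80 then Right((BigInt(head), rest))      // single byte
    else if head <= 0xf7 then take(head - 0x80, rest)     // short data
    else
      // Long data: first read the length, then the data itself.
      take(head - 0xf7, rest).flatMap { case (length, afterLength) =>
        if !length.isValidInt then Left("Length too large for this sketch")
        else take(length.toInt, afterLength)
      }
```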

Roundtrip Property:

import io.github.iltotore.iron.*
import io.github.iltotore.iron.constraint.numeric.Positive0

def testRoundtrip(n: BigInt :| Positive0): Boolean =
  val bignatEncoded = ByteEncoder[BigInt :| Positive0].encode(n)
  val bignatDecoded = ByteDecoder[BigInt :| Positive0].decode(bignatEncoded)
  bignatDecoded match
    case Right(DecodeResult(value, remainder)) =>
      value == n && remainder.isEmpty
    case Left(_) => false

// All these should return true:
// testRoundtrip(BigInt(0).refineUnsafe)
// testRoundtrip(BigInt(128).refineUnsafe)
// testRoundtrip(BigInt(255).refineUnsafe)
// testRoundtrip(BigInt(65536).refineUnsafe)

BigInt (Signed Integers)

BigInt extends BigNat encoding with sign information.

Encoding Rules:

Convert signed BigInt to BigNat using sign-magnitude encoding:

n >= 0  →  encode (n * 2) as BigNat
n < 0   →  encode (n * (-2) + 1) as BigNat

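Key Insight: The lowest bit of the transformed value carries the sign. Even values decode to non-negative integers, and odd values decode to negative integers.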

Encoding Examples:

Value | Transformation | BigNat | Encoded
-2 | (-2) * (-2) + 1 = 5 | 5 | 0x05
-1 | (-1) * (-2) + 1 = 3 | 3 | 0x03
0 | 0 * 2 = 0 | 0 | 0x00
1 | 1 * 2 = 2 | 2 | 0x02
2 | 2 * 2 = 4 | 4 | 0x04
127 | 127 * 2 = 254 | 254 | 0x81 fe
-128 | (-128) * (-2) + 1 = 257 | 257 | 0x82 01 01

Decoding Rules:

Given decoded BigNat x:

x % 2 == 0  →  x / 2           // even: non-negative
x % 2 == 1  →  (x - 1) / (-2)  // odd: negative

Roundtrip Verification:

Value | Encode | Decode | Check
-2 | 5 | (5-1)/(-2) = -2 | ✓
-1 | 3 | (3-1)/(-2) = -1 | ✓
0 | 0 | 0/2 = 0 | ✓
1 | 2 | 2/2 = 1 | ✓
2 | 4 | 4/2 = 2 | ✓
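
The transformation and its inverse are small enough to verify directly. In this sketch, `toNat` and `fromNat` are hypothetical names for the two mappings in the rules above; the BigNat byte encoding itself is left out.

```scala
// Signed integer -> natural number: the sign ends up in the lowest bit.
def toNat(n: BigInt): BigInt =
  if n >= 0 then n * 2 else n * (-2) + 1

// Natural number -> signed integer: even means non-negative, odd means negative.
def fromNat(x: BigInt): BigInt =
  if x % 2 == 0 then x / 2 else (x - 1) / (-2)

// Check the roundtrip over a range of small values.
val roundtripHolds: Boolean =
  (-1000 to 1000).forall(i => fromNat(toNat(BigInt(i))) == BigInt(i))
```

Because both mappings are exact integer arithmetic, the check holds for every value in the range, not only for the rows listed in the table.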

Space Efficiency:

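Small integers (positive and negative) encode efficiently: every value from -63 to 64 maps to a BigNat in the range 0-128 and therefore occupies a single byte.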

Product Types

Tuples

Tuples are encoded as concatenated fields in order:

(A, B) → [A encoded][B encoded]
(A, B, C) → [A encoded][B encoded][C encoded]

Encoding:

val tuple = (42L, 100L)
val tupleEncoded = ByteEncoder[(Long, Long)].encode(tuple)
// Result: [42L encoded][100L encoded]

Decoding: Fields are decoded sequentially, consuming bytes from left to right.
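
Field concatenation and left-to-right decoding can be illustrated for a pair of Long values. This is a sketch with hypothetical helper names (`encodePair`, `decodePair`) on plain byte arrays, not the derived codec itself.

```scala
import java.nio.ByteBuffer

// (Long, Long) -> 16 bytes: the first field's 8 bytes, then the second field's.
def encodePair(a: Long, b: Long): Array[Byte] =
  ByteBuffer.allocate(16).putLong(a).putLong(b).array()

// Decode the fields in order and return whatever bytes were not consumed.
def decodePair(bytes: Array[Byte]): Option[((Long, Long), Array[Byte])] =
  if bytes.length < 16 then None
  else
    val buf = ByteBuffer.wrap(bytes)
    val a = buf.getLong()  // consumes bytes 0-7
    val b = buf.getLong()  // consumes bytes 8-15
    Some(((a, b), bytes.drop(16)))
```

No separator or field tag is written between the fields, so the decoder relies entirely on the static type to know where each field ends.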

Case Classes

Case classes use automatic derivation via Mirror.ProductOf:

case class User(id: Long, balance: Long)
→ [id encoded][balance encoded]

Encoding:

case class User(id: Long, balance: Long)

val user = User(1L, 100L)
val userEncoded = ByteEncoder[User].encode(user)
// Result: [id encoded][balance encoded]

Field Order: Fields are encoded in the order they appear in the case class definition.

Collection Types

List

Lists preserve element order and are encoded with a size prefix:

List(a1, a2, ..., an) → [size:BigNat][a1][a2]...[an]

Encoding:

val list = List(1, 2, 3).map(BigInt(_))
val listEncoded = ByteEncoder[List[BigInt]].encode(list)
// Result: [0x03][0x02][0x04][0x06]
//         size=3, then 1→2, 2→4, 3→6

Decoding:

  1. Decode size as BigNat
  2. Decode exactly size elements
  3. Return list and remainder

Empty List:

List() → [0x00]  // size = 0
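
The size-prefix scheme and the three decoding steps can be sketched generically. All names here (`encodeList`, `decodeList`, `encodeSmall`, `decodeSmall`) are hypothetical, and the sketch is deliberately limited to lists of at most 128 elements so that the size always fits the single-byte BigNat range.

```scala
// Write the size byte, then each element's bytes in list order.
def encodeList[A](xs: List[A])(encodeElem: A => Array[Byte]): Array[Byte] =
  require(xs.size <= 128, "sketch only handles single-byte sizes")
  xs.foldLeft(Array(xs.size.toByte))((acc, x) => acc ++ encodeElem(x))

// Read the size byte, decode exactly that many elements, return list and remainder.
def decodeList[A](bytes: Array[Byte])(
    decodeElem: Array[Byte] => Option[(A, Array[Byte])]
): Option[(List[A], Array[Byte])] =
  if bytes.isEmpty || (bytes.head & 0xff) > 128 then None
  else
    val size = bytes.head & 0xff
    val start: Option[(List[A], Array[Byte])] = Some((Nil, bytes.tail))
    (0 until size)
      .foldLeft(start) { (state, _) =>
        state.flatMap { case (acc, rest) =>
          decodeElem(rest).map { case (a, r) => (a :: acc, r) }
        }
      }
      .map { case (acc, rest) => (acc.reverse, rest) }

// Toy element codec: integers 0-64 stored as n * 2 in one byte (BigInt rule, single-byte case).
val encodeSmall: Int => Array[Byte] = n => Array((n * 2).toByte)
val decodeSmall: Array[Byte] => Option[(Int, Array[Byte])] =
  bs => bs.headOption.map(b => ((b & 0xff) / 2, bs.tail))
```

With the toy element codec, `encodeList(List(1, 2, 3))(encodeSmall)` produces the bytes 03 02 04 06, the same sequence as the List[BigInt] example above.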

Option

Option is encoded as a list of zero or one elements:

None → [0x00]  // size = 0
Some(x) → [0x01][x encoded]  // size = 1, element x

Encoding:

val some: Option[Long] = Some(42L)
val someEncoded = ByteEncoder[Option[Long]].encode(some)
// Result: [0x01][42L encoded as 8 bytes]

val none: Option[Long] = None
val noneEncoded = ByteEncoder[Option[Long]].encode(none)
// Result: [0x00]
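
The zero-or-one-element layout can be reproduced by hand for Option[Long]. The helpers below (`encodeOptLong`, `decodeOptLong`) are hypothetical and operate on plain byte arrays; they only mirror the rule stated above.

```scala
import java.nio.ByteBuffer

// None -> [0x00]; Some(n) -> [0x01] followed by n as 8 big-endian bytes.
def encodeOptLong(opt: Option[Long]): Array[Byte] = opt match
  case None    => Array[Byte](0)
  case Some(n) => 1.toByte +: ByteBuffer.allocate(8).putLong(n).array()

// The first byte is the element count: 0 or 1. Anything else is rejected.
def decodeOptLong(bytes: Array[Byte]): Option[(Option[Long], Array[Byte])] =
  if bytes.isEmpty then None
  else
    (bytes.head & 0xff) match
      case 0 => Some((None, bytes.tail))
      case 1 if bytes.length >= 9 =>
        Some((Some(ByteBuffer.wrap(bytes, 1, 8).getLong()), bytes.drop(9)))
      case _ => None
```

Note that the outer Option in the return type of `decodeOptLong` signals a decoding failure, while the inner Option is the decoded value.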

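Why This Works: The context (type) distinguishes Option[Long] from Long. When decoding Option[Long], a leading 0x00 byte means None (size 0); when decoding Long, the same byte is simply the first of the eight value bytes.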

Set

Sets encode with deterministic ordering:

Set(a1, a2, ..., an) → [size:BigNat][sorted_a1][sorted_a2]...[sorted_an]

Deterministic Sorting:

  1. Encode each element to bytes
  2. Sort encoded bytes lexicographically
  3. Concatenate with size prefix

Encoding:

val set = Set(3, 1, 2).map(BigInt(_))
val setEncoded = ByteEncoder[Set[BigInt]].encode(set)
// Elements encode as: 3→0x06, 1→0x02, 2→0x04
// Sorted: 0x02, 0x04, 0x06
// Result: [0x03][0x02][0x04][0x06]

Why Sorting: Set iteration order is undefined in Scala. Sorting encoded bytes ensures the same Set always produces the same byte sequence, which is critical for blockchain hashing and signing.

Lexicographic Order: Bytes are compared left to right, and the first differing byte determines the order.
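
A byte-wise comparison of this kind can be sketched as follows. Two details are assumptions not stated in this document: bytes are compared as unsigned values, and when one sequence is a prefix of the other, the shorter one sorts first. The names `compareBytes` and `sortEncoded` are hypothetical.

```scala
// Compare left to right; the first differing byte decides.
// Assumption: bytes are unsigned, and a shorter prefix sorts first.
def compareBytes(a: Array[Byte], b: Array[Byte]): Int =
  a.zip(b)
    .map { case (x, y) => Integer.compare(x & 0xff, y & 0xff) }
    .find(_ != 0)
    .getOrElse(Integer.compare(a.length, b.length))

def sortEncoded(encoded: Seq[Array[Byte]]): Seq[Array[Byte]] =
  encoded.sortWith((a, b) => compareBytes(a, b) < 0)

// Encoded elements of Set(3, 1, 2) in arbitrary iteration order: 0x06, 0x02, 0x04.
val elems = Seq(Array[Byte](6), Array[Byte](2), Array[Byte](4))

// Size prefix (single-byte case) followed by the sorted element bytes.
val setBytes = sortEncoded(elems).foldLeft(Array(elems.size.toByte))(_ ++ _)
```

Here `setBytes` contains 03 02 04 06 regardless of the order in which the elements were supplied.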

Map

Maps are encoded as deterministically sorted sets of tuples:

Map(k1 → v1, k2 → v2) → Set((k1, v1), (k2, v2))
→ [size:BigNat][sorted_tuple1][sorted_tuple2]...

Encoding:

val map = Map(1L -> 10L, 2L -> 20L)
val mapEncoded = ByteEncoder[Map[Long, Long]].encode(map)
// Each entry (1L, 10L) is encoded as tuple
// Tuples are sorted by their encoded bytes
// Result: [size][sorted entries]

Tuple Encoding: Each (K, V) pair is encoded as product: [K encoded][V encoded]

Deterministic Ordering: Like Set, Map entries are sorted by their encoded tuple bytes, ensuring consistent encoding regardless of Map iteration order.
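
Putting the pieces together for Map[Long, Long] gives the sketch below. It is limited to maps with at most 128 entries so the size fits one byte, it assumes unsigned byte comparison for the sort, and all helper names (`encodeEntry`, `lessThan`, `encodeLongMap`) are hypothetical.

```scala
import java.nio.ByteBuffer

// One entry as a product: 8 key bytes followed by 8 value bytes.
def encodeEntry(k: Long, v: Long): Array[Byte] =
  ByteBuffer.allocate(16).putLong(k).putLong(v).array()

// Unsigned lexicographic "less than". All entries are 16 bytes long here,
// so no tie-break on length is needed.
def lessThan(a: Array[Byte], b: Array[Byte]): Boolean =
  a.zip(b)
    .map { case (x, y) => Integer.compare(x & 0xff, y & 0xff) }
    .find(_ != 0)
    .exists(_ < 0)

// Size prefix, then the encoded entries in sorted order.
def encodeLongMap(m: Map[Long, Long]): Array[Byte] =
  require(m.size <= 128, "sketch only handles single-byte sizes")
  val entries = m.toList.map { case (k, v) => encodeEntry(k, v) }.sortWith(lessThan)
  entries.foldLeft(Array(m.size.toByte))(_ ++ _)
```

Under the unsigned-comparison assumption, the sort is by encoded bytes rather than by numeric key, so an entry with key -1L (encoded as ff ff ...) would come after an entry with key 1L.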

Custom Types

Using contramap (Encoder)

Create an encoder for a custom type by transforming it to an existing type:

case class UserId(value: Long)

given ByteEncoder[UserId] = ByteEncoder[Long].contramap(_.value)

Using map/emap (Decoder)

Create a decoder by transforming the decoded value:

import org.sigilaris.core.failure.DecodeFailure
import cats.syntax.either.*

// Simple transformation
given ByteDecoder[UserId] = ByteDecoder[Long].map(UserId(_))

// With validation
case class PositiveInt(value: Int)

given ByteDecoder[PositiveInt] = ByteDecoder[Long].emap: n =>
  if n > 0 && n <= Int.MaxValue then
    PositiveInt(n.toInt).asRight
  else
    DecodeFailure(s"Value $n is not a positive Int").asLeft

Error Cases

Decoding Failures

Common error scenarios:

Insufficient bytes:

val incomplete = ByteVector(0x81)  // header claims 1 data byte, but none follows
val result = ByteDecoder[BigInt :| Positive0].decode(incomplete)
// Result: Left(DecodeFailure("Insufficient bytes..."))

Empty bytes for BigNat:

val empty = ByteVector.empty
val result2 = ByteDecoder[BigInt :| Positive0].decode(empty)
// Result: Left(DecodeFailure("Empty bytes"))

Validation failures:

// Custom validation in emap
val nonPositive = ByteEncoder[Long].encode(0L)  // a valid Long, but not positive
// If decoded as PositiveInt, the Long is read successfully and the emap validation fails

Performance Characteristics

Space Complexity

Type | Space | Notes
Unit | 0 bytes | No data
Byte | 1 byte | Fixed
Long | 8 bytes | Fixed
Instant | 8 bytes | Fixed
BigInt 0-64 | 1 byte | Single-byte range
BigInt 65-127 | 2 bytes | Short data range
BigInt 128-32767 | 3 bytes | Short data range
List[A] n elements | 1+ + n*sizeof(A) | Size prefix + elements
Set[A] n elements | 1+ + n*sizeof(A) | Size prefix + sorted
Map[K,V] n entries | 1+ + n*(sizeof(K)+sizeof(V)) | Size + sorted tuples

Time Complexity

Operation | Complexity | Notes
Encode primitive | O(1) | Constant time
Encode BigNat | O(log n) | Proportional to value size
Encode List[A] | O(n) | Linear in list size
Encode Set[A] | O(n log n) | Due to sorting
Encode Map[K,V] | O(n log n) | Due to sorting
Decode primitive | O(1) | Constant time
Decode BigNat | O(log n) | Read variable bytes
Decode List[A] | O(n) | Linear in list size
Decode Set[A] | O(n) | No sorting needed
Decode Map[K,V] | O(n) | No sorting needed

Roundtrip Property

For all supported types, the following must hold:

encode(decode(encode(value))) == encode(value)
decode(encode(value)) == Right(DecodeResult(value, ByteVector.empty))

This property is verified via property-based tests using hedgehog-munit.

