Type Encoding Rules

← Codec Overview | API | Type Rules | Examples | RLP Comparison

Overview

This document specifies the exact encoding and decoding rules for each supported type in the Sigilaris byte codec. These rules serve as the technical specification for the codec implementation.

Key Principles:

Deterministic: Same input always produces same output
Space-efficient: Small values use fewer bytes
Reversible: Decoding inverts encoding exactly (roundtrip property)
Type-safe: Errors are caught at compile time when possible

Primitive Types

Unit

Encoding Rule:

Unit → empty byte sequence (ByteVector.empty)

Decoding Rule:

Consumes no bytes, returns Unit and original byte sequence as remainder

Examples:

import org.sigilaris.core.codec.byte.*
import scodec.bits.ByteVector

val unitEncoded = ByteEncoder[Unit].encode(())
// Result: ByteVector(empty)

val unitDecoded = ByteDecoder[Unit].decode(ByteVector(0x01, 0x02))
// Result: Right(DecodeResult((), ByteVector(0x01, 0x02)))

Use Case: Unit is useful as a marker or flag in product types where a field's presence is significant but it carries no data.

Byte

Encoding Rule:

Byte → single byte

Decoding Rule:

Read 1 byte, return as Byte with remainder

Examples:

val b: Byte = 0x42
val byteEncoded = ByteEncoder[Byte].encode(b)
// Result: ByteVector(0x42)

val byteDecoded = ByteDecoder[Byte].decode(byteEncoded)
// Result: Right(DecodeResult(0x42, ByteVector(empty)))

Long

Encoding Rule:

Long → 8-byte big-endian representation

Decoding Rule:

Read 8 bytes, interpret as big-endian Long

Examples:

val n: Long = 42L
val longEncoded = ByteEncoder[Long].encode(n)
// Result: ByteVector(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2a)

val longDecoded = ByteDecoder[Long].decode(longEncoded)
// Result: Right(DecodeResult(42L, ByteVector(empty)))

Note: Long uses fixed 8-byte encoding for simplicity and consistency. For space-efficient integer encoding, use BigInt instead.

Instant

Encoding Rule:

Instant → epoch milliseconds as Long (8 bytes)

Decoding Rule:

Read 8 bytes as Long, convert to Instant via Instant.ofEpochMilli

Examples:

import java.time.Instant

val timestamp = Instant.parse("2024-01-01T00:00:00Z")
val instantEncoded = ByteEncoder[Instant].encode(timestamp)
// Result: epoch milliseconds encoded as Long

val instantDecoded = ByteDecoder[Instant].decode(instantEncoded)
// Result: Right(DecodeResult(timestamp, ByteVector(empty)))

Numeric Types

BigNat (Natural Numbers)

BigNat represents non-negative integers (0, 1, 2, ...) with variable-length encoding.

Type Definition:

type BigNat = BigInt :| Positive0  // non-negative BigInt

Encoding Rules:

The encoding uses three ranges for space efficiency:

Single-byte range (0x00 ~ 0x80): Values 0-128

value n (0 ≤ n ≤ 128) → single byte 0xnn

Short data range (0x81 ~ 0xf7): Data length 1-119 bytes

[0x80 + data_length][data_bytes]
data_length: 1 to 119 (0xf7 - 0x80)

Long data range (0xf8 ~ 0xff): Data length 120+ bytes

[0xf8 + (length_byte_count - 1)][length_bytes][data_bytes]
length_byte_count: 1 to 8

Encoding Examples:

Value	Encoded Bytes	Explanation
0	`0x00`	Single byte
1	`0x01`	Single byte
128	`0x80`	Single byte (inclusive)
129	`0x81 81`	Length 1, data 0x81
255	`0x81 ff`	Length 1, data 0xff
256	`0x82 01 00`	Length 2, data 0x0100
65535	`0x82 ff ff`	Length 2, data 0xffff
65536	`0x83 01 00 00`	Length 3, data 0x010000

Decoding Algorithm:

def decodeBigNat(bytes: ByteVector): (BigNat, ByteVector) =
  val head = bytes.head & 0xff

  if head <= 0x80 then
    // Single byte: value is 0-128
    (BigInt(head), bytes.tail)

  else if head <= 0xf7 then
    // Short data: 1-119 byte data
    val dataLength = head - 0x80
    val (dataBytes, remainder) = bytes.tail.splitAt(dataLength)
    (BigInt(1, dataBytes.toArray), remainder)

  else
    // Long data: 120+ byte data
    val lengthByteCount = head - 0xf7
    val (lengthBytes, afterLength) = bytes.tail.splitAt(lengthByteCount)
    val dataLength = BigInt(1, lengthBytes.toArray).toLong
    val (dataBytes, remainder) = afterLength.splitAt(dataLength)
    (BigInt(1, dataBytes.toArray), remainder)

Roundtrip Property:

import io.github.iltotore.iron.*
import io.github.iltotore.iron.constraint.numeric.Positive0

def testRoundtrip(n: BigInt :| Positive0): Boolean =
  val bignatEncoded = ByteEncoder[BigInt :| Positive0].encode(n)
  val bignatDecoded = ByteDecoder[BigInt :| Positive0].decode(bignatEncoded)
  bignatDecoded match
    case Right(DecodeResult(value, remainder)) =>
      value == n && remainder.isEmpty
    case Left(_) => false

// All these should return true:
// testRoundtrip(BigInt(0).refineUnsafe)
// testRoundtrip(BigInt(128).refineUnsafe)
// testRoundtrip(BigInt(255).refineUnsafe)
// testRoundtrip(BigInt(65536).refineUnsafe)

BigInt (Signed Integers)

BigInt extends BigNat encoding with sign information.

Encoding Rules:

Convert signed BigInt to BigNat using sign-magnitude encoding:

n >= 0  →  encode (n * 2) as BigNat
n < 0   →  encode (n * (-2) + 1) as BigNat

Key Insight:

Even numbers represent positive values: 2n → n
Odd numbers represent negative values: 2n+1 → -(n+1)

Encoding Examples:

Value	Transformation	BigNat	Encoded
-2	(-2) * (-2) + 1 = 5	5	`0x05`
-1	(-1) * (-2) + 1 = 3	3	`0x03`
0	0 * 2 = 0	0	`0x00`
1	1 * 2 = 2	2	`0x02`
2	2 * 2 = 4	4	`0x04`
127	127 * 2 = 254	254	`0x81 fe`
-128	-128 * (-2) + 1 = 257	257	`0x82 01 01`

Decoding Rules:

Given decoded BigNat x:

x % 2 == 0  →  x / 2        // even: positive
x % 2 == 1  →  (x - 1) / (-2)  // odd: negative

Roundtrip Verification:

Value	Encode	Decode Check
-2	5	(5-1)/(-2) = -2 ✓
-1	3	(3-1)/(-2) = -1 ✓
0	0	0/2 = 0 ✓
1	2	2/2 = 1 ✓
2	4	4/2 = 2 ✓

Space Efficiency:

Small integers (positive and negative) encode efficiently:

Values -64 to 64: single byte
Values -16384 to 16383: two bytes

Product Types

Tuples

Tuples are encoded as concatenated fields in order:

(A, B) → [A encoded][B encoded]
(A, B, C) → [A encoded][B encoded][C encoded]

Encoding:

val tuple = (42L, 100L)
val tupleEncoded = ByteEncoder[(Long, Long)].encode(tuple)
// Result: [42L encoded][100L encoded]

Decoding: Fields are decoded sequentially, consuming bytes from left to right.

Case Classes

Case classes use automatic derivation via Mirror.ProductOf:

case class User(id: Long, balance: Long)
→ [id encoded][balance encoded]

Encoding:

case class User(id: Long, balance: Long)

val user = User(1L, 100L)
val userEncoded = ByteEncoder[User].encode(user)
// Result: [id encoded][balance encoded]

Field Order: Fields are encoded in the order they appear in the case class definition.

Collection Types

List

Lists preserve order and encode with size prefix:

List(a1, a2, ..., an) → [size:BigNat][a1][a2]...[an]

Encoding:

val list = List(1, 2, 3).map(BigInt(_))
val listEncoded = ByteEncoder[List[BigInt]].encode(list)
// Result: [0x03][0x02][0x04][0x06]
//         size=3, then 1→2, 2→4, 3→6

Decoding: 1. Decode size as BigNat 2. Decode exactly size elements 3. Return list and remainder

Empty List:

List() → [0x00]  // size = 0

Option

Option is encoded as zero or one-element list:

None → [0x00]  // size = 0
Some(x) → [0x01][x encoded]  // size = 1, element x

Encoding:

val some: Option[Long] = Some(42L)
val someEncoded = ByteEncoder[Option[Long]].encode(some)
// Result: [0x01][42L encoded as 8 bytes]

val none: Option[Long] = None
val noneEncoded = ByteEncoder[Option[Long]].encode(none)
// Result: [0x00]

Why This Works: The context (type) distinguishes Option[Long] from Long. The byte 0x00 means:

In BigNat context: natural number 0
In Option[A] context: None (zero elements)

Set

Sets encode with deterministic ordering:

Set(a1, a2, ..., an) → [size:BigNat][sorted_a1][sorted_a2]...[sorted_an]

Deterministic Sorting: 1. Encode each element to bytes 2. Sort encoded bytes lexicographically 3. Concatenate with size prefix

Encoding:

val set = Set(3, 1, 2).map(BigInt(_))
val setEncoded = ByteEncoder[Set[BigInt]].encode(set)
// Elements encode as: 3→0x06, 1→0x02, 2→0x04
// Sorted: 0x02, 0x04, 0x06
// Result: [0x03][0x02][0x04][0x06]

Why Sorting: Set iteration order is undefined in Scala. Sorting encoded bytes ensures the same Set always produces the same byte sequence, which is critical for blockchain hashing and signing.

Lexicographic Order: Bytes are compared left to right:

0x01 < 0x02 < 0x03 < ... < 0xff
0x01 0x00 < 0x01 0x01
0x01 0xff < 0x02 0x00

Map

Maps are encoded as deterministically sorted sets of tuples:

Map(k1 → v1, k2 → v2) → Set((k1, v1), (k2, v2))
→ [size:BigNat][sorted_tuple1][sorted_tuple2]...

Encoding:

val map = Map(1L -> 10L, 2L -> 20L)
val mapEncoded = ByteEncoder[Map[Long, Long]].encode(map)
// Each entry (1L, 10L) is encoded as tuple
// Tuples are sorted by their encoded bytes
// Result: [size][sorted entries]

Tuple Encoding: Each (K, V) pair is encoded as product: [K encoded][V encoded]

Deterministic Ordering: Like Set, Map entries are sorted by their encoded tuple bytes, ensuring consistent encoding regardless of Map iteration order.

Custom Types

Using contramap (Encoder)

Create encoder for custom type by transforming to existing type:

case class UserId(value: Long)

given ByteEncoder[UserId] = ByteEncoder[Long].contramap(_.value)

Using map/emap (Decoder)

Create decoder by transforming decoded value:

import org.sigilaris.core.failure.DecodeFailure
import cats.syntax.either.*

// Simple transformation
given ByteDecoder[UserId] = ByteDecoder[Long].map(UserId(_))

// With validation
case class PositiveInt(value: Int)

given ByteDecoder[PositiveInt] = ByteDecoder[Long].emap: n =>
  if n > 0 && n <= Int.MaxValue then
    PositiveInt(n.toInt).asRight
  else
    DecodeFailure(s"Value $n is not a positive Int").asLeft

Error Cases

Decoding Failures

Common error scenarios:

Insufficient bytes:

val incomplete = ByteVector(0x01)  // claims length 1, but no data
val result = ByteDecoder[BigInt :| Positive0].decode(incomplete)
// Result: Left(DecodeFailure("Insufficient bytes..."))

Empty bytes for BigNat:

val empty = ByteVector.empty
val result2 = ByteDecoder[BigInt :| Positive0].decode(empty)
// Result: Left(DecodeFailure("Empty bytes"))

Validation failures:

// Custom validation in emap
val negative = ByteVector(0x03)  // encodes -1 as BigInt
// If decoded as PositiveInt, validation fails

Performance Characteristics

Space Complexity

Type	Space	Notes
Unit	0 bytes	No data
Byte	1 byte	Fixed
Long	8 bytes	Fixed
Instant	8 bytes	Fixed
BigInt 0-64	1 byte	Single byte range
BigInt 65-128	1 byte	Single byte range
BigInt 129-32767	3 bytes	Short data range
`List[A]` n elements	1+ + n*sizeof(A)	Size prefix + elements
`Set[A]` n elements	1+ + n*sizeof(A)	Size prefix + sorted
`Map[K,V]` n entries	1+ + n*(sizeof(K)+sizeof(V))	Size + sorted tuples

Time Complexity

Operation	Complexity	Notes
Encode primitive	O(1)	Constant time
Encode BigNat	O(log n)	Proportional to value size
Encode `List[A]`	O(n)	Linear in list size
Encode `Set[A]`	O(n log n)	Due to sorting
Encode `Map[K,V]`	O(n log n)	Due to sorting
Decode primitive	O(1)	Constant time
Decode BigNat	O(log n)	Read variable bytes
Decode `List[A]`	O(n)	Linear in list size
Decode `Set[A]`	O(n)	No sorting needed
Decode `Map[K,V]`	O(n)	No sorting needed

Roundtrip Property

For all supported types, the following must hold:

encode(decode(encode(value))) == encode(value)
decode(encode(value)) == Right(DecodeResult(value, ByteVector.empty))

This property is verified via property-based tests using hedgehog-munit.

← Codec Overview | API | Type Rules | Examples | RLP Comparison