Type System Architecture
Design Philosophy
Memory-Bounded Computing Paradigm
Hakka JSON's type system is built around one constraint: in large-scale JSON processing, memory is exhausted before compute becomes the limit. Traditional JSON libraries can fail when processing billions of small objects, not because of CPU bottlenecks but because per-object pointer overhead triggers Out-Of-Memory (OOM) conditions.
This reality drives our memory-first design philosophy:
- Memory is the primary constraint, not CPU cycles
- Pointer overhead elimination takes precedence over algorithmic optimizations
- Deduplication strategies prevent redundant allocations at the type system level
- Compact representations enable processing datasets 2-10x larger than traditional approaches
Python-First Integration Strategy
Unlike typical C++ libraries that retrofit Python bindings, Hakka JSON's type system is designed from inception for Python integration:
- Reference counting semantics mirror Python's object lifetime model
- Explicit memory management enables cross-language boundary safety
- ABI-stable C interface decouples from CPython implementation details
- Zero-copy data sharing between Python and C++ without serialization overhead
Zero-Abstraction Performance Contract
Every architectural decision maintains a zero-abstraction guarantee:
- CRTP inheritance eliminates virtual function overhead entirely
- Compile-time polymorphism resolves all type dispatch at build time
- Tagged pointer encoding embeds metadata without storage cost
- Handle indirection compiles to direct array access (roughly 2-3 CPU cycles when the slot is already in cache)
Design Constraints & Trade-offs
Constraint: Memory Density vs. Access Patterns
Design Decision: 32-bit handles instead of 64-bit pointers
Rationale: In JSON workloads with millions of objects, pointer storage dominates memory usage. Halving the reference size from 8 to 4 bytes lets up to roughly 2x larger datasets fit in the same memory when references dominate the footprint.
Trade-off:
- ✅ Benefit: About 1.07 billion (2^30) objects per manager (30-bit index space)
- ❌ Limitation: Cannot address unlimited object counts
- Verdict: Acceptable for practical JSON processing scenarios
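The 32-bit handle can be pictured as a packed token. The following is a minimal sketch, assuming the 2-bit type tag occupies the two high bits and the 30-bit index the low bits; the names `ManagerKind`, `make_token`, `token_kind`, and `token_index` are hypothetical and are not taken from the Hakka JSON sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration. The document only states that a
// handle is a 32-bit token with a 2-bit type tag and a 30-bit index.
enum class ManagerKind : std::uint32_t { Scalar = 0, String = 1, Array = 2, Object = 3 };

constexpr std::uint32_t kIndexBits = 30;
constexpr std::uint32_t kIndexMask = (1u << kIndexBits) - 1u;  // 0x3FFFFFFF

// Pack a manager kind and a slot index into one 32-bit token.
// Assumption: the tag lives in the two high bits.
constexpr std::uint32_t make_token(ManagerKind kind, std::uint32_t index) {
    return (static_cast<std::uint32_t>(kind) << kIndexBits) | (index & kIndexMask);
}

// Recover the manager kind from the two high bits.
constexpr ManagerKind token_kind(std::uint32_t token) {
    return static_cast<ManagerKind>(token >> kIndexBits);
}

// Recover the slot index from the low 30 bits.
constexpr std::uint32_t token_index(std::uint32_t token) {
    return token & kIndexMask;
}

// 2^30 addressable slots per manager, and half the size of a 64-bit pointer.
static_assert(kIndexMask + 1u == 1073741824u, "30-bit index space");
static_assert(sizeof(std::uint32_t) * 2 == sizeof(std::uint64_t), "half of 8 bytes");
```

Because both fields are recovered with a shift and a mask, decoding a token needs no memory access; the limitation is the hard ceiling of 2^30 live objects per manager.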
Constraint: Type Safety vs. Memory Overhead
Design Decision: CRTP inheritance instead of virtual dispatch
Rationale: On 64-bit platforms, a virtual table pointer adds 8 bytes to every polymorphic object, which is catastrophic overhead for billions of small JSON values.
Trade-off:
- ✅ Benefit: Zero runtime polymorphism cost, aggressive inlining
- ❌ Limitation: More complex template code, longer compile times
- Verdict: Essential for memory-critical applications
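To make the size difference concrete, the sketch below contrasts a virtual-dispatch integer wrapper with a CRTP one. All type names here (`VirtualValue`, `VirtualInt`, `ValueBase`, `CrtpInt`) are hypothetical stand-ins rather than Hakka JSON classes; the point is only that the CRTP base is empty and therefore adds no per-object storage.

```cpp
#include <cassert>
#include <cstdint>

// Virtual dispatch: every instance carries a hidden vtable pointer.
struct VirtualValue {
    virtual ~VirtualValue() = default;
    virtual std::int64_t as_int() const = 0;
};

struct VirtualInt final : VirtualValue {
    std::int64_t value;
    explicit VirtualInt(std::int64_t v) : value(v) {}
    std::int64_t as_int() const override { return value; }
};

// CRTP: the base reaches the derived type through a static_cast that is
// resolved at compile time, so no vtable pointer is stored.
template <typename Derived>
struct ValueBase {
    std::int64_t as_int() const {
        return static_cast<const Derived*>(this)->as_int_impl();
    }
};

struct CrtpInt : ValueBase<CrtpInt> {
    std::int64_t value;
    explicit CrtpInt(std::int64_t v) : value(v) {}
    std::int64_t as_int_impl() const { return value; }
};

// The empty CRTP base adds no storage; the virtual version is strictly larger.
static_assert(sizeof(CrtpInt) == sizeof(std::int64_t), "payload only");
static_assert(sizeof(VirtualInt) > sizeof(CrtpInt), "vtable pointer costs space");
```

The cost is the one named in the trade-off above: every call site must know the concrete type at compile time, which pushes complexity into templates.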
Constraint: Thread Safety vs. Performance
Design Decision: Manager-level locking with atomic reference counting
Rationale: Fine-grained per-object locking would add 8+ bytes per object. Coarse-grained manager locking amortizes synchronization cost.
Trade-off:
- ✅ Benefit: Minimal memory overhead, simple reasoning model
- ❌ Limitation: Manager contention under high concurrency
- Verdict: Appropriate for typical JSON processing patterns (read-heavy, batch operations)
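One way to read this decision is sketched below: a single mutex guards the manager's slot table, while each object's reference count is an atomic that can be adjusted without the lock. `Slot` and `LockedManager` are hypothetical names, and the real synchronization scheme may differ; this is an illustration of the pattern, not the library's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical types for illustration; not Hakka JSON's actual classes.
struct Slot {
    std::int64_t value{0};
    std::atomic<std::uint32_t> refcount{0};  // per-object cost: 4 bytes, no mutex
};

class LockedManager {
public:
    // Structural change (the vector may reallocate): take the manager lock.
    std::uint32_t create(std::int64_t value) {
        std::lock_guard<std::mutex> guard(mutex_);
        slots_.push_back(std::make_unique<Slot>());
        slots_.back()->value = value;
        slots_.back()->refcount.store(1, std::memory_order_relaxed);
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }

    // Index lookup also reads the vector, so it takes the same lock.
    // The returned pointer stays valid because each Slot is heap-allocated.
    Slot* resolve(std::uint32_t index) {
        std::lock_guard<std::mutex> guard(mutex_);
        return index < slots_.size() ? slots_[index].get() : nullptr;
    }

    // Reference counting touches only the slot, so no lock is needed.
    static void retain(Slot& slot) {
        slot.refcount.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true when the caller dropped the last reference.
    static bool release(Slot& slot) {
        return slot.refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

private:
    std::mutex mutex_;                          // one lock per manager
    std::vector<std::unique_ptr<Slot>> slots_;  // guarded by mutex_
};
```

Every `create` and `resolve` call serializes on the one mutex, which is the contention limitation listed above; read-heavy workloads that resolve a handle once and then reuse the pointer mostly avoid it.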
Constraint: Python Integration vs. C++ Idioms
Design Decision: Explicit reference counting instead of RAII
Rationale: Python's reference counting model conflicts with C++ scope-based destruction. Objects must survive across language boundaries.
Trade-off:
- ✅ Benefit: Seamless Python integration, predictable lifetime
- ❌ Limitation: Manual memory management, potential reference cycles
- Verdict: Necessary for primary use case (Python scientific computing)
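The conflict with scope-based destruction can be seen in a small sketch. Here an object created inside a C++ function outlives that scope because a second owner (standing in for a Python wrapper) holds a reference. The names `CountedInt`, `counted_create`, `counted_retain`, `counted_release`, and `hand_off_to_foreign_owner` are hypothetical, and the count is deliberately non-atomic to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical intrusive reference count; not the library's actual layout.
struct CountedInt {
    std::int64_t value;
    std::uint32_t refcount;
};

// The creator receives the first reference.
inline CountedInt* counted_create(std::int64_t v) { return new CountedInt{v, 1}; }

inline void counted_retain(CountedInt* p) { ++p->refcount; }

// Returns true when the last reference was dropped and the object was freed.
inline bool counted_release(CountedInt* p) {
    if (--p->refcount == 0) {
        delete p;
        return true;
    }
    return false;
}

// The C++ scope ends, but the object survives because another owner
// (for example a Python-side wrapper) still holds a reference.
inline CountedInt* hand_off_to_foreign_owner() {
    CountedInt* local = counted_create(7);
    counted_retain(local);   // reference taken on behalf of the foreign owner
    counted_release(local);  // the C++ side drops its own reference
    return local;            // still alive: refcount == 1
}
```

With RAII alone, `local` would be destroyed at the closing brace; explicit counting leaves the decision to whichever side releases last, at the price of having to pair every retain with a release by hand.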
Memory Architecture
Physical Memory Layout
Memory Organization (Conceptual View):
┌─────────────────────────────────────────────────────────────────────┐
│ Application Memory Space │
├─────────────────────────────────────────────────────────────────────┤
│ Handle Tokens (32-bit each) │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │ 0x2 │ 0x4 │ 0x8 │ 0x4 │ 0x2 │ 0xC │ 0x8 │ 0x4 │ 0x2 │ 0x2 │ ... │ │
│ │0001 │0000 │0005 │0002 │0003 │0010 │0007 │0001 │0004 │0002 │ │ │
│ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ Manager Registry (4 managers × 8 bytes = 32 bytes) │
│ ┌─────────────┬─────────────┬─────────────┬─────────────┐ │
│ │ Scalar Mgr │ String Mgr │ Array Mgr │ Object Mgr │ │
│ │ 0x7F8A... │ 0x7F8B... │ 0x7F8C... │ 0x7F8D... │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ Scalar Manager Memory Pool │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Handles Vector: [Tagged Ptr | Tagged Ptr | Tagged Ptr | ...] │ │
│ │ Hash Map: {hash → index, hash → index, ...} │ │
│ │ Freelist: [3, 7, 12, ...] (min-heap of available indices) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ String Manager Memory Pool (similar structure) │
│ Array Manager Memory Pool (similar structure) │
│ Object Manager Memory Pool (similar structure) │
├─────────────────────────────────────────────────────────────────────┤
│ Actual JSON Objects (pointed to by tagged pointers) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ [int64: 42][refcount: 3] [double: 3.14][refcount: 1] ... │ │
│ │ [string: "hello"][refcount: 5] [array: vector<handles>] ... │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Memory Behavior Characteristics
Allocation Patterns
Cold Start: Empty managers with minimal overhead
Manager State: handles_=[], freelist_=[], hash_map_={}
Memory Usage: ~100 bytes per manager (fixed overhead)
Warm State: Active deduplication and reuse
Manager State: handles_=[obj1, obj2, null, obj4, ...],
freelist_=[2, 7, 15],
hash_map_={hash(42)→0, hash("hello")→1, ...}
Memory Usage: Objects + Handles + Hash entries + Freelist
Memory Pressure Response: Aggressive compaction
Trigger: freelist.size() > handles.size() / 2
Action: Remove trailing nullptrs, shrink vectors, rebuild heap
Result: Minimal memory footprint, compact address space
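The freelist and compaction behavior described above can be sketched with standard heap algorithms. `SlotPool` is a hypothetical class that stores plain `int` payloads in place of tagged pointers; the trigger condition is copied from the text, while everything else is an assumption about how such a pool might be written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical pool; ints stand in for the tagged pointers in handles_.
class SlotPool {
public:
    std::size_t insert(int value) {
        if (!freelist_.empty()) {
            // Min-heap: reuse the smallest free index to keep slots dense.
            std::pop_heap(freelist_.begin(), freelist_.end(), std::greater<std::size_t>());
            const std::size_t idx = freelist_.back();
            freelist_.pop_back();
            slots_[idx] = std::make_unique<int>(value);
            return idx;
        }
        slots_.push_back(std::make_unique<int>(value));
        return slots_.size() - 1;
    }

    void erase(std::size_t idx) {
        slots_[idx].reset();
        freelist_.push_back(idx);
        std::push_heap(freelist_.begin(), freelist_.end(), std::greater<std::size_t>());
        // Trigger from the text: freelist.size() > handles.size() / 2
        if (freelist_.size() > slots_.size() / 2) compact();
    }

    std::size_t slot_count() const { return slots_.size(); }
    std::size_t free_count() const { return freelist_.size(); }

private:
    void compact() {
        // Remove trailing empty slots.
        while (!slots_.empty() && !slots_.back()) slots_.pop_back();
        // Drop free indices that no longer exist, then rebuild the heap.
        freelist_.erase(std::remove_if(freelist_.begin(), freelist_.end(),
                                       [this](std::size_t i) { return i >= slots_.size(); }),
                        freelist_.end());
        std::make_heap(freelist_.begin(), freelist_.end(), std::greater<std::size_t>());
        slots_.shrink_to_fit();
        freelist_.shrink_to_fit();
    }

    std::vector<std::unique_ptr<int>> slots_;
    std::vector<std::size_t> freelist_;
};
```

Preferring the smallest free index is what makes trailing slots tend to empty out first, so the compaction pass can shrink the vector rather than leaving holes in the middle.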
Deduplication Behavior
Primitive Values: Automatic sharing across all references
// Physical memory: ONE JsonIntCompact object
auto h1 = JsonIntCompact::create(42); // Creates object, hash entry
auto h2 = JsonIntCompact::create(42); // Finds existing, increments refcount
auto h3 = JsonIntCompact::create(42); // Finds existing, increments refcount
// Memory: 1 object × (8 bytes value + 8 bytes refcount) = 16 bytes total
Structured Values: Independent instances, shared elements
// Physical memory: TWO JsonArrayCompact objects
auto a1 = JsonArrayCompact::create(); // Creates array object
auto a2 = JsonArrayCompact::create(); // Creates separate array object
a1.push_back(JsonIntCompact::create(42)); // References shared int object
a2.push_back(JsonIntCompact::create(42)); // References same shared int object
// Memory: 2 arrays + 1 shared integer
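The primitive-sharing behavior shown above amounts to a lookup before allocation. A minimal sketch follows, assuming a map from value to slot index; `DedupIntPool` and `IntEntry` are hypothetical names, and the map is keyed by the value itself rather than by a 64-bit hash so that the example does not need collision handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical entry layout: 8-byte value plus a reference count.
struct IntEntry {
    std::int64_t value;
    std::uint32_t refcount;
};

class DedupIntPool {
public:
    // Returns the index of the shared entry for `value`, creating it on first use.
    std::size_t create(std::int64_t value) {
        auto it = index_of_.find(value);
        if (it != index_of_.end()) {
            ++entries_[it->second].refcount;  // existing object: bump the count
            return it->second;
        }
        entries_.push_back(IntEntry{value, 1});  // first sighting: allocate once
        index_of_.emplace(value, entries_.size() - 1);
        return entries_.size() - 1;
    }

    std::uint32_t refcount(std::size_t index) const { return entries_[index].refcount; }
    std::size_t object_count() const { return entries_.size(); }

private:
    std::vector<IntEntry> entries_;
    std::unordered_map<std::int64_t, std::size_t> index_of_;
};
```

Three calls to `create(42)` on such a pool return the same index and leave one entry with a reference count of 3, matching the `h1`, `h2`, `h3` example above.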
Memory Efficiency Multipliers
Scenario 1: Boolean-Heavy JSON (e.g., feature flags)
{"feature1": true, "feature2": false, "feature3": true, ...}
- Traditional: N × (8-byte pointer + 16-byte bool object) = 24N bytes
- Hakka JSON: N × 4-byte handle + 2 × 8-byte singletons = 4N + 16 bytes
- Efficiency: ~6x memory reduction for large N
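The Scenario 1 arithmetic can be checked directly. The helper names below are hypothetical; the byte counts are the ones quoted in the bullets above.

```cpp
#include <cassert>
#include <cstdint>

// Traditional layout: N x (8-byte pointer + 16-byte bool object).
constexpr std::uint64_t traditional_bool_bytes(std::uint64_t n) {
    return n * (8 + 16);
}

// Handle layout: N x 4-byte handle + two shared 8-byte singletons.
constexpr std::uint64_t handle_bool_bytes(std::uint64_t n) {
    return n * 4 + 2 * 8;
}

// Ratio of the two footprints for a given N.
constexpr double bool_reduction_ratio(std::uint64_t n) {
    return static_cast<double>(traditional_bool_bytes(n)) /
           static_cast<double>(handle_bool_bytes(n));
}

static_assert(traditional_bool_bytes(1000000) == 24000000, "24N bytes");
static_assert(handle_bool_bytes(1000000) == 4000016, "4N + 16 bytes");
```

The ratio approaches 6 only as N grows: for a single flag it is 24 / 20 = 1.2, because the fixed 16 bytes for the two singletons dominate.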
Scenario 2: Repeated String Keys (e.g., database records)
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]
- Traditional: 2N × (8-byte pointer + string object) across N records
- Hakka JSON: 2N × 4-byte handle + 2 × shared string objects
- Efficiency: ~4x memory reduction + O(1) key lookup cost
Scenario 3: Deep Nested Structures (e.g., syntax trees)
{"node": {"left": {"value": 1}, "right": {"value": 2}}}
- Traditional: Pointer overhead dominates small objects
- Hakka JSON: Handle compaction + reference sharing
- Efficiency: 2-3x memory reduction depending on nesting depth
System Shape
Context Layer: Application Integration
┌───────────────────────────────────────────────────────────────────────┐
│ Application Context │
│ │
│ Python Scientific Computing │ C++ Memory Efficiency │
│ ┌─────────────────────────────┐ │ ┌─────────────────────────────────┐│
│ │ import hakka_json │ │ │ #include <hakka_json.hpp> ││
│ │ data = hakka_json.loads(..) │ │ │ auto handle = JsonInt::create() ││
│ │ result = data["key"] │ │ │ auto view = handle.get_view() ││
│ └─────────────────────────────┘ │ └─────────────────────────────────┘│
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ C ABI Boundary (Foreign Function Interface) │ │
│ │ uint64_t hakka_json_create_int(int64_t value); │ │
│ │ void hakka_json_retain(uint64_t handle); │ │
│ │ void hakka_json_release(uint64_t handle); │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
Container Layer: Type System Organization
┌──────────────────────────────────────────────────────────────────────┐
│ Type System Container │
│ │
│ ┌─────────────────────┐ ┌─────────────────────────────┐ │
│ │ Primitive Types │ │ Structured Types │ │
│ │ (Immutable) │ │ (Mutable) │ │
│ │ │ │ │ │
│ │ ┌─────────────────┐ │ │ ┌─────────────────────────┐ │ │
│ │ │ JsonIntCompact │ │ │ │ JsonArrayCompact │ │ │
│ │ │ JsonFloatCompact│ │ │ │ - vector<handles> │ │ │
│ │ │JsonStringCompact│ │ │ │ - Python list-like │ │ │
│ │ │ │ │ │ └─────────────────────────┘ │ │
│ │ │ NaN-boxed: │ │ │ ┌─────────────────────────┐ │ │
│ │ │ - JsonBool │ │ │ │ JsonObjectCompact │ │ │
│ │ │ - JsonNull │ │ │ │ - parallel arrays │ │ │
│ │ │ - JsonInvalid │ │ │ │ - Python dict-like │ │ │
│ │ └─────────────────┘ │ │ └─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ Automatic │ │ Independent │ │
│ │ Deduplication │ │ Storage │ │
│ └─────────────────────┘ └─────────────────────────────┘ │
│ │ │ │
│ └─────────────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Handle Management Layer │ │
│ │ 32-bit tokens + Manager registry │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
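The diagram lists JsonBool, JsonNull, and JsonInvalid as NaN-boxed. As a general illustration of that technique, the sketch below stores those three kinds in the payload bits of a quiet NaN, so they share a 64-bit word with ordinary doubles and need no heap object. The specific tag values and function names are assumptions; the document does not specify Hakka JSON's actual bit layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Quiet-NaN bit pattern for an IEEE 754 double; payload bits are free to use.
constexpr std::uint64_t kQuietNaN = 0x7FF8000000000000ull;

// Assumed tag values, chosen only for this illustration.
constexpr std::uint64_t kTagNull    = kQuietNaN | 1;
constexpr std::uint64_t kTagFalse   = kQuietNaN | 2;
constexpr std::uint64_t kTagTrue    = kQuietNaN | 3;
constexpr std::uint64_t kTagInvalid = kQuietNaN | 4;

// Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing issues).
inline std::uint64_t box_double(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

inline double unbox_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

constexpr std::uint64_t box_bool(bool b) { return b ? kTagTrue : kTagFalse; }

// True when the word holds one of the boxed singleton tags, not a number.
inline bool is_boxed_tag(std::uint64_t bits) {
    return bits >= kTagNull && bits <= kTagInvalid;
}
```

Since the tags have a non-zero payload, they do not collide with the zero-payload quiet NaN pattern `0x7FF8000000000000`; a production encoding would also have to decide how to treat other NaN payloads that arrive from parsed input.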
Component Layer: Internal Architecture
┌──────────────────────────────────────────────────────────────────────┐
│ Component Architecture │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ JsonHandleCompact │ │
│ │ ┌──────────────────────────────────────────────────────────────┐│ │
│ │ │ HandleManagerToken data (32-bit) ││ │
│ │ │ ┌─────────┬─────────────────────────────────────────────────┐││ │
│ │ │ │Type(2b) │ Index (30-bit) │││ │
│ │ │ └─────────┴─────────────────────────────────────────────────┘││ │
│ │ └──────────────────────────────────────────────────────────────┘│ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Manager Registry (Singleton) │ │
│ │ ┌─────────┬─────────┬─────────┬─────────┐ │ │
│ │ │Scalar │String │Array │Object │ │ │
│ │ │Manager │Manager │Manager │Manager │ │ │
│ │ └─────────┴─────────┴─────────┴─────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Individual Manager │ │
│ │ ┌──────────────────────────────────────────────────────────────┐│ │
│ │ │ handles_: vector<OwnedUniformCompactPointer> ││ │
│ │ │ ┌──────────────────────────────────────────────────────────┐ ││ │
│ │ │ │[TaggedPtr][TaggedPtr][nullptr][TaggedPtr][TaggedPtr]... │ ││ │
│ │ │ └──────────────────────────────────────────────────────────┘ ││ │
│ │ │ ││ │
│ │ │ freelist_: vector<size_t> (min-heap) ││ │
│ │ │ ┌──────────────────────────────────────────────────────────┐ ││ │
│ │ │ │[2][5][8][15][...] ← available indices │ ││ │
│ │ │ └──────────────────────────────────────────────────────────┘ ││ │
│ │ │ ││ │
│ │ │ hash_to_index_map_: unordered_map<uint64_t, size_t> ││ │
│ │ │ ┌──────────────────────────────────────────────────────────┐ ││ │
│ │ │ │{hash(42)→0, hash("hello")→1, hash(3.14)→3, ...} │ ││ │
│ │ │ └──────────────────────────────────────────────────────────┘ ││ │
│ │ └──────────────────────────────────────────────────────────────┘│ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ JSON Objects (Heap) │ │
│ │ ┌─────────────┬─────────────┬─────────────┬─────────────┐ │ │
│ │ │JsonInt(42) │JsonString │JsonFloat │JsonArray │ │ │
│ │ │refcount=3 │("hello") │(3.14) │elements=[..]│ │ │
│ │ └─────────────┴─────────────┴─────────────┴─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
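Reading the diagram top to bottom, resolving a handle is a two-step lookup: the type bits select a manager from the registry, and the index selects a slot in that manager's `handles_` vector. The sketch below follows that path with hypothetical `Manager` and `Registry` types, using `std::string` payloads in place of tagged pointers and assuming the type tag sits in the two high bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical manager; strings stand in for tagged pointers to JSON objects.
struct Manager {
    std::vector<std::string> handles_;
};

// Hypothetical registry holding the four managers (scalar, string, array, object).
struct Registry {
    std::array<Manager, 4> managers;

    // Step 1: the two high bits pick the manager.
    // Step 2: the low 30 bits index into that manager's handles_ vector.
    const std::string* resolve(std::uint32_t token) const {
        const std::uint32_t kind = token >> 30;
        const std::uint32_t index = token & 0x3FFFFFFFu;
        const Manager& manager = managers[kind];
        return index < manager.handles_.size() ? &manager.handles_[index] : nullptr;
    }
};
```

Both steps are plain array indexing, which is the basis for the claim that handle indirection costs only a few instructions once the relevant memory is cached.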
Operability
Upgrade Strategy
Binary Compatibility Guarantees
C ABI Stability: The Foreign Function Interface maintains strict ABI compatibility:
// Stable across minor versions
uint64_t hakka_json_create_int(int64_t value);
void hakka_json_retain(uint64_t handle);
void hakka_json_release(uint64_t handle);
uint32_t hakka_json_get_type(uint64_t handle);
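C++ callers of this interface must pair every retain with a release. One possible convenience, sketched here as an assumption and not as part of the library, is a small guard that takes the retain and release functions as parameters (matching the `void(uint64_t)` signatures above) and releases in its destructor. `ScopedHandle`, `fake_retain`, `fake_release`, and `g_balance` are hypothetical; the counting stand-ins exist only so the sketch runs without linking against Hakka JSON.

```cpp
#include <cassert>
#include <cstdint>

// Matches the shape of hakka_json_retain / hakka_json_release.
using HandleFn = void (*)(std::uint64_t);

// Hypothetical guard: adopts one existing reference and releases it on scope exit.
class ScopedHandle {
public:
    ScopedHandle(std::uint64_t handle, HandleFn retain, HandleFn release)
        : handle_(handle), retain_(retain), release_(release) {}

    // Copying shares the object, so it must take an extra reference.
    ScopedHandle(const ScopedHandle& other)
        : handle_(other.handle_), retain_(other.retain_), release_(other.release_) {
        retain_(handle_);
    }

    ScopedHandle& operator=(const ScopedHandle&) = delete;

    ~ScopedHandle() { release_(handle_); }

    std::uint64_t get() const { return handle_; }

private:
    std::uint64_t handle_;
    HandleFn retain_;
    HandleFn release_;
};

// Counting stand-ins for the real C functions (assumption, for testing only).
static int g_balance = 0;
static void fake_retain(std::uint64_t) { ++g_balance; }
static void fake_release(std::uint64_t) { --g_balance; }
```

In real use the constructor arguments would be the handle returned by a create call together with `hakka_json_retain` and `hakka_json_release`; the guard then keeps the reference count balanced even when a function exits early.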
Migration Pathways
From Traditional JSON Libraries
Phase 1: Drop-in Replacement
# Before: using standard library
import json
data = json.loads(json_string)
# After: minimal change
import hakka_json
data = hakka_json.loads(json_string) # Same API, better memory efficiency
Operational Monitoring
Memory Usage Patterns
TBD
Summary
This architectural view emphasizes the systematic design decisions that make Hakka JSON uniquely suited for memory-constrained, Python-integrated, large-scale JSON processing. The type system architecture achieves:
- Memory-First Design: 50-90% memory reduction through handle compaction, deduplication, and NaN-boxing
- Python Integration: Native reference counting and ABI-stable FFI for seamless cross-language operation
- Zero-Cost Abstractions: CRTP inheritance and compile-time polymorphism with no runtime overhead
- Operational Resilience: A stable C ABI across minor versions and a drop-in migration path from the standard json module; monitoring guidance is still to be documented
The architecture enables processing datasets that would otherwise exceed available memory while maintaining the performance characteristics and Python integration requirements of modern scientific computing workloads.