Component Deep Dive: src/compressor.rs
The compressor is a stateless helper that converts between the uncompressed Page representation and its compressed byte form. It serializes pages with bincode, compresses and decompresses them with lz4_flex, and returns the result wrapped in the matching cache entry type (PageCacheEntryCompressed or PageCacheEntryUncompressed).
Source Highlights
src/compressor.rs
pub struct Compressor {}
impl Compressor {
    pub fn new() -> Self { … }
    pub fn compress(&self, Arc<PageCacheEntryUncompressed>) -> PageCacheEntryCompressed
    pub fn decompress(&self, Arc<PageCacheEntryCompressed>) -> PageCacheEntryUncompressed
}
Pipeline Overview
Arc<PageCacheEntryUncompressed>
│
│ 1) Extract Page struct
▼
bincode::serialize(Page) → Vec<u8>
│
│ 2) Compress with lz4_flex::compress_prepend_size
▼
Vec<u8> (compressed blob)
│
└─ wrapped as PageCacheEntryCompressed { page: Vec<u8> }
Decompression Path
Arc<PageCacheEntryCompressed>
│
│ 1) Extract Vec<u8> (LZ4 block with a 4-byte size prefix)
▼
lz4_flex::decompress_size_prepended(&[u8])
│
│ 2) Deserialize via bincode -> Page
▼
PageCacheEntryUncompressed { page: Page }
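The size prefix mentioned in step 1 can be inspected without running the decompressor. lz4_flex documents that compress_prepend_size prepends the uncompressed size as a little-endian u32; the helper below is a minimal sketch built on that layout. The function name uncompressed_len is hypothetical and is not part of src/compressor.rs.

```rust
/// Reads the uncompressed length from a size-prepended blob without
/// decompressing it. Returns `None` if the blob is too short to contain
/// the 4-byte prefix. (Hypothetical helper, for illustration only.)
fn uncompressed_len(blob: &[u8]) -> Option<u32> {
    // `get(..4)` returns None instead of panicking on short input.
    let b = blob.get(..4)?;
    // The prefix is stored as a little-endian u32.
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Note that the value is the length of the bincode-serialized page, not the in-memory size of the Page struct.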
ASCII Visualization
                          ┌───────────────────────┐
                          │ Page (struct)         │
                          │  - page_metadata      │
                          │  - entries: Vec<Entry>│
                          └──────────┬────────────┘
                                     │
                   bincode serialize │ lz4 compress (size prepended)
Arc<PageCacheEntryUncompressed> ────────────────────────────► Vec<u8>
                                                                 │
                                                                 ▼
                                                     PageCacheEntryCompressed
Design Choices
- bincode Serialization
  Lightweight binary serializer with low overhead for Serde-compatible types. It keeps serialization and deserialization cheap when pages move frequently between caches.
- lz4_flex Compression
  compress_prepend_size and decompress_size_prepended store the uncompressed length as a 4-byte little-endian prefix at the front of the blob. A reader such as the IO layer can therefore learn the exact size of the serialized page without decompressing the payload.
- Arc-Based API
  Accepting Arc<PageCacheEntry*> avoids cloning large vectors. Compressor accesses the underlying data via Arc::as_ref, which keeps shared ownership semantics consistent with the caches.
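To illustrate the Arc-based point, here is a minimal sketch of a helper that takes an Arc by value and reads the payload through as_ref. The Entry type and both function names are hypothetical stand-ins, not the real cache entry types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a cache entry that owns a large buffer.
struct Entry {
    page: Vec<u8>,
}

/// Takes shared ownership of the entry and borrows the payload through
/// the Arc. No bytes are copied.
fn payload_len(entry: Arc<Entry>) -> usize {
    let inner: &Entry = entry.as_ref();
    inner.page.len()
}

/// Shows what the caller pays: one reference-count increment.
fn share_with_helper() -> (usize, usize) {
    let cached = Arc::new(Entry { page: vec![0u8; 1 << 20] });
    // Arc::clone bumps the reference count; the 1 MiB Vec is not duplicated.
    let len = payload_len(Arc::clone(&cached));
    // The helper's handle has been dropped, so only `cached` remains.
    (len, Arc::strong_count(&cached))
}
```

The cache keeps its own handle while the helper works on the same allocation, which is why the API can accept entries straight from the caches.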
Error Handling
Currently, serialization/deserialization calls .unwrap(). This reflects a prototype assumption that the in-memory structures are always valid. Production code should surface Result and handle corrupt blobs gracefully (especially when reading from disk).
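One possible shape for a fallible decompression path is sketched below. It is an assumption about how the API could look, not the current implementation: CompressorError and try_decompress are hypothetical names, and the two closures stand in for lz4_flex::decompress_size_prepended and bincode deserialization so the sketch stays free of external crates.

```rust
use std::fmt;

/// Hypothetical error type separating the two failure stages.
#[derive(Debug, PartialEq)]
enum CompressorError {
    Decompress(String),
    Deserialize(String),
}

impl fmt::Display for CompressorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompressorError::Decompress(msg) => write!(f, "decompression failed: {}", msg),
            CompressorError::Deserialize(msg) => write!(f, "deserialization failed: {}", msg),
        }
    }
}

impl std::error::Error for CompressorError {}

/// Runs both stages and propagates the first failure instead of panicking.
/// `decompress` and `deserialize` are stand-ins for the real codec calls.
fn try_decompress<P>(
    blob: &[u8],
    decompress: impl Fn(&[u8]) -> Result<Vec<u8>, String>,
    deserialize: impl Fn(&[u8]) -> Result<P, String>,
) -> Result<P, CompressorError> {
    let raw = decompress(blob).map_err(CompressorError::Decompress)?;
    deserialize(&raw).map_err(CompressorError::Deserialize)
}
```

Keeping the stages as separate variants lets a caller distinguish a corrupt blob on disk from a schema mismatch, and decide whether to evict, re-read, or abort.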
Integration Points
- PageHandler::decompress_from_cpc calls Compressor::decompress to materialize pages into UPC.
- Future UPC eviction will call Compressor::compress before inserting into CPC.
- WAL and snapshot routines can reuse the same API to convert between formats without duplicating logic.
Extensibility Ideas
- Support alternative compression strategies (e.g., ZSTD) via feature flags.
- Embed checksum or schema version in the compressed envelope for forward compatibility.
- Track compression ratios to inform caching heuristics or page splitting.
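As an illustration of the checksum and version idea, the sketch below wraps an already-compressed payload in a small envelope. Everything here is an assumption made for the example: the layout (1 version byte, then a 4-byte little-endian checksum, then the payload), the names wrap and unwrap_envelope, and the use of 32-bit FNV-1a as a simple stand-in for a proper checksum such as CRC32.

```rust
const ENVELOPE_VERSION: u8 = 1;
const HEADER_LEN: usize = 5; // 1 version byte + 4 checksum bytes

/// 32-bit FNV-1a hash, used here only as a simple stand-in checksum.
fn fnv1a32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &byte in data {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Prepends the version byte and checksum to the payload.
fn wrap(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(ENVELOPE_VERSION);
    out.extend_from_slice(&fnv1a32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Validates the header and returns the payload slice on success.
fn unwrap_envelope(blob: &[u8]) -> Result<&[u8], &'static str> {
    if blob.len() < HEADER_LEN {
        return Err("blob shorter than header");
    }
    if blob[0] != ENVELOPE_VERSION {
        return Err("unsupported envelope version");
    }
    let stored = u32::from_le_bytes([blob[1], blob[2], blob[3], blob[4]]);
    let payload = &blob[HEADER_LEN..];
    if fnv1a32(payload) != stored {
        return Err("checksum mismatch");
    }
    Ok(payload)
}
```

Checking the version before the checksum lets a future reader reject or migrate old envelopes without having to know their checksum scheme.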