Component Deep Dive: src/page_handler.rs

PageHandler orchestrates page retrieval across caches and disk. It is the central service that operations use to obtain Pages in uncompressed form, optimizing for cache hits and minimizing lock contention.

Source Layout

src/page_handler.rs
pub struct PageHandler {
    page_io: Arc<PageIO>,
    uncompressed_page_cache: Arc<RwLock<PageCache<PageCacheEntryUncompressed>>>,
    compressed_page_cache: Arc<RwLock<PageCache<PageCacheEntryCompressed>>>,
    compressor: Arc<Compressor>,
}
impl PageHandler {
    fn fetch_from_upc(&self, id) -> Result<Arc<PageCacheEntryUncompressed>, &str>
    fn decompress_from_cpc(&self, id) -> Result<(), &str>
    fn fetch_from_fs(&self, id, path, offset) -> Result<(), &str>
    pub fn get_page(&self, page_meta: PageMetadata) -> Option<Arc<PageCacheEntryUncompressed>>
    pub fn get_pages(&self, page_metas: Vec<PageMetadata>) -> Vec<Arc<PageCacheEntryUncompressed>>
}

Component Diagram

┌─────────────────────────┐
│     PageHandler         │
│ ┌─────────────────────┐ │
│ │  Uncompressed Cache │◄┼───┐
│ └─────────────────────┘ │   │
│ ┌─────────────────────┐ │   │
│ │   Compressed Cache  │◄┼───┤
│ └─────────────────────┘ │   │
│ ┌─────────────────────┐ │   │
│ │    Compressor       │◄┼───┤
│ └─────────────────────┘ │   │
│ ┌─────────────────────┐ │   │
│ │      PageIO         │◄┼───┘
│ └─────────────────────┘ │
└─────────────────────────┘

Inputs are PageMetadata (from TableMetaStore); outputs are Arc<PageCacheEntryUncompressed> handles suitable for direct reading/mutation.

Single Page Retrieval (get_page)

Input: PageMetadata { id, disk_path, offset }

1) UPC attempt:
      - read-lock UPC
      - if entry exists -> clone Arc & return

2) CPC attempt:
      - read-lock CPC
      - if entry exists -> clone Arc(blob)
      - drop lock
      - decompress(blob) via Compressor
      - write-lock UPC, add page
      - read-lock UPC again to return Arc

3) Disk fallback:
      - PageIO::read_from_path(disk_path, offset) -> compressed entry
      - write-lock CPC, insert entry
      - decompress_from_cpc(id)
      - final UPC read-lock to retrieve Arc

4) If no layer can produce the page, return None.
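
The lookup order above can be sketched in Rust as follows. This is a minimal illustration, not the actual implementation: Handler, Uncompressed, Compressed, the in-memory disk map standing in for PageIO, and the byte-reversing "codec" are all hypothetical stand-ins, and plain HashMaps replace PageCache.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for the real cache entry types.
pub struct Uncompressed(pub Vec<u8>);
pub struct Compressed(pub Vec<u8>);

pub struct PageMetadata {
    pub id: String,
    pub disk_path: String,
    pub offset: u64,
}

pub struct Handler {
    upc: RwLock<HashMap<String, Arc<Uncompressed>>>,
    cpc: RwLock<HashMap<String, Arc<Compressed>>>,
    // Stand-in for PageIO: an in-memory "disk" keyed by (path, offset).
    disk: HashMap<(String, u64), Vec<u8>>,
}

impl Handler {
    pub fn new(disk: HashMap<(String, u64), Vec<u8>>) -> Self {
        Handler {
            upc: RwLock::new(HashMap::new()),
            cpc: RwLock::new(HashMap::new()),
            disk,
        }
    }

    fn fetch_from_upc(&self, id: &str) -> Option<Arc<Uncompressed>> {
        let upc = self.upc.read().ok()?;
        upc.get(id).cloned()
    }

    fn decompress_from_cpc(&self, id: &str) -> Result<(), &'static str> {
        // Clone the Arc under the read lock; the guard drops at the end of the block.
        let blob = {
            let cpc = self.cpc.read().map_err(|_| "CPC lock poisoned")?;
            cpc.get(id).cloned()
        };
        let blob = blob.ok_or("page not in CPC")?;
        // Placeholder codec: "decompression" just reverses the bytes.
        let page = Uncompressed(blob.0.iter().rev().copied().collect());
        let mut upc = self.upc.write().map_err(|_| "UPC lock poisoned")?;
        upc.insert(id.to_string(), Arc::new(page));
        Ok(())
    }

    fn fetch_from_fs(&self, id: &str, path: &str, offset: u64) -> Result<(), &'static str> {
        let key = (path.to_string(), offset);
        let bytes = self.disk.get(&key).cloned().ok_or("disk read failed")?;
        let mut cpc = self.cpc.write().map_err(|_| "CPC lock poisoned")?;
        cpc.insert(id.to_string(), Arc::new(Compressed(bytes)));
        Ok(())
    }

    pub fn get_page(&self, meta: PageMetadata) -> Option<Arc<Uncompressed>> {
        // 1) UPC attempt: clone the Arc and return on a hit.
        if let Some(page) = self.fetch_from_upc(&meta.id) {
            return Some(page);
        }
        // 2) CPC attempt: decompress into the UPC, then re-read the UPC.
        if self.decompress_from_cpc(&meta.id).is_ok() {
            return self.fetch_from_upc(&meta.id);
        }
        // 3) Disk fallback: populate the CPC, decompress into the UPC, re-read.
        self.fetch_from_fs(&meta.id, &meta.disk_path, meta.offset).ok()?;
        self.decompress_from_cpc(&meta.id).ok()?;
        self.fetch_from_upc(&meta.id)
    }
}
```

In this sketch a second get_page call for the same id returns a clone of the same Arc from the UPC, and a page that is in neither cache nor on the stand-in disk yields None.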

ASCII Sequence Diagram

Cold path: UPC miss, CPC miss, disk read.

PageHandler          UPC               CPC             Compressor           PageIO
    │                 │                 │                  │                   │
    │--read lock----->│                 │                  │                   │
    │<--miss----------│                 │                  │                   │
    │--read lock----------------------->│                  │                   │
    │<--miss----------------------------│                  │                   │
    │--read_from_path(path, offset)------------------------------------------->│
    │<--compressed entry-------------------------------------------------------│
    │--write lock, insert entry-------->│                  │                   │
    │--drop lock----------------------->│                  │                   │
    │--read lock----------------------->│                  │                   │
    │<--hit, clone Arc(blob)------------│                  │                   │
    │--drop lock----------------------->│                  │                   │
    │--decompress(blob)----------------------------------->│                   │
    │<--uncompressed page----------------------------------│                   │
    │--write lock---->│                 │                  │                   │
    │--add page------>│                 │                  │                   │
    │--drop lock----->│                 │                  │                   │
    │--read lock----->│                 │                  │                   │
    │<--hit, clone Arc│                 │                  │                   │
    │ return Arc to caller

Batch Retrieval (get_pages)

The batch method takes each cache's read lock once per sweep rather than once per page, keeps decompression outside all locks, and preserves the original page order.

Inputs: Vec<PageMetadata> (order preserved)

order := Vec<PageId>
meta_map := HashMap<PageId, PageMetadata>
result := Vec<Arc<PageCacheEntryUncompressed>>
already_pushed := HashSet<PageId>

1) UPC Sweep (read lock):
     For each id in order:
         if UPC has id:
             push Arc into result
             mark id in already_pushed
             remove id from meta_map

2) CPC Sweep (read lock):
     Collect (id, Arc<PageCacheEntryCompressed>) for remaining ids.
     Remove hits from meta_map.

3) Decompress outside of locks:
     For each (id, blob):
         decompress blob -> UPC add (write lock per page)

4) Disk Fetch for leftovers:
     For each meta in meta_map:
         fetch_from_fs(id, path, offset)  // populates CPC
         decompress_from_cpc(id)         // populates UPC

5) Final UPC read lock:
     For each id in order not already pushed, read UPC and append to result.

ASCII Flow

order: ["p1","p2","p3"]

┌─────────────┐ read-lock ┌─────────────────────┐
│ UPC store   │──────────►│ hits = {"p1"}       │
└─────────────┘           └─────────────────────┘

┌─────────────┐ read-lock ┌─────────────────────┐
│ CPC store   │──────────►│ hits = {"p2": blob} │
└─────────────┘           └─────────────────────┘

decompress blob("p2") → UPC write-lock add

remaining meta_map = {"p3": PageMetadata}
fetch_from_fs("p3")  → CPC add
decompress_from_cpc("p3") → UPC add

Final UPC read-lock → collect ["p1","p2","p3"] in order
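
The five batch steps can be sketched in Rust as follows. This is an illustration under stated assumptions: Handler, the Page and Blob aliases, the in-memory disk map standing in for PageIO, and the byte-reversing decompress are hypothetical, and plain HashMaps replace PageCache. To keep the output in request order even when UPC hits are interleaved with misses, the sketch holds the step 1 hits in a map and assembles the result during the final pass; that detail is an assumption about how ordering is maintained, not a description of the real code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Page = Arc<Vec<u8>>; // stand-in for Arc<PageCacheEntryUncompressed>
type Blob = Arc<Vec<u8>>; // stand-in for Arc<PageCacheEntryCompressed>

pub struct PageMetadata {
    pub id: String,
    pub disk_path: String,
    pub offset: u64,
}

pub struct Handler {
    pub upc: RwLock<HashMap<String, Page>>,
    pub cpc: RwLock<HashMap<String, Blob>>,
    pub disk: HashMap<(String, u64), Vec<u8>>, // stand-in for PageIO
}

// Placeholder codec: "decompression" just reverses the bytes.
fn decompress(blob: &[u8]) -> Vec<u8> {
    blob.iter().rev().copied().collect()
}

impl Handler {
    pub fn get_pages(&self, metas: Vec<PageMetadata>) -> Vec<Page> {
        let order: Vec<String> = metas.iter().map(|m| m.id.clone()).collect();
        let mut meta_map: HashMap<String, PageMetadata> =
            metas.into_iter().map(|m| (m.id.clone(), m)).collect();

        // 1) UPC sweep under one read lock.
        let mut hits: HashMap<String, Page> = HashMap::new();
        {
            let upc = self.upc.read().unwrap();
            for id in &order {
                if let Some(page) = upc.get(id) {
                    hits.insert(id.clone(), Arc::clone(page));
                    meta_map.remove(id);
                }
            }
        }

        // 2) CPC sweep under one read lock: only clone the blob handles.
        let mut blobs: Vec<(String, Blob)> = Vec::new();
        {
            let cpc = self.cpc.read().unwrap();
            for id in &order {
                if meta_map.contains_key(id) {
                    if let Some(blob) = cpc.get(id) {
                        blobs.push((id.clone(), Arc::clone(blob)));
                    }
                }
            }
        }

        // 3) Decompress with no lock held; take the UPC write lock per page.
        for (id, blob) in blobs {
            meta_map.remove(&id);
            let page = Arc::new(decompress(&blob));
            self.upc.write().unwrap().insert(id, page);
        }

        // 4) Disk fetch for leftovers: populate the CPC, then the UPC.
        for (id, meta) in meta_map {
            if let Some(bytes) = self.disk.get(&(meta.disk_path, meta.offset)) {
                let blob: Blob = Arc::new(bytes.clone());
                self.cpc.write().unwrap().insert(id.clone(), Arc::clone(&blob));
                let page = Arc::new(decompress(&blob));
                self.upc.write().unwrap().insert(id, page);
            }
        }

        // 5) Final UPC read lock: assemble in request order.
        //    Pages that could not be loaded are skipped.
        let upc = self.upc.read().unwrap();
        let result: Vec<Page> = order
            .iter()
            .filter_map(|id| hits.get(id).or_else(|| upc.get(id)).cloned())
            .collect();
        result
    }
}
```

Skipping pages that fail to load means the returned Vec can be shorter than the input; whether the real get_pages behaves that way is not stated in this document.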

Concurrency Strategy

  • Locks are scoped to the minimal region accessing shared state:
    • UPC and CPC read locks are held only during lookups.
    • Decompression and CPU-heavy work occurs outside locking regions.
    • UPC writes happen one page at a time, keeping lock durations short.
  • Arc clones allow the caller to hold onto pages without keeping caches locked.
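
The effect of the Arc-clone pattern can be shown with a small Rust sketch. The Cache alias and both helper functions are hypothetical and exist only for illustration; std::sync::RwLock stands in for whatever lock the real caches use.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical cache shape: id -> shared page bytes.
pub type Cache = RwLock<HashMap<String, Arc<Vec<u8>>>>;

/// Look up a page, holding the read lock only for the lookup itself.
pub fn lookup(cache: &Cache, id: &str) -> Option<Arc<Vec<u8>>> {
    let guard = cache.read().ok()?;
    guard.get(id).cloned()
    // `guard` is dropped on return; the caller keeps only the Arc.
}

/// Reports whether a writer could take the lock right now (non-blocking probe).
pub fn writer_can_proceed(cache: &Cache) -> bool {
    cache.try_write().is_ok()
}
```

A caller that holds the Arc returned by lookup does not prevent writer_can_proceed from returning true, whereas a caller that keeps a read guard alive does. The page also stays readable through the Arc after its entry is removed from the cache.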

Error Handling & TODOs

  • fetch_from_fs and decompress_from_cpc return Result, but get_page treats their errors as cache misses and falls through to the next layer. If the last layer also fails, get_page returns None, so callers cannot tell a missing page from an I/O error.
  • Future enhancements:
    • Distinguish between “not found” and “I/O error” to inform callers.
    • Batch the UPC writes that follow decompression by staging multiple additions under a single write-lock guard, if contention becomes an issue.
    • Integrate background prefetch and writeback once the scheduler is wired in.
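
One possible shape for the first enhancement is an error enum that separates the two cases. The following Rust sketch is only a suggestion: PageError, get_page_checked, and the reader closure are hypothetical names that do not exist in the current code.

```rust
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum PageError {
    /// No cache layer or disk location holds the page.
    NotFound,
    /// The page should exist, but reading it failed.
    Io(io::Error),
}

impl fmt::Display for PageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PageError::NotFound => write!(f, "page not found"),
            PageError::Io(e) => write!(f, "I/O error: {}", e),
        }
    }
}

impl std::error::Error for PageError {}

impl From<io::Error> for PageError {
    fn from(e: io::Error) -> Self {
        PageError::Io(e)
    }
}

/// Hypothetical lookup: `read` returns Ok(None) for an absent page
/// and Err(_) for a failed read, and the two outcomes stay distinct.
pub fn get_page_checked<F>(id: &str, read: F) -> Result<Vec<u8>, PageError>
where
    F: Fn(&str) -> io::Result<Option<Vec<u8>>>,
{
    match read(id)? {
        Some(bytes) => Ok(bytes),
        None => Err(PageError::NotFound),
    }
}
```

With a return type like this, a caller can retry or surface an I/O failure while treating NotFound as an ordinary miss.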