Component Deep Dive: src/metadata_store.rs

The metadata store is the in-memory catalog that maps logical column ranges to physical page locations and their MVCC (multi-version concurrency control) history. It is the authoritative source for locating pages and resolving their versions during reads and writes.

Source Map

src/metadata_store.rs
 47  #[derive(Clone)] pub struct PageMetadata { id, disk_path, offset }
 53  pub struct MVCCKeeperEntry { page_id, locked_by, commit_time }
 59  pub struct TableMetaStoreEntry { start_idx, end_idx, page_metas }
 65  pub struct RangeScanMetaResponse { page_metas: Vec<Arc<PageMetadata>> }
102  pub struct TableMetaStore { col_data, page_data }
145  impl TableMetaStore { pub fn new() -> Self { … } }
168                         pub fn get_latest_page_meta(&self, …)
184                         fn add_new_page_meta(&mut self, …)
192                         fn add_new_page_to_col(&mut self, …)
214                         pub fn get_ranged_pages_meta(&self, …)

Data Structures

High-Level Layout

┌────────────────────────────────────────────────────────────────────┐
│ TableMetaStore                                                     │
│                                                                    │
│ col_data : HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>  │
│ page_data: HashMap<PageId, Arc<PageMetadata>>                      │
└────────────────────────────────────────────────────────────────────┘

Column Catalog (col_data)

Column "temperature"
   │
   ▼
Arc<RwLock<Vec<TableMetaStoreEntry>>>
   │
   ├─ Entry[0]: covers rows [0, 1024)
   │            page_metas:
   │              ┌─────────────────────────────────────────┐
   │              │ MVCCKeeperEntry                         │
   │              │  page_id    = "pA"                      │
   │              │  commit_time= 1692200000000             │
   │              │  locked_by  = 0 (placeholder)           │
   │              └─────────────────────────────────────────┘
   │              ┌─────────────────────────────────────────┐
   │              │ MVCCKeeperEntry                         │
   │              │  page_id    = "pB"                      │
   │              │  commit_time= 1692200100000             │
   │              └─────────────────────────────────────────┘
   │
   └─ Entry[1]: covers rows [1024, 2048)
                page_metas: [ MVCCKeeperEntry(page_id="pC", …) ]
  • Each TableMetaStoreEntry spans a contiguous [start_idx, end_idx) range.
  • page_metas holds MVCC versions sorted by commit_time (newest last).
  • Arc<RwLock<_>> lets many readers inspect a column's range metadata concurrently, while a writer takes exclusive access to append new versions.

Page Catalog (page_data)

page_data
  "pA" -> Arc<PageMetadata { id="pA", disk_path="/data/t0.bin", offset=4096 }>
  "pB" -> Arc<PageMetadata { id="pB", disk_path="/data/t0.bin", offset=8192 }>
  "pC" -> Arc<PageMetadata { id="pC", disk_path="/data/t1.bin", offset=0    }>

PageMetadata holds the disk coordinates (file path and offset) for each page. Because the metadata objects live only in page_data, col_data stores just page IDs, which are cheap to clone, and page_data shares ownership of each object via Arc.
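
The two catalogs can be sketched as plain Rust types. This is a minimal sketch, not the actual source: the field names come from the Source Map, but the `PageId` and `Column` aliases, the integer widths, and the derives are assumptions made here for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Assumed aliases: the source does not show the concrete key types.
type PageId = String;
type Column = String;

/// Disk coordinates for one page.
#[derive(Clone, Debug, PartialEq)]
pub struct PageMetadata {
    pub id: PageId,
    pub disk_path: String,
    pub offset: u64, // integer width is an assumption
}

/// One MVCC version of a range: stores only the page ID, not the metadata.
#[derive(Clone, Debug)]
pub struct MVCCKeeperEntry {
    pub page_id: PageId,
    pub locked_by: u8,
    pub commit_time: u64, // integer width is an assumption
}

/// A contiguous [start_idx, end_idx) range and its versions, newest last.
#[derive(Debug)]
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64, // exclusive
    pub page_metas: Vec<MVCCKeeperEntry>,
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<PageId, Arc<PageMetadata>>,
}

impl TableMetaStore {
    /// Both catalogs start empty.
    pub fn new() -> Self {
        Self::default()
    }
}
```

Keeping `PageMetadata` behind an `Arc` in a single map means a version entry is only as large as its ID, and cloning a handle for a reader is a reference-count bump rather than a copy of the path string.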

Core Operations

Initialization

TableMetaStore::new()
  │
  ├─ col_data  := {}
  └─ page_data := {}

Registering a New Page

fn add_new_page_to_col(col, disk_path, offset)
   │
   ├─ new_page_id := add_new_page_meta(disk_path, offset)
   │     └─ PageMetadata::new(disk_path, offset)
   │     └─ page_data[page_id] = Arc<PageMetadata>
   │
   ├─ ensure col_data[col] exists (Arc<RwLock<Vec<_>>>)
   │
   └─ write-lock Vec<TableMetaStoreEntry>
         ├─ if empty → push entry with range [0,1) and MVCCKeeperEntry(page_id, commit_time=now)
         └─ else     → last_entry.end_idx += 1
                       last_entry.page_metas.push(MVCCKeeperEntry(page_id, commit_time=now))

Note: Page IDs are currently hard-coded ("1111111"). Real ID generation is a documented TODO.
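
The registration flow above could look roughly like the following. It is a hedged sketch under stated assumptions: the `next_id` counter stands in for the hard-coded ID and is hypothetical, `commit_time` is passed in rather than read from the clock so the example is deterministic, and the type aliases are illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type PageId = String; // assumed alias
type Column = String; // assumed alias

pub struct PageMetadata { pub id: PageId, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: PageId, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<PageId, Arc<PageMetadata>>,
    next_id: u64, // hypothetical counter; the real code hard-codes the ID
}

impl TableMetaStore {
    /// Creates the PageMetadata, stores it in page_data, returns its ID.
    fn add_new_page_meta(&mut self, disk_path: &str, offset: u64) -> PageId {
        let id = format!("p{}", self.next_id);
        self.next_id += 1;
        let meta = PageMetadata { id: id.clone(), disk_path: disk_path.to_string(), offset };
        self.page_data.insert(id.clone(), Arc::new(meta));
        id
    }

    /// Registers a page and appends it as the newest version of the column's last range.
    pub fn add_new_page_to_col(
        &mut self,
        col: &str,
        disk_path: &str,
        offset: u64,
        commit_time: u64, // assumption: injected instead of "now"
    ) -> PageId {
        let page_id = self.add_new_page_meta(disk_path, offset);
        // Ensure the column exists, then take the write lock on its range vector.
        let ranges = self.col_data.entry(col.to_string()).or_default();
        let mut guard = ranges.write().expect("column lock poisoned");
        let version = MVCCKeeperEntry { page_id: page_id.clone(), locked_by: 0, commit_time };
        if let Some(last) = guard.last_mut() {
            // Non-empty column: widen the last range by one and append the version.
            last.end_idx += 1;
            last.page_metas.push(version);
        } else {
            // Empty column: first range is [0, 1).
            guard.push(TableMetaStoreEntry { start_idx: 0, end_idx: 1, page_metas: vec![version] });
        }
        page_id
    }
}
```

Registering two pages for the same column in this sketch leaves a single range [0, 2) with two versions, which matches the flow as described.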

Fetch Latest Page for a Column

get_latest_page_meta(column)
   │
   ├─ read-lock Arc<RwLock<Vec<TableMetaStoreEntry>>>
   ├─ take last TableMetaStoreEntry
   ├─ take its last MVCCKeeperEntry (most recent commit)
   └─ look up page_data[page_id] → Option<&Arc<PageMetadata>>

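The function returns a reference to the Arc<PageMetadata> held in page_data, wrapped in Option. The read guard on the column vector is released when the function returns, so the caller can clone the Arc without prolonging the lock.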
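
A minimal sketch of this lookup, assuming `String` keys and the field layout from the Source Map (the real signature is elided in the source, so the `column: &str` parameter is a guess). The page ID is copied out inside an inner scope so the read guard is dropped before page_data is consulted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type PageId = String; // assumed alias
type Column = String; // assumed alias

#[derive(Debug)]
pub struct PageMetadata { pub id: PageId, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: PageId, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<PageId, Arc<PageMetadata>>,
}

impl TableMetaStore {
    /// Returns the newest version of the column's last range, if any.
    pub fn get_latest_page_meta(&self, column: &str) -> Option<&Arc<PageMetadata>> {
        let page_id = {
            // Read-lock the column's range vector; a poisoned lock yields None here.
            let guard = self.col_data.get(column)?.read().ok()?;
            // Last range, then its last (most recent) version.
            let id = guard.last()?.page_metas.last()?.page_id.clone();
            id
        }; // read guard dropped here
        self.page_data.get(&page_id)
    }
}
```

Treating a poisoned lock as `None` is a simplification for the sketch; production code would more likely surface it as an error.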

Range Scan Metadata (get_ranged_pages_meta)

Inputs: column, l_bound, r_bound, commit_time_upper_bound

1) Acquire the read lock on the column's range vector.
2) Binary-search for the first range whose end_idx > l_bound.
3) Iterate forward until a range's start_idx >= r_bound:
       For each TableMetaStoreEntry:
          - Binary-search its MVCC versions for the newest commit_time ≤ commit_time_upper_bound.
          - Collect a clone of that version's page_id.
4) Drop the read lock.
5) Map the collected page_ids -> Arc<PageMetadata> via page_data.
6) Return RangeScanMetaResponse { page_metas: Vec<Arc<PageMetadata>> }.
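
The steps above can be sketched with `slice::partition_point` doing both binary searches. This is an illustration under assumptions, not the actual implementation: the parameter types, the `Option` return, and the choice to skip a range that has no version visible at the given bound are all guesses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type PageId = String; // assumed alias
type Column = String; // assumed alias

#[derive(Debug)]
pub struct PageMetadata { pub id: PageId, pub disk_path: String, pub offset: u64 }
pub struct MVCCKeeperEntry { pub page_id: PageId, pub locked_by: u8, pub commit_time: u64 }
pub struct TableMetaStoreEntry {
    pub start_idx: u64,
    pub end_idx: u64,
    pub page_metas: Vec<MVCCKeeperEntry>,
}
pub struct RangeScanMetaResponse { pub page_metas: Vec<Arc<PageMetadata>> }

#[derive(Default)]
pub struct TableMetaStore {
    pub col_data: HashMap<Column, Arc<RwLock<Vec<TableMetaStoreEntry>>>>,
    pub page_data: HashMap<PageId, Arc<PageMetadata>>,
}

impl TableMetaStore {
    pub fn get_ranged_pages_meta(
        &self,
        column: &str,
        l_bound: u64,
        r_bound: u64,
        commit_time_upper_bound: u64,
    ) -> Option<RangeScanMetaResponse> {
        let ranges = self.col_data.get(column)?;
        let page_ids: Vec<PageId> = {
            // 1) Read-lock the column's range vector.
            let guard = ranges.read().ok()?;
            // 2) Index of the first range whose end_idx > l_bound.
            let first = guard.partition_point(|e| e.end_idx <= l_bound);
            // 3) Walk forward while the range still starts before r_bound.
            let ids = guard[first..]
                .iter()
                .take_while(|e| e.start_idx < r_bound)
                .filter_map(|e| {
                    // Count of versions with commit_time <= bound; the newest
                    // visible one sits just before that index.
                    let visible = e
                        .page_metas
                        .partition_point(|v| v.commit_time <= commit_time_upper_bound);
                    // Assumption: a range with no visible version is skipped.
                    visible.checked_sub(1).map(|i| e.page_metas[i].page_id.clone())
                })
                .collect();
            ids
        }; // 4) read guard dropped here
        // 5) Resolve page IDs to shared metadata handles.
        let page_metas = page_ids
            .iter()
            .filter_map(|id| self.page_data.get(id).cloned())
            .collect();
        // 6) Return the response.
        Some(RangeScanMetaResponse { page_metas })
    }
}
```

Both searches rely on sorted input: ranges ordered by `end_idx` and versions ordered by `commit_time`. If either ordering is violated, `partition_point` returns an unspecified index rather than failing, so the append-only discipline matters for correctness.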

ASCII Flow Diagram

Range request: column="temperature", [1500, 2600), commit_time ≤ T (with T ≥ 200, so pD is visible)

┌─────────────┐ read-lock ┌─────────────────────────────────────┐
│ TableMeta…  │──────────►│ Vec<TableMetaStoreEntry>            │
└─────────────┘           │  [0,1024)    → versions:[pA(t=90),  │
                          │                          pB(t=110)] │
                          │  [1024,2048) → versions:[pC(t=150)] │
                          │  [2048,3072) → versions:[pD(t=200)] │
                          └─────────────────────────────────────┘
                                 │ select commits ≤ T
                                 ▼
                         page_ids = ["pC","pD"]
                                 │ drop lock
                                 ▼
                        lookup page_data for coordinates
                                 │
                                 ▼
               RangeScanMetaResponse { page_metas: [Arc(PageMetadata{pC}), Arc(PageMetadata{pD})] }

Concurrency Characteristics

  • Column-level contention: Readers hold a per-column RwLock::read guard only long enough to pick the relevant page IDs; the Arc<PageMetadata> handles are then resolved from page_data and cloned cheaply, which keeps lock hold times short.
  • MVCC version ordering: Appending newer commits at the end keeps page_metas sorted without re-sorting. The binary search is only correct if commit_time values are non-decreasing in append order.
  • Thread-safety TODOs: MVCCKeeperEntry.locked_by is a plain u8; future work will replace it with an atomic counter for concurrent write coordination.
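
One possible shape for an atomic `locked_by` is sketched below. Everything here is hypothetical: the `VersionLock` type, the convention that 0 means "unlocked" and a non-zero value is a writer ID, and the choice of compare-and-swap rather than a counter are assumptions for illustration, not the planned design.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical replacement for the plain `locked_by: u8` field.
pub struct VersionLock {
    locked_by: AtomicU8,
}

impl VersionLock {
    /// Assumed sentinel: 0 means no writer holds the version.
    pub const UNLOCKED: u8 = 0;

    pub fn new() -> Self {
        Self { locked_by: AtomicU8::new(Self::UNLOCKED) }
    }

    /// Claims the version for `writer`; fails if another writer holds it
    /// or if `writer` is the reserved sentinel value.
    pub fn try_lock(&self, writer: u8) -> bool {
        writer != Self::UNLOCKED
            && self
                .locked_by
                .compare_exchange(Self::UNLOCKED, writer, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
    }

    /// Releases the version, but only if `writer` is the current holder.
    pub fn unlock(&self, writer: u8) -> bool {
        writer != Self::UNLOCKED
            && self
                .locked_by
                .compare_exchange(writer, Self::UNLOCKED, Ordering::Release, Ordering::Relaxed)
                .is_ok()
    }

    /// Current holder, if any.
    pub fn holder(&self) -> Option<u8> {
        match self.locked_by.load(Ordering::Acquire) {
            Self::UNLOCKED => None,
            id => Some(id),
        }
    }
}
```

With this shape, writers can contend for a version through a shared reference, so claiming a version would not require the column's write lock.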

Future Enhancements

  • Real page ID generation (UUIDs or incremental IDs) within PageMetadata::new.
  • Splitting and merging ranges when page boundaries are rebalanced.
  • Persisting the metadata store to disk or a WAL to survive restarts.
  • Tracking per-page reference counts to coordinate cache eviction with metadata removal.