Component Deep Dive: src/page_io.rs
PageIO handles disk interactions for compressed pages. It defines the on-disk format, performs direct reads/writes with fixed-size metadata prefixes, and takes care to minimize kernel caching where appropriate.
Source Outline
src/page_io.rs
const PREFIX_META_SIZE: usize = 64;
#[derive(Archive, Deserialize, Serialize, Debug)]
struct Metadata { read_size: usize }
pub struct PageIO {}
impl PageIO {
    pub fn read_from_path(&self, path, offset) -> PageCacheEntryCompressed
    pub fn write_to_path(&self, path, offset, data: Vec<u8>) -> io::Result<()>
}
On-Disk Layout
Offset: N
┌─────────────────────────────────────────────────────────────┐
│ 64-byte Metadata Prefix (rkyv serialized) │
│ - read_size : usize (size of compressed payload) │
└─────────────────────────────────────────────────────────────┘
Offset: N + 64
┌─────────────────────────────────────────────────────────────┐
│ Compressed Page Bytes (output of Compressor::compress) │
└─────────────────────────────────────────────────────────────┘
Metadata is serialized using rkyv, a zero-copy serialization framework. During reads, the code uses rkyv::archived_root to interpret the metadata without allocating or deserializing.
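To make the layout concrete, here is a minimal std-only sketch of the prefix arithmetic. It is an illustration under stated assumptions, not the actual format: `encode_prefix`, `decode_prefix`, and `payload_range` are hypothetical helpers, and `read_size` is stored as a little-endian `u64` in the first 8 bytes as a stand-in for the rkyv archive bytes, whose real byte layout differs.

```rust
use std::convert::TryInto;
use std::ops::Range;

/// Size of the fixed metadata prefix, matching PREFIX_META_SIZE in the outline.
const PREFIX_META_SIZE: usize = 64;

/// Hypothetical stand-in for the rkyv step: store `read_size` as a
/// little-endian u64 in the first 8 bytes and leave the rest zeroed.
fn encode_prefix(read_size: usize) -> [u8; PREFIX_META_SIZE] {
    let mut prefix = [0u8; PREFIX_META_SIZE];
    prefix[..8].copy_from_slice(&(read_size as u64).to_le_bytes());
    prefix
}

/// Recover `read_size` from a prefix produced by `encode_prefix`.
fn decode_prefix(prefix: &[u8; PREFIX_META_SIZE]) -> usize {
    let raw: [u8; 8] = prefix[..8].try_into().expect("slice is exactly 8 bytes");
    u64::from_le_bytes(raw) as usize
}

/// Byte range of the compressed payload for a page stored at `offset`:
/// it starts right after the 64-byte prefix and spans `read_size` bytes.
fn payload_range(offset: u64, read_size: usize) -> Range<u64> {
    let start = offset + PREFIX_META_SIZE as u64;
    start..start + read_size as u64
}
```

Because the prefix has a fixed size, the payload position depends only on `offset`, and the payload length comes entirely from the prefix.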
Read Path (read_from_path)
1. Open file at `path`.
2. (macOS only) Disable readahead and kernel caching via fcntl F_RDAHEAD/F_NOCACHE.
3. Seek to requested `offset`.
4. Read 64 bytes into `meta_buffer`.
5. Interpret buffer as archived Metadata; extract `read_size`.
6. Seek to `offset + 64`.
7. Read `read_size` bytes into `ret_buffer`.
8. Return PageCacheEntryCompressed { page: ret_buffer }.
ASCII Flow
File descriptor ──► seek(offset)
│
├─ read 64 bytes → meta_buffer
│
├─ rkyv::archived_root(meta_buffer)
│ │
│ └─ read_size
│
├─ seek(offset + 64)
│
└─ read read_size bytes → Vec<u8>
│
▼
PageCacheEntryCompressed
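The read steps above can be sketched with the standard library alone. This is a hedged illustration rather than the code in src/page_io.rs. The prefix decoding (`decode_read_size`, a little-endian `u64`) is an assumed stand-in for `rkyv::archived_root`. The struct is a stand-in that carries only the `page` field named in the text. The function is free-standing, takes typed `path` and `offset` arguments, and returns `io::Result` instead of unwrapping. The macOS cache hints from step 2 are omitted.

```rust
use std::convert::TryInto;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

const PREFIX_META_SIZE: usize = 64;

/// Stand-in for the real cache entry type; only `page` is taken from the text.
#[derive(Debug)]
pub struct PageCacheEntryCompressed {
    pub page: Vec<u8>,
}

/// Assumed prefix decoding: little-endian u64 in the first 8 bytes.
/// The real code interprets the prefix with rkyv instead.
fn decode_read_size(meta_buffer: &[u8; PREFIX_META_SIZE]) -> usize {
    let raw: [u8; 8] = meta_buffer[..8].try_into().expect("slice is exactly 8 bytes");
    u64::from_le_bytes(raw) as usize
}

pub fn read_from_path(path: &Path, offset: u64) -> io::Result<PageCacheEntryCompressed> {
    // Step 1: open the file (step 2, the macOS cache hints, is omitted here).
    let mut fd = File::open(path)?;

    // Steps 3-4: seek to the page and read the fixed-size prefix.
    fd.seek(SeekFrom::Start(offset))?;
    let mut meta_buffer = [0u8; PREFIX_META_SIZE];
    fd.read_exact(&mut meta_buffer)?;

    // Step 5: extract the payload length from the prefix.
    let read_size = decode_read_size(&meta_buffer);

    // Steps 6-7: the cursor already sits at offset + 64 after read_exact,
    // but the explicit seek mirrors the documented flow.
    fd.seek(SeekFrom::Start(offset + PREFIX_META_SIZE as u64))?;
    let mut ret_buffer = vec![0u8; read_size];
    fd.read_exact(&mut ret_buffer)?;

    // Step 8: wrap the payload.
    Ok(PageCacheEntryCompressed { page: ret_buffer })
}
```

Using `read_exact` rather than `read` means a short file or an out-of-range offset surfaces as an `UnexpectedEof` error instead of a silently truncated page.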
Write Path (write_to_path)
1. Create/truncate file at `path`.
2. (macOS) Disable readahead/caching.
3. Seek to `offset`.
4. Encode Metadata { read_size = data.len() } using rkyv AllocSerializer.
5. Copy serialized bytes into 64-byte buffer (zero-padded).
6. Allocate combined buffer (64 + data.len()).
7. Append metadata prefix + compressed payload.
8. Write the combined buffer with a single `write_all` call.
9. Call fd.sync_all() to ensure durability.
ASCII Flow
Vec<u8> (compressed page)
│
├─ rkyv serialize Metadata { read_size = len }
│
├─ pad to 64 bytes → meta_buffer[64]
│
├─ combined = meta_buffer || data
│
├─ write_all(combined) at offset
│
└─ sync_all()
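A std-only sketch of the write steps follows. As with any illustration of code not shown here, it rests on assumptions: `encode_prefix` (little-endian `u64`, zero-padded) is a hypothetical stand-in for the rkyv `AllocSerializer` step, the function is free-standing rather than a `PageIO` method, and the macOS cache hints from step 2 are omitted.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

const PREFIX_META_SIZE: usize = 64;

/// Assumed prefix encoding: `read_size` as a little-endian u64,
/// zero-padded to 64 bytes. The real code serializes Metadata with rkyv.
fn encode_prefix(read_size: usize) -> [u8; PREFIX_META_SIZE] {
    let mut prefix = [0u8; PREFIX_META_SIZE];
    prefix[..8].copy_from_slice(&(read_size as u64).to_le_bytes());
    prefix
}

pub fn write_to_path(path: &Path, offset: u64, data: Vec<u8>) -> io::Result<()> {
    // Step 1: create or truncate the file, as the text describes.
    let mut fd = File::create(path)?;

    // Step 3: position the cursor at the page offset.
    fd.seek(SeekFrom::Start(offset))?;

    // Steps 4-5: build the zero-padded 64-byte prefix.
    let meta_buffer = encode_prefix(data.len());

    // Steps 6-7: prefix and payload go into one contiguous buffer.
    let mut combined = Vec::with_capacity(PREFIX_META_SIZE + data.len());
    combined.extend_from_slice(&meta_buffer);
    combined.extend_from_slice(&data);

    // Step 8: write_all loops until every byte is written, so it may
    // issue more than one write syscall for large buffers.
    fd.write_all(&combined)?;

    // Step 9: flush file data and metadata to the storage device.
    fd.sync_all()
}
```

Seeking past the end of a freshly truncated file and then writing leaves a zero-filled gap before `offset`, so any pages previously stored in that range are lost.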
Design Rationale
- Fixed Metadata Size: Aligns metadata reads and writes, allowing simple seeks and enabling future expansion (e.g., storing compression algorithm identifiers) by reserving unused bytes in the fixed prefix.
- Zero-Copy Metadata: `rkyv::archived_root` avoids heap allocations when reading metadata, which matters for tight IO loops.
- Kernel Cache Controls: Disabling caching on macOS prevents the OS page cache from double-buffering data already managed by the user-space caches.
- Single Syscall Writes: Combining metadata and payload into one buffer reduces syscall overhead and keeps the prefix and payload contiguous in one write. This alone does not guarantee atomicity: `write_all` may still issue several write syscalls, and a crash mid-write can leave a torn page.
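The kernel cache controls can be sketched without external crates by declaring `fcntl` directly. This is an assumption-laden illustration: the real code may well use the `libc` crate, the module `cache_hints` and the function `open_uncached` are hypothetical names, and the command values are copied from the macOS `<sys/fcntl.h>` header. On other platforms the hint is a no-op.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

#[cfg(target_os = "macos")]
mod cache_hints {
    use std::fs::File;
    use std::io;
    use std::os::raw::c_int;
    use std::os::unix::io::AsRawFd;

    // Command values from the macOS <sys/fcntl.h> header.
    const F_RDAHEAD: c_int = 45;
    const F_NOCACHE: c_int = 48;

    // Declared by hand to stay std-only; edition 2024 requires
    // writing this block as `unsafe extern "C"`.
    extern "C" {
        fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
    }

    /// Turn off readahead and ask the kernel not to cache this file's data.
    pub fn disable(file: &File) -> io::Result<()> {
        let fd = file.as_raw_fd();
        // SAFETY: `fd` is a valid descriptor borrowed from `file`.
        if unsafe { fcntl(fd, F_RDAHEAD, 0 as c_int) } == -1 {
            return Err(io::Error::last_os_error());
        }
        // SAFETY: same descriptor, still open for the duration of the call.
        if unsafe { fcntl(fd, F_NOCACHE, 1 as c_int) } == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}

#[cfg(not(target_os = "macos"))]
mod cache_hints {
    use std::fs::File;
    use std::io;

    /// No-op fallback: the hints described in the text are macOS-specific.
    pub fn disable(_file: &File) -> io::Result<()> {
        Ok(())
    }
}

/// Open a file for reading and apply the cache hints where supported.
pub fn open_uncached(path: &Path) -> io::Result<File> {
    let file = File::open(path)?;
    cache_hints::disable(&file)?;
    Ok(file)
}
```

Keeping the platform-specific calls behind one function lets the read and write paths stay identical across operating systems.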
Integration Points
- `PageHandler::fetch_from_fs` uses `PageIO::read_from_path` to populate the CPC when a page is missing from caches.
- Future flush logic will call `write_to_path` when evicting dirty pages or writing WAL checkpoints.
- Metadata offsets recorded in `TableMetaStore::PageMetadata` correspond directly to the offsets passed here.
Considerations & TODOs
- Concurrency: The current write logic uses `File::create`, which truncates the file. Appending to existing files will require `OpenOptions::new().write(true)` and careful offset management.
- Checksums: Adding checksums or version stamps to the metadata prefix would help detect corruption.
- Error Propagation: `read_from_path` unwraps I/O results. Production code should convert failures into recoverable errors.
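The truncation concern in the first item can be illustrated with a small sketch that opens the file without truncating it, so pages written at different offsets coexist. `write_page_in_place` is a hypothetical name, and the prefix encoding (little-endian `u64` in the first 8 bytes) is again an assumed stand-in for the rkyv metadata.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

const PREFIX_META_SIZE: usize = 64;

/// Hypothetical variant of the write path that preserves existing pages.
pub fn write_page_in_place(path: &Path, offset: u64, data: &[u8]) -> io::Result<()> {
    // write(true) + create(true) with truncate(false) keeps bytes
    // that are already in the file.
    let mut fd = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    fd.seek(SeekFrom::Start(offset))?;

    // Assumed prefix: payload length in the first 8 bytes, rest zeroed.
    let mut combined = vec![0u8; PREFIX_META_SIZE];
    combined[..8].copy_from_slice(&(data.len() as u64).to_le_bytes());
    combined.extend_from_slice(data);

    // Errors are returned to the caller rather than unwrapped.
    fd.write_all(&combined)?;
    fd.sync_all()
}
```

With this variant the caller must ensure that page ranges do not overlap, since nothing in the file format itself prevents one page from overwriting part of another.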