Adaptive Checkpointer: Sublinear State Saving for Simulations
Practical implementation of √T-based checkpointing for distributed simulations
Key Features
- Sublinear Memory: O(√T) storage complexity
- Adaptive Checkpointing: Dynamic interval adjustment
- Tiered Storage: RAM → Disk → S3
- Rollback Optimization: Average O(√T) recovery time
- Distributed Ready: Thread-safe and cloud-enabled
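
To make the O(√T) figures above concrete, here is a minimal sketch of a fixed √T checkpoint schedule. It is not the library's implementation: the helper names (`sqrt_interval`, `checkpoint_ids`, `replay_cost`) are hypothetical, T is assumed to be the total number of simulated events, and the adaptive interval adjustment is left out entirely. With a spacing of about √T events, roughly √T checkpoints are stored, and a rollback replays fewer than √T events from the nearest earlier checkpoint.

```python
import math


def sqrt_interval(total_events: int) -> int:
    """Checkpoint spacing of about sqrt(T) events (hypothetical helper)."""
    return max(1, math.isqrt(total_events))


def checkpoint_ids(total_events: int) -> list[int]:
    """Event ids at which a fixed sqrt(T) schedule would save state."""
    k = sqrt_interval(total_events)
    return list(range(k, total_events + 1, k))


def replay_cost(target: int, total_events: int) -> int:
    """Events to re-execute after restoring the last checkpoint at or before target.

    Targets before the first checkpoint replay from the initial state (event 0).
    """
    k = sqrt_interval(total_events)
    return target % k


T = 100_000
k = sqrt_interval(T)      # spacing between checkpoints
ids = checkpoint_ids(T)   # stored checkpoints, about sqrt(T) of them
worst = max(replay_cost(t, T) for t in range(1, T + 1))

print(f"interval={k}, checkpoints={len(ids)}, worst-case replay={worst}")
```

For T = 100,000 this stores 316 checkpoints instead of 100,000 states, and no rollback replays more than 315 events. A shorter interval would cut replay time at the cost of more storage, which is the trade-off an adaptive interval is meant to tune.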
Quick Start
```python
import random

from adaptive_checkpointer import AdaptiveCheckpointer, TieredBackend

# Configure storage hierarchy
storage = TieredBackend()
storage.add_ram_layer(10_000)              # First 10k events in RAM
storage.add_disk_layer(100_000, "ckpt/")   # Next 90k on disk
storage.add_s3_layer("simulation-bucket")  # Rest on cloud

# Initialize checkpointer
checkpointer = AdaptiveCheckpointer(
    base_interval=500,
    storage=storage
)

# Simulation loop
state = {"counter": 0}
for event_id in range(1, 100_001):
    # Update state
    state["counter"] += event_id

    # Checkpoint decision
    if checkpointer.should_checkpoint(event_id):
        checkpointer.save_checkpoint(event_id, state)

    # Handle rollbacks (1% probability)
    if random.random() < 0.01:
        target = max(0, event_id - random.randint(10, 100))
        _, state = checkpointer.get_last_checkpoint(target)
```