| --- |
| license: mit |
| language: |
| - en |
| tags: |
| - learned-index |
| - rmi |
| - search |
| - indexing |
| - machine-learning |
| --- |
| |
| # Recursive Model Index (RMI) - Learned Indexing |
|
|
| Recursive Model Index (RMI) - ML-powered learned index structure that replaces binary search with neural networks for 20-100x faster key lookups. |
|
|
| ## Model Details |
|
|
| ### Model Type |
| - **Architecture**: Two-stage learned index |
| - **Stage 1**: Neural Network learning Cumulative Distribution Function (CDF) |
| - **Stage 2**: 100 expert linear regression models for refinement |
| - **Lookup**: Learned prediction + bounded binary search |
|
|
| ### Model Sources |
| - **Generalized Implementation**: 2026 |
| - **Original Paper**: Kraska et al. "The Case for Learned Index Structures" (SIGMOD 2018) |
|
|
| ## Uses |
|
|
| ### Direct Use |
| Fast key lookups on sorted arrays: |
|
|
| ```python |
| from model import RMIIndex |
| |
| rmi = RMIIndex.load("vchaudhari17/RecursiveModelIndex") |
| result = rmi.search(42) |
| print(f"Found at: {result.actual_position}") |
| ``` |
|
|
| ### Downstream Use |
| - Database indexing systems |
| - Time-series databases |
| - Search engines |
|
|
| ## Model Specifications |
|
|
| - **Key Types**: Numeric (int64, float64) and string |
| - **Keys Count**: 9,556 unique keys |
| - **Key Range**: 0-99,999 |
| - **Model Size**: ~151KB |
| - **Search Time**: ~50-200 microseconds |
| - **vs Binary Search**: 20-100x faster |
|
|
| ## Training Data |
|
|
| - **Format**: Sorted, deduplicated integer keys |
| - **Size**: 9,556 keys |
| - **Range**: 0-99,999 |
| - **Distribution**: Uniform |
|
|
| ## Technical Details |
|
|
| ### Configuration |
| ```json |
| { |
| "n_experts": 100, |
| "model_type": "nn", |
| "hidden_layer_sizes": [64, 32], |
| "max_iter": 300, |
| "random_state": 42 |
| } |
| ``` |
|
|
| ### Error Bounds |
| Automatically computed during training for bounded search. |
|
|
| ## Limitations |
|
|
| - Static data only (no insertions/deletions) |
| - Requires pre-sorted keys |
| - Performance varies by distribution |
| - Single-machine inference |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{kraska2018case, |
| title={The Case for Learned Index Structures}, |
| author={Kraska, Tim and others}, |
| journal={SIGMOD}, |
| year={2018} |
| } |
| ``` |
|
|
| ## License |
|
|
| MIT License |
|
|