# MIRAS Language Model

A character-level language model trained on Shakespeare using the MIRAS (Memory-Integrated Recurrent Attention System) architecture.
## Model Details

- **Embedding dimension**: 384
- **Layers**: 4
- **Block size**: 128
- **Memory type**: deep
- **Attentional bias**: l2
- **Retention**: l2
- **Vocabulary size**: 65
## Installation

```bash
pip install torch huggingface_hub
```
## Usage

### Quick Start

```python
from huggingface_hub import hf_hub_download
import torch

# Download files
for f in ["modeling_miras.py", "model.pt", "config.json"]:
    hf_hub_download(repo_id="av-codes/miras-shakespeare", filename=f, local_dir="./miras")

# Import and load
import sys
sys.path.insert(0, "./miras")
from modeling_miras import load_miras_model

model, encode, decode, config = load_miras_model("./miras")
model.eval()

# Generate text
context = torch.zeros((1, 1), dtype=torch.long)
output = model.generate(context, max_new_tokens=200, temperature=0.8)
print(decode(output[0].tolist()))
```
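The `temperature` argument rescales the logits before sampling: values below 1.0 make the output more conservative, values above 1.0 more varied. As a plain-Python illustration of the general technique (the function name here is ours, not part of this repo's API), temperature sampling amounts to:

```python
import math

def temperature_softmax(logits, temperature=0.8):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A lower temperature concentrates probability mass on the top logit;
# a higher temperature flattens the distribution.
sharp = temperature_softmax([2.0, 1.0, 0.5], temperature=0.5)
flat = temperature_softmax([2.0, 1.0, 0.5], temperature=2.0)
```

The model then draws the next character from these probabilities, so `temperature=0.8` biases generation slightly toward likelier characters.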
### Using the Helper Function

```python
import torch
from modeling_miras import load_miras_model

# Load directly from the Hub
model, encode, decode, config = load_miras_model("av-codes/miras-shakespeare")

# Generate
context = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(context, max_new_tokens=100)
print(decode(generated[0].tolist()))
```
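The returned `encode`/`decode` functions map text to token ids and back. For a character-level model with a 65-symbol vocabulary this is typically a plain lookup table built from the training text; a sketch of how such a pair is usually constructed (the corpus string below is a stand-in, not this model's actual vocabulary):

```python
# Build a toy character-level codec the way char-level models usually do:
# one id per distinct character, assigned in sorted order.
corpus = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

round_trip = decode(encode("hear me"))          # encoding then decoding is lossless
```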
## Files

- `model.pt` - Model weights and architecture config
- `config.json` - Full configuration including vocabulary
- `modeling_miras.py` - Complete model architecture code
## Training

Trained for 5000 iterations on the TinyShakespeare dataset.
## Architecture

MIRAS uses a novel memory-based attention mechanism with three configurable components:

- **Memory type**: `linear` (matrix memory) or `deep` (MLP memory)
- **Attentional bias**: `l2`, `lp`, or `huber` loss functions
- **Retention**: `l2`, `kl`, or `elastic` weight update rules
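To make the memory-as-attention idea concrete, here is a pure-Python sketch of the simplest combination, a `linear` (matrix) memory with an `l2` attentional bias: the memory matrix `M` takes a gradient step on the squared retrieval error `||M k - v||^2`, so that keys come to retrieve their associated values. This is an illustration of the general technique under our own assumptions, not the repository's actual implementation.

```python
def matvec(M, k):
    """Retrieve from the memory: M k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]

def l2_memory_update(M, k, v, lr=0.1):
    """One gradient step on the l2 attentional bias ||M k - v||^2.

    The gradient w.r.t. M is 2 (M k - v) k^T, so every entry moves to
    shrink the retrieval error for this key/value pair."""
    err = [r - v_i for r, v_i in zip(matvec(M, k), v)]   # M k - v
    return [[m_ij - lr * 2 * e_i * k_j for m_ij, k_j in zip(row, k)]
            for row, e_i in zip(M, err)]

# Write one key/value association into an empty 2x2 memory, then read it back.
M = [[0.0, 0.0], [0.0, 0.0]]
k, v = [1.0, 0.0], [0.5, -0.25]
for _ in range(50):
    M = l2_memory_update(M, k, v)
retrieved = matvec(M, k)   # converges toward v as the error shrinks
```

In this picture, the `deep` variant replaces the matrix with a small MLP trained on the same retrieval objective, and the retention rule adds a penalty that keeps each updated memory close to its previous state rather than overwriting it.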