# MIRAS Language Model
A character-level language model trained on Shakespeare using the MIRAS (Memory-Integrated Recurrent Attention System) architecture.
## Model Details
- **Embedding dimension**: 384
- **Layers**: 4
- **Block size (context length)**: 128
- **Memory type**: deep
- **Attentional bias**: l2
- **Retention**: l2
- **Vocabulary size**: 65
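With a vocabulary of 65, each token is a single character. Below is a minimal sketch of a character-level codec. It assumes the vocabulary is the sorted set of unique characters in the training text. The layout of the vocabulary stored in `config.json` is not documented here, so `build_codec` is a hypothetical helper and not part of `modeling_miras.py`.

```python
def build_codec(text):
    """Build character-level encode/decode functions from a corpus (hypothetical helper)."""
    chars = sorted(set(text))                      # unique characters in a fixed order
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> token id
    itos = {i: ch for ch, i in stoi.items()}       # token id -> character

    def encode(s):
        return [stoi[c] for c in s]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return chars, encode, decode


chars, encode, decode = build_codec("To be, or not to be")
ids = encode("be")
print(ids)          # [3, 4]
print(decode(ids))  # be
```

Under this assumption, the `torch.zeros((1, 1), dtype=torch.long)` context in the usage examples starts generation from whichever character has id 0.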
## Installation
```bash
pip install torch huggingface_hub
```
## Usage
### Quick Start
```python
from huggingface_hub import hf_hub_download
import torch
# Download files
for f in ["modeling_miras.py", "model.pt", "config.json"]:
    hf_hub_download(repo_id="av-codes/miras-shakespeare", filename=f, local_dir="./miras")
# Import and load
import sys
sys.path.insert(0, "./miras")
from modeling_miras import load_miras_model
model, encode, decode, config = load_miras_model("./miras")
model.eval()
# Generate text
context = torch.zeros((1, 1), dtype=torch.long)
output = model.generate(context, max_new_tokens=200, temperature=0.8)
print(decode(output[0].tolist()))
```
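The `temperature=0.8` argument controls how random the sampling is. `model.generate` is implemented in `modeling_miras.py` and is not reproduced here. This sketch assumes the common convention of dividing the logits by the temperature before the softmax. Under that convention, values below 1.0 sharpen the distribution and values above 1.0 flatten it. `sample_with_temperature` is a hypothetical illustration in pure Python, not the model's own sampler.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Return a sampled index and the temperature-scaled probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


logits = [2.0, 1.0, 0.0]
idx_cold, p_cold = sample_with_temperature(logits, temperature=0.5)
idx_hot, p_hot = sample_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in p_cold])  # most mass on index 0
print([round(p, 3) for p in p_hot])   # flatter distribution
```

At a lower temperature the most likely character dominates, which gives more conservative text. A higher temperature gives more varied but noisier output.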
### Using the Helper Function
```python
import torch

# modeling_miras.py must already be importable, e.g. downloaded
# and added to sys.path as in the Quick Start above
from modeling_miras import load_miras_model

# Load directly from the Hub by repo ID
model, encode, decode, config = load_miras_model("av-codes/miras-shakespeare")
model.eval()

# Generate
context = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(context, max_new_tokens=100)
print(decode(generated[0].tolist()))
```
## Files
- `model.pt` - Model weights and architecture config
- `config.json` - Full configuration including vocabulary
- `modeling_miras.py` - Complete model architecture code
## Training
Trained for 5000 iterations on the TinyShakespeare dataset.
## Architecture
MIRAS uses a novel memory-based attention mechanism with three configurable components:
- **Memory type**: `linear` (matrix memory) or `deep` (MLP memory)
- **Attentional bias**: `l2`, `lp`, or `huber` loss functions
- **Retention**: `l2`, `kl`, or `elastic` weight update rules
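The sketch below shows one online update of a `linear` (matrix) memory, to make the three settings concrete. It assumes a common test-time memorization formulation. The attentional bias is a loss between the memory's prediction `M @ k` and the value `v`. `l2` retention is modeled as a decay factor `alpha` that shrinks the old memory before a gradient step with learning rate `eta`. The function names and the parameters `alpha`, `eta`, `p`, and `delta` are illustrative assumptions, not the API of `modeling_miras.py`. The sketch does not cover `kl` or `elastic` retention or the `deep` MLP memory.

```python
def sign(x):
    return (x > 0) - (x < 0)


def bias_grad(err, kind="l2", p=3.0, delta=1.0):
    """Element-wise gradient of the attentional-bias loss w.r.t. the prediction."""
    if kind == "l2":      # loss = 0.5 * sum(e^2)
        return list(err)
    if kind == "lp":      # loss = sum(|e|^p)
        return [p * abs(e) ** (p - 1) * sign(e) for e in err]
    if kind == "huber":   # quadratic for |e| <= delta, linear beyond it
        return [e if abs(e) <= delta else delta * sign(e) for e in err]
    raise ValueError(f"unknown attentional bias: {kind}")


def memory_step(M, k, v, kind="l2", alpha=0.99, eta=0.5):
    """One online update of a linear memory M (list of rows, shape out_dim x in_dim)."""
    pred = [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]  # M @ k
    err = [y - t for y, t in zip(pred, v)]
    g = bias_grad(err, kind)
    # dLoss/dM = outer(g, k); l2 retention modeled as decay alpha on the old memory
    return [[alpha * m_ij - eta * g_i * k_j for m_ij, k_j in zip(row, k)]
            for row, g_i in zip(M, g)]


# Store a single key/value association and read it back
k = [1.0, 0.0]
v = [0.5, -0.5]
M = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(20):
    M = memory_step(M, k, v, kind="l2", alpha=1.0, eta=0.5)
recalled = [sum(a * b for a, b in zip(row, k)) for row in M]
print(recalled)  # approaches v
```

With `l2` bias and no decay (`alpha=1.0`), this update reduces to the classic delta rule. `huber` clips large errors, which limits the influence of outlier tokens. Setting `alpha < 1` makes the memory gradually forget older associations.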