# MIRAS Language Model

A character-level language model trained on Shakespeare using the MIRAS (Memory-Integrated Recurrent Attention System) architecture.
## Model Details

- **Embedding dimension**: 384
- **Layers**: 4
- **Block size**: 128
- **Memory type**: deep
- **Attentional bias**: l2
- **Retention**: l2
- **Vocabulary size**: 65
## Installation

```bash
pip install torch huggingface_hub
```
## Usage

### Quick Start

```python
from huggingface_hub import hf_hub_download
import torch

# Download files
for f in ["modeling_miras.py", "model.pt", "config.json"]:
    hf_hub_download(repo_id="av-codes/miras-shakespeare", filename=f, local_dir="./miras")

# Import and load
import sys
sys.path.insert(0, "./miras")
from modeling_miras import load_miras_model

model, encode, decode, config = load_miras_model("./miras")
model.eval()

# Generate text
context = torch.zeros((1, 1), dtype=torch.long)
output = model.generate(context, max_new_tokens=200, temperature=0.8)
print(decode(output[0].tolist()))
```
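The `temperature` argument rescales the logits before sampling: values below 1.0 make the output more conservative, values above 1.0 more varied. As a plain-Python illustration of the general technique (the function name here is ours, not part of this repo's API), temperature sampling amounts to:

```python
import math

def temperature_softmax(logits, temperature=0.8):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A lower temperature concentrates probability mass on the top logit;
# a higher temperature flattens the distribution.
sharp = temperature_softmax([2.0, 1.0, 0.5], temperature=0.5)
flat = temperature_softmax([2.0, 1.0, 0.5], temperature=2.0)
```

The model then draws the next character from these probabilities, so `temperature=0.8` biases generation slightly toward likelier characters.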
### Using the Helper Function

```python
import torch
from modeling_miras import load_miras_model

# Load directly from the Hub
model, encode, decode, config = load_miras_model("av-codes/miras-shakespeare")

# Generate
context = torch.zeros((1, 1), dtype=torch.long)
generated = model.generate(context, max_new_tokens=100)
print(decode(generated[0].tolist()))
```
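The returned `encode`/`decode` functions map text to token ids and back. For a character-level model with a 65-symbol vocabulary this is typically a plain lookup table built from the training text; a sketch of how such a pair is usually constructed (the corpus string below is a stand-in, not this model's actual vocabulary):

```python
# Build a toy character-level codec the way char-level models usually do:
# one id per distinct character, assigned in sorted order.
corpus = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

round_trip = decode(encode("hear me"))          # encoding then decoding is lossless
```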
## Files

- `model.pt` - Model weights and architecture config
- `config.json` - Full configuration including vocabulary
- `modeling_miras.py` - Complete model architecture code
## Training

Trained for 5000 iterations on the TinyShakespeare dataset.
## Architecture

MIRAS uses a novel memory-based attention mechanism with three configurable components:

- **Memory type**: `linear` (matrix memory) or `deep` (MLP memory)
- **Attentional bias**: `l2`, `lp`, or `huber` loss functions
- **Retention**: `l2`, `kl`, or `elastic` weight update rules
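To make the memory-as-attention idea concrete, here is a pure-Python sketch of the simplest combination, a `linear` (matrix) memory with an `l2` attentional bias: the memory matrix `M` takes a gradient step on the squared retrieval error `||M k - v||^2`, so that keys come to retrieve their associated values. This is an illustration of the general technique under our own assumptions, not the repository's actual implementation.

```python
def matvec(M, k):
    """Retrieve from the memory: M k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]

def l2_memory_update(M, k, v, lr=0.1):
    """One gradient step on the l2 attentional bias ||M k - v||^2.

    The gradient w.r.t. M is 2 (M k - v) k^T, so every entry moves to
    shrink the retrieval error for this key/value pair."""
    err = [r - v_i for r, v_i in zip(matvec(M, k), v)]   # M k - v
    return [[m_ij - lr * 2 * e_i * k_j for m_ij, k_j in zip(row, k)]
            for row, e_i in zip(M, err)]

# Write one key/value association into an empty 2x2 memory, then read it back.
M = [[0.0, 0.0], [0.0, 0.0]]
k, v = [1.0, 0.0], [0.5, -0.25]
for _ in range(50):
    M = l2_memory_update(M, k, v)
retrieved = matvec(M, k)   # converges toward v as the error shrinks
```

In this picture, the `deep` variant replaces the matrix with a small MLP trained on the same retrieval objective, and the retention rule adds a penalty that keeps each updated memory close to its previous state rather than overwriting it.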