---
license: mit
language:
- en
library_name: transformers
tags:
- sclm
- stateful
- memory
- earcp
- text-generation
- conversational
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
widget:
- text: "The wizard Elara lived in Silverwood forest. One day, she discovered"
  example_title: "Fantasy Story"
- text: "In the year 2050, humanity had finally achieved"
  example_title: "Science Fiction"
- text: "The detective examined the crime scene carefully. The clues pointed to"
  example_title: "Mystery"
inference:
  parameters:
    max_new_tokens: 100
    temperature: 0.7
    top_p: 0.9
    repetition_penalty: 1.1
---

# 🧠 SCLM: Stateful Coherent Language Model

**SCLM** adds **persistent latent memory** to transformer language models, enabling better coherence across long conversations and multi-turn generation.

## 🎯 Key Features

- **Persistent State**: Memory that evolves across conversation turns
- **Entity Coherence**: Maintains context about characters, places, and objects
- **Edit Mode**: Make local changes without affecting global memory
- **Lightweight**: Only 91.7M additional parameters (2.44% overhead)

## 🏗️ Architecture: EARCP

```
EARCP = Encapsulation + Alignment + Revision + Coherence + Propagation
```

| Component | Function |
|-----------|----------|
| **Encapsulation** | GRU-style state update from hidden states |
| **Alignment** | Cross-attention between state and hidden layers |
| **Revision** | Drift detection and correction |
| **Coherence** | Mixture-of-Experts for consistency |
| **Propagation** | State injection into transformer layers |
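
To make the Encapsulation and Propagation rows concrete, here is a minimal PyTorch sketch, not the released implementation: the class name and mean-pooling choice are illustrative assumptions, while `state_dim=256` and `alpha=0.02` come from the Model Details table below and 4096 is Mistral-7B's hidden width.

```python
import torch
import torch.nn as nn

class EARCPSketch(nn.Module):
    """Illustrative Encapsulation + Propagation only; the Alignment,
    Revision, and Coherence components are omitted for brevity."""

    def __init__(self, hidden_dim=4096, state_dim=256, alpha=0.02):
        super().__init__()
        self.encapsulate = nn.GRUCell(hidden_dim, state_dim)  # GRU-style state update
        self.propagate = nn.Linear(state_dim, hidden_dim)     # map state back to model width
        self.alpha = alpha                                    # injection strength

    def update_state(self, hidden_states, state):
        # Encapsulation: pool the turn's hidden states, fold them into the latent state.
        pooled = hidden_states.mean(dim=1)        # (batch, hidden_dim)
        return self.encapsulate(pooled, state)    # (batch, state_dim)

    def inject(self, hidden_states, state):
        # Propagation: add a scaled projection of the state at the chosen layers.
        return hidden_states + self.alpha * self.propagate(state).unsqueeze(1)
```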

## 🔧 Model Details

| Parameter | Value |
|-----------|-------|
| Base Model | mistralai/Mistral-7B-v0.1 |
| EARCP Parameters | 91.7M |
| Latent State Dim | 256 |
| Injection Layers | [8, 16] |
| Alpha (injection strength) | 0.02 |
| Experts | 2 |
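
Collected as a plain config object, using the values above (the field names are illustrative, not the repository's actual config schema):

```python
from dataclasses import dataclass, field

@dataclass
class SCLMConfig:
    # Values from the table above; field names are illustrative.
    base_model: str = "mistralai/Mistral-7B-v0.1"
    state_dim: int = 256                              # latent state dimension
    injection_layers: list = field(default_factory=lambda: [8, 16])
    alpha: float = 0.02                               # injection strength
    num_experts: int = 2                              # Coherence MoE experts
```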

## 🚀 Quick Start

```python
# Note: full SCLM functionality requires custom loading (see the sketch below).
# The hosted inference widget uses the base model only.

from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("amewebstudio/ananke-sclm")

# For full SCLM functionality, load the weights separately:
# 1. Load the base Mistral-7B model
# 2. Load the EARCP weights from earcp_weights.pt
# 3. Apply the SCLM wrapper
```
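
For reference, a minimal sketch of those three steps, assuming the EARCP weights live in `earcp_weights.pt` at the repo root (as named above). `SCLMWrapper` and `load_earcp` are hypothetical stand-ins for the wrapper class this repository actually ships:

```python
import torch
from transformers import AutoModelForCausalLM
from huggingface_hub import hf_hub_download

# 1. Load the base Mistral-7B model
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Fetch and load the EARCP weights
weights_path = hf_hub_download("amewebstudio/ananke-sclm", "earcp_weights.pt")
earcp_state = torch.load(weights_path, map_location="cpu")

# 3. Apply the SCLM wrapper (hypothetical names; see the repo code for the real API)
# model = SCLMWrapper(base, state_dim=256, injection_layers=[8, 16], alpha=0.02)
# model.load_earcp(earcp_state)
```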

## 📊 Validation Results

| Test | Result |
|------|--------|
| Forward Pass | ✅ |
| State Evolution | ✅ (norm: 0 → 4.6 → 7.5) |
| Coherent Generation | ✅ |
| Edit Mode | ✅ |
| Entity Memory | ✅ (Elara, Nimbus retained) |
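
The state-evolution figures suggest a simple sanity check: log the latent-state norm after each turn and confirm it moves away from zero. The API below (`generate_with_state`, `model.state`) is hypothetical, standing in for however the wrapper exposes its latent state:

```python
# Hypothetical API sketch: track how the latent state norm evolves across turns.
prompts = [
    "The wizard Elara lived in Silverwood forest.",
    "Her owl, Nimbus, waited by the window.",
]
# for turn, prompt in enumerate(prompts, start=1):
#     model.generate_with_state(prompt)                        # updates model.state
#     print(f"turn {turn}: state norm = {model.state.norm():.1f}")
```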

## 💡 Use Cases

- **Interactive Fiction**: Characters and plot points remain consistent
- **Long Conversations**: Context persists without growing prompts
- **Creative Writing**: Maintain story coherence across chapters
- **Role-Playing**: NPCs remember past interactions

## 📝 Citation

```bibtex
@article{amega2025sclm,
  title={SCLM: Stateful Coherent Language Models with EARCP Architecture},
  author={Amega, Mike},
  year={2025},
  note={Ame Web Studio}
}
```

## 👤 Author

**Mike Amega** - [Ame Web Studio](https://github.com/Volgat)

---

*SCLM is an experimental architecture exploring persistent memory in language models.*