# EDEN architecture

EDEN is a standard encoder-decoder Transformer trained from scratch for text
enhancement. This document describes how the model is built.

## Overview

The model reads a rough source sentence and generates a polished target
sentence. It uses a shared byte-level BPE vocabulary for both the input and the
output, and the input embedding matrix is tied to the output projection.

```
rough text
    |
    v
[byte-level BPE tokenizer]
    |
    v
[embedding + sinusoidal positional encoding]
    |
    v
[Transformer encoder, 8 layers]  ->  memory
    |
    v
[Transformer decoder, 8 layers]  (attends to memory, causal self-attention)
    |
    v
[tied linear language-model head]
    |
    v
polished text
```

## Configuration

| Field | Value | Meaning |
| --- | --- | --- |
| `vocab_size` | 24000 | Byte-level BPE vocabulary size |
| `d_model` | 640 | Hidden size |
| `n_heads` | 10 | Attention heads per block |
| `n_layers` | 8 | Number of encoder layers and number of decoder layers (8 each) |
| `dim_feedforward` | 2560 | Feed-forward inner size |
| `dropout` | 0.1 | Dropout probability |
| `max_len` | 512 | Maximum number of positions (tokens) per sequence |

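
The table can be captured as a small config object, which also makes a couple of derived quantities explicit. This is a minimal sketch: the names `EdenConfig` and `estimate_parameters` are hypothetical, and the estimate assumes standard Transformer layers with a bias on every linear projection and no final layer norms, so treat the total as a rough figure rather than the model's reported size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdenConfig:  # hypothetical name, mirrors the table above
    vocab_size: int = 24000
    d_model: int = 640
    n_heads: int = 10
    n_layers: int = 8  # used for both the encoder and the decoder
    dim_feedforward: int = 2560
    dropout: float = 0.1
    max_len: int = 512

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads


def estimate_parameters(cfg: EdenConfig) -> int:
    """Rough count, assuming biased linear layers and no final layer norms."""
    d, ff = cfg.d_model, cfg.dim_feedforward
    embedding = cfg.vocab_size * d  # shared with the tied output head
    attention = 4 * (d * d + d)  # Q, K, V and output projections
    feed_forward = (d * ff + ff) + (ff * d + d)
    layer_norm = 2 * d  # scale and shift
    encoder_layer = attention + feed_forward + 2 * layer_norm
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm
    return embedding + cfg.n_layers * (encoder_layer + decoder_layer)


cfg = EdenConfig()
print(cfg.head_dim)              # 64
print(estimate_parameters(cfg))  # 107274240
```

Under these assumptions each head works on 64 dimensions and the feed-forward inner size is four times the hidden size. The sinusoidal position table is a buffer, not a trained parameter, so it is left out of the count.
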
## Key design choices

* **Tied embeddings.** The language-model head shares its weight matrix with the
  input embedding. This reduces parameters and tends to improve quality on
  vocabulary-heavy tasks.
* **Pre-norm blocks.** The encoder and decoder use `norm_first=True`, which makes
  deep Transformers more stable to train.
* **GELU activations** in the feed-forward blocks.
* **Sinusoidal positional encoding** stored as a buffer. In the Transformers
  integration this buffer is persistent so it is saved and restored correctly
  through safetensors and meta-device loading.
* **Padding-aware attention.** Padding tokens are masked in both the encoder and
  the decoder, and the decoder uses a causal mask for self-attention.

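
To make the positional-encoding bullet concrete, here is a minimal pure-Python sketch of a sinusoidal position table. It assumes the standard sine/cosine formulation with a base of 10000; the function name `sinusoidal_positions` is hypothetical, and the exact variant used in EDEN's code may differ.

```python
import math


def sinusoidal_positions(max_len: int, d_model: int) -> list[list[float]]:
    """Return a max_len x d_model table of fixed positional encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Even index i gets the sine, i + 1 the cosine of the same angle.
            # Assumed base of 10000: wavelengths grow geometrically with i.
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            table[pos][i + 1] = math.cos(angle)
    return table


pe = sinusoidal_positions(max_len=512, d_model=640)
print(len(pe), len(pe[0]))  # 512 640
print(pe[0][:4])            # [0.0, 1.0, 0.0, 1.0]
```

Because the table depends only on `max_len` and `d_model`, it can be computed once and stored. In PyTorch, a tensor like this is typically attached to a module with `register_buffer(..., persistent=True)`, which is what makes it part of the saved state.
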
## Special tokens

| Token | Id | Role |
| --- | --- | --- |
| `[UNK]` | 0 | Unknown token |
| `[PAD]` | 1 | Padding |
| `[BOS]` | 2 | Beginning of sequence and decoder start |
| `[EOS]` | 3 | End of sequence |

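
These ids are what the padding and causal masks are built from. The sketch below is illustrative only: the helper names are hypothetical, the token ids other than the special ones are made up, and the shift-right step is the common teacher-forcing convention, assumed here rather than taken from EDEN's code. `True` marks a position to ignore, which matches how boolean masks are interpreted by PyTorch's Transformer modules.

```python
PAD_ID, BOS_ID, EOS_ID = 1, 2, 3  # ids from the table above


def padding_mask(token_ids: list[int]) -> list[bool]:
    """True marks a padding position that attention should ignore."""
    return [tok == PAD_ID for tok in token_ids]


def causal_mask(size: int) -> list[list[bool]]:
    """True marks a future position that a decoder step must not see."""
    return [[col > row for col in range(size)] for row in range(size)]


def shift_right(target_ids: list[int]) -> list[int]:
    """Build decoder inputs: prepend [BOS] and drop the last id."""
    return [BOS_ID] + target_ids[:-1]


# Hypothetical token ids for a short target, padded to length 6.
target = [17, 42, 99, EOS_ID, PAD_ID, PAD_ID]
decoder_input = shift_right(target)

print(decoder_input)                # [2, 17, 42, 99, 3, 1]
print(padding_mask(decoder_input))  # [False, False, False, False, False, True]
print(causal_mask(3))
# [[False, True, True], [False, False, True], [False, False, False]]
```
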
## Generation

For inference the model supports three strategies:

* **Beam search** (default), with a length penalty and a repetition penalty.
  This gives the most conservative, reliable edits.
* **Greedy** decoding.
* **Sampling** with temperature, top-k, and top-p filtering.

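
The sampling strategy can be illustrated with a small pure-Python sketch of temperature scaling followed by top-k and top-p (nucleus) filtering. The function names and the order of the filters are assumptions made for illustration; this is not EDEN's generation code.

```python
import math
import random


def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Rank token ids from most to least likely.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving tokens (subtract the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus step: keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for token_id, p in zip(order, probs):
        kept.append((token_id, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token_id: p / mass for token_id, p in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_logits(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
print(sorted(filter_logits(logits, top_k=2)))  # [0, 1]
print(sample(logits, temperature=0.7, top_k=3, top_p=0.9))
```

Lower temperatures and smaller `top_k` or `top_p` values concentrate probability on the most likely tokens, which moves sampling closer to greedy decoding.
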
Long inputs are split into sentence-aware chunks that each fit inside the
512-token window, rewritten independently, and joined back together.

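
The chunking step can be sketched as a greedy packer over sentences. Everything here is a simplified stand-in: the regex sentence splitter, the `count_tokens` and `rewrite` callables, and the function names are hypothetical, and a real pipeline would count tokens with the model's tokenizer and leave room for `[BOS]` and `[EOS]`.

```python
import re


def chunk_sentences(text, count_tokens, max_tokens=512):
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        # A single sentence longer than the budget still becomes its own
        # chunk here; a real pipeline would need to split it further.
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def rewrite_long_text(text, rewrite, count_tokens, max_tokens=512):
    """Rewrite each chunk independently and join the results."""
    chunks = chunk_sentences(text, count_tokens, max_tokens)
    return " ".join(rewrite(chunk) for chunk in chunks)


def word_count(s):
    # Stand-in for a tokenizer: counts whitespace-separated words.
    return len(s.split())


text = "this is one. here is two! and three? the last one."
print(chunk_sentences(text, word_count, max_tokens=6))
# ['this is one. here is two!', 'and three? the last one.']
```

Splitting on sentence boundaries keeps each chunk self-contained, at the cost of losing context that spans chunks.
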
## Two code paths, one architecture

The exact same layer structure is defined in two places:

* `eden/model.py` is the reference model used by the training engine.
* `modeling_eden.py` is the Hugging Face Transformers wrapper.

Because the module names and shapes match, a checkpoint trained with the engine
loads into the Transformers model without any key remapping. The conversion
script in `scripts/convert_checkpoint_to_hf.py` performs this step and writes the
safetensors weights.
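
One way to sanity-check the "no key remapping" property is to compare parameter names and shapes from both code paths. The sketch below works on plain `{name: shape}` mappings; the function name `diff_state_dicts` and the example parameter names are hypothetical, and this is not what the conversion script itself does.

```python
def diff_state_dicts(source_shapes, target_shapes):
    """Compare two {parameter_name: shape} mappings and report mismatches."""
    source_names, target_names = set(source_shapes), set(target_shapes)
    missing = sorted(target_names - source_names)
    unexpected = sorted(source_names - target_names)
    mismatched = sorted(
        name
        for name in source_names & target_names
        if tuple(source_shapes[name]) != tuple(target_shapes[name])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical parameter names; the real ones live in the two model files.
engine = {"embed.weight": (24000, 640), "lm_head.weight": (24000, 640)}
wrapper = {"embed.weight": (24000, 640), "lm_head.weight": (24000, 640)}

print(diff_state_dicts(engine, wrapper))
# {'missing': [], 'unexpected': [], 'mismatched': []}
```

With PyTorch models, the mappings could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. Three empty lists mean a checkpoint from one side should load into the other without renaming.
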