# EDEN architecture
EDEN is a standard encoder-decoder Transformer trained from scratch for text
enhancement. This document describes how the model is built.
## Overview
The model reads a rough source sentence and generates a polished target
sentence. It uses a shared byte-level BPE vocabulary for both the input and the
output, and the input embedding matrix is tied to the output projection.
```
rough text
|
v
[byte-level BPE tokenizer]
|
v
[embedding + sinusoidal positional encoding]
|
v
[Transformer encoder, 8 layers] -> memory
|
v
[Transformer decoder, 8 layers] (attends to memory, causal self-attention)
|
v
[tied linear language-model head]
|
v
polished text
```
## Configuration
| Field | Value | Meaning |
| --- | --- | --- |
| `vocab_size` | 24000 | Byte-level BPE vocabulary size |
| `d_model` | 640 | Hidden size |
| `n_heads` | 10 | Attention heads per block |
| `n_layers` | 8 | Number of encoder layers and of decoder layers (8 each) |
| `dim_feedforward` | 2560 | Feed-forward inner size |
| `dropout` | 0.1 | Dropout probability |
| `max_len` | 512 | Maximum positions |
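
As a rough illustration of what these values imply, the sketch below estimates the parameter count from the table. It assumes each block follows the layout of PyTorch's built-in encoder and decoder layers (biases on every linear layer, two LayerNorms per encoder block, three per decoder block) and ignores any final norms; the function name `estimate_parameters` is hypothetical and the real checkpoint may differ.

```python
# Rough parameter estimate from the configuration table.
# Assumption: standard Transformer blocks with biases and LayerNorm,
# as in torch.nn.TransformerEncoderLayer / TransformerDecoderLayer.

CONFIG = {
    "vocab_size": 24000,
    "d_model": 640,
    "n_heads": 10,
    "n_layers": 8,
    "dim_feedforward": 2560,
}


def estimate_parameters(cfg):
    d = cfg["d_model"]
    ff = cfg["dim_feedforward"]

    head_dim = d // cfg["n_heads"]        # width of one attention head
    embedding = cfg["vocab_size"] * d     # counted once because the LM head is tied

    attention = 4 * d * d + 4 * d         # Q, K, V and output projections with biases
    feed_forward = 2 * d * ff + ff + d    # two linear layers with biases
    layer_norm = 2 * d                    # scale and shift vectors

    encoder_layer = attention + feed_forward + 2 * layer_norm
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # adds cross-attention

    total = embedding + cfg["n_layers"] * (encoder_layer + decoder_layer)
    return {
        "head_dim": head_dim,
        "embedding": embedding,
        "encoder_layer": encoder_layer,
        "decoder_layer": decoder_layer,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in estimate_parameters(CONFIG).items():
        print(f"{name}: {value:,}")
```

Under these assumptions the estimate comes to roughly 107M parameters, of which about 15.4M sit in the shared embedding matrix.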
## Key design choices
* **Tied embeddings.** The language-model head shares its weight matrix with the
input embedding. This removes a separate 24000 x 640 output projection (about
15.4M parameters) and tends to improve quality on vocabulary-heavy tasks.
* **Pre-norm blocks.** The encoder and decoder use `norm_first=True`, which makes
deep Transformers more stable to train.
* **GELU activations** in the feed-forward blocks.
* **Sinusoidal positional encoding** stored as a buffer. In the Transformers
integration this buffer is persistent so it is saved and restored correctly
through safetensors and meta-device loading.
* **Padding-aware attention.** Padding tokens are masked in both the encoder and
the decoder, and the decoder uses a causal mask for self-attention.
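
Two of the choices above can be sketched in plain Python. The positional table below assumes the standard sine/cosine formulation with base 10000, and the mask combines the causal rule with padding masking for decoder self-attention. Both function names are hypothetical; the actual model builds these as tensors.

```python
import math


def sinusoidal_positions(max_len, d_model):
    """Return a max_len x d_model table: sine on even columns, cosine on odd ones."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):
            # Assumed standard formulation: frequency falls geometrically with i.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row[:d_model])
    return table


def decoder_self_attention_mask(token_ids, pad_id=1):
    """allowed[i][j] is True when query position i may attend to key position j.

    A key is visible only if it is not in the future (causal) and is not padding.
    """
    n = len(token_ids)
    return [
        [j <= i and token_ids[j] != pad_id for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    table = sinusoidal_positions(max_len=512, d_model=640)
    print(len(table), len(table[0]))
    for row in decoder_self_attention_mask([2, 10, 11, 1]):
        print(row)
```

The encoder would use only the padding part of this mask, since encoder self-attention is not causal.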
## Special tokens
| Token | Id | Role |
| --- | --- | --- |
| `[UNK]` | 0 | Unknown token |
| `[PAD]` | 1 | Padding |
| `[BOS]` | 2 | Beginning of sequence and decoder start |
| `[EOS]` | 3 | End of sequence |
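
The sketch below shows how these ids might frame a target sequence. It assumes the usual teacher-forcing shift, where the decoder input starts with `[BOS]` and the labels end with `[EOS]`; the document only states that `[BOS]` is the decoder start token, so the label layout and the helper names are assumptions.

```python
UNK_ID, PAD_ID, BOS_ID, EOS_ID = 0, 1, 2, 3


def make_decoder_pair(target_ids):
    """Assumed teacher-forcing layout: input is [BOS] + target, labels are target + [EOS]."""
    decoder_input = [BOS_ID] + list(target_ids)
    labels = list(target_ids) + [EOS_ID]
    return decoder_input, labels


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (width - len(seq)) for seq in sequences]


if __name__ == "__main__":
    decoder_input, labels = make_decoder_pair([10, 11])
    print(decoder_input, labels)
    print(pad_batch([[2, 10, 11], [2, 12]]))
```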
## Generation
For inference the model supports three strategies:
* **Beam search** (default), with a length penalty and a repetition penalty.
This gives the most conservative, reliable edits.
* **Greedy** decoding.
* **Sampling** with temperature, top-k, and top-p filtering.
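
To make the sampling strategy concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-k and top-p filtering over one step's logits. The function names and defaults are hypothetical and do not reflect EDEN's actual generation API.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens

    kept, cumulative = [], 0.0
    for prob, token_id in ranked:
        kept.append((prob, token_id))
        cumulative += prob
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break

    norm = sum(prob for prob, _ in kept)
    return {token_id: prob / norm for prob, token_id in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    probs = filter_probs(logits, **kwargs)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    step_logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
    print(filter_probs(step_logits, top_k=2))
    print(sample(step_logits, temperature=0.7, top_p=0.9))
```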
Long inputs are split into sentence-aware chunks that each fit inside the
512-token window, rewritten independently, and joined back together.
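
The chunking step could look roughly like the sketch below. It is an illustration only: the sentence splitter is a simple regular expression, `count_tokens` defaults to a whitespace count as a stand-in for the real tokenizer, and `rewrite` is a hypothetical callable that runs the model on one chunk.

```python
import re


def split_into_chunks(text, max_tokens=512, count_tokens=lambda s: len(s.split())):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens is kept as one oversized chunk;
    a real implementation would need a fallback for that case.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def enhance_long_text(text, rewrite, max_tokens=512):
    """Rewrite each chunk independently and join the results with spaces."""
    return " ".join(rewrite(chunk) for chunk in split_into_chunks(text, max_tokens))


if __name__ == "__main__":
    sample_text = "One two three. Four five six! Seven eight nine?"
    print(split_into_chunks(sample_text, max_tokens=6))
    print(enhance_long_text(sample_text, str.upper, max_tokens=6))
```

In practice the budget would also need to leave room for special tokens such as `[BOS]` and `[EOS]`.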
## Two code paths, one architecture
The exact same layer structure is defined in two places:
* `eden/model.py` is the reference model used by the training engine.
* `modeling_eden.py` is the Hugging Face Transformers wrapper.
Because the module names and shapes match, a checkpoint trained with the engine
loads into the Transformers model without any key remapping. The conversion
script in `scripts/convert_checkpoint_to_hf.py` performs the conversion and
writes the safetensors weights.
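
One way to check the "no key remapping" property is to compare parameter names and shapes from both code paths. The helper below is a hypothetical sketch that works on plain `{name: shape}` mappings; the parameter names in the example are made up for illustration and are not EDEN's real keys.

```python
def compare_state_dicts(reference_shapes, wrapper_shapes):
    """Compare two {parameter_name: shape} mappings and report every difference."""
    reference_keys = set(reference_shapes)
    wrapper_keys = set(wrapper_shapes)
    mismatched = sorted(
        name
        for name in reference_keys & wrapper_keys
        if tuple(reference_shapes[name]) != tuple(wrapper_shapes[name])
    )
    return {
        "missing": sorted(reference_keys - wrapper_keys),      # absent from the wrapper
        "unexpected": sorted(wrapper_keys - reference_keys),   # absent from the reference
        "mismatched": mismatched,                              # same name, different shape
    }


if __name__ == "__main__":
    # With PyTorch, such mappings can be built from a module via
    # {k: tuple(v.shape) for k, v in module.state_dict().items()}.
    reference = {"embed.weight": (24000, 640), "encoder.norm.weight": (640,)}
    wrapper = {"embed.weight": (24000, 640), "encoder.norm.weight": (640,)}
    print(compare_state_dicts(reference, wrapper))
```

An empty report for all three lists is what one would expect if the two definitions really share module names and shapes.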