How to use Rybib/EDEN with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Rybib/EDEN", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Rybib/EDEN", trust_remote_code=True, dtype="auto")
```
EDEN architecture
EDEN is a standard encoder-decoder Transformer trained from scratch for text enhancement. This document describes how the model is built.
Overview
The model reads a rough source sentence and generates a polished target sentence. It uses a shared byte-level BPE vocabulary for both the input and the output, and the input embedding matrix is tied to the output projection.
```text
rough text
    |
    v
[byte-level BPE tokenizer]
    |
    v
[embedding + sinusoidal positional encoding]
    |
    v
[Transformer encoder, 8 layers] -> memory
    |
    v
[Transformer decoder, 8 layers] (attends to memory, causal self-attention)
    |
    v
[tied linear language-model head]
    |
    v
polished text
```
Configuration
| Field | Value | Meaning |
|---|---|---|
| `vocab_size` | 24000 | Byte-level BPE vocabulary size (shared by input and output) |
| `d_model` | 640 | Hidden size |
| `n_heads` | 10 | Attention heads per block (64 dimensions per head) |
| `n_layers` | 8 | Layers in the encoder, and separately in the decoder |
| `dim_feedforward` | 2560 | Feed-forward inner size (4 × `d_model`) |
| `dropout` | 0.1 | Dropout probability |
| `max_len` | 512 | Maximum number of positions per sequence |
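The table fixes the model size. The back-of-the-envelope count below is a sketch that assumes the layers follow PyTorch's `nn.TransformerEncoderLayer`/`nn.TransformerDecoderLayer` parameterization: biased projections, LayerNorm with weight and bias, a final LayerNorm on each stack, and a bias-free tied head. `EdenConfig` and `count_parameters` are hypothetical names, not taken from the EDEN code. Under those assumptions the total comes to about 107.3M parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdenConfig:
    # Values from the configuration table; the class itself is hypothetical.
    vocab_size: int = 24000
    d_model: int = 640
    n_heads: int = 10
    n_layers: int = 8
    dim_feedforward: int = 2560
    dropout: float = 0.1
    max_len: int = 512


def count_parameters(cfg: EdenConfig, tied: bool = True) -> int:
    """Estimate parameters assuming PyTorch-style Transformer layers."""
    d, f = cfg.d_model, cfg.dim_feedforward
    attention = 4 * d * d + 4 * d          # Q/K/V in-projection + output projection, with biases
    feed_forward = 2 * d * f + f + d       # two linear layers with biases
    layer_norm = 2 * d                     # weight and bias
    encoder_layer = attention + feed_forward + 2 * layer_norm
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm  # self- and cross-attention
    encoder = cfg.n_layers * encoder_layer + layer_norm  # plus the stack's final LayerNorm
    decoder = cfg.n_layers * decoder_layer + layer_norm
    embedding = cfg.vocab_size * d         # sinusoidal positions add no parameters
    head = 0 if tied else cfg.vocab_size * d  # a tied head reuses the embedding matrix
    return embedding + encoder + decoder + head


cfg = EdenConfig()
assert cfg.d_model % cfg.n_heads == 0     # 64 dimensions per head
print(count_parameters(cfg))              # 107276800
print(count_parameters(cfg, tied=False))  # 122636800
```

`n_heads` does not change the count; it only splits each 640-dimensional projection into ten 64-dimensional heads. Untying the head would add exactly one more 24000 × 640 matrix.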
Key design choices
- Tied embeddings. The language-model head shares its weight matrix with the input embedding, so the 24000 × 640 output projection (15,360,000 parameters) is not stored separately. Tying also tends to improve quality on vocabulary-heavy tasks.
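- Pre-norm blocks. The encoder and decoder layers use `norm_first=True`, which applies layer normalization before each attention and feed-forward sub-layer instead of after the residual addition. This makes deep Transformers more stable to train.
- GELU activations in the feed-forward blocks.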
- Sinusoidal positional encoding stored as a buffer. In the Transformers integration this buffer is registered as persistent, so it is written to the safetensors file and restored correctly, including when the model is first instantiated on the meta device.
- Padding-aware attention. Padding tokens are masked in both the encoder and the decoder, and the decoder uses a causal mask for self-attention.
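The design choices above can be put together in a minimal PyTorch sketch, assuming `torch.nn.Transformer` as the backbone. The class names `EdenSketch` and `SinusoidalPositionalEncoding` are hypothetical and do not come from `eden/model.py`. The sketch omits any embedding scaling, weight initialization, or generation logic the real model may have, and it assumes batch-first tensors and `[PAD]` id 1.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sine/cosine table kept as a persistent buffer (saved in state_dict)."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe, persistent=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        return x + self.pe[: x.size(1)]


class EdenSketch(nn.Module):
    def __init__(self, vocab_size=24000, d_model=640, n_heads=10, n_layers=8,
                 dim_feedforward=2560, dropout=0.1, max_len=512, pad_id=1):
        super().__init__()
        self.pad_id = pad_id
        self.embed = nn.Embedding(vocab_size, d_model)  # shared by encoder and decoder
        self.pos = SinusoidalPositionalEncoding(d_model, max_len)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            dim_feedforward=dim_feedforward, dropout=dropout,
            activation="gelu", batch_first=True, norm_first=True,  # GELU + pre-norm
        )
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tie the head to the input embedding

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        src_pad = src_ids.eq(self.pad_id)  # True marks positions attention must ignore
        tgt_pad = tgt_ids.eq(self.pad_id)
        t = tgt_ids.size(1)
        # Boolean causal mask: True above the diagonal blocks attention to future tokens.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=tgt_ids.device), diagonal=1)
        hidden = self.transformer(
            self.pos(self.embed(src_ids)),
            self.pos(self.embed(tgt_ids)),
            tgt_mask=causal,
            src_key_padding_mask=src_pad,
            tgt_key_padding_mask=tgt_pad,
            memory_key_padding_mask=src_pad,
        )
        return self.lm_head(hidden)  # (batch, tgt_len, vocab_size) logits
```

Because `pe` is registered with `persistent=True`, it appears in `state_dict()` (as `pos.pe` here) and travels with the saved weights.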
Special tokens
| Token | Id | Role |
|---|---|---|
| `[UNK]` | 0 | Unknown token |
| `[PAD]` | 1 | Padding |
| `[BOS]` | 2 | Beginning of sequence and decoder start |
| `[EOS]` | 3 | End of sequence |
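One plausible way these tokens fit together during training is standard teacher forcing: the decoder input starts with `[BOS]` (the decoder start token), the labels end with `[EOS]`, and batches are right-padded. The helper names and the `-100` label padding (the default `ignore_index` of PyTorch's `cross_entropy`) are assumptions, not taken from the EDEN code.

```python
from typing import List, Sequence, Tuple

UNK_ID, PAD_ID, BOS_ID, EOS_ID = 0, 1, 2, 3
IGNORE_INDEX = -100  # assumed label padding, skipped by the loss


def make_decoder_pair(target_ids: Sequence[int], max_len: int = 512) -> Tuple[List[int], List[int]]:
    """Shift-right teacher forcing: input = [BOS] + target, labels = target + [EOS]."""
    body = list(target_ids)[: max_len - 1]  # leave room for the added special token
    return [BOS_ID] + body, body + [EOS_ID]


def pad_batch(sequences: Sequence[Sequence[int]], pad_value: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (width - len(seq)) for seq in sequences]


dec_in, labels = make_decoder_pair([17, 42, 99])
print(dec_in)  # [2, 17, 42, 99]
print(labels)  # [17, 42, 99, 3]
print(pad_batch([dec_in, [BOS_ID, 5]], PAD_ID))        # [[2, 17, 42, 99], [2, 5, 1, 1]]
print(pad_batch([labels, [5, EOS_ID]], IGNORE_INDEX))  # [[17, 42, 99, 3], [5, 3, -100, -100]]
```

Padding the decoder input with `[PAD]` lets the padding-aware attention mask those positions, while padding the labels with the ignore index keeps them out of the loss.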
Generation
For inference the model supports three strategies:
- Beam search (default), with a length penalty and a repetition penalty. This gives the most conservative, reliable edits.
- Greedy decoding.
- Sampling with temperature, top-k, and top-p filtering.
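A framework-free sketch of the sampling path follows. It shows how temperature, top-k, and top-p (nucleus) filtering are commonly applied to one decoding step's logits, in that order, with greedy decoding included for comparison. The function names and the filtering order are assumptions; EDEN's own generation code may differ, and beam search is not shown.

```python
import math
import random
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0) -> Dict[int, float]:
    """Apply temperature, then top-k, then top-p; return {token_id: renormalized prob}."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")
    probs = softmax([x / temperature for x in logits])
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    mass = sum(probs[i] for i in ranked)
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in ranked:  # smallest prefix whose renormalized mass reaches top_p
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        ranked = kept
        mass = sum(probs[i] for i in ranked)
    return {i: probs[i] / mass for i in ranked}


def sample_token(logits, rng: random.Random, **kwargs) -> int:
    dist = filtered_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


def greedy_token(logits) -> int:
    return max(range(len(logits)), key=lambda i: logits[i])
```

Lower temperatures sharpen the distribution toward the greedy choice, and `top_k=1` makes sampling identical to greedy decoding.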
Long inputs are split into sentence-aware chunks that each fit inside the 512-token window. Each chunk is rewritten independently, and the outputs are joined back together.
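Here is a sketch of sentence-aware chunking. It assumes a naive regex sentence splitter and a caller-supplied `count_tokens` function, which in practice would wrap the byte-level BPE tokenizer. The default budget of 510 leaves room for `[BOS]` and `[EOS]` inside the 512-position window. That margin, the word-level fallback for over-long sentences, and all function names are assumptions.

```python
import re
from typing import Callable, List

Counter = Callable[[str], int]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def _pack(units: List[str], count_tokens: Counter, budget: int) -> List[str]:
    """Greedily join units with spaces while the joined text stays within budget."""
    out, current = [], []
    for unit in units:
        if current and count_tokens(" ".join(current + [unit])) > budget:
            out.append(" ".join(current))
            current = [unit]
        else:
            current.append(unit)
    if current:
        out.append(" ".join(current))
    return out


def chunk_text(text: str, count_tokens: Counter, budget: int = 510) -> List[str]:
    pieces: List[str] = []
    for sentence in split_sentences(text):
        if count_tokens(sentence) <= budget:
            pieces.append(sentence)
        else:  # fallback: split an over-long sentence at word boundaries
            pieces.extend(_pack(sentence.split(), count_tokens, budget))
    return _pack(pieces, count_tokens, budget)


def count_words(s: str) -> int:
    return len(s.split())  # stand-in for the real tokenizer


print(chunk_text("One two three. Four five. Six seven eight nine.", count_words, budget=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

The chunks would then be recombined with something like `" ".join(rewrite(c) for c in chunks)`, where `rewrite` is a hypothetical per-chunk model call. A single word longer than the budget is still emitted as one oversized piece in this sketch.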
Two code paths, one architecture
The exact same layer structure is defined in two places:

- `eden/model.py` is the reference model used by the training engine.
- `modeling_eden.py` is the Hugging Face Transformers wrapper.
Because the module names and shapes match, a checkpoint trained with the engine loads into the Transformers model without any key remapping. The conversion script `scripts/convert_checkpoint_to_hf.py` performs this step and writes the safetensors weights.
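The claim that no key remapping is needed reduces to a simple check: every parameter and buffer name must exist on both sides with the same shape. The sketch below uses plain `name -> shape` dictionaries. With PyTorch these could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. The function names and example keys are hypothetical and are not taken from the conversion script.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(source: Dict[str, Shape], target: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Report everything that would block a direct, rename-free copy from source to target."""
    return {
        "missing": sorted(k for k in target if k not in source),
        "unexpected": sorted(k for k in source if k not in target),
        "shape_mismatch": sorted(k for k in source.keys() & target.keys() if source[k] != target[k]),
    }


def loads_without_remapping(source: Dict[str, Shape], target: Dict[str, Shape]) -> bool:
    return not any(diff_state_dicts(source, target).values())


# Hypothetical key names, for illustration only.
engine = {"embed.weight": (24000, 640), "pos.pe": (512, 640)}
print(loads_without_remapping(engine, dict(engine)))  # True
print(diff_state_dicts(engine, {"embeddings.weight": (24000, 640), "pos.pe": (512, 640)}))
# {'missing': ['embeddings.weight'], 'unexpected': ['embed.weight'], 'shape_mismatch': []}
```

When the report is empty, conversion amounts to copying tensors by name and serializing them. Any non-empty entry would require a renaming or reshaping step.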