EDEN: Encoder Decoder Enhancement Network

EDEN is a from-scratch PyTorch encoder-decoder Transformer that rewrites rough text into clean, polished text. It fixes spelling, grammar, punctuation, and phrasing while keeping the original meaning. The model was built and trained from the ground up (architecture, tokenizer, and training loop) and runs comfortably on a single machine, including Apple Silicon.

This repository contains everything needed to use the model, retrain it, and extend it:

  • The trained model weights in safetensors format.
  • A Hugging Face Transformers integration (AutoModel with trust_remote_code).
  • The full training, fine-tuning, and evaluation engine.
  • A local web dashboard for training and trying the model in a browser.

Model summary

| Property | Value |
| --- | --- |
| Architecture | Encoder-decoder Transformer with tied embeddings |
| Parameters | About 107 million |
| Encoder layers | 8 |
| Decoder layers | 8 |
| Hidden size | 640 |
| Attention heads | 10 |
| Feed-forward size | 2560 |
| Vocabulary | 24,000 byte-level BPE tokens |
| Max sequence length | 512 tokens |
| Held-out validation loss | 0.123 (cross entropy) |
| Precision | float32 |
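
The parameter count can be roughly cross-checked against the other values in the summary. The sketch below assumes a textbook Transformer layout (four square projection matrices per attention block, a two-matrix feed-forward block, one shared embedding matrix) and ignores biases, layer norms, and any positional parameters. The exact breakdown in modeling_eden.py may differ.

```python
# Rough parameter estimate from the model summary values.
# Assumes a textbook Transformer layout; biases, layer norms, and
# positional parameters are ignored, so this is only an approximation.

VOCAB = 24_000
HIDDEN = 640
FFN = 2_560
ENC_LAYERS = 8
DEC_LAYERS = 8


def attention_params(hidden: int) -> int:
    # Query, key, value, and output projections, each hidden x hidden.
    return 4 * hidden * hidden


def ffn_params(hidden: int, ffn: int) -> int:
    # One up-projection and one down-projection.
    return 2 * hidden * ffn


def estimate_params() -> int:
    embedding = VOCAB * HIDDEN  # counted once because the embeddings are tied
    enc_layer = attention_params(HIDDEN) + ffn_params(HIDDEN, FFN)
    # A decoder layer has self-attention plus cross-attention.
    dec_layer = 2 * attention_params(HIDDEN) + ffn_params(HIDDEN, FFN)
    return embedding + ENC_LAYERS * enc_layer + DEC_LAYERS * dec_layer


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# 107,110,400 parameters (~107M)
```

Under these assumptions the estimate is consistent with the "about 107 million" figure.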

Quick start

First install the two dependencies (one time):

pip3 install torch transformers

Option 1: chat with EDEN in the terminal (recommended)

This opens a simple interactive interface, similar to Ollama. Type or paste rough text, press Enter, and get the cleaned-up version. Type /bye or press Ctrl+D to quit.

python3 examples/try_eden.py

macOS users can also double-click Try EDEN.command to open the same interface in a terminal window.

Example session:

>>> their are alot of reasons why this dont work proper
There are a lot of reasons why this do not work proper.

>>> /bye
Goodbye.

Option 2: one terminal command

Paste this whole line into your terminal to clean a single sentence:

python3 -c "from transformers import AutoModel, AutoTokenizer; t=AutoTokenizer.from_pretrained('Rybib/EDEN', trust_remote_code=True); m=AutoModel.from_pretrained('Rybib/EDEN', trust_remote_code=True).eval(); print(m.enhance(t, 'i relly wnt this to sound beter'))"

Option 3: a Python script

The lines below are Python, not terminal commands. Save them as a file such as run.py, then run python3 run.py. Do not paste them straight into the terminal.

from transformers import AutoModel, AutoTokenizer

model_id = "Rybib/EDEN"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

rough = "i relly wnt this sentance to sound more profesional"
print(model.enhance(tokenizer, rough))
# I really want this sentence to sound more professional.

The enhance method handles long inputs by splitting them into sentence-aware chunks, rewriting each chunk, and joining the results.
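
To illustrate the idea of sentence-aware chunking, here is a minimal sketch. It is not the code behind enhance: the function names are hypothetical, the sentence splitter is a naive regular expression, and chunk size is measured in whitespace-separated words rather than tokenizer tokens.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., !, or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_sentences(sentences: List[str], max_units: int) -> List[str]:
    # Greedily pack whole sentences into chunks of at most max_units words.
    # A single sentence longer than max_units becomes its own chunk.
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        size = len(sentence.split())
        if current and used + size > max_units:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += size
    if current:
        chunks.append(" ".join(current))
    return chunks


def enhance_long(text: str, rewrite: Callable[[str], str], max_units: int = 50) -> str:
    # Rewrite each chunk independently, then join the results with spaces.
    chunks = chunk_sentences(split_sentences(text), max_units)
    return " ".join(rewrite(chunk) for chunk in chunks)


demo = "first sentence here. second one follows! is this the third? yes it is."
print(chunk_sentences(split_sentences(demo), max_units=6))
# ['first sentence here. second one follows!', 'is this the third?', 'yes it is.']
```

Here rewrite stands in for a single-chunk call such as lambda chunk: model.enhance(tokenizer, chunk). Splitting on sentence boundaries keeps each chunk self-contained, so the model never sees half a sentence.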

Decoding options

model.enhance(
    tokenizer,
    "their are alot of reasons why this dont work proper",
    strategy="beam",          # "beam", "greedy", or "sample"
    beam_size=4,
    repetition_penalty=1.08,
    length_penalty=0.7,
)
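
This README does not spell out how EDEN applies repetition_penalty and length_penalty internally. As a rough guide, the sketch below shows the conventions commonly used for parameters with these names: already-generated tokens have their logits scaled down, and a beam's summed log-probability is divided by its length raised to length_penalty. Treat both functions as assumptions about typical behavior, not as EDEN's implementation.

```python
from typing import Dict, Iterable


def apply_repetition_penalty(
    logits: Dict[int, float], generated: Iterable[int], penalty: float
) -> Dict[int, float]:
    # Common convention (an assumption for EDEN): for tokens that were
    # already generated, shrink positive logits and push negative logits
    # further down, so that repeats become less likely.
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            value = adjusted[token]
            adjusted[token] = value / penalty if value > 0 else value * penalty
    return adjusted


def beam_score(sum_logprob: float, length: int, length_penalty: float) -> float:
    # Common convention (an assumption for EDEN): normalise the summed
    # log-probability by length ** length_penalty when ranking beams.
    return sum_logprob / (length ** length_penalty)


next_logits = apply_repetition_penalty({7: 2.0, 8: -1.0, 9: 0.5}, generated=[7, 8], penalty=2.0)
print(next_logits)  # {7: 1.0, 8: -2.0, 9: 0.5}
print(beam_score(-4.0, length=4, length_penalty=1.0))  # -1.0
```

Under this convention, a length_penalty of 0 applies no normalisation (which favors shorter outputs) and 1.0 ranks beams by average per-token log-probability, so a value such as 0.7 sits between the two.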

What the model is good at

EDEN was trained on rough-to-polished text pairs covering several editing skills:

  • Spelling and typo correction, including dyslexia-style letter swaps.
  • Grammar correction.
  • Punctuation and capitalization.
  • Clearer, more fluent rewriting and light paraphrasing.
  • Preserving the original meaning rather than inventing new content.

It is an editing model, not a chatbot or a general text generator. Give it a sentence or paragraph to clean up, not a question or an instruction.

Training data

The dataset is built from publicly available text-editing corpora plus generated noise, combined into rough-text to clean-text pairs:

| Source | Role |
| --- | --- |
| JFLEG | Grammar correction examples |
| Grammarly CoEdIT | Correction and rewrite tasks |
| W&I / LOCNESS | Learner-English correction |
| ASSET | Sentence simplification |
| WikiSplit | Sentence and paragraph flow |
| MRPC | Meaning-preserving paraphrase pairs |
| Synthetic noise | Generated typos, swaps, punctuation, and capitalization fixes |

You can rebuild the dataset locally with the training engine described below.
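
For intuition about the synthetic-noise source, here is a minimal sketch of how a clean sentence might be roughened into a training pair. It is not the repository's generator: the function names, the noise rate, and the specific corruptions are all hypothetical.

```python
import random


def swap_adjacent_letters(word: str, rng: random.Random) -> str:
    # Swap two neighbouring inner letters; first and last characters stay put.
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def drop_letter(word: str, rng: random.Random) -> str:
    # Delete one inner letter; first and last characters stay put.
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]


def roughen(clean: str, rng: random.Random, rate: float = 0.3) -> str:
    words = []
    for word in clean.split():
        if rng.random() < rate:
            corrupt = rng.choice([swap_adjacent_letters, drop_letter])
            word = corrupt(word, rng)
        words.append(word)
    rough = " ".join(words).lower()  # remove capitalization
    return rough.rstrip(".!?")       # remove final punctuation


def make_pair(clean: str, seed: int = 0) -> dict:
    # The clean text is the target; the corrupted copy is the input.
    rng = random.Random(seed)
    return {"input": roughen(clean, rng), "target": clean}


print(make_pair("I really want this sentence to sound more professional.", seed=1))
```

Seeding the random generator makes each pair reproducible, which helps when a dataset needs to be rebuilt exactly.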

Retrain or extend the model

This repository ships the complete training engine as an importable eden package and a command-line tool.

pip install -r requirements.txt

# Build the dataset and tokenizer, then train from scratch.
python -m eden.cli prepare
python -m eden.cli train

# Continue training on your own examples.
python -m eden.cli finetune --data my_pairs.jsonl --mix-base

# Enhance text from the command line.
python -m eden.cli enhance "i relly wnt this to sound beter"

Your own fine-tuning data is a JSONL file of input and target pairs:

{"input": "bad rough text here", "target": "Polished text here."}
{"input": "another messy sentance", "target": "Another polished sentence."}
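
A small helper can write and sanity-check a file in this format before fine-tuning. This is a sketch using only the standard library; the helper names are hypothetical, and the checks cover only what the format above implies (one JSON object per line with non-empty string values for input and target).

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_pairs(path: Path, pairs: List[Tuple[str, str]]) -> None:
    # Write one JSON object per line, UTF-8 encoded.
    with path.open("w", encoding="utf-8") as handle:
        for rough, polished in pairs:
            record = {"input": rough, "target": polished}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def check_pairs(path: Path) -> int:
    # Return the number of valid records; raise ValueError on the first bad line.
    count = 0
    with path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {number}: invalid JSON ({error})") from error
            if not isinstance(record, dict):
                raise ValueError(f"line {number}: expected a JSON object")
            for key in ("input", "target"):
                value = record.get(key)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"line {number}: missing or empty {key!r}")
            count += 1
    return count


with tempfile.TemporaryDirectory() as folder:
    demo_path = Path(folder) / "my_pairs.jsonl"
    write_pairs(demo_path, [
        ("bad rough text here", "Polished text here."),
        ("another messy sentance", "Another polished sentence."),
    ])
    print(check_pairs(demo_path))  # 2
```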

Keeping --mix-base on is recommended so the model learns your style without forgetting general spelling and grammar ability.

See docs/TRAINING.md for the full workflow and docs/ARCHITECTURE.md for how the model is built.

Local web dashboard

python -m eden.cli ui
# then open http://127.0.0.1:7860

The dashboard can start, pause, and resume training, shows live loss and validation metrics, watches memory use, and runs a finished checkpoint directly in the browser.

Fine-tuning with the Transformers Trainer

The model also supports standard supervised training. Its forward method accepts labels and returns a loss, so it works with the Hugging Face Trainer for users who prefer that workflow. Label positions that should be excluded from the loss use the index -100, and decoder_input_ids are derived automatically by shifting labels one position to the right.
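
The label convention can be illustrated in plain Python. This is a sketch of the usual encoder-decoder approach, not EDEN's own code: PAD_ID and START_ID are hypothetical values (the real ids live in the model configuration), and the helper names are invented for the example.

```python
from typing import List

IGNORE_INDEX = -100  # from this README: positions excluded from the loss
PAD_ID = 0           # hypothetical; check the real tokenizer/config
START_ID = 1         # hypothetical decoder start token id


def make_labels(target_ids: List[int], max_len: int) -> List[int]:
    # Pad the target to max_len with IGNORE_INDEX so padding adds no loss.
    trimmed = target_ids[:max_len]
    return trimmed + [IGNORE_INDEX] * (max_len - len(trimmed))


def shift_right(labels: List[int]) -> List[int]:
    # The decoder input at step t is the label at step t-1, and the first
    # input is the start token. IGNORE_INDEX is not a real token id, so it
    # is replaced with PAD_ID before being fed to the decoder.
    shifted = [START_ID] + labels[:-1]
    return [PAD_ID if token == IGNORE_INDEX else token for token in shifted]


labels = make_labels([17, 42, 99, 2], max_len=6)
print(labels)               # [17, 42, 99, 2, -100, -100]
print(shift_right(labels))  # [1, 17, 42, 99, 2, 0]
```

When preparing a dataset for the Trainer, only the labels need to be built this way, since the shift is handled inside the model.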

Files in this repository

| File | Purpose |
| --- | --- |
| model.safetensors | Trained model weights |
| config.json | Model configuration |
| configuration_eden.py | Configuration class for Transformers |
| modeling_eden.py | Model class for Transformers |
| tokenizer.json | Byte-level BPE tokenizer |
| eden/ | Training, fine-tuning, and inference engine |
| scripts/ | Checkpoint conversion and upload helpers |
| examples/ | Runnable usage examples |
| docs/ | Architecture and training guides |

Limitations

  • English only.
  • Best on sentence- and paragraph-length inputs, up to 512 tokens per chunk.
  • It can occasionally change wording more than intended. Beam search with the default penalties gives the most conservative edits.
  • It is not designed to answer questions, follow instructions, or generate new content from scratch.

License

Released under the Apache License 2.0. See LICENSE.

Citation

@software{eden_text_enhancement,
  title  = {EDEN: Encoder Decoder Enhancement Network},
  author = {Dunn, Ryan},
  year   = {2026},
  url    = {https://huggingface.co/Rybib/EDEN}
}