Training and fine-tuning EDEN
This guide covers retraining EDEN from scratch, fine-tuning it on your own data, and converting a checkpoint for publishing.
Install
pip install -r requirements.txt
Where files live
All training artifacts are written under a workspace folder named
eden_system, created in the directory where you run the commands. To use a
different location, set the EDEN_HOME environment variable:
export EDEN_HOME=/path/to/workspace
The layout is:
eden_system/
  data/                prepared pairs, tokenizer, training config
  checkpoints/         default checkpoint folder
  training_sessions/   numbered training runs, each with its own checkpoints
  run/                 live metrics, logs, and run state
  exports/             exported artifacts
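To check a workspace before training, the layout above can be inspected with a few lines of Python. This is a minimal sketch, not EDEN's own code: the helper names workspace_root and missing_subdirs are hypothetical, and it assumes EDEN_HOME points at the workspace folder itself, as the export example suggests.

```python
import os
from pathlib import Path

# Subfolder names taken from the layout listed above.
EXPECTED_SUBDIRS = ("data", "checkpoints", "training_sessions", "run", "exports")


def workspace_root() -> Path:
    """Return the workspace folder (assumed resolution order, not EDEN's code)."""
    override = os.environ.get("EDEN_HOME")
    if override:
        return Path(override).expanduser()
    # Default described in this guide: eden_system in the current directory.
    return Path.cwd() / "eden_system"


def missing_subdirs(root: Path) -> list[str]:
    """List the expected subfolders that do not exist under root."""
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    root = workspace_root()
    print(f"workspace: {root}")
    print(f"missing: {missing_subdirs(root)}")
```

A folder reported as missing is not necessarily a problem; some folders may only appear after the first prepare, train, or export run.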
Prepare the dataset
python -m eden.cli prepare
This downloads and combines the source corpora, generates synthetic noise pairs,
trains the byte-level BPE tokenizer, and writes everything into
eden_system/data.
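To give a feel for what a synthetic noise pair looks like, here is a small illustrative sketch. It is not the generator used by the prepare command, whose noise model is not described here; the functions add_noise and make_pair and the three corruption operations are assumptions chosen for illustration. The output uses the same input and target keys as the fine-tuning format.

```python
import random


def add_noise(text: str, rate: float, rng: random.Random) -> str:
    """Corrupt text by randomly dropping, doubling, or swapping letters."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("drop", "double", "swap"))
            if op == "drop":
                i += 1
                continue
            if op == "double":
                out.extend([c, c])
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                # Emit the next character first, then this one.
                out.extend([chars[i + 1], c])
                i += 2
                continue
        # No corruption applied (or a swap at the last character): keep as-is.
        out.append(c)
        i += 1
    return "".join(out)


def make_pair(clean: str, rate: float = 0.1, seed: int = 0) -> dict:
    """Build one training pair: noisy text as input, clean text as target."""
    rng = random.Random(seed)
    return {"input": add_noise(clean, rate, rng), "target": clean}


if __name__ == "__main__":
    print(make_pair("Another polished sentence.", rate=0.2, seed=1))
```

Seeding the random generator keeps the corruption reproducible, which makes it easier to compare runs on the same data.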
Train from scratch
python -m eden.cli train
Recipes control model size and memory use:
python -m eden.cli train --recipe survivor # smallest, always runs
python -m eden.cli train --recipe m5-smart # balanced default
python -m eden.cli train --recipe m5-large # largest, matches this release
Start with m5-smart. Move to m5-large only after a smaller recipe trains
without being stopped by the memory cutoff.
To resume:
python -m eden.cli train --resume eden_system/checkpoints/latest.pt
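When training is launched from a script, the choice between a fresh run and a resume can be automated. The sketch below is an assumption-laden convenience wrapper, not part of EDEN: build_train_command is a hypothetical helper, and it only uses the two command forms shown above (train --recipe and train --resume), never both together, since this guide does not say whether they combine.

```python
import sys
from pathlib import Path


def build_train_command(workspace: Path, recipe: str = "m5-smart") -> list[str]:
    """Build a `python -m eden.cli train` command, resuming if a checkpoint exists."""
    cmd = [sys.executable, "-m", "eden.cli", "train"]
    latest = workspace / "checkpoints" / "latest.pt"
    if latest.is_file():
        # A previous run left a checkpoint: pick up where it stopped.
        cmd += ["--resume", str(latest)]
    else:
        # No checkpoint yet: start from scratch with the chosen recipe.
        cmd += ["--recipe", recipe]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_train_command(Path("eden_system"))))
```

The returned list can be passed to subprocess.run(cmd, check=True) to start training.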
Fine-tune on your own examples
Create a JSONL file of input and target pairs:
{"input": "bad rough text here", "target": "Polished text here."}
{"input": "another messy sentance", "target": "Another polished sentence."}
CSV and TSV files with input and target columns also work. Then run:
python -m eden.cli finetune --data my_pairs.jsonl --mix-base
--mix-base blends in the base dataset so the model learns your style without
forgetting general spelling and grammar ability. Use a low learning rate for
fine-tuning, for example --lr 0.00008.
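The JSONL pairs file described above can be produced and sanity-checked before fine-tuning. This is a minimal sketch with hypothetical helper names (write_pairs, validate_pairs); the only thing taken from this guide is the file format, one JSON object per line with input and target keys. Whether EDEN itself rejects empty fields is not stated, so the check here is a conservative assumption.

```python
import json


def write_pairs(pairs, path):
    """Write (input, target) tuples as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            row = {"input": source, "target": target}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def validate_pairs(path):
    """Return the number of valid rows; raise ValueError on a malformed row."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)  # invalid JSON raises a ValueError subclass
            if not isinstance(row, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            for key in ("input", "target"):
                value = row.get(key)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"line {line_no}: missing or empty {key!r}")
            count += 1
    return count


if __name__ == "__main__":
    write_pairs(
        [
            ("bad rough text here", "Polished text here."),
            ("another messy sentance", "Another polished sentence."),
        ],
        "my_pairs.jsonl",
    )
    print(validate_pairs("my_pairs.jsonl"), "pairs written")
```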
Evaluate
python -m eden.cli eval --checkpoint eden_system/checkpoints/best.pt
Convert a checkpoint for Hugging Face
Once you have a checkpoint you like, convert it into safetensors plus the configuration and tokenizer files:
python scripts/convert_checkpoint_to_hf.py \
  --checkpoint eden_system/checkpoints/best.pt \
  --tokenizer eden_system/data/tokenizer.json \
  --out .
Then upload:
python scripts/push_to_hub.py --repo-id Rybib/EDEN
Memory safety
EDEN keeps PyTorch MPS inside a bounded memory budget and stops with a resumable checkpoint if memory use gets too high. A saved checkpoint is much better than a frozen machine. The cutoff is configurable through the training config and the recipe.
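The stop-and-checkpoint behavior described above can be pictured with a short sketch. It is not EDEN's implementation: train_with_budget, MemoryBudgetExceeded, and every parameter name are hypothetical, and the memory reader is passed in as a callable so the example runs without PyTorch. On an MPS machine, a reader such as torch.mps.current_allocated_memory could be supplied instead.

```python
from typing import Callable


class MemoryBudgetExceeded(RuntimeError):
    """Raised after a checkpoint was saved because memory crossed the cutoff."""


def train_with_budget(
    steps: int,
    run_step: Callable[[int], None],
    read_memory_bytes: Callable[[], int],
    save_checkpoint: Callable[[int], None],
    budget_bytes: int,
) -> int:
    """Run up to `steps` steps; save a checkpoint and stop if over budget."""
    for step in range(steps):
        run_step(step)
        if read_memory_bytes() > budget_bytes:
            # Save first, then stop, so the run can be resumed later.
            save_checkpoint(step)
            raise MemoryBudgetExceeded(f"stopped at step {step}")
    return steps


if __name__ == "__main__":
    usage = iter([100, 200, 900])  # simulated memory readings in bytes
    try:
        train_with_budget(
            steps=10,
            run_step=lambda step: None,
            read_memory_bytes=lambda: next(usage),
            save_checkpoint=lambda step: print(f"checkpoint saved at step {step}"),
            budget_bytes=500,
        )
    except MemoryBudgetExceeded as exc:
        print(exc)
```

Checking after each step keeps the guard simple; a real trainer might check less often to reduce overhead.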
The web dashboard
python -m eden.cli ui
# open http://127.0.0.1:7860
The dashboard can start, pause, resume, and monitor training, and run a finished
checkpoint in the browser. It launches training as a separate process using
python -m eden.cli, so make sure the eden package is importable from the
folder you launch the dashboard from.
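A quick way to confirm that the eden package is importable is to ask Python's import system directly from the folder where you plan to start the dashboard. This is a small sketch using only the standard library; is_importable is a hypothetical helper name, and it is meant for top-level package names only.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_importable("eden"):
        print("eden is importable; the dashboard should be able to launch training")
    else:
        print("eden not found; run from the repository root or install the package")
```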