How to use Rybib/EDEN with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Rybib/EDEN", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Rybib/EDEN", trust_remote_code=True, dtype="auto")
```
# Training and fine-tuning EDEN

This guide covers retraining EDEN from scratch, fine-tuning it on your own data,
and converting a checkpoint for publishing.

## Install

```bash
pip install -r requirements.txt
```
## Where files live

All training artifacts are written under a workspace folder named
`eden_system`, created next to where you run the commands. You can move the
workspace by setting the `EDEN_HOME` environment variable:

```bash
export EDEN_HOME=/path/to/workspace
```

The layout is:

```
eden_system/
  data/                prepared pairs, tokenizer, training config
  checkpoints/         default checkpoint folder
  training_sessions/   numbered training runs, each with its own checkpoints
  run/                 live metrics, logs, and run state
  exports/             exported artifacts
```
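
As a rough illustration of the workspace rule described above, the sketch below resolves the workspace path from `EDEN_HOME` and falls back to `eden_system` in the current directory. The helper names `resolve_workspace` and `workspace_paths` are hypothetical and not part of EDEN; the package's own resolution logic may differ in details.

```python
import os
from pathlib import Path

# Subfolder names taken from the layout listed in this guide.
SUBFOLDERS = ("data", "checkpoints", "training_sessions", "run", "exports")


def resolve_workspace(env=None, cwd=None):
    """Return the workspace path: EDEN_HOME if set, else <cwd>/eden_system.

    Assumption: this mirrors the behaviour the guide describes, not
    necessarily EDEN's actual implementation.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("EDEN_HOME")
    return Path(override).expanduser() if override else cwd / "eden_system"


def workspace_paths(root):
    """Map each documented subfolder name to its path under the workspace."""
    return {name: Path(root) / name for name in SUBFOLDERS}
```

Calling `workspace_paths(resolve_workspace())` gives a quick way to check where checkpoints and logs are expected to land before starting a long run.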
## Prepare the dataset

```bash
python -m eden.cli prepare
```

This downloads and combines the source corpora, generates synthetic noise pairs,
trains the byte-level BPE tokenizer, and writes everything into
`eden_system/data`.
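
The guide does not say how the synthetic noise pairs are produced. Purely to illustrate the general idea, the sketch below corrupts clean sentences with random character drops, duplications, and swaps, then pairs each noisy version with its clean original. The functions `add_noise` and `make_pairs` and the `rate` parameter are hypothetical; EDEN's real noise model is likely richer.

```python
import random


def add_noise(text, rate=0.1, rng=None):
    """Corrupt letters in text with random drops, duplications, and swaps."""
    rng = rng or random.Random()
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("drop", "double", "swap"))
            if op == "drop":
                i += 1  # skip this character entirely
                continue
            if op == "double":
                out.extend((ch, ch))  # emit the character twice
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend((chars[i + 1], ch))  # transpose with the next one
                i += 2
                continue
        out.append(ch)  # unchanged (or swap requested at end of text)
        i += 1
    return "".join(out)


def make_pairs(clean_sentences, rate=0.1, seed=0):
    """Build input/target records in the same shape as the fine-tuning data."""
    rng = random.Random(seed)
    return [{"input": add_noise(s, rate, rng), "target": s} for s in clean_sentences]
```

Seeding the generator keeps the generated pairs reproducible between runs, which makes it easier to compare training sessions.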
## Train from scratch

```bash
python -m eden.cli train
```

Recipes control model size and memory use:

```bash
python -m eden.cli train --recipe survivor   # smallest, always runs
python -m eden.cli train --recipe m5-smart   # balanced default
python -m eden.cli train --recipe m5-large   # largest, matches this release
```

Start with `m5-smart`. Move to `m5-large` only after a smaller recipe trains
without memory stops.

To resume:

```bash
python -m eden.cli train --resume eden_system/checkpoints/latest.pt
```
## Fine-tune on your own examples

Create a JSONL file of input and target pairs:

```jsonl
{"input": "bad rough text here", "target": "Polished text here."}
{"input": "another messy sentance", "target": "Another polished sentence."}
```

CSV and TSV files with `input` and `target` columns also work. Then run:

```bash
python -m eden.cli finetune --data my_pairs.jsonl --mix-base
```

`--mix-base` blends in the base dataset so the model learns your style without
forgetting general spelling and grammar ability. Use a low learning rate for
fine-tuning, for example `--lr 0.00008`.
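
Before starting a fine-tuning run, it can help to check that every row in the pairs file has the two expected fields. The sketch below is one way to do that with the standard library. The `load_pairs` helper is hypothetical and not part of the EDEN CLI, which may apply its own validation.

```python
import json


def load_pairs(lines):
    """Parse JSONL lines into input/target records, rejecting malformed rows."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        for key in ("input", "target"):
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"line {number}: missing or empty '{key}'")
        pairs.append({"input": record["input"], "target": record["target"]})
    return pairs
```

Because `load_pairs` accepts any iterable of strings, an open file handle for `my_pairs.jsonl` can be passed to it directly.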
## Evaluate

```bash
python -m eden.cli eval --checkpoint eden_system/checkpoints/best.pt
```
## Convert a checkpoint for Hugging Face

Once you have a checkpoint you like, convert it into safetensors plus the
configuration and tokenizer files:

```bash
python scripts/convert_checkpoint_to_hf.py \
  --checkpoint eden_system/checkpoints/best.pt \
  --tokenizer eden_system/data/tokenizer.json \
  --out .
```

Then upload:

```bash
python scripts/push_to_hub.py --repo-id Rybib/EDEN
```
## Memory safety

EDEN keeps PyTorch MPS inside a bounded memory budget and stops with a resumable
checkpoint if memory use gets too high. A saved checkpoint is much better than a
frozen machine. The cutoff is configurable through the training config and the
recipe.
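
The stop-with-a-checkpoint behaviour can be pictured with a small guard around the training loop. This is a minimal sketch of the pattern, not EDEN's implementation: `train_with_budget`, `MemoryBudgetExceeded`, and all parameters are hypothetical names. The memory reader is passed in as a callable so the sketch runs without PyTorch; on Apple silicon, `torch.mps.current_allocated_memory` could plausibly fill that role.

```python
class MemoryBudgetExceeded(RuntimeError):
    """Raised after a checkpoint has been saved because memory ran too high."""


def train_with_budget(steps, run_step, read_memory, save_checkpoint, budget_bytes):
    """Run training steps, stopping with a checkpoint if memory passes the budget.

    read_memory is any zero-argument callable returning bytes currently in use.
    save_checkpoint receives the index of the last completed step.
    """
    for step in range(steps):
        run_step(step)
        used = read_memory()
        if used > budget_bytes:
            # Save first, then stop, so the run can be resumed later.
            save_checkpoint(step)
            raise MemoryBudgetExceeded(
                f"step {step}: {used} bytes in use exceeds budget of {budget_bytes}"
            )
    return steps
```

Saving before raising is the key design choice here: the process exits cleanly with a checkpoint on disk instead of waiting for the operating system to stall.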
## The web dashboard

```bash
python -m eden.cli ui
# open http://127.0.0.1:7860
```

The dashboard can start, pause, resume, and monitor training, and run a finished
checkpoint in the browser. It launches training as a separate process using
`python -m eden.cli`, so make sure the `eden` package is importable from the
folder you launch it in.
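
To check the importability requirement before opening the dashboard, a short script along these lines may help. It is a sketch under stated assumptions: `is_importable` and `training_command` are hypothetical helpers, and the command shown only approximates what the dashboard launches, based on the `python -m eden.cli` invocation this guide describes.

```python
import importlib.util
import sys


def is_importable(package):
    """Return True if the top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


def training_command(*args):
    """Build a command that runs the EDEN CLI with the current interpreter."""
    return [sys.executable, "-m", "eden.cli", "train", *args]
```

If `is_importable("eden")` returns `False`, change into the folder that contains the package, or install it, before running the dashboard. The list returned by `training_command` can be passed to `subprocess.Popen` to start a run as a separate process.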