Nomen-AI
Nomen-AI is a production-ready pipeline for controllable, cross-lingual, morpho-phonetic synthesis of brand and YouTube channel names. It is designed to fit on a free-tier Google Colab T4 GPU (15 GB VRAM) by fine-tuning Qwen2.5-1.5B-Instruct with LoRA.
Current status
Status: code/data/demo ready; GPU training blocked in the agent environment.
- Machine-readable status: project_status.json
- Full report: FINAL_REPORT.md
- Blocker details: BLOCKERS.md
- No-GPU attestation: NO_GPU_EXECUTION_ATTESTATION.md
- Live CPU-safe demo: https://huggingface.co/spaces/krystv/nomen-ai-demo
Adapter repos are initialized but do not yet contain trained weights:
- SFT adapter target: https://huggingface.co/krystv/nomen-ai-sft-lora
- DPO adapter target: https://huggingface.co/krystv/nomen-ai-dpo-lora
Public assets
- Code/model repo: https://huggingface.co/krystv/nomen-ai
- SFT dataset: https://huggingface.co/datasets/krystv/nomen-ai-sft
- DPO dataset: https://huggingface.co/datasets/krystv/nomen-ai-dpo
- Base model: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
- Demo Space: https://huggingface.co/spaces/krystv/nomen-ai-demo
Architecture
- CTRL-style control-token instruction SFT:
[ROOT:japanese:40+nordic:60][THEME:gaming][SYL:3][LEN:8][CREATIVE:0.8]
- Morpho-phonetic synthetic corpus using 24 language/root families.
- DPO anti-generic phase where chosen names are novel and rejected names are derivative (TechHub, Brandify, GetZone).
- Inference-time anti-duplication matrix combining fuzzy similarity and character n-gram overlap against known brands.
- Creativity knob decoding: low creativity uses contrastive search; high creativity uses min-p sampling + higher temperature.
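The anti-duplication step listed above can be sketched with the standard library alone. This is a minimal illustration, not the repository's implementation: the function names, the use of `difflib.SequenceMatcher` as the fuzzy metric, the Jaccard form of the n-gram overlap, the rule of taking the larger of the two scores, and the 0.6 rejection threshold are all assumptions.

```python
from difflib import SequenceMatcher


def char_ngrams(name, n=3):
    """Set of lowercase character n-grams; short names yield a single gram."""
    s = name.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two names."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def fuzzy_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(candidates, known_brands, n=3):
    """Rows are candidates, columns are known brands.

    Each cell holds the larger of the fuzzy and n-gram scores (assumed rule).
    """
    return [
        [max(fuzzy_similarity(c, b), ngram_overlap(c, b, n)) for b in known_brands]
        for c in candidates
    ]


def filter_novel(candidates, known_brands, threshold=0.6):
    """Keep candidates whose worst-case similarity stays below the threshold."""
    matrix = similarity_matrix(candidates, known_brands)
    return [
        c for c, row in zip(candidates, matrix)
        if max(row, default=0.0) < threshold
    ]


if __name__ == "__main__":
    # Hypothetical brand list, for illustration only.
    print(filter_novel(["Spotifi", "Kvaluna"], ["Spotify", "Netflix"]))
```

Taking the maximum of the two metrics makes the filter conservative: a candidate is rejected if either metric flags it as too close to any known brand.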
Supported controls
24 linguistic root families: latin, greek, nordic, germanic, celtic, slavic, japanese, korean, mandarin, hindi, sanskrit, arabic, persian, turkish, swahili, yoruba, hawaiian, maori, finnish, hungarian, italian, spanish, portuguese, hebrew.
Themes: tech, gaming, beauty, vlogging, finance, lifestyle, fashion, food, fitness, music, travel, education, health, crypto, kids, luxury, eco, auto.
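As an illustration of how these controls map onto the control-token prefix shown under Architecture, the sketch below validates the inputs and assembles the string. The function name `build_control_prefix` and the validation rules (blend weights summing to 100, creativity in the range 0 to 1) are assumptions made for this example; the repository's own `ControlVector` may behave differently.

```python
# The 24 root families and 18 themes listed in this README.
ROOTS = {
    "latin", "greek", "nordic", "germanic", "celtic", "slavic", "japanese",
    "korean", "mandarin", "hindi", "sanskrit", "arabic", "persian", "turkish",
    "swahili", "yoruba", "hawaiian", "maori", "finnish", "hungarian",
    "italian", "spanish", "portuguese", "hebrew",
}
THEMES = {
    "tech", "gaming", "beauty", "vlogging", "finance", "lifestyle", "fashion",
    "food", "fitness", "music", "travel", "education", "health", "crypto",
    "kids", "luxury", "eco", "auto",
}


def build_control_prefix(roots, blend, theme, syllables, char_len, creativity):
    """Assemble a control-token prefix (hypothetical helper, not the repo API)."""
    unknown = [r for r in roots if r not in ROOTS]
    if unknown:
        raise ValueError(f"unsupported root families: {unknown}")
    if theme not in THEMES:
        raise ValueError(f"unsupported theme: {theme}")
    if len(roots) != len(blend):
        raise ValueError("roots and blend must have the same length")
    if sum(blend) != 100:  # assumed constraint: weights are percentages
        raise ValueError("blend weights must sum to 100")
    if not 0.0 <= creativity <= 1.0:  # assumed range for the creativity knob
        raise ValueError("creativity must be between 0 and 1")

    root_part = "+".join(f"{r}:{w}" for r, w in zip(roots, blend))
    return (
        f"[ROOT:{root_part}][THEME:{theme}]"
        f"[SYL:{syllables}][LEN:{char_len}][CREATIVE:{creativity}]"
    )


if __name__ == "__main__":
    print(build_control_prefix(["japanese", "nordic"], [40, 60], "gaming", 3, 8, 0.8))
```

Validating against the published lists before building the prefix keeps unsupported values out of the prompt, where the model would have no trained token to condition on.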
Train on Colab T4
git clone https://huggingface.co/krystv/nomen-ai
cd nomen-ai
pip install -q -r requirements.txt
huggingface-cli login
bash scripts/train_all_colab.sh
Or with Make:
make install
make all
Quick inference after training
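from nomen_ai.control import ControlVector
from nomen_ai.inference import NomenAI
# Load the base model with the DPO LoRA adapter (requires trained weights in the adapter repo).
engine = NomenAI("krystv/nomen-ai-dpo-lora", base_model="Qwen/Qwen2.5-1.5B-Instruct")
# 40% Japanese / 60% Nordic roots, gaming theme, 3 syllables, 8 characters, high creativity.
cv = ControlVector(roots=["japanese", "nordic"], blend=[40, 60], theme="gaming", syllables=3, char_len=8, creativity=0.8)
# Generate and print 10 candidate names.
print(engine.generate(cv, n=10))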
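The `creativity` value in the control vector is described under Architecture as selecting the decoding strategy: contrastive search when low, min-p sampling with a higher temperature when high. The sketch below shows one way such a mapping could produce keyword arguments for Hugging Face `generate`. The function name, the 0.5 switch point, and every numeric value are assumptions; `penalty_alpha`, `top_k`, `min_p`, `temperature`, and `do_sample` are existing generation parameters in `transformers`, though their availability depends on the installed version.

```python
def decoding_kwargs(creativity, switch_point=0.5):
    """Map a creativity value in [0, 1] to generation kwargs (hypothetical helper)."""
    if not 0.0 <= creativity <= 1.0:
        raise ValueError("creativity must be between 0 and 1")

    if creativity < switch_point:
        # Contrastive search: deterministic, enabled in transformers by
        # setting penalty_alpha > 0 together with a small top_k.
        return {"do_sample": False, "penalty_alpha": 0.6, "top_k": 4}

    # Min-p sampling: keep tokens whose probability is at least min_p times
    # that of the most likely token; temperature grows with creativity.
    return {
        "do_sample": True,
        "min_p": 0.05,
        "temperature": 1.0 + (creativity - switch_point),
    }


if __name__ == "__main__":
    print(decoding_kwargs(0.2))
    print(decoding_kwargs(0.8))
    # Intended use with a loaded model and tokenized inputs (not run here):
    # outputs = model.generate(**inputs, max_new_tokens=16, **decoding_kwargs(0.8))
```

Returning a plain dictionary keeps the strategy choice separate from model loading, so the mapping can be checked without a GPU.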
Research basis
- CTRL control codes: https://arxiv.org/abs/1909.05858
- DPO: https://arxiv.org/abs/2305.18290
- SimCTG / contrastive search: https://arxiv.org/abs/2202.06417 and https://arxiv.org/abs/2210.14140
- Min-p sampling: https://arxiv.org/abs/2407.01082
- TRL SFT docs: https://huggingface.co/docs/trl/sft_trainer
- TRL DPO docs: https://huggingface.co/docs/trl/dpo_trainer