Nomen-AI

Nomen-AI is a production-ready pipeline for controllable, cross-lingual, morpho-phonetic synthesis of brand and YouTube channel names. It is designed to fit a free-tier Google Colab T4 GPU (about 15 GB of usable VRAM) by fine-tuning Qwen2.5-1.5B-Instruct with LoRA.

Current status

Status: code, data, and demo are ready; GPU training is blocked in the agent environment.

Machine-readable status: project_status.json
Full report: FINAL_REPORT.md
Blocker details: BLOCKERS.md
No-GPU attestation: NO_GPU_EXECUTION_ATTESTATION.md
Live CPU-safe demo: https://huggingface.co/spaces/krystv/nomen-ai-demo

Adapter repos are initialized but do not yet contain trained weights:

SFT adapter target: https://huggingface.co/krystv/nomen-ai-sft-lora
DPO adapter target: https://huggingface.co/krystv/nomen-ai-dpo-lora

Public assets

Code/model repo: https://huggingface.co/krystv/nomen-ai
SFT dataset: https://huggingface.co/datasets/krystv/nomen-ai-sft
DPO dataset: https://huggingface.co/datasets/krystv/nomen-ai-dpo
Base model: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
Demo Space: https://huggingface.co/spaces/krystv/nomen-ai-demo

Architecture

CTRL-style control-token instruction SFT:
- [ROOT:japanese:40+nordic:60]
- [THEME:gaming]
- [SYL:3]
- [LEN:8]
- [CREATIVE:0.8]
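As a rough illustration of how the control tokens above could be assembled into a prompt prefix, the sketch below uses a hypothetical helper, `build_control_prefix`. It is not taken from the `nomen_ai` package; the token order, the lack of separators, and the validation rules (blend summing to 100, creativity in [0, 1]) are assumptions inferred from the examples listed here.

```python
from typing import Sequence


def build_control_prefix(
    roots: Sequence[str],
    blend: Sequence[int],
    theme: str,
    syllables: int,
    char_len: int,
    creativity: float,
) -> str:
    """Hypothetical helper: render control tokens in the order listed above."""
    if len(roots) != len(blend):
        raise ValueError("roots and blend must have the same length")
    if sum(blend) != 100:
        # Assumption: blend weights are percentages that sum to 100.
        raise ValueError("blend percentages must sum to 100")
    if not 0.0 <= creativity <= 1.0:
        # Assumption: the creativity knob lies in [0, 1].
        raise ValueError("creativity must be between 0 and 1")

    # Join each root with its weight, e.g. "japanese:40+nordic:60".
    root_spec = "+".join(f"{root}:{weight}" for root, weight in zip(roots, blend))
    return "".join(
        [
            f"[ROOT:{root_spec}]",
            f"[THEME:{theme}]",
            f"[SYL:{syllables}]",
            f"[LEN:{char_len}]",
            f"[CREATIVE:{creativity}]",
        ]
    )


prefix = build_control_prefix(["japanese", "nordic"], [40, 60], "gaming", 3, 8, 0.8)
print(prefix)
```

Validating the blend up front keeps malformed control strings out of both the training data and inference prompts, which matters because the model only ever sees the rendered tokens.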
Morpho-phonetic synthetic corpus built from 24 linguistic root families.
DPO anti-generic phase where chosen names are novel and rejected names are derivative (TechHub, Brandify, GetZone).
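A preference pair for this phase might look like the sketch below. The `prompt` / `chosen` / `rejected` field names follow the standard preference format that TRL's DPO trainer accepts; the actual schema of the published DPO dataset is not shown here, so treat this as an assumption. The helper `make_preference_record` is hypothetical, and the chosen name is an invented placeholder, not a sample from the dataset.

```python
import json


def make_preference_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Hypothetical helper: build one DPO preference pair."""
    if chosen.strip().lower() == rejected.strip().lower():
        raise ValueError("chosen and rejected names must differ")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Control prefix in the token format described above.
prompt = "[ROOT:japanese:40+nordic:60][THEME:gaming][SYL:3][LEN:8][CREATIVE:0.8]"

record = make_preference_record(
    prompt=prompt,
    chosen="Kyrvald",    # invented placeholder for a novel name
    rejected="TechHub",  # derivative pattern named in the text above
)

# One JSON object per line (JSONL) is a common on-disk layout for such pairs.
line = json.dumps(record)
print(line)
```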
Inference-time anti-duplication matrix combining fuzzy similarity and character n-gram overlap against known brands.
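One way such a check could work is sketched below using only the standard library. Everything here is an assumption rather than the project's implementation: `difflib.SequenceMatcher` stands in for the fuzzy similarity measure, character trigram Jaccard overlap stands in for the n-gram measure, the two are combined by taking the maximum, and the brand list and the 0.8 threshold are placeholders.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Set

# Placeholder list; a real deployment would load a much larger brand index.
KNOWN_BRANDS = ["Google", "Spotify", "Netflix"]


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of lowercase character n-grams (empty if the text is shorter than n)."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the character n-gram sets of a and b."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def fuzzy_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(candidates: Sequence[str], brands: Sequence[str]) -> List[List[float]]:
    """Rows are candidates, columns are brands; each cell is the max of both scores."""
    return [
        [max(fuzzy_similarity(c, b), ngram_overlap(c, b)) for b in brands]
        for c in candidates
    ]


def filter_novel(candidates: Sequence[str], brands: Sequence[str], threshold: float = 0.8) -> List[str]:
    """Keep candidates whose highest similarity to any brand is below the threshold."""
    matrix = similarity_matrix(candidates, brands)
    return [c for c, row in zip(candidates, matrix) if max(row, default=0.0) < threshold]


print(filter_novel(["Spotifi", "Kyrvald"], KNOWN_BRANDS))
```

Taking the maximum of the two scores makes the filter conservative: a candidate is rejected if either measure flags it as too close to a known brand.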
Creativity knob decoding: low creativity uses contrastive search; high creativity uses min-p sampling with a higher temperature.
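The creativity switch could be expressed as a mapping from the knob to generation keyword arguments, as in the sketch below. The function `decoding_kwargs` is hypothetical, and the 0.5 switch point, `penalty_alpha=0.6`, `top_k=4`, `min_p=0.05`, and the temperature schedule are illustrative assumptions, not values from this project. The key names match parameters of Hugging Face `transformers` generation (`penalty_alpha`, `top_k`, `min_p`, `temperature`, `do_sample`), but their availability depends on the installed version.

```python
from typing import Any, Dict


def decoding_kwargs(creativity: float, switch_point: float = 0.5) -> Dict[str, Any]:
    """Hypothetical mapping from the creativity knob to generation kwargs."""
    if not 0.0 <= creativity <= 1.0:
        raise ValueError("creativity must be between 0 and 1")

    if creativity < switch_point:
        # Contrastive search: deterministic decoding with a degeneration penalty
        # (penalty_alpha) applied over the top_k candidate tokens.
        return {"do_sample": False, "penalty_alpha": 0.6, "top_k": 4}

    # Min-p sampling: keep tokens whose probability is at least min_p times the
    # top token's probability, and raise the temperature with the knob.
    return {
        "do_sample": True,
        "min_p": 0.05,
        "temperature": 0.7 + creativity,  # assumed schedule, e.g. about 1.5 at 0.8
    }


low = decoding_kwargs(0.2)
high = decoding_kwargs(0.8)
print(low)
print(high)
```

A dictionary like this can be unpacked into a `generate(...)` call, which keeps the decoding policy separate from the model-loading code.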

Supported controls

24 linguistic root families: latin, greek, nordic, germanic, celtic, slavic, japanese, korean, mandarin, hindi, sanskrit, arabic, persian, turkish, swahili, yoruba, hawaiian, maori, finnish, hungarian, italian, spanish, portuguese, hebrew.

Themes: tech, gaming, beauty, vlogging, finance, lifestyle, fashion, food, fitness, music, travel, education, health, crypto, kids, luxury, eco, auto.

Train on Colab T4

git clone https://huggingface.co/krystv/nomen-ai
cd nomen-ai
pip install -q -r requirements.txt
huggingface-cli login
bash scripts/train_all_colab.sh

Or with Make:

make install
make all
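As a back-of-the-envelope check on why this setup targets a T4, the sketch below estimates the memory taken by the frozen base weights alone. The parameter count of roughly 1.54 billion is the figure reported for Qwen2.5-1.5B; the estimate deliberately ignores LoRA adapter weights, optimizer state, activations, and the KV cache, all of which add to the real footprint.

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights only, in GiB."""
    return num_params * bytes_per_param / GIB


BASE_PARAMS = 1.54e9  # approximate parameter count of Qwen2.5-1.5B

fp16_gib = weights_gib(BASE_PARAMS, 2)  # half precision, 2 bytes per parameter
fp32_gib = weights_gib(BASE_PARAMS, 4)  # single precision, 4 bytes per parameter

print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"fp32 weights: {fp32_gib:.2f} GiB")
```

In half precision the base weights come to a little under 3 GiB, which leaves most of the T4's memory for activations and the comparatively small LoRA training state.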

Quick inference after training

from nomen_ai.control import ControlVector
from nomen_ai.inference import NomenAI
engine = NomenAI("krystv/nomen-ai-dpo-lora", base_model="Qwen/Qwen2.5-1.5B-Instruct")
cv = ControlVector(roots=["japanese", "nordic"], blend=[40, 60], theme="gaming", syllables=3, char_len=8, creativity=0.8)
print(engine.generate(cv, n=10))

Research basis

CTRL control codes: https://arxiv.org/abs/1909.05858
DPO: https://arxiv.org/abs/2305.18290
SimCTG / contrastive search: https://arxiv.org/abs/2202.06417 and https://arxiv.org/abs/2210.14140
Min-p sampling: https://arxiv.org/abs/2407.01082
TRL SFT docs: https://huggingface.co/docs/trl/sft_trainer
TRL DPO docs: https://huggingface.co/docs/trl/dpo_trainer
