Text Generation
PEFT
lora
trl
naming
brand-generation
controllable-generation
nomen-ai / VALIDATION.md
krystv's picture
Document GPU preflight validation
920d5a9 verified

Validation Guide

GPU preflight

Before training, run:

python scripts/preflight_gpu.py

or:

make preflight

Expected output:

GPU_PREFLIGHT_PASS

This checks:

  • PyTorch version
  • CUDA availability
  • GPU name
  • total VRAM
  • compute capability
  • minimum T4-class VRAM

GPU smoke test

The intended GPU smoke test is:

python scripts/smoke_test.py

This loads Qwen/Qwen2.5-1.5B-Instruct, runs 15 LoRA SFT steps on 100 examples, generates one candidate, and prints SMOKE_PASS.

From the agent environment this could not be executed because GPU/HF Jobs execution was repeatedly rejected.

CPU static validation

For environments without a GPU, run:

git clone https://huggingface.co/krystv/nomen-ai
cd nomen-ai
pip install -e . datasets trl peft transformers rapidfuzz pyphen PyYAML
python tests/test_static.py

This validates:

  • 20+ root families are present.
  • Control token prompt construction.
  • Syllable/character utilities.
  • Anti-duplication matrix.
  • Synthetic example generation.
  • SFT dataset schema.
  • DPO dataset schema.
  • Current TRL SFTConfig and DPOConfig argument names used by the training scripts.

Expected output:

CPU_STATIC_VALIDATION_PASS