| --- |
| tags: |
| - ml-intern |
| --- |
| # NeuroName: Domain-Specific AI Architecture for Creative Name Generation |
|
|
| [License: MIT](https://opensource.org/licenses/MIT) |
| [Python](https://www.python.org/downloads/) |
| [PyTorch](https://pytorch.org/) |
|
|
| ## 🧠 What is NeuroName? |
|
|
| **NeuroName** is a purpose-built neural architecture for generating creative, novel names for brands, YouTube channels, social media handles, products, and more. Unlike general-purpose LLMs, which tend to produce obvious word combinations, NeuroName is designed to create **genuinely new words** that: |
|
|
| - Sound natural and pronounceable |
| - Evoke intended meanings without being literal |
| - Are controllable (length, style, language feel, energy) |
| - Are truly novel: not existing words or obvious compounds |
|
|
| ## 🔬 Why Current LLMs Fail at Creative Naming |
|
|
| | Problem | Why It Happens | NeuroName Solution | |
| |---------|---------------|-------------------| |
| | **Too generic** | LLMs predict probable tokens from the training distribution | Character-level VAE generates outside known distributions | |
| | **Obvious combinations** | Token-level = existing word chunks | Char-level latent space enables smooth morphological blending | |
| | **No sound awareness** | No phonotactic model | Dedicated Phonotactic Discriminator scores pronounceability | |
| | **Can't be truly novel** | Constrained to recombine training tokens | VAE latent interpolation creates genuinely new sequences | |
| | **No fine control** | Prompt engineering is imprecise | Energy-based composable attribute control in latent space | |
| | **RLHF kills creativity** | Safety alignment → conservative outputs | No RLHF; creativity is the objective function | |
|
|
| ## Architecture Overview |
|
|
| ``` |
| Input: semantic_hints + control_params (length, style, language_feel, energy) |
|                │ |
|                ▼ |
| ┌─────────────────────────────┐ |
| │  Semantic Encoder           │  ← Transformer encodes meaning hints |
| │  (attention-pooled)         │ |
| └──────────────┬──────────────┘ |
|                │ |
|                ▼ |
| ┌─────────────────────────────┐ |
| │  Conditional Prior          │  ← P(z|semantics, controls) - Gaussian |
| │  Network (μ, σ learned)     │ |
| └──────────────┬──────────────┘ |
|                │ |
|                ▼  z ~ N(μ, σ²) |
| ┌─────────────────────────────┐ |
| │  Latent Space + EBM         │  ← Energy-based attribute composition |
| │  (ODE-guided sampling)      │ |
| └──────────────┬──────────────┘ |
|                │ |
|                ▼ |
| ┌─────────────────────────────┐ |
| │  Character Decoder          │  ← Transformer generates char-by-char |
| │  (cross-attends to z)       │ |
| └──────────────┬──────────────┘ |
|                │ |
|                ▼ |
| ┌─────────────────────────────┐ |
| │  Phonotactic Validator      │  ← CNN+Transformer scores sound quality |
| └──────────────┬──────────────┘ |
|                │ |
|                ▼ |
| Generated Name: "Velocix" |
| ``` |
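The `z ~ N(μ, σ²)` step between the Conditional Prior Network and the latent space is commonly implemented with the reparameterization trick. The following is a minimal pure-Python sketch under that assumption, not the repository's code: `sample_latent` and the toy values are hypothetical, and a real model would work on 128-dimensional PyTorch tensors.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # noise is sampled independently of mu and sigma
        z.append(m + sigma * eps)
    return z

# Toy 4-dimensional prior parameters (hypothetical values).
rng = random.Random(0)
mu = [0.5, -1.0, 0.0, 2.0]
log_var = [-2.0, -2.0, 0.0, -20.0]
z = sample_latent(mu, log_var, rng)
```

Because the noise is drawn separately from `mu` and `sigma`, gradients can flow through both when the same expression is written with tensors. A very negative log-variance (the last dimension here) collapses the sample onto its mean.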
|
|
| ## 🧬 Key Innovations |
|
|
| ### 1. Character-Level VAE (not token-level) |
| Operates on individual characters, which allows novel character sequences that subword tokenizers, built from existing word chunks, rarely produce. |
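As an illustration of what character-level input looks like, here is a small sketch of a 44-symbol vocabulary with encode and decode helpers. The choice of special symbols is an assumption (the README only says "lowercase + digits + special"), and the names `SPECIALS`, `encode`, and `decode` are hypothetical rather than taken from the package.

```python
import string

# Hypothetical special symbols; the source does not list them.
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>", "-", "_", ".", "'"]
VOCAB = SPECIALS + list(string.ascii_lowercase) + list(string.digits)
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
MAX_LEN = 32  # maximum name length stated under Technical Details

def encode(name):
    """Map a name to ids: <bos>, one id per character, <eos>."""
    unk = CHAR_TO_ID["<unk>"]
    ids = [CHAR_TO_ID.get(ch, unk) for ch in name.lower()[:MAX_LEN]]
    return [CHAR_TO_ID["<bos>"]] + ids + [CHAR_TO_ID["<eos>"]]

def decode(ids):
    """Invert encode(), skipping the control symbols."""
    control = {"<pad>", "<bos>", "<eos>", "<unk>"}
    return "".join(VOCAB[i] for i in ids if VOCAB[i] not in control)
```

With one id per character, any string over the alphabet is representable, including strings that never appeared in training data.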
|
|
| ### 2. Phonotactic Discriminator |
| A learned model of sound combinations (bigrams, trigrams, syllable structure), informed by the **Bouba-Kiki Effect** and cross-linguistic phonotactics. It scores candidates so that outputs are pronounceable and pleasant-sounding. |
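To make "scoring sound combinations" concrete, the sketch below swaps in a much simpler technique than the learned discriminator described here: a smoothed character-bigram model whose mean log-probability serves as a rough pronounceability score. The function names and the tiny reference list are hypothetical and for illustration only.

```python
import math
from collections import Counter

def train_bigram_model(words):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    bigrams, contexts = Counter(), Counter()
    for w in words:
        padded = "^" + w.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def pronounceability(word, model, vocab_size=27, alpha=0.5):
    """Mean log-probability per bigram with add-alpha smoothing (higher is better).

    vocab_size=27 assumes 26 letters plus the end marker as possible next symbols.
    """
    bigrams, contexts = model
    padded = "^" + word.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = 0.0
    for a, b in pairs:
        p = (bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab_size)
        total += math.log(p)
    return total / len(pairs)

# Toy reference list; a usable model would need a large lexicon.
reference = ["velocity", "nexus", "flux", "nova", "terra", "lumen", "vector", "solar"]
model = train_bigram_model(reference)
```

Under this toy model, `pronounceability("velox", model)` comes out higher than `pronounceability("xqzvk", model)`, since the second string consists of bigrams the reference list never contains.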
|
|
| ### 3. Morphological Composition Module |
| Explicit linguistic word-formation operations as differentiable modules: |
| - **Blending**: "breakfast + lunch → brunch" style merging |
| - **Affixation**: Meaningful prefix/suffix attachment |
| - **Vowel Harmony**: Sound shifting for cohesion |
| - **Clipping + Extension**: Shortening with style |
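Two of the operations listed above can be illustrated at the string level. This sketch is deliberately non-differentiable and makes no claim about how `morphology.py` implements them; `blend`, `clip_extend`, and `candidate_blends` are hypothetical helper names.

```python
def blend(first, second, keep, drop):
    """Portmanteau: first `keep` chars of `first` + `second` without its first `drop` chars."""
    return first[:keep] + second[drop:]

def clip_extend(word, keep, suffix):
    """Clipping + extension: truncate a word, then attach a stylistic suffix."""
    return word[:keep] + suffix

def candidate_blends(first, second, min_len=4, max_len=8):
    """Enumerate every split point and keep blends within the length range."""
    out = set()
    for keep in range(1, len(first)):
        for drop in range(1, len(second)):
            name = blend(first, second, keep, drop)
            if min_len <= len(name) <= max_len:
                out.add(name)
    return sorted(out)

print(blend("breakfast", "lunch", keep=2, drop=1))   # brunch
print(clip_extend("velocity", keep=5, suffix="ix"))  # velocix
```

Enumerating split points with `candidate_blends` yields many candidates at once, which a pronounceability score could then rank.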
|
|
| ### 4. Energy-Based Composable Control |
| Multiple attributes (style, length, language feel) are composed via energy functions in latent space. This is a mathematically principled alternative to prompt hacking. |
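One way to picture energy composition: each attribute contributes an energy term over the latent vector, the terms are summed with weights, and the latent is moved toward low total energy. The sketch below uses toy quadratic energies and plain gradient descent as a stand-in for ODE-guided sampling. The attribute centers, weights, and function names are all hypothetical.

```python
def quadratic_energy(center):
    """E(z) = 0.5 * ||z - center||^2, with gradient z - center."""
    def energy(z):
        return 0.5 * sum((zi - ci) ** 2 for zi, ci in zip(z, center))
    def grad(z):
        return [zi - ci for zi, ci in zip(z, center)]
    return energy, grad

def compose(terms):
    """Weighted sum of energies: E(z) = sum_i w_i * E_i(z)."""
    def energy(z):
        return sum(w * e(z) for w, (e, _) in terms)
    def grad(z):
        total = [0.0] * len(z)
        for w, (_, g) in terms:
            for k, gk in enumerate(g(z)):
                total[k] += w * gk
        return total
    return energy, grad

def descend(z, grad, step=0.1, iters=500):
    """Plain gradient descent on the composed energy."""
    for _ in range(iters):
        z = [zi - step * gi for zi, gi in zip(z, grad(z))]
    return z

# Hypothetical attribute targets in a 2-D toy latent space.
style_modern = quadratic_energy([1.0, 0.0])
feel_latin = quadratic_energy([0.0, 1.0])
energy, grad = compose([(2.0, style_modern), (1.0, feel_latin)])
z_star = descend([5.0, -3.0], grad)
```

For quadratic terms the minimum is the weighted mean of the centers, here (2/3, 1/3), so the weights act as dials for how strongly each attribute pulls on the latent.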
|
|
| ### 5. Sound Symbolism Integration |
| Phoneme-meaning associations baked into the architecture: |
| - **Plosives** (b, d, k, t): Power, strength → "Kodak", "TikTok" |
| - **Fricatives** (f, s, sh, v): Speed, elegance → "Swift", "Visa" |
| - **Nasals** (m, n): Warmth, comfort → "Amazon", "Nintendo" |
| - **Close vowels** (i, e): Precision, tech → "Pixel", "Wii" |
|
|
| ## 📦 Installation |
|
|
| ```bash |
| pip install torch numpy pyyaml tqdm |
| git clone https://huggingface.co/asdf98/neuroname |
| cd neuroname |
| pip install -e . |
| ``` |
|
|
| ## Quick Start |
|
|
| ```python |
| from neuroname import NeuroNameGenerator |
| |
| # Initialize generator |
| generator = NeuroNameGenerator() |
| |
| # Generate brand names with semantic hints |
| names = generator.generate( |
|     semantic_hints=["speed", "technology", "future"], |
|     style="modern",          # modern/classic/playful/techy/organic/elegant/bold/minimal |
|     language_feel="latin",   # english/latin/greek/japanese/nordic/spanish/french/abstract |
|     energy="energetic",      # calm/neutral/energetic |
|     length_range=(5, 8), |
|     num_names=10, |
|     temperature=0.8, |
| ) |
| print(names) |
| # Example output (sampling is stochastic, so results will vary): |
| # ['Velocix', 'Tervon', 'Nexura', 'Fluxen', 'Zyphos', ...] |
| |
| # Generate YouTube channel names |
| names = generator.generate( |
|     semantic_hints=["gaming", "adventure", "epic"], |
|     style="playful", |
|     language_feel="english", |
|     energy="energetic", |
|     length_range=(6, 12), |
|     num_names=10, |
| ) |
| |
| # Generate social media handles |
| names = generator.generate( |
|     semantic_hints=["art", "minimal", "aesthetic"], |
|     style="elegant", |
|     language_feel="french", |
|     energy="calm", |
|     length_range=(4, 8), |
|     num_names=10, |
| ) |
| ``` |
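The README does not say whether `generate()` itself removes duplicates or existing words, so a caller may want to post-filter the returned list. The helper below is a hypothetical sketch that works on any list of strings. `filter_candidates` and the `known_words` argument are assumptions, not part of the package API.

```python
def filter_candidates(names, known_words, length_range=(5, 8)):
    """Keep names that fit the length range, are not known words, and are unique."""
    lo, hi = length_range
    known = {w.lower() for w in known_words}
    seen, kept = set(), []
    for name in names:
        key = name.lower()
        if not (lo <= len(key) <= hi):
            continue  # outside the requested length
        if key in known or key in seen:
            continue  # existing word or case-insensitive duplicate
        seen.add(key)
        kept.append(name)
    return kept

candidates = ["Velocix", "Tervon", "Swift", "velocix", "Nex", "Fluxen"]
print(filter_candidates(candidates, known_words=["swift", "nexus"]))
# ['Velocix', 'Tervon', 'Fluxen']
```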
|
|
| ## Training |
|
|
| ```bash |
| # Train from scratch |
| python train.py --config configs/default.yaml |
| |
| # Train with custom data |
| python train.py --data_path your_names.txt --epochs 100 |
| ``` |
|
|
| ## Repository Structure |
|
|
| ``` |
| neuroname/ |
| ├── README.md                 # This file |
| ├── pyproject.toml            # Package configuration |
| ├── neuroname/ |
| │   ├── __init__.py           # Package exports |
| │   ├── model.py              # Core architecture (VAE + all components) |
| │   ├── generator.py          # High-level generation interface |
| │   ├── phonotactics.py       # Phonotactic scoring & sound symbolism |
| │   ├── morphology.py         # Morphological composition operations |
| │   ├── latent_ops.py         # Energy-based latent space control |
| │   ├── data.py               # Dataset & data loading utilities |
| │   └── config.py             # Configuration management |
| ├── train.py                  # Training script |
| ├── configs/ |
| │   └── default.yaml          # Default training configuration |
| └── notebooks/ |
|     └── demo.ipynb            # Interactive demonstration |
| ``` |
|
|
| ## Sound Symbolism Research Basis |
|
|
| Our architecture is grounded in linguistic research on sound-meaning associations: |
|
|
| | Phoneme Type | Associations | Example Brands | |
| |-------------|--------------|----------------| |
| | Voiced plosives (b, g, d) | Strong, bold, grounded | **B**ose, **G**oogle, **D**ell | |
| | Voiceless plosives (p, t, k) | Sharp, precise, clean | **P**ayPal, **T**esla, **K**odak | |
| | Fricatives (f, v, s, z) | Fast, flowing, futuristic | **V**isa, **Z**ara, **S**potify | |
| | Nasals (m, n) | Warm, nurturing, smooth | A**m**azon, **N**intendo | |
| | Liquids (l, r) | Fluid, dynamic, premium | **L**exus, **R**olex | |
| | High vowels (i, ee) | Small, quick, technical | P**i**xel, W**ii** | |
| | Low/back vowels (a, o) | Big, open, powerful | **A**pple, V**o**lvo | |
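The table above can be turned into a quick lookup for inspecting a candidate name. The sketch below works on spelling only, which is a rough proxy for phonemes (it ignores digraphs such as "sh" and "ee"). `SOUND_CLASSES`, `sound_profile`, and `dominant_class` are hypothetical names, not part of `phonotactics.py`.

```python
from collections import Counter

# Letter classes copied from the table; spelling only approximates pronunciation.
SOUND_CLASSES = {
    "voiced_plosive": "bgd",
    "voiceless_plosive": "ptk",
    "fricative": "fvsz",
    "nasal": "mn",
    "liquid": "lr",
    "high_vowel": "i",
    "low_vowel": "ao",
}
LETTER_TO_CLASS = {ch: cls for cls, letters in SOUND_CLASSES.items() for ch in letters}

def sound_profile(name):
    """Count how many letters of `name` fall into each class."""
    return Counter(LETTER_TO_CLASS[ch] for ch in name.lower() if ch in LETTER_TO_CLASS)

def dominant_class(name):
    """Return the most frequent class, or None if no letter is classified."""
    profile = sound_profile(name)
    return profile.most_common(1)[0][0] if profile else None

print(dominant_class("Nintendo"))  # nasal
```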
|
|
| ## 🔧 Technical Details |
|
|
| - **Model Size**: ~15M parameters (intentionally small: domain-specific, not general) |
| - **Latent Dimension**: 128 |
| - **Character Vocabulary**: 44 chars (lowercase + digits + special) |
| - **Max Name Length**: 32 characters |
| - **Training**: ELBO loss + phonotactic reward + attribute classification |
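One plausible way to write the three training terms as a single loss is given below. This is a sketch rather than the repository's exact objective: $x$ is a name, $c$ the semantic hints and controls, $z$ the latent, $q_\phi$ an assumed approximate posterior, $p_\theta(z \mid c)$ the conditional prior from the architecture diagram, $R_{\text{phon}}$ the phonotactic score of a decoded sample $\hat{x}$, and $\lambda_{\text{phon}}$, $\lambda_{\text{attr}}$ hypothetical weights.

```latex
\mathcal{L}(\theta, \phi) =
  \underbrace{
    -\,\mathbb{E}_{q_\phi(z \mid x, c)}\bigl[\log p_\theta(x \mid z)\bigr]
    + D_{\mathrm{KL}}\bigl(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\bigr)
  }_{\text{negative ELBO}}
  \;-\; \lambda_{\text{phon}}\, \mathbb{E}\bigl[R_{\text{phon}}(\hat{x})\bigr]
  \;+\; \lambda_{\text{attr}}\, \mathcal{L}_{\text{attr}}(z, c)
```

Minimizing the first two terms maximizes the ELBO, the reward enters with a negative sign because it is maximized, and $\mathcal{L}_{\text{attr}}$ stands for a classification loss that predicts the control attributes from $z$.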
|
|
| ## License |
|
|
| MIT License - see LICENSE file for details. |
|
|
| ## Acknowledgments |
|
|
| Architecture inspired by: |
| - [LatentOps](https://arxiv.org/abs/2208.00638) - Composable text controls in latent space |
| - [LlaMaVAE](https://arxiv.org/abs/2312.13208) - VAE with LLM decoder |
| - [Bouba-Kiki Effect](https://en.wikipedia.org/wiki/Bouba/kiki_effect) - Sound symbolism research |
| - [Controllable Text Generation Survey](https://arxiv.org/abs/2408.12599) - CTG methods taxonomy |
|
|
| <!-- ml-intern-provenance --> |
| ## Generated by ML Intern |
|
|
| This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub. |
|
|
| - Try ML Intern: https://smolagents-ml-intern.hf.space |
| - Source code: https://github.com/huggingface/ml-intern |
|
|
| ## Usage |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model_id = "asdf98/neuroname" |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| model = AutoModelForCausalLM.from_pretrained(model_id) |
| ``` |
|
|
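| For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class. This snippet assumes the repository ships Transformers-compatible weights and configuration; the custom architecture described above is used through the `neuroname` package, as shown in Quick Start. |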
|
|