wordpalette
A ~710k-parameter character-level model that predicts the colours a word
evokes. You give it a word; it returns up to three CIELab colours with
proportions, and decides for itself how many colours a word gets (blood → one red,
france → a blue/white/red tricolour). Trained entirely on free, human-made
colour data — no large language model was used at any point, for labels or
distillation.
- Demo: https://alganet.github.io/wordpalette/
- Code & training pipeline: https://github.com/alganet/wordpalette
- Size: 2.8 MB ONNX (fp32), runs on CPU / in-browser via onnxruntime-web
- License: CC-BY-4.0 (weights) · code is MIT
What it's for
Generative/affective colour: pick a theme colour from a tag, colour a word
cloud, seed a palette from a brand or a mood, or just explore how spelling maps
to colour. Because the encoder is character-level, it colours words it has
never seen — arborious → green, crimsonish → red — from morphology alone.
Usage
Inference needs only numpy + onnxruntime (no PyTorch):
import wordpalette # pip install wordpalette
wp = wordpalette.load()
[(s.hex, round(s.proportion, 2)) for s in wp.palette("brazil")]
# [('#30a432', 0.69), ('#ffd641', 0.25)]
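To turn a palette into something displayable, one option is a CSS gradient with one band per colour. The sketch below is an illustration, not part of the package: css_gradient is a hypothetical helper, and it takes plain (hex, proportion) pairs like the ones printed above, so it runs without the model.

```python
def css_gradient(swatches):
    """Build a CSS linear-gradient from (hex, proportion) pairs.

    Proportions are renormalised to sum to 1, because slots that are
    switched off are not returned and the rest may add up to less than 1.
    """
    total = sum(p for _, p in swatches)
    stops, start = [], 0.0
    for hex_colour, p in swatches:
        end = start + 100.0 * p / total
        # Hard colour stops: each swatch fills its own band.
        stops.append(f"{hex_colour} {start:.1f}% {end:.1f}%")
        start = end
    return f"linear-gradient(to right, {', '.join(stops)})"


print(css_gradient([("#30a432", 0.69), ("#ffd641", 0.25)]))
```

With the package installed, the pairs could come straight from wp.palette(...) via s.hex and s.proportion.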
Or drive the ONNX file directly:
import json
import numpy as np
import onnxruntime as ort

with open("vocab.json") as f:
    vocab = json.load(f)
sess = ort.InferenceSession("wordpalette.onnx")

def encode(w):
    """Map a word to a fixed-length array of character ids."""
    ids = np.zeros(vocab["max_len"], dtype=np.int64)  # PAD = 0
    for i, c in enumerate(w.lower()[:vocab["max_len"]]):
        j = vocab["chars"].find(c)
        ids[i] = (j + 2) if j >= 0 else vocab["unk"]  # known chars start at id 2
    return ids

# [None] adds the batch axis the model expects.
lab_scaled, props = sess.run(None, {"chars": encode("ocean")[None]})
lab = lab_scaled[0] * np.array(vocab["scale"])  # [K, 3] real CIELab
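The snippet above stops at CIELab values. For display they need converting to sRGB; a minimal sketch using the standard CIELab → XYZ → sRGB formulas follows. It assumes a D65 reference white and simply clips out-of-gamut colours. Neither choice is confirmed by the source, so results may differ slightly from the package's own .hex values. lab_to_hex is a hypothetical name.

```python
import numpy as np

# Assumption: D65 reference white (the usual choice for sRGB).
D65 = np.array([0.95047, 1.0, 1.08883])

# Standard XYZ (D65) -> linear sRGB matrix.
XYZ_TO_RGB = np.array([
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])


def lab_to_hex(lab):
    """Convert one CIELab triple (L, a, b) to an sRGB hex string."""
    L, a, b = (float(v) for v in lab)
    fy = (L + 16.0) / 116.0
    f = np.array([fy + a / 500.0, fy, fy - b / 200.0])
    delta = 6.0 / 29.0
    # Inverse of the CIELab transfer function, per channel.
    xyz = D65 * np.where(f > delta, f ** 3, 3 * delta ** 2 * (f - 4.0 / 29.0))
    # Clip to the sRGB gamut (a crude but simple gamut mapping).
    lin = np.clip(XYZ_TO_RGB @ xyz, 0.0, 1.0)
    # sRGB gamma encoding.
    srgb = np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1 / 2.4) - 0.055)
    r, g, bl = (int(v) for v in np.round(np.clip(srgb, 0.0, 1.0) * 255))
    return f"#{r:02x}{g:02x}{bl:02x}"


print(lab_to_hex((100.0, 0.0, 0.0)))  # white
```

Applied to the arrays above, this would be called once per row of lab, typically only for the slots that are switched on.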
I/O. Input chars: int64 [batch, 24]. The characters a–z and # map to ids 2
and up, any other character maps to UNK (1), and the sequence is zero-padded
(PAD = 0). Outputs: lab_scaled [batch, 3, 3] (multiply by
scale = [100, 110, 110] for real Lab) and props [batch, 3] (softmax
proportions). A slot is "on" when its proportion is ≥ 0.15; always keep at
least the dominant slot.
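The on/off rule above can be sketched in a few lines of numpy. active_slots is a hypothetical helper name; the 0.15 threshold and the keep-the-dominant-slot rule are taken from the text.

```python
import numpy as np


def active_slots(props, threshold=0.15):
    """Return the indices of the slots that are "on" for one word.

    props: 1-D array of softmax proportions (length 3 for this model).
    """
    props = np.asarray(props, dtype=float)
    on = props >= threshold
    on[np.argmax(props)] = True  # always keep the dominant slot
    return np.flatnonzero(on)


print(active_slots([0.69, 0.25, 0.06]))
```

For a batch, the function would be applied row by row to props, and the returned indices used to pick the matching rows of the Lab array.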
How it was trained
Ten free colour sources — Simple Icons, Company Colors, color-names (meodai),
XKCD, CSS, GitHub label colours, emoji dominant-colour, flag palettes, WordNet
glosses, and Wikipedia first sentences — are normalised to word → CIELab with
a trust weight, then coalesced into one belief per word (aggregate GitHub
duplicates, denoise over-segmented palettes, drop cross-source outliers,
demote logo colours that contradict a word's everyday sense). A causal GRU is
then trained per-prefix in scaled Lab with:
- a hybrid loss: crisp where multi-colour ground truth exists (flags), flexible where it doesn't (single-colour words keep plausible secondaries);
- card-trust (per-source trust in cardinality): WordNet teaches hue but not colour count;
- EMA weight-averaging for seed-robustness.
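As an illustration of the last item, EMA weight-averaging keeps a slowly moving average of the parameters alongside the ones being optimised and uses the average for evaluation or export. The sketch below is generic and framework-free. It is not the repository's code, and the ema_update name, the decay value, and the toy "training" loop are assumptions.

```python
import numpy as np


def ema_update(ema, weights, decay=0.999):
    """One EMA step per tensor: ema <- decay * ema + (1 - decay) * weights."""
    return {name: decay * ema[name] + (1.0 - decay) * w
            for name, w in weights.items()}


rng = np.random.default_rng(0)
weights = {"w": np.zeros(3)}
ema = {name: w.copy() for name, w in weights.items()}

for step in range(100):
    # Stand-in for an optimiser step: noisy movement towards 1.0.
    noise = 0.01 * rng.normal(size=3)
    weights["w"] = weights["w"] + 0.1 * (1.0 - weights["w"]) + noise
    ema = ema_update(ema, weights, decay=0.9)

print(ema["w"])  # a smoothed copy of the weights, less sensitive to step noise
```

Averaging this way damps the step-to-step noise of the final weights, which is consistent with the seed-robustness the list mentions.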
The full, runnable pipeline (and the reasoning behind each choice) is in the GitHub repo. Reproduction is a short CPU/GPU job from the committed data.
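For a rough feel of the coalescing step described earlier (several trust-weighted observations per word merged into one belief, with cross-source outliers dropped), here is a minimal sketch for the single-colour case only. It is not the repository's method: the trust-weighted mean, the Euclidean Lab distance, the 40-unit cut-off, and the coalesce name are all assumptions made for illustration.

```python
import numpy as np


def coalesce(observations, outlier_distance=40.0):
    """Merge (lab, trust) observations for one word into a single Lab colour.

    observations: list of ((L, a, b), trust) pairs from different sources.
    outlier_distance: hypothetical cut-off, in Lab units, from the first
        trust-weighted mean; observations farther away are dropped.
    """
    labs = np.array([lab for lab, _ in observations], dtype=float)
    trust = np.array([t for _, t in observations], dtype=float)

    # First pass: trust-weighted centre of all observations.
    centre = np.average(labs, axis=0, weights=trust)

    # Second pass: drop observations far from the centre, then re-average.
    keep = np.linalg.norm(labs - centre, axis=1) <= outlier_distance
    if not keep.any():
        return centre
    return np.average(labs[keep], axis=0, weights=trust[keep])


obs = [((50, 60, 40), 1.0), ((52, 62, 42), 1.0), ((90, -5, 80), 0.1)]
print(coalesce(obs))  # the low-trust, far-away third observation is dropped
```

Multi-colour words such as flags would need a palette-aware merge rather than a single mean, which this sketch does not attempt.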
Evaluation & limitations
The model was evaluated during development against the NRC/Mohammad word–colour association lexicon (held out, never trained on), plus flag-palette accuracy and behavioural probes. It is a small model over noisy, partial supervision: it captures regularities that are well represented in the data (thousands of natural kinds, morphology, flags) but not isolated facts (a handful of planets or trademark-only brands may be approximate). Colours are associative and sometimes contestable, so treat outputs as evocative, not authoritative.
Attribution
Weights derive from the sources listed above; please credit them (see the repo README for per-source licences). Notably, WordNet (WordNet License, Princeton) and Wikipedia text (CC-BY-SA) contribute. All sources are human-made; none involved an LLM.
Citation
@software{wordpalette,
  author = {Gomes Gaigalas, Alexandre},
  title  = {wordpalette: a tiny character-level word-to-colour model},
  year   = {2026},
  url    = {https://github.com/alganet/wordpalette}
}