vec2slug-v1-openai-large
Generate URL slugs directly from text embeddings, without re-feeding source text through a language model. Designed to piggyback on embeddings a system already has for search or deduplication.
| Property | Value |
|---|---|
| Parameters | 24.8M |
| Architecture | Transformer decoder, 6L, d=512 |
| Input | OpenAI text-embedding-3-small (1536d) |
| Vocab | BPE, 5000 subwords |
| Token F1 | 0.306 |
| ONNX size | 95.1 MiB |
| Inference (CPU) | ~41ms (M-series), ~160ms (budget VPS) |
Inference is 14 to 19× faster and approximately 85× cheaper than a Haiku-class LLM call for the same task, including the cost of computing a fresh embedding. With existing embeddings (the intended use case), it is approximately 2,000× cheaper.
This is the larger of two variants. It achieves the higher Token F1 of the two, at 2× the inference cost of the smaller model.
See also: Vec2Slug V1-Openai-Small
Quickstart
# install dependencies
pip install onnxruntime numpy
# or run directly with uv
uv run inference.py . --input embeddings.npy
from inference import OnnxPredictor
import numpy as np
predictor = OnnxPredictor.from_dir(".")
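# embeddings: [N, 1536] float32 from OpenAI text-embedding-3-small,
# loaded here from the same embeddings.npy file used in the CLI example above
embeddings = np.load("embeddings.npy").astype(np.float32)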
slugs = predictor.predict(embeddings)
# ["how-neural-networks-learn", "climate-change-solutions", ...]
PyTorch inference (requires torch):
from inference import PyTorchPredictor
predictor = PyTorchPredictor.from_dir(".")
slugs = predictor.predict(embeddings)
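Since the predictors expect `[N, 1536]` float32 input, it can help to normalize the array before calling `predict`. The helper below is a minimal sketch and not part of `inference.py`; the name `prepare_embeddings` and its checks are assumptions made for illustration.

```python
import numpy as np

EXPECTED_DIM = 1536  # width of text-embedding-3-small vectors, per the table above


def prepare_embeddings(arr) -> np.ndarray:
    """Hypothetical helper: coerce input to a contiguous [N, 1536] float32 array."""
    arr = np.asarray(arr, dtype=np.float32)
    if arr.ndim == 1:
        # A single vector becomes a batch of one.
        arr = arr[None, :]
    if arr.ndim != 2 or arr.shape[1] != EXPECTED_DIM:
        raise ValueError(f"expected shape [N, {EXPECTED_DIM}], got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or infinite values")
    return np.ascontiguousarray(arr)


batch = prepare_embeddings(np.zeros(EXPECTED_DIM))
print(batch.shape, batch.dtype)
```

A shape check like this surfaces the most likely mistake early: passing vectors from a different embedding model, which will generally have a different width.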
Examples
Predictions on held-out test samples (beam search, width 4). The model sees only the 1536-dim embedding, never the source text.
| Source text | Reference slug | Predicted slug |
|---|---|---|
| Children's book about astronomy and living on Mars | can-we-live-on-mars | can-we-live-on-mars |
| Teaching resources for Martin Luther King Jr. Day | celebrating-martin-luther-king-jr-day | celebrating-martin-luther-king-jr-day |
| Article about Waldorf education practices | 12-things-may-not-know-waldorf-education | 10-things-you-didnt-know-about-waldorf-education |
The third example illustrates the typical case: the model captures the topic correctly but diverges in specific wording. The common failure mode is overgeneralization rather than incoherence.
How it works
The model is a prefix-conditioned transformer decoder. A precomputed text embedding is linearly projected into the decoder's hidden space and placed at position 0 as a prefix token. The decoder then autoregressively generates BPE subword tokens that form a kebab-case URL slug.
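The prefix conditioning described above can be sketched in NumPy. This is an illustration only: the weights are random, and the names `W_proj`, `tok_table`, and `build_decoder_input` are assumptions rather than identifiers from the released model. It shows only how the decoder input is assembled, not the transformer layers themselves.

```python
import numpy as np

EMBED_DIM, HIDDEN, VOCAB = 1536, 512, 5000  # sizes taken from the table above

rng = np.random.default_rng(0)
# Hypothetical parameters; the real values live in the released weights.
W_proj = rng.normal(scale=0.02, size=(EMBED_DIM, HIDDEN)).astype(np.float32)
b_proj = np.zeros(HIDDEN, dtype=np.float32)
tok_table = rng.normal(scale=0.02, size=(VOCAB, HIDDEN)).astype(np.float32)


def build_decoder_input(embedding: np.ndarray, token_ids) -> np.ndarray:
    """Place the projected embedding at position 0, followed by token vectors."""
    prefix = embedding @ W_proj + b_proj            # [HIDDEN]
    ids = np.asarray(token_ids, dtype=np.int64)
    tokens = tok_table[ids]                         # [T, HIDDEN]
    return np.concatenate([prefix[None, :], tokens], axis=0)  # [1 + T, HIDDEN]


def causal_mask(n: int) -> np.ndarray:
    """Position i may attend to positions 0..i, so every token sees the prefix."""
    return np.tril(np.ones((n, n), dtype=bool))


x = build_decoder_input(np.ones(EMBED_DIM, dtype=np.float32), [1, 2, 3])
print(x.shape)  # (4, 512)
```

Because the prefix sits at position 0 under a causal mask, every generated token can attend to it, which is how the embedding steers the whole slug.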
Beam search uses a bounded additive length reward with score-based optimal stopping (Huang et al. 2017). All decoding parameters are stored in `model.json`.
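To make the decoding rule concrete, here is a toy sketch of beam search with a bounded additive length reward and a score-based stopping test. It is one reading of the technique, not the code in `inference.py`. The names `step_fn`, `reward`, and `cap`, their values, and the toy model are all assumptions; the real parameters are the ones stored in `model.json`.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker
Hypothesis = Tuple[List[str], float]  # (tokens, summed log-probability)


def score(hyp: Hypothesis, reward: float, cap: int) -> float:
    """Log-probability plus a per-token reward that stops growing at `cap`."""
    tokens, logp = hyp
    length = len(tokens) - (1 if tokens and tokens[-1] == EOS else 0)
    return logp + reward * min(length, cap)


def beam_search(step_fn: Callable[[List[str]], Dict[str, float]],
                beam_width: int = 4, reward: float = 0.5,
                cap: int = 4, max_steps: int = 16) -> List[str]:
    active: List[Hypothesis] = [([], 0.0)]
    best_done: Optional[Tuple[float, List[str]]] = None
    for t in range(1, max_steps + 1):
        # Expand every active hypothesis by every candidate next token.
        candidates = [(tokens + [tok], logp + lp)
                      for tokens, logp in active
                      for tok, lp in step_fn(tokens).items()]
        candidates.sort(key=lambda h: score(h, reward, cap), reverse=True)
        active = []
        for hyp in candidates[:beam_width]:
            if hyp[0][-1] == EOS:
                s = score(hyp, reward, cap)
                if best_done is None or s > best_done[0]:
                    best_done = (s, hyp[0][:-1])
            else:
                active.append(hyp)
        if not active:
            break
        if best_done is not None:
            # Future log-probs are <= 0 and the remaining reward is bounded,
            # so this is an upper bound on any score still reachable.
            best_active = max(score(h, reward, cap) for h in active)
            headroom = reward * max(0, cap - t)
            if best_done[0] >= best_active + headroom:
                break
    if best_done is None:
        return max(active, key=lambda h: score(h, reward, cap))[0] if active else []
    return best_done[1]


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Toy next-token model that prefers one fixed four-token slug."""
    target = ["how", "neural", "networks", "learn"]
    i = len(prefix)
    if i < len(target) and prefix == target[:i]:
        return {target[i]: math.log(0.8), EOS: math.log(0.2)}
    return {EOS: math.log(0.9), "the": math.log(0.1)}


print("-".join(beam_search(toy_step)))  # how-neural-networks-learn
```

Capping the reward is what makes early stopping safe: once a finished hypothesis beats the best unfinished one plus the remaining reward headroom, no longer continuation can overtake it.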
Files
| File | Description |
|---|---|
| `model.onnx` | ONNX model (forward pass only) |
| `model.json` | Sidecar: vocabulary, beam search config, stopwords |
| `model.pt` | PyTorch weights (state_dict) |
| `tokenizer.json` | BPE tokenizer (HuggingFace tokenizers format) |
| `inference.py` | Standalone inference script (uv run compatible) |
| `manifest.train.json` | Training configuration and results |
| `manifest.onnx.json` | Export verification (tolerance, argmax agreement) |
| `history.train.jsonl` | Training loss/metric curves |
Training
Trained on 2.3M documents from FineWeb-Edu with slugs extracted from source URLs. The extraction pipeline filters on language, slug format, Gopher repetition, and token count.
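As a rough illustration of the slug-format and token-count filters, the sketch below pulls the last path segment from a URL and keeps it only if it looks like a kebab-case slug. The regular expression, the stripped file extensions, and the `min_tokens`/`max_tokens` thresholds are assumptions, not the pipeline's actual rules. The language and Gopher repetition filters are not shown.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed slug shape: lowercase alphanumeric words joined by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)+$")


def extract_slug(url: str, min_tokens: int = 2, max_tokens: int = 12) -> Optional[str]:
    """Return the final path segment if it is a plausible slug, else None."""
    path = urlparse(url).path.rstrip("/")
    segment = path.rsplit("/", 1)[-1].lower()
    # Drop a trailing file extension such as .html or .php.
    segment = re.sub(r"\.(html?|php|aspx?)$", "", segment)
    if not SLUG_RE.match(segment):
        return None
    n_tokens = len(segment.split("-"))
    return segment if min_tokens <= n_tokens <= max_tokens else None


print(extract_slug("https://example.com/blog/how-neural-networks-learn/"))
print(extract_slug("https://example.com/p?id=3"))
```

The first call yields `how-neural-networks-learn`; the second yields `None` because the path segment `p` has no hyphenated words.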
The BPE vocabulary has 5,000 subwords, with `-` as a special token. The model was trained for 36 epochs with label smoothing (0.1) and position-aware EOS loss weighting. The best checkpoint was at step 70,560.
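For readers unfamiliar with label smoothing, here is a minimal pure-Python sketch of the common formulation, in which the target distribution is a mix of the one-hot label and a uniform distribution over the vocabulary. Whether training used exactly this variant is an assumption. The position-aware EOS weighting is not shown, since its form is not described here.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def smoothed_cross_entropy(logits: List[float], target: int,
                           smoothing: float = 0.1) -> float:
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing * uniform."""
    logp = log_softmax(logits)
    nll = -logp[target]                 # standard cross-entropy term
    uniform = -sum(logp) / len(logp)    # cross-entropy against the uniform distribution
    return (1.0 - smoothing) * nll + smoothing * uniform


confident = [10.0, 0.0, 0.0]
print(smoothed_cross_entropy(confident, 0, smoothing=0.0))
print(smoothed_cross_entropy(confident, 0, smoothing=0.1))
```

With smoothing at 0.1, the very confident prediction is penalized more than under plain cross-entropy, which discourages the model from pushing logits to extremes.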
Evaluation
Evaluated on 5,000 held-out test samples using the full beam search decoding pipeline.
| Metric | Value |
|---|---|
| Token F1 (macro) | 0.306 |
| Exact match | 2.1% |
| ROUGE-L | 0.284 |
| BERTScore F1 | 0.872 |
| Validity | 100% |
| Vocab diversity | 97.8% |
Token F1 splits both slugs on hyphens and computes set-overlap F1 (order ignored). ROUGE-L measures the longest common subsequence and penalizes misordered words. BERTScore computes contextual embedding similarity via roberta-large; the floor is high (~0.82) because short English slugs are not widely separated in that embedding space.
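The two overlap metrics can be sketched in a few lines of Python. This follows the description above: Token F1 over hyphen-split sets, and ROUGE-L as an F-measure over the longest common subsequence. The exact ROUGE-L variant used in the evaluation (for example its precision/recall weighting) is an assumption, and these scores are per-pair, before any macro averaging.

```python
from typing import List


def _f1(overlap: int, n_pred: int, n_ref: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def token_f1(reference: str, predicted: str) -> float:
    """Set-overlap F1 on hyphen-split tokens; word order is ignored."""
    ref, pred = set(reference.split("-")), set(predicted.split("-"))
    return _f1(len(ref & pred), len(pred), len(ref))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(reference: str, predicted: str) -> float:
    """LCS-based F1 on hyphen-split tokens; misordered words lose credit."""
    ref, pred = reference.split("-"), predicted.split("-")
    return _f1(lcs_length(ref, pred), len(pred), len(ref))


ref = "12-things-may-not-know-waldorf-education"
pred = "10-things-you-didnt-know-about-waldorf-education"
print(round(token_f1(ref, pred), 3))                           # 0.533
print(token_f1("live-on-mars", "mars-on-live"))                # 1.0
print(round(rouge_l_f1("live-on-mars", "mars-on-live"), 3))    # 0.333
```

The Waldorf pair from the Examples table shares 4 tokens out of 7 reference and 8 predicted tokens, giving F1 = 8/15. The reordered Mars pair shows the difference between the metrics: Token F1 is unaffected by order, while ROUGE-L drops to 1/3.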
Limitations
- Requires precomputed embeddings from OpenAI `text-embedding-3-small`. Other embedding models will produce poor results.
- Trained on English web content. Non-English or domain-specific text may produce generic or inaccurate slugs.
- Slugs reflect patterns in the training URLs, which include SEO-influenced and editorially inconsistent sources.
- The primary failure mode is overgeneralization: the model captures the topic but may miss specific angles or proper nouns (`asm` instead of `wasm` for a WebAssembly article).
Citation
@misc{vec2slug2026,
title={vec2slug: URL Slug Generation from Text Embeddings},
author={Mahmoud, Bilal and {HASH}},
year={2026},
url={https://github.com/hashintel/labs}
}