cuneiformBase-400m
Introducing cuneiformBase-400m, a multilingual model that handles translation, transliteration, and script conversion for five ancient languages and scripts: Akkadian, Sumerian, Hittite, Linear B, and Elamite.
1. Model Description
This is an instruction-tuned model based on Google's umt5-base (768 hidden dimensions, 12 encoder layers, 12 decoder layers). Unlike the original UMT5 architecture, which uses untied input/output embeddings, this model uses tied embeddings (~396M parameters). It supports translation to and from English (and German for Hittite), transliteration from native script to Latin characters, and conversion from Latin transliteration back to native script (cuneiform or the Linear B syllabary) for the supported languages.
Three styles of transliteration are supported where applicable:
- Plain transliteration -- standard scholarly transliteration following CDLI notation style
- Complex transliteration -- includes special symbols, subscript numbers, and determinatives
- Simple transliteration -- stripped of all special symbols and diacritics, syllables merged to form words
Akkadian Instructions
Translation:
| Prompt | Input | Output |
|---|---|---|
| Translate Akkadian cuneiform to English: | cuneiform signs | English |
| Translate Akkadian transliteration to English: | transliteration | English |
| Translate complex Akkadian transliteration to English: | complex transliteration | English |
| Translate simple Akkadian transliteration to English: | simple transliteration | English |
| Translate English to Akkadian cuneiform: | English | cuneiform signs |
| Translate English to Akkadian transliteration: | English | transliteration |
| Translate English to complex Akkadian transliteration: | English | complex transliteration |
| Translate English to simple Akkadian transliteration: | English | simple transliteration |
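Every prompt in these tables follows one pattern: "Translate {source} to {target}:" followed by the input text. The sketch below composes such prompts from their parts; the helper names `describe` and `translation_prompt` are hypothetical, and the single space after the colon follows the usage example in section 2.

```python
from typing import Optional


def describe(language: str, script: str, style: Optional[str] = None) -> str:
    """One side of a task, e.g. 'complex Akkadian transliteration'."""
    return f"{style} {language} {script}" if style else f"{language} {script}"


def translation_prompt(
    text: str,
    language: str = "Akkadian",
    script: str = "cuneiform",
    style: Optional[str] = None,
    modern: str = "English",
    reverse: bool = False,
) -> str:
    """Build a translation prompt following the pattern in the table above.

    style: None for cuneiform or plain transliteration, else "complex" or "simple".
    reverse: False for ancient -> modern, True for modern -> ancient.
    """
    ancient = describe(language, script, style)
    source, target = (modern, ancient) if reverse else (ancient, modern)
    # The instruction ends in a colon; one space separates it from the input.
    return f"Translate {source} to {target}: {text}"


print(translation_prompt("a-na be-li2-ia", script="transliteration"))
# Translate Akkadian transliteration to English: a-na be-li2-ia
```

The same helper reproduces the Sumerian and Hittite translation prompts (for example `language="Hittite", modern="German"`), but it only yields strings that appear in these tables if you pass combinations the model actually supports.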
Transliteration:
| Prompt | Input | Output |
|---|---|---|
| Transliterate Akkadian cuneiform to Latin characters: | cuneiform signs | transliteration |
| Transliterate Akkadian cuneiform to complex Latin characters: | cuneiform signs | complex transliteration |
| Transliterate Akkadian cuneiform to simple Latin characters: | cuneiform signs | simple transliteration |
Script Conversion:
| Prompt | Input | Output |
|---|---|---|
| Convert transliterated Latin characters to Akkadian cuneiform: | transliteration | cuneiform signs |
| Convert complex transliterated Latin characters to Akkadian cuneiform: | complex transliteration | cuneiform signs |
| Convert simple transliterated Latin characters to Akkadian cuneiform: | simple transliteration | cuneiform signs |
Sumerian Instructions
Translation:
| Prompt | Input | Output |
|---|---|---|
| Translate Sumerian cuneiform to English: | cuneiform signs | English |
| Translate Sumerian transliteration to English: | transliteration | English |
| Translate complex Sumerian transliteration to English: | complex transliteration | English |
| Translate simple Sumerian transliteration to English: | simple transliteration | English |
| Translate English to Sumerian cuneiform: | English | cuneiform signs |
| Translate English to Sumerian transliteration: | English | transliteration |
Transliteration:
| Prompt | Input | Output |
|---|---|---|
| Transliterate Sumerian cuneiform to Latin characters: | cuneiform signs | transliteration |
| Transliterate Sumerian cuneiform to complex Latin characters: | cuneiform signs | complex transliteration |
Script Conversion:
| Prompt | Input | Output |
|---|---|---|
| Convert transliterated Latin characters to Sumerian cuneiform: | transliteration | cuneiform signs |
Hittite Instructions
Translation:
| Prompt | Input | Output |
|---|---|---|
| Translate Hittite transliteration to English: | transliteration | English |
| Translate complex Hittite transliteration to English: | complex transliteration | English |
| Translate simple Hittite transliteration to English: | simple transliteration | English |
| Translate Hittite transliteration to German: | transliteration | German |
| Translate complex Hittite transliteration to German: | complex transliteration | German |
| Translate simple Hittite transliteration to German: | simple transliteration | German |
| Translate English to Hittite transliteration: | English | transliteration |
| Translate German to Hittite transliteration: | German | transliteration |
Linear B Instructions
Translation:
| Prompt | Input | Output |
|---|---|---|
| Translate Linear B cuneiform to English: | Linear B signs | English |
| Translate Linear B transliteration to English: | transliteration | English |
| Translate complex Linear B transliteration to English: | complex transliteration | English |
| Translate simple Linear B transliteration to English: | simple transliteration | English |
| Translate English to Linear B cuneiform: | English | Linear B signs |
| Translate English to Linear B transliteration: | English | transliteration |
Transliteration:
| Prompt | Input | Output |
|---|---|---|
| Transliterate Linear B cuneiform to Latin characters: | Linear B signs | transliteration |
Script Conversion:
| Prompt | Input | Output |
|---|---|---|
| Convert transliterated Latin characters to Linear B cuneiform: | transliteration | Linear B signs |
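Since the right prompt depends on whether the input is written in native signs or in Latin transliteration, a pipeline may want to detect the script before choosing a prompt. A minimal sketch using Unicode block ranges; the function name `detect_script` and the idea of auto-selecting prompts are illustrative additions, not part of the model or its released code.

```python
def detect_script(text: str) -> str:
    """Return 'cuneiform', 'linear_b' or 'latin' depending on which native block dominates."""
    counts = {"cuneiform": 0, "linear_b": 0}
    for ch in text:
        cp = ord(ch)
        # Cuneiform, Cuneiform Numbers and Punctuation, Early Dynastic Cuneiform
        if 0x12000 <= cp <= 0x1254F:
            counts["cuneiform"] += 1
        # Linear B Syllabary, Linear B Ideograms, Aegean Numbers
        elif 0x10000 <= cp <= 0x1013F:
            counts["linear_b"] += 1
    if not any(counts.values()):
        return "latin"  # no native-script signs found; assume transliteration
    return max(counts, key=counts.get)


print(detect_script("a-na be-li2-ia"))  # latin
```

A caller could then map `"cuneiform"` to, for example, "Translate Akkadian cuneiform to English: " and `"latin"` to the corresponding transliteration prompt. The language itself (Akkadian vs. Sumerian) cannot be inferred from the script alone.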
Elamite
Elamite was included in training on a limited corpus. Due to insufficient validation data, no evaluation metrics are reported for Elamite at this time. Use with caution and expect lower accuracy than the other supported languages.
Base Model
This is a fine-tuned version of Google's umt5-base with tied input/output embeddings.
2. Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "Thalesian/cuneiformBase-400m"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# Example: translate Akkadian cuneiform to English.
# The prompt ends with a space and is prepended directly to the input text.
prompt = "Translate Akkadian cuneiform to English: "
input_text = "..."  # one line of space-separated Akkadian cuneiform signs

inputs = tokenizer(prompt + input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Prediction:", prediction)
```

> "witness Nabu-naṣir son Na di-Issar servant of son king"
3. Training and Evaluation Data
Training data came from the Akkademia project, previously published in PNAS Nexus. Additional pre-training and training data came from CDLI (Akkadian and Sumerian), the OARE dataset (Akkadian), the HPM corpus (Hittite), published syllabary resources (Linear B), and a limited Elamite corpus. More information on the training data and on the test and validation splits can be found in the project's GitHub repository and in the published methodology.
Training Procedure
The model was trained in multiple stages with different datasets and collators across all supported languages.
Framework Versions
- Transformers 5.0.0.dev0
- PyTorch 2.6.0+cu126
- Tokenizers 0.21.1
4. Evaluation Metrics
4.1 Akkadian
4.1.1 Akkadian Metrics by Line
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Akkadian | Transliteration | Akkadian | Cuneiform | 95.78 | 95.33 | - |
| Akkadian | Cuneiform | English | Latin | 66.86 | 78.26 | 0.78 |
| Akkadian | Transliteration | English | Latin | 69.78 | 80.56 | 0.80 |
| Akkadian | Complex Transliteration | English | Latin | 69.78 | 80.58 | 0.80 |
| Akkadian | Simple Transliteration | English | Latin | 67.65 | 78.70 | 0.78 |
| English | Latin | Akkadian | Cuneiform | 45.61 | 45.48 | - |
| English | Latin | Akkadian | Transliteration | 43.54 | 65.42 | - |
| Akkadian | Cuneiform | Akkadian | Transliteration | 83.63 | 92.22 | - |
4.1.2 Akkadian Metrics by Document
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Akkadian | Transliteration | Akkadian | Cuneiform | 39.47 | 50.44 | - |
| Akkadian | Cuneiform | English | Latin | 34.14 | 51.54 | 0.49 |
| Akkadian | Transliteration | English | Latin | 35.94 | 53.19 | 0.50 |
| Akkadian | Complex Transliteration | English | Latin | 36.17 | 53.46 | 0.51 |
| Akkadian | Simple Transliteration | English | Latin | 33.95 | 51.61 | 0.49 |
| English | Latin | Akkadian | Cuneiform | 24.14 | 29.94 | - |
| English | Latin | Akkadian | Transliteration | 18.84 | 35.57 | - |
| Akkadian | Cuneiform | Akkadian | Transliteration | 26.13 | 41.81 | - |
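The card reports scores both per line and per document, but does not spell out how document-level scores are aggregated. One plausible reading, sketched here purely as an assumption, is that the line outputs of each document are joined before scoring, so that each document becomes a single hypothesis/reference pair. The helper name `group_by_document` and the `(doc_id, hypothesis, reference)` input format are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_document(
    rows: Iterable[Tuple[str, str, str]],
) -> Tuple[List[str], List[str]]:
    """Join line-level (doc_id, hypothesis, reference) rows into per-document pairs.

    Lines are concatenated with a space, in the order they appear; documents
    keep the order of their first line (dicts preserve insertion order).
    """
    docs: Dict[str, Tuple[List[str], List[str]]] = {}
    for doc_id, hyp, ref in rows:
        hyps, refs = docs.setdefault(doc_id, ([], []))
        hyps.append(hyp)
        refs.append(ref)
    hypotheses = [" ".join(h) for h, _ in docs.values()]
    references = [" ".join(r) for _, r in docs.values()]
    return hypotheses, references
```

The two lists could then be passed to a corpus-level scorer such as sacreBLEU's `corpus_bleu(hypotheses, [references])` or `corpus_chrf(hypotheses, [references])`.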
4.1.3 Akkadian Metrics by Line (CDLI Test Set)
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Akkadian | Transliteration | Akkadian | Cuneiform | 29.79 | 46.17 | - |
| Akkadian | Complex Transliteration | Akkadian | Cuneiform | 32.05 | 47.47 | - |
| Akkadian | Simple Transliteration | Akkadian | Cuneiform | 16.97 | 24.64 | - |
| Akkadian | Cuneiform | English | Latin | 20.74 | 46.63 | 0.41 |
| Akkadian | Transliteration | English | Latin | 24.10 | 51.15 | 0.45 |
| Akkadian | Complex Transliteration | English | Latin | 23.99 | 51.30 | 0.45 |
| Akkadian | Simple Transliteration | English | Latin | 20.74 | 46.85 | 0.40 |
| English | Latin | Akkadian | Transliteration | 29.45 | 60.35 | - |
| English | Latin | Akkadian | Complex Transliteration | 18.06 | 47.32 | - |
| English | Latin | Akkadian | Simple Transliteration | 2.02 | 25.48 | - |
| Akkadian | Cuneiform | Akkadian | Transliteration | 36.74 | 74.18 | - |
| Akkadian | Cuneiform | Akkadian | Complex Transliteration | 29.49 | 68.63 | - |
| Akkadian | Cuneiform | Akkadian | Simple Transliteration | 1.80 | 30.50 | - |
4.1.4 Akkadian Metrics by Document (CDLI Test Set)
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Akkadian | Transliteration | Akkadian | Cuneiform | 68.01 | 75.51 | - |
| Akkadian | Complex Transliteration | Akkadian | Cuneiform | 67.40 | 74.77 | - |
| Akkadian | Simple Transliteration | Akkadian | Cuneiform | 38.09 | 40.73 | - |
| Akkadian | Cuneiform | English | Latin | 23.39 | 47.78 | 0.44 |
| Akkadian | Transliteration | English | Latin | 26.23 | 50.91 | 0.47 |
| Akkadian | Complex Transliteration | English | Latin | 25.41 | 50.86 | 0.47 |
| Akkadian | Simple Transliteration | English | Latin | 22.48 | 47.68 | 0.44 |
| English | Latin | Akkadian | Transliteration | 28.88 | 51.57 | - |
| English | Latin | Akkadian | Complex Transliteration | 15.19 | 38.29 | - |
| English | Latin | Akkadian | Simple Transliteration | 1.75 | 23.31 | - |
| Akkadian | Cuneiform | Akkadian | Transliteration | 34.48 | 57.21 | - |
| Akkadian | Cuneiform | Akkadian | Complex Transliteration | 28.99 | 53.30 | - |
| Akkadian | Cuneiform | Akkadian | Simple Transliteration | 1.42 | 24.87 | - |
4.1.5 Akkadian Metrics by Document (OARE Test Set)
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Akkadian | Transliteration | Akkadian | Cuneiform | 15.57 | 21.40 | - |
| Akkadian | Complex Transliteration | Akkadian | Cuneiform | 18.39 | 28.34 | - |
| Akkadian | Simple Transliteration | Akkadian | Cuneiform | 10.35 | 15.89 | - |
| Akkadian | Cuneiform | English | Latin | 1.72 | 17.53 | 0.13 |
| Akkadian | Transliteration | English | Latin | 0.78 | 17.67 | 0.10 |
| Akkadian | Complex Transliteration | English | Latin | 0.86 | 17.52 | 0.11 |
| Akkadian | Simple Transliteration | English | Latin | 1.53 | 18.32 | 0.14 |
| English | Latin | Akkadian | Transliteration | 0.77 | 16.17 | - |
| English | Latin | Akkadian | Complex Transliteration | 0.53 | 12.87 | - |
| English | Latin | Akkadian | Simple Transliteration | 0.33 | 10.94 | - |
| Akkadian | Cuneiform | Akkadian | Transliteration | 0.96 | 19.75 | - |
| Akkadian | Cuneiform | Akkadian | Complex Transliteration | 1.39 | 19.28 | - |
| Akkadian | Cuneiform | Akkadian | Simple Transliteration | 0.38 | 11.88 | - |
4.2 Sumerian
4.2.1 Sumerian Metrics by Line
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Sumerian | Transliteration | Sumerian | Cuneiform | 98.85 | 98.87 | - |
| Sumerian | Cuneiform | English | Latin | 19.40 | 40.43 | 0.38 |
| Sumerian | Transliteration | English | Latin | 23.81 | 46.00 | 0.45 |
| Sumerian | Complex Transliteration | English | Latin | 23.96 | 45.88 | 0.45 |
| Sumerian | Simple Transliteration | English | Latin | 21.53 | 43.43 | 0.41 |
| English | Latin | Sumerian | Cuneiform | 52.28 | 55.05 | - |
| English | Latin | Sumerian | Transliteration | 42.02 | 62.72 | - |
| Sumerian | Cuneiform | Sumerian | Transliteration | 39.08 | 64.61 | - |
| Sumerian | Cuneiform | Sumerian | Complex Transliteration | 37.66 | 63.61 | - |
4.2.2 Sumerian Metrics by Document
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Sumerian | Transliteration | Sumerian | Cuneiform | 78.74 | 83.74 | - |
| Sumerian | Cuneiform | English | Latin | 24.99 | 45.43 | 0.43 |
| Sumerian | Transliteration | English | Latin | 30.34 | 50.82 | 0.51 |
| Sumerian | Complex Transliteration | English | Latin | 30.43 | 50.58 | 0.50 |
| Sumerian | Simple Transliteration | English | Latin | 26.15 | 47.91 | 0.45 |
| English | Latin | Sumerian | Cuneiform | 52.58 | 55.17 | - |
| English | Latin | Sumerian | Transliteration | 48.35 | 62.48 | - |
| Sumerian | Cuneiform | Sumerian | Transliteration | 39.88 | 58.59 | - |
| Sumerian | Cuneiform | Sumerian | Complex Transliteration | 37.78 | 57.10 | - |
4.3 Hittite
4.3.1 Hittite Metrics by Line
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Hittite | Transliteration | English | Latin | 95.62 | 97.41 | 0.97 |
| Hittite | Complex Transliteration | English | Latin | 95.08 | 97.05 | 0.97 |
| Hittite | Simple Transliteration | English | Latin | 93.45 | 96.19 | 0.96 |
| Hittite | Transliteration | German | Latin | 86.88 | 94.25 | 0.93 |
| Hittite | Complex Transliteration | German | Latin | 86.64 | 94.07 | 0.92 |
| Hittite | Simple Transliteration | German | Latin | 79.82 | 91.13 | 0.89 |
| English | Latin | Hittite | Transliteration | 55.89 | 84.47 | - |
| German | Latin | Hittite | Transliteration | 49.18 | 83.33 | - |
4.3.2 Hittite Metrics by Document
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Hittite | Transliteration | English | Latin | 65.23 | 72.79 | 0.70 |
| Hittite | Complex Transliteration | English | Latin | 65.39 | 73.06 | 0.71 |
| Hittite | Simple Transliteration | English | Latin | 63.61 | 71.94 | 0.69 |
| Hittite | Transliteration | German | Latin | 56.01 | 68.22 | 0.65 |
| Hittite | Complex Transliteration | German | Latin | 56.47 | 68.87 | 0.66 |
| Hittite | Simple Transliteration | German | Latin | 49.32 | 64.91 | 0.62 |
| English | Latin | Hittite | Transliteration | 28.38 | 47.49 | - |
| German | Latin | Hittite | Transliteration | 24.24 | 45.63 | - |
The Hittite validation scripts were based on CTH numbers. However, the line-level English BLEU score (95.62) is implausibly high, and we believe data leaked from a manually generated training set. This may also affect the German results, although the German scores are consistent with those of earlier models released before the additional English set was added.
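A quick way to probe suspected leakage of this kind is to list validation lines that also occur verbatim in the training data. A minimal sketch, assuming both splits are available as lists of source-side strings; the function names and the whitespace/case normalisation are illustrative choices, not the procedure used for this card.

```python
from typing import Iterable, List


def _normalise(line: str) -> str:
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(line.lower().split())


def exact_overlap(train: Iterable[str], valid: Iterable[str]) -> List[str]:
    """Return validation lines that also occur in the training set after normalisation."""
    seen = {_normalise(line) for line in train}
    return [line for line in valid if _normalise(line) in seen]


train = ["nu-za ku-it", "A-NA LUGAL"]
valid = ["a-na  lugal", "ki-i"]
print(exact_overlap(train, valid))  # ['a-na  lugal']
```

Lower-casing merges Sumerograms written in capitals with syllabic spellings, which may be too aggressive for Hittite; a stricter check would compare only whitespace-normalised strings.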
4.4 Linear B
Note: Line-level and document-level metrics are identical for Linear B, as the validation set consists of single-line documents.
4.4.1 Linear B Metrics
| From Language | From Script | To Language | To Script | BLEU | CHRF | METEOR |
|---|---|---|---|---|---|---|
| Linear B | Transliteration | Linear B | Syllabary | 86.24 | 88.29 | - |
| Linear B | Syllabary | English | Latin | 50.41 | 62.82 | 0.67 |
| Linear B | Transliteration | English | Latin | 56.51 | 66.23 | 0.70 |
| Linear B | Complex Transliteration | English | Latin | 68.33 | 73.42 | 0.78 |
| Linear B | Simple Transliteration | English | Latin | 28.24 | 44.50 | 0.50 |
| English | Latin | Linear B | Syllabary | 50.76 | 52.18 | - |
| English | Latin | Linear B | Transliteration | 52.21 | 64.18 | - |
| Linear B | Syllabary | Linear B | Transliteration | 50.98 | 73.45 | - |
4.5 Elamite
Elamite was included during training on a limited corpus. Due to insufficient validation data, no evaluation metrics are available. Results should be treated as experimental.
5. Intended Uses
- Translation of short lines of text in Akkadian, Sumerian, Hittite, and Linear B
- Transliteration pipelines converting between cuneiform signs and Latin-script representations
- Reverse translation from English/German back to ancient language transliterations or cuneiform
- Comparative studies across multiple ancient writing systems
- Educational and research applications in digital Assyriology, Sumerology, Hittitology, and Aegean scripts
6. Limitations
- Context window is limited to 512 tokens; longer texts should be split into individual lines.
- Sumerian translation quality is notably lower than for the other languages, owing to the complexity of Sumerian and the limited parallel data available for it.
- Elamite support is experimental with minimal training data.
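- Performance on the out-of-domain OARE Akkadian test set is severely degraded (document-level BLEU below 2 for every task except conversion from transliteration to cuneiform), indicating domain sensitivity.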
- The model was trained on scholarly transliterations and may not generalize well to non-standard input formats.
- Linear B prompts use the term "cuneiform" for the syllabary script for consistency with the prompt format; Linear B is a syllabic script, not cuneiform.
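Because of the 512-token limit, longer texts should be fed to the model one line at a time. The sketch below splits a document into non-empty lines, prefixes each with a prompt, and separates out any line that would still exceed the limit. `split_for_model` is a hypothetical helper, and `count_tokens` stands in for a tokenizer call (for example `lambda s: len(tokenizer(s).input_ids)` with the tokenizer from section 2).

```python
from typing import Callable, List, Tuple


def split_for_model(
    document: str,
    prompt: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> Tuple[List[str], List[str]]:
    """Split a document into prompted lines and report any that exceed the token limit."""
    ok: List[str] = []
    too_long: List[str] = []
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sections
        text = prompt + line
        if count_tokens(text) <= max_tokens:
            ok.append(text)
        else:
            too_long.append(text)  # needs manual splitting before translation
    return ok, too_long


# Whitespace word count as a stand-in tokenizer, with a tiny limit for illustration.
ok, too_long = split_for_model("a b\n\nc d e f\n", "P: ", lambda s: len(s.split()), max_tokens=4)
print(ok, too_long)  # ['P: a b'] ['P: c d e f']
```

The strings in `ok` can then be tokenized together with `tokenizer(ok, return_tensors="pt", padding=True)` and passed to `model.generate` as a batch.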
7. How to Cite
```bibtex
@misc{drake2025cuneiformBase400m,
  title        = {{cuneiformBase-400m}: A Multilingual T5 Model for Ancient Script Translation and Transliteration},
  author       = {Drake, B. Lee},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Thalesian/cuneiformBase-400m}}
}
```