# Sanskrit & Akkadian Dense LLM

A small experimental language model trained on ancient languages, specifically Sanskrit and Akkadian/Old Babylonian texts.
## What Is This?
This is a research experiment exploring whether a compact transformer architecture can learn meaningful representations of ancient languages. The model was trained from scratch on a custom dataset mixing Sanskrit and Akkadian texts, with some mathematical word problems included.
This is not a production model. It's a public experiment to:
- Test a custom dense transformer architecture on low-resource ancient languages
- See what behaviors emerge from training on Sanskrit and Akkadian together
- Gather community feedback to guide future work
## Quick Start
```bash
pip install torch einops transformers huggingface_hub
```
```python
import os, sys, json, torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "AlgoDriveAI/Sanskrit_Akkadian_LLM"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_dense_llm.py")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")

sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_dense_llm import DenseLLM

with open(config_path) as f:
    config = json.load(f)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = DenseLLM(
    vocab_size=config["vocab_size"],
    d_model=config["d_model"],
    n_layers=config["n_layers"],
    n_heads=config["n_heads"],
    n_kv_heads=config["n_kv_heads"],
    ff_hidden_mult=config["ff_hidden_mult"],
    qk_norm=config["qk_norm"],
    parallel_residual=config["parallel_residual"],
    max_seq_len=config["max_seq_len"],
).to(device=device, dtype=dtype)
model.load_state_dict(torch.load(weights_path, map_location="cpu"), strict=True)
model.eval()

# Generate
prompt = "Translate to English: "
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=200,
        temperature=0.55,
        top_k=35,
        top_p=0.88,
        eos_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Fun and Easy Gradio Demo
```python
import os, sys, json, torch
import gradio as gr
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# ── Load model once at startup ──────────────────────────────────────────────
repo_id = "AlgoDriveAI/Sanskrit_Akkadian_LLM"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

print("Downloading model files...")
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_dense_llm.py")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")

sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_dense_llm import DenseLLM

with open(config_path) as f:
    config = json.load(f)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

print(f"Loading model on {device} ({dtype})...")
model = DenseLLM(
    vocab_size=config["vocab_size"],
    d_model=config["d_model"],
    n_layers=config["n_layers"],
    n_heads=config["n_heads"],
    n_kv_heads=config["n_kv_heads"],
    ff_hidden_mult=config["ff_hidden_mult"],
    qk_norm=config["qk_norm"],
    parallel_residual=config["parallel_residual"],
    max_seq_len=config["max_seq_len"],
).to(device=device, dtype=dtype)
model.load_state_dict(torch.load(weights_path, map_location="cpu"), strict=True)
model.eval()
print("Model ready!\n")

# ── Token-by-token streaming generator ──────────────────────────────────────
@torch.inference_mode()
def stream_generate(
    prompt: str,
    max_new_tokens: int = 200,
    temperature: float = 0.55,
    top_k: int = 35,
    top_p: float = 0.88,
):
    """Yield the decoded continuation one token at a time so Gradio can stream it."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    generated = input_ids.clone()
    prompt_len = input_ids.shape[1]  # tracked separately so trimming below stays aligned
    for _ in range(max_new_tokens):
        logits = model(generated)               # (B, seq_len, vocab)
        next_logits = logits[:, -1, :].float()  # last position, fp32 for stability
        # temperature
        if temperature > 0:
            next_logits = next_logits / temperature
        # top-k
        if top_k > 0:
            topk_vals, _ = torch.topk(next_logits, top_k)
            next_logits[next_logits < topk_vals[:, -1:]] = float("-inf")
        # top-p (nucleus)
        if top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(next_logits, descending=True)
            sorted_probs = torch.softmax(sorted_logits, dim=-1)
            cumulative = torch.cumsum(sorted_probs, dim=-1)
            mask = cumulative - sorted_probs >= top_p  # keep the first token past the threshold
            sorted_logits[mask] = float("-inf")
            next_logits = next_logits.scatter(1, sorted_idx, sorted_logits)
        probs = torch.softmax(next_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated = torch.cat([generated, next_token], dim=-1)
        # Trim the context if it outgrows max_seq_len (there is no KV cache here; the
        # full sequence is re-run each step), shifting prompt_len to keep decoding aligned
        if generated.shape[1] > config["max_seq_len"]:
            overflow = generated.shape[1] - config["max_seq_len"]
            generated = generated[:, overflow:]
            prompt_len = max(0, prompt_len - overflow)
        # Decode only the NEW tokens (skip the prompt echo)
        decoded = tokenizer.decode(
            generated[0, prompt_len:], skip_special_tokens=True
        )
        yield decoded

# ── Gradio interface ────────────────────────────────────────────────────────
def respond(prompt, max_tokens, temperature, top_k, top_p):
    """Streaming wrapper for Gradio (sliders may deliver floats, so cast the int args)."""
    if not prompt.strip():
        yield ""
        return
    for partial in stream_generate(
        prompt, int(max_tokens), temperature, int(top_k), top_p
    ):
        yield partial

with gr.Blocks(
    title="Sanskrit / Akkadian LLM",
    theme=gr.themes.Soft(),
) as demo:
    gr.Markdown("# Sanskrit / Akkadian LLM\n*AlgoDriveAI — DenseLLM architecture*")
    with gr.Row():
        with gr.Column(scale=3):
            prompt_box = gr.Textbox(
                label="Prompt",
                placeholder="e.g. Translate to English: ...",
                lines=3,
            )
            output_box = gr.Textbox(
                label="Output (streaming)",
                lines=12,
                interactive=False,
            )
            generate_btn = gr.Button("Generate", variant="primary")
        with gr.Column(scale=1):
            max_tokens = gr.Slider(
                16, 512, value=200, step=1, label="Max new tokens"
            )
            temperature = gr.Slider(
                0.0, 2.0, value=0.55, step=0.05, label="Temperature"
            )
            top_k = gr.Slider(0, 100, value=35, step=1, label="Top-K")
            top_p = gr.Slider(0.0, 1.0, value=0.88, step=0.01, label="Top-P")
    generate_btn.click(
        fn=respond,
        inputs=[prompt_box, max_tokens, temperature, top_k, top_p],
        outputs=output_box,
    )
    # Also trigger on Enter in the prompt box
    prompt_box.submit(
        fn=respond,
        inputs=[prompt_box, max_tokens, temperature, top_k, top_p],
        outputs=output_box,
    )

demo.queue()
demo.launch(server_name="0.0.0.0", server_port=7860, share=False)
```
## Architecture
| Component | Details |
|---|---|
| Type | Custom Dense Transformer (DenseLLM) |
| Parameters | ~444M |
| Context Length | 2048 tokens |
| Attention | Grouped-Query Attention (GQA) |
| Positional Encoding | RoPE |
| Activation | SwiGLU |
| Normalization | RMSNorm |
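DenseLLM's actual attention code isn't reproduced here, but the core idea of GQA can be sketched generically, using this model's head counts (16 query heads sharing 2 key/value heads, see the hyperparameters below):

```python
import torch

def gqa_expand_kv(k, v, n_heads, n_kv_heads):
    # Each of the n_kv_heads key/value heads is shared by a group of query heads,
    # shrinking the K/V projection matrices by a factor of n_heads / n_kv_heads.
    group = n_heads // n_kv_heads             # 16 // 2 = 8 query heads per KV head
    k = k.repeat_interleave(group, dim=1)     # (B, n_kv_heads, T, hd) -> (B, n_heads, T, hd)
    v = v.repeat_interleave(group, dim=1)
    return k, v

B, T, hd = 1, 4, 64
k = torch.randn(B, 2, T, hd)                  # 2 KV heads, as in this model
v = torch.randn(B, 2, T, hd)
k_full, v_full = gqa_expand_kv(k, v, n_heads=16, n_kv_heads=2)
print(k_full.shape)                           # torch.Size([1, 16, 4, 64])
```

Query heads 0–7 attend against copies of KV head 0, heads 8–15 against KV head 1.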

| Hyperparameter | Value |
|---|---|
| d_model | 1024 |
| n_layers | 16 |
| n_heads | 16 |
| n_kv_heads | 2 |
| ff_mult | 4.0 |
| vocab_size | 200,064 |
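As a sanity check on the ~444M figure: assuming tied input/output embeddings, GQA-shaped K/V projections, and a SwiGLU FFN with three d_model × (ff_mult · d_model) matrices (these conventions are assumptions, not confirmed by the repo), the table's numbers land almost exactly on the stated count:

```python
d_model, n_layers, n_heads, n_kv_heads = 1024, 16, 16, 2
vocab_size, ff_mult = 200_064, 4
head_dim = d_model // n_heads                       # 64
kv_dim = n_kv_heads * head_dim                      # 128

embed = vocab_size * d_model                        # ~205M, tied with the LM head (assumed)
attn = 2 * d_model * d_model + 2 * d_model * kv_dim # Q and O full-size, K and V shrunk by GQA
ffn = 3 * d_model * (ff_mult * d_model)             # SwiGLU: gate, up, down (assumed shapes)
total = embed + n_layers * (attn + ffn)
print(f"{total / 1e6:.0f}M")                        # 444M
```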
## Training Data
- Sanskrit texts (various sources)
- Akkadian / Old Babylonian texts
- Mathematical word problems (light mixing)
Dataset composition and preprocessing details are still being documented.
## Known Limitations
- Hallucinations: The model may generate plausible-sounding but incorrect translations or glosses
- Repetition: Sometimes falls into repetitive loops, especially with longer generation
- Language blending: May unexpectedly mix Sanskrit and Akkadian in outputs
- Long context: Quality degrades on longer inputs well before the 2048-token context limit is reached
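The repetition loops mentioned above can often be damped at sampling time. Whether DenseLLM's `generate` exposes a repetition penalty is not documented, but the standard CTRL-style penalty is easy to apply to the logits by hand before sampling:

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Penalize every token id that already appears in the context:
    # positive logits are divided by the penalty, negative ones multiplied,
    # so repeated tokens always become less likely.
    unique = torch.unique(generated_ids)
    scores = logits[0, unique]
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    logits[0, unique] = scores
    return logits

logits = torch.tensor([[2.0, -1.0, 0.5, 3.0]])
seen = torch.tensor([0, 1])                  # token ids already generated
out = apply_repetition_penalty(logits, seen)
# token 0: 2.0 / 1.2 ≈ 1.67; token 1: -1.0 * 1.2 = -1.2; tokens 2 and 3 unchanged
```

This would slot into `stream_generate` right before the temperature step.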
## Prompting Tips
The model responds better to explicit instructions:
```text
Translate to English: [text]
Give Akkadian transliteration and meaning: [text]
Translate to Sanskrit: [text]
```
If outputs are repetitive or incoherent, try:
- Lower temperature (0.3–0.5)
- Lower top_p (0.7–0.8)
- Shorter prompts
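The temperature advice works because dividing logits by T < 1 sharpens the softmax, concentrating probability mass on the highest-scoring tokens before sampling. A quick self-contained illustration (the logit values are made up):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [3.0, 2.0, 1.0, 0.5]
for T in (1.0, 0.55):
    probs = softmax([x / T for x in logits])
    print(T, round(probs[0], 3))   # top token: ~0.631 at T=1.0, ~0.834 at T=0.55
```

Lower temperature makes sampling more conservative; T → 0 approaches greedy decoding.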
## Feedback Welcome
If you experiment with this model, I'd appreciate hearing about:
- Prompts that worked well (or didn't)
- Unexpected behaviors or failure modes
- Suggestions for training data or architecture changes
Your feedback directly informs the next iteration.
## Contact
- Email: csm4211@wgu.edu
- Organization: AlgoDriveAI
- Repository: AlgoDriveAI/Sanskrit_Akkadian_LLM
## Citation
If you use or reference this model in your work:
```bibtex
@misc{algodrive2025sanskrit_akkadian,
  author    = {Smith, Christopher and {AlgoDriveAI}},
  title     = {Sanskrit \& Akkadian Dense LLM},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AlgoDriveAI/Sanskrit_Akkadian_LLM}
}
```
## License
MIT