Sanskrit & Akkadian Dense LLM

A small experimental language model trained on ancient languages—specifically Sanskrit and Akkadian/Old Babylonian texts.

What Is This?

This is a research experiment exploring whether a compact transformer architecture can learn meaningful representations of ancient languages. The model was trained from scratch on a custom dataset mixing Sanskrit and Akkadian texts, with some mathematical word problems included.

This is not a production model. It's a public experiment to:

  1. Test a custom dense transformer architecture on low-resource ancient languages
  2. See what behaviors emerge from training on Sanskrit and Akkadian together
  3. Gather community feedback to guide future work

Quick Start

pip install torch einops transformers huggingface_hub
import os, sys, json, torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "AlgoDriveAI/Sanskrit_Akkadian_LLM"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_dense_llm.py")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")

sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_dense_llm import DenseLLM

with open(config_path) as f:
    config = json.load(f)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = DenseLLM(
    vocab_size=config["vocab_size"],
    d_model=config["d_model"],
    n_layers=config["n_layers"],
    n_heads=config["n_heads"],
    n_kv_heads=config["n_kv_heads"],
    ff_hidden_mult=config["ff_hidden_mult"],
    qk_norm=config["qk_norm"],
    parallel_residual=config["parallel_residual"],
    max_seq_len=config["max_seq_len"],
).to(device=device, dtype=dtype)

model.load_state_dict(torch.load(weights_path, map_location="cpu"), strict=True)
model.eval()

# Generate
prompt = "Translate to English: "
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=200,
        temperature=0.55,
        top_k=35,
        top_p=0.88,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Fun and Easy Gradio Demo

import os, sys, json, torch
import gradio as gr
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# ── Load model once at startup ──────────────────────────────────────────────

repo_id = "AlgoDriveAI/Sanskrit_Akkadian_LLM"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

print("Downloading model files...")
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_dense_llm.py")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")

sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_dense_llm import DenseLLM

with open(config_path) as f:
    config = json.load(f)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

print(f"Loading model on {device} ({dtype})...")
model = DenseLLM(
    vocab_size=config["vocab_size"],
    d_model=config["d_model"],
    n_layers=config["n_layers"],
    n_heads=config["n_heads"],
    n_kv_heads=config["n_kv_heads"],
    ff_hidden_mult=config["ff_hidden_mult"],
    qk_norm=config["qk_norm"],
    parallel_residual=config["parallel_residual"],
    max_seq_len=config["max_seq_len"],
).to(device=device, dtype=dtype)

model.load_state_dict(torch.load(weights_path, map_location="cpu"), strict=True)
model.eval()
print("Model ready!\n")


# ── Token-by-token streaming generator ──────────────────────────────────────

@torch.inference_mode()
def stream_generate(
    prompt: str,
    max_new_tokens: int = 200,
    temperature: float = 0.55,
    top_k: int = 35,
    top_p: float = 0.88,
):
    """Yield one token at a time so Gradio can stream the output."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    generated = input_ids.clone()
    new_ids: list[int] = []  # track generated ids so decoding survives window trimming

    for _ in range(max_new_tokens):
        logits = model(generated)                # (B, seq_len, vocab)
        next_logits = logits[:, -1, :].float()   # last position, fp32 for stability

        # temperature (0 → greedy argmax below)
        if temperature > 0:
            next_logits = next_logits / temperature

        # top-k
        if top_k > 0:
            topk_vals, _ = torch.topk(next_logits, top_k)
            next_logits[next_logits < topk_vals[:, -1:]] = float("-inf")

        # top-p (nucleus)
        if top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(next_logits, descending=True)
            sorted_probs = torch.softmax(sorted_logits, dim=-1)
            cumulative = torch.cumsum(sorted_probs, dim=-1)
            # Keep tokens whose cumulative probability, excluding themselves, is < top_p
            mask = cumulative - sorted_probs >= top_p
            sorted_logits[mask] = float("-inf")
            next_logits = sorted_logits.scatter(1, sorted_idx, sorted_logits)

        if temperature == 0:
            next_token = next_logits.argmax(dim=-1, keepdim=True)
        else:
            probs = torch.softmax(next_logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)

        if next_token.item() == tokenizer.eos_token_id:
            break

        new_ids.append(next_token.item())
        generated = torch.cat([generated, next_token], dim=-1)

        # Sliding window: there is no KV cache here, so once the sequence exceeds
        # max_seq_len we simply drop the oldest tokens before the next forward pass
        if generated.shape[1] > config["max_seq_len"]:
            generated = generated[:, -config["max_seq_len"]:]

        # Decode only the NEW tokens (skip the prompt echo)
        yield tokenizer.decode(new_ids, skip_special_tokens=True)


# ── Gradio interface ────────────────────────────────────────────────────────

def respond(prompt, max_tokens, temperature, top_k, top_p):
    """Streaming wrapper for Gradio."""
    if not prompt.strip():
        yield ""
        return
    for partial in stream_generate(prompt, max_tokens, temperature, top_k, top_p):
        yield partial


with gr.Blocks(
    title="Sanskrit / Akkadian LLM",
    theme=gr.themes.Soft(),
) as demo:
    gr.Markdown("# Sanskrit / Akkadian LLM\n*AlgoDriveAI — DenseLLM architecture*")

    with gr.Row():
        with gr.Column(scale=3):
            prompt_box = gr.Textbox(
                label="Prompt",
                placeholder="e.g.  Translate to English: ...",
                lines=3,
            )
            output_box = gr.Textbox(
                label="Output (streaming)",
                lines=12,
                interactive=False,
            )
            generate_btn = gr.Button("Generate", variant="primary")

        with gr.Column(scale=1):
            max_tokens = gr.Slider(
                16, 512, value=200, step=1, label="Max new tokens"
            )
            temperature = gr.Slider(
                0.0, 2.0, value=0.55, step=0.05, label="Temperature"
            )
            top_k = gr.Slider(0, 100, value=35, step=1, label="Top-K")
            top_p = gr.Slider(0.0, 1.0, value=0.88, step=0.01, label="Top-P")

    generate_btn.click(
        fn=respond,
        inputs=[prompt_box, max_tokens, temperature, top_k, top_p],
        outputs=output_box,
    )
    # Also trigger on Enter in the prompt box
    prompt_box.submit(
        fn=respond,
        inputs=[prompt_box, max_tokens, temperature, top_k, top_p],
        outputs=output_box,
    )

demo.queue()
demo.launch(server_name="0.0.0.0", server_port=7860, share=False)

Architecture

  Component             Details
  Type                  Custom Dense Transformer (DenseLLM)
  Parameters            ~444M
  Context Length        2048 tokens
  Attention             Grouped-Query Attention (GQA)
  Positional Encoding   RoPE
  Activation            SwiGLU
  Normalization         RMSNorm

  Hyperparameter        Value
  d_model               1024
  n_layers              16
  n_heads               16
  n_kv_heads            2
  ff_mult               4.0
  vocab_size            200,064
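The ~444M figure can be sanity-checked from the hyperparameters. This is a back-of-the-envelope sketch that assumes tied input/output embeddings, bias-free linear layers, and a three-matrix SwiGLU feed-forward block (gate, up, down); those shapes are assumptions, not confirmed against the actual `modeling_dense_llm.py`:

```python
# Hypothetical parameter count from the published hyperparameters.
d_model, n_layers, n_heads, n_kv_heads = 1024, 16, 16, 2
ff_mult, vocab_size = 4.0, 200_064
head_dim = d_model // n_heads                  # 64

embedding = vocab_size * d_model               # assumed shared with the output head
attn = (d_model * d_model                      # Q projection
        + 2 * d_model * n_kv_heads * head_dim  # K and V (grouped, 2 KV heads)
        + d_model * d_model)                   # output projection
ffn_hidden = int(ff_mult * d_model)            # 4096
ffn = 3 * d_model * ffn_hidden                 # gate + up + down (SwiGLU)

total = embedding + n_layers * (attn + ffn)
print(f"{total / 1e6:.0f}M")                   # → 444M
```

Under these assumptions the total lands at about 444M, matching the table; norm weights and other small tensors add well under 0.1M.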

Training Data

  • Sanskrit texts (various sources)
  • Akkadian / Old Babylonian texts
  • Mathematical word problems (light mixing)

Dataset composition and preprocessing details are still being documented.

Known Limitations

  • Hallucinations: The model may generate plausible-sounding but incorrect translations or glosses
  • Repetition: Sometimes falls into repetitive loops, especially with longer generation
  • Language blending: May unexpectedly mix Sanskrit and Akkadian in outputs
  • Long context: Performance degrades on longer inputs, even well within the 2048-token context window
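A common mitigation for the repetition failure mode is a CTRL-style repetition penalty applied to the logits before sampling. The model's bundled generate does not expose this, so the following is a generic, framework-free sketch (function name and the 1.2 default are illustrative, not part of this repo):

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Penalize tokens that were already generated: divide positive logits
    by `penalty` and multiply negative ones, so repeats become less likely."""
    out = list(logits)
    for tid in set(seen_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

# Toy example: tokens 2 and 3 were already emitted, so they are pushed down.
logits = [1.0, 0.5, 3.0, -1.0]
print(apply_repetition_penalty(logits, seen_ids=[2, 3]))
# token 2: 3.0 / 1.2 ≈ 2.5 ; token 3: -1.0 * 1.2 = -1.2
```

The same transform drops straight into the streaming loop above, right before the temperature/top-k/top-p steps.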

Prompting Tips

The model responds better to explicit instructions:

  • Translate to English: [text]
  • Give Akkadian transliteration and meaning: [text]
  • Translate to Sanskrit: [text]

If outputs are repetitive or incoherent, try:

  • Lower temperature (0.3–0.5)
  • Lower top_p (0.7–0.8)
  • Shorter prompts
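To see why lowering the temperature helps, here is a self-contained illustration (plain Python, no model required) of how temperature reshapes a toy next-token distribution:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

toy_logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.4):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 2) for p in probs])
```

At temperature 0.4 the probability mass concentrates heavily on the top token, so the sampler picks fewer low-probability tokens, which in practice is what curbs rambling and loop-inducing output.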

Feedback Welcome

If you experiment with this model, I'd appreciate hearing about:

  • Prompts that worked well (or didn't)
  • Unexpected behaviors or failure modes
  • Suggestions for training data or architecture changes

Your feedback directly informs the next iteration.

Contact

Email: csm4211@wgu.edu

Organization: AlgoDriveAI

Repository: AlgoDriveAI/Sanskrit_Akkadian_LLM

Citation

If you use or reference this model in your work:

@misc{algodrive2025sanskrit_akkadian,
  author = {AlgoDriveAI, Christopher Smith},
  title = {Sanskrit & Akkadian Dense LLM},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/AlgoDriveAI/Sanskrit_Akkadian_LLM}
}

License

MIT
