Text Generation
Transformers
Safetensors
Undetermined
indus-script
ancient-scripts
archaeology
nlp
sequence-modeling
grammar-analysis
undeciphered-script
# Indus Script Models

Four trained models plus NanoGPT for the undeciphered Indus Valley Script (2600–1900 BCE).

## What's in this repo
```
models/
  mlm/best/          TinyBERT masked language model
  cls/best/          TinyBERT sequence classifier (valid vs corrupted)
  ngram_model.pkl    N-gram RTL transition model
  electra/best/      ELECTRA token discriminator
  deberta/best/      DeBERTa sequence discriminator
  nanogpt_indus.pt   NanoGPT generator (153K params)
data/
  indus_tokenizer/   Custom tokenizer (641 Indus sign tokens)
  id_to_glyph.json   Sign ID → glyph character mapping
inference.py         Run all tasks (see below)
indus_ngram.py       Required by ngram_model.pkl
```
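As a rough illustration of how a sign-ID to glyph mapping such as `id_to_glyph.json` might be used, the sketch below renders a space-separated sign sequence as a glyph string. The JSON schema is an assumption (a flat object keyed by sign IDs such as `"T638"`), the helper names `load_glyph_map` and `to_glyphs` are hypothetical, and the toy mapping uses placeholder letters rather than real glyphs.

```python
import json
from pathlib import Path


def load_glyph_map(path):
    """Load a sign-ID -> glyph mapping from JSON.

    Assumption: the file is a flat JSON object such as {"T638": "<glyph>", ...}.
    The real schema of id_to_glyph.json may differ.
    """
    with Path(path).open(encoding="utf-8") as f:
        return {str(key): value for key, value in json.load(f).items()}


def to_glyphs(sequence, glyph_map, unknown="?"):
    """Render a space-separated sign sequence as a glyph string."""
    return "".join(glyph_map.get(token, unknown) for token in sequence.split())


# Toy mapping with placeholder letters (NOT real Indus glyphs).
toy_map = {"T638": "A", "T177": "B", "T420": "C", "T122": "D"}
print(to_glyphs("T638 T177 T420 T122", toy_map))
print(to_glyphs("T638 T999", toy_map))  # unmapped signs fall back to "?"
```

With the real file, `load_glyph_map` would replace the toy dictionary, provided the schema assumption holds.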
## How the pipeline works

**Stage 1: Real inscriptions (3,310 sequences)**

Four model families were trained independently on real Indus Script inscriptions, with TinyBERT used in two variants. Each learned a different aspect of the grammar:

- TinyBERT MLM: which signs can fill a masked position
- TinyBERT Classifier: valid sequence vs corrupted
- N-gram RTL: right-to-left transition probabilities
- ELECTRA: token-level real vs fake discrimination
- DeBERTa: sequence-level real vs fake discrimination
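To make "right-to-left transition probabilities" concrete, here is a minimal bigram sketch. It is not the code behind `ngram_model.pkl`: the function names, the add-one smoothing, and the assumption that sequences are stored left-to-right and must be reversed to get reading order are all illustrative choices. The toy corpus reuses sign IDs from this README but is not real inscription data.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_rtl_bigrams(sequences):
    """Count bigram transitions after reversing each sequence.

    Assumption: each sequence is stored left-to-right, so reversing it
    yields the right-to-left reading order.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START] + seq.split()[::-1] + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    """P(nxt | prev) with add-alpha smoothing (an illustrative choice)."""
    row = counts.get(prev, Counter())
    total = sum(row.values())
    return (row[nxt] + alpha) / (total + alpha * vocab_size)


# Toy corpus built from sign IDs used in this README (not real inscriptions).
corpus = ["T638 T177 T420 T122", "T638 T177 T123", "T604 T123 T609"]
counts = train_rtl_bigrams(corpus)
vocab = {tok for seq in corpus for tok in seq.split()} | {END}

p = transition_prob(counts, "T177", "T638", len(vocab))
print(f"P(T638 | T177) = {p:.3f}")
```

Reading right-to-left, `T177` is followed by `T638` in both toy sequences that contain it, so that transition gets the highest smoothed probability in its row.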
**Stage 2: Generate and filter**

NanoGPT generates candidate sequences in RTL order. Each candidate receives an ensemble score weighted as BERT (50%) + N-gram (25%) + ELECTRA (25%), and only sequences with an ensemble score ≥85% are kept. Candidates that exactly match a real inscription are set aside as validation evidence.
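The weighting and threshold described above can be sketched as a small filter. This is an illustration, not the logic in `inference.py`: the function names are hypothetical, each model is assumed to return a score in [0, 1], and applying the threshold before separating exact matches is an assumption about ordering.

```python
WEIGHTS = {"bert": 0.50, "ngram": 0.25, "electra": 0.25}
THRESHOLD = 0.85


def ensemble_score(scores):
    """Weighted sum of per-model scores, each assumed to lie in [0, 1]."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def filter_candidates(candidates, real_inscriptions):
    """Split candidates that pass the threshold into (novel, reproduced).

    candidates: iterable of (sequence, scores) pairs.
    Assumption: the threshold is applied before exact matches are separated.
    """
    real = set(real_inscriptions)
    novel, reproduced = [], []
    for sequence, scores in candidates:
        if ensemble_score(scores) < THRESHOLD:
            continue  # rejected by the ensemble
        (reproduced if sequence in real else novel).append(sequence)
    return novel, reproduced


# Scores taken from the example output in this README.
example = {"bert": 0.9650, "ngram": 0.8930, "electra": 0.9410}
print(f"Ensemble: {ensemble_score(example):.4f}")

# Toy candidates; the second and third score sets are made up for illustration.
candidates = [
    ("T638 T177 T420 T122", example),
    ("T604 T123 T609", {"bert": 0.90, "ngram": 0.88, "electra": 0.91}),
    ("T101 T268", {"bert": 0.60, "ngram": 0.70, "electra": 0.80}),
]
novel, reproduced = filter_candidates(candidates, ["T638 T177 T420 T122"])
```

With the example scores, 0.5 × 0.9650 + 0.25 × 0.8930 + 0.25 × 0.9410 = 0.9410, which matches the `Ensemble` line in the example output.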
**Stage 3: Retrain on combined data (3,310 real + 5,000 synthetic = 8,310)**

All models were retrained on the combined set: TinyBERT accuracy rose from 78% to 89%, and NanoGPT perplexity (PPL) fell from 32.5 to 13.3. The final 5,000 sequences were generated with the retrained models.
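For readers unfamiliar with perplexity, the sketch below uses the standard definition: the exponential of the mean negative log-likelihood per token. Whether the reported NanoGPT figures were computed exactly this way (natural log, per-token averaging) is an assumption, and the `perplexity` helper is illustrative.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean(log p)) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Baseline: a model guessing uniformly over the 641-sign vocabulary.
uniform = [math.log(1 / 641)] * 10
print(f"Uniform baseline PPL: {perplexity(uniform):.1f}")
```

A uniform guess over 641 signs gives a perplexity of 641, which offers a reference point for the reported 32.5 and 13.3.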
## Quick start

```bash
pip install torch transformers huggingface_hub

# Clone this repo
git clone https://huggingface.co/hellosindh/indus-script-models
cd indus-script-models

# Run demo (validates 5 example sequences)
python inference.py --task demo

# Validate a sequence
python inference.py --task validate --sequence "T638 T177 T420 T122"

# Predict a masked sign
python inference.py --task predict --sequence "T638 [MASK] T420 T122"

# Generate 10 new sequences
python inference.py --task generate --count 10

# Score any sequence
python inference.py --task score --sequence "T604 T123 T609"
```
## Example output

```
Loading models...
  ✓ TinyBERT
  ✓ N-gram
  ✓ ELECTRA

Sequence : T638 T177 T420 T122
Glyphs   : π¦π¦¬π¦°π¦‘
BERT     : 0.9650
N-gram   : 0.8930
ELECTRA  : 0.9410
Ensemble : 0.9410
Verdict  : ✓ VALID (≥85%)
```
## Model performance

| Model | Metric | Value |
|---|---|---|
| TinyBERT Classifier | Test accuracy | 89.0% |
| TinyBERT MLM | Val loss | 2.06 |
| N-gram RTL | Pairwise accuracy | 88.2% |
| ELECTRA | Token accuracy | 95.1% |
| DeBERTa | Test accuracy | 87.1% |
| NanoGPT | Perplexity | 13.3 |
## Key findings

- **RTL confirmed**: right-to-left (RTL) reading shows 12% stronger grammatical structure than left-to-right (LTR)
- **Grammar proven**: H1→H2→H3 = 6.03→3.41→2.39 bits (language-like decay)
- **Zipf's law**: R²=0.968 (language-like token distribution)
- **752 seal reproductions**: the model independently reproduced real inscriptions
- **Sign roles**: PREFIX (T638, T604), SUFFIX (T123, T122), CORE (T101, T268)
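The entropy and Zipf figures above can be approximated with a few lines of standard-library Python. This sketch assumes H1, H2, and H3 denote the entropy of a sign given zero, one, and two preceding signs, and that the Zipf R² comes from a least-squares line through log-frequency versus log-rank. Both readings are assumptions about how the findings were computed, and the function names are illustrative.

```python
import math
from collections import Counter


def conditional_entropy(sequences, n):
    """Entropy in bits of a token given its n-1 predecessors.

    n=1 is the plain unigram entropy. `sequences` is a list of token
    lists; pass reversed lists to measure the right-to-left direction.
    """
    ngrams = Counter(
        tuple(toks[i:i + n]) for toks in sequences for i in range(len(toks) - n + 1)
    )
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count
    total = sum(ngrams.values())
    return -sum(
        (count / total) * math.log2(count / contexts[gram[:-1]])
        for gram, count in ngrams.items()
    )


def zipf_r_squared(sequences):
    """R^2 of a straight-line fit to log(frequency) vs log(rank).

    Needs at least two distinct tokens with differing frequencies.
    """
    freqs = sorted(Counter(t for toks in sequences for t in toks).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Toy data: a strictly alternating sequence is fully predictable from one sign of context.
alternating = [["a", "b", "a", "b"]]
h1 = conditional_entropy(alternating, 1)
h2 = conditional_entropy(alternating, 2)

# Toy data whose frequencies follow 12 / rank exactly.
zipfian = [["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3]
r2 = zipf_r_squared(zipfian)
```

On real inscriptions, a steady drop from H1 to H3 means each additional sign of context makes the next sign more predictable, which is the pattern the findings describe as language-like.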
## Dataset

The 5,000 synthetic sequences are available at:
[YOUR_USERNAME/indus-script-synthetic](https://huggingface.co/datasets/YOUR_USERNAME/indus-script-synthetic)