| --- |
| license: apache-2.0 |
| base_model: sarvamai/sarvam-1 |
| library_name: peft |
| tags: |
| - indic-nlp |
| - dictionary |
| - sanskrit |
| - marathi |
| - hindi |
| - TransLiteral |
| - Kannada |
| - Oriya |
| - Indic |
| - Punjabi |
| - sft |
| - lora |
| --- |
| # TLF-7B-LLM-01 |
|
|
| ## Model Description |
|
|
| This model is a fine-tuned version of [sarvamai/sarvam-1](https://huggingface.co/sarvamai/sarvam-1) specialized for **Bilingual Indic Lexicography**. |
|
|
| It has been trained to provide structured morphological breakdowns, definitions, and regional translations for Sanskrit and other Indian regional languages. |
|
|
| The training data was ingested through the **[TLF Mega-Pipeline](https://github.com/assignarc/TLF-LLM-Pipeline)**, integrating structured dictionary databases (MSSQL) with unstructured regional texts to improve grammar and stylistic intelligence. |
|
|
| ### Data Source : |
|
|
| The dictionary content is freely available as Unified Dictionary project on [TransLiteral Foundation's website](https://www.transliteral.org/dictionary/). The website provides 1,153,927 Words and their 2,309,309 Meanings from 71 [dictionaries](https://www.transliteral.org/dictionary/all.kosh/source). These are cited with over 1079 [literary sources](https://www.transliteral.org/dictionary/all.references/text) from several authors from ancient Indian regional and religious texts. The source is used under [Creative Commons - ShareALike International License. ](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
|
|
| ### Intended Use |
|
|
| - **Dictionary Lookups**: Providing high-accuracy definitions and etymologies. |
| - **Morphological Analysis**: Breaking down complex Sanskrit/Indic root words. |
| - **Regional Translation**: Translating word concepts across Marathi, Hindi, and English. |
|
|
| ## Training Hyperparameters |
|
|
| The following hyperparameters were used during training: |
|
|
| - **Engine**: MLX |
| - **Learning Rate**: 2e-05 |
| - **Batch Size**: 1 |
| - **Gradient Accumulation**: 64 |
| - **Optimizer**: adamw_torch |
| - **LR Scheduler**: cosine |
| - **LoRA R**: 32 |
| - **LoRA Alpha**: 16 |
| - **Max Sequence Length**: 1024 |
| |
| ## Prompt Template |
| |
| To achieve the intended structured output, use the following prompt format: |
| |
| ```text |
| <s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{query} [/INST] |
| ``` |
| |
| ## Inference Example |
| |
| ### Using MLX (Apple Silicon) |
| |
| ```python |
| import mlx_lm |
| model, tokenizer = mlx_lm.load("AssignArc/TLF-7B-LLM-01") |
|
|
| prompt = "Provide a comprehensive morphological breakdown for: 'Abacus'" |
| # Use Sarvam/Llama template logic here |
| response = mlx_lm.generate(model, tokenizer, prompt=prompt) |
| print(response) |
| ``` |
| |
| ### Using Transformers |
| |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| from peft import PeftModel |
| |
| base_model = AutoModelForCausalLM.from_pretrained("sarvamai/sarvam-1") |
| model = PeftModel.from_pretrained(base_model, "AssignArc/TLF-7B-LLM-01") |
| tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1") |
|
|
| inputs = tokenizer(prompt, return_tensors="pt") |
| outputs = model.generate(**inputs, max_new_tokens=256) |
| print(tokenizer.decode(outputs[0])) |
| ``` |
| |
| ## Example |
| |
| ```python |
| Prompt : Define Goddess |
| |
| 2026-03-24 19:10:20,665 - Inference - INFO - |
| [BASE MODEL]: <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning goddess. |
| <end_of_turn>model <start of turn>: devi is a feminine noun, meaning |
| 2026-03-24 19:10:20,665 - Inference - INFO - [FINETUNED]: "devi" Def: f. ( -वी ) 1 A female deity, goddess; a woman of the first or second order. f( आ ). A female deity, goddess; a woman of the first or second order. Tags: Feminine.<end_of_turn> |
| ``` |
| |
| ## Citation & Credits |
| |
| - **[TLF Framework](https://github.com/assignarc/TLF-LLM-Pipeline)**: Architected for Unified Indic LLM Fine-tuning. |
| - **[Data Source](https://www.transliteral.org/dictionary/)**: Custom Dictionary & Regional Text Corpus. |
| - [MLX-LM](https://github.com/ml-explore/mlx-lm) - MLX LM is a Python package for generating text and fine-tuning large language models on Apple silicon with MLX. |
| |