Model Card: Llamathan-3B
Model Details
- Model Name: Llamathan-3B
- Base Model: Llama2-3B (Meta AI)
- Model Type: Causal Language Model (Decoder-only Transformer)
- Architecture: LLaMA 2
- Fine-Tuning Method: Supervised Fine-Tuning (SFT)
- Dataset Size: 3,202 instruction samples
- Language: Primarily Tamil (code-mixed Tanglish and Tamil technical explanations)
Model Description
This model is a fine-tuned version of Llama 2 3B Instruct, optimized for:
- Tamil instruction-following
- Code-mixed Tamil (Tanglish) explanations
- Technical concept explanations in simplified Tamil
- Educational Q&A style prompts
The model was trained on a structured instruction dataset. Each sample has the following format:
{
"instruction": "Explanation of Mixture of Experts (MoE).",
"input": "Mixtral models-la 'MoE' na enna logic?",
"output": "Motha model-aiyum orey nerathula use pannaama, specific question-ku endha 'Expert' (subset of neurons) best-nu router choose pannum. Performance high aagum aana cost kammi."
}
Training Details
The training set contains 3,202 samples in total, covering:
- CoT: Chain-of-Thought reasoning
- SQL: query explanation and query generation
- Tech terms: explanations of technical terms and how they work
- Multi-step reasoning
Dataset Format
Each sample contains:
- instruction: task definition
- input: user query (Tamil / Tanglish / technical)
- output: expected response
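The card documents the three fields of each record but not how the dataset was loaded. As a minimal sketch, the check below parses the card's own example record and confirms it matches that schema. The helper `validate_sample` and the constant `REQUIRED_FIELDS` are hypothetical names, not part of the author's pipeline.

```python
import json

# Hypothetical: the card lists these three fields but does not publish a loader.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_sample(record: dict) -> dict:
    """Return the record unchanged if it has all three string fields, else raise."""
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise KeyError(f"missing field: {field}")
        if not isinstance(record[field], str):
            raise TypeError(f"field {field!r} must be a string")
    return record


# The example record from this card, verbatim.
raw = """{
  "instruction": "Explanation of Mixture of Experts (MoE).",
  "input": "Mixtral models-la 'MoE' na enna logic?",
  "output": "Motha model-aiyum orey nerathula use pannaama, specific question-ku endha 'Expert' (subset of neurons) best-nu router choose pannum. Performance high aagum aana cost kammi."
}"""

sample = validate_sample(json.loads(raw))
print(sorted(sample))  # ['input', 'instruction', 'output']
```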
Preprocessing Strategy
Prompt template used during training:
### Instruction:
{instruction}
### Input:
{input}
### Response:
{output}
The model was trained autoregressively, with the loss computed only on the Response portion.
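A minimal sketch of how this template and response-only loss could be set up, assuming prompt and response are tokenized separately into lists of token ids. Prompt positions get the label -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss` and the value Hugging Face models skip in labels. The names `render_prompt` and `build_example` are hypothetical. The card does not say whether an EOS token was appended to the response.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Prompt part of the training template; the Response text follows the last header.
TEMPLATE = "### Instruction:\n{instruction}\n### Input:\n{input}\n### Response:\n"


def render_prompt(sample: dict) -> str:
    """Fill the template up to and including the '### Response:' header."""
    return TEMPLATE.format(instruction=sample["instruction"], input=sample["input"])


def build_example(prompt_ids: list, response_ids: list, max_len: int = 2048):
    """Concatenate token ids; mask prompt positions so only the response is scored."""
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels


prompt = render_prompt({"instruction": "Explanation of Mixture of Experts (MoE).",
                        "input": "Mixtral models-la 'MoE' na enna logic?"})
# Toy token ids stand in for real tokenizer output.
ids, labels = build_example([1, 2, 3], [7, 8])
print(labels)  # [-100, -100, -100, 7, 8]
```

Tokenizing prompt and response separately keeps the mask boundary exact; tokenizing the joined string instead can merge tokens across the boundary, so the prompt length would have to be recomputed.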
Training Parameters
- Epochs: 3
- Batch Size: 8
- Learning Rate: 5e-5
- Optimizer: AdamW
- LR Scheduler: Cosine decay
- Max Sequence Length: 2048
- Precision: bfloat16 / fp16
- Gradient Accumulation: Enabled
- Training Hardware: NVIDIA T4 GPU
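To make the schedule concrete, the sketch below derives the optimizer step count and a cosine-decay learning rate from the listed values (3,202 samples, batch size 8, 3 epochs, peak LR 5e-5). The card only says gradient accumulation was "Enabled", so `GRAD_ACCUM_STEPS = 4` is a hypothetical value. Warmup is not stated, so none is assumed, and a partial final batch or accumulation window is counted as one step. Trainers differ on this rounding.

```python
import math

# Values from the card
NUM_SAMPLES = 3202
BATCH_SIZE = 8
EPOCHS = 3
PEAK_LR = 5e-5
# Hypothetical: the card does not state the number of accumulation steps
GRAD_ACCUM_STEPS = 4


def optimizer_steps(num_samples, batch_size, accum_steps, epochs):
    """Optimizer updates, rounding partial batches and accumulation windows up."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / accum_steps)
    return steps_per_epoch * epochs


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps, no warmup."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


total = optimizer_steps(NUM_SAMPLES, BATCH_SIZE, GRAD_ACCUM_STEPS, EPOCHS)
print("effective batch size:", BATCH_SIZE * GRAD_ACCUM_STEPS)  # 32
print("total optimizer steps:", total)  # 303
print([f"{cosine_lr(s, total):.2e}" for s in (0, total // 2, total)])
```

In a Hugging Face training loop the same shape is available as `transformers.get_cosine_schedule_with_warmup`. The card does not say which scheduler implementation was actually used.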
Suitable For
- Tamil educational assistants
- Technical concept explanation in Tamil
- AI/ML explanation chatbots
- Code-mixed Tamil conversational agents
Limitations
- Small dataset size (3,202 samples)
- Possible hallucinations on unseen domains
- May mix Tamil and English terminology inconsistently
- Limited reasoning depth compared to larger models
Example Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "Hariharan05/Llamathan-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Use the training template: the final header must be "### Response:"
prompt = """### Instruction:
Explanation of Mixture of Experts (MoE).
### Input:
Mixtral models-la 'MoE' na enna logic?
### Response:
"""
# Send inputs to the device that device_map="auto" chose for the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
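Small fine-tuned models sometimes keep generating past the answer and start a new "### Instruction:" block. The hypothetical helper below is a post-processing sketch, not part of the card or of `transformers`. It keeps only the text after the "### Response:" header, if present, and cuts at the next "###" header.

```python
RESPONSE_HEADER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the answer text from a decoded generation, with or without the echoed prompt."""
    _, sep, tail = decoded.partition(RESPONSE_HEADER)
    text = tail if sep else decoded
    # Stop at the next section header the model may have started on its own.
    text = text.split("\n###", 1)[0]
    return text.strip()


decoded = ("### Instruction:\nExplanation of Mixture of Experts (MoE).\n"
           "### Input:\nMixtral models-la 'MoE' na enna logic?\n"
           "### Response:\nA router picks the best expert.\n### Instruction:\nextra")
print(extract_response(decoded))  # A router picks the best expert.
```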
Evaluation
Evaluation was performed using:
- Manual qualitative assessment
- Instruction-following accuracy
- Tamil fluency and coherence checks