Model Card: Llamathan-3B

Model Details

  • Model Name: Llamathan-3B
  • Base Model: Meta AI – Llama2-3B
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Architecture: LLaMA 2
  • Fine-Tuning Method: Supervised Fine-Tuning (SFT)
  • Dataset Size: 3,202 instruction samples
  • Language: Primarily Tamil (Tanglish + Tamil technical explanations)

Model Description

This model is a fine-tuned version of Llama 2 3B Instruct, optimized for:

  • Tamil instruction-following
  • Code-mixed Tamil (Tanglish) explanations
  • Technical concept explanations in simplified Tamil
  • Educational Q&A style prompts

The model was trained on a structured instruction dataset in the following format:

{
  "instruction": "Explanation of Mixture of Experts (MoE).",
  "input": "Mixtral models-la 'MoE' na enna logic?",
  "output": "Motha model-aiyum orey nerathula use pannaama, specific question-ku endha 'Expert' (subset of neurons) best-nu router choose pannum. Performance high aagum aana cost kammi."
}

Training Details

The 3,202 training samples cover:

  • Chain of Thought (CoT) reasoning
  • SQL: query explanation and query generation
  • Technical terms: explanation and working
  • Multi-step reasoning

Dataset Format

Each sample contains:

  • instruction → Task definition
  • input → User query (Tamil / Tanglish / Technical)
  • output → Expected response
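
The three-field format above can be checked with a short validation pass before training. The following is a minimal sketch, assuming samples are stored as JSON objects; the helper name `validate_sample` and the rule that `instruction` and `output` must be non-empty are illustrative assumptions, not part of the released training code.

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output")

def validate_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one record (an empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = sample.get(field)
        if not isinstance(value, str):
            problems.append(f"missing or non-string field: {field}")
        # Assumption: "input" may be empty, but "instruction" and "output" may not.
        elif field != "input" and not value.strip():
            problems.append(f"empty field: {field}")
    return problems

record = json.loads("""
{
  "instruction": "Explanation of Mixture of Experts (MoE).",
  "input": "Mixtral models-la 'MoE' na enna logic?",
  "output": "Motha model-aiyum orey nerathula use pannaama, specific question-ku endha 'Expert' (subset of neurons) best-nu router choose pannum. Performance high aagum aana cost kammi."
}
""")

print(validate_sample(record))                # valid record: no problems reported
print(validate_sample({"instruction": "x"}))  # reports the two missing fields
```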

Preprocessing Strategy

Prompt template used during training:

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}

The model was trained to predict only the Response portion autoregressively.
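
Response-only training is commonly implemented by masking the prompt tokens in the labels so that they do not contribute to the loss. The sketch below illustrates that idea under stated assumptions: `build_example` and `make_toy_tokenizer` are hypothetical names, the whitespace tokenizer is a stand-in for the real one, and end-of-sequence handling is omitted. It is not the author's training code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_example(sample, tokenize):
    """Tokenize prompt and response separately and mask prompt tokens in the labels."""
    prompt_ids = tokenize(PROMPT_TEMPLATE.format(
        instruction=sample["instruction"], input=sample["input"]))
    response_ids = tokenize(sample["output"])
    input_ids = prompt_ids + response_ids
    # Only response positions keep real token IDs as labels.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

def make_toy_tokenizer():
    """Toy stand-in for a real tokenizer: one integer ID per whitespace-separated token."""
    vocab = {}
    def tokenize(text):
        return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]
    return tokenize

sample = {
    "instruction": "Explanation of Mixture of Experts (MoE).",
    "input": "Mixtral models-la 'MoE' na enna logic?",
    "output": "Performance high aagum aana cost kammi.",
}
example = build_example(sample, make_toy_tokenizer())
masked = example["labels"].count(IGNORE_INDEX)
print(f"{masked} prompt tokens masked, {len(example['labels']) - masked} response tokens trained")
```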


Training Parameters

  • Epochs: 3
  • Batch Size: 8
  • Learning Rate: 5e-5
  • Optimizer: AdamW
  • LR Scheduler: Cosine decay
  • Max Sequence Length: 2048
  • Precision: bfloat16 / fp16
  • Gradient Accumulation: Enabled
  • Training Hardware: T4
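
To make these numbers concrete, the sketch below works out the step count and a cosine learning-rate curve from the listed values. It assumes a gradient accumulation factor of 1 and no warmup, neither of which the card specifies, and it treats the batch size of 8 as the effective batch size; `cosine_lr` is an illustrative helper, not the scheduler actually used.

```python
import math

NUM_SAMPLES = 3202
BATCH_SIZE = 8
EPOCHS = 3
PEAK_LR = 5e-5
GRAD_ACCUM_STEPS = 1  # assumption: the card says "Enabled" but gives no value

# Optimizer steps per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(NUM_SAMPLES / (BATCH_SIZE * GRAD_ACCUM_STEPS))
total_steps = steps_per_epoch * EPOCHS

def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step, decaying from PEAK_LR to 0 along a half cosine."""
    progress = min(step, total_steps) / total_steps
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step))
```

Under these assumptions the run takes 401 optimizer steps per epoch and 1,203 in total; a larger accumulation factor would reduce both proportionally.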

Suitable For

  • Tamil educational assistants
  • Technical concept explanation in Tamil
  • AI/ML explanation chatbot
  • Code-mixed Tamil conversational agents

Limitations

  • Small dataset size (3,202 samples)
  • Possible hallucinations on unseen domains
  • May mix Tamil and English terminology inconsistently
  • Limited reasoning depth compared to larger models

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Hariharan05/Llamathan-3B"

# Load the tokenizer and the model in half precision, placed automatically on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# The prompt follows the training template and ends with the "### Response:" header.
prompt = """### Instruction:
Explanation of Mixture of Experts (MoE).

### Input:
Mixtral models-la 'MoE' na enna logic?

### Response:
"""

# Move the inputs to the model's device instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
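
Decoding `outputs[0]` returns the prompt together with the generated text. A small helper can strip the prompt, assuming the prompt ends with the `### Response:` header from the training template; `extract_response` is a hypothetical name introduced here for illustration.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first response marker, or the whole text if it is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()

decoded = (
    "### Instruction:\nExplanation of Mixture of Experts (MoE).\n\n"
    "### Input:\nMixtral models-la 'MoE' na enna logic?\n\n"
    "### Response:\nPerformance high aagum aana cost kammi.\n"
)
print(extract_response(decoded))
```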

Evaluation

Evaluation was performed using:

  • Manual qualitative assessment
  • Instruction-following accuracy
  • Tamil fluency and coherence checks
