# ⚖️ Kanoonu AI – Phi-3 Fine-tuned on Indian Law

A domain-specific legal assistant fine-tuned on Indian law: the Constitution, IPC, and CrPC.
## Overview

Kanoonu AI (कानून – Hindi for "Law") is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct, specialised for question answering in the Indian legal domain.
It was trained using QLoRA (Quantized Low-Rank Adaptation) on 23,370 Indian-law Q&A pairs covering the Indian Penal Code (IPC), the Code of Criminal Procedure (CrPC), the Constitution of India, and other statutes.

"Kanoonu AI" is part of a larger project to make Indian legal information accessible to everyone through conversational AI.
## Training Results
| Metric | Value |
|---|---|
| Final Train Loss | 0.3478 |
| Best Eval Loss | 0.6568 |
| Training Examples | 23,370 |
| Training Steps | 731 |
| Epochs | 1 |
| Trainable Parameters | 29,884,416 (0.78%) |
The final train loss of 0.3478 reflects strong adaptation to the legal domain after a single epoch, and the moderate gap to the eval loss (0.6568) is consistent with one-epoch training rather than severe overfitting.
## Quick Start

### Option 1 – Load the LoRA Adapter (requires the base model)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model + LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("tejasgowda05/Kanoonu-AI-Phi3-Finetuned")
model = PeftModel.from_pretrained(base_model, "tejasgowda05/Kanoonu-AI-Phi3-Finetuned")

# Inference
def ask_kanoonu(question):
    prompt = f"""<|system|>
You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda.
Answer questions clearly and accurately based on Indian law, the Constitution
of India, IPC, CrPC, and other relevant statutes.<|end|>
<|user|>
{question}<|end|>
<|assistant|>
"""
    # Use model.device so this also works when device_map places the model on CPU
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(ask_kanoonu("What is the difference between IPC and CrPC?"))
```
### Option 2 – GGUF with Ollama (recommended for local deployment)
```bash
# Download the GGUF model
huggingface-cli download tejasgowda05/Kanoonu-AI-Phi3-GGUF --local-dir ./kanoonu_model

# Create a Modelfile
echo 'FROM ./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf
SYSTEM "You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda."' > Modelfile

# Build and run with Ollama
ollama create kanoonu-ai -f Modelfile
ollama run kanoonu-ai "What are the fundamental rights in the Indian Constitution?"
```
### Option 3 – GGUF with llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama(model_path="./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf")
# max_tokens defaults to a very small value, so set it explicitly
response = llm("What is an FIR and how is it filed in India?", max_tokens=256)
print(response["choices"][0]["text"])
```
## 🏛️ Model Architecture & Training

### Base Model

- Model: microsoft/Phi-3-mini-4k-instruct (3.8B parameters)
- Architecture: Phi-3 / Mistral-based transformer

### Fine-tuning Method – QLoRA
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 29,884,416 (0.78% of total) |
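As a sanity check, the trainable-parameter count above can be reproduced from Phi-3-mini's published dimensions (hidden size 3072, intermediate size 8192, 32 layers), assuming the separate q/k/v projection layout listed in the target modules. Each LoRA pair for a `d_in × d_out` linear layer adds `r * (d_in + d_out)` parameters:

```python
# Reproduce the LoRA trainable-parameter count from the config above.
# Assumed Phi-3-mini dimensions: hidden=3072, intermediate=8192, 32 layers.
r = 16
hidden, inter, layers = 3072, 8192, 32

def lora_params(d_in, d_out, r=r):
    # A rank-r LoRA pair adds an r x d_in and a d_out x r matrix
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(hidden, hidden)   # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(hidden, inter)  # gate_proj, up_proj
    + lora_params(inter, hidden)      # down_proj
)
total = per_layer * layers
print(total)  # 29884416, matching the table
```

Note that some Phi-3 checkpoints ship with fused `qkv_proj`/`gate_up_proj` layers; the arithmetic is the same either way since fusing does not change the total in/out dimensions.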
### Training Configuration
| Parameter | Value |
|---|---|
| Platform | Kaggle GPU T4 |
| Batch Size (per device) | 2 |
| Gradient Accumulation | 16 steps |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine decay |
| Warmup Ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Max Sequence Length | 512 |
| Sequence Packing | Enabled |
| Precision | FP16 |
| Total Training Time | ~205 minutes |
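The step count in the table follows directly from the batch settings: one pass over the 23,370 training examples at an effective batch size of 32 (ignoring any reduction from sequence packing) gives 731 optimizer steps:

```python
import math

examples = 23_370
per_device_batch = 2
grad_accum = 16

# Effective batch = per-device batch x gradient-accumulation steps
effective_batch = per_device_batch * grad_accum
print(effective_batch)  # 32

# One epoch worth of optimizer steps
steps_per_epoch = math.ceil(examples / effective_batch)
print(steps_per_epoch)  # 731, matching the table
```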
### Prompt Template (Phi-3 Native Format)

```
<|system|>
You are Kanoonu AI, an expert Indian legal assistant...<|end|>
<|user|>
{question}<|end|>
<|assistant|>
{answer}<|end|>
```
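For reference, a minimal helper (a hypothetical sketch, not taken from the actual training code) that renders one Q&A pair into this template might look like:

```python
SYSTEM_PROMPT = (
    "You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda. "
    "Answer questions clearly and accurately based on Indian law, the Constitution "
    "of India, IPC, CrPC, and other relevant statutes."
)

def format_example(question: str, answer: str) -> str:
    """Render one Q&A pair in the Phi-3 chat format used for fine-tuning."""
    return (
        f"<|system|>\n{SYSTEM_PROMPT}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n{answer}<|end|>"
    )

text = format_example("What is an FIR?", "An FIR (First Information Report) is ...")
print(text.count("<|end|>"))  # 3, one per turn
```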
## Dataset
This model was trained on tejasgowda05/Indian-Kanoonu-Dataset β a Phi-3 formatted version of the original viber1/indian-law-dataset.
| Property | Value |
|---|---|
| Total Examples | 24,607 |
| Train Split | 23,377 (95%) |
| Eval Split | 1,230 (5%) |
| Domain | Indian Law |
| Topics | IPC, CrPC, Constitution, Civil Procedure, Family Law |
| Language | English |
Attribution: Original Q&A content from viber1/indian-law-dataset (Apache 2.0), formatted with the Phi-3 chat template for this project.
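The split sizes in the table are mutually consistent: the train and eval counts sum to the dataset total and reproduce the stated 95/5 ratio:

```python
total = 24_607
train, eval_ = 23_377, 1_230

# Splits partition the dataset exactly
print(train + eval_ == total)   # True

# Ratios match the stated 95% / 5% split
print(round(train / total, 3))  # 0.95
print(round(eval_ / total, 3))  # 0.05
```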
## Related Resources
| Resource | Link |
|---|---|
| 🤗 LoRA Adapter (this repo) | tejasgowda05/Kanoonu-AI-Phi3-Finetuned |
| 🤗 GGUF Model | tejasgowda05/Kanoonu-AI-Phi3-GGUF |
| 🤗 Formatted Dataset | tejasgowda05/Indian-Kanoonu-dataset |
| 📦 Base Model | microsoft/Phi-3-mini-4k-instruct |
| 📦 Original Dataset | viber1/indian-law-dataset |
## ⚠️ Limitations & Disclaimer
- This model is intended for educational and informational purposes only
- It is not a substitute for professional legal advice
- Always consult a qualified lawyer for legal matters
- The model may occasionally produce inaccurate or outdated legal information
- Coverage is primarily focused on Indian national law; state-specific laws may not be fully covered
## Author

Tejas Gowda N – tejasgowda05
Built as part of the Kanoonu AI project β making Indian legal information accessible through conversational AI.
## License
This model is released under the Apache 2.0 License, inherited from the base model and original dataset.
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{tejasgowda2026kanoonuai,
  author    = {Tejas Gowda N},
  title     = {Kanoonu AI: Phi-3 Fine-tuned on Indian Law},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/tejasgowda05/Kanoonu-AI-Phi3-Finetuned}
}
```