βš–οΈ Kanoonu AI β€” Phi-3 Fine-tuned on Indian Law

A domain-specific legal assistant fine-tuned on Indian law, Constitution, IPC, and CrPC



πŸ“– Overview

Kanoonu AI (ΰ€•ΰ€Ύΰ€¨ΰ₯‚ΰ€¨ β€” Hindi for "Law") is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct specialised for Indian legal domain question answering.

It was trained using QLoRA (Quantized Low-Rank Adaptation) on 23,370 Indian law Q&A pairs, covering the Indian Penal Code (IPC), Code of Criminal Procedure (CrPC), Constitution of India, and other statutes.

"Kanoonu AI" is part of a larger project to make Indian legal information accessible to everyone through conversational AI.


πŸ† Training Results

| Metric | Value |
|---|---|
| Final Train Loss | 0.3478 |
| Best Eval Loss | 0.6568 |
| Training Examples | 23,370 |
| Training Steps | 731 |
| Epochs | 1 |
| Trainable Parameters | 29,884,416 (0.78%) |

A final train loss of 0.3478, with a best eval loss of 0.6568, suggests effective domain adaptation after a single epoch. The gap between train and eval loss is typical for a one-epoch QLoRA run, but is worth monitoring if training is extended.


πŸš€ Quick Start

Option 1 β€” Load LoRA Adapter (requires base model)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model + LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype  = torch.float16,
    device_map   = "auto",
)
tokenizer = AutoTokenizer.from_pretrained("tejasgowda05/Kanoonu-AI-Phi3-Finetuned")
model     = PeftModel.from_pretrained(base_model, "tejasgowda05/Kanoonu-AI-Phi3-Finetuned")

# Inference
def ask_kanoonu(question):
    prompt = f"""<|system|>
You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda.
Answer questions clearly and accurately based on Indian law, the Constitution
of India, IPC, CrPC, and other relevant statutes.<|end|>
<|user|>
{question}<|end|>
<|assistant|>
"""
    inputs  = tokenizer(prompt, return_tensors="pt").to(model.device)  # use the device chosen by device_map
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(ask_kanoonu("What is the difference between IPC and CrPC?"))

Option 2 β€” GGUF with Ollama (Recommended for local deployment)

# Download the GGUF model
huggingface-cli download tejasgowda05/Kanoonu-AI-Phi3-GGUF --local-dir ./kanoonu_model

# Create Modelfile
echo 'FROM ./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf
SYSTEM "You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda."' > Modelfile

# Run with Ollama
ollama create kanoonu-ai -f Modelfile
ollama run kanoonu-ai "What are the fundamental rights in the Indian Constitution?"

Option 3 β€” GGUF with llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf", n_ctx=4096)

# Format the question with the Phi-3 chat template the model was trained on;
# passing a raw question without the template degrades answer quality
prompt = (
    "<|system|>\nYou are Kanoonu AI, an expert Indian legal assistant.<|end|>\n"
    "<|user|>\nWhat is an FIR and how is it filed in India?<|end|>\n"
    "<|assistant|>\n"
)
response = llm(prompt, max_tokens=200, stop=["<|end|>"])
print(response["choices"][0]["text"])

πŸ—οΈ Model Architecture & Training

Base Model

  • Model: microsoft/Phi-3-mini-4k-instruct (3.8B parameters)
  • Architecture: Phi-3 decoder-only transformer (Llama/Mistral-style blocks)

Fine-tuning Method β€” QLoRA

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 29,884,416 (0.78% of total) |
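The adapter configuration in the table can be expressed as a peft `LoraConfig`; this is a sketch from the published hyperparameters, and the `bias` and `task_type` arguments are assumptions since the training script is not released:

```python
from peft import LoraConfig

# Sketch of the adapter config from the table above;
# bias and task_type values are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```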

Training Configuration

| Parameter | Value |
|---|---|
| Platform | Kaggle (T4 GPU) |
| Batch Size (per device) | 2 |
| Gradient Accumulation | 16 steps |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine decay |
| Warmup Ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Max Sequence Length | 512 |
| Sequence Packing | Enabled |
| Precision | FP16 |
| Total Training Time | ~205 minutes |
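These settings map onto transformers `TrainingArguments` roughly as follows; argument names such as `output_dir` and `optim` are assumptions, since the actual training script is not published:

```python
from transformers import TrainingArguments

# Approximate reconstruction of the training configuration above;
# output_dir, logging cadence, and the exact 8-bit optimizer name are assumptions
args = TrainingArguments(
    output_dir="kanoonu-phi3-qlora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,   # effective batch size 2 * 16 = 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=25,
)
```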

Prompt Template (Phi-3 Native Format)

<|system|>
You are Kanoonu AI, an expert Indian legal assistant...<|end|>
<|user|>
{question}<|end|>
<|assistant|>
{answer}<|end|>
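For programmatic use, the template above can be assembled with a small helper; `build_phi3_prompt` is a hypothetical convenience function, not part of the released code:

```python
from typing import Optional


def build_phi3_prompt(question: str, system: str,
                      answer: Optional[str] = None) -> str:
    """Assemble the Phi-3 chat template shown above (hypothetical helper)."""
    prompt = (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
    if answer is not None:  # training example: append the target completion
        prompt += f"{answer}<|end|>"
    return prompt
```

Call it without `answer` for inference prompts, or with `answer` to render a full training example.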

πŸ“š Dataset

This model was trained on tejasgowda05/Indian-Kanoonu-Dataset β€” a Phi-3 formatted version of the original viber1/indian-law-dataset.

| Property | Value |
|---|---|
| Total Examples | 24,607 |
| Train Split | 23,377 (95%) |
| Eval Split | 1,230 (5%) |
| Domain | Indian Law |
| Topics | IPC, CrPC, Constitution, Civil Procedure, Family Law |
| Language | English |

Attribution: Original Q&A content from viber1/indian-law-dataset (Apache 2.0). Formatted with Phi-3 chat template for this project.


πŸ”— Related Resources

| Resource | Link |
|---|---|
| πŸ€— LoRA Adapter (this repo) | tejasgowda05/Kanoonu-AI-Phi3-Finetuned |
| πŸ€— GGUF Model | tejasgowda05/Kanoonu-AI-Phi3-GGUF |
| πŸ€— Formatted Dataset | tejasgowda05/Indian-Kanoonu-dataset |
| πŸ“¦ Base Model | microsoft/Phi-3-mini-4k-instruct |
| πŸ“¦ Original Dataset | viber1/indian-law-dataset |

⚠️ Limitations & Disclaimer

  • This model is intended for educational and informational purposes only
  • It is not a substitute for professional legal advice
  • Always consult a qualified lawyer for legal matters
  • The model may occasionally produce inaccurate or outdated legal information
  • Coverage is primarily focused on Indian national law; state-specific laws may not be fully covered

πŸ‘€ Author

Tejas Gowda N β€” tejasgowda05

Built as part of the Kanoonu AI project β€” making Indian legal information accessible through conversational AI.


πŸ“„ License

This model is released under the Apache 2.0 License, inherited from the base model and original dataset.


πŸ“ Citation

If you use this model in your work, please cite:

@misc{tejasgowda2026kanoonuai,
  author    = {Tejas Gowda N},
  title     = {Kanoonu AI: Phi-3 Fine-tuned on Indian Law},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/tejasgowda05/Kanoonu-AI-Phi3-Finetuned}
}