βš–οΈ Kanoonu AI β€” Phi-3 Fine-tuned on Indian Law

A domain-specific legal assistant fine-tuned on Indian law, Constitution, IPC, and CrPC



πŸ“– Overview

Kanoonu AI (ΰ€•ΰ€Ύΰ€¨ΰ₯‚ΰ€¨ β€” Hindi for "Law") is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct specialised for Indian legal domain question answering.

It was trained using QLoRA (Quantized Low-Rank Adaptation) on 23,370 Indian law Q&A pairs, covering the Indian Penal Code (IPC), Code of Criminal Procedure (CrPC), Constitution of India, and other statutes.

"Kanoonu AI" is part of a larger project to make Indian legal information accessible to everyone through conversational AI.


πŸ† Training Results

| Metric | Value |
|---|---|
| Final Train Loss | 0.3478 |
| Best Eval Loss | 0.6568 |
| Training Examples | 23,370 |
| Training Steps | 731 |
| Epochs | 1 |
| Trainable Parameters | 29,884,416 (0.78%) |

A final train loss of 0.3478, with a best eval loss of 0.6568, suggests effective domain adaptation after a single epoch. The gap between train and eval loss is typical for a one-epoch QLoRA run, but is worth monitoring if training is extended.


πŸš€ Quick Start

Option 1 β€” Load LoRA Adapter (requires base model)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model + LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype  = torch.float16,
    device_map   = "auto",
)
tokenizer = AutoTokenizer.from_pretrained("tejasgowda05/Kanoonu-AI-Phi3-Finetuned")
model     = PeftModel.from_pretrained(base_model, "tejasgowda05/Kanoonu-AI-Phi3-Finetuned")

# Inference
def ask_kanoonu(question):
    prompt = f"""<|system|>
You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda.
Answer questions clearly and accurately based on Indian law, the Constitution
of India, IPC, CrPC, and other relevant statutes.<|end|>
<|user|>
{question}<|end|>
<|assistant|>
"""
    inputs  = tokenizer(prompt, return_tensors="pt").to(model.device)  # use the device chosen by device_map
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(ask_kanoonu("What is the difference between IPC and CrPC?"))

Option 2 β€” GGUF with Ollama (Recommended for local deployment)

# Download the GGUF model
huggingface-cli download tejasgowda05/Kanoonu-AI-Phi3-GGUF --local-dir ./kanoonu_model

# Create Modelfile
echo 'FROM ./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf
SYSTEM "You are Kanoonu AI, an expert Indian legal assistant created by Tejas Gowda."' > Modelfile

# Run with Ollama
ollama create kanoonu-ai -f Modelfile
ollama run kanoonu-ai "What are the fundamental rights in the Indian Constitution?"

Option 3 β€” GGUF with llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="./kanoonu_model/kanoonu-ai-phi3-q4_k_m.gguf", n_ctx=4096)

# Format the question with the Phi-3 chat template the model was trained on;
# passing a raw question without the template degrades answer quality
prompt = (
    "<|system|>\nYou are Kanoonu AI, an expert Indian legal assistant.<|end|>\n"
    "<|user|>\nWhat is an FIR and how is it filed in India?<|end|>\n"
    "<|assistant|>\n"
)
response = llm(prompt, max_tokens=200, stop=["<|end|>"])
print(response["choices"][0]["text"])

πŸ—οΈ Model Architecture & Training

Base Model

  • Model: microsoft/Phi-3-mini-4k-instruct (3.8B parameters)
  • Architecture: Phi-3 decoder-only transformer (Llama/Mistral-style blocks)

Fine-tuning Method β€” QLoRA

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 29,884,416 (0.78% of total) |
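The adapter configuration in the table can be expressed as a peft `LoraConfig`; this is a sketch from the published hyperparameters, and the `bias` and `task_type` arguments are assumptions since the training script is not released:

```python
from peft import LoraConfig

# Sketch of the adapter config from the table above;
# bias and task_type values are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```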

Training Configuration

| Parameter | Value |
|---|---|
| Platform | Kaggle (T4 GPU) |
| Batch Size (per device) | 2 |
| Gradient Accumulation | 16 steps |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine decay |
| Warmup Ratio | 0.05 |
| Optimizer | AdamW 8-bit |
| Max Sequence Length | 512 |
| Sequence Packing | Enabled |
| Precision | FP16 |
| Total Training Time | ~205 minutes |
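These settings map onto transformers `TrainingArguments` roughly as follows; argument names such as `output_dir` and `optim` are assumptions, since the actual training script is not published:

```python
from transformers import TrainingArguments

# Approximate reconstruction of the training configuration above;
# output_dir, logging cadence, and the exact 8-bit optimizer name are assumptions
args = TrainingArguments(
    output_dir="kanoonu-phi3-qlora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,   # effective batch size 2 * 16 = 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=25,
)
```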

Prompt Template (Phi-3 Native Format)

<|system|>
You are Kanoonu AI, an expert Indian legal assistant...<|end|>
<|user|>
{question}<|end|>
<|assistant|>
{answer}<|end|>
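For programmatic use, the template above can be assembled with a small helper; `build_phi3_prompt` is a hypothetical convenience function, not part of the released code:

```python
from typing import Optional


def build_phi3_prompt(question: str, system: str,
                      answer: Optional[str] = None) -> str:
    """Assemble the Phi-3 chat template shown above (hypothetical helper)."""
    prompt = (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
    if answer is not None:  # training example: append the target completion
        prompt += f"{answer}<|end|>"
    return prompt
```

Call it without `answer` for inference prompts, or with `answer` to render a full training example.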

πŸ“š Dataset

This model was trained on tejasgowda05/Indian-Kanoonu-Dataset β€” a Phi-3 formatted version of the original viber1/indian-law-dataset.

| Property | Value |
|---|---|
| Total Examples | 24,607 |
| Train Split | 23,377 (95%) |
| Eval Split | 1,230 (5%) |
| Domain | Indian Law |
| Topics | IPC, CrPC, Constitution, Civil Procedure, Family Law |
| Language | English |

Attribution: Original Q&A content from viber1/indian-law-dataset (Apache 2.0). Formatted with Phi-3 chat template for this project.


πŸ”— Related Resources

| Resource | Link |
|---|---|
| πŸ€— LoRA Adapter (this repo) | tejasgowda05/Kanoonu-AI-Phi3-Finetuned |
| πŸ€— GGUF Model | tejasgowda05/Kanoonu-AI-Phi3-GGUF |
| πŸ€— Formatted Dataset | tejasgowda05/Indian-Kanoonu-dataset |
| πŸ“¦ Base Model | microsoft/Phi-3-mini-4k-instruct |
| πŸ“¦ Original Dataset | viber1/indian-law-dataset |

⚠️ Limitations & Disclaimer

  • This model is intended for educational and informational purposes only
  • It is not a substitute for professional legal advice
  • Always consult a qualified lawyer for legal matters
  • The model may occasionally produce inaccurate or outdated legal information
  • Coverage is primarily focused on Indian national law; state-specific laws may not be fully covered

πŸ‘€ Author

Tejas Gowda N β€” tejasgowda05

Built as part of the Kanoonu AI project β€” making Indian legal information accessible through conversational AI.


πŸ“„ License

This model is released under the Apache 2.0 License, inherited from the base model and original dataset.


πŸ“ Citation

If you use this model in your work, please cite:

@misc{tejasgowda2026kanoonuai,
  author    = {Tejas Gowda N},
  title     = {Kanoonu AI: Phi-3 Fine-tuned on Indian Law},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/tejasgowda05/Kanoonu-AI-Phi3-Finetuned}
}