Instructions to use parthashirolkar/LegalParam-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use parthashirolkar/LegalParam-GGUF with llama-cpp-python:

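# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="parthashirolkar/LegalParam-GGUF",
    filename="legalparam-Q4_K_M-fixed.gguf",
)

# Send one chat turn and print the generated reply
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])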

llama.cpp

How to use parthashirolkar/LegalParam-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf parthashirolkar/LegalParam-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf parthashirolkar/LegalParam-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf parthashirolkar/LegalParam-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf parthashirolkar/LegalParam-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf parthashirolkar/LegalParam-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf parthashirolkar/LegalParam-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf parthashirolkar/LegalParam-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf parthashirolkar/LegalParam-GGUF:Q4_K_M

Use Docker

docker model run hf.co/parthashirolkar/LegalParam-GGUF:Q4_K_M

vLLM

How to use parthashirolkar/LegalParam-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "parthashirolkar/LegalParam-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "parthashirolkar/LegalParam-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use parthashirolkar/LegalParam-GGUF with Ollama:

```
ollama run hf.co/parthashirolkar/LegalParam-GGUF:Q4_K_M
```

Unsloth Studio

How to use parthashirolkar/LegalParam-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for parthashirolkar/LegalParam-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for parthashirolkar/LegalParam-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for parthashirolkar/LegalParam-GGUF to start chatting

Docker Model Runner

How to use parthashirolkar/LegalParam-GGUF with Docker Model Runner:

```
docker model run hf.co/parthashirolkar/LegalParam-GGUF:Q4_K_M
```

Lemonade

How to use parthashirolkar/LegalParam-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull parthashirolkar/LegalParam-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.LegalParam-GGUF-Q4_K_M

List all available models

lemonade list

LegalParam GGUF Models

GGUF quantized versions of bharatgenai/LegalParam for use with Ollama.

Model Information

Original Model: bharatgenai/LegalParam

Architecture: ParamBharatGen (LLaMA-based)
Parameters: 2.9B
Context Length: 2048 tokens
Purpose: Specialized AI assistant for Indian law

Available Quantizations

| Quantization | File Size | Description | Use Case |
|---|---|---|---|
| Q4_K_M | 1.7 GB | 4-bit quantized | Recommended for most use cases |
| Q6_K | 2.2 GB | 6-bit quantized | Higher quality, moderate resource usage |
| F16 | 5.4 GB | 16-bit float (no quantization) | Highest quality, requires more memory |
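
As a rough aid for choosing among the variants above, the following sketch picks the largest file that fits a given memory budget. The file sizes come from the table; the `overhead_gb` allowance for the KV cache and runtime buffers is an assumed placeholder, not a measured figure, and `pick_quantization` is a hypothetical helper name.

```python
# File sizes in GB, copied from the quantization table above.
QUANT_SIZES_GB = {"Q4_K_M": 1.7, "Q6_K": 2.2, "F16": 5.4}


def pick_quantization(free_memory_gb, overhead_gb=1.0):
    """Return the largest variant whose file fits the budget, or None.

    overhead_gb is an assumed allowance for the KV cache and runtime
    buffers; it is a guess, not a measured figure.
    """
    budget = free_memory_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


print(pick_quantization(4.0))  # Q6_K
```

Real memory use depends on the runtime, context length, and whether layers are offloaded to a GPU, so treat the result as a starting point only.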

Quick Start

1. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

2. Create the Model

Choose a quantization level:

# Q4_K_M (Recommended - 1.7GB)
ollama create legalparam:q4 -f Modelfile

# Q6_K (Higher quality - 2.2GB)
ollama create legalparam:q6 -f Modelfile-q6

# F16 (Highest quality - 5.4GB)
ollama create legalparam:f16 -f Modelfile-f16
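
This section does not show what the Modelfiles contain. The sketch below generates an illustrative Modelfile using Ollama's `FROM`, `TEMPLATE`, and `PARAMETER` directives together with the stop tokens and context length documented in this README. The template text is an assumption and may differ from the Modelfiles shipped in the repository; the GGUF file name is the one used in the llama-cpp-python example, and `build_modelfile` is a hypothetical helper.

```python
def build_modelfile(gguf_path, num_ctx=2048):
    """Return Modelfile text for a local GGUF file (illustrative only)."""
    lines = [
        f"FROM {gguf_path}",
        # Assumed template: one user turn followed by the assistant marker.
        'TEMPLATE """<user>',
        "{{ .Prompt }}",
        "<assistant>",
        '"""',
        f"PARAMETER num_ctx {num_ctx}",
    ]
    # Stop tokens documented in this README.
    for token in ("</s>", "<user>", "<assistant>"):
        lines.append(f'PARAMETER stop "{token}"')
    return "\n".join(lines) + "\n"


print(build_modelfile("./legalparam-Q4_K_M-fixed.gguf"))
```

Prefer the Modelfiles provided with the repository when they are available; this sketch only shows the general shape such a file might take.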

3. Run the Model

# Interactive chat
ollama run legalparam:q4

# Single query
ollama run legalparam:q4 "What steps should a farmer take to legally transfer agricultural land ownership?"

Python Usage

from ollama import Client

client = Client()

response = client.chat(model='legalparam:q4', messages=[
  {'role': 'user', 'content': 'What are the fundamental rights in the Indian Constitution?'}
])

print(response['message']['content'])

Model File Details

All Modelfiles include:

Correct chat template matching the tokenizer's format
Stop tokens (</s>, <user>, <assistant>) to prevent infinite generation loops
Optimized parameters for legal question answering
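
To illustrate what the stop tokens accomplish, here is a minimal client-side sketch that cuts generated text at the first stop token. Ollama applies configured stop tokens itself, so this is not how the Modelfiles work internally; `truncate_at_stop` is a hypothetical helper for runtimes that return raw text.

```python
STOP_TOKENS = ("</s>", "<user>", "<assistant>")


def truncate_at_stop(text, stop_tokens=STOP_TOKENS):
    """Cut text at the earliest occurrence of any stop token."""
    positions = [text.find(token) for token in stop_tokens]
    positions = [pos for pos in positions if pos != -1]
    return text[:min(positions)] if positions else text


raw = "Article 21 protects life and personal liberty.\n<user>\nWhat about"
print(truncate_at_stop(raw))
```

Without such a cutoff, the model can keep writing a new `<user>` turn and answer it, which is the looping behaviour the stop tokens are meant to prevent.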

Chat Template Format

<user>
{user_message}
<assistant>
{assistant_response}
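
When calling a runtime that takes a raw prompt rather than a list of chat messages, the layout above can be produced by hand. The following is a minimal sketch, assuming each marker sits on its own line and that the runtime adds the `<s>` token itself; `format_prompt` is a hypothetical helper, and the exact whitespace should be checked against the tokenizer's chat template.

```python
def format_prompt(messages):
    """Render chat messages into the <user>/<assistant> layout."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role}")
        parts.append(f"<{role}>\n{message['content']}\n")
    # End with an open assistant marker so the model writes the reply.
    parts.append("<assistant>\n")
    return "".join(parts)


print(format_prompt([{"role": "user", "content": "What is judicial review?"}]))
```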

Context Window

Default: 2048 tokens (combined input + output)
Scaling: Can be extended with RoPE scaling in Ollama (experimental)
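
Because the 2048-token window covers both the prompt and the reply, a long prompt leaves less room for the answer. The small sketch below works out the remaining budget; `max_new_tokens` is a hypothetical helper, and the prompt token count has to come from the tokenizer of whichever runtime is in use.

```python
N_CTX = 2048  # default context length stated above


def max_new_tokens(prompt_tokens, n_ctx=N_CTX):
    """Tokens left for the reply once the prompt is accounted for."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(n_ctx - prompt_tokens, 0)


print(max_new_tokens(1500))  # 548
```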

Example Queries

The model is intended for questions about Indian law, for example:

"Explain the First Amendment of the Indian Constitution"
"What is the procedure for filing a civil suit in India?"
"What are the key provisions of the Land Acquisition Act?"
"Explain the concept of judicial review in India"
"What are the powers of the Supreme Court of India?"

Technical Specifications

Model Architecture

Hidden size: 2048
Layers: 32
Attention heads: 16
KV heads: 8 (Grouped Query Attention)
Vocabulary: 256,006 tokens
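
The figures above allow a back-of-the-envelope estimate of KV cache memory. The sketch assumes that the head dimension equals hidden size divided by attention heads (usual for LLaMA-style models, but not stated here) and that cache entries are stored as 16-bit values; runtimes may use other cache formats.

```python
HIDDEN_SIZE = 2048
LAYERS = 32
ATTENTION_HEADS = 16
KV_HEADS = 8


def kv_cache_bytes(n_tokens, bytes_per_value=2):
    """Approximate KV cache size, assuming 16-bit cache entries by default."""
    head_dim = HIDDEN_SIZE // ATTENTION_HEADS  # assumed: hidden size / heads = 128
    # Factor 2 covers keys and values, stored per layer and per KV head.
    per_token = 2 * LAYERS * KV_HEADS * head_dim * bytes_per_value
    return n_tokens * per_token


print(kv_cache_bytes(1))             # 131072 bytes, i.e. 128 KiB per token
print(kv_cache_bytes(2048) / 2**20)  # 256.0 MiB for a full 2048-token context
```

With 8 KV heads instead of 16, Grouped Query Attention halves this cache relative to standard multi-head attention under the same assumptions.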

Special Tokens

<s>: Beginning of sequence (BOS)
</s>: End of sequence (EOS)
<user>: User message marker
<assistant>: Assistant message marker

Limitations

Context limited to 2048 tokens
Training data cutoff: August 2023
Optimized for Indian law queries
May not perform well on non-legal topics

Original Model

This is a quantized version of bharatgenai/LegalParam. For the original PyTorch model, training details, and full documentation, please refer to the original repository.

License

Please refer to the original model repository for licensing information.

Conversion Process

These models were converted from the original HuggingFace format to GGUF using llama.cpp with the following process:

Loaded original model with transformers
Converted to GGUF format
Exported at F16 precision and quantized to Q4_K_M and Q6_K
Validated with Ollama inference engine
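
The exact commands used are not recorded here. As a sketch of what the conversion and quantization steps typically look like with llama.cpp, the helper below assembles command lines for `convert_hf_to_gguf.py` and `llama-quantize`. The paths and the `conversion_commands` name are hypothetical, and a custom architecture such as ParamBharatGen may need configuration changes before the converter accepts it.

```python
def conversion_commands(model_dir, out_prefix, quant_types=("Q4_K_M", "Q6_K")):
    """Return llama.cpp command lines for an F16 export plus quantized variants."""
    f16_path = f"{out_prefix}-F16.gguf"
    commands = [
        # convert_hf_to_gguf.py ships in the llama.cpp repository.
        ["python", "convert_hf_to_gguf.py", model_dir,
         "--outfile", f16_path, "--outtype", "f16"],
    ]
    for quant in quant_types:
        # llama-quantize takes: input file, output file, quantization type.
        commands.append(
            ["llama-quantize", f16_path, f"{out_prefix}-{quant}.gguf", quant]
        )
    return commands


for command in conversion_commands("./LegalParam", "legalparam"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` from a llama.cpp checkout once the tools are built.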

Troubleshooting

Model repeats or loops

Ensure you're using the provided Modelfiles
Stop tokens are pre-configured to prevent infinite loops

Out of memory errors

Try a smaller quantization (Q4_K_M instead of Q6_K)
Reduce num_ctx parameter in Ollama
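
One way to lower `num_ctx` per request is through the `options` argument of the Ollama Python client. The sketch below only builds the request so that it can be inspected without a running server; `build_chat_request` is a hypothetical helper and 1024 is an arbitrary example value.

```python
def build_chat_request(question, model="legalparam:q4", num_ctx=1024):
    """Build keyword arguments for ollama.Client.chat with a smaller context."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # num_ctx is Ollama's context-length option; 1024 is an arbitrary example.
        "options": {"num_ctx": num_ctx},
    }


request = build_chat_request("What is the procedure for filing a civil suit in India?")
print(request["options"])
# With a running Ollama server: ollama.Client().chat(**request)
```

A smaller context shrinks the KV cache but also shortens how much prompt and reply the model can handle in one exchange.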

Poor quality responses

Try the unquantized F16 model for highest quality
Ensure proper prompt formatting with <user> and <assistant> tags

Acknowledgments

Original model: bharatgenai/LegalParam
GGUF conversion: llama.cpp
Inference engine: Ollama
