LegalParam GGUF Models
GGUF quantized versions of bharatgenai/LegalParam for use with Ollama.
Model Information
- Original Model: bharatgenai/LegalParam
- Architecture: ParamBharatGen (LLaMA-based)
- Parameters: 2.9B
- Context Length: 2048 tokens
- Purpose: Specialized AI assistant for Indian law
Available Quantizations
| Quantization | File Size | Description | Use Case |
|---|---|---|---|
| Q4_K_M | 1.7GB | 4-bit quantized | Recommended for most use cases |
| Q6_K | 2.2GB | 6-bit quantized | Higher quality, moderate resource usage |
| F16 | 5.4GB | 16-bit float (no quantization) | Highest quality, requires more memory |
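The file sizes in the table follow roughly from parameter count times bits per weight. A rough sketch of that arithmetic, assuming approximate average bits-per-weight figures of about 4.85 for Q4_K_M and 6.56 for Q6_K (commonly quoted llama.cpp values, not taken from this repository) and ignoring metadata overhead:

```python
# Rough file-size estimate: parameters * bits_per_weight / 8 bytes.
# The bits-per-weight values are approximate assumptions, not figures
# published by this repository.
PARAMS = 2.9e9  # parameter count from the model card

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # assumed average for llama.cpp Q4_K_M
    "Q6_K": 6.56,    # assumed average for llama.cpp Q6_K
    "F16": 16.0,     # exact: two bytes per weight
}


def estimate_size_gib(params: float, bits_per_weight: float) -> float:
    """Return the estimated file size in GiB (2**30 bytes)."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for name, bpw in APPROX_BITS_PER_WEIGHT.items():
        print(f"{name}: ~{estimate_size_gib(PARAMS, bpw):.1f} GiB")
```

Under these assumptions the estimates fall within roughly 0.1 GB of the sizes listed above.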
Quick Start
1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
2. Create the Model
Choose a quantization level:
# Q4_K_M (Recommended - 1.7GB)
ollama create legalparam:q4 -f Modelfile
# Q6_K (Higher quality - 2.2GB)
ollama create legalparam:q6 -f Modelfile-q6
# F16 (Highest quality - 5.4GB)
ollama create legalparam:f16 -f Modelfile-f16
3. Run the Model
# Interactive chat
ollama run legalparam:q4
# Single query
ollama run legalparam:q4 "What steps should a farmer take to legally transfer agricultural land ownership?"
Python Usage
from ollama import Client

# Connect to the local Ollama server using the default settings
client = Client()
response = client.chat(model='legalparam:q4', messages=[
    {'role': 'user', 'content': 'What are the fundamental rights in the Indian Constitution?'}
])
print(response['message']['content'])
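If the ollama Python package is not available, the same request can be sketched against Ollama's local REST API using only the standard library. This assumes a default Ollama server at http://localhost:11434 and that the legalparam:q4 model has already been created; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def build_chat_payload(model: str, question: str) -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def ask(model: str, question: str, url: str = OLLAMA_URL) -> str:
    """Send the question to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(model, question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["message"]["content"]


# Example (requires a running Ollama server with the model created):
# print(ask("legalparam:q4", "What is the procedure for filing a civil suit in India?"))
```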
Model File Details
All Modelfiles include:
- Correct chat template matching the tokenizer's format
- Stop tokens (</s>, <user>, <assistant>) to prevent infinite generation loops
- Optimized parameters for legal question answering
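For readers who want to see what such a Modelfile might contain, the following sketch generates one from Python. It is an illustration under assumptions, not the repository's actual Modelfile: the GGUF path, the template text, and the num_ctx value are placeholders chosen to match the description above.

```python
from typing import List


def render_modelfile(gguf_path: str, stop_tokens: List[str], num_ctx: int = 2048) -> str:
    """Render a minimal Ollama Modelfile as text (illustrative, not the shipped file)."""
    lines = [
        f"FROM {gguf_path}",
        # Ollama templates use Go template syntax; .Prompt is the user message.
        'TEMPLATE """<user>',
        "{{ .Prompt }}",
        "<assistant>",
        '"""',
        f"PARAMETER num_ctx {num_ctx}",
    ]
    # One PARAMETER stop line per stop token.
    lines += [f'PARAMETER stop "{token}"' for token in stop_tokens]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The GGUF path below is a hypothetical local file name.
    print(render_modelfile("./legalparam-Q4_K_M.gguf", ["</s>", "<user>", "<assistant>"]))
```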
Chat Template Format
<user>
{user_message}
<assistant>
{assistant_response}
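When calling a runtime that does not apply the template automatically, the prompt has to be assembled by hand. A minimal sketch, assuming each tag sits on its own line exactly as shown above (the function name and the handling of earlier turns are illustrative):

```python
from typing import List, Optional, Tuple


def format_prompt(user_message: str,
                  history: Optional[List[Tuple[str, str]]] = None) -> str:
    """Wrap messages in <user>/<assistant> tags, ending with an open assistant turn."""
    parts = []
    # Earlier (user, assistant) exchanges come first, in order.
    for past_user, past_assistant in history or []:
        parts += ["<user>", past_user, "<assistant>", past_assistant]
    # The new question ends with an empty assistant turn for the model to fill.
    parts += ["<user>", user_message, "<assistant>"]
    return "\n".join(parts) + "\n"
```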
Context Window
- Default: 2048 tokens (combined input + output)
- Scaling: Can be extended with RoPE scaling in Ollama (experimental)
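Because the 2048 tokens cover prompt and reply together, a long prompt leaves less room for the answer. The sketch below computes the remaining generation budget and builds an options dictionary with num_ctx and num_predict, both Ollama runtime options. The prompt token count is assumed to come from elsewhere (for example a tokenizer), and the helper name is illustrative.

```python
CONTEXT_LIMIT = 2048  # default combined input + output window


def generation_options(prompt_tokens: int, num_ctx: int = CONTEXT_LIMIT) -> dict:
    """Return Ollama-style options that keep prompt + reply inside num_ctx."""
    if prompt_tokens >= num_ctx:
        raise ValueError("prompt already fills the context window")
    return {
        "num_ctx": num_ctx,
        "num_predict": num_ctx - prompt_tokens,  # tokens left for the reply
    }
```

The resulting dictionary can be passed as the options argument of client.chat in the ollama Python package.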
Example Queries
The model is intended for Indian legal queries such as:
- "Explain the First Amendment of the Indian Constitution"
- "What is the procedure for filing a civil suit in India?"
- "What are the key provisions of the Land Acquisition Act?"
- "Explain the concept of judicial review in India"
- "What are the powers of the Supreme Court of India?"
Technical Specifications
Model Architecture
- Hidden size: 2048
- Layers: 32
- Attention heads: 16
- KV heads: 8 (Grouped Query Attention)
- Vocabulary: 256,006 tokens
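These numbers are enough to estimate the key-value cache memory at full context. A rough sketch, assuming the head dimension equals hidden size divided by attention heads (128) and that the cache is stored as 16-bit floats; neither assumption is confirmed by the model card:

```python
HIDDEN_SIZE = 2048
LAYERS = 32
ATTENTION_HEADS = 16
KV_HEADS = 8


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size: keys and values for every layer, KV head and token."""
    head_dim = HIDDEN_SIZE // ATTENTION_HEADS  # assumed: 2048 / 16 = 128
    per_token = 2 * LAYERS * KV_HEADS * head_dim * bytes_per_value  # 2 = K and V
    return per_token * context_tokens


if __name__ == "__main__":
    print(kv_cache_bytes(2048) / 2**20, "MiB")
```

Under these assumptions the cache at 2048 tokens comes to 256 MiB, half of what 16 KV heads would require.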
Special Tokens
- <s>: Beginning of sequence (BOS)
- </s>: End of sequence (EOS)
- <user>: User message marker
- <assistant>: Assistant message marker
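If raw completions are consumed without configured stop tokens, the output can be cut at the first marker manually. A minimal sketch (the function name is illustrative):

```python
STOP_TOKENS = ("</s>", "<user>", "<assistant>")


def truncate_at_stop(text: str, stop_tokens=STOP_TOKENS) -> str:
    """Return text up to the earliest stop token, or unchanged if none occurs."""
    positions = [text.find(token) for token in stop_tokens]
    positions = [pos for pos in positions if pos != -1]
    # Cut at the earliest marker and drop trailing whitespace before it.
    return text[: min(positions)].rstrip() if positions else text
```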
Limitations
- Context limited to 2048 tokens
- Training data cutoff: August 2023
- Optimized for Indian law queries
- May not perform well on non-legal topics
Original Model
This is a quantized version of bharatgenai/LegalParam. For the original PyTorch model, training details, and full documentation, please refer to the original repository.
License
Please refer to the original model repository for licensing information.
Conversion Process
These models were converted from the original HuggingFace format to GGUF using llama.cpp with the following process:
- Loaded original model with transformers
- Converted to GGUF format
- Exported at F16 precision and quantized to Q4_K_M and Q6_K
- Validated with Ollama inference engine
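The exact commands used are not given here. As a hedged sketch, a typical llama.cpp workflow runs convert_hf_to_gguf.py to produce an F16 GGUF and then llama-quantize once per quantization type. The paths below are placeholders, and a custom architecture such as this one may need additional converter support.

```python
def conversion_commands(model_dir: str, out_prefix: str,
                        quant_types=("Q4_K_M", "Q6_K")):
    """Build (not run) typical llama.cpp conversion and quantization commands."""
    f16_path = f"{out_prefix}-F16.gguf"
    commands = [
        # Convert the Hugging Face checkpoint to an unquantized F16 GGUF.
        ["python", "convert_hf_to_gguf.py", model_dir,
         "--outfile", f16_path, "--outtype", "f16"],
    ]
    for quant in quant_types:
        # llama-quantize takes: input file, output file, quantization type.
        commands.append(
            ["llama-quantize", f16_path, f"{out_prefix}-{quant}.gguf", quant]
        )
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from inside a llama.cpp checkout.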
Troubleshooting
Model repeats or loops
- Ensure you're using the provided Modelfiles
- Stop tokens are pre-configured to prevent infinite loops
Out of memory errors
- Try a smaller quantization (Q4_K_M instead of Q6_K)
- Reduce the num_ctx parameter in Ollama
Poor quality responses
- Try the F16 model (unquantized) for the highest quality
- Ensure proper prompt formatting with <user> and <assistant> tags
Acknowledgments
- Original model: bharatgenai/LegalParam
- GGUF conversion: llama.cpp
- Inference engine: Ollama