Gemma 2B NIRF Lookup 2025 - GGUF Version 2
Overview
This repository contains GGUF-converted versions of the coderop12/gemma2b-nirf-lookup-2025 model, optimized for efficient inference with llama.cpp and compatible frameworks.
Model Details
- Base Model: google/gemma-2-2b-it
- Fine-tuning: QLoRA (4-bit) on NIRF 2025 institutional data
- Specialty: Indian higher education institutional ranking lookups
- Training Data: 100 NIRF 2025 lookup samples
- Conversion: HuggingFace → GGUF format
Files Included
- gemma2b-nirf-lookup-2025-f16.gguf (4.88 GB) - Original FP16 precision
- gemma2b-nirf-lookup-2025-q4_k_m.gguf (1.59 GB) - Q4_K_M quantized (recommended)
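To fetch a single file without cloning the whole repository, the huggingface_hub client works; a minimal sketch (the repo_id matches the URL in the Citation section below):
from huggingface_hub import hf_hub_download
# Download only the quantized file (~1.59 GB)
model_path = hf_hub_download(
    repo_id="coderop12/gemma2b-nirf-lookup-2025-gguf-v2",
    filename="gemma2b-nirf-lookup-2025-q4_k_m.gguf",
)
print(model_path)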
Quick Start
Option 1: llama.cpp
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build
cmake -B build
cmake --build build --config Release
# Run inference
./build/bin/llama-cli \
--model gemma2b-nirf-lookup-2025-q4_k_m.gguf \
--prompt "What is the ranking of IIT Madras in NIRF 2025?" \
--n-predict 100 \
--temp 0.7
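The same build also produces llama-server, which serves the model over an OpenAI-compatible HTTP API; a minimal sketch (the port is an arbitrary choice, not part of this repo):
# Start an OpenAI-compatible server
./build/bin/llama-server \
--model gemma2b-nirf-lookup-2025-q4_k_m.gguf \
--port 8080
# Query it from another terminal
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "What is the ranking of IIT Madras in NIRF 2025?"}]}'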
Option 2: Ollama
# Create Modelfile
echo 'FROM ./gemma2b-nirf-lookup-2025-q4_k_m.gguf' > Modelfile
# Import model
ollama create nirf-lookup -f Modelfile
# Run
ollama run nirf-lookup "What is the ranking of IIT Delhi in NIRF 2025?"
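Beyond the one-line Modelfile above, Ollama can also bake sampling defaults and a system prompt into the imported model; a minimal sketch (the temperature mirrors the llama.cpp example, and the SYSTEM text is an illustrative assumption, not part of this repo):
# Modelfile with defaults baked in
FROM ./gemma2b-nirf-lookup-2025-q4_k_m.gguf
PARAMETER temperature 0.7
SYSTEM """You are a lookup assistant for NIRF 2025 institutional rankings."""
# Re-import with the richer Modelfile
ollama create nirf-lookup -f Modelfile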
Option 3: Python with llama-cpp-python
from llama_cpp import Llama
# Load model
llm = Llama(model_path="gemma2b-nirf-lookup-2025-q4_k_m.gguf")
# Generate response
output = llm("What is the ranking of IIT Bombay in NIRF 2025?",
             max_tokens=100, temperature=0.7)
print(output['choices'][0]['text'])
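For an instruction-tuned base like gemma-2-2b-it, the chat API in llama-cpp-python is usually a better fit because it applies the model's chat template automatically; a minimal sketch:
from llama_cpp import Llama
llm = Llama(model_path="gemma2b-nirf-lookup-2025-q4_k_m.gguf")
# Chat-style call; the chat template is applied for you
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the ranking of IIT Bombay in NIRF 2025?"}],
    max_tokens=100,
    temperature=0.7,
)
print(output["choices"][0]["message"]["content"])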
Sample Queries
"What is the ranking of IIT Madras in NIRF 2025?"
"Which engineering college ranks #2 in NIRF 2025?"
"Tell me about the top 3 universities in NIRF 2025 overall ranking"
"What is the NIRF score of IIT Delhi in 2025?"
Expected Output Format
The model provides structured responses with:
- Institution ranking and score
- Source references (e.g., [NIRF2025-OVERALL-IR-O-U-0456])
- Additional contextual information
Performance
- Q4_K_M Version: ~12 tokens/second on a T4 GPU
- Memory Usage: ~2 GB VRAM for the Q4_K_M version
- Quality: Minimal degradation from the original FP16 model
Technical Specifications
- Architecture: Gemma2ForCausalLM
- Parameters: 2.61B
- Context Length: 8192 tokens (see the note below)
- Quantization: Q4_K_M (recommended) / FP16 (maximum quality)
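One practical note: llama-cpp-python does not open the full trained context window by default, so pass n_ctx explicitly if you need long prompts; a minimal sketch:
from llama_cpp import Llama
# Request the full 8192-token context rather than the library's much smaller default
llm = Llama(
    model_path="gemma2b-nirf-lookup-2025-q4_k_m.gguf",
    n_ctx=8192,
)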
Hardware Recommendations
- CPU: 4+ cores, 8GB+ RAM
- GPU: T4/RTX 3060 or better for optimal performance (see the offload sketch below)
- Storage: 2GB+ free space
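With a CUDA-enabled build of llama.cpp, GPU-level throughput requires offloading layers to VRAM via the -ngl flag; a sketch (99 simply means "offload all layers", and the 2B model fits comfortably in a T4's memory):
# Offload all layers to the GPU (requires a CUDA-enabled build)
./build/bin/llama-cli \
--model gemma2b-nirf-lookup-2025-q4_k_m.gguf \
-ngl 99 \
--prompt "What is the ranking of IIT Madras in NIRF 2025?" \
--n-predict 100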
Conversion Process
1. Downloaded the base fine-tuned model from HuggingFace
2. Converted it to GGUF F16 format using llama.cpp (see the sketch below)
3. Quantized to Q4_K_M for an optimal size/quality balance
4. Validated functionality with NIRF-specific queries
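The exact commands are not recorded in this card, but a typical llama.cpp workflow for steps 2-3 looks like this (local paths are placeholders):
# Step 2: convert the HF checkpoint to GGUF F16 (run from the llama.cpp repo)
python convert_hf_to_gguf.py /path/to/gemma2b-nirf-lookup-2025 \
--outfile gemma2b-nirf-lookup-2025-f16.gguf --outtype f16
# Step 3: quantize F16 to Q4_K_M
./build/bin/llama-quantize \
gemma2b-nirf-lookup-2025-f16.gguf \
gemma2b-nirf-lookup-2025-q4_k_m.gguf \
Q4_K_M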
License
This model derivative follows Google's Gemma Terms of Use. See the original base model's license for details.
Citation
If you use this model, please cite:
@misc{gemma2b-nirf-gguf-v2,
  title={Gemma 2B NIRF Lookup 2025 - GGUF Version 2},
  author={coderop12},
  year={2025},
  url={https://huggingface.co/coderop12/gemma2b-nirf-lookup-2025-gguf-v2}
}
Limitations
- Specialized for NIRF 2025 data only
- Limited training dataset (100 samples)
- May not generalize to other ranking systems
- Verify critical information against official NIRF sources
Support
For issues or questions, please open an issue in this repository.