---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---

# 📂 Solvrays Finetuned PDF - Document AI

## 🌟 Model Overview

This model is a high-precision fine-tune of **google/gemma-2b-it**, architected for **Zero-Hallucination Technical Retrieval**. It was trained on a proprietary dataset of technical and architectural documentation to ensure deep contextual grounding.

### 🚀 Key Capabilities

- **Technical Grounding**: Prioritizes factual documentation over generative speculation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window).
- **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.

## 💻 Professional Implementation

The model requires a specific prompt template to trigger its 'Knowledge Retrieval' mode:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = 'solvrays/solvrays-finetuned-pdf'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    # quantization_config expects a BitsAndBytesConfig object, not a plain dict
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

def query_model(user_query):
    # High-precision retrieval template used during fine-tuning
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the response marker
    return text.split('### Verified Response:')[-1].strip()
```

## 📊 Technical Specifications

| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit NormalFloat) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q, k, v, o, gate, up, down |
| **Training Epochs** | 25 |

## 🛠 Training Environment

- **Hardware**: 2x NVIDIA L4 (dual-GPU setup)
- **Optimizer**: Paged AdamW 8-bit
- **Context Length**: 256 tokens per block

## ⚠️ Constraints & Risk Mitigation

- **Out-of-Scope**: This model is not intended for general conversation or creative writing; it is a specialized document analyst.
- **Hallucination Control**: If the requested information is not present in the model's weights, it is trained to answer 'Not Documented' or return an empty response for verification.
- **Numerical Accuracy**: Always cross-verify critical measurements against the original PDF source material.

---

**Senior AI Architect & Developer**: Solvrays
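
Since the model was fine-tuned on 256-token blocks, long documents should be split into overlapping windows of that size before querying. The model card does not ship a chunking utility, so the following is a minimal sketch; `chunk_tokens`, the 64-token overlap, and the stride logic are illustrative assumptions, not part of the published model.

```python
def chunk_tokens(token_ids, window=256, overlap=64):
    """Split a token-id sequence into overlapping fixed-size windows.

    `window` matches the model's 256-token training context; the
    `overlap` value is an assumed default, not prescribed by the model.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    stride = window - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        # Stop once the final window reaches the end of the sequence
        if start + window >= len(token_ids):
            break
    return chunks
```

Each chunk can then be decoded back to text and passed to `query_model` individually, keeping every query within the context length the model saw during training.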