---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---
# πŸ“‚ Solvrays Finetuned Pdf - Document AI
## 🌟 Model Overview
This model is a high-precision fine-tuning of **google/gemma-2b-it**, specifically architected for **Zero-Hallucination Technical Retrieval**. It has been trained on a proprietary dataset of technical and architectural documentation to ensure deep contextual grounding.
### πŸš€ Key Capabilities
- **Technical Grounding**: Prioritizes factual documentation over generative speculation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window); see the chunking sketch after this list.
- **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.
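
Because of the 256-token window, long documents should be split into overlapping chunks before querying. This card does not prescribe a splitter, so the snippet below is only a minimal sketch: the `chunk_document` helper and the `overlap=64` value are illustrative assumptions, not part of the model.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('solvrays/solvrays-finetuned-pdf')

def chunk_document(text, window=256, overlap=64):
    # Tokenize once, then slide a fixed window with overlap so that
    # facts near chunk boundaries appear in at least two segments.
    token_ids = tokenizer(text, add_special_tokens=False)['input_ids']
    chunks = []
    step = window - overlap
    for start in range(0, len(token_ids), step):
        piece = token_ids[start:start + window]
        chunks.append(tokenizer.decode(piece))
        if start + window >= len(token_ids):
            break
    return chunks
```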
## πŸ’» Professional Implementation
The model requires specific prompt construction to trigger its 'Knowledge Retrieval' mode:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = 'solvrays/solvrays-finetuned-pdf'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    # 4-bit loading requires a proper config object, not a plain dict
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

def query_model(user_query):
    # High-precision retrieval template expected by the fine-tune
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # Greedy decoding (do_sample=False) keeps answers deterministic
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip()
```
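
A quick usage example, assuming the setup above has run (the question itself is illustrative):

```python
# Ask a document-grounded question; greedy decoding keeps the answer stable
answer = query_model('What is the rated load capacity of the main support beam?')
print(answer)  # grounded answer, or 'Not Documented' if out of scope
```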
## πŸ“Š Technical Specifications
| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit NormalFloat, NF4) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Training Epochs** | 25 |
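
Under these hyperparameters, the adapter configuration would look roughly like the `peft` sketch below. The exact settings used in training are not published, so this is a reconstruction from the table; `task_type` is a standard choice, and any dropout or bias settings are left at library defaults.

```python
from peft import LoraConfig

# Reconstruction of the adapter config from the table above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'up_proj', 'down_proj'],
    task_type='CAUSAL_LM',
)
```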
## πŸ›  Training Environment
- **Hardware**: 2× NVIDIA L4 GPUs
- **Optimizer**: Paged AdamW 8-bit
- **Context Length**: 256 tokens per block
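
Combining the table and the environment above, the quantization and optimizer setup could be expressed roughly as follows. Only the NF4 quantization, the paged AdamW 8-bit optimizer, and the epoch count come from this card; `output_dir` and the remaining arguments are illustrative assumptions.

```python
from transformers import BitsAndBytesConfig, TrainingArguments
import torch

# 4-bit NF4 quantization matching the QLoRA setup described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Optimizer and epoch count from this card; other values are placeholders
training_args = TrainingArguments(
    output_dir='./solvrays-pdf-qlora',
    optim='paged_adamw_8bit',
    num_train_epochs=25,
    bf16=True,
)
```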
## ⚠️ Constraints & Risk Mitigation
- **Out-of-Scope**: This model is not intended for general conversation or creative writing. It is a specialized document analyst.
- **Hallucination Control**: When the requested information is not covered by its training data, the model is trained to answer 'Not Documented' or return an empty response rather than speculate; a caller-side guard sketch follows this list.
- **Numerical Accuracy**: Always cross-verify critical measurements with original PDF source material.
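
Since the model signals missing information with the 'Not Documented' sentinel or an empty string, callers can gate downstream use on that signal. A minimal sketch, assuming the `query_model` helper defined earlier:

```python
def answer_or_flag(question):
    # Treat the sentinel and empty output as "no grounded answer available"
    answer = query_model(question)
    if not answer or answer == 'Not Documented':
        return None  # caller should fall back to the source PDF
    return answer
```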
---
**Senior AI Architect & Developer**: Solvrays