---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---

# πŸ“‚ Solvrays Finetuned PDF - Document AI

## 🌟 Model Overview
This model is a high-precision fine-tuning of **google/gemma-2b-it**, specifically architected for **Zero-Hallucination Technical Retrieval**. It has been trained on a proprietary dataset of technical and architectural documentation to ensure deep contextual grounding.

### πŸš€ Key Capabilities
- **Technical Grounding**: Prioritizes factual documentation over generative speculation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window); a chunking sketch follows this list.
- **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.
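
Because the model expects 256-token blocks, long documents should be pre-split into overlapping chunks before querying. The sketch below is an illustrative assumption (the `chunk_document` helper and the 64-token overlap are not part of the released tooling); it works with any Hugging Face tokenizer, such as the one loaded in the next section.

```python
def chunk_document(text, tokenizer, window=256, overlap=64):
    """Split a document into overlapping token windows.

    The 256-token window matches the model's training block length;
    the 64-token overlap is an assumed value chosen to preserve
    context across chunk boundaries.
    """
    token_ids = tokenizer(text, add_special_tokens=False)['input_ids']
    chunks = []
    step = window - overlap
    for start in range(0, len(token_ids), step):
        chunks.append(tokenizer.decode(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break
    return chunks
```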

## πŸ’» Professional Implementation
The model requires specific prompt construction to trigger its 'Knowledge Retrieval' mode:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = 'solvrays/solvrays-finetuned-pdf'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in 4-bit with bfloat16 compute (requires `bitsandbytes`);
# quantization_config expects a BitsAndBytesConfig object, not a plain dict.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

def query_model(user_query):
    # High-precision retrieval template used during fine-tuning.
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # Greedy decoding (do_sample=False) keeps answers deterministic.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Return only the text after the response marker.
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip()
```
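
A minimal usage example (the query string is illustrative):

```python
answer = query_model('What load rating does the specification give for the main support beam?')
print(answer)
```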

## πŸ“Š Technical Specifications
| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit Normalized Float) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q, k, v, o, gate, up, down |
| **Training Epochs** | 25 |
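
For reference, the adapter settings above correspond roughly to a `peft` configuration along these lines. This is a reconstruction rather than the exact training script: the full `*_proj` module names are assumed from the standard Gemma architecture, and `bias`/`task_type` are conventional defaults not stated in the table.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,           # LoRA rank from the table above
    lora_alpha=32,  # LoRA alpha from the table above
    target_modules=[
        'q_proj', 'k_proj', 'v_proj', 'o_proj',  # attention projections
        'gate_proj', 'up_proj', 'down_proj',     # MLP projections
    ],
    bias='none',            # assumption: train adapters only
    task_type='CAUSAL_LM',
)
```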

## πŸ›  Training Environment
- **Hardware**: 2Γ— NVIDIA L4 (dual-GPU setup)
- **Optimizer**: Paged AdamW 8-bit (see the training sketch after this list)
- **Context Length**: 256 tokens per block
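
A hedged sketch of how these settings map onto `transformers` training arguments. Only the optimizer, epoch count, and BF16 precision come from this card; the batch size, learning rate, and output directory are illustrative assumptions, and the 256-token block length is enforced at tokenization time rather than here.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./solvrays-finetuned-pdf',  # assumed path
    num_train_epochs=25,                    # from the table above
    optim='paged_adamw_8bit',               # Paged AdamW 8-bit, as listed
    bf16=True,                              # BF16 precision, as listed
    per_device_train_batch_size=4,          # assumption, not documented
    learning_rate=2e-4,                     # assumption, not documented
)
```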

## ⚠️ Constraints & Risk Mitigation
- **Out-of-Scope**: This model is not intended for general conversation or creative writing. It is a specialized document analyst.
- **Hallucination Control**: If the requested information is absent from its training data, the model is trained to answer 'Not Documented' or return an empty response rather than guess; the guard sketch after this list shows one way to act on that signal.
- **Numerical Accuracy**: Always cross-verify critical measurements against the original PDF source material.
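
One way to act on the 'Not Documented' behaviour downstream is to treat it (or an empty string) as an explicit refusal. The `safe_query` wrapper below is an illustrative assumption built on the `query_model` helper above, not part of this card's tooling:

```python
def safe_query(user_query):
    """Wrap query_model and flag unanswerable queries explicitly."""
    answer = query_model(user_query)
    # The model signals missing knowledge with 'Not Documented' or an
    # empty response; propagate that as None so callers must handle it.
    if not answer or answer == 'Not Documented':
        return None
    return answer
```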

---
**Senior AI Architect & Developer**: Solvrays