---
license: apache-2.0
library_name: transformers
base_model: google/gemma-2b
tags:
  - text-generation
  - standalone
  - merged-weights
  - pdf-optimized
  - gemma
  - vision-guided-training
language:
  - en
pipeline_tag: text-generation
---

# 🚀 Solvrays Finetuned PDF (Standalone Merged Weights)

## 🌟 Overview
This model is a high-performance, standalone version of **Gemma 2B**, meticulously fine-tuned for **complex document understanding and technical metadata extraction**. Unlike standard PEFT adapters, this version features **merged weights**, enabling seamless integration into production pipelines without the overhead of loading separate adapter layers.

### 🛠 Key Features
- **No Adapter Overhead**: The merged weights load directly with `AutoModelForCausalLM`; no `peft` step is needed at inference time.
- **Document Intelligence**: Fine-tuned on technical PDF content, including infrastructure guides and architecture documentation.
- **Vision-Guided Data Pipeline**: Trained on text recovered through a hybrid Digital/OCR pipeline, intended to preserve the source text as faithfully as possible.
- **Extraction and Summarization Focus**: Tuned for extraction and summarization tasks over technical corpora.
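
This card does not describe the hybrid Digital/OCR pipeline in detail. As a rough sketch of how such a fallback is commonly organized, the function below keeps a page's embedded (digital) text when it looks usable and calls OCR otherwise. The names `recover_page_text` and `min_chars`, the `ocr` callable, and the threshold value are all hypothetical and are not taken from the actual training pipeline.

```python
from typing import Callable, Tuple


def recover_page_text(
    digital_text: str,
    ocr: Callable[[], str],
    min_chars: int = 40,  # hypothetical threshold, not a value from this model card
) -> Tuple[str, str]:
    """Return (text, source) for one page.

    Prefer the text layer embedded in the PDF; fall back to OCR when the
    text layer is missing or too short to be a real page of content.
    """
    cleaned = digital_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned, "digital"
    # `ocr` stands in for any OCR backend; it is only called on fallback.
    return ocr().strip(), "ocr"


# Usage with stand-in values: a scanned page has no usable text layer.
text, source = recover_page_text("", ocr=lambda: " Rack layout, section 3 ")
print(source, repr(text))
```

Passing the OCR step as a callable keeps the expensive path lazy, so pages with a good text layer never trigger it.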

## 💻 Quick Start (Inference)
Load the model with the standard Hugging Face `transformers` API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "singtan/solvrays-finetuned-pdf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Gemma is supported natively by transformers, so trust_remote_code is not needed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # requires the `accelerate` package
    torch_dtype=torch.float16,
)

prompt = "Analyze the provided technical documentation and summarize the key infrastructure recommendations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,  # temperature and top_p only take effect when sampling is enabled
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📊 Training Specifications
- **Base Model**: google/gemma-2b
- **Training Strategy**: QLoRA (4-bit quantized base model), followed by merging the adapter into FP16 weights.
- **Final Training Loss**: Not reported
- **Learning Rate**: 0.0001
- **Epochs**: 3
- **Hardware**: Optimized for NVIDIA L4/V100/H100 environments.
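
To illustrate what "weight merging" means in the training strategy above, here is a toy sketch of the standard LoRA merge, W' = W + (alpha / r) * B A, in plain Python. It ignores the 4-bit quantization used during QLoRA training, and every matrix, size, and name (`matmul`, `merge_lora`) is illustrative rather than taken from the actual training code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    rank = len(lora_a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes: d_out = 2, d_in = 3, r = 1. All numbers are illustrative.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
ALPHA = 2.0
x = [[1.0], [1.0], [1.0]]  # input as a column vector

merged = merge_lora(W, A, B, ALPHA)

# Adapter path: base output plus the scaled low-rank correction.
scale = ALPHA / len(A)
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_out = [[b[0] + scale * l[0]] for b, l in zip(base_out, lora_out)]

# Merged path: a single matrix product gives the same result.
merged_out = matmul(merged, x)
print(adapter_out == merged_out)
```

Because the merged matrix already contains the low-rank update, inference needs one matrix product per layer and no adapter files, which is why this checkpoint loads as a plain causal LM.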

## ⚠️ Limitations & Bias
While optimized for technical documentation, this model remains a generative LLM and may produce hallucinations if the input context is missing or highly ambiguous. It is recommended to use **Retrieval-Augmented Generation (RAG)** or **strict prompting** for mission-critical data extraction.
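
As one possible reading of "strict prompting", the sketch below confines the model to a supplied excerpt, asks for a fixed set of JSON keys, and rejects any reply that does not match. The helper names `build_extraction_prompt` and `parse_extraction`, the prompt wording, and the example fields are assumptions made for illustration; they are not part of this model or of `transformers`.

```python
import json
from typing import Dict, List, Optional


def build_extraction_prompt(context: str, fields: List[str]) -> str:
    """Build a prompt that confines the answer to the supplied excerpt."""
    keys = ", ".join(f'"{name}"' for name in fields)
    return (
        "Use only the document excerpt below. "
        f"Reply with one JSON object containing exactly these keys: {keys}. "
        "Use null for any value the excerpt does not state.\n\n"
        f"Excerpt:\n{context}\n\nJSON:"
    )


def parse_extraction(reply: str, fields: List[str]) -> Optional[Dict[str, object]]:
    """Return the parsed object, or None if the reply breaks the contract."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    # Reject replies with missing or unexpected keys instead of guessing.
    if not isinstance(data, dict) or set(data) != set(fields):
        return None
    return data


fields = ["hostname", "cpu_cores"]
prompt = build_extraction_prompt("Node web-01 has 8 cores.", fields)
# Stand-in for model output; a real reply would come from model.generate.
reply = '{"hostname": "web-01", "cpu_cores": 8}'
print(parse_extraction(reply, fields))
```

Returning `None` on any contract violation lets the caller retry or escalate rather than pass a malformed extraction downstream.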

## 📜 License
This model follows the **Apache-2.0** license. Usage must adhere to the Google Gemma Prohibited Use Policy.

---
**Fine-tuned and Merged by Bibek Lama Singtan**