---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---

# 📂 Solvrays Finetuned PDF - Document AI

## 🌟 Model Overview

This model is a high-precision fine-tune of **google/gemma-2b-it**, architected for **Zero-Hallucination Technical Retrieval**. It was trained on a proprietary dataset of technical and architectural documentation to ensure deep contextual grounding.

### 🚀 Key Capabilities

- **Technical Grounding**: Prioritizes factual documentation over generative speculation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window).
- **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.

## 💻 Professional Implementation

The model requires a specific prompt template to trigger its 'Knowledge Retrieval' mode:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = 'solvrays/solvrays-finetuned-pdf'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    # quantization_config expects a BitsAndBytesConfig object, not a plain dict
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

def query_model(user_query):
    # High-precision retrieval template used during fine-tuning
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the response marker
    return text.split('### Verified Response:')[-1].strip()
```

## 📊 Technical Specifications

| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit NormalFloat) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q, k, v, o, gate, up, down |
| **Training Epochs** | 25 |

## 🛠 Training Environment

- **Hardware**: 2x NVIDIA L4 (dual-GPU setup)
- **Optimizer**: Paged AdamW 8-bit
- **Context Length**: 256 tokens per block

## ⚠️ Constraints & Risk Mitigation

- **Out-of-Scope**: This model is not intended for general conversation or creative writing; it is a specialized document analyst.
- **Hallucination Control**: If the requested information is not present in the model's weights, it is trained to answer 'Not Documented' or return an empty response for verification.
- **Numerical Accuracy**: Always cross-verify critical measurements against the original PDF source material.

---

**Senior AI Architect & Developer**: Solvrays
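
Since the model was fine-tuned on 256-token blocks, long documents should be split into overlapping windows of that size before querying. The model card does not ship a chunking utility, so the following is a minimal sketch; `chunk_tokens`, the 64-token overlap, and the stride logic are illustrative assumptions, not part of the published model.

```python
def chunk_tokens(token_ids, window=256, overlap=64):
    """Split a token-id sequence into overlapping fixed-size windows.

    `window` matches the model's 256-token training context; the
    `overlap` value is an assumed default, not prescribed by the model.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    stride = window - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        # Stop once the final window reaches the end of the sequence
        if start + window >= len(token_ids):
            break
    return chunks
```

Each chunk can then be decoded back to text and passed to `query_model` individually, keeping every query within the context length the model saw during training.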