singtan committed on
Commit f68da05 · verified · 1 Parent(s): 4b55b77

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +39 -32
README.md CHANGED
@@ -12,18 +12,18 @@ tags:
  - technical-analysis
  ---
 
- # 📂 Solvrays Finetuned Pdf - High Precision Document Analyst
 
- ## 🌟 Overview
- This model is a specialized fine-tuning of **google/gemma-2b-it**, engineered for **Zero-Hallucination Document Retrieval**. It has been optimized to handle complex, domain-specific documents (Technical, Legal, or Architectural) with strict adherence to provided context.
 
- ### 📋 Technical Objectives
- - **Factual Integrity**: Hardcoded propensity to avoid speculation.
- - **Contextual Continuity**: Overlap-aware training prevents information loss across page boundaries.
- - **Domain Versatility**: Seamlessly switches between technical and non-technical document styles.
 
  ## 💻 Professional Implementation
- To achieve the trained precision level, utilize the following standardized implementation:
 
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -31,34 +31,41 @@ import torch
 
  model_id = 'solvrays/solvrays-finetuned-pdf'
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16)
 
- # Universal Retrieval Template
- query = 'What are the infrastructure requirements?'
- prompt = f'### Knowledge Retrieval Content: {query}
- ### Verified Response: '
-
- inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
- with torch.no_grad():
-     outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, repetition_penalty=1.2)
-
- print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip())
  ```
 
- ## 📊 Model Specifications
- | Parameter | Configuration |
  | :--- | :--- |
- | Base Model | google/gemma-2b-it |
- | Fine-tuning Method | QLoRA (4-bit quantization) |
- | LoRA Rank (r) | 16 |
- | LoRA Alpha | 32 |
- | Learning Rate | 2e-4 |
- | Training Epochs | 20 |
 
- ## ⚠️ Constraints & Compliance
- - **Context Window**: Optimized for 256-512 token blocks. For multi-document search beyond 10+ pages, RAG integration is recommended.
- - **Bias Awareness**: Reflects the factual contents of the provided technical PDFs.
- - **Verification**: Always cross-reference critical numeric values with source documents.
 
  ---
- **Architected and Fine-tuned by Bibek Lama Singtan**
 
@@ -12,18 +12,18 @@ tags:
  - technical-analysis
  ---
 
+ # 📂 Solvrays Finetuned Pdf - Senior Document AI
 
+ ## 🌟 Model Overview
+ This model is a high-precision fine-tuning of **google/gemma-2b-it**, specifically architected for **Zero-Hallucination Technical Retrieval**. It has been trained on a proprietary dataset of technical and architectural documentation to ensure deep contextual grounding.
 
+ ### 🚀 Key Capabilities
+ - **Technical Grounding**: Prioritizes factual documentation over generative speculation.
+ - **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window).
+ - **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.
 
  ## 💻 Professional Implementation
+ The model requires specific prompt construction to trigger its 'Knowledge Retrieval' mode:
 
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -31,34 +31,41 @@ import torch
 
  model_id = 'solvrays/solvrays-finetuned-pdf'
  tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map='auto',
+     torch_dtype=torch.bfloat16,
+     quantization_config={'load_in_4bit': True}
+ )
 
+ def query_model(user_query):
+     # High-Precision Retrieval Template
+     prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
+     inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
+     outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+     return tokenizer.decode(outputs[0], skip_special_tokens=True).split('### Verified Response:')[-1].strip()
  ```
 
+ ## 📊 Technical Specifications
+ | Feature | Configuration |
  | :--- | :--- |
+ | **Base Model** | google/gemma-2b-it |
+ | **Precision** | BrainFloat16 (BF16) |
+ | **Fine-tuning** | QLoRA (4-bit Normalized Float) |
+ | **LoRA Rank (r)** | 16 |
+ | **LoRA Alpha** | 32 |
+ | **Target Modules** | q, k, v, o, gate, up, down |
+ | **Training Epochs** | 25 |
+
+ ## 🛠 Training Environment
+ - **Hardware**: NVIDIA L4 x 2 (Dual GPU Architecture)
+ - **Optimizer**: Paged AdamW 8-bit
+ - **Context Length**: 256 tokens per block
 
+ ## ⚠️ Constraints & Risk Mitigation
+ - **Out-of-Scope**: This model is not intended for general conversation or creative writing. It is a specialized document analyst.
+ - **Hallucination Control**: If information is not present in the internal weights, the model is trained to state 'Not Documented' or provide an empty response for verification.
+ - **Numerical Accuracy**: Always cross-verify critical measurements with original PDF source material.
 
  ---
+ **Senior AI Architect & Developer**: Bibek Lama Singtan
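
The updated README's "Chunk-Aware Memory" bullet and its 256-token context length suggest that long documents are split into overlapping blocks before they reach the model. The commit does not show how that split is done, so the following is only a minimal sketch under stated assumptions: the function name `chunk_token_ids` and the 32-token overlap are hypothetical, and only the 256-token window comes from the README.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int],
    window: int = 256,   # block size stated in the README
    overlap: int = 32,   # assumed value; the README gives no overlap size
) -> List[List[int]]:
    """Split token ids into blocks of at most `window` tokens.

    Consecutive blocks share `overlap` tokens so that text near a block
    boundary appears in both neighbouring blocks.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    stride = window - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), stride):
        chunks.append(list(token_ids[start:start + window]))
        # Stop once the current block already reaches the end of the input.
        if start + window >= len(token_ids):
            break
    return chunks
```

In practice the ids could come from `tokenizer(text, add_special_tokens=False)['input_ids']`, and each block could be turned back into text with `tokenizer.decode` before it is placed in the prompt template. Whether this matches the preprocessing used during fine-tuning is not stated in the commit.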