AdityaNarayan committed c14069c (verified) · parent: 109706c

Added README.md

Files changed (1): README.md (+132, -0)
---
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
tags:
- rust
- hyperswitch
- lora
- CPT
- fine-tuned
- causal-lm
pipeline_tag: text-generation
language:
- en
---

# Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch

A LoRA fine-tuned model based on **Qwen/Qwen2.5-Coder-32B-Instruct**, specialized for the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. It is tuned for payment processing patterns, Hyperswitch architecture, and Rust development practices.

## 🎯 Model Description

This LoRA adapter was trained on **16,731 samples** extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

- **Base Model**: Qwen/Qwen2.5-Coder-32B-Instruct
- **Training Type**: Causal Language Modeling (CLM) with LoRA
- **Domain**: Payment Processing, Rust Development
- **Specialization**: Hyperswitch codebase patterns and architecture

## 📊 Training Details

### Dataset Composition
- **Total Samples**: 16,731
- **File-level samples**: 2,120 complete files
- **Granular samples**: 14,611 extracted components
  - Functions: 4,121
  - Structs: 5,710
  - Traits: 223
  - Implementations: 4,296
  - Modules: 261

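The granular samples correspond to top-level Rust items. As a rough illustration only (the actual extraction pipeline is not described here and presumably used a real Rust parser), counting such item kinds could be sketched with regexes:

```python
import re

# Hypothetical sketch: count top-level Rust item kinds with regexes.
# This only illustrates the categories listed above; it is NOT the
# pipeline used to build the dataset.
ITEM_PATTERNS = {
    "functions": re.compile(r"^\s*(?:pub\s+)?(?:async\s+)?fn\s+\w+", re.M),
    "structs": re.compile(r"^\s*(?:pub\s+)?struct\s+\w+", re.M),
    "traits": re.compile(r"^\s*(?:pub\s+)?trait\s+\w+", re.M),
    "impls": re.compile(r"^\s*impl\b", re.M),
    "modules": re.compile(r"^\s*(?:pub\s+)?mod\s+\w+", re.M),
}

def count_items(source: str) -> dict:
    """Count occurrences of each Rust item kind in a source string."""
    return {kind: len(pat.findall(source)) for kind, pat in ITEM_PATTERNS.items()}

sample = """
pub struct Payment { amount: u64 }

impl Payment {
    pub fn validate(&self) -> bool { self.amount > 0 }
}

pub trait Refundable {
    fn refund(&self);
}
"""
print(count_items(sample))
# {'functions': 2, 'structs': 1, 'traits': 1, 'impls': 1, 'modules': 0}
```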
### LoRA Configuration
```yaml
r: 64  # LoRA rank
alpha: 128  # LoRA alpha (2*r)
dropout: 0.05  # LoRA dropout
target_modules:  # Applied to all linear layers
- q_proj, k_proj, v_proj, o_proj
- gate_proj, up_proj, down_proj
```
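In PEFT terms, this corresponds roughly to the following `LoraConfig` (a sketch assuming the `peft` library; the exact training setup may have differed in details):

```python
from peft import LoraConfig

# Sketch of the equivalent PEFT configuration for the values above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,          # 2 * r
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```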

### Training Hyperparameters
- **Epochs**: 5
- **Batch Size**: 2 per device (16 effective with gradient accumulation)
- **Learning Rate**: 5e-5 (cosine schedule)
- **Max Context**: 8,192 tokens
- **Hardware**: 2x NVIDIA H200 (80GB each)
- **Training Time**: ~4 hours (2,355 steps)
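The effective batch size follows from per-device batch × number of GPUs × gradient-accumulation steps; with 2 samples per device on 2 GPUs, an accumulation factor of 4 (inferred, not stated explicitly) yields the reported 16:

```python
# Effective batch size = per-device batch * num GPUs * grad accumulation.
# The accumulation factor of 4 is inferred from the reported numbers.
per_device_batch = 2
num_gpus = 2
grad_accum_steps = 4
effective_batch = per_device_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 16
```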

### Training Results
```
Final Loss: 0.48 (from 1.63)
Perplexity: 1.59 (from 5.12)
Accuracy: 89% (from 61%)
```
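As a sanity check, perplexity for causal LM training is `exp(loss)`, which lines up with the reported figures (small differences are expected if loss and perplexity were averaged over different token spans):

```python
import math

# Perplexity = exp(cross-entropy loss); compare with the reported values.
print(math.exp(1.63))  # ~5.10, vs. reported 5.12
print(math.exp(0.48))  # ~1.62, vs. reported 1.59
```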

## 🚀 Usage

### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "juspay/Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch")

# Generate code
prompt = """// Hyperswitch payment processing
pub fn validate_payment_method("""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.2,  # Lower temperature for code generation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
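If ~78 GB of VRAM is not available, the base model can typically be loaded in 4-bit before attaching the adapter (a sketch, assuming `bitsandbytes` is installed; expect some quality loss relative to bfloat16):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Sketch: 4-bit NF4 quantization to reduce VRAM requirements.
# Assumes the bitsandbytes library is installed; not part of the
# documented setup above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "juspay/Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch")
```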

### Recommended Settings
- **Temperature**: 0.2-0.3 for code generation
- **Temperature**: 0.5-0.7 for explanations and documentation
- **Max tokens**: 512-1024 for most tasks

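These recommendations can be kept as simple generation presets (illustrative values picked from the ranges above; `preset_for` is a hypothetical helper, not part of any library):

```python
# Generation presets mirroring the recommended settings above.
GENERATION_PRESETS = {
    "code": {"temperature": 0.2, "max_new_tokens": 512, "do_sample": True},
    "explanation": {"temperature": 0.6, "max_new_tokens": 1024, "do_sample": True},
}

def preset_for(task: str) -> dict:
    """Return generation kwargs for a task type, defaulting to code settings."""
    return GENERATION_PRESETS.get(task, GENERATION_PRESETS["code"])

print(preset_for("explanation")["temperature"])  # 0.6
```

The returned dict can be splatted directly into `model.generate(**inputs, **preset_for("code"))`.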
## 🛠️ Technical Specifications

- **Context Window**: 8,192 tokens
- **Precision**: bfloat16
- **Memory Usage**: ~78GB VRAM (32B base model)
- **Inference Speed**: Optimized with Flash Attention 2

## 🙏 Acknowledgments

- **Qwen Team** for the excellent Qwen2.5-Coder base model
- **Hyperswitch Team** for the open-source payment processing platform
- **Hugging Face** for the transformers and PEFT libraries

## 📞 Citation

```bibtex
@misc{hyperswitch-qwen-lora-2024,
  title={Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch},
  author={Aditya Narayan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/juspay/Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch}
}
```