# Bro Chatbot - Fine-tuned Llama 3.2 3B

A conversational AI chatbot with a distinctive "bro" personality: casual, chilled, and supportive. Fine-tuned using Unsloth for efficient training.

## Model Details

- **Base Model**: unsloth/Llama-3.2-3B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with Unsloth
- **Model Type**: Conversational AI chatbot
- **Language**: English
- **License**: Same as the base model

## Training Configuration

### Model Loading Parameters
```python
max_seq_length = 2048  # Maximum context window
dtype = None           # Auto-detect optimal precision
load_in_4bit = True    # 4-bit quantization for memory efficiency
```

### LoRA Configuration
```python
r = 16                 # LoRA rank
lora_alpha = 16        # LoRA scaling factor
lora_dropout = 0       # No dropout
target_modules = [     # Modules to adapt
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]
```

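As a sanity check on what this configuration actually trains, the adapter size can be estimated from the shapes of the adapted modules. The sketch below assumes the published Llama 3.2 3B dimensions (28 layers, hidden size 3072, 1024-wide K/V projections, 8192-wide MLP); those numbers come from the base model's config, not from this card.

```python
# Back-of-envelope count of trainable LoRA parameters for the configuration
# above. The Llama 3.2 3B dimensions (28 layers, hidden 3072, KV projection
# width 1024, MLP width 8192) are an assumption taken from the base model's
# published config, not from this card.
r = 16
hidden, kv, mlp, layers = 3072, 1024, 8192, 28

# A rank-r adapter on a (d_in, d_out) module adds r * (d_in + d_out)
# parameters: an A matrix of r x d_in plus a B matrix of d_out x r.
module_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv),
    "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
total = per_layer * layers
print(f"{total:,} trainable parameters")  # roughly 24.3M, well under 1% of the 3B base
```

If the assumed dimensions hold, only a few tens of millions of parameters are updated, which is why the run fits comfortably alongside the 4-bit base weights.
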
### Training Parameters
```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
learning_rate = 2e-4
max_steps = 60
warmup_steps = 5
weight_decay = 0.01
optimizer = "adamw_8bit"
```

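Taken together, these settings determine how much data the run actually sees; a quick back-of-envelope check, using the 57-example dataset size reported later in this card:

```python
# Effective batch size and approximate epoch count implied by the training
# parameters above and the 57-example dataset described in this card.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
dataset_size = 57

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = max_steps * effective_batch_size
epochs = examples_seen / dataset_size

print(effective_batch_size)  # 8 examples per optimizer step
print(examples_seen)         # 480 training examples processed
print(round(epochs, 1))      # ~8.4 passes over the dataset
```

So the short `max_steps = 60` run still amounts to roughly eight epochs over this small dataset.
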
## Usage

### Installation
```bash
pip install unsloth transformers torch
```

### Loading the Model
```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the model with the exact parameters used during training
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-username/bro-chatbot",
    max_seq_length = 2048,  # IMPORTANT: use the same value as training
    dtype = None,
    load_in_4bit = True,    # IMPORTANT: use the same value as training
)

# Set up the chat template
tokenizer = get_chat_template(tokenizer, chat_template = "llama-3.1")

# Enable fast inference
FastLanguageModel.for_inference(model)
```

### Basic Usage
```python
def chat_with_bro(message):
    messages = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    # Generate a response
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=128,
        use_cache=True,
        temperature=0.7,
        min_p=0.1,
    )

    # Decode only the newly generated tokens
    input_length = inputs.shape[1]
    response = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True)
    return response

# Example usage
response = chat_with_bro("How do I learn Python?")
print(response)
```

### Streaming Usage
```python
from transformers import TextStreamer

def chat_with_bro_streaming(message):
    messages = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    # Stream the response to stdout as it is generated
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        input_ids=inputs,
        streamer=text_streamer,
        max_new_tokens=128,
        use_cache=True,
        temperature=0.7,
        min_p=0.1,
    )

# Example usage
chat_with_bro_streaming("What's the meaning of life?")
```

## Personality Traits

The model exhibits these characteristics:
- **Casual**: Uses relaxed language, slang, and informal expressions
- **Chilled**: Maintains a calm, easy-going attitude in all responses
- **Supportive**: Always encouraging and positive, offering help and motivation
- **Friendly**: Uses terms like "bro", "dude", and "man" in conversation
- **Helpful**: Provides useful information while maintaining the casual tone

## Training Data

- **Dataset Size**: 57 conversation pairs
- **Data Format**: User-assistant conversations in JSON format
- **Topics Covered**:
  - General knowledge questions
  - Personal advice and motivation
  - Technical explanations
  - Casual conversations
  - Problem-solving scenarios

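The training file itself is not published with this card, so the following is only an illustrative sketch of what one user-assistant pair in the JSON format described above might look like; the field names follow the common chat-message convention and are assumptions, not the card's actual schema.

```python
import json

# Illustrative example of a single conversation pair in the user-assistant
# JSON format described above. The field names ("role", "content") are an
# assumption following the standard chat-message convention, since the
# actual training file is not published with this card.
example_pair = json.loads("""
{
  "conversations": [
    {"role": "user", "content": "Any tips for staying motivated, bro?"},
    {"role": "assistant", "content": "For sure, dude! Set small goals, celebrate the little wins, and don't sweat the off days. You got this, bro!"}
  ]
}
""")

roles = [turn["role"] for turn in example_pair["conversations"]]
print(roles)  # ['user', 'assistant']
```
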
## Example Conversations

**User**: "What is the capital of France?"
**Bro Bot**: "Yo bro! The capital of France is Paris, dude. Beautiful city with the Eiffel Tower and all that good stuff. You thinking of visiting or just curious? Either way, you got this!"

**User**: "I'm feeling really stressed about my job interview tomorrow."
**Bro Bot**: "Hey man, I totally get that stress - interviews can be nerve-wracking! But listen bro, you wouldn't have gotten the interview if they didn't see something awesome in you already. Just be yourself, take some deep breaths, and remember you're gonna crush it. You got the skills, now just show 'em that chill confidence!"

## Hardware Requirements

### Minimum Requirements
- **GPU**: 8GB VRAM (RTX 3070, RTX 4060 Ti, or equivalent)
- **RAM**: 16GB system RAM
- **Storage**: 10GB free space

### Recommended Requirements
- **GPU**: 16GB+ VRAM (RTX 4080, RTX 4090, or equivalent)
- **RAM**: 32GB system RAM
- **Storage**: 20GB+ free space

## Performance Notes

- **Inference Speed**: ~2x faster with Unsloth optimizations
- **Memory Usage**: ~75% reduction with 4-bit quantization
- **Context Window**: 2048 tokens maximum
- **Response Quality**: Maintains the personality while staying informative

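The ~75% figure is what the bit widths alone predict; a rough check of the arithmetic (this ignores per-block quantization scales and any layers kept in 16-bit, so the real saving is somewhat smaller):

```python
# Rough memory estimate for the 3B-parameter base model weights, comparing
# 16-bit and 4-bit storage. This ignores quantization overhead (per-block
# scales) and any layers kept in higher precision, so it is an upper bound
# on the saving.
params = 3_000_000_000

fp16_gb = params * 16 / 8 / 1e9  # 2 bytes per parameter
int4_gb = params * 4 / 8 / 1e9   # 0.5 bytes per parameter
reduction = 1 - int4_gb / fp16_gb

print(f"{fp16_gb:.1f} GB -> {int4_gb:.1f} GB ({reduction:.0%} reduction)")
```

This is also why the 8GB-VRAM minimum above is plausible: the quantized weights leave room for activations and the KV cache.
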
## Limitations

- Trained on a relatively small dataset (57 examples)
- May occasionally break character in complex technical discussions
- Limited to English-language conversations
- Context window limited to 2048 tokens
- Requires a GPU for optimal performance

## Technical Details

### Model Architecture
- **Base**: Llama 3.2, 3B parameters
- **Adaptation**: LoRA with rank 16
- **Quantization**: 4-bit using bitsandbytes
- **Chat Template**: Llama 3.1 format

### Training Infrastructure
- **Framework**: Unsloth + TRL (Transformer Reinforcement Learning)
- **Optimization**: AdamW 8-bit optimizer
- **Memory**: Gradient checkpointing enabled
- **Platform**: Tested on Windows and Linux

## Citation

If you use this model, please cite:

```bibtex
@misc{bro-chatbot-2024,
  title={Bro Chatbot: A Casual Conversational AI},
  author={Your Name},
  year={2024},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/your-username/bro-chatbot}
}
```

## Acknowledgments

- **Unsloth**: For the efficient fine-tuning framework
- **Meta**: For the Llama 3.2 base model
- **HuggingFace**: For model hosting and the transformers library

## Contact

For questions or issues, please open an issue on the model repository or contact [your-email@example.com].

---

**Note**: This model is for research and educational purposes. Please use responsibly and be aware of potential biases in AI-generated content.