Prithvik-1 committed
Commit 5703502 · verified · Parent(s): 244a62f

Upload docs/QUICK_INFERENCE_GUIDE.md with huggingface_hub
# 🚀 Quick Inference Guide - mistral-finetuned-fifo1

## ✅ Everything is Fixed and Ready!

Your fine-tuned model **mistral-finetuned-fifo1** is now working in the UI!

---

## 🌐 Access Gradio Interface

**Public URL**: https://3833be2ce50507322f.gradio.live
**Local URL**: http://0.0.0.0:7860

---

## 🎯 Quick Start - Test Your Model

### Method 1: Direct Inference (Fastest)

1. Open the Gradio interface
2. Go to the **"🧪 Test Inference"** tab
3. **Select model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Enter your prompt
5. Click **"🔄 Run Inference"**
6. Done! Results appear in seconds.

---

### Method 2: Via API (For Production)

1. Open the Gradio interface
2. Go to the **"🌐 API Hosting"** tab
3. **Select model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Click **"🚀 Start API Server"**
5. Wait 20-30 seconds for loading
6. Server ready at: http://0.0.0.0:8000
7. API Docs: http://0.0.0.0:8000/docs

**Then test via API**:

```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Your test prompt",
    "max_length": 512,
    "temperature": 0.7
  }'
```
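The same request can be sent from Python using only the standard library. This is a minimal sketch: the endpoint URL and the `prompt`/`max_length`/`temperature` field names are taken from the curl example above, and the response shape depends on your server.

```python
import json
import urllib.request

def build_payload(prompt, max_length=512, temperature=0.7):
    """JSON body matching the fields shown in the curl example."""
    return json.dumps({
        "prompt": prompt,
        "max_length": max_length,
        "temperature": temperature,
    })

def generate(prompt, url="http://localhost:8000/generate", **params):
    """POST a generation request to the local API server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt, **params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```

Using `urllib` avoids an extra dependency; swap in `requests` if you already use it.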

---

## 📝 Example Prompts

Since your model was trained on FIFO data (100 samples), try prompts related to:
- FIFO operations
- Semiconductor protocols
- AHB to APB bridge scenarios
- Whatever domain your training data covered

**Example**:

```
Explain how a FIFO buffer works in a semiconductor device.
```
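As a reference point when judging the model's answers, the behavior the example prompt asks about can be sketched in a few lines. This is a software analogue of a hardware FIFO (fixed depth, `full`/`empty` flags), not part of the training code:

```python
from collections import deque

class FIFO:
    """First-in, first-out buffer with a fixed depth, mimicking a hardware FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()

    def full(self):
        return len(self.buf) >= self.depth

    def empty(self):
        return not self.buf

    def write(self, word):
        if self.full():
            raise OverflowError("FIFO full")  # hardware would assert the 'full' flag
        self.buf.append(word)

    def read(self):
        if self.empty():
            raise IndexError("FIFO empty")  # hardware would assert the 'empty' flag
        return self.buf.popleft()
```

Words come out in the order they went in, which is the key property a good answer should explain.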

---

## ⚙️ Recommended Settings

### For Accuracy:
- Max Length: 512
- Temperature: 0.1-0.3

### For Creativity:
- Max Length: 1024
- Temperature: 0.7-0.9

### For Speed:
- Max Length: 128-256
- Temperature: 0.5
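Temperature works by rescaling the model's next-token logits before sampling, which is why low values make outputs more deterministic and high values more varied. A toy illustration, not tied to this codebase:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature before softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At temperature 0.1 nearly all probability mass lands on the top logit; at 0.9 it spreads across alternatives.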

---

## 🔧 Troubleshooting

### Model Not in Dropdown?

```bash
# Restart Gradio
pkill -f interface_app.py
cd /workspace/ftt/semicon-finetuning-scripts
python3 interface_app.py
```

### API Server Won't Start?
- Check the logs in the Gradio UI
- Ensure port 8000 is free: `lsof -i :8000`
- Kill the process if needed: `kill $(lsof -t -i :8000)`
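If `lsof` is unavailable, the same port check can be done from Python with the standard library (a TCP connect succeeds only when something is listening):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on success, i.e. when a server is listening
        return s.connect_ex((host, port)) != 0

# Example: check the API server's port before starting it
# print(port_is_free(8000))
```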

### Out of Memory?

```bash
# Free GPU memory (note: this kills ALL python3 processes, including the Gradio app)
pkill -f python3
python3 -c "import torch; torch.cuda.empty_cache()"
```

---

## 📊 What Was Fixed

✅ **Model Listing**: Your new model now appears in all dropdowns
✅ **API Server**: Fixed cache issue by using the local base model
✅ **Inference**: Both direct and API methods work

---

## 📚 Full Documentation

For detailed information, see:
- **Setup**: `/workspace/ftt/LOCAL_MODEL_SETUP.md`
- **Fixes**: `/workspace/ftt/MODEL_INFERENCE_FIXES.md`

---

## 💡 Pro Tips

1. **First Run**: Direct inference is faster (no API server startup)
2. **Production**: Use the API server for multiple requests
3. **Testing**: Start with short prompts to verify everything works
4. **Memory**: Close other processes if the GPU is full

---

**Your Model Info**:
- Location: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
- Type: LoRA Adapter (161 MB)
- Base: Mistral-7B-v0.1 (28 GB, local)
- Training: 100 samples, 3 epochs
- Device: A100 GPU

---

🎉 **Ready to go! Start testing your model now!**