# 🚀 Quick Inference Guide - mistral-finetuned-fifo1

## ✅ Everything Is Fixed and Ready!

Your fine-tuned model **mistral-finetuned-fifo1** is now working in the UI!

---
## 🌐 Access Gradio Interface

**Public URL**: https://3833be2ce50507322f.gradio.live
**Local URL**: http://0.0.0.0:7860

---
## 🎯 Quick Start - Test Your Model

### Method 1: Direct Inference (Fastest)

1. Open the Gradio interface
2. Go to the **"🧪 Test Inference"** tab
3. **Select the model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Enter your prompt
5. Click **"🚀 Run Inference"**
6. Done! Results appear in seconds.

---
### Method 2: Via API (For Production)

1. Open the Gradio interface
2. Go to the **"🌐 API Hosting"** tab
3. **Select the model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Click **"🚀 Start API Server"**
5. Wait 20-30 seconds for loading
6. Server ready at: http://0.0.0.0:8000
7. API docs: http://0.0.0.0:8000/docs

**Then test via the API**:
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Your test prompt",
    "max_length": 512,
    "temperature": 0.7
  }'
```
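The same request can be scripted from Python using only the standard library. A minimal client sketch: the endpoint and request fields mirror the curl example above, but the shape of the JSON response is an assumption, so adapt the return handling to whatever your server actually sends back.

```python
import json
import urllib.request

def build_payload(prompt, max_length=512, temperature=0.7):
    """Assemble the JSON body used by the /generate endpoint above."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt, url="http://localhost:8000/generate", **params):
    """POST a generation request and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the API server from step 4 to be running):
# generate("Explain how a FIFO buffer works.", temperature=0.2)
```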

---
## 📝 Example Prompts

Since your model was trained on FIFO data (100 samples), try prompts related to:
- FIFO operations
- Semiconductor protocols
- AHB-to-APB bridge scenarios
- Whatever domain your training data covered

**Example**:
```
Explain how a FIFO buffer works in a semiconductor device.
```

---
## ⚙️ Recommended Settings

### For Accuracy
- Max Length: 512
- Temperature: 0.1-0.3

### For Creativity
- Max Length: 1024
- Temperature: 0.7-0.9

### For Speed
- Max Length: 128-256
- Temperature: 0.5
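If you script your tests, these presets can live in one place. A small sketch: the preset names are ours, and where the guide gives a range a mid-range value is picked.

```python
# Generation presets mirroring the recommendations above.
# Where the guide gives a range, a mid-range value is used.
PRESETS = {
    "accuracy":   {"max_length": 512,  "temperature": 0.2},  # range 0.1-0.3
    "creativity": {"max_length": 1024, "temperature": 0.8},  # range 0.7-0.9
    "speed":      {"max_length": 256,  "temperature": 0.5},  # length 128-256
}

def settings_for(goal):
    """Return a copy of the generation settings for a named goal."""
    return dict(PRESETS[goal])
```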

---
## 🔧 Troubleshooting

### Model Not in Dropdown?
```bash
# Restart Gradio
pkill -f interface_app.py
cd /workspace/ftt/semicon-finetuning-scripts
python3 interface_app.py
```

### API Server Won't Start?
- Check the logs in the Gradio UI
- Ensure port 8000 is free: `lsof -i :8000`
- Kill the holder if needed: `kill $(lsof -t -i :8000)`

### Out of Memory?
```bash
# Free GPU memory by killing stray Python processes.
# Note: this also kills the Gradio app; restart it afterwards.
pkill -f python3
python3 -c "import torch; torch.cuda.empty_cache()"
```

---
## 📋 What Was Fixed

✅ **Model Listing**: Your new model now appears in all dropdowns
✅ **API Server**: Fixed a cache issue by using the local base model
✅ **Inference**: Both direct and API methods work

---
## 📚 Full Documentation

For detailed information, see:
- **Setup**: `/workspace/ftt/LOCAL_MODEL_SETUP.md`
- **Fixes**: `/workspace/ftt/MODEL_INFERENCE_FIXES.md`

---
## 💡 Pro Tips

1. **First Run**: Direct inference is faster (no API server startup)
2. **Production**: Use the API server for multiple requests
3. **Testing**: Start with short prompts to verify everything works
4. **Memory**: Close other processes if the GPU is full

---
**Your Model Info**:
- Location: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
- Type: LoRA adapter (161 MB)
- Base: Mistral-7B-v0.1 (28 GB, local)
- Training: 100 samples, 3 epochs
- Device: A100 GPU
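Since the model is a LoRA adapter on a local base, it can also be loaded outside the UI. A sketch assuming `transformers` and `peft` are installed; the adapter path is from the Model Info above, while `BASE_MODEL` is a hypothetical placeholder you must point at your local Mistral-7B-v0.1 copy.

```python
from pathlib import Path

# Adapter location is from Model Info above; BASE_MODEL is a hypothetical
# placeholder for wherever your local Mistral-7B-v0.1 directory lives.
ADAPTER_DIR = "/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1"
BASE_MODEL = "/path/to/Mistral-7B-v0.1"

def adapter_config_path(adapter_dir):
    """A LoRA adapter directory is identified by its adapter_config.json."""
    return str(Path(adapter_dir) / "adapter_config.json")

def load_finetuned():
    # Heavy imports kept inside the function so the sketch stays
    # importable on machines without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    # Apply the 161 MB LoRA adapter on top of the 28 GB base model.
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)
    return tok, model
```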

---

🎉 **Ready to go! Start testing your model now!**