# 🚀 Quick Inference Guide - mistral-finetuned-fifo1

## ✅ Everything Is Fixed and Ready!

Your fine-tuned model **mistral-finetuned-fifo1** is now working in the UI!

---
## 🌐 Access Gradio Interface

**Public URL**: https://3833be2ce50507322f.gradio.live
**Local URL**: http://0.0.0.0:7860

---
## 🎯 Quick Start - Test Your Model

### Method 1: Direct Inference (Fastest)

1. Open the Gradio interface
2. Go to the **"🧪 Test Inference"** tab
3. **Select the model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Enter your prompt
5. Click **"🚀 Run Inference"**
6. Done! Results appear in seconds.

---
### Method 2: Via API (For Production)

1. Open the Gradio interface
2. Go to the **"🌐 API Hosting"** tab
3. **Select the model**:
   - Model Source: `Local Model`
   - Dropdown: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
4. Click **"🚀 Start API Server"**
5. Wait 20-30 seconds for loading
6. Server ready at: http://0.0.0.0:8000
7. API docs: http://0.0.0.0:8000/docs

**Then test via the API**:
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Your test prompt",
    "max_length": 512,
    "temperature": 0.7
  }'
```
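The same request can be scripted from Python using only the standard library. A minimal client sketch: the endpoint and request fields mirror the curl example above, but the shape of the JSON response is an assumption, so adapt the return handling to whatever your server actually sends back.

```python
import json
import urllib.request

def build_payload(prompt, max_length=512, temperature=0.7):
    """Assemble the JSON body used by the /generate endpoint above."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt, url="http://localhost:8000/generate", **params):
    """POST a generation request and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the API server from step 4 to be running):
# generate("Explain how a FIFO buffer works.", temperature=0.2)
```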

---
## 📝 Example Prompts

Since your model was trained on FIFO data (100 samples), try prompts related to:
- FIFO operations
- Semiconductor protocols
- AHB-to-APB bridge scenarios
- Whatever domain your training data covered

**Example**:
```
Explain how a FIFO buffer works in a semiconductor device.
```

---
## ⚙️ Recommended Settings

### For Accuracy
- Max Length: 512
- Temperature: 0.1-0.3

### For Creativity
- Max Length: 1024
- Temperature: 0.7-0.9

### For Speed
- Max Length: 128-256
- Temperature: 0.5
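If you script your tests, these presets can live in one place. A small sketch: the preset names are ours, and where the guide gives a range a mid-range value is picked.

```python
# Generation presets mirroring the recommendations above.
# Where the guide gives a range, a mid-range value is used.
PRESETS = {
    "accuracy":   {"max_length": 512,  "temperature": 0.2},  # range 0.1-0.3
    "creativity": {"max_length": 1024, "temperature": 0.8},  # range 0.7-0.9
    "speed":      {"max_length": 256,  "temperature": 0.5},  # length 128-256
}

def settings_for(goal):
    """Return a copy of the generation settings for a named goal."""
    return dict(PRESETS[goal])
```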

---
## 🔧 Troubleshooting

### Model Not in Dropdown?
```bash
# Restart Gradio
pkill -f interface_app.py
cd /workspace/ftt/semicon-finetuning-scripts
python3 interface_app.py
```

### API Server Won't Start?
- Check the logs in the Gradio UI
- Ensure port 8000 is free: `lsof -i :8000`
- Kill the holder if needed: `kill $(lsof -t -i :8000)`

### Out of Memory?
```bash
# Free GPU memory by killing stray Python processes.
# Note: this also kills the Gradio app; restart it afterwards.
pkill -f python3
python3 -c "import torch; torch.cuda.empty_cache()"
```

---
## 📋 What Was Fixed

✅ **Model Listing**: Your new model now appears in all dropdowns
✅ **API Server**: Fixed a cache issue by using the local base model
✅ **Inference**: Both direct and API methods work

---
## 📚 Full Documentation

For detailed information, see:
- **Setup**: `/workspace/ftt/LOCAL_MODEL_SETUP.md`
- **Fixes**: `/workspace/ftt/MODEL_INFERENCE_FIXES.md`

---
## 💡 Pro Tips

1. **First Run**: Direct inference is faster (no API server startup)
2. **Production**: Use the API server for multiple requests
3. **Testing**: Start with short prompts to verify everything works
4. **Memory**: Close other processes if the GPU is full

---
**Your Model Info**:
- Location: `/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1`
- Type: LoRA adapter (161 MB)
- Base: Mistral-7B-v0.1 (28 GB, local)
- Training: 100 samples, 3 epochs
- Device: A100 GPU
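Since the model is a LoRA adapter on a local base, it can also be loaded outside the UI. A sketch assuming `transformers` and `peft` are installed; the adapter path is from the Model Info above, while `BASE_MODEL` is a hypothetical placeholder you must point at your local Mistral-7B-v0.1 copy.

```python
from pathlib import Path

# Adapter location is from Model Info above; BASE_MODEL is a hypothetical
# placeholder for wherever your local Mistral-7B-v0.1 directory lives.
ADAPTER_DIR = "/workspace/ftt/semicon-finetuning-scripts/mistral-finetuned-fifo1"
BASE_MODEL = "/path/to/Mistral-7B-v0.1"

def adapter_config_path(adapter_dir):
    """A LoRA adapter directory is identified by its adapter_config.json."""
    return str(Path(adapter_dir) / "adapter_config.json")

def load_finetuned():
    # Heavy imports kept inside the function so the sketch stays
    # importable on machines without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    # Apply the 161 MB LoRA adapter on top of the 28 GB base model.
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)
    return tok, model
```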

---

🎉 **Ready to go! Start testing your model now!**