# MLOps Platform Startup Guide
Welcome to the MLOps Training Platform! This guide will help you get started quickly.
## Quick Launch

### Option 1: Streamlit Web Interface (Recommended)
```bash
# Activate your virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Launch the Streamlit app
streamlit run streamlit_app.py

# The app will open in your browser at http://localhost:8501
```
### Option 2: Programmatic Usage

```bash
# Run the example script
python example_usage.py
```
### Option 3: FastAPI Backend (Original)

```bash
# Run the FastAPI server
python -m uvicorn app.main:app --reload

# API will be available at http://localhost:8000
# Interactive docs at http://localhost:8000/docs
```
## First-Time Setup Checklist

- [ ] Python 3.8+ installed
- [ ] Virtual environment created (`python -m venv venv`)
- [ ] Virtual environment activated
- [ ] Dependencies installed (`pip install -r requirements.txt`)
- [ ] At least 4GB RAM available
- [ ] Internet connection (for downloading models)
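The Python-version item on the checklist can be verified from the interpreter itself; a minimal check:

```python
import sys

# The platform requires Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} detected - OK")
```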
## Your First Training Session

### 1. Prepare Your Data

Create a CSV file with these columns:

- `text` - your text samples
- `label` - binary labels (0 or 1)

Example: `phishing_data.csv`
```csv
text,label
"Legitimate business email content",0
"URGENT: Click here to claim prize!",1
"Meeting scheduled for tomorrow",0
"Your account is compromised! Act now!",1
```
### 2. Launch the Platform

```bash
streamlit run streamlit_app.py
```
### 3. Follow the Workflow

**Data Upload Tab**

- Upload your CSV file, or click the "Sample" button to load example data
- Verify the data structure and class distribution

**Training Config Tab**

- Select the target language (English, Chinese, Khmer)
- Choose a model architecture (start with DistilBERT for CPU)
- Adjust hyperparameters:
  - Epochs: 3-5 for most tasks
  - Batch size: 8-16 for CPU, 32-64 for GPU
  - Learning rate: 2e-5 (the default is a good starting point)

**Training Tab**

- Click "Start Training"
- Monitor progress and watch metrics update in real time

**Evaluation Tab**

- Review the final metrics
- Test the model with new text
- Download the trained model
## Language-Specific Tips

### English

- Use RoBERTa or DistilBERT
- Standard preprocessing works well
- Fast training on CPU

### Chinese

- Use mBERT or XLM-RoBERTa
- Automatic word segmentation with jieba
- May need more training time

### Khmer

- Use mBERT or XLM-RoBERTa
- Unicode normalization applied
- Ensure UTF-8 encoding in CSV
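The Unicode normalization mentioned above can be sketched with the standard library. The guide does not say which normalization form the platform uses, so NFC here is an assumption; the same call works for Khmer text, shown below with a Latin example for readability:

```python
import unicodedata

# Decomposed form: "e" followed by a combining acute accent (two code points).
decomposed = "caf" + "e\u0301"

# NFC composes canonically equivalent sequences into single code points,
# so visually identical strings also compare equal byte-for-byte.
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "caf\u00e9"          # the single-code-point "é"
assert len(composed) == len(decomposed) - 1
```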
## Pro Tips

### For CPU Training

In the Training Config tab, set:

- Model: `distilbert-base-multilingual-cased`
- Batch size: 8
- Max length: 128
- Epochs: 3
### For GPU Training

In the Training Config tab, set:

- Model: `xlm-roberta-base`
- Batch size: 32
- Max length: 256
- Epochs: 5
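For scripted runs, the two presets above can be captured as plain dictionaries. Note that the key names here are illustrative, not the platform's actual configuration schema:

```python
# Hyperparameter presets mirroring the CPU/GPU tips above.
# Key names are hypothetical, not the platform's real config schema.
CPU_PRESET = {
    "model": "distilbert-base-multilingual-cased",
    "batch_size": 8,
    "max_length": 128,
    "epochs": 3,
}
GPU_PRESET = {
    "model": "xlm-roberta-base",
    "batch_size": 32,
    "max_length": 256,
    "epochs": 5,
}

def pick_preset(gpu_available: bool) -> dict:
    """Choose the preset matching the available hardware."""
    return GPU_PRESET if gpu_available else CPU_PRESET
```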
### Dealing with Imbalanced Data
- Ensure both classes have sufficient samples (min 20-30 each)
- Consider using stratified sampling
- Monitor precision and recall, not just accuracy
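The stratified sampling suggested above can be done without extra dependencies. This is a minimal sketch (scikit-learn's `train_test_split(..., stratify=labels)` achieves the same thing more robustly):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Split indices so each class keeps roughly the same
    proportion in both the train and test halves."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one sample of every class in the test split.
        cut = max(1, int(len(idxs) * test_frac))
        test_idx.extend(idxs[:cut])
        train_idx.extend(idxs[cut:])
    return train_idx, test_idx

# Imbalanced toy labels: 8 negatives, 2 positives.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
train, test = stratified_split(labels)
```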
## Common Issues & Solutions

### Issue: "Out of Memory"

Solutions:

- Reduce batch size to 4 or 8
- Use DistilBERT instead of larger models
- Reduce max sequence length to 128
### Issue: "Model download fails"

Solutions:

- Check your internet connection
- Try a VPN if access is blocked
- Manually download the model from the Hugging Face Hub
### Issue: "Training too slow"

Solutions:

- Use a smaller model (DistilBERT)
- Reduce the dataset size for testing
- Check if a GPU is available with `torch.cuda.is_available()`
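The GPU check above can be wrapped so it degrades gracefully when PyTorch is not installed; a small sketch:

```python
# Detect whether a CUDA GPU is usable; fall back to CPU if PyTorch
# is missing or no GPU is present.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(f"Training will run on: {device}")
```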
### Issue: "Low accuracy"

Solutions:

- Increase the number of epochs
- Try a different learning rate (3e-5 or 5e-5)
- Ensure data quality and label correctness
- Use more training data
## Understanding Metrics

| Metric | What it means | When to focus on it |
|---|---|---|
| Accuracy | Overall correct predictions | Balanced datasets |
| Precision | Of predicted positives, how many are correct | Minimize false alarms |
| Recall | Of actual positives, how many found | Don't miss any positives |
| F1 Score | Balance of precision and recall | General performance |
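The four metrics in the table derive directly from confusion-matrix counts; a plain-Python sketch:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from
    confusion-matrix counts (guarding against division by zero)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 40 true positives, 10 false alarms, 5 misses, 45 true negatives.
m = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

Here accuracy is 0.85 and precision is 0.80, but recall (≈0.89) shows 5 positives were still missed — exactly the gap accuracy alone can hide on imbalanced data.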
## Useful Resources
## Getting Help

- Check the troubleshooting section in MLOPS_README.md
- Review the logs in the Training tab
- Run `example_usage.py` to test programmatically
- Check console output for detailed error messages
## Next Steps

After successfully training your first model:

- **Export Model**: Download it from the Evaluation tab
- **Deploy**: Use it with the FastAPI backend or integrate it elsewhere
- **Iterate**: Try different languages, models, and hyperparameters
- **Scale**: Train on larger datasets with a GPU

Happy Training!
For detailed documentation, see MLOPS_README.md