# OpenFinancial Chatbot - HF Space Trainer
This is a self-contained training script designed to run in a Hugging Face Space.
## 🚀 Quick Setup Instructions
### 1. Create a New HF Space
1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Set hardware to **CPU Basic** (free) or **T4 GPU** (paid)
4. Name it something like `openfinancial-trainer`
### 2. Upload Files to Your Space
Upload these files to your HF Space:
- `hf_space_trainer.py` → rename to `app.py`
- `requirements_hf_space.txt` → rename to `requirements.txt`
- Your training CSV files (from the `trainingData` folder)
### 3. Training Data Format
Each CSV should contain one of these column pairs:
- `Question` and `Answer`, OR
- `Input` and `Output`, OR
- `Prompt` and `Response`
The script will automatically detect the column names.
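The detection step can be pictured with a minimal sketch like the one below. It is not the code from `hf_space_trainer.py`: the names `COLUMN_PAIRS`, `detect_columns`, and `read_pairs` are hypothetical, and the case-insensitive matching is an assumption made for illustration.

```python
import csv
import io

# Column pairs named in this guide, checked in this order.
COLUMN_PAIRS = [("Question", "Answer"), ("Input", "Output"), ("Prompt", "Response")]


def detect_columns(header):
    """Return the (input, target) column names found in a CSV header row.

    Assumption: matching ignores case and surrounding whitespace.
    """
    normalized = {name.strip().lower(): name for name in header}
    for source, target in COLUMN_PAIRS:
        if source.lower() in normalized and target.lower() in normalized:
            return normalized[source.lower()], normalized[target.lower()]
    raise ValueError(f"No supported column pair found in header: {header}")


def read_pairs(csv_file):
    """Yield (input_text, target_text) tuples from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    source, target = detect_columns(reader.fieldnames or [])
    for row in reader:
        yield row[source], row[target]


# Usage with a small in-memory CSV instead of a file on disk.
sample = io.StringIO("prompt,response\nWhat is APR?,Annual percentage rate.\n")
pairs = list(read_pairs(sample))
```

Running a check like this locally before uploading can catch a mislabeled header before the Space spends time building.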
### 4. Start Training
1. Wait for the space to build (2-3 minutes)
2. Click **"🚀 Start Training"**
3. Monitor progress in real-time
4. Training takes 15-30 minutes on CPU, 5-10 minutes on GPU
### 5. Download Your Model
After training completes:
1. Go to your space's **Files** tab
2. Download the entire `trained_model` folder
3. Copy it to your local project
## 🎯 What This Does
- Loads your training data automatically
- Trains a TinyLlama model for financial Q&A
- Saves the model locally in the Space
- Provides a simple web interface
- Works on both CPU and GPU
## 💡 Pro Tips
- **Free Option**: Use CPU Basic (slower but free)
- **Fast Option**: Use T4 GPU (~$0.60/hour, much faster)
- **Multiple Files**: Script tries common CSV names automatically
- **Check Status**: Refresh the status display to see whether training has completed
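As a rough illustration of the "Multiple Files" tip, the lookup could work along these lines. The file names in `CANDIDATE_NAMES` are hypothetical placeholders (the real list lives in the training script), and falling back to every `*.csv` in the folder is an assumption, not confirmed behavior.

```python
import tempfile
from pathlib import Path

# Hypothetical candidate names; the actual names are defined in the training script.
CANDIDATE_NAMES = ["training_data.csv", "train.csv", "data.csv"]


def find_training_csvs(root="."):
    """Return the CSV files to train on, preferring well-known names.

    Falls back to every *.csv file in `root` (sorted by name) when none of
    the candidate names exists.
    """
    root = Path(root)
    named = [root / name for name in CANDIDATE_NAMES if (root / name).is_file()]
    if named:
        return named
    return sorted(root.glob("*.csv"))


# Usage: a scratch directory holding one CSV with a non-standard name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "finance_qa.csv").write_text("Question,Answer\n")
    found = [path.name for path in find_training_csvs(tmp)]
```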
## πŸ“ Expected Output
After training, you'll have a `trained_model` folder containing:
- `config.json` - Model configuration
- `pytorch_model.bin` - Trained weights (recent `transformers` versions may save `model.safetensors` instead)
- `tokenizer.json` - Tokenizer files
- Other supporting files
Copy this folder to your local backend directory and use it with your chatbot!
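Before wiring the folder into the chatbot, it may help to confirm that the download is complete. The sketch below uses only the standard library. The helper name `missing_model_files` is hypothetical, and accepting `model.safetensors` as an alternative weights file is an assumption about what the installed `transformers` version writes.

```python
import tempfile
from pathlib import Path

# Files this guide lists as expected output.
REQUIRED_FILES = ["config.json", "tokenizer.json"]
# Assumption: either weights format counts as a complete download.
WEIGHT_FILES = ["pytorch_model.bin", "model.safetensors"]


def missing_model_files(model_dir):
    """Return a list describing the expected files absent from `model_dir`."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    if not any((model_dir / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


# Usage: simulate a partial download that lacks the tokenizer file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "trained_model"
    folder.mkdir()
    (folder / "config.json").write_text("{}")
    (folder / "pytorch_model.bin").write_bytes(b"")
    report = missing_model_files(folder)
```

An empty list means every expected file is present. Calling `missing_model_files("trained_model")` on the copied folder gives a quick pass/fail check before loading the model.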