# OpenFinancial Chatbot - HF Space Trainer

This is a self-contained training script designed to run in a Hugging Face Space.

## 🚀 Quick Setup Instructions

### 1. Create a New HF Space
1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Set hardware to **CPU Basic** (free) or **T4 GPU** (paid)
4. Name it something like `openfinancial-trainer`

### 2. Upload Files to Your Space
Upload these files to your HF Space:
- `hf_space_trainer.py` → rename to `app.py`
- `requirements_hf_space.txt` → rename to `requirements.txt`
- Your training CSV files (from the `trainingData` folder)
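
If you would rather script the upload than use the web UI, the renames above can be expressed as a small mapping. This is a minimal sketch, not part of the trainer itself: it assumes the CSVs go to the Space root under their own names, that `huggingface_hub` is installed, and that you are already logged in. `plan_uploads` and `upload_all` are hypothetical helper names.

```python
from pathlib import Path

# Local file name -> name the Space expects (taken from the list above).
RENAMES = {
    "hf_space_trainer.py": "app.py",
    "requirements_hf_space.txt": "requirements.txt",
}


def plan_uploads(project_dir, data_dir="trainingData"):
    """Return (local_path, path_in_repo) pairs for every file to upload."""
    project = Path(project_dir)
    plan = [(project / src, dst) for src, dst in RENAMES.items()]
    # Assumption: CSVs are placed at the Space root under their own names.
    for csv_path in sorted((project / data_dir).glob("*.csv")):
        plan.append((csv_path, csv_path.name))
    return plan


def upload_all(project_dir, repo_id):
    """Upload the planned files to a Space (needs huggingface_hub and a token)."""
    from huggingface_hub import HfApi  # lazy import: plan_uploads works without it

    api = HfApi()
    for local_path, path_in_repo in plan_uploads(project_dir):
        api.upload_file(
            path_or_fileobj=str(local_path),
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="space",
        )
```

Call `plan_uploads(".")` first to review what would be sent, then `upload_all(".", "your-username/openfinancial-trainer")` with your own Space id.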

### 3. Training Data Format
Each CSV needs one of these column pairs:
- `Question` and `Answer`, OR
- `Input` and `Output`, OR
- `Prompt` and `Response`

The script will automatically detect the column names.
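
To check a CSV before uploading it, the detection can be approximated as below. This is a sketch of the idea rather than the trainer's actual code: the pair order and the case-insensitive matching are assumptions, and `detect_columns` and `read_pairs` are hypothetical names.

```python
import csv
import io

# Column pairs listed above, checked in this order.
COLUMN_PAIRS = [
    ("Question", "Answer"),
    ("Input", "Output"),
    ("Prompt", "Response"),
]


def detect_columns(header):
    """Return the (input, output) column names found in `header`.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    normalized = {name.strip().lower(): name for name in header}
    for question_col, answer_col in COLUMN_PAIRS:
        q = normalized.get(question_col.lower())
        a = normalized.get(answer_col.lower())
        if q is not None and a is not None:
            return q, a
    raise ValueError(f"No supported column pair in header: {list(header)}")


def read_pairs(csv_text):
    """Parse CSV text into a list of (question, answer) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    q_col, a_col = detect_columns(reader.fieldnames or [])
    return [(row[q_col], row[a_col]) for row in reader]
```

A header that matches none of the three pairs raises `ValueError`, which is a quick way to catch a misnamed column before spending time on a training run.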

### 4. Start Training
1. Wait for the Space to build (2-3 minutes)
2. Click **"🚀 Start Training"**
3. Monitor progress in real time
4. Training takes roughly 15-30 minutes on CPU or 5-10 minutes on GPU, depending on the size of your data

### 5. Download Your Model
After training completes:
1. Go to your Space's **Files** tab
2. Download the entire `trained_model` folder
3. Copy it to your local project

## 🎯 What This Does
- Loads your training data automatically
- Trains the TinyLlama model for financial Q&A
- Saves the model locally in the Space
- Provides a simple web interface
- Works on both CPU and GPU
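
The automatic data loading can be pictured as trying a list of file names in order. The sketch below is illustrative only: the names in `CANDIDATE_NAMES` are hypothetical (the script's real list is not shown here), and the fallback to any other CSV is an assumption.

```python
from pathlib import Path

# Hypothetical file names; the real script's list may differ.
CANDIDATE_NAMES = ("training_data.csv", "train.csv", "data.csv")


def find_training_csv(search_dir=".", candidates=CANDIDATE_NAMES):
    """Return the first candidate that exists, else any other CSV, else None."""
    root = Path(search_dir)
    for name in candidates:
        path = root / name
        if path.is_file():
            return path
    # Assumption: fall back to the alphabetically first CSV in the directory.
    others = sorted(root.glob("*.csv"))
    return others[0] if others else None
```

If this returns `None` for your project folder, no CSV was found there at all, so check that the files were uploaded next to `app.py`.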

## 💡 Pro Tips
- **Free Option**: Use CPU Basic (slower but free)
- **Fast Option**: Use T4 GPU (~$0.60/hour, much faster)
- **Multiple Files**: Script tries common CSV names automatically
- **Check Status**: Refresh the status to see whether training has completed
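
For a rough idea of what the GPU option costs, combine the approximate T4 rate above with the 5-10 minute training estimate. This is back-of-the-envelope arithmetic only; it assumes the quoted rate is current and ignores build and idle time, which are usually billed as well while the hardware is running.

```python
T4_RATE_PER_HOUR = 0.60  # USD, approximate figure from the tip above


def training_cost(minutes, rate_per_hour=T4_RATE_PER_HOUR):
    """Cost in USD of keeping the hardware busy for `minutes`."""
    return rate_per_hour * minutes / 60


low, high = training_cost(5), training_cost(10)
print(f"T4 run: ${low:.2f} to ${high:.2f}")  # T4 run: $0.05 to $0.10
```

Switching the Space back to CPU Basic after training avoids paying for idle GPU time.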

## 📁 Expected Output
After training, you'll have a `trained_model` folder containing:
- `config.json` - Model configuration
- `pytorch_model.bin` - Trained weights (may be `model.safetensors` instead, depending on the `transformers` version)
- `tokenizer.json` - Tokenizer files
- Other supporting files

Copy this folder to your local backend directory and use it with your chatbot!
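
Once the folder is in place, a quick sanity check and load might look like the sketch below. It assumes `transformers` and `torch` are installed locally and that the model was saved in the standard `transformers` layout. The helper names are hypothetical, and the bare-question prompt in `answer` is an assumption: use whatever prompt format the trainer used.

```python
from pathlib import Path

# Either weight file may be present, depending on the transformers version.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_files(model_dir):
    """Return the expected files that are absent from `model_dir`."""
    root = Path(model_dir)
    missing = [
        name for name in ("config.json", "tokenizer.json")
        if not (root / name).is_file()
    ]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


def load_model(model_dir="trained_model"):
    """Load tokenizer and model from disk (needs transformers and torch)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    problems = missing_files(model_dir)
    if problems:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(problems)}")
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    return tokenizer, model


def answer(question, tokenizer, model, max_new_tokens=128):
    """Generate a reply; the bare-question prompt is an assumption."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use is `tokenizer, model = load_model("trained_model")` followed by `answer("What is compound interest?", tokenizer, model)`. Running `missing_files` first gives a clearer error than a failed load if the download was incomplete.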