# OpenFinancial Chatbot - HF Space Trainer

This is a self-contained training script designed to run in a Hugging Face Space.

## 🚀 Quick Setup Instructions

### 1. Create a New HF Space

1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Set hardware to **CPU Basic** (free) or **T4 GPU** (paid)
4. Name it something like `openfinancial-trainer`

### 2. Upload Files to Your Space

Upload these files to your HF Space:

- `hf_space_trainer.py` → rename to `app.py`
- `requirements_hf_space.txt` → rename to `requirements.txt`
- Your training CSV files (from the `trainingData` folder)

### 3. Training Data Format

Your CSV should have columns like:

- `Question` and `Answer`, OR
- `Input` and `Output`, OR
- `Prompt` and `Response`

The script will automatically detect the column names.

### 4. Start Training

1. Wait for the space to build (2-3 minutes)
2. Click **"🚀 Start Training"**
3. Monitor progress in real-time
4. Training takes 15-30 minutes on CPU, 5-10 minutes on GPU

### 5. Download Your Model

After training completes:

1. Go to your space's **Files** tab
2. Download the entire `trained_model` folder
3. Copy it to your local project

## 🎯 What This Does

- Loads your training data automatically
- Trains TinyLlama model for financial Q&A
- Saves model locally in the space
- Provides simple web interface
- Works on both CPU and GPU

## 💡 Pro Tips

- **Free Option**: Use CPU Basic (slower but free)
- **Fast Option**: Use T4 GPU (~$0.60/hour, much faster)
- **Multiple Files**: Script tries common CSV names automatically
- **Resume Training**: Refresh status to see if training completed

## 📁 Expected Output

After training, you'll have a `trained_model` folder containing:

- `config.json` - Model configuration
- `pytorch_model.bin` - Trained weights
- `tokenizer.json` - Tokenizer files
- Other supporting files

Copy this folder to your local backend directory and use it with your chatbot!
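
Before wiring the folder into your chatbot, it can help to confirm that the download is complete. The snippet below is a minimal sketch, not part of the trainer script. The helper name `check_trained_model` is hypothetical, and accepting `model.safetensors` as an alternative to `pytorch_model.bin` is an assumption, made because newer `transformers` versions may save weights in that format.

```python
from pathlib import Path

# File names come from the "Expected Output" list in this README.
REQUIRED_FILES = ("config.json", "tokenizer.json")
# Assumption: either weights file is acceptable, since the format
# depends on the transformers version used during training.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_trained_model(folder="trained_model"):
    """Return a list of problems found in the downloaded model folder.

    An empty list means every expected file was found.
    """
    root = Path(folder)
    if not root.is_dir():
        return [f"folder not found: {root}"]

    # Report each required file that is absent.
    problems = [
        f"missing {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]

    # At least one of the known weight files must be present.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing weights (" + " or ".join(WEIGHT_FILES) + ")")

    return problems


if __name__ == "__main__":
    issues = check_trained_model()
    print("trained_model looks complete" if not issues else "\n".join(issues))
```

Run it from the directory that contains `trained_model`. If it reports missing files, download the folder again from the **Files** tab before copying it to your backend.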