---
title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

## Features

- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔁 **Automatic Retry Logic**: Up to 3 attempts per category, with a 1-second delay between attempts
- ✅ **Validation**: JSON validation and quality checks for every description
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Writes an output CSV per input file, with a Status column for every category
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **ZeroGPU Support**: Can run on ZeroGPU, Hugging Face's shared, on-demand GPU hardware for Spaces

## How to Use

### 1. Deploy to Hugging Face Spaces

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
   - Click "Save"
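6. **Optional: Enable ZeroGPU for Faster Processing**:
   - ZeroGPU is Hugging Face's shared GPU hardware for Spaces, allocated on demand
   - Select it under Settings → Space hardware. Using ZeroGPU Spaces is free within daily quotas, but hosting one on a personal account currently requires a PRO subscription
   - GPU time only benefits code that runs on the Space itself (functions decorated with `@spaces.GPU`); requests sent to the Hugging Face Inference API run remotely and are not accelerated by it
   - When generation does run on the Space, it can substantially shorten large batches
7. Your Space builds and restarts automatically after each upload.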

### 2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords. For example:

```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```
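Reading the category column and deduplicating it can be sketched with the standard library alone. This is a minimal illustration, not the app's actual parsing code. `load_unique_categories` is a hypothetical name, and whitespace stripping and first-seen ordering are assumptions.

```python
import csv
import io


def load_unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column`, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        # Column names are matched exactly (case-sensitive); report what exists.
        raise KeyError(f"Column {column!r} not found. Available columns: {fieldnames}")
    seen: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates
    for row in reader:
        value = (row[column] or "").strip()  # short rows yield None; treat as empty
        if value:
            seen.setdefault(value, None)
    return list(seen)


sample = """category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Mehandi,additional_data
"""
print(load_unique_categories(sample))
# ['Car Rental For Self Driven', 'Mehandi', 'Photographer']
```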

### 3. Use the Application

1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions

*Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.*
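The ranges and defaults listed above can be captured as a small validated settings object. This is a hypothetical sketch: `GenerationSettings` is an illustrative name, and the app may store or check its slider values differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container mirroring the UI sliders and their documented ranges."""
    max_tokens: int = 256     # allowed range 64-512
    temperature: float = 0.7  # allowed range 0.1-1.0
    top_p: float = 0.9        # allowed range 0.1-1.0

    def __post_init__(self) -> None:
        # Reject values outside the documented slider bounds.
        checks = {
            "max_tokens": (self.max_tokens, 64, 512),
            "temperature": (self.temperature, 0.1, 1.0),
            "top_p": (self.top_p, 0.1, 1.0),
        }
        for name, (value, low, high) in checks.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


print(GenerationSettings(temperature=0.3))
# GenerationSettings(max_tokens=256, temperature=0.3, top_p=0.9)
```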

## Output Format

Each output CSV file contains:

| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |

## Example Output

```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```
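The doubled quotes in `Raw_Response` are standard CSV escaping. Python's `csv` module produces them automatically when a field contains quotes or commas. The sketch below uses the column names from the table above. `write_output` is a hypothetical helper, not the app's code.

```python
import csv
import io
import json

FIELDS = ["Category", "Description", "Raw_Response", "Status"]


def write_output(rows: list[dict]) -> str:
    """Serialize result rows to CSV text with the four documented columns."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL quotes fields containing commas or quotes
    # and doubles any embedded double quotes.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw = json.dumps({"Category": "Mehandi", "Description": "..."})
text = write_output([
    {"Category": "Mehandi", "Description": "...", "Raw_Response": raw, "Status": "Success"}
])
print(text)
# Category,Description,Raw_Response,Status
# Mehandi,...,"{""Category"": ""Mehandi"", ""Description"": ""...""}",Success
```

Reading the file back with `csv.DictReader` restores the original JSON string, so `Raw_Response` can be re-parsed for debugging.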

## Model Settings

- **Max Tokens**: Upper limit on the number of tokens generated per category; longer responses are cut off (default: 256)
- **Temperature**: Controls how random the output is; lower values give more consistent results (default: 0.7)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling threshold. The model samples only from the smallest set of most likely tokens whose combined probability reaches this value (default: 0.9)
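Temperature and top-p are general sampling controls, not something specific to this app. The toy sketch below uses made-up logits to show how the two interact. Temperature rescales the logits before the softmax. Top-p then keeps only the most likely tokens until their probability mass reaches the threshold. The model server's exact implementation may differ in details.

```python
import math


def sampling_distribution(logits: dict[str, float], temperature: float, top_p: float) -> dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering, to toy logits."""
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = sorted(((tok, w / total) for tok, w in weights.items()),
                   key=lambda kv: kv[1], reverse=True)

    # Top-p: keep the smallest prefix of most-likely tokens whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {tok: p / mass for tok, p in kept}


toy = {"car": 2.0, "vehicle": 1.5, "automobile": 1.0, "banana": -1.0}
print(sampling_distribution(toy, temperature=0.3, top_p=0.9))  # keeps 'car' and 'vehicle'
print(sampling_distribution(toy, temperature=1.0, top_p=0.9))  # also keeps 'automobile'
```

At a low temperature, the top two tokens already exceed 0.9 of the mass, so fewer candidates survive. This is why the 0.2-0.4 range gives more consistent descriptions.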

## Technical Details

- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **ZeroGPU Support**: Optional on-demand GPU hardware for Spaces (see deployment step 6)
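The retry, validation, rate-limiting and file-naming rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's source.

- `generate` stands in for whatever function actually calls the model.
- `MIN_DESCRIPTION_LEN` is a hypothetical threshold, because the real minimum is not documented.
- The `"Failed: <error>"` status text and the `%Y%m%d_%H%M%S` timestamp format are assumed.
- All function names are hypothetical.

```python
import json
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional

MAX_ATTEMPTS = 3           # attempts per category, as documented
RETRY_DELAY_S = 1.0        # delay between retries, as documented
CATEGORY_DELAY_S = 0.5     # delay between categories, as documented
MIN_DESCRIPTION_LEN = 20   # hypothetical; the real minimum is not documented


def validate_response(raw: str) -> str:
    """Parse the model output as JSON and return the validated description."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or "Description" not in data:
        raise ValueError("response JSON lacks a 'Description' field")
    description = str(data["Description"]).strip()
    if len(description) < MIN_DESCRIPTION_LEN:
        raise ValueError(f"description shorter than {MIN_DESCRIPTION_LEN} characters")
    return description


def describe_with_retry(category: str, generate: Callable[[str], str],
                        sleep: Callable[[float], None] = time.sleep) -> dict[str, str]:
    """Try up to MAX_ATTEMPTS times and return one output row for the category."""
    raw, error = "", ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            raw = generate(category)
            description = validate_response(raw)
            return {"Category": category, "Description": description,
                    "Raw_Response": raw, "Status": "Success"}
        except Exception as exc:  # API, parsing, and validation errors alike
            error = str(exc)
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_S)
    return {"Category": category, "Description": "", "Raw_Response": raw,
            "Status": f"Failed: {error}"}


def process_all(categories: list[str], generate: Callable[[str], str],
                sleep: Callable[[float], None] = time.sleep) -> list[dict[str, str]]:
    """Process categories in order, pausing between them to avoid throttling."""
    rows = []
    for index, category in enumerate(categories):
        if index:
            sleep(CATEGORY_DELAY_S)
        rows.append(describe_with_retry(category, generate, sleep))
    return rows


def output_filename(original: str, now: Optional[datetime] = None) -> str:
    """Build output_{original_name}_{timestamp}.csv (timestamp format assumed)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"output_{Path(original).stem}_{stamp}.csv"


# Usage with a fake generator: the first reply is invalid, the retry succeeds.
replies = iter(["not json",
                '{"Category": "Mehandi", "Description": "hands decorated with intricate henna patterns"}'])
row = describe_with_retry("Mehandi", lambda c: next(replies), sleep=lambda s: None)
print(row["Status"])  # Success
print(output_filename("vendors.csv", datetime(2024, 1, 2, 3, 4, 5)))
# output_vendors_20240102_030405.csv
```

Injecting `sleep` keeps the delays in normal use while letting tests run instantly.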

## Troubleshooting

### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- Restart your Space after adding the secret if it does not restart on its own
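Spaces expose secrets to the app as environment variables, so the check behind this error amounts to reading `HF_TOKEN` from the environment. This sketch uses a hypothetical helper, `get_hf_token`. The app's real error message may be worded differently.

```python
import os


def get_hf_token() -> str:
    """Read the token that Spaces exposes from the HF_TOKEN secret."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it under Space Settings -> Secrets "
            "(the name is case-sensitive), or export it when running locally."
        )
    return token
```

The same lookup works locally, where the variable comes from your shell (see Local Development).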

### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- View the error message to see available columns

### Authentication errors
- Ensure your HF token has sufficient permissions (at least Read access)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens

### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- ZeroGPU affects processing speed, not output consistency; use the Temperature and Top-p settings for that
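Extracting the failed categories for a second run can be sketched with the standard library. This assumes failed rows have a `Status` that does not start with "Success", as described in Output Format. `failed_categories` and `as_input_csv` are hypothetical helpers that produce a one-column file you can upload again.

```python
import csv
import io


def failed_categories(output_csv_text: str) -> list[str]:
    """Return categories whose Status does not start with 'Success'."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [row["Category"] for row in reader
            if not (row.get("Status") or "").startswith("Success")]


def as_input_csv(categories: list[str], column: str = "category") -> str:
    """Write the categories into a one-column CSV suitable for re-uploading."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([c] for c in categories)
    return buffer.getvalue()


output = """Category,Description,Raw_Response,Status
Mehandi,hands decorated with henna,{},Success
Equipment,,,Failed: description too short
"""
retry = failed_categories(output)
print(retry)                # ['Equipment']
print(as_input_csv(retry))  # category / Equipment, one per line
```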

### Slow processing
- Each unique category is sent to the model separately, and failed attempts are retried (up to 3 attempts in total)
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- If ZeroGPU is enabled (see deployment step 6), generation that runs on the Space itself can be faster
- The 0.5 s pause between categories adds a fixed overhead of roughly 0.5 s per unique category
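Splitting a large file into smaller batches can be sketched as below. This is a hypothetical helper, `split_csv`, that repeats the header in every chunk. As a rough guide, the 0.5 s pause alone adds about 500 s (over 8 minutes) for 1,000 unique categories, before any generation time. Splitting does not reduce total time, but it keeps each run shorter.

```python
import csv
import io


def split_csv(csv_text: str, rows_per_batch: int) -> list[str]:
    """Split CSV text into chunks of at most rows_per_batch data rows, each with the header."""
    if rows_per_batch < 1:
        raise ValueError("rows_per_batch must be positive")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return []
    rows = list(reader)
    batches = []
    for start in range(0, len(rows), rows_per_batch):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_batch])
        batches.append(buffer.getvalue())
    return batches
```

Splitting by rows can place the same category in two batches. If deduplication happens within a single run, that category would be processed twice. Deduplicating first (for example, by uploading only the unique categories) avoids this.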

## Local Development

To run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable (Linux/macOS):
export HF_TOKEN="your_hf_token_here"
# On Windows (PowerShell), use this line instead:
#   $env:HF_TOKEN="your_hf_token_here"

# Run the app
python app.py
```

Get your token from: https://huggingface.co/settings/tokens

## License

This project uses the `openai/gpt-oss-20b` model, released by OpenAI under the Apache 2.0 license, via the Hugging Face Inference API.