---
title: Business Category Description Generator
emoji: 🏒
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

## Features

- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔁 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
- ✅ **Validation**: JSON validation and quality checks for every description
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Output files with Status column showing results
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration

## How to Use

### 1. Deploy to Hugging Face Spaces

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable Zero GPU for Faster Processing**:
   - Zero GPU provides free GPU acceleration
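   - Creating a Zero GPU Space on a personal account typically requires a PRO subscription; check the current Hugging Face documentation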
   - Space will automatically use GPU when available
   - Significantly speeds up processing for large batches
7. Your app will be deployed and restart automatically!

### 2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords. For example:

```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```
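As a rough illustration of how the unique categories in such a file could be collected, here is a minimal sketch using only the standard library. The function name `unique_categories` is hypothetical and is not taken from `app.py`; the sample rows extend the example above with one duplicate.

```python
import csv
import io


def unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column` in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)


sample = """category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Mehandi,additional_data
"""

print(unique_categories(sample))
# ['Car Rental For Self Driven', 'Mehandi', 'Photographer']
```

Keeping first-seen order makes the output rows easier to compare against the input file.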

### 3. Use the Application

1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions

*Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.*
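
The slider ranges listed above can be thought of as simple bounds on each setting. The sketch below clamps values to those ranges; `normalize_settings` and the dictionary keys are illustrative assumptions, not names from `app.py`.

```python
def clamp(value, low, high):
    """Limit `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def normalize_settings(max_tokens, temperature, top_p):
    """Clamp generation settings to the ranges documented for the UI sliders."""
    return {
        "max_tokens": int(clamp(max_tokens, 64, 512)),
        "temperature": float(clamp(temperature, 0.1, 1.0)),
        "top_p": float(clamp(top_p, 0.1, 1.0)),
    }


settings = normalize_settings(max_tokens=1024, temperature=0.0, top_p=0.9)
print(settings)
# {'max_tokens': 512, 'temperature': 0.1, 'top_p': 0.9}
```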

## Output Format

Each output CSV file contains:

| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |

## Example Output

```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```
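
The doubled quotes in `Raw_Response` are standard CSV escaping for a JSON string stored in one cell. A minimal sketch of writing and reading such a row with Python's `csv` module follows; `rows_to_csv` is a hypothetical helper and the description text is a placeholder, not real model output.

```python
import csv
import io
import json

FIELDNAMES = ["Category", "Description", "Raw_Response", "Status"]


def rows_to_csv(rows):
    """Serialize result rows to CSV text using the documented column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values for illustration only.
raw = json.dumps({"Category": "Mehandi", "Description": "placeholder description"})
text = rows_to_csv([
    {
        "Category": "Mehandi",
        "Description": "placeholder description",
        "Raw_Response": raw,
        "Status": "Success",
    }
])
print(text)

# Reading the file back recovers the original JSON string unchanged.
row = next(csv.DictReader(io.StringIO(text)))
parsed = json.loads(row["Raw_Response"])
```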

## Model Settings

- **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
- **Temperature**: Controls output consistency (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)

## Technical Details

- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **Zero GPU Support**: Free GPU acceleration available for Spaces
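
The retry and validation behavior described above might look roughly like the following sketch. It is not the code from `app.py`: the function names, the `MIN_DESCRIPTION_LENGTH` threshold, and the wording of the failure status are assumptions, and the model call is replaced by a stub so the example runs offline.

```python
import json
import time

MIN_DESCRIPTION_LENGTH = 20  # assumed threshold; the app's real minimum is not documented


def validate_response(raw):
    """Parse a raw model reply and return its description, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or "Description" not in data:
        raise ValueError("missing 'Description' field")
    description = str(data["Description"]).strip()
    if len(description) < MIN_DESCRIPTION_LENGTH:
        raise ValueError("description too short")
    return description


def describe_with_retry(category, generate, attempts=3, delay=1.0, sleep=time.sleep):
    """Return (description, raw_response, status) after up to `attempts` tries."""
    raw, error = "", "no attempts made"
    for attempt in range(1, attempts + 1):
        try:
            raw = generate(category)
            return validate_response(raw), raw, "Success"
        except Exception as exc:
            error = f"attempt {attempt}: {exc}"
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    return "", raw, f"Failed: {error}"


# Stub replies standing in for the model: two bad answers, then a valid one.
replies = iter([
    "not json",
    '{"Description": "short"}',
    json.dumps({"Category": "Mehandi", "Description": "henna design being applied to a hand"}),
])
pauses = []
result = describe_with_retry("Mehandi", lambda c: next(replies), sleep=pauses.append)
print(result[2], pauses)
# Success [1.0, 1.0]
```

Passing `sleep` as a parameter keeps the delay out of tests while leaving the default behavior unchanged.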

## Troubleshooting

### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- Restart your Space after adding the secret (or it will restart automatically)
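
A minimal sketch of the kind of check that produces this error is shown below, assuming the app reads the token from the environment. `require_hf_token` is a hypothetical name, and plain dictionaries stand in for the real environment in the demo.

```python
import os


def require_hf_token(env=None):
    """Return the HF_TOKEN value from `env` (default: os.environ) or raise."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it as a Space secret named exactly HF_TOKEN, "
            "or export it in your shell for local runs."
        )
    return token


# The lookup is case-sensitive, so a secret named hf_token is not picked up.
token = require_hf_token({"HF_TOKEN": "hf_example_value"})
try:
    require_hf_token({"hf_token": "wrong case"})
except RuntimeError as exc:
    message = str(exc)
    print(message)
```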

### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- View the error message to see available columns
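
To illustrate why the match is case-sensitive, here is a small sketch of a column lookup that reports the available columns on failure. `resolve_column` and the exact message text are assumptions, not the app's implementation.

```python
import csv
import io


def resolve_column(fieldnames, requested):
    """Return `requested` if it is a header; otherwise raise listing the headers."""
    columns = list(fieldnames or [])
    if requested in columns:
        return requested
    raise KeyError(
        f"Column '{requested}' not found. Available columns: {', '.join(columns)}"
    )


header = csv.DictReader(io.StringIO("Category,other_column\nMehandi,x\n")).fieldnames

# "category" does not match the header "Category".
try:
    resolve_column(header, "category")
except KeyError as exc:
    message = exc.args[0]
    print(message)
# Column 'category' not found. Available columns: Category, other_column
```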

### Authentication errors
- Ensure your HF token has proper permissions (Read access minimum)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens

### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- Zero GPU will provide more reliable processing with better resources
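
One way to pull failed categories out of an output file for reprocessing is sketched below. It assumes only what the Output Format table states, namely that successful rows have a `Status` beginning with "Success"; the sample rows and the failure text are invented for the demo.

```python
import csv
import io


def failed_categories(output_csv_text):
    """Return the categories whose Status does not start with 'Success'."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [
        row["Category"]
        for row in reader
        if not (row.get("Status") or "").startswith("Success")
    ]


# Invented sample output for illustration.
output_csv = """Category,Description,Raw_Response,Status
Mehandi,placeholder description,{},Success
Equipment,,,Failed: invalid JSON
"""

print(failed_categories(output_csv))
# ['Equipment']
```

The returned list can be written to a one-column CSV and uploaded again as a smaller batch.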

### Slow processing
- The model processes each unique category individually (includes retries)
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- If you enabled Zero GPU during deployment, the Space uses it automatically when a GPU is available
- Each category has a 0.5s delay to prevent rate limiting
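
As a back-of-the-envelope aid, the sketch below splits a category list into batches and computes the time spent on the 0.5-second delay alone. Both helper names are hypothetical, and the estimate is a lower bound because it ignores model latency and retries.

```python
def split_into_batches(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def minimum_delay_seconds(n_categories, delay=0.5):
    """Time spent on the per-category delay alone, excluding model latency."""
    return n_categories * delay


categories = [f"category {i}" for i in range(1000)]
batches = split_into_batches(categories, 250)

print(len(batches), minimum_delay_seconds(len(categories)))
# 4 500.0
```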

## Local Development

To run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"

# Linux/Mac:
export HF_TOKEN="your_hf_token_here"

# Run the app
python app.py
```

Get your token from: https://huggingface.co/settings/tokens

## License

This project uses the GPT-OSS-20B model via Hugging Face Inference API.