Create README.md
README.md
CHANGED
|
@@ -1,12 +1,60 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: gradio
|
| 7 |
-
sdk_version:
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
---
|
| 11 |
|
| 12 |
-
|
| 1 |
---
|
| 2 |
+
title: Website Category Classifier
|
| 3 |
+
emoji: π
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: green
|
| 6 |
sdk: gradio
|
| 7 |
+
sdk_version: 4.44.0
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
+
license: mit
|
| 11 |
+
hardware: zero-gpu
|
| 12 |
---
|
| 13 |
|
| 14 |
+
# Website Category Classifier (Fixed Version)
|
| 15 |
+
|
| 16 |
+
This application classifies websites into three categories using a fine-tuned Mistral 7B model:
|
| 17 |
+
- **OTHER**: General websites
|
| 18 |
+
- **NEWS/BLOG**: News websites and blogs
|
| 19 |
+
- **E-COMMERCE**: Online shopping sites
|
| 20 |
+
|
| 21 |
+
## 🔧 Fixed Issues
|
| 22 |
+
|
| 23 |
+
This version resolves the `torch.int1` `AttributeError` by:
|
| 24 |
+
- Removing the unsloth dependency, which was causing compatibility issues
|
| 25 |
+
- Using the transformers library directly
|
| 26 |
+
- Pinning the PyTorch version to avoid conflicts
|
| 27 |
+
- Adding proper error handling
|
| 28 |
+
|
| 29 |
+
## Features
|
| 30 |
+
|
| 31 |
+
- **Batch Processing**: Classify up to 20 URLs at once
|
| 32 |
+
- **AI-Powered**: Uses a fine-tuned Mistral 7B model for accurate classification
|
| 33 |
+
- **Real-time Progress**: Shows processing progress
|
| 34 |
+
- **GPU Acceleration**: Powered by Hugging Face ZeroGPU
|
| 35 |
+
- **Error Recovery**: Handles failed URLs gracefully
|
| 36 |
+
|
| 37 |
+
## Usage
|
| 38 |
+
|
| 39 |
+
1. Enter URLs (one per line) in the input textbox
|
| 40 |
+
2. Click "Classify Websites"
|
| 41 |
+
3. View results showing each URL and its predicted category
|
| 42 |
+
|
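The input handling behind step 1 could look roughly like the following. This is a minimal sketch, not code taken from `app.py`: the names `parse_urls` and `MAX_URLS` are hypothetical, and defaulting bare domains to `https://` is an assumption. Only the one-URL-per-line format and the 20-URL limit come from this README.

```python
MAX_URLS = 20  # batch limit stated in this README


def parse_urls(raw_text: str, limit: int = MAX_URLS) -> list[str]:
    """Turn textbox input (one URL per line) into a clean, de-duplicated list."""
    urls: list[str] = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: bare domains are accepted and default to https.
        if not url.startswith(("http://", "https://")):
            url = "https://" + url
        if url not in urls:
            urls.append(url)
    if len(urls) > limit:
        raise ValueError(f"Got {len(urls)} URLs; the limit is {limit} per batch.")
    return urls
```

Rejecting an oversized batch up front, rather than silently truncating it, keeps the results list aligned with what the user entered.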
| 43 |
+
## Model
|
| 44 |
+
|
| 45 |
+
This app uses the `limitedonly41/website_mistral7b_v02` model, loaded via the transformers library with 4-bit quantization to reduce memory use.
|
| 46 |
+
|
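Loading the model this way might look like the sketch below. It assumes that `transformers`, `bitsandbytes`, `accelerate`, and a CUDA GPU are available, and that the model repository can be loaded with `AutoModelForCausalLM` (if it only contains adapter weights, `peft` would also be needed). The quantization type and compute dtype are illustrative choices, not values confirmed by this README, and `load_model` is a hypothetical helper name.

```python
MODEL_ID = "limitedonly41/website_mistral7b_v02"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and a 4-bit quantized model; requires a GPU."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the weights on the GPU
    )
    return tokenizer, model
```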
| 47 |
+
## Technical Details
|
| 48 |
+
|
| 49 |
+
- Built with Gradio for the interface
|
| 50 |
+
- Uses transformers instead of unsloth for compatibility
|
| 51 |
+
- ZeroGPU decorator for efficient GPU utilization
|
| 52 |
+
- Async processing for better performance
|
| 53 |
+
- Translation support for non-English websites
|
| 54 |
+
|
| 55 |
+
## Limitations
|
| 56 |
+
|
| 57 |
+
- Maximum 20 URLs per batch (reduced for stability)
|
| 58 |
+
- 30-second timeout per URL
|
| 59 |
+
- Requires an internet connection to scrape the URLs
|
| 60 |
+
- Model loading may take a few minutes on the first run
|
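The per-URL timeout and the graceful handling of failed URLs described above could be combined along the following lines. This is a sketch using only the standard library. `classify_one`, `classify_batch`, and `fake_classify` are hypothetical names, and `fake_classify` stands in for the real scrape-and-classify step, which is not shown in this README.

```python
import asyncio

TIMEOUT_SECONDS = 30  # per-URL timeout stated in this README


async def classify_one(url, classify, timeout=TIMEOUT_SECONDS):
    """Classify one URL, returning an error label instead of raising."""
    try:
        category = await asyncio.wait_for(classify(url), timeout=timeout)
        return url, category
    except asyncio.TimeoutError:
        return url, "ERROR: timed out"
    except Exception as exc:  # one bad URL must not abort the whole batch
        return url, f"ERROR: {exc}"


async def classify_batch(urls, classify, timeout=TIMEOUT_SECONDS):
    """Classify all URLs concurrently; results keep the input order."""
    tasks = (classify_one(url, classify, timeout) for url in urls)
    return await asyncio.gather(*tasks)


async def fake_classify(url):
    """Hypothetical stand-in for the real scraping and model call."""
    if "slow" in url:
        await asyncio.sleep(1)
    if "bad" in url:
        raise ValueError("could not fetch page")
    return "E-COMMERCE" if "shop" in url else "OTHER"
```

Calling `asyncio.run(classify_batch(urls, fake_classify))` returns one `(url, category)` pair per input, with failures reported as `ERROR: ...` strings rather than exceptions.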