---
title: Website Category Classifier
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
---
# Website Category Classifier (Fixed Version)
This application classifies websites into three categories using a fine-tuned Mistral 7B model:
- **OTHER**: General websites
- **NEWS/BLOG**: News websites and blogs
- **E-COMMERCE**: Online shopping sites
## 🔧 Fixed Issues
This version resolves the `torch.int1` `AttributeError` by:
- Removing the `unsloth` dependency, which caused compatibility issues
- Using the `transformers` library directly
- Pinning the PyTorch version to avoid conflicts
- Adding proper error handling
## Features
- **Batch Processing**: Classify up to 20 URLs at once
- **AI-Powered**: Uses a fine-tuned Mistral 7B model for classification
- **Real-time Progress**: Shows processing progress
- **GPU Acceleration**: Powered by Hugging Face ZeroGPU
- **Error Recovery**: Handles failed URLs gracefully
## Usage
1. Enter URLs (one per line) in the input textbox
2. Click "🚀 Classify Websites"
3. View results showing each URL and its predicted category
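The input step above can be sketched as follows. This is a minimal, hypothetical helper (`parse_url_input` and `MAX_URLS` are illustrative names, not taken from `app.py`) that splits the textbox contents one URL per line, drops blank lines, and enforces the 20-URL batch limit. Whether the real app rejects or truncates an oversized batch is an assumption; this sketch rejects it.

```python
from typing import List

MAX_URLS = 20  # batch limit stated in this README (hypothetical constant name)


def parse_url_input(text: str, max_urls: int = MAX_URLS) -> List[str]:
    """Split textbox contents into URLs, one per line, ignoring blank lines."""
    urls = [line.strip() for line in text.splitlines()]
    urls = [u for u in urls if u]  # drop empty and whitespace-only lines
    if len(urls) > max_urls:
        # Assumption: oversized batches are rejected rather than truncated.
        raise ValueError(f"At most {max_urls} URLs per batch, got {len(urls)}")
    return urls


if __name__ == "__main__":
    print(parse_url_input("https://example.com\n\n  https://example.org  \n"))
    # ['https://example.com', 'https://example.org']
```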
## Model
This app uses the `limitedonly41/website_mistral7b_v02` model, loaded via the `transformers` library with 4-bit quantization to reduce GPU memory usage.
## Technical Details
- Built with Gradio for the interface
- Uses `transformers` instead of `unsloth` for compatibility
- ZeroGPU decorator (`@spaces.GPU`), so a GPU is allocated only while inference runs
- Async processing for better performance
- Translation support for non-English websites
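The async processing, per-URL timeout, and error recovery described in this README could fit together roughly as below. This is a sketch under assumptions, not the code in `app.py`: `classify_batch` and `classify_one` are hypothetical names, and `classify` stands in for whatever coroutine the app uses to scrape a page and predict its category. Each URL is wrapped in `asyncio.wait_for` with the 30-second limit listed under Limitations, and any failure is recorded as an error string instead of aborting the batch.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

TIMEOUT_S = 30.0  # per-URL timeout, matching the limit stated in this README

# Hypothetical signature: takes a URL, returns a category label such as "NEWS/BLOG".
Classifier = Callable[[str], Awaitable[str]]


async def classify_one(url: str, classify: Classifier, timeout: float) -> Tuple[str, str]:
    """Classify a single URL, converting timeouts and errors into result strings."""
    try:
        return url, await asyncio.wait_for(classify(url), timeout=timeout)
    except asyncio.TimeoutError:
        return url, "ERROR: timed out"
    except Exception as exc:  # any other failure is recorded, not raised
        return url, f"ERROR: {exc}"


async def classify_batch(
    urls: List[str], classify: Classifier, timeout: float = TIMEOUT_S
) -> List[Tuple[str, str]]:
    """Run all URLs concurrently; gather returns results in input order."""
    return await asyncio.gather(*(classify_one(u, classify, timeout) for u in urls))


if __name__ == "__main__":
    async def fake_classify(url: str) -> str:
        # Stand-in for scraping plus model inference.
        if "bad" in url:
            raise RuntimeError("fetch failed")
        await asyncio.sleep(0.01)
        return "OTHER"

    urls = ["https://a.example", "https://bad.example"]
    print(asyncio.run(classify_batch(urls, fake_classify)))
    # [('https://a.example', 'OTHER'), ('https://bad.example', 'ERROR: fetch failed')]
```

Because `asyncio.gather` preserves input order, each result lines up with the URL that produced it. In a real Space the GPU inference step would likely run sequentially inside the ZeroGPU-decorated function, with concurrency mainly helping the network-bound scraping.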
## Limitations
- Maximum 20 URLs per batch (reduced for stability)
- 30-second timeout per URL
- Requires an internet connection to scrape the URLs
- Model loading may take a few minutes on the first run