metadata
title: Website Category Classifier
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu

Website Category Classifier (Fixed Version)

This application classifies websites into three categories using a fine-tuned Mistral 7B model:

  • OTHER: General websites
  • NEWS/BLOG: News websites and blogs
  • E-COMMERCE: Online shopping sites
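As a rough illustration of how free-form model output could be mapped onto these three labels, here is a minimal sketch. The `normalize_category` helper and the rule of falling back to OTHER are assumptions made for illustration; they are not taken from app.py.

```python
# Hypothetical helper, not taken from app.py.
CATEGORIES = ("OTHER", "NEWS/BLOG", "E-COMMERCE")


def normalize_category(raw_output: str) -> str:
    """Map raw model text to one of the three categories.

    Assumption: anything that does not clearly name NEWS/BLOG or
    E-COMMERCE is treated as OTHER.
    """
    text = raw_output.strip().upper()
    # Check the two specific labels first; OTHER is the fallback.
    for label in ("NEWS/BLOG", "E-COMMERCE"):
        if label in text:
            return label
    return "OTHER"
```

Falling back to OTHER keeps the result column limited to the three documented values even when the model produces unexpected text.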

🔧 Fixed Issues

This version resolves the torch.int1 AttributeError by:

  • Removing the unsloth dependency, which caused the compatibility issues
  • Using the transformers library directly
  • Pinning the PyTorch version to avoid conflicts
  • Adding proper error handling

Features

  • Batch Processing: Classify up to 20 URLs at once
  • AI-Powered: Uses a fine-tuned Mistral 7B model for classification
  • Real-time Progress: Shows processing progress
  • GPU Acceleration: Powered by Hugging Face ZeroGPU
  • Error Recovery: Handles failed URLs gracefully

Usage

  1. Enter URLs (one per line) in the input textbox
  2. Click "🚀 Classify Websites"
  3. View results showing each URL and its predicted category
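The one-URL-per-line input can be turned into a clean list along these lines. This is a minimal sketch: the `parse_urls` name, the `https://` default for bare domains, the removal of duplicates, and truncating (rather than rejecting) input beyond 20 URLs are all assumptions, not behavior confirmed from app.py.

```python
MAX_URLS = 20  # batch limit stated in this README


def parse_urls(raw_text: str, limit: int = MAX_URLS) -> list[str]:
    """Split textbox input into a clean list of at most `limit` URLs."""
    urls = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: bare domains are accepted and default to https.
        if not url.startswith(("http://", "https://")):
            url = "https://" + url
        if url not in urls:  # drop exact duplicates, keep input order
            urls.append(url)
    return urls[:limit]
```

For example, `parse_urls("example.com\nhttps://a.org")` yields two normalized URLs ready to be scraped and classified.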

Model

This app uses the limitedonly41/website_mistral7b_v02 model, loaded via the transformers library with 4-bit quantization for efficiency.
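Loading a model in 4-bit through transformers typically looks like the sketch below. The quantization settings (NF4 with float16 compute) are assumed common defaults, not necessarily what app.py uses, and the sketch assumes the repository can be loaded directly with `AutoModelForCausalLM`. It requires a CUDA GPU plus the `bitsandbytes` and `accelerate` packages.

```python
MODEL_ID = "limitedonly41/website_mistral7b_v02"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and a 4-bit quantized model."""
    # Imported lazily so this module can be imported without the heavy deps.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumed settings: NF4 quantization with float16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the weights
    )
    return tokenizer, model
```

Calling `load_model()` once at startup and reusing the returned objects avoids repeating the slow first load for every request.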

Technical Details

  • Built with Gradio for the interface
  • Uses transformers instead of unsloth for compatibility
  • ZeroGPU decorator for efficient GPU utilization
  • Async processing for better performance
  • Translation support for non-English websites
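The ZeroGPU decorator mentioned above comes from the `spaces` package available on Hugging Face Spaces. A minimal sketch of the wiring follows; the `classify_text` stub and the no-op fallback for local runs are hypothetical and only illustrate where the decorator goes.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Assumption: outside Spaces, fall back to a no-op decorator.
    def gpu(func):
        return func


@gpu
def classify_text(text: str) -> str:
    """Placeholder for the GPU-bound inference step (hypothetical)."""
    # Real code would run the model here; this stub only shows the wiring.
    return "OTHER"
```

Keeping only the inference step inside the decorated function means scraping and translation do not hold a GPU slot.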

Limitations

  • Maximum 20 URLs per batch (reduced for stability)
  • 30-second timeout per URL
  • Requires an internet connection for URL scraping
  • Model loading may take a few minutes on first run