---
license: mit
language:
- zh
- en
tags:
- url-classification
- list-page-detection
- detail-page-detection
- qwen
- fine-tuning
- lora
- url-parser
- peft
base_model: Qwen/Qwen2.5-1.5B
---
# URL Page Type Classifier (LoRA)

A URL type classifier fine-tuned from Qwen2.5-1.5B with LoRA. Given a URL, it predicts whether the URL points to a list page or a detail page.
## Model Information

| Item | Details |
|------|---------|
| **Base model** | Qwen/Qwen2.5-1.5B |
| **Fine-tuning method** | LoRA (r=16, alpha=32) |
| **Trainable parameters** | ~18M (1.18%) |
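
The trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the layer dimensions are taken from the published Qwen2.5-1.5B configuration as best recalled, the base parameter count is approximate, and the set of adapted modules (all seven linear projections in each transformer block) is an assumption, since this card states only the rank and alpha.

```python
# Assumed Qwen2.5-1.5B dimensions and an ASSUMED set of LoRA target modules;
# the model card itself only states r=16 and alpha=32.
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128
RANK = 16

# (in_features, out_features) of each adapted linear layer in one block.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """LoRA adds two matrices per layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_layer * LAYERS

BASE_PARAMS = 1.54e9  # approximate size of the frozen base model
share = 100 * total / (BASE_PARAMS + total)

print(f"{total:,} trainable parameters ({share:.2f}% of all parameters)")
```

Under these assumptions the total comes to 18,464,768 parameters, which is in line with the ~18M reported in the table. A different choice of target modules would give a different count.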
## Performance

| Test set | Samples | Accuracy |
|----------|---------|----------|
| Training data | 100 | **100%** |
| Randomly generated URLs | 1000 | **100%** |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B", trust_remote_code=True)

# Attach the LoRA adapter and switch to inference mode
model = PeftModel.from_pretrained(base_model, "windlx/url-classifier-lora")
model.eval()

# Inference
url = "https://example.com/product/12345"
# ... (inference code)
```
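
The inference step is elided in the snippet above. The following is a minimal sketch of how it might look, not the author's actual code: the prompt template, the label strings `list` and `detail`, and the helper names `build_prompt`, `parse_label`, and `classify` are all hypothetical. The adapter was presumably trained with a specific prompt format, so check the GitHub repository for the real template before relying on this.

```python
# HYPOTHETICAL prompt template and labels: not confirmed by the model card.
PROMPT_TEMPLATE = "URL: {url}\nType:"
LABELS = ("list", "detail")


def build_prompt(url: str) -> str:
    """Wrap a URL in the assumed prompt template."""
    return PROMPT_TEMPLATE.format(url=url)


def parse_label(generated: str):
    """Return the label that appears earliest in the generated text, or None."""
    text = generated.lower()
    hits = [(text.find(label), label) for label in LABELS if label in text]
    return min(hits)[1] if hits else None


def classify(url: str, model, tokenizer, max_new_tokens: int = 8):
    """Greedy-decode a few tokens and map them to a label (assumed format)."""
    import torch  # imported lazily so the helpers above work without it

    inputs = tokenizer(build_prompt(url), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return parse_label(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

With `model` and `tokenizer` loaded as shown earlier, `classify(url, model, tokenizer)` would return `"list"`, `"detail"`, or `None` when neither label appears in the output.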
## Related Links

- **Merged model**: https://huggingface.co/windlx/url-classifier-model
- **GitHub**: https://github.com/xiuxiu/url-classifier