---
library_name: transformers
base_model: Qwen/Qwen2.5-1.5B-Instruct
license: apache-2.0
tags:
- qwen2.5
- avito
- validation
- classification
- text-generation
- merged-lora
language:
- ru
pipeline_tag: text-generation
---
# Avito Validation Model (Merged)
Qwen2.5-1.5B-Instruct fine-tuned to validate Avito listings.
The LoRA adapter is merged into the base model for easier deployment.
## Model Details
- **Base Model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
- **Training Method:** LoRA (merged)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Platform:** Fireworks.ai (December 2024)
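What "merged" means can be illustrated on a toy matrix. The sketch below assumes the standard LoRA formulation, in which the merged weight is `W + (alpha / r) * B @ A`; with the rank and alpha listed above, the scaling factor is 32 / 16 = 2. The tiny matrices and the `matmul` and `merge_lora` helpers are hypothetical and only show the arithmetic; this is not the training or merging code used for this model. The toy adapter has rank 1 for readability but reuses the card's scaling factor.

```python
# Toy illustration of merging a LoRA adapter into a base weight.
# Assumption: standard LoRA update, W_merged = W + (alpha / r) * (B @ A).

LORA_RANK = 16   # from the model card
LORA_ALPHA = 32  # from the model card

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(w, a, b, alpha, r):
    """Return w + (alpha / r) * (b @ a) as a new list-of-rows matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Hypothetical 2x2 base weight with a rank-1 adapter, for readability only.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # shape (1, 2)
B = [[0.5], [0.25]]     # shape (2, 1)

scaling = LORA_ALPHA / LORA_RANK
merged = merge_lora(W, A, B, LORA_ALPHA, LORA_RANK)
print(scaling)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the model loads with plain `transformers`.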
## Training Stats
- **Epochs:** 2
- **Steps:** 3,333
- **Training Sequences:** 34,672
- **Training Tokens:** ~101M
- **Final Loss:** 0.125
## Usage
### Direct Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Stepan222/avito-validation-merged"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example input (prompts are in Russian, the language listed for this model)
messages = [
    {"role": "system", "content": "Ты эксперт по валидации объявлений. Всегда отвечай строго в JSON формате."},
    {"role": "user", "content": '''АРТИКУЛ: "06L121011B"
ОБЪЯВЛЕНИЯ: [{"id": "7655180983", "title": "Насос водяной VAG 06L121011B", "snippet": "...", "price": 9890.0}]'''},
]

# Render the chat template to a string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, dropping the echoed prompt and special tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
## Input Format
```text
АРТИКУЛ: "<articulum>"
ОБЪЯВЛЕНИЯ: [
{"id": "...", "title": "...", "snippet": "...", "price": ..., "seller_reviews": ...},
...
]
```
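The input format above can be assembled programmatically. The following is a minimal sketch: `build_user_prompt` is a hypothetical helper, not part of this repository, and it assumes that listings serialized with `json.dumps` on a single line are close enough to the training data; the exact whitespace used during fine-tuning is not documented here.

```python
import json

def build_user_prompt(articulum, listings):
    """Format an article number and a list of listing dicts as the user message."""
    # ensure_ascii=False keeps Cyrillic titles readable instead of \uXXXX escapes.
    listings_json = json.dumps(listings, ensure_ascii=False)
    return f'АРТИКУЛ: "{articulum}"\nОБЪЯВЛЕНИЯ: {listings_json}'

example_listings = [
    {"id": "7655180983", "title": "Насос водяной VAG 06L121011B", "snippet": "...", "price": 9890.0},
]
prompt = build_user_prompt("06L121011B", example_listings)
print(prompt)
```

The resulting string can be passed as the `content` of the `user` message.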
## Output Format
```json
{
"passed_ids": ["id1", "id2", ...],
"rejected": [
{"id": "id3", "reason": "Причина отклонения"}
]
}
```
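A reply in the format above can be parsed with the standard library. This is a sketch under assumptions: `parse_validation_output` is a hypothetical helper, and it assumes the reply may contain stray text around the JSON object, so it slices from the first `{` to the last `}` before parsing.

```python
import json

def parse_validation_output(raw):
    """Extract passed ids and an {id: reason} map from the model's JSON reply."""
    # Assumption: the reply may carry extra text around the JSON object,
    # so slice from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    passed = [str(i) for i in data.get("passed_ids", [])]
    rejected = {str(item["id"]): item.get("reason", "") for item in data.get("rejected", [])}
    return passed, rejected

sample = '{"passed_ids": ["id1", "id2"], "rejected": [{"id": "id3", "reason": "Причина отклонения"}]}'
passed, rejected = parse_validation_output(sample)
print(passed)
print(rejected)
```

Malformed JSON still raises `json.JSONDecodeError`, so callers may want to catch it and retry generation.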
## License
Apache 2.0