
	---
	library_name: transformers
	base_model: Qwen/Qwen2.5-1.5B-Instruct
	license: apache-2.0
	tags:
	- qwen2.5
	- avito
	- validation
	- classification
	- text-generation
	- merged-lora
	language:
	- ru
	pipeline_tag: text-generation
	---

	# Avito Validation Model (Merged)

	Qwen2.5-1.5B-Instruct fine-tuned to validate Avito listings.
	The LoRA adapter is merged into the base model for easier deployment.

	## Model Details

	- Base Model: [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
	- Training Method: LoRA (merged)
	- LoRA Rank: 16
	- LoRA Alpha: 32
	- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Training Platform: Fireworks.ai (December 2024)
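
To make "LoRA (merged)" concrete, here is a minimal sketch of what merging does to a single weight matrix. It assumes the standard LoRA scaling `alpha / rank` (here 32 / 16 = 2.0); the card does not say whether the training platform used this exact variant. The helper `merge_lora` and the toy matrices are hypothetical, and the toy update uses inner dimension 1 for brevity where the real adapter uses 16.

```python
LORA_RANK = 16
LORA_ALPHA = 32
SCALE = LORA_ALPHA / LORA_RANK  # 2.0 under the standard LoRA convention (assumed)

def merge_lora(W, A, B, scale):
    """Return W + scale * (B @ A) for matrices given as nested lists."""
    rows, cols, inner = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]

# Toy 2x2 base weight and a low-rank update; values are illustrative only.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # shape (2, 1)
A = [[0.5, 0.25]]    # shape (1, 2)

merged = merge_lora(W, A, B, SCALE)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why this repository loads with plain `transformers` and no adapter library.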

	## Training Stats

	- Epochs: 2
	- Steps: 3,333
	- Training Sequences: 34,672
	- Training Tokens: ~101M
	- Final Loss: 0.125

	## Usage

	### Direct Usage

	```python
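	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Stepan222/avito-validation-merged"
	model = AutoModelForCausalLM.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# Example input. The prompts stay in Russian, the language declared in the card metadata.
	# System prompt: "You are an expert in listing validation. Always answer strictly in JSON format."
	# User prompt: the part number (АРТИКУЛ) followed by the candidate listings (ОБЪЯВЛЕНИЯ).
	messages = [
	    {"role": "system", "content": "Ты эксперт по валидации объявлений. Всегда отвечай строго в JSON формате."},
	    {"role": "user", "content": '''АРТИКУЛ: "06L121011B"
	ОБЪЯВЛЕНИЯ: [{"id": "7655180983", "title": "Насос водяной VAG 06L121011B", "snippet": "...", "price": 9890.0}]'''},
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=512)

	# Decode only the newly generated tokens, dropping the prompt and special tokens.
	new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))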
	```

	## Input Format

	```text
	АРТИКУЛ: "<articulum>"
	ОБЪЯВЛЕНИЯ: [
	    {"id": "...", "title": "...", "snippet": "...", "price": ..., "seller_reviews": ...},
	    ...
	]
	```
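
The input layout above can be assembled programmatically. The following is a minimal sketch, not the author's preprocessing code: the helper name `build_user_message` is hypothetical, and the exact JSON spacing used during training is not documented, so the default `json.dumps` separators are an assumption.

```python
import json

def build_user_message(articulum, listings):
    """Format a part number and candidate listings into the documented prompt layout."""
    # ensure_ascii=False keeps Cyrillic titles readable instead of \uXXXX escapes.
    listings_json = json.dumps(listings, ensure_ascii=False)
    return f'АРТИКУЛ: "{articulum}"\nОБЪЯВЛЕНИЯ: {listings_json}'

# Sample listing taken from the usage example in this card.
listings = [
    {"id": "7655180983", "title": "Насос водяной VAG 06L121011B", "snippet": "...", "price": 9890.0},
]
user_message = build_user_message("06L121011B", listings)
print(user_message)
```

The resulting string can be passed as the `content` of the `user` message in the chat template.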

	## Output Format

	```json
	{
	    "passed_ids": ["id1", "id2", ...],
	    "rejected": [
	        {"id": "id3", "reason": "Причина отклонения"}
	    ]
	}
	```
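
In the schema above, "Причина отклонения" is a placeholder meaning "reason for rejection". A reply in this shape can be parsed as sketched below. The helper `parse_validation_output` is hypothetical, and the sample reply (including the ID `"123"` and its reason, "Другой артикул", i.e. "different part number") is invented for illustration. The sketch assumes the reply may carry stray text around the JSON object, which a generative model can produce.

```python
import json

def parse_validation_output(raw):
    """Parse a model reply into (passed_ids, rejected); raise ValueError if malformed."""
    # Keep only the outermost {...} span in case the reply carries extra text.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON in model output: {exc}") from exc
    passed_ids = [str(i) for i in data.get("passed_ids", [])]
    # Map each rejected listing ID to its stated reason.
    rejected = {str(r["id"]): r.get("reason", "") for r in data.get("rejected", [])}
    return passed_ids, rejected

# Hypothetical reply in the documented output shape.
sample = '{"passed_ids": ["7655180983"], "rejected": [{"id": "123", "reason": "Другой артикул"}]}'
passed, rejected = parse_validation_output(sample)
print(passed, rejected)
```

Raising on malformed output rather than returning empty results makes it harder to mistake a failed generation for "no listings passed".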

	## License

	Apache 2.0