---
datasets:
- saurabhshahane/ecommerce-text-classification
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
- f1
pipeline_tag: text-generation
tags:
- Ecommerce
- Phi-3.5
- Fine-tuned
---

## Phi-3.5-mini-instruct-Ecommerce-Text-Classification

This model is a fine-tuned version of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) on the [saurabhshahane/ecommerce-text-classification](https://www.kaggle.com/datasets/saurabhshahane/ecommerce-text-classification) dataset.

## Tutorial

Customize the Phi-3.5-mini-instruct model to predict e-commerce categories (Electronics, Household, Books, Clothing) from product text.

## Use with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "kingabzpro/Phi-3.5-mini-instruct-Ecommerce-Text-Classification"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the fine-tuned model in half precision and let Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    return_dict=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

text = "Inalsa Dazzle Glass Top, 3 Burner Gas Stove with Rust Proof Powder Coated Body, Black Toughened Glass Top, 2 Medium and 1 Small High Efficiency Brass Burners, Aluminum Mixing Tubes, Powder Coated Body, Inbuilt Stainless Steel Drip Trays, 360 degree Swivel Nozzle,Bigger Legs to Facilitate Cleaning Under Cooktop"
prompt = f"""Classify the E-commerce text into Electronics, Household, Books and Clothing.
text: {text}
label: """.strip()

# The model already carries its dtype and device placement, so the pipeline only needs the loaded objects
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# A few new tokens are enough for a one-word label; the low temperature keeps sampling near-deterministic
outputs = pipe(prompt, max_new_tokens=4, do_sample=True, temperature=0.1)

# The pipeline returns the prompt plus the completion; keep only what follows the last "label:"
print(outputs[0]["generated_text"].split("label:")[-1].strip())

# Household
```

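When classifying many texts, it can help to separate prompt construction from output parsing. The sketch below is one possible way to do that and is not part of the original card: `LABELS`, `build_prompt`, `parse_label`, and the `"none"` fallback are hypothetical names introduced here for illustration. It assumes the same prompt template as the snippet above and runs without loading the model.

```python
# Hypothetical helpers (not from the original card) for reusing the prompt
# template and mapping free-form generations back onto the four categories.
LABELS = ["Electronics", "Household", "Books", "Clothing"]


def build_prompt(text: str) -> str:
    """Build the classification prompt in the same format as the usage snippet."""
    return (
        "Classify the E-commerce text into Electronics, Household, Books and Clothing.\n"
        f"text: {text}\n"
        "label: "
    ).strip()


def parse_label(generated_text: str, fallback: str = "none") -> str:
    """Return the known label that starts the text after the last 'label:' marker.

    If the completion does not start with one of the four labels, return
    `fallback` (an assumed placeholder for out-of-vocabulary answers).
    """
    tail = generated_text.split("label:")[-1].strip().lower()
    for label in LABELS:
        if tail.startswith(label.lower()):
            return label
    return fallback


# Simulate a pipeline result: the generated text is the prompt plus the completion.
simulated_output = build_prompt("3 burner gas stove with glass top") + " Household"
print(parse_label(simulated_output))
```

With the real pipeline, `parse_label(outputs[0]["generated_text"])` would replace the string-splitting line. Normalizing to a fixed label set also makes it easy to count answers that fall outside the four categories.
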
## Results

```bash
Accuracy: 0.860
Accuracy for label Electronics: 0.825
Accuracy for label Household: 0.926
Accuracy for label Books: 0.683
Accuracy for label Clothing: 0.947
```

**Classification Report:**

```bash
              precision    recall  f1-score   support

 Electronics       0.97      0.82      0.89        40
   Household       0.88      0.93      0.90        81
       Books       0.90      0.68      0.78        41
    Clothing       0.88      0.95      0.91        38

   micro avg       0.90      0.86      0.88       200
   macro avg       0.91      0.85      0.87       200
weighted avg       0.90      0.86      0.88       200
```

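The macro and weighted averages in the report can be reproduced, to within rounding, from the per-class rows. The following is a small sketch under that assumption: the values are copied from the table above, and `report`, `macro_avg`, and `weighted_avg` are illustrative names, not part of the original evaluation code.

```python
# Per-class rows copied from the classification report:
# label -> (precision, recall, f1-score, support)
report = {
    "Electronics": (0.97, 0.82, 0.89, 40),
    "Household": (0.88, 0.93, 0.90, 81),
    "Books": (0.90, 0.68, 0.78, 41),
    "Clothing": (0.88, 0.95, 0.91, 38),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(column: int) -> float:
    """Unweighted mean of one metric column across the classes."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_avg(column: int) -> float:
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[column] * row[3] for row in report.values()) / total_support


for name, column in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(column):.3f} weighted={weighted_avg(column):.3f}")
```

Because the inputs are already rounded to two decimals, the recomputed averages can differ from the reported ones in the last digit.
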
**Confusion Matrix:**

```bash
[[33  6  1  0]
 [ 1 75  2  3]
 [ 0  3 28  2]
 [ 0  1  0 36]]
```
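
The per-label accuracies and the precision and recall figures can be cross-checked against the confusion matrix. The sketch below assumes the scikit-learn convention (rows are true labels, columns are predicted labels) and the label order Electronics, Household, Books, Clothing; neither is stated explicitly in the card, but both are consistent with the reported numbers. Variable names are illustrative.

```python
labels = ["Electronics", "Household", "Books", "Clothing"]

# Confusion matrix from the card; rows = true label, columns = predicted label (assumed).
confusion = [
    [33, 6, 1, 0],
    [1, 75, 2, 3],
    [0, 3, 28, 2],
    [0, 1, 0, 36],
]

# Per-class support, taken from the classification report.
support = [40, 81, 41, 38]

precisions, recalls, unaccounted = [], [], []
for i, label in enumerate(labels):
    true_positives = confusion[i][i]
    predicted_as_label = sum(row[i] for row in confusion)
    precisions.append(true_positives / predicted_as_label)
    # Recall uses the support, not the row sum, because a row can omit
    # samples whose prediction was not one of the four labels.
    recalls.append(true_positives / support[i])
    unaccounted.append(support[i] - sum(confusion[i]))
    print(f"{label}: precision={precisions[-1]:.3f} recall={recalls[-1]:.3f} "
          f"unaccounted={unaccounted[-1]}")

overall_accuracy = sum(confusion[i][i] for i in range(len(labels))) / sum(support)
print(f"Overall accuracy: {overall_accuracy:.3f}")
```

The recall values match the per-label accuracies listed under Results. The Books and Clothing rows sum to fewer samples than their support (8 and 1 fewer, respectively), which suggests those predictions fell outside the four categories; this is an inference from the numbers, not something the card states, and it would also explain why the report shows a micro average instead of a single accuracy row.
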