---
license: gemma
pipeline_tag: text-generation
language:
- en
- km
tags:
- customs
- hs-code
- classification
- cambodia
- gemma
- unsloth
- qlora
base_model:
- unsloth/gemma-4-E4B-it
---
# Gemma‑4 HS Code Classifier (Cambodia Customs)
A **Gemma‑4‑E4B‑it** model fine‑tuned with QLoRA to classify product descriptions into **8‑digit HS codes** and return corresponding Cambodian trade rates (Customs Duty, Special Tax, VAT, Excise Tax).
Built with **[Unsloth](https://github.com/unslothai/unsloth)** for fast, memory‑efficient fine‑tuning on a single T4 GPU.
---
## 🎯 What it does
Given a plain‑English product description, the model generates:
```text
HS Code: 61091000
Unit: PIECE
Customs Duty: 25%
Special Tax: 0%
VAT: 10%
Excise Tax: 0%
```
**⚠️ Important**: The rates in the text are generated by the model and **may be wrong**.
For production, always use the included **lookup table** (`hs_code_lookup.json`) – see [Production use](#-production-use) below.
---
## 🚀 Quick start (in Colab or locally)
This repository contains **only the LoRA adapter**, not the full model.
Loading it will automatically download the base model (`unsloth/gemma-4-E4B-it`) and apply the adapter in 4-bit.
```python
# %% [Install]
%%capture
import os, re
# Install everything needed for the T4 Colab environment
!pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate xformers peft trl triton unsloth
!pip install --no-deps --upgrade "torchao>=0.16.0"
!pip install --no-deps transformers==5.5.0 "tokenizers>=0.22.0,<=0.23.0"
!pip install torchcodec
import torch
torch._dynamo.config.recompile_limit = 64
import warnings
# Suppress the specific PyTorch size check warning from bitsandbytes
warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    message=".*_check_is_size will be removed in a future PyTorch release.*"
)
#------------
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    "Sothay/gemma4-hscode-classifier",  # LoRA adapter on Hugging Face
    load_in_4bit = True,                # required – the adapter was trained in 4-bit
    max_seq_length = 1024,
)
# ---------- Inference with the authoritative lookup table (recommended) ----------
import json, re
# hs_code_lookup.json must be in the working directory (it ships with this repository)
with open("hs_code_lookup.json") as f:
    rate_lookup = json.load(f)
def predict_hs_code(description: str) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    # Request a dict so generate() receives both input_ids and attention_mask
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    m = re.search(r"HS Code:\s*([0-9]{4,10})", text)
    code = m.group(1) if m else None
    # Rates come from the lookup table whenever the predicted code is known
    if code and code in rate_lookup:
        return {"hs_code": code, "source": "lookup_table", **rate_lookup[code]}
    return {"hs_code": code, "source": "model_only_UNVERIFIED", "raw_output": text}

print(predict_hs_code("Men's cotton knitted T-shirt"))
```
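The regex in `predict_hs_code` accepts 4 to 10 digits, so a malformed code can reach the lookup step and silently fall through to the unverified branch. A minimal sketch of a normalization step is shown below; `normalize_hs_code` is a hypothetical helper, not part of this repository, and it assumes that lookup keys are plain 8-digit strings such as `61091000`.

```python
import re
from typing import Optional

def normalize_hs_code(raw: Optional[str]) -> Optional[str]:
    """Return an 8-digit HS code string, or None if the input is not one.

    Hypothetical helper: strips dots and whitespace (e.g. "6109.10.00")
    and rejects anything that is not exactly 8 ASCII digits afterwards.
    """
    if raw is None:
        return None
    digits = re.sub(r"[.\s]", "", raw)
    return digits if re.fullmatch(r"[0-9]{8}", digits) else None

print(normalize_hs_code("6109.10.00"))  # 61091000
print(normalize_hs_code("610910"))      # None
```

Calling this on the extracted code before checking `rate_lookup` makes a truncated or over-long prediction show up as `None` instead of as a plausible-looking string.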
---
## 🔍 Raw model output (debugging)
If you want to see exactly what the model generated (including the rates it predicted) without the lookup table, use the raw‑output function below.
**Do not** use these rates in production – they are only for debugging or confidence evaluation.
```python
def predict_hs_code_raw(description: str, max_new_tokens=100) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True, do_sample=False)
    raw_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    def extract(pattern, text):
        m = re.search(pattern, text)
        return m.group(1).strip() if m else None

    return {
        "hs_code": extract(r"HS Code:\s*([0-9.]+)", raw_text),
        "unit": extract(r"Unit:\s*(.*)", raw_text),
        "cd_rate": extract(r"Customs Duty:\s*([\d.]+)%?", raw_text),
        "st_rate": extract(r"Special Tax:\s*([\d.]+)%?", raw_text),
        "vat_rate": extract(r"VAT:\s*([\d.]+)%?", raw_text),
        "et_rate": extract(r"Excise Tax:\s*([\d.]+)%?", raw_text),
        "raw_output": raw_text
    }

# Example
raw = predict_hs_code_raw("Men's cotton knitted T-shirt")
print(raw["raw_output"])
print(raw["hs_code"]) # model’s guess
```
---
## 🧠 Training details
- **Base model**: `unsloth/gemma-4-E4B-it` (4‑bit QLoRA)
- **Adapter rank**: r=16, alpha=16, targeting all language & attention layers
- **Gradient checkpointing**: Unsloth’s own implementation (avoids Gemma‑4 KV‑shared layer bug)
- **Dataset**: Custom Cambodian HS‑code dataset (`hs_code.csv`) with descriptions, codes, and official rates
- Cleaned, deduplicated, split into 90/10 train/validation
- Chat roles fixed to system/user/assistant (Gemma‑4 standard)
- **Training config**: 3 epochs, effective batch size 8, learning rate 2e‑4, linear schedule, eval & save every epoch, best model loaded
- **Hardware**: Google Colab T4 (16 GB) – peak memory ~10 GB thanks to QLoRA
- **Accuracy**: Evaluated on held‑out examples (exact HS‑code match) – see model card for current numbers
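
The data preparation steps listed above (deduplicate, split 90/10, format as system/user/assistant turns) can be sketched roughly as follows. This is not the actual training script: the column names `description` and `hs_code`, the seed, and the helper names are assumptions, and the in-memory rows are placeholders standing in for `hs_code.csv`.

```python
import random

SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)

def dedupe(rows):
    """Drop repeated (description, hs_code) pairs, ignoring case and outer whitespace."""
    seen, unique = set(), []
    for row in rows:
        key = (row["description"].strip().lower(), row["hs_code"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def split_90_10(rows, seed=0):
    """Shuffle deterministically and return (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]

def to_messages(description, target_text):
    """Build one training conversation in the system/user/assistant layout."""
    def part(text):
        return [{"type": "text", "text": text}]
    return [
        {"role": "system", "content": part(SYSTEM_PROMPT)},
        {"role": "user", "content": part(f"Description: {description}")},
        {"role": "assistant", "content": part(target_text)},
    ]

# Placeholder rows; real rows would come from hs_code.csv (e.g. via csv.DictReader)
rows = [{"description": f"Item {i}", "hs_code": "61091000"} for i in range(20)]
rows.append({"description": " item 0 ", "hs_code": "61091000"})  # duplicate once normalized
train, val = split_90_10(dedupe(rows))
print(len(train), len(val))  # 18 2
```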
---
## ⚖️ Production use
> **Always use the lookup table – never trust the model’s generated rates.**
The model is a **classifier**: description → HS code.
Rates are fetched deterministically from `hs_code_lookup.json`, a file extracted from the same official tariff data used during training.
Why?
- A causal LM recalling a rate from memory will occasionally hallucinate – a customs tool with confident, wrong numbers is worse than one that says “I don’t know”.
- Given a correct HS code, the lookup table returns exactly the rates recorded in the tariff data, because no generation step sits between the code and its rates.
The `hs_code_lookup.json` file is included in this repository and can be downloaded via:
```python
from huggingface_hub import hf_hub_download
hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
```
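`hf_hub_download` returns the local path of the cached file, so that path can be passed straight to a loader. The sketch below assumes only what the quick-start code already implies: the file is a JSON object keyed by HS code string. `load_rate_lookup` and `lookup_rates` are hypothetical helper names.

```python
import json
from typing import Optional

def load_rate_lookup(path: str) -> dict:
    """Load the rate table and check that it is a JSON object."""
    with open(path, encoding="utf-8") as f:
        table = json.load(f)
    if not isinstance(table, dict):
        raise ValueError("expected a JSON object keyed by HS code")
    return table

def lookup_rates(table: dict, hs_code: Optional[str]) -> Optional[dict]:
    """Return the table entry for hs_code, or None when the code is unknown."""
    if hs_code is None:
        return None
    return table.get(hs_code)

# Usage (needs network access the first time the file is fetched):
# from huggingface_hub import hf_hub_download
# rate_lookup = load_rate_lookup(
#     hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
# )
# print(lookup_rates(rate_lookup, "61091000"))
```

Returning `None` for an unknown code keeps the "I don't know" case explicit, so callers cannot mistake a missing entry for a zero rate.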
---
## 📦 Files in this repository
| File | Description |
|------|-------------|
| `adapter_model.safetensors` | LoRA adapter weights (few MB) |
| `adapter_config.json` | Adapter configuration (references base model) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `hs_code_lookup.json` | Authoritative rate table for production inference |
| `README.md` | This file |
> **Note**: Only the adapter is stored here – the Gemma‑4 base model (`unsloth/gemma-4-E4B-it`) is fetched automatically from the Hugging Face Hub when you call `FastModel.from_pretrained`.
> If you need a **merged 16‑bit model** (for vLLM, TGI, etc.), generate it locally with Unsloth:
> ```python
> model.save_pretrained_merged("merged_fp16", tokenizer, save_method="merged_16bit")
> ```
---
## 🦙 Ollama / llama.cpp (GGUF)
Export a quantized GGUF directly from the loaded adapter:
```python
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```
Then load the GGUF in Ollama through a `Modelfile` (see the [Ollama documentation](https://ollama.com)) and set the temperature to 0 so that sampling is deterministic.
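
Once the model is registered with Ollama, it can be queried over Ollama's local HTTP chat endpoint. The sketch below is a rough illustration, not a tested integration: `hscode-classifier` is a hypothetical model name (whatever name was passed to `ollama create`), and the prompt mirrors the one used in the Python examples above. Only the request builder runs here; the network call is left as a commented usage line.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL_NAME = "hscode-classifier"  # hypothetical: the name passed to `ollama create`

SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)

def build_chat_request(description: str, model_name: str = MODEL_NAME) -> dict:
    """Build a non-streaming chat request with temperature 0."""
    return {
        "model": model_name,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Description: {description}"},
        ],
        "stream": False,
        "options": {"temperature": 0},
    }

def classify_via_ollama(description: str) -> str:
    """POST the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(description)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Usage (requires a running Ollama server with the model created):
# print(classify_via_ollama("Men's cotton knitted T-shirt"))
```

The reply text still needs the same `HS Code:` parsing and lookup-table step as the Python path; the rates in the generated text remain unverified.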
---
## 📊 Example predictions
| Description | Predicted HS Code | Unit | CD | ST | VAT | ET |
|-------------|-------------------|------|----|----|-----|----|
| Toyota Hilux pickup, diesel 2.8L | 87042110 | UNIT | 35% | 50% | 10% | 0% |
| iPhone 15 Pro Max 256GB | 85171200 | UNIT | 0% | 0% | 10% | 0% |
| Heineken beer 330ml can | 22030010 | LTR | 35% | 30% | 10% | 0% |
*(Rates from lookup table – not generated by the model.)*
---
## ⚠️ Limitations
- The model may output incorrect HS codes for ambiguous, misspelled, or region‑specific descriptions.
- It was trained on a fixed set of Cambodian HS codes; revisions after the training data cutoff are not covered.
- Duty rates can become outdated – always cross‑check with the latest official tariff schedule.
- The model is a classifier, **not** a legal authority. For binding decisions, consult a customs professional.
---
## 📝 License
This model is a derivative of **Gemma‑4‑E4B‑it** and is subject to the [Gemma license](https://ai.google.dev/gemma/terms).
The HS‑code dataset and lookup table are the property of their respective owners.
---
## 🙏 Acknowledgments
- [Unsloth](https://github.com/unslothai/unsloth) – made QLoRA + Gemma‑4 on a T4 effortless
- [Google DeepMind](https://deepmind.google) – for the Gemma family of models
---
## 📚 Citation
If you use this model, please cite:
```bibtex
@misc{gemma4-hscode-classifier,
  author = {Sothay},
  title = {Gemma-4 HS Code Classifier (Cambodia Customs)},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sothay/gemma4-hscode-classifier}}
}
```
---
**Author**: [Sothay](https://huggingface.co/Sothay)
**Model card version**: 1.2