	---
	license: gemma
	pipeline_tag: text-generation
	language:
	- en
	- km
	tags:
	- customs
	- hs-code
	- classification
	- cambodia
	- gemma
	- unsloth
	- qlora
	base_model:
	- unsloth/gemma-4-E4B-it
	---

	# Gemma‑4 HS Code Classifier (Cambodia Customs)

	A Gemma‑4‑E4B‑it model fine‑tuned with QLoRA to classify product descriptions into 8‑digit HS codes and return corresponding Cambodian trade rates (Customs Duty, Special Tax, VAT, Excise Tax).

	Built with [Unsloth](https://github.com/unslothai/unsloth) for fast, memory‑efficient fine‑tuning on a single T4 GPU.

	---

	## 🎯 What it does

	Given a plain‑English product description, the model generates:

	```text
	HS Code: 61091000
	Unit: PIECE
	Customs Duty: 25%
	Special Tax: 0%
	VAT: 10%
	Excise Tax: 0%
	```

	⚠️ Important: The rates in the text are generated by the model and may be wrong.
	For production, always use the included lookup table (`hs_code_lookup.json`) – see [Production use](#-production-use) below.

	---

	## 🚀 Quick start (in Colab or locally)

	This repository contains only the LoRA adapter, not the full model.
	Loading it will automatically download the base model (`unsloth/gemma-4-E4B-it`) and apply the adapter in 4-bit.

	```python
	# %% [Install]
	%%capture
	import os, re
	# Install everything needed for the T4 Colab environment
	!pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
	!pip install --no-deps unsloth_zoo bitsandbytes accelerate xformers peft trl triton unsloth
	!pip install --no-deps --upgrade "torchao>=0.16.0"
	!pip install --no-deps transformers==5.5.0 "tokenizers>=0.22.0,<=0.23.0"
	!pip install torchcodec
	import torch
	torch._dynamo.config.recompile_limit = 64

	import warnings

	# Suppress the specific PyTorch size check warning from bitsandbytes
	# (`message` is a regex matched against the start of the warning text)
	warnings.filterwarnings(
	    "ignore",
	    category=FutureWarning,
	    message=r".*_check_is_size will be removed in a future PyTorch release.*",
	)

	# %% [Load model]
	from unsloth import FastModel

	model, tokenizer = FastModel.from_pretrained(
	    "Sothay/gemma4-hscode-classifier",  # LoRA adapter on Hugging Face
	    load_in_4bit = True,                # required – the adapter was trained in 4-bit
	    max_seq_length = 1024,
	)

	# ---------- Inference with the authoritative lookup table (recommended) ----------
	import json, re
	from huggingface_hub import hf_hub_download

	# Fetch the lookup table from this repository (cached after the first download)
	lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
	with open(lookup_path, encoding="utf-8") as f:
	    rate_lookup = json.load(f)

	def predict_hs_code(description: str) -> dict:
	    system_prompt = (
	        "You are a customs compliance AI. Classify the product description to its "
	        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
	        "Special Tax, VAT, Excise Tax) and unit."
	    )
	    messages = [
	        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
	        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
	    ]
	    # Request a dict of tensors so it can be unpacked into generate()
	    inputs = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True, tokenize=True,
	        return_dict=True, return_tensors="pt",
	    ).to("cuda")
	    out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
	    # Decode only the newly generated tokens, not the prompt
	    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

	    m = re.search(r"HS Code:\s*([0-9]{4,10})", text)
	    code = m.group(1) if m else None
	    # Rates come from the lookup table; the model's own rates are never returned as verified
	    if code and code in rate_lookup:
	        return {"hs_code": code, "source": "lookup_table", **rate_lookup[code]}
	    return {"hs_code": code, "source": "model_only_UNVERIFIED", "raw_output": text}

	print(predict_hs_code("Men's cotton knitted T-shirt"))
	```
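The quick-start regex only accepts an unbroken run of digits, but HS codes are often written with dots (for example `6109.10.00`). Below is a minimal sketch of a normalization step. It assumes the keys of `hs_code_lookup.json` are undotted 8-digit strings like the `61091000` shown above; `normalize_hs_code` is a hypothetical helper, not part of this repository.

```python
import re
from typing import Optional


def normalize_hs_code(text: str) -> Optional[str]:
    """Return the first HS code in `text` as an undotted 8-digit string.

    Assumption: lookup keys look like "61091000". Codes with any other
    digit count are rejected (None) rather than padded or truncated.
    """
    m = re.search(r"HS Code:\s*([0-9.]+)", text)
    if not m:
        return None
    digits = re.sub(r"\D", "", m.group(1))  # drop dots, keep digits only
    return digits if len(digits) == 8 else None
```

Returning `None` for anything that is not exactly eight digits keeps the "unverified" path explicit instead of silently matching a shorter heading code.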

	---

	## 🔍 Raw model output (debugging)

	If you want to see exactly what the model generated (including the rates it predicted) without the lookup table, use the raw‑output function below.
	Do not use these rates in production – they are only for debugging or confidence evaluation.

	```python
	import re

	def predict_hs_code_raw(description: str, max_new_tokens=100) -> dict:
	    system_prompt = (
	        "You are a customs compliance AI. Classify the product description to its "
	        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
	        "Special Tax, VAT, Excise Tax) and unit."
	    )
	    messages = [
	        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
	        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
	    ]
	    inputs = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True, tokenize=True,
	        return_dict=True, return_tensors="pt",
	    ).to("cuda")

	    out = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True, do_sample=False)
	    # Decode only the newly generated tokens, not the prompt
	    raw_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

	    def extract(pattern, text):
	        # Return the first capture group, or None if the field is missing
	        m = re.search(pattern, text)
	        return m.group(1).strip() if m else None

	    return {
	        "hs_code": extract(r"HS Code:\s*([0-9.]+)", raw_text),
	        "unit": extract(r"Unit:\s*(.+)", raw_text),  # rest of the line, e.g. "PIECE"
	        "cd_rate": extract(r"Customs Duty:\s*([\d.]+)%?", raw_text),
	        "st_rate": extract(r"Special Tax:\s*([\d.]+)%?", raw_text),
	        "vat_rate": extract(r"VAT:\s*([\d.]+)%?", raw_text),
	        "et_rate": extract(r"Excise Tax:\s*([\d.]+)%?", raw_text),
	        "raw_output": raw_text,
	    }

	# Example
	raw = predict_hs_code_raw("Men's cotton knitted T-shirt")
	print(raw["raw_output"])
	print(raw["hs_code"])  # model's guess
	```
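One way to use the raw output for confidence evaluation is to compare the rates the model generated with the lookup-table entry for the same code and flag any disagreement. The sketch below assumes each lookup entry is a dict with the same four keys as the raw result (`cd_rate`, `st_rate`, `vat_rate`, `et_rate`). The actual schema of `hs_code_lookup.json` is not documented here, so adjust the key names as needed. `rate_mismatches` is a hypothetical helper.

```python
from typing import Optional

# Assumed key names, shared by the raw result and the lookup entry
RATE_KEYS = ("cd_rate", "st_rate", "vat_rate", "et_rate")


def _to_float(value) -> Optional[float]:
    """Parse '25', '25%', 25 or None into a float (None if unparseable)."""
    if value is None:
        return None
    try:
        return float(str(value).strip().rstrip("%"))
    except ValueError:
        return None


def rate_mismatches(raw: dict, lookup_entry: dict) -> dict:
    """Return {key: (model_value, table_value)} for every rate that differs."""
    diffs = {}
    for key in RATE_KEYS:
        model_val = _to_float(raw.get(key))
        table_val = _to_float(lookup_entry.get(key))
        if model_val != table_val:
            diffs[key] = (model_val, table_val)
    return diffs
```

An empty result means the model's recalled rates agree with the table. A non-empty one can be logged as a signal that the prediction deserves a second look.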

	---

	## 🧠 Training details

	- Base model: `unsloth/gemma-4-E4B-it` (4‑bit QLoRA)
	- Adapter rank: r=16, alpha=16, targeting all language & attention layers
	- Gradient checkpointing: Unsloth’s own implementation (avoids Gemma‑4 KV‑shared layer bug)
	- Dataset: Custom Cambodian HS‑code dataset (`hs_code.csv`) with descriptions, codes, and official rates
	  - Cleaned, deduplicated, split into 90/10 train/validation
	  - Chat roles fixed to system/user/assistant (Gemma‑4 standard)
	- Training config: 3 epochs, effective batch size 8, learning rate 2e‑4, linear schedule, eval & save every epoch, best model loaded
	- Hardware: Google Colab T4 (16 GB) – peak memory ~10 GB thanks to QLoRA
	- Accuracy: Evaluated on held‑out examples (exact HS‑code match) – see model card for current numbers
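
The preprocessing listed above (deduplicate, 90/10 split, system/user/assistant roles) could look roughly like the sketch below. It is an illustration under assumptions, not the author's training script. The column names (`description`, `hs_code`, `unit`, `cd`, `st`, `vat`, `et`), the seed, and the helper names are hypothetical. The assistant text mirrors the output format shown in "What it does".

```python
import random

SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)


def dedupe_and_split(rows, val_fraction=0.1, seed=42):
    """Drop duplicate (description, hs_code) pairs, then split train/validation."""
    seen, unique = set(), []
    for row in rows:
        key = (row["description"].strip().lower(), row["hs_code"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    random.Random(seed).shuffle(unique)  # seeded so the split is reproducible
    n_val = max(1, round(len(unique) * val_fraction))
    return unique[n_val:], unique[:n_val]


def to_chat_example(row):
    """Build one system/user/assistant conversation from a dataset row."""
    answer = (
        f"HS Code: {row['hs_code']}\n"
        f"Unit: {row['unit']}\n"
        f"Customs Duty: {row['cd']}%\n"
        f"Special Tax: {row['st']}%\n"
        f"VAT: {row['vat']}%\n"
        f"Excise Tax: {row['et']}%"
    )
    return [
        {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
        {"role": "user", "content": [{"type": "text", "text": f"Description: {row['description']}"}]},
        {"role": "assistant", "content": [{"type": "text", "text": answer}]},
    ]
```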

	---

	## ⚖️ Production use

	> Always use the lookup table – never trust the model’s generated rates.

	The model is a classifier: description → HS code.
	Rates are fetched deterministically from `hs_code_lookup.json`, a file extracted from the same official tariff data used during training.

	Why?
	- A causal LM recalling a rate from memory will occasionally hallucinate – a customs tool with confident, wrong numbers is worse than one that says “I don’t know”.
	- The lookup table returns the rates exactly as recorded in the official tariff data, so a correct HS code always yields the recorded rates.

	The `hs_code_lookup.json` file is included in this repository and can be downloaded via:

	```python
	from huggingface_hub import hf_hub_download
	hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
	```
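
Before relying on the table in production, it can help to sanity-check it once at load time. The sketch below assumes the table is a JSON object keyed by undotted 8-digit HS code strings, each mapping to a dict of rates. That schema is an assumption, and `find_malformed_keys` and `load_lookup` are hypothetical helpers.

```python
import json
import re


def find_malformed_keys(table: dict) -> list:
    """Return keys that are not 8-digit strings or whose value is not a dict."""
    bad = []
    for key, entry in table.items():
        if not (isinstance(key, str) and re.fullmatch(r"\d{8}", key)):
            bad.append(key)
        elif not isinstance(entry, dict):
            bad.append(key)
    return bad


def load_lookup(path: str) -> dict:
    """Load the lookup JSON and fail loudly if any entry looks malformed."""
    with open(path, encoding="utf-8") as f:
        table = json.load(f)
    bad = find_malformed_keys(table)
    if bad:
        raise ValueError(f"{len(bad)} malformed lookup entries, e.g. {bad[:3]}")
    return table
```

`load_lookup` can be given the local path returned by `hf_hub_download`.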

	---

	## 📦 Files in this repository

	| File | Description |
	|------|-------------|
	| `adapter_model.safetensors` | LoRA adapter weights (few MB) |
	| `adapter_config.json` | Adapter configuration (references base model) |
	| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
	| `hs_code_lookup.json` | Authoritative rate table for production inference |
	| `README.md` | This file |

	> Note: Only the adapter is stored here – the Gemma‑4 base model (`unsloth/gemma-4-E4B-it`) is fetched automatically from the Hugging Face Hub when you call `FastModel.from_pretrained`.
	> If you need a merged 16‑bit model (for vLLM, TGI, etc.), generate it locally with Unsloth:
	> ```python
	> model.save_pretrained_merged("merged_fp16", tokenizer, save_method="merged_16bit")
	> ```

	---

	## 🦙 Ollama / llama.cpp (GGUF)

	Export a quantized GGUF directly from the loaded adapter:

	```python
	model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
	```

	Then load the exported GGUF in Ollama through a `Modelfile` (see the [Ollama documentation](https://ollama.com)), setting temperature to 0 so that sampling is deterministic.
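
A minimal sketch of generating such a `Modelfile` from Python. The GGUF path is a placeholder, since the exact filename written by the export step is not stated here, and `build_modelfile` is a hypothetical helper. `FROM`, `PARAMETER`, and `SYSTEM` are standard Modelfile instructions. Depending on the Ollama version, a `TEMPLATE` instruction may also be needed.

```python
SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)


def build_modelfile(gguf_path: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return Modelfile text that pins temperature to 0 for repeatable output."""
    return "\n".join([
        f"FROM {gguf_path}",             # path to the exported .gguf file
        "PARAMETER temperature 0",       # greedy decoding, as recommended above
        f'SYSTEM """{system_prompt}"""',
        "",
    ])


# Placeholder path: point this at the .gguf file produced by the export step.
print(build_modelfile("./gguf_model/model.gguf"))
```

Saving the printed text as `Modelfile` and running `ollama create hs-classifier -f Modelfile` (the model name is arbitrary) registers it with Ollama.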

	---

	## 📊 Example predictions

	| Description | Predicted HS Code | Unit | CD | ST | VAT | ET |
	|-------------|-------------------|------|----|----|-----|----|
	| Toyota Hilux pickup, diesel 2.8L | 87042110 | UNIT | 35% | 50% | 10% | 0% |
	| iPhone 15 Pro Max 256GB | 85171200 | UNIT | 0% | 0% | 10% | 0% |
	| Heineken beer 330ml can | 22030010 | LTR | 35% | 30% | 10% | 0% |
	(Rates from lookup table – not generated by the model.)

	---

	## ⚠️ Limitations

	- The model may output incorrect HS codes for ambiguous, misspelled, or region‑specific descriptions.
	- It was trained on a fixed set of Cambodian HS codes; revisions after the training data cutoff are not covered.
	- Duty rates can become outdated – always cross‑check with the latest official tariff schedule.
	- The model is a classifier, not a legal authority. For binding decisions, consult a customs professional.

	---

	## 📝 License

	This model is a derivative of Gemma‑4‑E4B‑it and is subject to the [Gemma license](https://ai.google.dev/gemma/terms).
	The HS‑code dataset and lookup table are the property of their respective owners.

	---

	## 🙏 Acknowledgments

	- [Unsloth](https://github.com/unslothai/unsloth) – made QLoRA + Gemma‑4 on a T4 effortless
	- [Google DeepMind](https://deepmind.google) – for the Gemma family of models

	---

	## 📚 Citation

	If you use this model, please cite:

	```bibtex
	@misc{gemma4-hscode-classifier,
	author = {Sothay},
	title = {Gemma‑4 HS Code Classifier (Cambodia Customs)},
	year = 2025,
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/Sothay/gemma4-hscode-classifier}}
	}
	```

	---

	Author: [Sothay](https://huggingface.co/Sothay)
	Model card version: 1.2