---
license: gemma
pipeline_tag: text-generation
language:
- en
- km
tags:
- customs
- hs-code
- classification
- cambodia
- gemma
- unsloth
- qlora
base_model:
- unsloth/gemma-4-E4B-it
---

# Gemma-4 HS Code Classifier (Cambodia Customs)

A **Gemma-4-E4B-it** model fine-tuned with QLoRA to classify product descriptions into **8-digit HS codes** and return the corresponding Cambodian trade rates (Customs Duty, Special Tax, VAT, Excise Tax). Built with **[Unsloth](https://github.com/unslothai/unsloth)** for fast, memory-efficient fine-tuning on a single T4 GPU.

---

## 🎯 What it does

Given a plain-English product description, the model generates:

```text
HS Code: 61091000
Unit: PIECE
Customs Duty: 25%
Special Tax: 0%
VAT: 10%
Excise Tax: 0%
```

**⚠️ Important**: The rates in the text are generated by the model and **may be wrong**. For production, always use the included **lookup table** (`hs_code_lookup.json`) – see [Production use](#-production-use) below.

---

## 🚀 Quick start (in Colab or locally)

This repository contains **only the LoRA adapter**, not the full model. Loading it will automatically download the base model (`unsloth/gemma-4-E4B-it`) and apply the adapter in 4-bit.
```python
%%capture
# Install cell (Colab/Jupyter): run it on its own; %%capture must be the first line of a cell.
# Install everything needed for the T4 Colab environment
!pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate xformers peft trl triton unsloth
!pip install --no-deps --upgrade "torchao>=0.16.0"
!pip install --no-deps transformers==5.5.0 "tokenizers>=0.22.0,<=0.23.0"
!pip install torchcodec

# ---------- Load the model (run from here on in a new cell) ----------
import json
import re
import warnings

import torch
torch._dynamo.config.recompile_limit = 64

# Suppress the specific PyTorch size check warning from bitsandbytes
warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    message=".*_check_is_size will be removed in a future PyTorch release.*"
)

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "Sothay/gemma4-hscode-classifier",  # LoRA adapter on Hugging Face
    load_in_4bit = True,                # required: the adapter was trained in 4-bit
    max_seq_length = 1024,
)

# ---------- Inference with the authoritative lookup table (recommended) ----------
from huggingface_hub import hf_hub_download

# Download the lookup table from this repository (returns the local cached path)
lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
with open(lookup_path, encoding="utf-8") as f:
    rate_lookup = json.load(f)

def predict_hs_code(description: str) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    # Tokenize explicitly and request a dict so that input_ids and attention_mask
    # can both be passed to generate()
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    m = re.search(r"HS Code:\s*([0-9]{4,10})", text)
    code = m.group(1) if m else None

    # Rates come from the lookup table, never from the generated text
    if code and code in rate_lookup:
        return {"hs_code": code, "source": "lookup_table", **rate_lookup[code]}
    return {"hs_code": code, "source": "model_only_UNVERIFIED", "raw_output": text}

print(predict_hs_code("Men's cotton knitted T-shirt"))
```

---

## 🔍 Raw model output (debugging)

If you want to see exactly what the model generated (including the rates it predicted) without the lookup table, use the raw-output function below. **Do not** use these rates in production – they are only for debugging or confidence evaluation.

```python
# Assumes `model`, `tokenizer`, and `re` from the quick start are already in scope.
def predict_hs_code_raw(description: str, max_new_tokens: int = 100) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    raw_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    def extract(pattern, text):
        # Return the first capture group, or None if the field is missing
        m = re.search(pattern, text)
        return m.group(1).strip() if m else None

    return {
        "hs_code": extract(r"HS Code:\s*([0-9.]+)", raw_text),
        "unit": extract(r"Unit:\s*(.*)", raw_text),
        "cd_rate": extract(r"Customs Duty:\s*([\d.]+)%?", raw_text),
        "st_rate": extract(r"Special Tax:\s*([\d.]+)%?", raw_text),
        "vat_rate": extract(r"VAT:\s*([\d.]+)%?", raw_text),
        "et_rate": extract(r"Excise Tax:\s*([\d.]+)%?", raw_text),
        "raw_output": raw_text
    }

# Example
raw = predict_hs_code_raw("Men's cotton knitted T-shirt")
print(raw["raw_output"])
print(raw["hs_code"])  # model's guess
```

---

## 🧠 Training details

- **Base model**: `unsloth/gemma-4-E4B-it` (4-bit QLoRA)
- **Adapter rank**: r=16, alpha=16, targeting all language & attention layers
- **Gradient checkpointing**: Unsloth’s own implementation (avoids Gemma-4 KV-shared layer bug)
- **Dataset**: Custom Cambodian HS-code dataset (`hs_code.csv`) with descriptions, codes, and official rates
  - Cleaned, deduplicated, split into 90/10 train/validation
  - Chat roles fixed to system/user/assistant (Gemma-4 standard)
- **Training config**: 3 epochs, effective batch size 8, learning rate 2e-4, linear schedule, eval & save every epoch, best model loaded
- **Hardware**: Google Colab T4 (16 GB) – peak memory ~10 GB thanks to QLoRA
- **Accuracy**: Evaluated on held-out examples (exact HS-code match) – see model card for current numbers

---

## ⚖️ Production use

> **Always use the lookup table – never trust the
model’s generated rates.**

The model is a **classifier**: description → HS code. Rates are fetched deterministically from `hs_code_lookup.json`, a file extracted from the same official tariff data used during training.

Why?

- A causal LM recalling a rate from memory will occasionally hallucinate – a customs tool with confident, wrong numbers is worse than one that says “I don’t know”.
- With the lookup table, the returned rates are exactly those in the source tariff data whenever the predicted HS code is correct.

The `hs_code_lookup.json` file is included in this repository and can be downloaded via:

```python
from huggingface_hub import hf_hub_download

# Returns the local path of the cached file
lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
```

---

## 📦 Files in this repository

| File | Description |
|------|-------------|
| `adapter_model.safetensors` | LoRA adapter weights (few MB) |
| `adapter_config.json` | Adapter configuration (references base model) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `hs_code_lookup.json` | Authoritative rate table for production inference |
| `README.md` | This file |

> **Note**: Only the adapter is stored here – the full Gemma-4 base model (`unsloth/gemma-4-E4B-it`) is fetched automatically from the Hugging Face Hub when you call `FastModel.from_pretrained`.
> If you need a **merged 16-bit model** (for vLLM, TGI, etc.), generate it locally with Unsloth:
> ```python
> model.save_pretrained_merged("merged_fp16", tokenizer, save_method="merged_16bit")
> ```

---

## 🦙 Ollama / llama.cpp (GGUF)

Export a quantized GGUF directly from the loaded adapter:

```python
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```

Then use it with Ollama (see [`Modelfile` example](https://ollama.com) – set temperature 0 for deterministic sampling).
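As a rough sketch of the Ollama route, the snippet below builds a temperature-0 request for Ollama's local REST endpoint (`/api/generate`) and extracts the HS code from the reply. The model name `hscode-classifier`, the helper names, and the choice to pass the training system prompt through the `system` field are assumptions for illustration, not part of this repository; adjust them to whatever name you used with `ollama create` and to your own `Modelfile`.

```python
import json
import re
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "hscode-classifier"  # hypothetical name passed to `ollama create`

SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)


def build_request(description: str) -> dict:
    """Build a non-streaming, temperature-0 request body for /api/generate."""
    return {
        "model": MODEL_NAME,
        "system": SYSTEM_PROMPT,
        "prompt": f"Description: {description}",
        "stream": False,
        "options": {"temperature": 0},
    }


def parse_hs_code(text: str) -> Optional[str]:
    """Return the 8-digit HS code from the model's reply, or None if absent."""
    m = re.search(r"HS Code:\s*(\d{8})\b", text)
    return m.group(1) if m else None


def classify(description: str, timeout: float = 60.0) -> Optional[str]:
    """Send the request to a running Ollama server and parse the reply."""
    body = json.dumps(build_request(description)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the generated text is in the "response" field
    return parse_hs_code(reply["response"])
```

Calling `classify("Men's cotton knitted T-shirt")` requires a running Ollama server with the exported model loaded. As with the Python path, only the returned HS code should be used; the rates should still come from `hs_code_lookup.json`.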
---

## 📊 Example predictions

| Description | Predicted HS Code | Unit | CD | ST | VAT | ET |
|-------------|-------------------|------|----|----|-----|----|
| Toyota Hilux pickup, diesel 2.8L | 87042110 | UNIT | 35% | 50% | 10% | 0% |
| iPhone 15 Pro Max 256GB | 85171200 | UNIT | 0% | 0% | 10% | 0% |
| Heineken beer 330ml can | 22030010 | LTR | 35% | 30% | 10% | 0% |

*(Rates from lookup table – not generated by the model.)*

---

## ⚠️ Limitations

- The model may output incorrect HS codes for ambiguous, misspelled, or region-specific descriptions.
- It was trained on a fixed set of Cambodian HS codes; revisions after the training data cutoff are not covered.
- Duty rates can become outdated – always cross-check with the latest official tariff schedule.
- The model is a classifier, **not** a legal authority. For binding decisions, consult a customs professional.

---

## 📝 License

This model is a derivative of **Gemma-4-E4B-it** and is subject to the [Gemma license](https://ai.google.dev/gemma/terms). The HS-code dataset and lookup table are the property of their respective owners.

---

## 🙏 Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) – made QLoRA + Gemma-4 on a T4 effortless
- [Google DeepMind](https://deepmind.google) – for the Gemma family of models

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{gemma4-hscode-classifier,
  author = {Sothay},
  title = {Gemma‑4 HS Code Classifier (Cambodia Customs)},
  year = 2025,
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sothay/gemma4-hscode-classifier}}
}
```

---

**Author**: [Sothay](https://huggingface.co/Sothay)

**Model card version**: 1.2