EcomBert-DC-V1 access request
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This repository contains both code and model weights. Please describe your intended use. Access is manually reviewed by the author, and commercial use is not permitted unless prior written authorization has been granted.
EcomBert-DC-V1
EcomBert-DC-V1 is a 50-class text classification model for cross-border e-commerce seller questions. It uses jhu-clsp/mmBERT-small as the backbone, with a custom mean-pooling classifier head inspired by ModernBERT and an auxiliary primary-category head.
This repository is organized both as a Hugging Face model repository and as a lightweight business inference project:
.
|-- infer.py
|-- ecombert_dc/
|   |-- inference.py
|   |-- model.py
|   `-- config.py
`-- models/
    `-- ecombert-dc-v1/
        |-- model.safetensors
        |-- backbone_config.json
        |-- tokenizer.json
        |-- label2id.json
        `-- ...
models/ecombert-dc-v1/model.safetensors already contains the fused mmBERT-small backbone and classification-head weights. Default inference does not require users to download the mmBERT-small weights separately.
Architecture
- Backbone: jhu-clsp/mmBERT-small
- Pooling: mean pooling
- Classification head: ModernBERT-style dense + GELU + LayerNorm + dropout
- Dropout: 0.1
- Class weighting: none
- Max length: 768
- Labels: 10 primary categories and 50 secondary categories
This is a custom PyTorch classifier, not native AutoModelForSequenceClassification weights. Use the root-level infer.py script or ecombert_dc.EcomBertDocumentClassifier for inference.
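To make the pooling step concrete, here is a minimal NumPy sketch of mean pooling over token embeddings. It is an illustration only, not the code in ecombert_dc/model.py: the function name masked_mean_pool is hypothetical, and excluding padding positions via the attention mask is an assumption about how the pooling is done.

```python
import numpy as np


def masked_mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) tokens.

    hidden_states: (batch, seq_len, hidden) token embeddings from the backbone.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    # Broadcast the mask over the hidden dimension: (batch, seq_len, 1).
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)      # (batch, hidden)
    # Clip the token count so an all-padding row does not divide by zero.
    counts = np.clip(mask.sum(axis=1), 1.0, None)    # (batch, 1)
    return summed / counts


# Toy batch: one sequence with three positions, the last one is padding.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)  # [[2. 3.]]
```

The pooled vector would then feed the dense + GELU + LayerNorm + dropout head listed above; that part is omitted here because its exact layer sizes are not stated in this card.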
Performance
The test set comes from the fixed split used by this project and contains 1,199 records.
| Metric | Value |
|---|---|
| Primary accuracy | 83.74% |
| Secondary accuracy / Accuracy | 72.31% |
| Conditional accuracy | 86.35% |
| Macro F1 | 66.36% |
| Weighted F1 | 72.07% |
| Cross-primary error rate | 16.26% |
| Share of errors that cross primary categories | 58.73% |
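The rows of this table appear to be related by simple ratios. The sketch below reproduces them under three assumptions: a record with a correct secondary category is also counted as correct on its primary category, conditional accuracy means secondary accuracy among records whose primary category is correct, and the raw counts (1,004 and 867) are back-calculated from the rounded percentages rather than read from test_metrics.json.

```python
# Counts back-calculated from the rounded percentages (an assumption,
# not values taken from test_metrics.json).
n_total = 1199
n_primary_correct = 1004      # primary category predicted correctly
n_secondary_correct = 867     # secondary category predicted correctly

# Errors at each level of the hierarchy.
n_cross_primary_errors = n_total - n_primary_correct   # wrong primary category
n_secondary_errors = n_total - n_secondary_correct     # wrong secondary category


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


metrics = {
    "primary_accuracy": pct(n_primary_correct, n_total),
    "secondary_accuracy": pct(n_secondary_correct, n_total),
    "conditional_accuracy": pct(n_secondary_correct, n_primary_correct),
    "cross_primary_error_rate": pct(n_cross_primary_errors, n_total),
    "cross_primary_share_of_errors": pct(n_cross_primary_errors, n_secondary_errors),
}
for name, value in metrics.items():
    print(f"{name}: {value}%")
```

Under these assumptions, roughly three in five secondary-category mistakes also land in the wrong primary category, while the rest stay inside the correct primary category.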
Installation
pip install -r requirements.txt
CLI Inference
Run from the repository root. The default model directory is models/ecombert-dc-v1:
python infer.py --text "广告花费突然上涨,关键词点击很多但是没有转化,应该怎么优化?"
You can also specify the model directory explicitly. Both the project root and the model asset directory are supported:
python infer.py --model-dir . --text "新品刚上架,Vine和Coupon应该怎么配合启动?"
python infer.py --model-dir models/ecombert-dc-v1 --text "新品刚上架,Vine和Coupon应该怎么配合启动?"
For long documents, chunk averaging can be enabled:
python infer.py --input samples.jsonl --max-chunks-per-doc 3 --chunk-stride 128 --batch-size 4
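As a rough picture of what chunk averaging could look like, the sketch below splits a token sequence into overlapping windows and averages per-chunk class probabilities. It is not the implementation in ecombert_dc/inference.py. The helper names are hypothetical, and two behaviors are assumptions: that the stride is the number of tokens shared by consecutive windows, and that tokens beyond the last allowed chunk are dropped.

```python
from typing import List, Sequence


def split_into_chunks(
    token_ids: Sequence[int], max_length: int, stride: int, max_chunks: int
) -> List[List[int]]:
    """Cut token_ids into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens (assumed meaning of the
    stride); at most max_chunks windows are returned.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride
    chunks: List[List[int]] = []
    for start in range(0, max(len(token_ids), 1), step):
        chunks.append(list(token_ids[start:start + max_length]))
        # Stop once the cap is hit or this window reached the end of the text.
        if len(chunks) == max_chunks or start + max_length >= len(token_ids):
            break
    return chunks


def average_chunk_probs(chunk_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities element-wise across chunks."""
    n_chunks = len(chunk_probs)
    return [sum(column) / n_chunks for column in zip(*chunk_probs)]


# Toy document of 10 tokens, windows of 4 tokens overlapping by 2, at most 3 chunks.
chunks = split_into_chunks(list(range(10)), max_length=4, stride=2, max_chunks=3)
print(chunks)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]

# Two chunks, two classes: the document-level score is the mean per class.
doc_probs = average_chunk_probs([[0.75, 0.25], [0.25, 0.75]])
print(doc_probs)  # [0.5, 0.5]
```

Averaging probabilities gives every chunk equal weight; other choices, such as averaging logits or weighting by chunk length, would behave differently on documents with a short final chunk.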
Python Inference
from ecombert_dc import EcomBertDocumentClassifier
clf = EcomBertDocumentClassifier("models/ecombert-dc-v1")
print(clf.predict("新品刚上架,Vine和Coupon应该怎么配合启动?", top_k=3))
Files
- infer.py: command-line inference entrypoint
- ecombert_dc/: custom model and inference pipeline
- models/ecombert-dc-v1/model.safetensors: fused backbone and classification-head weights
- models/ecombert-dc-v1/backbone_config.json: mmBERT-small backbone structure configuration
- models/ecombert-dc-v1/model_config.json: classifier structure configuration
- models/ecombert-dc-v1/train_config.json: training and inference defaults
- models/ecombert-dc-v1/label2id.json / id2label.json: secondary-category mappings
- models/ecombert-dc-v1/category2id.json / id2category.json: primary-category mappings
- models/ecombert-dc-v1/tokenizer.json: mmBERT tokenizer
- models/ecombert-dc-v1/metrics.json: validation metrics saved with the best checkpoint
- models/ecombert-dc-v1/test_metrics.json: metrics on the fixed test set
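If you want to inspect the label mappings yourself, a small loader along these lines may help. It is a sketch under assumptions: label2id.json is taken to map label strings to integer ids and id2label.json to map stringified ids back to labels (the usual Hugging Face convention), which this card does not spell out. The function name load_label_maps is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def load_label_maps(model_dir: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Load the secondary-category mappings and check that they are inverses."""
    root = Path(model_dir)
    label2id = json.loads((root / "label2id.json").read_text(encoding="utf-8"))
    raw_id2label = json.loads((root / "id2label.json").read_text(encoding="utf-8"))
    # JSON object keys are always strings, so convert ids back to int.
    id2label = {int(key): value for key, value in raw_id2label.items()}
    for label, index in label2id.items():
        if id2label.get(int(index)) != label:
            raise ValueError(f"label2id and id2label disagree on {label!r}")
    return label2id, id2label
```

Calling load_label_maps("models/ecombert-dc-v1") from the repository root should then return both dictionaries, provided the files follow the assumed layout.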
License
This project is released under a custom non-commercial license. See LICENSE for the full terms.
Unless you have obtained prior written authorization from the author, you may not directly or indirectly use this repository, model, weights, code, outputs, or derivative works for commercial activities or any profit-making activities.
Limitations
This model is designed for business classification over cross-border e-commerce text. Generalization to other domains should be evaluated separately. Some category boundaries naturally overlap, so high-risk workflows should combine the model with human review or confidence thresholds.
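One way to apply confidence thresholds is to route low-confidence or ambiguous predictions to a human reviewer. The sketch below is a hypothetical helper, not part of this repository: it assumes you have already turned the classifier output into (label, probability) pairs sorted from most to least likely, and the threshold values and example labels are placeholders to be tuned on your own validation data.

```python
from typing import List, Optional, Tuple


def route_prediction(
    ranked: List[Tuple[str, float]],
    min_confidence: float = 0.6,   # placeholder threshold, tune on validation data
    min_margin: float = 0.15,      # placeholder gap between the top two classes
) -> Tuple[str, Optional[str]]:
    """Decide whether a prediction can be used automatically.

    ranked: (label, probability) pairs sorted by probability, highest first.
    Returns ("auto", label) or ("human_review", label).
    """
    if not ranked:
        return ("human_review", None)
    top_label, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    confident = top_score >= min_confidence
    unambiguous = (top_score - runner_up_score) >= min_margin
    if confident and unambiguous:
        return ("auto", top_label)
    return ("human_review", top_label)


# Hypothetical label names, used only for illustration.
print(route_prediction([("ads_optimization", 0.9), ("listing", 0.05)]))
print(route_prediction([("ads_optimization", 0.5), ("listing", 0.45)]))
```

The margin check targets the overlapping category boundaries mentioned above: two near-tied classes are sent to review even when the top score alone looks acceptable.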