ACE-privacy-filter-zhtw

Model Description

ACE-privacy-filter-zhtw is a privacy-preserving language model from APMIC's ACE family. It is built to detect, classify, and neutralize personally identifiable information (PII) in Traditional Chinese (zh-TW) text. It is the production-hardened, enterprise counterpart of an internal research line, tuned to the realities of Taiwanese data: government records, financial documents, healthcare notes, and customer correspondence.

The model treats privacy as a native behavior rather than a post-processing step. Given free-form text, it returns the same content with sensitive identifiers located and removed, while the surrounding meaning is preserved.

It is built on OpenAI's open-weight gpt-oss-20b. However, the corpora behind its zh-TW alignment and the full methodology of its training recipe remain proprietary to APMIC. This card describes what the model does, not the full account of how it came to do it.

Model Details

  • Developed by: APMIC
  • Funded by: APMIC, led by CEO Jerry Wu
  • Model type: Causal language model, fine-tuned for privacy filtering / de-identification (Transformers)
  • Language(s): Traditional Chinese (zh-TW) & English
  • License: APMIC proprietary (enterprise use; contact APMIC for terms)
  • Base model: openai/gpt-oss-20b — OpenAI's open-weight model, fine-tuned and aligned by APMIC for zh-TW privacy filtering. (The training recipe and zh-TW alignment corpora remain proprietary.)

What It Does

Given Traditional Chinese text, ACE-privacy-filter-zhtw:

  • Detects personally identifiable information embedded in natural, conversational, and document-style language.
  • Classifies each identifier into a privacy category.
  • Neutralizes it — redacting, masking, or replacing the sensitive span while keeping the text readable and semantically intact.

It is designed to operate on the messy, real-world text where regex and rule engines fail: mixed Chinese-English content, inconsistent formatting, OCR-derived noise, and the idiomatic phrasing of Taiwanese business and government communication.
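The three neutralization styles listed above (redacting, masking, replacing) can be illustrated on spans that have already been detected. This is a minimal sketch, not the model's internals. It assumes detections are available as hypothetical `(start, end, category)` character offsets, which this card does not say the model emits. `Span`, `span_of`, and `neutralize` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detection: [start, end) character offsets plus a category label."""
    start: int
    end: int
    category: str


def span_of(text: str, needle: str, category: str) -> Span:
    """Hypothetical helper: locate the first occurrence of `needle` as a Span."""
    start = text.index(needle)
    return Span(start, start + len(needle), category)


def neutralize(text: str, spans: list[Span], mode: str = "replace") -> str:
    """Apply one neutralization style to every (non-overlapping) span.

    redact  -> drop the span entirely
    mask    -> keep its length but hide every character behind "*"
    replace -> substitute a bracketed category tag such as "[姓名]"
    """
    pieces = []
    cursor = 0
    # Walk spans left to right so offsets always refer to the original text.
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("spans must not overlap")
        pieces.append(text[cursor:span.start])
        if mode == "mask":
            pieces.append("*" * (span.end - span.start))
        elif mode == "replace":
            pieces.append(f"[{span.category}]")
        elif mode != "redact":
            raise ValueError(f"unknown mode: {mode}")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "我是王小明,手機 0912-345-678。"
spans = [span_of(text, "王小明", "姓名"), span_of(text, "0912-345-678", "電話")]
print(neutralize(text, spans, "replace"))  # 我是[姓名],手機 [電話]。
```

Replacing with a category tag keeps the sentence readable and still tells downstream systems what kind of identifier was there. Masking preserves length and layout, and redaction removes the most information.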

Privacy Entity Coverage

The filter is tuned toward identifiers that matter in a Taiwanese context, including (but not limited to):

  • 身分證字號 (National ID numbers)
  • 健保卡號 / 病歷號 (NHI card & medical record numbers)
  • 手機與市話號碼 (Mobile & landline numbers)
  • 地址 (Residential & mailing addresses)
  • 銀行帳號與信用卡號 (Bank account & card numbers)
  • 姓名 (Personal names)
  • Email 與帳號識別碼 (Email & account identifiers)
  • 車牌號碼 (Vehicle plate numbers)
  • 公司統一編號 (Business registration numbers)
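As a concrete example of the first entity type, a 身分證字號 is one uppercase letter followed by nine digits. For citizens the second character is 1 or 2, and the last digit is a check digit. The sketch below implements that publicly documented checksum as a standalone, rule-based check. The helper name `is_valid_tw_national_id` is hypothetical and not part of the model. A rule like this only recognizes clean, well-formed IDs, which is exactly the limitation the model is meant to overcome on noisy text. It can still be useful for spot-checking outputs or building test fixtures.

```python
import re

# Letter-to-number table for the first character of a Taiwanese national ID.
LETTER_CODES = {
    "A": 10, "B": 11, "C": 12, "D": 13, "E": 14, "F": 15, "G": 16, "H": 17,
    "I": 34, "J": 18, "K": 19, "L": 20, "M": 21, "N": 22, "O": 35, "P": 23,
    "Q": 24, "R": 25, "S": 26, "T": 27, "U": 28, "V": 29, "W": 32, "X": 30,
    "Y": 31, "Z": 33,
}
# Weights for the nine digits after the letter; the final digit is the check digit.
DIGIT_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 1, 1)
# Citizen IDs: letter, then 1 or 2, then eight more digits.
ID_PATTERN = re.compile(r"[A-Z][12]\d{8}")


def is_valid_tw_national_id(value: str) -> bool:
    """Return True if `value` is a well-formed citizen ID with a correct checksum."""
    value = value.strip().upper()
    if not ID_PATTERN.fullmatch(value):
        return False
    code = LETTER_CODES[value[0]]
    # The letter's two-digit code contributes tens * 1 + units * 9.
    total = (code // 10) * 1 + (code % 10) * 9
    total += sum(int(d) * w for d, w in zip(value[1:], DIGIT_WEIGHTS))
    return total % 10 == 0


print(is_valid_tw_national_id("A123456789"))  # True
print(is_valid_tw_national_id("A123456788"))  # False
```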

Data Foundation

The structural backbone of ACE-privacy-filter-zhtw's privacy understanding draws on nvidia/Nemotron-PII — NVIDIA's large-scale synthetic corpus of 100,000 records spanning 55+ PII/PHI categories across 50+ industries, covering both structured documents (forms, invoices) and unstructured content (emails, notes).

This foundation gave the model a broad, industry-spanning prior over what privacy looks like — across healthcare, finance, legal, and enterprise scenarios. APMIC then carried that prior across the language boundary, re-grounding it in the entity types, formats, and cultural conventions specific to Traditional Chinese and Taiwan. The bridge from Nemotron-PII's English foundation to native zh-TW behavior is where APMIC's proprietary work lives.

NVIDIA Ecosystem

ACE-privacy-filter-zhtw is part of APMIC's broader collaboration with NVIDIA's data and platform ecosystem. It builds on NVIDIA-originated privacy data, is optimized for inference on modern NVIDIA GPU architectures, and is designed to slot into enterprise deployment pipelines alongside other models in the ACE family.

Intended Use

  • De-identification of Traditional Chinese documents prior to storage, analytics, or LLM ingestion.
  • Privacy guardrails in conversational AI and RAG pipelines handling Taiwanese user data.
  • Compliance support for organizations operating under Taiwan's 個人資料保護法 (Personal Data Protection Act) and adjacent regulatory regimes.
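One way to apply the filter as a guardrail in a RAG or chat pipeline is to run it before text leaves your control. Documents are filtered before they are indexed, and user turns are filtered before they reach a downstream LLM. The following is a minimal sketch of that placement. It assumes a `redact` callable that wraps the model, for example the generation call shown under Usage. `PrivacyGuard`, `ingest`, and `build_prompt` are hypothetical names. The regex-based `fake_redact` is only a stand-in that makes the example runnable and is far weaker than the model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PrivacyGuard:
    """Hypothetical wrapper that routes all text through a redaction step first."""
    redact: Callable[[str], str]
    index: list[str] = field(default_factory=list)

    def ingest(self, documents: list[str]) -> None:
        # Only redacted text is stored, so raw identifiers never enter the index.
        self.index.extend(self.redact(doc) for doc in documents)

    def build_prompt(self, user_message: str, contexts: list[str]) -> str:
        # Redact the live user turn too; indexed contexts were redacted at ingestion.
        safe_message = self.redact(user_message)
        context_block = "\n".join(contexts)
        return f"Context:\n{context_block}\n\nQuestion:\n{safe_message}"


def fake_redact(text: str) -> str:
    # Stand-in for the model: replaces only Taiwanese mobile numbers.
    return re.sub(r"09\d{2}-?\d{3}-?\d{3}", "[電話]", text)


guard = PrivacyGuard(redact=fake_redact)
guard.ingest(["客戶手機 0912-345-678"])
prompt = guard.build_prompt("請回電 0987654321", guard.index)
print(prompt)
```

Injecting the redactor as a callable keeps the pipeline independent of how the model is served (local Transformers, an internal endpoint, or a batch job).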

Out of Scope

  • The model is an assistive control, not a legal guarantee. It does not certify compliance, and its output should be reviewed in high-stakes settings.
  • It is not a general-purpose chat assistant.
  • Performance on languages or locales outside Traditional Chinese / Taiwan is not a design target.
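Since outputs should be reviewed in high-stakes settings, an inexpensive post-check can help decide which outputs go to a human reviewer. The sketch below flags an output when any long, digit-bearing token from the input (such as an ID, phone, card, or account number) reappears verbatim. This is an illustrative heuristic and not part of the model. `find_leaks` and the `min_digits=6` threshold are hypothetical choices. The check cannot detect leaked names or addresses, and it will miss a number that the output reformats. It narrows the review workload but does not replace review.

```python
import re

# Runs of ASCII letters, digits, and hyphens; those with enough digits are
# treated as identifier-like (IDs, phone, card and account numbers).
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def find_leaks(source: str, filtered: str, min_digits: int = 6) -> list[str]:
    """Return identifier-like tokens from `source` that still appear in `filtered`."""
    leaks = []
    for token in TOKEN_PATTERN.findall(source):
        digit_count = sum(ch.isdigit() for ch in token)
        if digit_count >= min_digits and token in filtered and token not in leaks:
            leaks.append(token)
    return leaks


source = "身分證字號 A123456789,手機 0912-345-678。"
good = "身分證字號 [身分證字號],手機 [電話]。"
bad = "身分證字號 [身分證字號],手機 0912-345-678。"
print(find_leaks(source, good))  # []
print(find_leaks(source, bad))   # ['0912-345-678']
```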

Usage

The input/output format shown below is representative. Production integration details are provided to enterprise partners.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "APMIC/ACE-privacy-filter-zhtw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "您好,我是王小明,身分證字號 A123456789,手機 0912-345-678,住台北市信義區市府路1號。"

messages = [{"role": "user", "content": text}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens so the prompt is not echoed back
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
# → 您好,我是[姓名],身分證字號 [身分證字號],手機 [電話],住[地址]。
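If a deployment receives output in the bracketed-placeholder form shown in the representative example above, the category labels can be tallied for logging or audit trails. This assumes that exact format. Because the card describes the format as representative only, confirm the production format with APMIC before relying on it. `count_redactions` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches bracketed category tags such as "[姓名]" or "[身分證字號]".
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def count_redactions(filtered_text: str) -> Counter:
    """Count how many spans of each privacy category were replaced."""
    return Counter(PLACEHOLDER.findall(filtered_text))


result = "您好,我是[姓名],身分證字號 [身分證字號],手機 [電話],住[地址]。"
print(count_redactions(result))
```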

Positioning

ACE-privacy-filter-zhtw demonstrates APMIC's capacity to take a foundation of NVIDIA privacy data and forge it into a Traditional-Chinese-native, enterprise-ready privacy layer. It is intended for organizations that need their data protected before it is ever processed, without needing to know exactly how the lock was made.

Disclaimer

This model is provided for enterprise privacy-filtering use. No PII detection system is perfect; APMIC makes no warranty that all sensitive information will be identified or removed. Operators remain responsible for validating outputs and meeting their own regulatory obligations.


© APMIC. Part of the ACE model family.
