@cp500/infon-coref
Multilingual coreference resolution in the browser or Node, via ONNX.
The trained model is a pointer-network coref resolver fine-tuned on top of a multilingual MiniLM-L12 distilled from XLM-R. It handles English, Japanese, Korean, Thai, and Chinese, and replaces English-only fastcoref for use cases that need multilingual coverage.
The model artefacts live at cp500/infon-coref-pointer on the Hugging Face Hub. This package is the JavaScript client that loads them.
Install
npm install @cp500/infon-coref onnxruntime-web
# or for Node:
npm install @cp500/infon-coref onnxruntime-node
The ONNX runtime is a peer dependency so you only install the one
your environment needs. @huggingface/tokenizers is optional;
if installed, we use its WASM SentencePiece tokenizer (faster and
fully spec-compliant). Otherwise the package falls back to a minimal
pure-JS tokenizer that handles the XLM-R vocabulary.
Quick start (browser)
import { InfonCorefModel } from '@cp500/infon-coref';
const model = await InfonCorefModel.fromHub('cp500/infon-coref-pointer', {
precision: 'fp16', // 'fp16' (default, ~235 MB) or 'fp32' (~470 MB)
device: 'auto', // tries WebGPU, falls back to WASM
});
const result = await model.resolve(
'Toyota announced a partnership with Panasonic on battery technology. ' +
'The Japanese automaker said the deal is worth $250 million.'
);
for (const cluster of result.clusters) {
const surfaces = cluster.map(i => result.mentions[i].text);
console.log(surfaces.join(' → '));
// Toyota → The Japanese automaker
}
Quick start (Node)
import { InfonCorefModel } from '@cp500/infon-coref';
// Same API as fromHub, but reads from local files (e.g. after a
// huggingface-cli download).
const model = await InfonCorefModel.fromLocal('./models/infon-coref/');
const result = await model.resolve('Toyota e Panasonic anunciaram...');
What you get back
interface CorefResult {
text: string; // original input, unchanged
tokens: Token[]; // wordpieces with char offsets
mentions: Mention[]; // detected mentions in document order
clusters: number[][]; // clusters[c] = list of mention indices
timing: {
tokenize: number;
backbone: number;
bioDecode: number;
scorer: number;
total: number; // ms
};
}
interface Mention {
start: number; // wordpiece index, inclusive
end: number; // wordpiece index, inclusive
charStart: number; // char offset in source text
charEnd: number;
text: string; // literal substring of source text
cluster: number; // -1 for singleton
antecedent: number; // 0-based mention index, -1 = no antecedent
}
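As a rough illustration of how mentions and clusters relate, the sketch below lists the surface strings per cluster and collects the unlinked mentions. The trimmed `MentionLike` and `ResultLike` types and the helpers `clusterSurfaces` and `singletons` are hypothetical names for this example, not package exports, and the sample object is hand-written rather than real model output.

```typescript
// Minimal local copies of the fields used here (hypothetical, trimmed types).
interface MentionLike {
  text: string;    // literal substring of the source text
  cluster: number; // -1 for singleton
}
interface ResultLike {
  mentions: MentionLike[];
  clusters: number[][]; // clusters[c] = list of mention indices
}

// Map every cluster to the surface strings of its mentions.
function clusterSurfaces(result: ResultLike): string[][] {
  return result.clusters.map(cluster =>
    cluster.map(i => result.mentions[i].text)
  );
}

// Mentions that were detected but not linked to any cluster.
function singletons(result: ResultLike): string[] {
  return result.mentions.filter(m => m.cluster === -1).map(m => m.text);
}

// Hand-written sample shaped like a CorefResult (not real model output).
const sample: ResultLike = {
  mentions: [
    { text: 'Toyota', cluster: 0 },
    { text: 'Panasonic', cluster: -1 },
    { text: 'The Japanese automaker', cluster: 0 },
  ],
  clusters: [[0, 2]],
};
```

Because `clusters` stores indices rather than strings, the same lookup works for any other `Mention` field, such as `charStart`.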
Languages
Trained on synthetic Bedrock/Claude-generated data balanced across:
| Code | Language |
|---|---|
| en | English |
| ja | Japanese |
| ko | Korean |
| th | Thai |
| zh | Chinese (Simplified) |
The XLM-R backbone covers ~100 languages but mention detection + pointer-net heads were only trained on these 5. Other languages may work via zero-shot transfer; verify on your domain before shipping.
API
InfonCorefModel.fromHub(repo, options?)
Load model artefacts from a Hugging Face repo. Downloads (and caches
in the browser Cache API) meta.json, the chosen ONNX backbone,
the mention scorer, and tokenizer.json.
| Option | Type | Default | Notes |
|---|---|---|---|
| precision | 'fp32' \| 'fp16' | 'fp16' | FP16 halves the download. Falls back to FP32 if FP16 is missing in the repo. |
| device | 'auto' \| 'webgpu' \| 'wasm' \| 'cpu' \| 'cuda' | 'auto' | Browser auto-prefers WebGPU. |
| maxLength | number | 256 | Truncates inputs longer than N wordpieces. |
| bioThreshold | number | none | If set, suppresses low-confidence span detections. 0.7 is a common stricter setting. |
| revision | string | 'main' | HF branch/tag/commit-SHA pin. |
| debug | boolean | false | Logs per-stage timings to console.debug. |
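The documented defaults can be written down as a typed object. This is only a sketch: `HubOptionsSketch`, `DEFAULTS`, and `withDefaults` are hypothetical names, the package's real type declarations may differ, and the tag `v0.1.0` is made up for illustration.

```typescript
// Hypothetical option shape mirroring the options listed above.
type Precision = 'fp32' | 'fp16';
type Device = 'auto' | 'webgpu' | 'wasm' | 'cpu' | 'cuda';

interface HubOptionsSketch {
  precision: Precision;
  device: Device;
  maxLength: number;      // inputs longer than this many wordpieces are truncated
  bioThreshold?: number;  // unset by default
  revision: string;       // HF branch, tag, or commit SHA
  debug: boolean;
}

// Defaults as documented.
const DEFAULTS: HubOptionsSketch = {
  precision: 'fp16',
  device: 'auto',
  maxLength: 256,
  revision: 'main',
  debug: false,
};

// Merge caller overrides over the documented defaults.
function withDefaults(overrides: Partial<HubOptionsSketch> = {}): HubOptionsSketch {
  return { ...DEFAULTS, ...overrides };
}

// Example: stricter span detection, pinned to a (made-up) tag.
const strictOptions = withDefaults({ bioThreshold: 0.7, revision: 'v0.1.0' });
```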
InfonCorefModel.fromLocal(baseUrl, options?)
Same as fromHub but loads files relative to a base URL or
filesystem path. Browser: baseUrl is a URL prefix
(/models/coref/). Node: a directory path (./models/coref/).
The directory must contain:
meta.json
tokenizer.json
onnx/backbone_bio.onnx (and .onnx.data sidecar if present)
onnx/backbone_bio_fp16.onnx
onnx/mention_scorer.onnx
onnx/mention_scorer_fp16.onnx
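Before calling fromLocal, it can help to check a directory listing against the file list above. The helper below is a hypothetical sketch, not part of the package; it takes paths relative to the model directory and leaves out the optional .onnx.data sidecar.

```typescript
// Files the directory must contain, as listed above.
const REQUIRED_FILES: readonly string[] = [
  'meta.json',
  'tokenizer.json',
  'onnx/backbone_bio.onnx',
  'onnx/backbone_bio_fp16.onnx',
  'onnx/mention_scorer.onnx',
  'onnx/mention_scorer_fp16.onnx',
];

// Return the required files that are absent from a directory listing.
// `present` holds paths relative to the model directory.
function missingModelFiles(present: readonly string[]): string[] {
  const have = new Set(present);
  return REQUIRED_FILES.filter(f => !have.has(f));
}
```

An empty return value means every listed file was found.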
model.resolve(text, options?)
Run end-to-end coref on a single document. Returns
CorefResult.
options accepts per-call overrides for three of the fromHub options:
maxLength, bioThreshold, and debug.
Power-user exports
If you want to swap one stage of the pipeline (e.g. a custom tokenizer or a different ORT runtime), the helpers are exported individually:
import {
buildPairs, // mention M → flat (pair_i, pair_j) tensors
decodeBio, // BIO logits → wordpiece spans
groupClusters, // antecedent decisions → union-find clusters
loadTokenizer, // SentencePiece JSON → Tokenizer
fetchHubFile, // HF Hub fetch + browser-cache
} from '@cp500/infon-coref';
These match the Python reference implementation in
scripts/coref_onnx_experiment.py
exactly, which is useful when comparing the Python and TS pipelines at
the intermediate-tensor level.
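To make the two decode-side helpers more concrete, here is a minimal sketch of what a BIO run-length decode and a triangular pair builder could look like. These are independent reimplementations under stated assumptions, not the package's code: the `Sketch` function names are hypothetical, the class order [O, B, I] is assumed, decoding is plain argmax with no threshold, and `Int32Array` stands in for whatever tensor type the real exports return.

```typescript
type Span = { start: number; end: number }; // wordpiece indices, both inclusive

// ASSUMPTION: class order is [O, B, I]; the real model may order them differently.
const B = 1;
const I = 2;

function argmax(row: readonly number[]): number {
  let best = 0;
  for (let k = 1; k < row.length; k++) {
    if (row[k] > row[best]) best = k;
  }
  return best;
}

// Run-length decode: a B tag opens a span, following I tags extend it,
// anything else closes it. An I tag with no open span is ignored here.
function decodeBioSketch(logits: readonly (readonly number[])[]): Span[] {
  const spans: Span[] = [];
  let open: Span | null = null;
  for (let t = 0; t < logits.length; t++) {
    const tag = argmax(logits[t]);
    if (tag === B) {
      if (open) spans.push(open);
      open = { start: t, end: t };
    } else if (tag === I && open) {
      open.end = t;
    } else {
      if (open) spans.push(open);
      open = null;
    }
  }
  if (open) spans.push(open);
  return spans;
}

// All (i, j) with j < i: each mention paired with every earlier mention.
function buildPairsSketch(m: number): { pairI: Int32Array; pairJ: Int32Array } {
  const p = Math.max(0, (m * (m - 1)) / 2);
  const pairI = new Int32Array(p);
  const pairJ = new Int32Array(p);
  let k = 0;
  for (let i = 1; i < m; i++) {
    for (let j = 0; j < i; j++) {
      pairI[k] = i;
      pairJ[k] = j;
      k++;
    }
  }
  return { pairI, pairJ };
}
```

For M mentions the pair builder emits M(M-1)/2 pairs, so three mentions give the pairs (1,0), (2,0), (2,1).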
Architecture
┌─────────────────────────┐
│          text           │
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│ SentencePiece tokenize  │   tokenizer.json (XLM-R vocab)
└────────────┬────────────┘
             ▼  input_ids, attention_mask
┌─────────────────────────┐
│ backbone_bio.onnx       │   MiniLM-L12 (12 layers, H=384)
│  • XLM-R encoder        │   + 3-class BIO head
│  • bio_logits (T,3)     │
└────────┬────────┬───────┘
         │        │
         │        ▼  bio_logits → run-length decode → spans
         │  ┌──────────────────────┐
         │  │ decodeBio (TS)       │
         │  └──────────┬───────────┘
         │             ▼  span_starts, span_ends
         │  ┌──────────────────────┐
         │  │ buildPairs (TS)      │
         │  └──────────┬───────────┘
         │             ▼  pair_i, pair_j (triangular)
         ▼             ▼
┌─────────────────────────┐
│ mention_scorer.onnx     │   gather + segment-mean pool +
│  • pair_scores (P,)     │   3-vector pair MLP
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│ pickAntecedents (TS)    │
│ + groupClusters (TS)    │
└────────────┬────────────┘
             ▼
        CorefResult
The split between the two ONNX graphs exists so the BIO head can
share computation with the backbone (one forward pass), while the
mention scorer can be re-run with different (pair_i, pair_j)
batches without recomputing hidden states. It also keeps each ONNX
file's input signature simple enough to trace cleanly.
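The final two TS stages turn pair scores into clusters. The sketch below shows one plausible way to do that, with loud assumptions: the `Sketch` function names are hypothetical, a link is kept only when its score exceeds a threshold of 0, and ties go to the earlier pair. The package's actual pickAntecedents and groupClusters may differ in all three respects.

```typescript
// For each mention i, keep the highest-scoring earlier mention j, or -1.
// ASSUMPTION: a score must exceed `threshold` (0 here) to count as a link.
function pickAntecedentsSketch(
  m: number,
  pairI: ArrayLike<number>,
  pairJ: ArrayLike<number>,
  scores: ArrayLike<number>,
  threshold = 0,
): number[] {
  const antecedent = new Array<number>(m).fill(-1);
  const best = new Array<number>(m).fill(threshold);
  for (let k = 0; k < scores.length; k++) {
    const i = pairI[k];
    if (scores[k] > best[i]) {
      best[i] = scores[k];
      antecedent[i] = pairJ[k];
    }
  }
  return antecedent;
}

// Union-find over antecedent links. Only groups of two or more mentions
// are returned, in line with the "-1 for singleton" convention.
function groupClustersSketch(antecedent: readonly number[]): number[][] {
  const parent = antecedent.map((_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  };
  antecedent.forEach((a, i) => {
    if (a >= 0) parent[find(i)] = find(a);
  });
  const groups = new Map<number, number[]>();
  antecedent.forEach((_, i) => {
    const root = find(i);
    const g = groups.get(root);
    if (g) g.push(i);
    else groups.set(root, [i]);
  });
  return Array.from(groups.values()).filter(g => g.length > 1);
}
```

Union-find makes the grouping transitive: if mention 3 points at 2 and 2 points at 0, all three end up in one cluster even though 3 and 0 were never linked directly.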
Performance ballpark
Numbers from a 2024 M1 Pro MacBook on a 110-token English document:
| Stage | WASM (FP16) | WebGPU (FP16) | Node CPU (FP16) |
|---|---|---|---|
| Tokenize | 4 ms | 4 ms | 2 ms |
| Backbone | 220 ms | 70 ms | 90 ms |
| BIO | <1 ms | <1 ms | <1 ms |
| Scorer | 5 ms | 4 ms | 2 ms |
| Total | ~230 ms | ~80 ms | ~95 ms |
The first call adds ~2-4 s for ONNX session warmup. In browsers, the Cache API persists the downloaded model, so after a reload the only startup cost is session creation.
License
Apache 2.0. The trained weights at cp500/infon-coref-pointer carry
the same license; the underlying MiniLM-L12 backbone is also Apache
2.0.
Status
Alpha. The API is stable enough to integrate behind your own abstraction; expect minor breaking changes on the public class shape until 1.0.
Issue tracker: https://github.com/cp500/infon-coref-js/issues