fblgit/haremb-privacy-filter-opennemo

Token Classification

Mixture of Experts

haremb-privacy-filter-opennemo / infer.log


	Inference benchmark: A: openmed-base vs B: haremb
	device : cuda dtype: torch.bfloat16
	ctx : 1024

	A: openmed-base (reference / teacher)
	load : 0.71s
	eval : 64.66s on 212,909 tokens (3293 tok/s)
	Performance:
	total params : 1399.61M (139.35M dense + 1260.26M MoE-experts)
	active params / token : 178.73M (memory footprint — embed lookup + top_4/128 experts: 128.04M embed + 39.38M MoE-active + 11.31M attn/norm/head)
	compute params / token : 50.69M (matmul FLOPs only — embedding lookup excluded)
	GFLOP / token (fwd, MAC×2): 0.101
	weights size (on disk) : —
	weights size (in RAM) : 2.61 GiB
	weights resident (GPU) : 2.61 GiB
	peak GPU mem (eval, ctx=1024) : 3.30 GiB

	B: haremb (this checkpoint)
	load : 0.10s
	eval : 33.56s on 212,909 tokens (6343 tok/s)
	Performance:
	total params : 287.11M (129.58M dense + 157.53M MoE-experts)
	active params / token : 134.50M (memory footprint — embed lookup + top_4/128 experts: 128.04M embed + 4.92M MoE-active + 1.54M attn/norm/head)
	compute params / token : 6.46M (matmul FLOPs only — embedding lookup excluded)
	GFLOP / token (fwd, MAC×2): 0.013
	weights size (on disk) : 547.6 MiB
	weights size (in RAM) : 547.6 MiB
	weights resident (GPU) : 548.3 MiB
	peak GPU mem (eval, ctx=1024) : 1.22 GiB
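The derived rows in the two blocks above (compute params, GFLOP per token, in-RAM weight size) can be cross-checked from the logged parameter counts. The sketch below is not the benchmark script; it assumes only what the log states: bfloat16 weights (2 bytes per parameter), compute params equal to MoE-active plus attn/norm/head, and 2 FLOPs per multiply-accumulate. The helper names are hypothetical.

```python
# Cross-check of the derived figures, using only numbers printed in the log.
# Assumptions (labelled): bf16 = 2 bytes/param; 1 MAC = 2 FLOPs.

BYTES_PER_PARAM_BF16 = 2


def compute_params_m(moe_active_m: float, attn_norm_head_m: float) -> float:
    """Params (in millions) that take part in matmuls; embedding lookup excluded."""
    return moe_active_m + attn_norm_head_m


def gflop_per_token(compute_m: float) -> float:
    """Forward GFLOP per token, counting each multiply-accumulate as 2 FLOPs."""
    return compute_m * 1e6 * 2 / 1e9


def weights_mib(total_params_m: float) -> float:
    """Weight size in MiB for a bf16 checkpoint."""
    return total_params_m * 1e6 * BYTES_PER_PARAM_BF16 / 2**20


# (total params, MoE-active, attn/norm/head), all in millions, copied from the log
LOGGED = {
    "openmed-base": (1399.61, 39.38, 11.31),
    "haremb": (287.11, 4.92, 1.54),
}

for name, (total, moe_active, other) in LOGGED.items():
    c = compute_params_m(moe_active, other)
    print(
        f"{name}: compute={c:.2f}M  "
        f"GFLOP/token={gflop_per_token(c):.3f}  "
        f"weights={weights_mib(total):.1f} MiB ({weights_mib(total) / 1024:.2f} GiB)"
    )
```

Under these assumptions the script reproduces the logged 50.69M / 0.101 GFLOP / 2.61 GiB for A and 6.46M / 0.013 GFLOP / 547.6 MiB for B.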

	B vs A (haremb vs openmed-base):
	total params : 4.87× smaller
	active params / token : 1.33× less [memory]
	compute params / token : 7.85× cheaper [FLOPs]
	GFLOP / token : 7.85× cheaper
	weights size (on disk) : —
	weights in RAM : 4.87× smaller
	peak GPU mem (eval) : 2.70× less
	throughput : 1.93× faster
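The comparison rows are plain ratios of the per-model figures. A minimal sketch that recomputes them from the rounded values printed in the log follows; the `PAIRS` table and `ratio` helper are illustrative names, not part of the benchmark, and the benchmark itself presumably divides unrounded values.

```python
# Recompute the "B vs A" ratios from the rounded values shown in the log.
# Each entry is (numerator, denominator), so a larger ratio favours haremb.

PAIRS = {
    "total params": (1399.61, 287.11),           # A / B, millions
    "active params / token": (178.73, 134.50),   # A / B, millions
    "compute params / token": (50.69, 6.46),     # A / B, millions
    "peak GPU mem (eval)": (3.30, 1.22),         # A / B, GiB
    "throughput": (6343, 3293),                  # B / A, tok/s
}


def ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator


for label, (num, den) in PAIRS.items():
    print(f"{label:<24}: {ratio(num, den):.2f}x")
```

With the rounded inputs this gives 4.87, 1.33, 7.85, 2.70 and 1.93, matching the logged comparison.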

	Sample inference (load → tokenize → forward → viterbi-decode → spans):
	text: 'Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, phone 415-555-0123, email sarah.johnson@example.com, credit card 4111-1111-1111-1111.'
	forward latency: 65.8ms (53 tokens)
	detected 7 spans:
	[ 1, 2) first_name 'Sarah'
	[ 2, 3) last_name 'Johnson'
	[ 6, 12) date '03/15/1985'
	[ 16, 19) phone_number '4872910'
	[ 22, 28) phone_number '415-555-0123'
	[ 30, 37) email 'sarah.johnson@example.com'
	[ 41, 52) credit_debit_card '4111-1111-1111-1111'
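The span list above uses half-open token ranges `[start, end)`. The log does not show how the decoded tag sequence becomes spans, so the following is only a sketch of that last step under an assumed BIO tagging scheme (`B-label`, `I-label`, `O`). It is not the Viterbi decode itself, and `bio_to_spans` is a hypothetical helper rather than a function from this repository.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group a BIO tag sequence into half-open token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An I- tag with the same label extends the open span; anything else closes it.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # B- always opens a span; a stray I- with no open span is treated as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical tag sequence whose token positions mirror the first three logged spans.
tags = (
    ["O", "B-first_name", "B-last_name", "O", "O", "O"]
    + ["B-date"] + ["I-date"] * 5
    + ["O"]
)
for start, end, label in bio_to_spans(tags):
    print(f"[{start:3d}, {end:3d}) {label}")
```

Two adjacent `B-` tags yield two separate spans, which is how `first_name` at `[1, 2)` and `last_name` at `[2, 3)` can sit next to each other without merging.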