# ToMMeR-Llama-3.2-1B_L2_R64
ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from early-layer representations of any LLM backbone, achieving high zero-shot recall across a set of 13 NER benchmarks.
## Model Details
This model plugs in at layer 2 of meta-llama/Llama-3.2-1B, with a computational overhead no greater than that of an additional attention head.
| Property | Value |
|---|---|
| Base LLM | meta-llama/Llama-3.2-1B |
| Layer | 2 |
| #Params | 264.2K |
## Usage

### Installation
To use ToMMeR, first install its codebase:

```bash
pip install git+https://github.com/VictorMorand/llm2ner.git
```
### Raw inference
By default, ToMMeR outputs span probabilities, but built-in options for decoding entities are also provided.

- Inputs:
  - `tokens` (batch, seq): tokens to process,
  - `model`: LLM to extract representations from.
- Outputs: a (batch, seq, seq) score matrix (masked outside valid spans).
```python
from xpm_torch.huggingface import TorchHFHub
from llm2ner import ToMMeR, utils

tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L2_R64")

# Load the backbone LLM, optionally cutting unused layers to save GPU memory.
llm = utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

# Raw inference
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# Tokenize to shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# Raw span scores
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# Use the given decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")
```
```
INFO:root:Cut LlamaModel with 16 layers to 7 layers
Input text: Large language models are awesome
Raw Output shape: torch.Size([1, 6, 6])
Predicted entities: ['Large language models']
```
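For reference, the raw score matrix can also be decoded by hand. Below is a minimal plain-Python sketch (not the library's implementation), assuming `probs[i][j]` scores the span from token `i` to token `j` inclusive, with invalid spans masked to zero; it roughly corresponds to the `"threshold"` decoding strategy:

```python
def decode_spans(probs, threshold=0.5):
    """Return all (start, end) spans whose score exceeds the threshold.

    probs: (seq, seq) nested list where probs[i][j] scores span i..j inclusive.
    """
    n = len(probs)
    return [(i, j) for i in range(n) for j in range(i, n) if probs[i][j] > threshold]

# Toy example: a 4-token sequence where tokens 0..2 form one entity span.
probs = [[0.0] * 4 for _ in range(4)]
probs[0][2] = 0.9
print(decode_spans(probs))  # [(0, 2)]
```

Note that thresholding alone can return overlapping spans; the decoding strategies built into the library handle this choice for you.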
### Fancy Outputs

We also provide inference and plotting utilities in `llm2ner.plotting`.
```python
from xpm_torch.huggingface import TorchHFHub
from llm2ner import ToMMeR, utils, plotting

tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L2_R64")

# Load the backbone LLM, optionally cutting unused layers to save GPU memory.
llm = utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "

# Fancy interactive output
outputs = plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
```
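The `"greedy"` strategy yields a flat (non-overlapping) segmentation. One way to achieve this, sketched in plain Python as an illustration (an assumption about the general idea, not the library's exact code), is to repeatedly accept the highest-scoring remaining span above the threshold and discard any span overlapping an accepted one:

```python
def greedy_decode(probs, threshold=0.5):
    """Greedily select non-overlapping spans from a (seq, seq) score matrix.

    probs: nested list where probs[i][j] scores span i..j inclusive.
    Returns spans sorted by start position.
    """
    n = len(probs)
    candidates = sorted(
        ((probs[i][j], i, j) for i in range(n) for j in range(i, n)
         if probs[i][j] > threshold),
        reverse=True,  # highest score first
    )
    taken, used = [], set()
    for _, i, j in candidates:
        if used.isdisjoint(range(i, j + 1)):  # no overlap with accepted spans
            taken.append((i, j))
            used.update(range(i, j + 1))
    return sorted(taken)

# Toy example: the overlapping lower-scored span (1, 2) is discarded.
probs = [[0.0] * 5 for _ in range(5)]
probs[0][2] = 0.9
probs[1][2] = 0.6
probs[4][4] = 0.8
print(greedy_decode(probs))  # [(0, 2), (4, 4)]
```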
Please visit the repository for more details and a demo notebook.
## Evaluation Results
| Dataset | Precision | Recall | F1 | #Samples |
|---|---|---|---|---|
| MultiNERD | 0.1668 | 0.992 | 0.2855 | 154144 |
| CoNLL 2003 | 0.2409 | 0.9643 | 0.3856 | 16493 |
| CrossNER_politics | 0.2464 | 0.9762 | 0.3935 | 1389 |
| CrossNER_AI | 0.2506 | 0.9749 | 0.3988 | 879 |
| CrossNER_literature | 0.2688 | 0.9592 | 0.4199 | 916 |
| CrossNER_science | 0.2774 | 0.9727 | 0.4317 | 1193 |
| CrossNER_music | 0.3037 | 0.9686 | 0.4625 | 945 |
| ncbi | 0.0964 | 0.9407 | 0.1748 | 3952 |
| FabNER | 0.2626 | 0.8111 | 0.3967 | 13681 |
| WikiNeural | 0.1609 | 0.9911 | 0.2769 | 92672 |
| GENIA_NER | 0.1886 | 0.9696 | 0.3157 | 16563 |
| ACE 2005 | 0.2514 | 0.4976 | 0.334 | 8230 |
| Ontonotes | 0.2015 | 0.7659 | 0.319 | 42193 |
| Aggregated | 0.1803 | 0.943 | 0.3027 | 353250 |
| Mean | 0.2243 | 0.9065 | 0.3534 | 353250 |
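The F1 column is the harmonic mean of precision and recall; a quick arithmetic check against the table (values agree up to rounding of the reported figures):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1(0.1668, 0.992))  # MultiNERD row: close to the reported 0.2855
print(f1(0.1803, 0.943))  # Aggregated row: close to the reported 0.3027
```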
## Citation
If you use this model or the approach, please cite the associated paper:
```bibtex
@misc{morand2025tommerefficiententity,
      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models},
      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
      year={2025},
      eprint={2510.19410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19410},
}
```
## License
Apache-2.0 (see repository for full text).