| | --- |
| | language: |
| | - en |
| | license: apache-2.0 |
| | library_name: llm2ner |
| | base_model: answerdotai/ModernBERT-base |
| | tags: |
| | - ner |
| | - span-detection |
| | - llm |
| | - pytorch |
| | pipeline_tag: token-classification |
| | model_name: ToMMeR-ModernBERT-base_L5_R64 |
| | source: https://github.com/VictorMorand/llm2ner |
| | paper: https://arxiv.org/abs/2510.19410 |
| | --- |
| | |
| | # ToMMeR-ModernBERT-base_L5_R64 |
| |
|
| | ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a set of 13 NER benchmarks. |
| |
|
| | ## Checkpoint Details |
| |
|
| | | Property | Value | |
| | |-----------|-------| |
| | | Base LLM | `answerdotai/ModernBERT-base` | |
| | | Layer | 5 | |
| | | #Params | 99.1K | |
| |
|
| |
|
| | # Usage |
| |
|
| | ## Installation |
| |
|
| | Our code can be installed with pip directly from the Git repository. Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details. |
| |
|
| | ```bash |
| | pip install git+https://github.com/VictorMorand/llm2ner.git |
| | ``` |
| |
|
| | ## Fancy Outputs |
| |
|
| | ```python |
| | import llm2ner |
| | from llm2ner import ToMMeR |
| | |
| | tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-ModernBERT-base_L5_R64") |
| | # Load the backbone LLM, optionally cutting the unused layers to save GPU memory. |
| | llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer) |
| | tommer.to(llm.device) |
| | |
| | text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). " |
| | |
| | #fancy interactive output |
| | outputs = llm2ner.plotting.demo_inference( text, tommer, llm, |
| | decoding_strategy="threshold", # or "greedy" for flat segmentation |
| | threshold=0.5, # default 50% |
| | show_attn=True, |
| | ) |
| | ``` |
| | <div> |
| | <span class="tex2jax_ignore"><div class="spans" style="line-height: 2.5; direction: ltr"> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | Large |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 77px;"> |
| | language |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 57px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 57px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 77px;"> |
| | models |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 57px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | </span> |
| | are awesome . While trained on |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | language |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | modeling |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | </span> |
| | , they exhibit |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | emergent |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | abilities |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | </span> |
| | that make them suitable for a wide range of |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | tasks |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | , including |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | Named |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | Entity |
| | |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | </span> |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | Recognition |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | </span> |
| | ( |
| | <span style="font-weight: bold; display: inline-block; position: relative; height: 60px;"> |
| | NER |
| | <span style="background: lightblue; top: 40px; height: 4px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | </span> |
| | <span style="background: lightblue; top: 40px; height: 4px; border-top-left-radius: 3px; border-bottom-left-radius: 3px; left: -1px; width: calc(100% + 2px); position: absolute;"> |
| | <span style="background: lightblue; z-index: 10; color: #000; top: -0.5em; padding: 2px 3px; position: absolute; font-size: 0.6em; font-weight: bold; line-height: 1; border-radius: 3px"> |
| | PRED |
| | </span> |
| | </span> |
| | </span> |
| | ) . </div></span> |
| | </div> |
| | |
| |
|
| | ## Raw inference |
| | By default, ToMMeR outputs span probabilities, but it also provides built-in options for decoding entities. |
| |
|
| | - Inputs: |
| |   - tokens (batch, seq): tokens to process, |
| |   - model: LLM to extract representations from. |
| | - Outputs: (batch, seq, seq) matrix of span scores (masked outside valid spans) |
| |
|
| | ```python |
| | import llm2ner |
| | from llm2ner import ToMMeR |
| | |
| | tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-ModernBERT-base_L5_R64") |
| | # Load the backbone LLM, optionally cutting the unused layers to save GPU memory. |
| | llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer) |
| | tommer.to(llm.device) |
| | |
| | #### Raw Inference |
| | text = ["Large language models are awesome"] |
| | print(f"Input text: {text[0]}") |
| | |
| | # Tokenize into shape (1, seq_len) and move the tokens to the LLM's device |
| | tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device) |
| | # Output raw span scores |
| | output = tommer.forward(tokens, llm) # (batch_size, seq_len, seq_len) |
| | print(f"Raw Output shape: {output.shape}") |
| | |
| | # Use the given decoding strategy to infer entities as (begin, end) token indices, end inclusive |
| | entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy") |
| | str_entities = [llm.tokenizer.decode(tokens[0, b:e+1]) for b, e in entities[0]] |
| | print(f"Predicted entities: {str_entities}") |
| | |
| | # Output: |
| | # Input text: Large language models are awesome |
| | # Raw Output shape: torch.Size([1, 6, 6]) |
| | # Predicted entities: ['Large language models'] |
| | ``` |
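
The difference between the `"threshold"` and `"greedy"` decoding strategies can be illustrated on a toy score matrix. The sketch below is not the llm2ner implementation: it assumes that `scores[b][e]` holds the probability of the span running from token `b` to token `e` (inclusive), and the helper names `threshold_decode` and `greedy_flat_decode` as well as the score values are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) token indices, end inclusive


def threshold_decode(scores: List[List[float]], threshold: float = 0.5) -> List[Span]:
    """Keep every span scoring above the threshold; spans may overlap or nest."""
    n = len(scores)
    return [(b, e) for b in range(n) for e in range(b, n) if scores[b][e] > threshold]


def greedy_flat_decode(scores: List[List[float]], threshold: float = 0.5) -> List[Span]:
    """Accept spans from highest to lowest score, skipping any that overlap an accepted span."""
    candidates = sorted(
        threshold_decode(scores, threshold),
        key=lambda span: scores[span[0]][span[1]],
        reverse=True,
    )
    kept: List[Span] = []
    for b, e in candidates:
        # Two spans are disjoint when one ends before the other begins.
        if all(e < kb or b > ke for kb, ke in kept):
            kept.append((b, e))
    return sorted(kept)


# ASSUMPTION: hypothetical scores for a 4-token sentence, upper triangle only.
toy_scores = [
    [0.10, 0.30, 0.90, 0.05],
    [0.00, 0.60, 0.70, 0.02],
    [0.00, 0.00, 0.20, 0.01],
    [0.00, 0.00, 0.00, 0.03],
]

nested = threshold_decode(toy_scores)
flat = greedy_flat_decode(toy_scores)
print(nested)  # [(0, 2), (1, 1), (1, 2)]
print(flat)    # [(0, 2)]
```

Thresholding keeps nested candidates, which favours recall, while the greedy pass yields a flat segmentation with at most one span covering each token.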
| |
|
| | Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details and a demo notebook. |
| |
|
| | ## Evaluation Results |
| |
|
| | | dataset | precision | recall | f1 | n_samples | |
| | |---------------------|-------------|----------|--------|-------------| |
| | | MultiNERD | 0.1661 | 0.984 | 0.2843 | 154144 | |
| | | CoNLL 2003 | 0.2403 | 0.9271 | 0.3817 | 16493 | |
| | | CrossNER_politics | 0.2264 | 0.9738 | 0.3674 | 1389 | |
| | | CrossNER_AI | 0.2527 | 0.9607 | 0.4002 | 879 | |
| | | CrossNER_literature | 0.2464 | 0.956 | 0.3918 | 916 | |
| | | CrossNER_science | 0.2615 | 0.9694 | 0.4119 | 1193 | |
| | | CrossNER_music | 0.2603 | 0.9573 | 0.4093 | 945 | |
| | | ncbi | 0.1012 | 0.8939 | 0.1818 | 3952 | |
| | | FabNER | 0.2551 | 0.7489 | 0.3806 | 13681 | |
| | | WikiNeural | 0.1582 | 0.9836 | 0.2725 | 92672 | |
| | | GENIA_NER | 0.1926 | 0.9587 | 0.3207 | 16563 | |
| | | ACE 2005 | 0.2017 | 0.3968 | 0.2674 | 8230 | |
| | | Ontonotes | 0.1974 | 0.7555 | 0.3131 | 42193 | |
| | | Aggregated | 0.1773 | 0.9287 | 0.2977 | 353250 | |
| | | Mean | 0.2123 | 0.882 | 0.3371 | 353250 | |
| | |
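
As a reading aid for the table above, the F1 column is the harmonic mean of precision and recall, and the `Mean` row matches the unweighted average over the 13 datasets when recomputed from the per-dataset values. The short sketch below recomputes both from figures copied out of the table; the helper name `f1_score` is illustrative, and small differences in the last digit are expected because the published precision and recall are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the table.
rows = {
    "MultiNERD": (0.1661, 0.984, 0.2843),
    "CoNLL 2003": (0.2403, 0.9271, 0.3817),
    "ACE 2005": (0.2017, 0.3968, 0.2674),
}
recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}

# Reported F1 of the 13 individual datasets, in table order.
reported_f1 = [
    0.2843, 0.3817, 0.3674, 0.4002, 0.3918, 0.4119, 0.4093,
    0.1818, 0.3806, 0.2725, 0.3207, 0.2674, 0.3131,
]
macro_f1 = sum(reported_f1) / len(reported_f1)

for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.4f}, reported = {rows[name][2]}")
print(f"Unweighted mean F1 = {macro_f1:.4f}")
```

The low precision alongside high recall is what pulls F1 down: the harmonic mean stays close to the smaller of the two values.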
| | ## Citation |
| | If using this model or the approach, please cite the associated paper: |
| | ``` |
| | @misc{morand2025tommerefficiententity, |
| | title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models}, |
| | author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski}, |
| | year={2025}, |
| | eprint={2510.19410}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CL}, |
| | url={https://arxiv.org/abs/2510.19410}, |
| | } |
| | ``` |
| | |
| | ## License |
| | Apache-2.0 (see repository for full text). |