---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
This repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model applies a LoRA adapter on top of a Mistral base model and is intended for the information-extraction task of identifying recombination examples in scientific text.
**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📃 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
## Sample Usage
You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # bfloat16 improves performance on compatible GPUs
    device_map="auto",           # Automatically place the model on GPU or CPU
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. Use do_sample=False for deterministic extraction;
# max_new_tokens should be large enough for the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```
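Because the pipeline returns the prompt followed by the completion, a small post-processing step can isolate the generated portion and attempt to parse it as JSON. The sketch below is an assumption, not part of the official repository: the exact output schema depends on how the model was trained, and the `parse_extraction` helper and the demo strings are hypothetical.

```python
import json

def parse_extraction(full_text: str, prompt: str):
    """Strip the echoed prompt and try to parse the completion as JSON.

    Returns the parsed object, or the raw completion string if it is not
    valid JSON (the output schema is model-dependent).
    """
    if full_text.startswith(prompt):
        completion = full_text[len(prompt):].strip()
    else:
        completion = full_text.strip()
    try:
        return json.loads(completion)
    except json.JSONDecodeError:
        return completion

# Demo with a hypothetical completion (the real schema may differ):
demo_prompt = "Abstract: ...\nRecombination:"
demo_output = demo_prompt + ' {"type": "inspiration"}'
print(parse_extraction(demo_output, demo_prompt))  # {'type': 'inspiration'}
```

Alternatively, passing `return_full_text=False` to the pipeline call makes `generated_text` contain only the completion, so no prompt stripping is needed.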
For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
**BibTeX**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
author={Noy Sternlicht and Tom Hope},
year={2025},
eprint={2505.20779},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.20779},
}
```