---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
This repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model applies a LoRA adapter on top of a Mistral base model and is intended for the information-extraction task of identifying recombination examples in scientific text.
**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📃 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
## Sample Usage
You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # bfloat16 improves performance on compatible GPUs
    device_map="auto",           # Automatically place the model on GPU or CPU
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. Use do_sample=False for deterministic extraction;
# max_new_tokens should be large enough for the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```
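Because the pipeline returns the prompt followed by the completion, a small post-processing step can isolate the generated portion and attempt to parse it as JSON. The sketch below is an assumption, not part of the official repository: the exact output schema depends on how the model was trained, and the `parse_extraction` helper and the demo strings are hypothetical.

```python
import json

def parse_extraction(full_text: str, prompt: str):
    """Strip the echoed prompt and try to parse the completion as JSON.

    Returns the parsed object, or the raw completion string if it is not
    valid JSON (the output schema is model-dependent).
    """
    if full_text.startswith(prompt):
        completion = full_text[len(prompt):].strip()
    else:
        completion = full_text.strip()
    try:
        return json.loads(completion)
    except json.JSONDecodeError:
        return completion

# Demo with a hypothetical completion (the real schema may differ):
demo_prompt = "Abstract: ...\nRecombination:"
demo_output = demo_prompt + ' {"type": "inspiration"}'
print(parse_extraction(demo_output, demo_prompt))  # {'type': 'inspiration'}
```

Alternatively, passing `return_full_text=False` to the pipeline call makes `generated_text` contain only the completion, so no prompt stripping is needed.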
For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
**BibTeX**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
author={Noy Sternlicht and Tom Hope},
year={2025},
eprint={2505.20779},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.20779},
}
```