---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
This repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model is a LoRA adapter trained on top of `mistralai/Mistral-7B-Instruct-v0.3` and performs information extraction: identifying recombination examples within scientific text.
**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📃 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
## Sample Usage
You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # bfloat16 reduces memory use on compatible GPUs
    device_map="auto",           # automatically place the model on GPU or CPU
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. do_sample=False gives deterministic extraction;
# max_new_tokens should be large enough to cover the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```
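Since the generation is expected to contain a JSON payload after the prompt, a small post-processing helper can recover it. This is a minimal sketch, not part of the released code: the `Recombination:` marker comes from the prompt format above, and the sample output string and field names below are hypothetical placeholders for illustration.

```python
import json


def parse_recombination(generated_text: str, marker: str = "Recombination:"):
    """Return the first JSON object that follows the last prompt marker.

    Assumes the pipeline echoes the prompt and appends a JSON payload,
    possibly followed by extra text.
    """
    # Keep only the text after the final occurrence of the marker
    tail = generated_text.rsplit(marker, 1)[-1]

    # Locate the first balanced {...} span and parse it
    start = tail.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(tail[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(tail[start : i + 1])
    return None


# Hypothetical model output, for illustration only
sample = (
    'Abstract: ... Recombination: '
    '{"type": "inspiration", "source": "pathologist diagnosis", '
    '"target": "image classification"}'
)
print(parse_recombination(sample))
```

Scanning for a balanced brace span (rather than calling `json.loads` on the whole tail) keeps the parse robust if the model emits trailing tokens after the JSON object.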
For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
**Bibtex**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
author={Noy Sternlicht and Tom Hope},
year={2025},
eprint={2505.20779},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.20779},
}
```