---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

This Hugging Face repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model consists of a LoRA adapter on top of a Mistral base model.

The model can be used for the information extraction task of identifying recombination examples within scientific text.
**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📄 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
## Sample Usage

You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # bfloat16 reduces memory use on compatible GPUs
    device_map="auto",           # Automatically place the model on GPU or CPU
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. do_sample=False makes the extraction deterministic;
# max_new_tokens should be large enough to hold the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```
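The pipeline returns the prompt followed by the model's completion. A small post-processing helper can strip the prompt and recover the JSON object emitted after the `Recombination:` marker. This is a hypothetical sketch (not part of the released code), assuming the model emits a single JSON object as described above:

```python
import json


def parse_recombination(generated_text):
    """Return the JSON object following the 'Recombination:' marker, or None."""
    marker = "Recombination:"
    idx = generated_text.rfind(marker)
    if idx == -1:
        return None
    tail = generated_text[idx + len(marker):]
    start = tail.find("{")
    if start == -1:
        return None
    # Walk forward until the braces balance, then parse that span as JSON
    depth = 0
    for end, ch in enumerate(tail[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(tail[start:end + 1])
                except json.JSONDecodeError:
                    return None
    return None


# Example with a mock completion
mock = 'Abstract: ... Recombination: {"type": "inspiration", "source": "pathologists"}'
print(parse_recombination(mock))
```

Brace counting is used instead of a bare `json.loads` on the tail so that trailing text after the JSON object does not break parsing.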

For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
**Bibtex**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
  title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
  author={Noy Sternlicht and Tom Hope},
  year={2025},
  eprint={2505.20779},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.20779},
}
```