---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

This repository contains a Mistral model fine-tuned to extract recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model is a LoRA adapter on top of the `mistralai/Mistral-7B-Instruct-v0.3` base model and is intended for the information extraction task of identifying recombination examples in scientific text.

**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📃 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)

## Sample Usage

You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.

```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for better performance on compatible GPUs
    device_map="auto",           # Automatically select best device (GPU or CPU)
    trust_remote_code=True       # Required for custom model components
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model.
# Each part ends with an explicit newline so the sections stay on separate lines.
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. Use do_sample=False for deterministic extraction.
# max_new_tokens should be set appropriately for the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```

For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).

**Bibtex**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
      title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
      author={Noy Sternlicht and Tom Hope},
      year={2025},
      eprint={2505.20779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20779},
}
```
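
**Parsing the Output**

The Sample Usage snippet prints the raw generated text, which by default includes the prompt followed by the completion. The sketch below shows one possible way to pull the first JSON object out of that text. It is an illustration under stated assumptions, not part of the released code: `extract_first_json` is a hypothetical helper, and the exact JSON schema the model emits is not specified here, so the function returns whatever object it finds without validating its fields.

```python
import json
from typing import Any, Optional


def extract_first_json(generated_text: str, prompt: str = "") -> Optional[Any]:
    """Return the first JSON object found in the completion, or None.

    Hypothetical helper: it assumes the completion contains a JSON object
    somewhere after the prompt, and makes no assumption about its keys.
    """
    # The text-generation pipeline returns prompt + completion by default,
    # so drop the prompt prefix when it is present.
    if prompt and generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text

    decoder = json.JSONDecoder()
    # Try each "{" as a candidate start of a JSON object.
    start = completion.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(completion, start)
            return obj
        except json.JSONDecodeError:
            start = completion.find("{", start + 1)
    return None
```

With the variables from the Sample Usage snippet, this could be called as `extract_first_json(outputs[0]["generated_text"], prompt)`. A `None` result means no parseable JSON object was found, which can happen if `max_new_tokens` cuts the output short.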