metadata
base_model:
  - mistralai/Mistral-7B-Instruct-v0.3
datasets:
  - noystl/Recombination-Extraction
language:
  - en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation

This Hugging Face repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation. The model uses a LoRA adapter on top of the mistralai/Mistral-7B-Instruct-v0.3 base model.

It is intended for information extraction: identifying recombination examples in scientific text.

Sample Usage

You can use this model with the Hugging Face transformers library to extract recombination instances from text. The model expects a specific prompt format for this task.

from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e" 

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text generation pipeline
generator = pipeline(
    "text-generation", 
    model=model_id, 
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16, # Use bfloat16 for better performance on compatible GPUs
    device_map="auto", # Automatically select best device (GPU or CPU)
    trust_remote_code=True # Only takes effect if the repo ships custom code; Mistral itself is supported natively by transformers
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

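# Format the input prompt as expected by the model.
# Explicit "\n" escapes keep the instruction, abstract, and answer cue on separate lines.
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)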

# Generate the output. Use do_sample=False for deterministic extraction.
# max_new_tokens should be set appropriately for the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text. By default the pipeline returns the prompt followed by
# the completion, which should contain the extracted recombination in JSON format.
print(outputs[0]["generated_text"])
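
Because the generated text includes the prompt and may carry extra tokens after the JSON, it can help to isolate the JSON object before using it. The helper below is a minimal sketch under that assumption, not part of the released code: the function name `parse_recombination` and the keys in the sample string are hypothetical, and the model's actual output schema may differ.

```python
import json


def parse_recombination(generated_text: str, prompt: str = ""):
    """Return the first JSON object found in the model output, or None."""
    # Drop the echoed prompt if the pipeline returned prompt + completion.
    if prompt and generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text

    decoder = json.JSONDecoder()
    # Try every "{" as a possible start of a JSON object.
    start = completion.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(completion, start)
            return obj
        except json.JSONDecodeError:
            start = completion.find("{", start + 1)
    return None


# Hypothetical output string; the keys are illustrative only.
sample_prompt = "Abstract: ...\nRecombination:"
sample_output = sample_prompt + ' {"type": "inspiration", "source": "pathologists"} extra text'
result = parse_recombination(sample_output, sample_prompt)
print(result)
```

With the pipeline above, this would be called as `parse_recombination(outputs[0]["generated_text"], prompt)`. A return value of `None` means no parseable JSON object was found in the completion.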

For more advanced usage, including training and evaluation, please refer to the GitHub repository.

Bibtex

@misc{sternlicht2025chimeraknowledgebaseidea,
      title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature}, 
      author={Noy Sternlicht and Tom Hope},
      year={2025},
      eprint={2505.20779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20779}, 
}