---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- noystl/Recombination-Extraction
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

This Hugging Face repository contains a fine-tuned Mistral model for extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model consists of a LoRA adapter on top of a Mistral base model.

The model can be used for the information extraction task of identifying recombination examples within scientific text.
**Quick Links**
- 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- 📄 [Paper](https://arxiv.org/abs/2505.20779)
- 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
## Sample Usage

You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
```python
from transformers import pipeline, AutoTokenizer
import torch

model_id = "noystl/mistral-e2e"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Initialize the text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # bfloat16 reduces memory use on compatible GPUs
    device_map="auto",           # Automatically place the model on GPU or CPU
)

# Example abstract for recombination extraction
abstract = """The multi-granular diagnostic approach of pathologists can inspire histopathological image classification.
This suggests a novel way to improve accuracy in image classification tasks."""

# Format the input prompt as expected by the model
prompt = (
    "Extract any recombination instances (inspiration/combination) from the following abstract:\n"
    f"Abstract: {abstract}\n"
    "Recombination:"
)

# Generate the output. do_sample=False makes the extraction deterministic;
# max_new_tokens should be large enough to hold the expected JSON output.
outputs = generator(prompt, max_new_tokens=200, do_sample=False)

# Print the generated text, which should contain the extracted recombination in JSON format
print(outputs[0]["generated_text"])
```
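The pipeline returns the prompt followed by the model's completion. A small post-processing helper can strip the prompt and recover the JSON object emitted after the `Recombination:` marker. This is a hypothetical sketch (not part of the released code), assuming the model emits a single JSON object as described above:

```python
import json


def parse_recombination(generated_text):
    """Return the JSON object following the 'Recombination:' marker, or None."""
    marker = "Recombination:"
    idx = generated_text.rfind(marker)
    if idx == -1:
        return None
    tail = generated_text[idx + len(marker):]
    start = tail.find("{")
    if start == -1:
        return None
    # Walk forward until the braces balance, then parse that span as JSON
    depth = 0
    for end, ch in enumerate(tail[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(tail[start:end + 1])
                except json.JSONDecodeError:
                    return None
    return None


# Example with a mock completion
mock = 'Abstract: ... Recombination: {"type": "inspiration", "source": "pathologists"}'
print(parse_recombination(mock))
```

Brace counting is used instead of a bare `json.loads` on the tail so that trailing text after the JSON object does not break parsing.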

For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
**Bibtex**
```bibtex
@misc{sternlicht2025chimeraknowledgebaseidea,
  title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
  author={Noy Sternlicht and Tom Hope},
  year={2025},
  eprint={2505.20779},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.20779},
}
```