---
base_model: BAAI/bge-m3
tags:
- datadreamer
- datadreamer-0.46.0
- synthetic
- sentence-transformers
- feature-extraction
- sentence-similarity
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---

# Model Card

Given a document, this retrieval embedding model retrieves instruction templates from [FineTemplates](https://huggingface.co/datasets/fineinstructions/finetemplates) that are relevant to individual chunks / sections of the document.

**Note:** This retrieval embedding is symmetric, so it can also be used to retrieve documents relevant to the [`compatible_document_description` of an instruction template](https://huggingface.co/datasets/fineinstructions/finetemplates).

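The reverse direction mentioned in the note can be sketched as follows. This is a minimal illustration, not the project's own helper: `rank_documents`, `cosine_similarity`, and `load_encoder` are hypothetical names, and it assumes the repository loads through the standard `SentenceTransformer` class (a repository with custom modules may additionally need `trust_remote_code=True`). The ranking logic is kept dependency-free so it works with any encoder that maps a list of strings to a list of vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Repo id taken from the usage example in this card.
MODEL_ID = "fineinstructions/instruction_template_retrieval_embedding"

Vector = Sequence[float]
Encoder = Callable[[List[str]], Sequence[Vector]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    encode: Encoder, description: str, documents: List[str]
) -> List[Tuple[str, float]]:
    """Rank documents by similarity to a `compatible_document_description`."""
    # Symmetric embedding: the description and the documents share one encoder.
    vectors = encode([description] + list(documents))
    query, doc_vectors = vectors[0], vectors[1:]
    scored = [
        (doc, cosine_similarity(query, vec))
        for doc, vec in zip(documents, doc_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_encoder() -> Encoder:
    """Hypothetical loader; ASSUMES the repo works with plain SentenceTransformer."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    return SentenceTransformer(MODEL_ID).encode
```

With the model downloaded, `rank_documents(load_encoder(), description, documents)` returns `(document, score)` pairs sorted from most to least similar.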
## Requirements
```
datasets
faiss
huggingface_hub
numpy
pandas
sentence_transformers
```

## Example Usage

```python3
import importlib.util
import json
from huggingface_hub import hf_hub_download


def download_and_import_module(module_name, variable):
    """Download `<module_name>.py` from the model repo and return one attribute from it."""
    module = importlib.util.module_from_spec(
        importlib.util.spec_from_file_location(
            module_name,
            hf_hub_download(
                repo_id="fineinstructions/instruction_template_retrieval_embedding",
                filename=f"{module_name}.py",
            ),
        )
    )
    # Execute the downloaded file so its classes are defined on `module`
    module.__spec__.loader.exec_module(module)
    return getattr(module, variable)


# Import the retriever helper class
InstructionTemplateRetriever = download_and_import_module("instruction_template_retriever", "InstructionTemplateRetriever")

# Prepare an example document
EXAMPLE_DOC = """
Title: Surprising Facts about Pigeons
Submitted On: September 24, 2008

Fact 1:
During World War I, a homing pigeon named Cher Ami played a critical role in saving nearly 200 soldiers who were trapped behind enemy lines.
Despite being injured by enemy fire, Cher Ami managed to deliver a crucial message that led to their rescue. For this act of bravery, the
French government awarded the pigeon the Croix de Guerre, a military medal of honor. Cher Ami became a symbol of courage and the extraordinary
utility of pigeons in wartime communication.

Fact 2:
Pigeons possess impressive cognitive abilities, one of the most surprising being their capacity for self-recognition in mirrors. This
trait is rare in the animal kingdom and is often considered a marker of higher intelligence. Experiments have shown that pigeons can distinguish
themselves from other birds when looking into a mirror, suggesting a level of self-awareness previously thought to be unique to primates and a
few other animals.

Fact 3:
Thanks to centuries of selective breeding, there are now more than 300 recognized breeds of domestic pigeon. These range from show pigeons with
elaborate feather patterns and head crests to performance breeds used in tumbling and racing. The sheer variety reflects the bird’s long history
as a companion species to humans.

Fact 4:
The Ancient Romans were known for their elaborate grooming rituals, and pigeons played an unexpected role in their beauty routines. Specifically,
they used pigeon droppings as a bleaching agent to style and lighten their hair. This unusual practice was part of the broader Roman obsession with
fashion and appearance, demonstrating how even the most unexpected materials found a place in early cosmetic treatments.
"""


# Retrieve instruction templates relevant to different chunks / sections of the document
retriever = InstructionTemplateRetriever(
    coverage_chunks=4, sigma=0.05, alpha=1.0  # 4 chunks/sections
)
print(json.dumps(retriever.search(document=EXAMPLE_DOC), indent=4))

# Results look like:

# Instruction Templates for Entire Document:
# - "What's something <fi>a few word description of something remarkable or noteworthy</fi> you can tell me"

# Instruction Templates for Chunk 1/4 of the Document:
# - "write a <fi>a few word description of the type of message</fi> for <fi>a significant achievement or milestone</fi>"

# Instruction Templates for Chunk 2/4 of the Document:
# - "how are <fi>a type of organism or entity</fi> so <fi>exceptionally strong or notable in some way</fi>?"

# Instruction Templates for Chunk 3/4 of the Document:
# - "what are the common <fi>a type of organism, creature, or entity</fi>?"

# Instruction Templates for Chunk 4/4 of the Document:
# - "how did <fi>a group of people</fi> <fi>perform a common practice or activity</fi>"
```

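The retrieved templates mark their variable parts with `<fi>...</fi>` tags, each wrapping a short description of what belongs in that slot. A small sketch of reading and filling those slots is given here. The helpers `slot_descriptions` and `fill_template` are hypothetical and not part of the retriever module, and the sketch assumes every `<fi>` tag is an independent slot filled in order of appearance, which may not match how FineTemplates itself pairs repeated slots.

```python
import re
from typing import List

# Non-greedy match so adjacent slots are captured separately.
FI_SLOT = re.compile(r"<fi>(.*?)</fi>")


def slot_descriptions(template: str) -> List[str]:
    """Return the description inside each <fi>...</fi> slot, in order."""
    return FI_SLOT.findall(template)


def fill_template(template: str, values: List[str]) -> str:
    """Replace each <fi>...</fi> slot with the value at the same position."""
    slots = slot_descriptions(template)
    if len(values) != len(slots):
        raise ValueError(f"expected {len(slots)} values, got {len(values)}")
    remaining = iter(values)
    return FI_SLOT.sub(lambda _match: next(remaining), template)


# Template copied from the chunk 4/4 result shown in the usage example.
template = "how did <fi>a group of people</fi> <fi>perform a common practice or activity</fi>"

print(slot_descriptions(template))
# ['a group of people', 'perform a common practice or activity']

print(fill_template(template, ["the Ancient Romans", "lighten their hair"]))
# how did the Ancient Romans lighten their hair
```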

---
This model was trained on a synthetic dataset using [DataDreamer 🤖💤](https://datadreamer.dev). The synthetic dataset card and model card can be found [here](datadreamer.json). The training arguments can be found [here](training_args.json).