Instructions to use ronit01/rag_tuned_minilm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries
How to use ronit01/rag_tuned_minilm with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ronit01/rag_tuned_minilm")

sentences = [
    "How do you resolve an ImportError for GenerationMixin that occurs between experiments?",
    "This use case notebook features an hybrid workflow spanning a self-hosted open LLM for embeddings and an Open AI call for generation. ",
    "This tutorial shows Group Relative Policy Optimization (GRPO) to improve mathematical reasoning capabilities. \nGRPO is an RL approach that uses multiple reward functions to provide richer training signals.\n\nIt uses the GSM8K mathematical reasoning dataset;\n`see its details on Hugging Face <https://huggingface.co/datasets/openai/gsm8k>`__.\nWe use a sample of 500 training examples and 100 evaluation examples for tractable demo runtimes.\n\nThe prompt format includes a system message instructing the model to respond with structured reasoning\nand answer tags, encouraging step-by-step mathematical problem solving with clear formatting.\n\n\nModel, Adapter, and Trainer Knobs\n-------\n\nWe compare 3 different base model architectures: Llama-3.1-8B-Instruct, Qwen2.5-3B-Instruct, \nand Qwen2.5-7B-Instruct, all using 4-bit quantization for efficient training.\n\nAll models use the same medium capacity LoRA configuration, targeting only 2 modules. \nWe compare two different learning rates for the smaller Qwen model alone.\nThis results in 4 total combinations launched with a simple grid search.\n\nThere are 5 custom reward functions that collectively shape the model's behavior. \nThe whole set of reward functions is used for all configs. \n\n* Correctness reward: Awards 2.0 points for matching the ground truth answer exactly.\n* Integer reward: Awards 0.5 points for producing numeric answers (validates output format).\n* Strict format reward: Awards 0.5 points for exact XML formatting compliance.\n* Soft format reward: Awards 0.5 points for flexible XML formatting (more lenient matching).\n* XML count reward: Fine-grained reward (up to 0.5 points) for proper XML tag usage and structure.\n\nThe lite version uses two smaller architectures: Qwen2.5-0.5B-Instruct and Llama-3.2-1B-Instruct, \nboth still using 4-bit quantization. LoRA capacity is reduced with rank 16.",
    "    :param search_cfg: The search algorithm type and its kwargs to use for retrieval of vectors/chunks, provided as a single dictionary. Must include a key :code:`\"type\"` with one of the following three options listed as value; default is :code:`\"similarity\"`.\n\n      * :code:`\"similarity\"`: Standard cosine similarity search.\n      * :code:`\"similarity_score_threshold\"`: Similarity search with minimum score threshold (SST).\n      * :code:`\"mmr\"`: Maximum Marginal Relevance (MMR) search for diversity.\n\n      Additional parameters for search configuration depend on the type; the keys can include the following:\n\n      * :code:`\"k\"`: Number of documents to retrieve. Default is 5.\n      * :code:`\"filter\"`: Optional filter criteria function for search results.\n      * :code:`\"score_threshold\"`: Only for SST. Minimum similarity score threshold. \n      * :code:`\"fetch_k\"`: Only for MMR. Number of documents to fetch before MMR reranking. Default is 20.\n      * :code:`\"lambda_mult\"`: Only for MMR. Diversity parameter for MMR balancing relevance vs. diversity. Default is 0.5.\n    :type search_cfg: dict, optional"
]
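# Encode each sentence into one embedding vector
embeddings = model.encode(sentences)

# Pairwise similarity scores between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])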
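In the snippet above, the first sentence is a question and the other three are documentation passages, so a typical retrieval use would rank the passages against the question rather than only inspect the shape of the similarity matrix. The following is a minimal sketch of that ranking step, assuming cosine similarity as the scoring function. The helper names `cosine` and `rank_passages` are hypothetical, and the 3-dimensional vectors are made-up stand-ins for real embeddings so the example runs without downloading the model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, highest score first."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only; real ones would come from model.encode(...).
query = [1.0, 0.0, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned with the query
]

ranking = rank_passages(query, passages)
print([index for index, _ in ranking])
# [1, 2, 0]
```

With real embeddings, the query vector would be the first row of `embeddings` and the passage vectors the remaining rows.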
Notebooks
Google Colab
Kaggle