Rago v2 7B
Rago v2 7B is a 7B-parameter model optimized for retrieval-augmented generation (RAG), built by Neural Bridge AI and trained on RAG Full Dataset 20000. It is available under the Apache 2.0 license.
Model Description
The Rago v2 7B model is a retrieval-augmented generation-optimized (RAGO) model. Retrieval-augmented generation enhances large language models (LLMs) by supplying an external authoritative knowledge base (context) when generating responses. This improves the model's ability to produce relevant, accurate, and context-specific output across specialized domains or internal data without retraining. It addresses key challenges of LLMs, such as unpredictability, reliance on potentially outdated data, and the propagation of incorrect information, thereby improving user trust in AI applications. Rago v2 7B is built upon the Llama 2 7B model and optimized for retrieval-augmented generation, which makes it particularly effective at context-aware response generation.
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "neural-bridge/Rago-v2-7b"

# Load the tokenizer and build a text-generation pipeline for the model.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)


def create_prompt(context, question):
    # The model expects the context, question, and answer markers in this order.
    return f"""##CONTEXT## {context} ##QUESTION## {question} ##ANSWER##"""


sequences = pipeline(
    create_prompt(
        context="Neural Bridge AI is a software company developing artificial intelligence (AI) solutions. It is founded in New York in the USA.",
        question="What solutions does Neural Bridge AI develop for its clients?"
    ),
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)


def print_result(generated_text):
    # The pipeline returns the prompt plus the completion; print only the answer.
    result_start = "##ANSWER##"
    answer_start = generated_text.find(result_start)
    print(generated_text[answer_start + len(result_start) :].strip())


for seq in sequences:
    print_result(seq["generated_text"])
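In a retrieval-augmented setup, the context usually comes from a retriever rather than being hard-coded. The sketch below is a minimal illustration and not part of the model card's pipeline: `retrieve` is a hypothetical word-overlap ranker standing in for a real retriever, while `build_prompt` and `extract_answer` follow the `##CONTEXT##` / `##QUESTION##` / `##ANSWER##` format used above. Joining several passages with spaces is an assumption, since the card only shows a single context string.

```python
import re

ANSWER_MARKER = "##ANSWER##"


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Hypothetical retriever: rank passages by word overlap with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(contexts, question):
    """Build a prompt in the context/question/answer marker format."""
    context = " ".join(contexts)  # assumption: passages are joined with spaces
    return f"##CONTEXT## {context} ##QUESTION## {question} {ANSWER_MARKER}"


def extract_answer(generated_text):
    """Return the text after the answer marker, or the whole text if it is missing."""
    marker_index = generated_text.find(ANSWER_MARKER)
    if marker_index == -1:
        return generated_text.strip()
    return generated_text[marker_index + len(ANSWER_MARKER):].strip()


passages = [
    "Neural Bridge AI is a software company developing artificial intelligence (AI) solutions.",
    "Llama 2 7B is a large language model released by Meta.",
]
question = "What solutions does Neural Bridge AI develop?"
prompt = build_prompt(retrieve(question, passages), question)
print(prompt)
```

The resulting `prompt` string can be passed to the text-generation pipeline in place of the hard-coded `create_prompt(...)` call, and `extract_answer` applied to each returned `generated_text`.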
Model Details
Training Data
Rago v2 7B was trained on Neural Bridge's RAG Full 20000 dataset, which is a blend of RefinedWeb, gsm8k, and RAG Hallucination Dataset 1000.
Training Details
Rago v2 7B is built upon Llama 2 7B and fine-tuned with LoRA to improve its ability to deliver relevant, precise, and context-specific output across specialized domains or internal datasets. This approach targets significant challenges faced by LLMs, such as unpredictability, reliance on potentially outdated information, and the spread of incorrect data. The architecture of Rago v2 7B mirrors that of Llama 2 7B, augmented with LoRA adapters. The model was trained on a single NVIDIA A100 GPU for approximately two days, using a learning rate of 1e-5 with a cosine scheduler and the following LoRA parameters:
- LoRA Rank (R): 64
- LoRA Alpha: 16
- LoRA Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj
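To make these settings concrete, the sketch below computes two quantities they imply: the LoRA scaling factor (alpha divided by rank, as in the standard LoRA formulation) and the number of trainable adapter parameters. The Llama 2 7B dimensions (hidden size 4096, 32 decoder layers, square attention projections) come from the public Llama 2 7B configuration rather than from this card, and bias-free adapters are assumed. `cosine_lr` illustrates a plain cosine decay from the stated 1e-5 learning rate; the total step count, the absence of warmup, and the decay to zero are assumptions, since the card does not state them.

```python
import math

# LoRA settings listed above.
LORA_RANK = 64
LORA_ALPHA = 16
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

# Assumed Llama 2 7B dimensions (from its public configuration, not this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_trainable_params(rank, d_in, d_out, modules_per_layer, num_layers):
    """Each adapted module adds A (rank x d_in) and B (d_out x rank)."""
    per_module = rank * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


def cosine_lr(step, total_steps, base_lr=1e-5):
    """Cosine decay from base_lr to 0 with no warmup (assumed schedule shape)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
trainable = lora_trainable_params(
    LORA_RANK, HIDDEN_SIZE, HIDDEN_SIZE, len(TARGET_MODULES), NUM_LAYERS
)
print(f"scaling = {scaling}")
print(f"trainable LoRA parameters = {trainable:,}")
```

Under these assumptions the scaling factor is 0.25 and the adapters add about 67 million trainable parameters, roughly 1% of the base model's size.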
Rago v2 7B also uses a custom data collator. Following a masked language modeling (MLM) strategy, the collator masks only the answer portion of each training example during fine-tuning, so that the model learns to generate more accurate responses. This custom data collator has led to noticeable improvements in the model's factuality performance.
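The card does not publish the collator, so the following is only one plausible reading of the paragraph above: the answer is the only part the model is trained to predict, which is commonly implemented by setting the label of every token up to and including the `##ANSWER##` marker to -100, the index that PyTorch's cross-entropy loss ignores by default. The function names, token ids, and marker ids here are hypothetical, and the actual collator may work differently.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def answer_only_labels(input_ids, answer_marker_ids):
    """Copy input_ids into labels, ignoring everything before the answer."""
    marker_start = find_subsequence(input_ids, answer_marker_ids)
    if marker_start == -1:
        # No marker found: this example contributes nothing to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    answer_start = marker_start + len(answer_marker_ids)
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])


# Hypothetical token ids: prompt = [11, 12, 13], marker = [90, 91], answer = [21, 22].
input_ids = [11, 12, 13, 90, 91, 21, 22]
labels = answer_only_labels(input_ids, answer_marker_ids=[90, 91])
print(labels)
```

With labels built this way, the context and question tokens still condition the model but only the answer tokens contribute to the training loss.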
Neural Bridge AI Rago Models Index
| Model | Link | Base Model |
|---|---|---|
| Rago v1 1B | link | Falcon-RW-1B |
| Rago v1 7B | link | Falcon-7B |
| Rago v1 40B | link | Falcon-40B |
| Rago v2 7B | link | Llama 2 7B |
| Rago v2 13B | link | Llama 2 13B |
License
This public extract is made available under the Apache 2.0 license. Users should also abide by the Llama 2 7B ToU.