Model Card for jusKnows/hr_bloom_prompt_tuned
This model was trained to choose between the RAG and COT techniques in domain-specific chat applications. Depending on the user's question, the model picks the technique better suited to generating the response:
- Sometimes, questions are domain specific and can be answered by performing a simple RAG.
- Sometimes, we may get complex questions that require a step-by-step approach.
We applied simple prompt tuning to a small base model to obtain a lightweight model capable of few-shot classification, trained on a very small dataset of about 100 samples.
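To give a sense of why prompt tuning fits such a small dataset, the sketch below estimates how many parameters it trains. It assumes standard prompt tuning, where only a matrix of virtual-token embeddings is learned and the base model stays frozen. The value `num_virtual_tokens = 8` is hypothetical (this card does not state it), 1024 is the hidden size of bloom-560m, and the base model size is rounded.

```python
# Rough count of trainable parameters under prompt tuning.
HIDDEN_SIZE = 1024          # embedding width of bigscience/bloom-560m
BASE_PARAMS = 560_000_000   # approximate size of the frozen base model


def prompt_tuning_params(num_virtual_tokens: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Each virtual token contributes one trainable embedding vector."""
    return num_virtual_tokens * hidden_size


num_virtual_tokens = 8  # hypothetical value; not stated in this card
trainable = prompt_tuning_params(num_virtual_tokens)
fraction = trainable / BASE_PARAMS

print(f"trainable parameters: {trainable}")
print(f"share of base model: {fraction:.6%}")
```

Under these assumptions only a few thousand values are updated, which is one reason a dataset of about 100 samples can be enough to steer the model.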
Base Model Sources
Prompt-tuned version of bigscience/bloom-560m, using a 4-bit bitsandbytes (bnb) quantization configuration.
Uses
This model is a first step toward one specific task: choosing between Retrieval Augmented Generation (RAG) and Chain of Thought (COT) for a given question.
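As an illustration of how the predicted category might be used in a chat application, here is a minimal routing sketch. Everything in it is hypothetical: `route_question`, the stand-in classifier, and the two answer functions are placeholders, not part of this model or of any library.

```python
from typing import Callable


def route_question(
    question: str,
    classify: Callable[[str], str],
    answer_with_rag: Callable[[str], str],
    answer_with_cot: Callable[[str], str],
) -> str:
    """Send the question to the pipeline named by the classifier."""
    label = classify(question).strip().upper()
    if label == "RAG":
        return answer_with_rag(question)
    if label == "COT":
        return answer_with_cot(question)
    raise ValueError(f"unexpected category: {label!r}")


# Stand-in classifier so the sketch runs without loading the model.
def fake_classify(question: str) -> str:
    return "COT" if "step" in question.lower() else "RAG"


reply = route_question(
    "What is the vacation policy?",
    fake_classify,
    answer_with_rag=lambda q: f"[RAG] {q}",
    answer_with_cot=lambda q: f"[COT] {q}",
)
print(reply)  # [RAG] What is the vacation policy?
```

In a real application, `classify` would wrap the model call and the two answer functions would wrap the retrieval and reasoning pipelines.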
How to Get Started with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jusKnows/hr_bloom_prompt_tuned"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

def make_inference(query, model):
    # Same template as the training prompt, without the category and end marker.
    prompt = """\
### Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Categorize this question into one of this two categories:
RAG
COT
Input:
{Question}
### Response:
"""
    # Put the inputs on the same device as the model to avoid a device mismatch.
    batch = tokenizer(prompt.format(Question=query), return_tensors='pt').to(model.device)
    # Mixed precision only takes effect when the model runs on a CUDA device.
    with torch.autocast("cuda"):
        output_tokens = model.generate(**batch, max_new_tokens=10)
    return output_tokens

query = "{your_question_goes_here}"
output_tokens = make_inference(query, model)
# The decoded text contains the prompt followed by the generated category.
response = tokenizer.decode(output_tokens[0])
print(response)
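The decoded response contains the whole prompt followed by the generated tokens, so the category still has to be pulled out. The helper below is one possible way to do that, assuming the output follows the prompt format shown above; `extract_category` is a hypothetical name, not part of the model or of Transformers.

```python
import re
from typing import Optional


def extract_category(decoded: str) -> Optional[str]:
    """Return "RAG" or "COT" found after the last "### Response:" marker."""
    # The prompt itself lists both labels, so only inspect the completion.
    completion = decoded.rsplit("### Response:", 1)[-1]
    match = re.search(r"\b(RAG|COT)\b", completion)
    return match.group(1) if match else None


# Hand-written sample of what a decoded output could look like.
sample = (
    "### Instruction:\n"
    "Categorize this question into one of this two categories:\n"
    "RAG\nCOT\n"
    "Input:\nHow do I request parental leave?\n"
    "### Response:\nRAG\n### End"
)
print(extract_category(sample))  # RAG
```

The function returns `None` when neither label appears after the marker, which lets the caller fall back to a default technique.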
Training Details
Training Data
The dataset is synthetic and consists of (question, technique) pairs.
Training Prompt
### Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Categorize this question into one of this two categories:
RAG
COT
Input:
{Question}
### Response:
{Category}
### End
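A minimal sketch of how (question, technique) pairs could be rendered into this training prompt. The template text is copied from above; the `build_example` helper and the two sample pairs are hypothetical and are not taken from the actual dataset.

```python
TRAINING_TEMPLATE = """\
### Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Categorize this question into one of this two categories:
RAG
COT
Input:
{Question}
### Response:
{Category}
### End
"""


def build_example(question: str, category: str) -> str:
    """Fill the training template with one labeled question."""
    if category not in {"RAG", "COT"}:
        raise ValueError(f"unknown category: {category!r}")
    return TRAINING_TEMPLATE.format(Question=question, Category=category)


# Hypothetical pairs, for illustration only.
pairs = [
    ("How many vacation days do new employees get?", "RAG"),
    ("Compare two salary offers with different bonus schemes.", "COT"),
]
texts = [build_example(question, category) for question, category in pairs]
print(texts[0])
```

Keeping the inference prompt identical to this template up to "### Response:" matters, because the tuned prompt was learned against this exact wording.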
Training Hyperparameters
- evaluation_strategy="steps",
- eval_steps=1,
- logging_strategy="steps",
- per_device_train_batch_size=6,
- gradient_accumulation_steps=4,
- warmup_steps=50,
- max_steps=100,
- learning_rate=1e-3,
- fp16=True,
- logging_steps=1,
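For orientation, the arithmetic below works out what these settings imply. It assumes training on a single device and a training set of roughly 100 samples; both are assumptions, since the card gives neither the device count nor the exact split sizes.

```python
# Values taken from the hyperparameter list above.
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
warmup_steps = 50
max_steps = 100

# Assumptions, not stated in this card.
num_devices = 1
dataset_size = 100

# Samples contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total samples processed over the whole run.
samples_seen = effective_batch * max_steps
# Approximate number of passes over the data.
approx_epochs = samples_seen / dataset_size
# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

print(effective_batch, samples_seen, approx_epochs, warmup_fraction)
# 24 2400 24.0 0.5
```

Under these assumptions the model sees each sample about 24 times, and half of the run is spent warming up the learning rate.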
Evaluation
Metrics
- Accuracy
Results
Train:
Validation:
Test:
Model Card Contact
LinkedIn: www.linkedin.com/in/jrodriguez130