---
base_model:
- Qwen/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- qwen2
- trl
- grpo
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# TBH.AI Secure Reasoning Model

- **Developed by:** TBH.AI
- **License:** apache-2.0
- **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
- **Fine-tuning Method:** GRPO (Group Relative Policy Optimization)
- **Inspired by:** DeepSeek-R1

## **Model Description**

TBH.AI Secure Reasoning Model is built for secure, reliable, and structured reasoning. Fine-tuned from Qwen 2.5 using GRPO, it strengthens logical reasoning, decision-making, and problem-solving while reducing hallucinations and improving factual accuracy.

Unlike conventional language models that rely primarily on knowledge retrieval, this model engages with complex problems by breaking them down into structured thought processes. Inspired by DeepSeek-R1, it employs reinforcement learning to validate and refine its logical conclusions.

The model is suited to tasks requiring high-level reasoning, structured analysis, and problem-solving in domains such as cybersecurity, finance, and research. It is aimed at professionals and organizations that prioritize security, transparency, and truthfulness.

## **Features**

- **Secure Self-Reasoning Capabilities:** Independently analyzes problems while ensuring factual consistency.
- **Reinforcement Learning with GRPO:** Fine-tuned using policy optimization techniques for logical precision.
- **Multi-Step Logical Deduction:** Breaks down complex queries into structured, step-by-step responses.
- **Industry-Ready Security Focus:** Ideal for cybersecurity, finance, and high-stakes applications requiring trust and reliability.

## **Limitations**

- Requires well-structured prompts for optimal reasoning depth.
- Not optimized for tasks requiring extensive factual recall beyond its training scope.
- Performance depends on reinforcement learning techniques and fine-tuning datasets.

## **Usage**

To use this model for secure text generation and reasoning tasks, follow the structure below:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")
model = AutoModelForCausalLM.from_pretrained("saishshinde15/TBH.AI_Base_Reasoning")

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Prepare input prompt using the chat template
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is 2x+3=4"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize input and move it to the model's device
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

# Generate response (standard Hugging Face sampling arguments)
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    max_new_tokens=1024,
)

# Decode and print output
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```
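Once a response has been generated in the `<reasoning>`/`<answer>` format, the two sections can be separated with standard string handling. A minimal sketch using Python's `re` module (the sample response string below is illustrative, not actual model output):

```python
import re


def parse_structured_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns an empty string for any section that is missing.
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else ""
    return sections


# Illustrative response in the expected format (not real model output)
sample = """<reasoning>
Subtract 3 from both sides: 2x = 1. Divide by 2: x = 0.5.
</reasoning>
<answer>
x = 0.5
</answer>"""

parsed = parse_structured_response(sample)
print(parsed["answer"])  # x = 0.5
```

Keeping the parse tolerant of a missing section is useful in practice, since sampled outputs occasionally omit or truncate a tag.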

<details>
<summary>Fast inference</summary>

Note: `fast_generate` and `load_lora` are provided by Unsloth's `FastLanguageModel`, not by the standard `transformers` API; this snippet assumes the model was loaded with Unsloth and that a GRPO LoRA adapter was saved as `grpo_saved_lora`.

```python
# Install dependencies first (run in a shell, not in Python):
# pip install transformers vllm vllm[lora] torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is 2x+3=4"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1024,
)
output = model.fast_generate(
    text,
    sampling_params=sampling_params,
    lora_request=model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text

print(output)
```

</details>

# Recommended Prompt

Use the following system prompt for detailed and well-structured results. This is the recommended format, as the model was fine-tuned to respond in this structure:

```text
You are a secure reasoning model developed by TBH.AI. Your role is to respond in the following structured format:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
```
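To wire this prompt into the chat-template usage shown above, place it as the system turn. A minimal sketch (the `build_messages` helper is illustrative, not part of any library API):

```python
# Recommended system prompt, as given in this model card
RECOMMENDED_SYSTEM_PROMPT = (
    "You are a secure reasoning model developed by TBH.AI. "
    "Your role is to respond in the following structured format:\n\n"
    "<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>"
)


def build_messages(user_question: str) -> list:
    """Pair the recommended system prompt with a user turn.

    The result is ready to pass to tokenizer.apply_chat_template.
    """
    return [
        {"role": "system", "content": RECOMMENDED_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]


messages = build_messages("What is 2x+3=4")
```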