# RL Optimized Chat Model

This repository contains an RL-optimized chat model based on TinyLlama/TinyLlama-1.1B-Chat-v1.0.
## Model Description
This model uses a DQN (Deep Q-Network) agent to select the best prompting strategy for a base language model.
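The selection step can be sketched as follows. The three prompt templates below are placeholders for illustration (the actual strategies live in the repository's source); the agent's learned Q-values score each strategy for the current state, and an epsilon-greedy rule picks one:

```python
import random

# Hypothetical prompt templates -- stand-ins for the repo's real strategies.
PROMPT_TEMPLATES = [
    "{question}",                                # 0: plain pass-through
    "Think step by step.\n{question}",           # 1: chain-of-thought style
    "You are a helpful assistant.\n{question}",  # 2: persona-style preamble
]

def select_action(q_values, epsilon=0.1):
    """Epsilon-greedy choice over the agent's Q-value estimates."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.2, 0.9, 0.4]  # example Q-network output for one state
action = select_action(q_values, epsilon=0.0)
prompt = PROMPT_TEMPLATES[action].format(question="What is RL?")
```

With `epsilon=0.0` the choice is purely greedy, which is the usual setting at inference time; a nonzero epsilon is used during training to keep exploring.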
## Usage

### Quick Start

1. Clone the repository:

   ```bash
   git clone https://huggingface.co/yqq1231231/rl-chat-tinyllama
   cd rl-chat-tinyllama
   ```

2. Install the dependencies:

   ```bash
   pip install torch transformers numpy
   ```

3. Run the chat script:

   ```bash
   python chat.py
   ```
### Advanced Usage

You can also use the model directly in your own code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from src.models.rl_agent import DQNAgent

# 1. Load the RL agent
agent_config = {
    'state_dim': 1,
    'action_dim': 3,
}
agent = DQNAgent(agent_config)
agent.load("best_model.pt")

# 2. Load the base language model
base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Now you can use both together: the agent selects a prompting strategy,
# and the base model generates the response.
```
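At inference time the two pieces can be wired together along these lines. This is a hedged sketch: `respond`, the template list, and the stub `generate` are illustrative, not the repository's actual API (see `chat.py` for the real loop):

```python
# Hypothetical glue code: the agent scores the strategies, the best one's
# template wraps the user question, and the wrapped prompt goes to the LM.
TEMPLATES = [
    "{question}",                                # plain pass-through
    "Think step by step.\n{question}",           # chain-of-thought style
    "You are a helpful assistant.\n{question}",  # persona-style preamble
]

def respond(q_values, question, generate):
    # Greedy action selection at inference time.
    action = max(range(len(q_values)), key=lambda a: q_values[a])
    prompt = TEMPLATES[action].format(question=question)
    return generate(prompt)

# With the real models, `generate` would tokenize the prompt and call
# model.generate(); a stub keeps the example self-contained.
reply = respond([0.1, 0.8, 0.3], "What is RL?", generate=lambda p: p.upper())
```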
## Training Method

This model was trained with reinforcement learning to optimize prompt selection: the DQN algorithm uses reward signals derived from response quality to guide the agent toward better prompting strategies for the base model.
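As a rough illustration of the reward-driven update, here is a tabular Q-learning step in a toy single-state setting. The learning rate and discount are assumed values; the repository's `DQNAgent` replaces the table with a neural network, but the Bellman-style target is the same idea:

```python
# Illustrative Q-learning update. `alpha` and `gamma` are assumed
# hyperparameters, not values taken from the repository.
n_states, n_actions = 1, 3          # toy setting with 3 prompting strategies
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.1, 0.99            # assumed learning rate / discount factor

def update(state, action, reward, next_state):
    """Move Q(state, action) toward the reward-plus-bootstrap target."""
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])

# One step: strategy 1 produced a well-rated response, so its value rises.
update(state=0, action=1, reward=1.0, next_state=0)
```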