
# RL Optimized Chat Model

This repository contains an RL-optimized chat model based on TinyLlama/TinyLlama-1.1B-Chat-v1.0.

## Model Description

This model uses a DQN (Deep Q-Network) agent to select the best prompting strategy for a base language model.
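Concretely, strategy selection with a DQN amounts to scoring each candidate prompting strategy with a Q-network and picking the highest-scoring one. The sketch below is illustrative only: the strategy names and network sizes are assumptions, not the repository's actual definitions (only the `state_dim: 1`, `action_dim: 3` shapes are taken from the config shown under Advanced Usage).

```python
import torch
import torch.nn as nn

# Hypothetical strategy list -- the real strategies live in the repo's code.
STRATEGIES = ["direct", "chain-of-thought", "role-play"]

class QNetwork(nn.Module):
    """Tiny Q-network: maps a 1-dim state to one Q-value per strategy."""
    def __init__(self, state_dim=1, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim)
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.tensor([[0.5]])               # e.g. a feature of the conversation
action = q_net(state).argmax(dim=1).item()  # greedy strategy choice
print(STRATEGIES[action])
```

At inference time the chosen index is then mapped to a prompt template for the base model.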

## Usage

### Quick Start

1. Clone the repository:

   ```bash
   git clone https://huggingface.co/yqq1231231/rl-chat-tinyllama
   cd rl-chat-tinyllama
   ```

2. Install dependencies:

   ```bash
   pip install torch transformers numpy
   ```

3. Run the chat script:

   ```bash
   python chat.py
   ```

### Advanced Usage

You can also use the model directly in your code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from src.models.rl_agent import DQNAgent

# 1. Load the RL agent
agent_config = {
    'state_dim': 1,
    'action_dim': 3,
}
agent = DQNAgent(agent_config)
agent.load("best_model.pt")

# 2. Load the base language model
base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Now you can use both models together!
```
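One way the two pieces might be wired together is to map the agent's chosen action index to a prompt template and send the result to the base model. The templates and the `build_prompt` helper below are hypothetical stand-ins for illustration; the actual mapping lives in the repository's source.

```python
# Hypothetical prompt templates -- one per agent action.
PROMPT_TEMPLATES = [
    "{question}",                                   # action 0: ask directly
    "Think step by step, then answer: {question}",  # action 1: chain of thought
    "You are a helpful expert. {question}",         # action 2: persona framing
]

def build_prompt(action: int, question: str) -> str:
    """Turn the agent's chosen action into the prompt sent to the base model."""
    return PROMPT_TEMPLATES[action].format(question=question)

prompt = build_prompt(1, "What is 2 + 2?")
messages = [{"role": "user", "content": prompt}]
# With the tokenizer and model loaded as above:
# text = tokenizer.apply_chat_template(messages, tokenize=False,
#                                      add_generation_prompt=True)
# outputs = model.generate(**tokenizer(text, return_tensors="pt"))
print(prompt)
```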

## Training Method

This model was trained with reinforcement learning: a DQN agent learns from reward signals to select the prompting strategy that yields better responses from the base model.
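The card does not publish the reward function or hyperparameters, but a single generic DQN update step on one transition, assuming a scalar response-quality reward, can be sketched as:

```python
import torch
import torch.nn as nn

# Minimal single DQN update step (generic sketch; the repo's actual reward
# function and hyperparameters are assumptions here, not published values).
q_net = nn.Linear(1, 3)          # stand-in Q-network: state -> 3 Q-values
target_net = nn.Linear(1, 3)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

# One transition: (state, action, reward, next_state)
state = torch.tensor([[0.2]])
action = torch.tensor([1])       # strategy the agent picked
reward = torch.tensor([0.7])     # e.g. a response-quality score
next_state = torch.tensor([[0.4]])

# Q(s, a) for the action actually taken
q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
# Bellman target: r + gamma * max_a' Q_target(s', a')
with torch.no_grad():
    q_target = reward + gamma * target_net(next_state).max(dim=1).values
loss = nn.functional.mse_loss(q_pred, q_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice transitions are drawn from a replay buffer and the target network is refreshed periodically; this sketch shows only the core Bellman update.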
