
# RL Optimized Chat Model

This repository contains an RL-optimized chat model based on TinyLlama/TinyLlama-1.1B-Chat-v1.0.

## Model Description

This model uses a DQN (Deep Q-Network) agent to select the best prompting strategy for a base language model.
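Concretely, strategy selection with a DQN amounts to scoring each candidate prompting strategy with a Q-network and picking the highest-scoring one. The sketch below is illustrative only: the strategy names and network sizes are assumptions, not the repository's actual definitions (only the `state_dim: 1`, `action_dim: 3` shapes are taken from the config shown under Advanced Usage).

```python
import torch
import torch.nn as nn

# Hypothetical strategy list -- the real strategies live in the repo's code.
STRATEGIES = ["direct", "chain-of-thought", "role-play"]

class QNetwork(nn.Module):
    """Tiny Q-network: maps a 1-dim state to one Q-value per strategy."""
    def __init__(self, state_dim=1, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim)
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.tensor([[0.5]])               # e.g. a feature of the conversation
action = q_net(state).argmax(dim=1).item()  # greedy strategy choice
print(STRATEGIES[action])
```

At inference time the chosen index is then mapped to a prompt template for the base model.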

## Usage

### Quick Start

1. Clone the repository:

   ```bash
   git clone https://huggingface.co/yqq1231231/rl-chat-tinyllama
   cd rl-chat-tinyllama
   ```

2. Install dependencies:

   ```bash
   pip install torch transformers numpy
   ```

3. Run the chat script:

   ```bash
   python chat.py
   ```

### Advanced Usage

You can also use the model directly in your code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from src.models.rl_agent import DQNAgent

# 1. Load the RL agent
agent_config = {
    'state_dim': 1,
    'action_dim': 3,
}
agent = DQNAgent(agent_config)
agent.load("best_model.pt")

# 2. Load the base language model
base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Now you can use both models together!
```
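One way the two pieces might be wired together is to map the agent's chosen action index to a prompt template and send the result to the base model. The templates and the `build_prompt` helper below are hypothetical stand-ins for illustration; the actual mapping lives in the repository's source.

```python
# Hypothetical prompt templates -- one per agent action.
PROMPT_TEMPLATES = [
    "{question}",                                   # action 0: ask directly
    "Think step by step, then answer: {question}",  # action 1: chain of thought
    "You are a helpful expert. {question}",         # action 2: persona framing
]

def build_prompt(action: int, question: str) -> str:
    """Turn the agent's chosen action into the prompt sent to the base model."""
    return PROMPT_TEMPLATES[action].format(question=question)

prompt = build_prompt(1, "What is 2 + 2?")
messages = [{"role": "user", "content": prompt}]
# With the tokenizer and model loaded as above:
# text = tokenizer.apply_chat_template(messages, tokenize=False,
#                                      add_generation_prompt=True)
# outputs = model.generate(**tokenizer(text, return_tensors="pt"))
print(prompt)
```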

## Training Method

This model was trained with reinforcement learning: a DQN agent learns from reward signals to select the prompting strategy that yields better responses from the base model.
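The card does not publish the reward function or hyperparameters, but a single generic DQN update step on one transition, assuming a scalar response-quality reward, can be sketched as:

```python
import torch
import torch.nn as nn

# Minimal single DQN update step (generic sketch; the repo's actual reward
# function and hyperparameters are assumptions here, not published values).
q_net = nn.Linear(1, 3)          # stand-in Q-network: state -> 3 Q-values
target_net = nn.Linear(1, 3)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

# One transition: (state, action, reward, next_state)
state = torch.tensor([[0.2]])
action = torch.tensor([1])       # strategy the agent picked
reward = torch.tensor([0.7])     # e.g. a response-quality score
next_state = torch.tensor([[0.4]])

# Q(s, a) for the action actually taken
q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
# Bellman target: r + gamma * max_a' Q_target(s', a')
with torch.no_grad():
    q_target = reward + gamma * target_net(next_state).max(dim=1).values
loss = nn.functional.mse_loss(q_pred, q_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice transitions are drawn from a replay buffer and the target network is refreshed periodically; this sketch shows only the core Bellman update.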
