Qwen3-4B Outreach Agent — Prompt-Internalized (Stage 4)

RAS1981/Qwen3-4B-outreach-stage4

A compact Russian real-estate operator model trained through a 5-stage curriculum (CPT → S1 → S2 → S3 → S4) to fully internalize long, complex system prompts. This final Stage-4 checkpoint requires zero prompt scaffolding at inference and delivers fast time to first token (TTFT), stable multi-turn reasoning, and consistent sales-oriented behavior.


Model Description

Stage 4 is the final distilled checkpoint in a progressive prompt-internalization pipeline. The model acts as a Russian real-estate qualification agent, trained to:

  • Greet users, set conversation tone, and collect key parameters (район/district, бюджет/budget, сроки/timeline).
  • Handle noisy/fragmented queries, objections, misclicks, and corrections.
  • Maintain long, multi-turn conversation state internally (no system prompt needed).
  • Direct the client toward booking a call, meeting, or viewing.
  • Stay within business rules and safely decline out-of-scope topics.
  • Produce structured, concise operator-style messages (bullet points, quick summaries).
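The qualification flow described above (collect the key parameters, then steer toward a booking) can be sketched as a small slot-tracking helper. This is a hypothetical illustration of the state the model keeps internally; the class and field names are assumptions, not part of the model's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the qualification slots the agent tracks internally.
@dataclass
class QualificationState:
    district: Optional[str] = None   # район
    budget: Optional[str] = None     # бюджет
    timeline: Optional[str] = None   # сроки

    def missing(self) -> list[str]:
        """Slots the operator still needs to ask about, in field order."""
        return [name for name, value in vars(self).items() if value is None]

    def ready_for_cta(self) -> bool:
        """All key parameters collected -> steer toward a call/meeting/viewing."""
        return not self.missing()

state = QualificationState(district="Москва, ЮЗАО")
print(state.missing())         # budget and timeline still unknown
print(state.ready_for_cta())
```

In a real deployment this bookkeeping lives inside the model's internalized script; the sketch only makes the slot-filling logic explicit.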

The model has been aligned across 4 SFT stages and 1 domain-pretrain stage to operate fully autonomously.


Training Stages Overview

(No hyperparameters disclosed — only conceptual behavior.)

Stage 0 — Continued Pretrain (Domain CPT)

Large corpus of Russian real-estate text; builds robust domain vocabulary, patterns, and document-level reasoning.

Stage 1 — Full 41k Prompt

Full system template + easy queries; teaches tone, etiquette, greetings, qualification patterns, and safety rules.

Stage 2 — Core 15k Rules

Mid-level compression; model begins internalizing main scripts, question ordering, CTAs, and objection handling.

Stage 3 — 3–5k Summary Prompt

High-compression stage; strengthens behavior even when template is short or partially omitted.

Stage 4 — Zero Prompt

Final distilled agent; fully internalized scripts, tone, policy, and flow — works with query-only inference.
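The progressive compression across Stages 1–4 can be sketched as a data-preparation helper that pairs each training query with a stage-appropriate system prompt (full template → compressed rules → short summary → none). The token budgets below mirror the stage names; character truncation stands in for the real, undisclosed compression process.

```python
# Hypothetical sketch of stage-wise prompt internalization. Budgets are taken
# from the stage names above; real compression is not simple truncation.
FULL_PROMPT = "..."  # placeholder for the full 41k system template

STAGE_PROMPT_BUDGET = {1: 41_000, 2: 15_000, 3: 4_000, 4: 0}

def build_messages(stage: int, user_query: str, system_prompt: str) -> list[dict]:
    """Build a chat-format training example for the given curriculum stage."""
    budget = STAGE_PROMPT_BUDGET[stage]
    messages = []
    if budget > 0:
        # Earlier stages see (progressively shorter) system scaffolding.
        messages.append({"role": "system", "content": system_prompt[:budget]})
    # Stage 4 trains on query-only inputs: no system message at all.
    messages.append({"role": "user", "content": user_query})
    return messages

print(build_messages(4, "здравствуйте", FULL_PROMPT))
```

The point of the sketch: by Stage 4 the supervision signal carries no prompt at all, so the behavior must come entirely from the weights.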


Recommended Inference Settings

  • Temperature: 0.1 (max stability; operator tone)

  • Top-p: 1.0

  • Max tokens: 2000

  • System prompt (optional but recommended):

    <system_instructions>
    Вы Александр Оператор по недвижимости в Центр Подбора Новостроек 
    Ваша миссия определить квалифицированных потенциальных клиентов 
    для приобретения новостроек и обеспечить их связь со специализированными консультантами
    </system_instructions>
    

Even though Stage-4 does not require a system prompt, this short header maximizes consistency. (In English, it reads roughly: "You are Alexander, a real-estate operator at the New-Build Selection Center. Your mission is to identify qualified prospective buyers of new-build apartments and connect them with specialized consultants.")
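The recommended settings above can be bundled into a small helper that builds the keyword arguments for an OpenAI-compatible chat request. The helper name and structure are illustrative; only the model id, the header text, and the decoding values come from this card.

```python
# Recommended decoding settings from this card, bundled into request kwargs.
# `request_kwargs` is an illustrative helper, not part of any API.
SYSTEM_HEADER = (
    "<system_instructions>"
    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
    "Ваша миссия определить квалифицированных потенциальных клиентов "
    "для приобретения новостроек и обеспечить их связь "
    "со специализированными консультантами"
    "</system_instructions>"
)

def request_kwargs(user_text: str, use_system_header: bool = True) -> dict:
    """Build kwargs for client.chat.completions.create(...)."""
    messages = []
    if use_system_header:
        messages.append({"role": "system", "content": SYSTEM_HEADER})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": "RAS1981/Qwen3-4B-outreach-stage4",
        "messages": messages,
        "temperature": 0.1,   # max stability; operator tone
        "top_p": 1.0,
        "max_tokens": 2000,
    }
```

Usage: `client.chat.completions.create(**request_kwargs("здравствуйте"))`, or pass `use_system_header=False` to exercise the Stage-4 query-only mode.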


🚀 Quickstart: Run with vLLM (recommended)

1. Install environment

apt update && apt install -y python3-pip git  # Basics (30s)
pip install uv                                # Fast resolver (recommended)
uv venv main --python 3.12                    # Create isolated env
source main/bin/activate                      # Activate venv
uv pip install vllm                           # Install vLLM 0.11.0+ (2–5 min)

2. Serve the model with vLLM

vllm serve RAS1981/Qwen3-4B-outreach-stage4 \
  --max-model-len 8000 \
  --dtype auto \
  --enable-chunked-prefill \
  --max-num-batched-tokens 4000 \
  --port 8000 \
  --host 0.0.0.0 \
  --api-key token-abc123 \
  --trust-remote-code \
  --enforce-eager \
  --download-dir /tmp/hf_cache/models

vLLM will expose an OpenAI-compatible endpoint: http://<server>:8000/v1


🧪 TTFT Test (OpenAI client compatible)

import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1", 
    api_key="token-abc123"
)

def measure_ttft():
    start = time.perf_counter()
    first_seen = False

    response = client.chat.completions.create(
        model="RAS1981/Qwen3-4B-outreach-stage4",
        messages=[
            {
                "role": "system",
                "content": (
                    "<system_instructions>"
                    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
                    "Ваша миссия определить квалифицированных потенциальных клиентов "
                    "для приобретения новостроек и обеспечить их связь "
                    "со специализированными консультантами"
                    "</system_instructions>"
                )
            },
            {"role": "user", "content": "здравствуйте"}
        ],
        max_tokens=2000,
        temperature=0.1,
        stream=True,
    )

    for chunk in response:  # stream chunks as they arrive
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        t = (time.perf_counter() - start) * 1000

        if text and not first_seen:
            print(f">>> TTFT: {t:.0f} ms\n")
            first_seen = True

        if text:
            print(text, end="", flush=True)

if __name__ == "__main__":
    measure_ttft()
