Qwen3-4B Outreach Agent — Prompt-Internalized (Stage 4)
RAS1981/Qwen3-4B-outreach-stage4
A compact Russian real-estate operator model trained through a 5-stage curriculum (CPT → S1 → S2 → S3 → S4) to fully internalize long, complex system prompts. This final Stage-4 version is designed to run without prompt scaffolding at inference, which keeps inputs short and time to first token (TTFT) low, and it targets stable multi-turn reasoning and consistent sales-oriented behavior.
Model Description
Stage 4 is the final distilled checkpoint in a progressive prompt-internalization pipeline. The model acts as a Russian real-estate qualification agent, trained to:
- Greet users, set conversation tone, and collect key parameters: район (district), бюджет (budget), сроки (timeframe).
- Handle noisy/fragmented queries, objections, misclicks, corrections.
- Maintain long, multi-turn conversation state internally (no system prompt needed).
- Direct the client toward booking a call, meeting, or viewing.
- Stay within business rules and safely decline out-of-scope topics.
- Produce structured, concise operator-style messages (bullet points, quick summaries).
The model has been aligned across 4 SFT stages and 1 domain-pretrain stage to operate fully autonomously.
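Multi-turn behavior still depends on the client resending the conversation so far with every request, because an OpenAI-compatible chat endpoint keeps no state between calls. The following is a minimal sketch of such a history buffer; the `Conversation` class, its method names, and the `max_turns` cap are hypothetical and not part of the model's release.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Client-side chat history (hypothetical helper, not shipped with the model)."""

    max_turns: int = 20  # assumed cap on retained user/assistant pairs
    messages: list = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})
        self._trim()

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})
        self._trim()

    def _trim(self) -> None:
        # Drop the oldest user/assistant pair at a time, so the history
        # always starts with a user message and stays within the cap.
        limit = self.max_turns * 2
        while len(self.messages) > limit:
            del self.messages[:2]

    def payload(self) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self.messages)
```

The list returned by `payload()` can be passed as `messages` to a chat-completions call, with the reply appended through `add_assistant` before the next user turn.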
Training Stages Overview
(No hyperparameters disclosed — only conceptual behavior.)
Stage 0 — Continued Pretrain (Domain CPT)
Large corpus of Russian real-estate text; builds robust domain vocabulary, patterns, and document-level reasoning.
Stage 1 — Full 41k Prompt
Full system template + easy queries; teaches tone, etiquette, greetings, qualification patterns, and safety rules.
Stage 2 — Core 15k Rules
Mid-level compression; model begins internalizing main scripts, question ordering, CTAs, and objection handling.
Stage 3 — 3–5k Summary Prompt
High-compression stage; strengthens behavior even when template is short or partially omitted.
Stage 4 — Zero Prompt
Final distilled agent; fully internalized scripts, tone, policy, and flow — works with query-only inference.
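One way to picture the curriculum is as the same user query paired with a progressively shorter system prompt, ending with none at all. The sketch below is illustrative only: the `Stage` record, the `build_sample` helper, and the placeholder prompt texts are hypothetical, and the card does not disclose the actual data format or how the prompts were compressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    prompt_label: Optional[str]  # size label quoted in the card; None means no prompt


# Stage 0 (domain CPT) is omitted because it trains on raw text, not chats.
STAGES = [
    Stage("S1", "41k"),
    Stage("S2", "15k"),
    Stage("S3", "3-5k"),
    Stage("S4", None),
]


def build_sample(stage: Stage, prompts: dict, query: str, answer: str) -> list:
    """Assemble one chat-format sample for a stage (assumed format)."""
    messages = []
    if stage.prompt_label is not None:
        # `prompts` maps a stage name to that stage's system prompt text.
        messages.append({"role": "system", "content": prompts[stage.name]})
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": answer})
    return messages
```

Under this assumption, a Stage 4 sample contains only the user query and the target reply, which is what query-only inference looks like at serving time.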
Recommended Inference Settings
- Temperature: 0.1 (max stability; operator tone)
- Top-p: 1.0
- Max tokens: 2000
System prompt (optional but recommended):
<system_instructions> Вы Александр Оператор по недвижимости в Центр Подбора Новостроек Ваша миссия определить квалифицированных потенциальных клиентов для приобретения новостроек и обеспечить их связь со специализированными консультантами </system_instructions>
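In English, the header reads roughly: "You are Alexander, a real-estate operator at Центр Подбора Новостроек (New-Build Selection Center). Your mission is to identify qualified prospective buyers of new-build housing and connect them with specialist consultants." Stage 4 does not require a system prompt, but this short header helps keep the persona and tone consistent across sessions.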
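The settings above can be collected into a single request builder, sketched below. The `build_request` function and its `use_header` flag are hypothetical names; the parameter values, the model identifier, and the header text are taken from this card.

```python
SYSTEM_HEADER = (
    "<system_instructions> "
    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
    "Ваша миссия определить квалифицированных потенциальных клиентов "
    "для приобретения новостроек и обеспечить их связь "
    "со специализированными консультантами "
    "</system_instructions>"
)


def build_request(user_text: str, use_header: bool = True) -> dict:
    """Build chat-completions keyword arguments with the recommended settings."""
    messages = []
    if use_header:
        # Optional short persona header; Stage 4 also accepts query-only input.
        messages.append({"role": "system", "content": SYSTEM_HEADER})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": "RAS1981/Qwen3-4B-outreach-stage4",
        "messages": messages,
        "temperature": 0.1,
        "top_p": 1.0,
        "max_tokens": 2000,
    }
```

With an OpenAI-compatible client, the result can be unpacked directly, for example `client.chat.completions.create(**build_request("здравствуйте"))`.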
🚀 Quickstart: Run with vLLM (recommended)
1. Install environment
apt update && apt install -y python3-pip git # Basics (30s)
pip install uv # Fast resolver (recommended)
uv venv main --python 3.12 # Create isolated env
source main/bin/activate # Activate venv
uv pip install vllm # Install vLLM 0.11.0+ (2–5 min)
2. Serve the model with vLLM
vllm serve RAS1981/Qwen3-4B-outreach-stage4 \
  --max-model-len 8000 \
  --dtype auto \
  --enable-chunked-prefill \
  --max-num-batched-tokens 4000 \
  --port 8000 \
  --host 0.0.0.0 \
  --api-key token-abc123 \
  --trust-remote-code \
  --enforce-eager \
  --download-dir /tmp/hf_cache/models
vLLM will expose an OpenAI-compatible endpoint:
http://<server>:8000/v1
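Before sending a full chat request, it can help to confirm that the server is reachable and that the key is accepted. The sketch below queries the `/v1/models` listing route with the bearer token passed to `--api-key`; the helper names `models_request` and `list_model_ids` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the model listing route."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def list_model_ids(base_url: str, api_key: str, timeout: float = 10.0) -> list:
    """Return the model ids reported by the server (performs a network call)."""
    request = models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [entry["id"] for entry in body.get("data", [])]
```

Calling `list_model_ids("http://localhost:8000/v1", "token-abc123")` against the server started above should include `RAS1981/Qwen3-4B-outreach-stage4` in the returned list; substitute the real host for `localhost` when the server runs elsewhere.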
🧪 TTFT Test (OpenAI client compatible)
import time

from openai import OpenAI

# Client for the local vLLM server; the key must match --api-key above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)


def measure_ttft():
    start = time.perf_counter()
    first_seen = False
    response = client.chat.completions.create(
        model="RAS1981/Qwen3-4B-outreach-stage4",
        messages=[
            {
                "role": "system",
                "content": (
                    "<system_instructions>"
                    "Вы Александр Оператор по недвижимости в Центр Подбора Новостроек "
                    "Ваша миссия определить квалифицированных потенциальных клиентов "
                    "для приобретения новостроек и обеспечить их связь "
                    "со специализированными консультантами"
                    "</system_instructions>"
                ),
            },
            {"role": "user", "content": "здравствуйте"},
        ],
        max_tokens=2000,
        temperature=0.1,
        stream=True,
    )
    for chunk in response:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text and not first_seen:
            # TTFT: elapsed milliseconds until the first non-empty content chunk.
            t = (time.perf_counter() - start) * 1000
            print(f">>> TTFT: {t:.0f} ms\n")
            first_seen = True
        if text:
            print(text, end="", flush=True)


if __name__ == "__main__":
    measure_ttft()
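A single streamed request gives one noisy sample, so it can be worth repeating the measurement and summarizing the results. The helper below is a sketch: `summarize_ttft` is a hypothetical name, and it assumes `measure_ttft` has been adapted to return the TTFT in milliseconds instead of only printing it.

```python
import statistics


def summarize_ttft(samples_ms: list) -> dict:
    """Summarize repeated TTFT measurements given in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one TTFT sample")
    ordered = sorted(samples_ms)
    return {
        "n": len(ordered),
        "min_ms": ordered[0],
        "median_ms": statistics.median(ordered),
        "max_ms": ordered[-1],
    }
```

The first request after server start usually includes warm-up cost, so discarding it before calling `summarize_ttft` tends to give a more representative median.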
Model tree for RAS1981/Qwen3-4B-outreach-stage4
Base model
RAS1981/Qwen3-4B-outreach-stage0