---
license: mit
pipeline_tag: reinforcement-learning
tags:
- llm
- text-generation
- reinforcement-learning
- reasoning
- language-model
- transformers
- causal-lm
- instruction-following
- rlhf
- alignment
- open-source
- chat-model
---
|
|
|
|
|
# Arctic AI – the most accurate neural network with up to 10B parameters created in Russia (English/Russian text) |
|
|
|
|
|
|
|
|
# contact: Twitter: https://x.com/BogUnusov Telegram: @Quloneco email: qulone.corpo@gmail.com |
|
|
|
|
|
🧠 Adaptive Reasoning Loop with Critic-Driven GMPo and Intuition Feedback |
|
|
Arctic AI is trained with a custom reinforcement learning system that extends classical RLHF and builds on GMPO (Generative Model Policy Optimization) while departing from its standard form. It uses a reasoning-centered pipeline we call GMPo (Generate–Match–Plan–Optimize), augmented with a Critic Loop and a novel intuition-based meta-signal.
|
|
|
|
|
This design targets more explainable, structurally grounded reasoning via RL updates, optimized with KL-divergence regularization and guided feedback from a Critic module. |
|
|
|
|
|
🔁 GMPo Pipeline (as Structured Policy) |
|
|
The system is still based on GMPO (Generative Model Policy Optimization); the GMPo abbreviation names the stages added on top of it. The agent processes each task through four internal reasoning stages:
|
|
|
|
|
- G (Generate): Produce an initial draft $a_0 \sim \pi_\theta(a \mid s)$
- M (Match): Compare the answer's logic and format against input constraints
- P (Plan): Devise a correction or refinement plan $p \sim \pi_\theta^{\text{plan}}(p \mid a_0, s)$
- O (Optimize): Apply improvements to produce the final answer $a^*$

This forms a structured trajectory $\tau = \{a_0, p, a^*\}$, considered as the policy rollout.
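The four stages can be read as one rollout function. The sketch below is only an illustration of how the trajectory is assembled, not the training code: `gmpo_rollout`, the `Trajectory` class, and the four `toy_*` callables are hypothetical names, and passing the Match findings into the Plan stage is an assumption about how the stages connect.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """Structured rollout tau = {a_0, p, a_star}."""
    a0: str      # initial draft from the Generate stage
    plan: str    # refinement plan from the Plan stage
    a_star: str  # final answer from the Optimize stage


def gmpo_rollout(
    state: str,
    generate: Callable[[str], str],
    match: Callable[[str, str], List[str]],
    plan: Callable[[str, str, List[str]], str],
    optimize: Callable[[str, str, str], str],
) -> Trajectory:
    a0 = generate(state)             # G: draw a draft a_0 from the policy
    issues = match(state, a0)        # M: compare the draft with the input constraints
    p = plan(state, a0, issues)      # P: draw a plan p given a_0 and s
    a_star = optimize(state, a0, p)  # O: apply the plan to obtain a*
    return Trajectory(a0=a0, plan=p, a_star=a_star)


# Toy stand-ins (hypothetical): the task requires an upper-case answer.
def toy_generate(state: str) -> str:
    return "paris"


def toy_match(state: str, a0: str) -> List[str]:
    return [] if a0.isupper() else ["not upper-case"]


def toy_plan(state: str, a0: str, issues: List[str]) -> str:
    return "upper-case the draft" if issues else "keep the draft"


def toy_optimize(state: str, a0: str, p: str) -> str:
    return a0.upper() if p.startswith("upper") else a0


tau = gmpo_rollout(
    "Capital of France, in capitals?",
    toy_generate, toy_match, toy_plan, toy_optimize,
)
print(tau)
```

In a real system each callable would be a prompt to the same policy, so the whole trajectory can be scored and optimized as a single rollout.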
|
|
|
|
|
🧾 Critic-Driven Feedback (External Evaluator) |
|
|
Unlike traditional GMPO (which omits a critic), our system features a dedicated Critic module $C_\phi$ that:

- Assigns a scalar reward $r$ based on correctness and reasoning quality
- Evaluates plan structure and logical coherence
- Tracks divergence from prior behaviors (policy shifts)
- Outputs metadata $\xi$ for error typology and planning quality

Critic returns:

$$
r = C_\phi(a^*, R), \qquad \xi = \{\text{error\_type},\ \text{plan\_quality},\ \text{intuition\_gap}\}
$$

|
|
🧠 New Signal: Intuition Alignment |
|
|
A novel parameter is introduced: intuition. |
|
|
|
|
|
The model produces a self-estimated confidence or intuition score $I_{\text{model}} \in [0, 1]$.

The Critic compares this against true reward $r$ to compute the intuition gap:

$$
\Delta I = \left| I_{\text{model}} - r \right|
$$

This serves as a second-order signal, answering the question:

“Did the model correctly estimate how well it was reasoning?”

The goal is to minimize $\Delta I$, which indirectly promotes metacognitive awareness in the model’s reasoning.
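A small numeric sketch may make the intuition gap concrete. It assumes the reward is normalized to [0, 1] so that it is comparable with the intuition score, and it uses a toy exact-match critic. The `CriticFeedback` class, `toy_critic`, and the `error_type` labels are hypothetical and are not part of the released code.

```python
from dataclasses import dataclass


@dataclass
class CriticFeedback:
    reward: float         # scalar reward r, assumed to lie in [0, 1]
    error_type: str       # part of the metadata xi
    plan_quality: float   # part of the metadata xi
    intuition_gap: float  # Delta I = |I_model - r|


def intuition_gap(i_model: float, reward: float) -> float:
    """Absolute difference between self-estimated confidence and actual reward."""
    if not 0.0 <= i_model <= 1.0:
        raise ValueError("I_model must lie in [0, 1]")
    return abs(i_model - reward)


def toy_critic(answer: str, reference: str, i_model: float,
               plan_quality: float = 1.0) -> CriticFeedback:
    # Toy reward (assumption): 1.0 for an exact match with the reference, else 0.0.
    reward = 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0
    return CriticFeedback(
        reward=reward,
        error_type="none" if reward == 1.0 else "wrong_answer",
        plan_quality=plan_quality,
        intuition_gap=intuition_gap(i_model, reward),
    )


overconfident = toy_critic("Lyon", "Paris", i_model=0.9)   # wrong but confident
calibrated = toy_critic("Paris", "Paris", i_model=0.75)    # right and fairly confident
print(overconfident.intuition_gap, calibrated.intuition_gap)  # 0.9 0.25
```

The confidently wrong answer gets the larger gap, which is the behavior the signal is meant to penalize.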
|
|
|
|
|
⚖️ Policy Optimization with KL-Divergence |
|
|
Policy updates are driven by a KL-regularized RL objective:

$$
L(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \frac{\pi_\theta(\tau)}{\pi_{\theta_{\text{old}}}(\tau)} \cdot r(\tau) - \beta \cdot D_{\mathrm{KL}}\left[ \pi_\theta(\cdot \mid s) \,\|\, \pi_{\theta_{\text{old}}}(\cdot \mid s) \right] \right]
$$

Where:

- $\theta$: LoRA parameters only (base model is frozen)
- $\beta$: dynamic KL penalty coefficient
- $D_{\mathrm{KL}}$: ensures conservative updates (staying close to stable baseline)
- $r$: reward from critic, including task score, planning quality, and intuition consistency
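The objective can be estimated from sampled trajectories as in the sketch below. This is a plain-Python illustration under stated assumptions, not the training implementation: the function names are hypothetical, per-trajectory log-probabilities and KL terms are passed in as numbers, and `adapt_beta` borrows the adaptive KL-penalty rule from PPO as one possible reading of "dynamic", since the text does not say how $\beta$ is adjusted.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL[p || q] for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    rewards: Sequence[float],
    kl_terms: Sequence[float],
    beta: float,
) -> float:
    """Monte Carlo estimate of L(theta), averaged over sampled trajectories."""
    total = 0.0
    for lp_new, lp_old, r, kl in zip(logp_new, logp_old, rewards, kl_terms):
        ratio = math.exp(lp_new - lp_old)  # pi_theta(tau) / pi_theta_old(tau)
        total += ratio * r - beta * kl
    return total / len(rewards)


def adapt_beta(beta: float, observed_kl: float, target_kl: float) -> float:
    """PPO-style adaptive penalty (an assumption, not confirmed by the source)."""
    if observed_kl > 1.5 * target_kl:
        return beta * 2.0  # policy moved too far: penalize more
    if observed_kl < target_kl / 1.5:
        return beta / 2.0  # policy barely moved: penalize less
    return beta


# If the new and old policies coincide, the ratio is 1 and the KL term is 0,
# so the objective reduces to the mean reward.
same = [0.5, 0.5]
logp = [math.log(0.5), math.log(0.5)]
baseline = kl_regularized_objective(
    logp, logp, rewards=[1.0, 0.0],
    kl_terms=[kl_divergence(same, same)] * 2, beta=0.1,
)
shifted_kl = kl_divergence([0.9, 0.1], [0.5, 0.5])
print(baseline, shifted_kl, adapt_beta(0.1, observed_kl=0.05, target_kl=0.01))
```

A training loop would maximize this quantity with respect to the LoRA parameters only, for example by minimizing its negative with a gradient-based optimizer.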
|
|
|
|
|
🛠 LoRA-Only Adaptive Updates |
|
|
To ensure stable and efficient fine-tuning: |
|
|
|
|
|
Only LoRA adapters are updated. |
|
|
|
|
|
The main model remains untouched. |
|
|
|
|
|
This allows rapid iteration and safe deployment without catastrophic forgetting. |
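To illustrate why LoRA-only updates leave the base model intact, here is a dependency-free sketch of a single LoRA layer. The rank, scaling factor, and constant initialization are hypothetical values chosen for readability; the text does not give the actual adapter configuration, and real LoRA initializes `A` randomly.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Computes y = W x + (alpha / rank) * B (A x) with W frozen."""

    def __init__(self, weight: Matrix, rank: int, alpha: float):
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight       # frozen base weights, never updated
        self.scale = alpha / rank
        # Constant init keeps the example deterministic (assumption for the sketch).
        self.A = [[0.01] * in_dim for _ in range(rank)]
        # B starts at zero, so the adapter initially contributes nothing.
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameters(self) -> dict:
        # Only the adapter matrices are exposed to the optimizer.
        return {"A": self.A, "B": self.B}


layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
x = [1.0, 1.0]
before = layer.forward(x)  # B is zero, so this equals W x
layer.B[0][0] = 1.0        # stand-in for one optimizer step on the adapter
after = layer.forward(x)
print(before, after, layer.weight)
```

Because only `A` and `B` change, removing the adapter restores the original model exactly, which is what makes iteration cheap and rollback safe.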
|
|
|
|
|
✅ Summary |
|
|
| Component | Role |
|---|---|
| GMPo | Structured reasoning pipeline (Generate–Match–Plan–Optimize) |
| Critic Loop | Assigns reward and metadata, and evaluates policy divergence |
| KL Regularization | Keeps policy close to reference via $D_{\mathrm{KL}}$ penalty |
| Intuition Signal | Models self-estimated accuracy and compares it to true reward |
| Training Scope | Only LoRA weights updated; main model remains fixed |
|
|
|
|
|
This approach enables self-corrective, explainable, and meta-aware learning, pushing beyond standard RLHF and toward autonomous reasoning agents. |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/kl_critic_plot.png" width="600"/> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/lora_training_diagramab.png" width="600"/> |
|
|
</p> |
|
|
|
|
|
We use a reinforcement learning method based on a GMPo reasoning loop (Generate–Match–Plan–Optimize), where each step structures the model’s decision process. A separate Critic module evaluates the output, providing a scalar reward and analysis of reasoning quality, KL divergence, and a novel intuition metric—measuring how close the model’s confidence was to actual correctness. Only LoRA adapters are updated, using KL-regularized policy optimization to ensure stable learning. The same setup is applied to long, 1000-line prompt traces, where the model learns to reflect on structured hints and task sequences during training. |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/understanding_alignment_charta.png" width="600"/> |
|
|
</p> |
|
|
|
|
|
|
|
|
# MultiAgent with critic |
|
|
|
|
|
A multi-agent system has also been developed: five agents with different roles each produce a response to the same query. The critic then combines the best parts of these responses into a single answer that is improved by almost 2-3 times.
|
|
|
|
|
<pre> ```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import asyncio
import time
from typing import Dict, List, Any

# Memory-saving settings
torch.set_grad_enabled(False)
torch.backends.cuda.matmul.allow_tf32 = True

# Device check
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Meta-prompts for the agents
|
|
AGENT_PROMPTS = { |
|
|
    "analytical_agent": """You are an Advanced Analytical Intelligence Agent. Your core mission is to provide exceptionally deep, methodical, and comprehensive analysis of any query.
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If user requests specific code, documentation, or technical content, provide ONLY what they need without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
ANALYTICAL FRAMEWORK: |
|
|
- Break down complex problems into fundamental components |
|
|
- Apply systematic reasoning and logical progression |
|
|
- Consider multiple perspectives and potential edge cases |
|
|
- Provide evidence-based conclusions with clear reasoning chains |
|
|
- Identify patterns, correlations, and underlying principles |
|
|
- Anticipate potential challenges and propose solutions |
|
|
|
|
|
RESPONSE STRUCTURE: |
|
|
- Begin with core answer/solution |
|
|
- Support with detailed analysis when appropriate |
|
|
- Maintain clarity while preserving depth |
|
|
- Use precise terminology and avoid ambiguity""", |
|
|
|
|
|
"creative_agent": """You are a Master Creative Intelligence Agent with exceptional innovative thinking capabilities. Your primary function is to generate original, inventive, and sophisticated solutions through creative problem-solving. |
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If user requests specific code, documentation, or technical content, provide ONLY what they need without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
CREATIVE EXCELLENCE: |
|
|
- Generate multiple innovative approaches to problems |
|
|
- Think outside conventional boundaries and explore novel solutions |
|
|
- Combine disparate concepts to create unique insights |
|
|
- Develop creative analogies and metaphors for complex ideas |
|
|
- Propose unconventional but practical alternatives |
|
|
- Integrate artistic and technical thinking |
|
|
|
|
|
INNOVATION METHODOLOGY: |
|
|
- Challenge assumptions and traditional approaches |
|
|
- Explore interdisciplinary connections |
|
|
- Generate creative alternatives and improvements |
|
|
- Balance originality with practical applicability |
|
|
- Inspire breakthrough thinking while maintaining feasibility""", |
|
|
|
|
|
"technical_agent": """You are an Elite Technical Specialist Agent with deep expertise across all technical domains. Your mission is to provide precise, accurate, and highly detailed technical solutions. |
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If user requests specific code, documentation, or technical content, provide ONLY what they need without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
TECHNICAL MASTERY: |
|
|
- Provide exact specifications, implementations, and solutions |
|
|
- Ensure technical accuracy and best practices compliance |
|
|
- Offer optimization suggestions and performance considerations |
|
|
- Address security, scalability, and maintainability aspects |
|
|
- Include relevant technical details and parameters |
|
|
- Explain technical concepts with precision |
|
|
|
|
|
EXPERTISE AREAS: |
|
|
- Software engineering and architecture |
|
|
- System design and optimization |
|
|
- Database management and data structures |
|
|
- Network protocols and security |
|
|
- Performance tuning and debugging |
|
|
- Industry standards and best practices""", |
|
|
|
|
|
"strategic_agent": """You are a Supreme Strategic Intelligence Agent focused on high-level planning, decision-making, and long-term thinking. Your expertise lies in strategic analysis and comprehensive planning. |
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If user requests specific code, documentation, or technical content, provide ONLY what they need without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
STRATEGIC CAPABILITIES: |
|
|
- Develop comprehensive strategic frameworks |
|
|
- Analyze risks, opportunities, and potential outcomes |
|
|
- Create step-by-step implementation plans |
|
|
- Consider resource allocation and timeline management |
|
|
- Evaluate alternative strategies and trade-offs |
|
|
- Anticipate future scenarios and contingencies |
|
|
|
|
|
STRATEGIC THINKING: |
|
|
- Focus on long-term implications and sustainability |
|
|
- Balance multiple stakeholder interests |
|
|
- Identify critical success factors and dependencies |
|
|
- Provide actionable recommendations |
|
|
- Consider market dynamics and competitive landscape |
|
|
- Integrate tactical and strategic perspectives""", |
|
|
|
|
|
"research_agent": """You are an Advanced Research Intelligence Agent with exceptional information synthesis and knowledge integration capabilities. Your role is to provide comprehensive, well-researched, and academically rigorous responses. |
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If user requests specific code, documentation, or technical content, provide ONLY what they need without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
RESEARCH EXCELLENCE: |
|
|
- Synthesize information from multiple sources and domains |
|
|
- Provide comprehensive background and context |
|
|
- Identify key research findings and methodologies |
|
|
- Present balanced perspectives on complex topics |
|
|
- Cite relevant theories, principles, and frameworks |
|
|
- Validate information accuracy and reliability |
|
|
|
|
|
KNOWLEDGE INTEGRATION: |
|
|
- Connect interdisciplinary insights and findings |
|
|
- Identify knowledge gaps and research opportunities |
|
|
- Provide historical context and evolutionary perspectives |
|
|
- Analyze current trends and future directions |
|
|
- Support conclusions with evidence-based reasoning |
|
|
- Maintain scientific rigor and objectivity""" |
|
|
} |
|
|
|
|
|
# Prompt for the critic
|
|
CRITIC_PROMPT = """You are an Expert Critic and Synthesis Agent. Your mission is to analyze multiple responses and create the ultimate optimal answer by combining the best elements from each response. |
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- If the original query requested specific code, documentation, or technical content, provide ONLY what the user needs without additional explanations |
|
|
- Always respond in the SAME LANGUAGE as the original user query (Russian/English/etc.) |
|
|
- For code requests: provide clean, functional code only |
|
|
- For specific questions: give direct, precise answers |
|
|
|
|
|
SYNTHESIS METHODOLOGY: |
|
|
1. Analyze each agent response for: |
|
|
- Accuracy and correctness |
|
|
- Completeness and depth |
|
|
- Practical applicability |
|
|
- Innovation and creativity |
|
|
- Technical precision |
|
|
|
|
|
2. Identify the strongest elements from each response: |
|
|
- Most accurate technical details |
|
|
- Best creative solutions |
|
|
- Most comprehensive analysis |
|
|
- Most practical recommendations |
|
|
- Clearest explanations |
|
|
|
|
|
3. Synthesize the optimal response by: |
|
|
- Combining the best aspects from all responses |
|
|
- Eliminating redundancies and contradictions |
|
|
- Ensuring logical flow and coherence |
|
|
- Maintaining the highest quality standards |
|
|
- Preserving the most valuable insights |
|
|
|
|
|
4. Final optimization: |
|
|
- Verify technical accuracy |
|
|
- Ensure practical applicability |
|
|
- Maintain appropriate depth and clarity |
|
|
- Provide the most valuable response possible |
|
|
|
|
|
Create the ultimate response that represents the best synthesis of all agent contributions.""" |
|
|
|
|
|
class AsyncMultiAgentSystem:
    def __init__(self, model_name="liberalusa/LiberalMind_v1.5"):
        self.model_name = model_name
        self.tokenizer = None
        self.model = None
        self.device = device
        self.load_model()

    def load_model(self):
        """Load the model and tokenizer."""
        try:
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                torch_dtype=torch.float16 if self.device.type == "cuda" else torch.float32,
                low_cpu_mem_usage=True,
                device_map="auto" if self.device.type == "cuda" else None
            ).eval()

            # With device_map="auto" the weights are already placed on the GPU,
            # so no extra .to(device) call is needed.

            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token

            print("✅ Model loaded successfully!")

        except Exception as e:
            print(f"❌ Model loading error: {e}")
            raise

    async def generate_response_async(self, prompt: str, max_tokens: int = 1000) -> str:
        """Generate a model response asynchronously."""
        try:
            # Run the synchronous generation in a worker thread
            loop = asyncio.get_running_loop()

            def _generate():
                inputs = self.tokenizer(
                    prompt,
                    return_tensors="pt",
                    truncation=True,
                    max_length=1024
                ).to(self.device)

                with torch.no_grad():
                    outputs = self.model.generate(
                        input_ids=inputs.input_ids,
                        attention_mask=inputs.attention_mask,
                        max_new_tokens=max_tokens,
                        num_return_sequences=1,
                        do_sample=True,
                        temperature=0.7,
                        top_p=0.9,
                        pad_token_id=self.tokenizer.eos_token_id,
                        repetition_penalty=1.1
                    )

                # Decode only the newly generated tokens so the prompt is not echoed back
                prompt_length = inputs.input_ids.shape[1]
                generated_text = self.tokenizer.decode(
                    outputs[0][prompt_length:],
                    skip_special_tokens=True
                ).strip()

                return generated_text

            # Await the generation without blocking the event loop
            response = await loop.run_in_executor(None, _generate)
            return response

        except Exception as e:
            return f"❌ Generation error: {e}"

    async def run_agent_async(self, agent_name: str, user_query: str) -> Dict[str, Any]:
        """Run a single agent asynchronously."""
        agent_prompt = AGENT_PROMPTS[agent_name]
        full_prompt = f"{agent_prompt}\n\nUser Query: {user_query}\n\nResponse:"

        print(f"🤖 Agent {agent_name} started...")
        start_time = time.time()

        response = await self.generate_response_async(full_prompt)

        end_time = time.time()
        print(f"✅ Agent {agent_name} finished in {end_time - start_time:.2f}s")

        return {
            'agent': agent_name,
            'response': response,
            'execution_time': end_time - start_time
        }

    async def run_critic_async(self, user_query: str, agent_responses: List[Dict[str, Any]]) -> str:
        """Run the critic asynchronously over all agent responses."""
        print("🎯 The critic is analyzing the responses...")
        start_time = time.time()

        # Build the critic prompt
        critic_input = f"{CRITIC_PROMPT}\n\nOriginal User Query: {user_query}\n\n"

        for i, response in enumerate(agent_responses, 1):
            critic_input += f"AGENT {i} ({response['agent']}) RESPONSE:\n{response['response']}\n\n"

        critic_input += "SYNTHESIZED OPTIMAL RESPONSE:"

        final_response = await self.generate_response_async(critic_input, max_tokens=1500)

        end_time = time.time()
        print(f"✅ The critic finished in {end_time - start_time:.2f}s")

        return final_response

    async def process_query_async(self, user_query: str) -> tuple:
        """Process a query with all agents and the critic asynchronously."""
        print(f"\n🚀 Processing query: {user_query[:100]}...")
        print("="*60)

        # Create asynchronous tasks for all agents
        tasks = []
        for agent_name in AGENT_PROMPTS.keys():
            task = asyncio.create_task(
                self.run_agent_async(agent_name, user_query),
                name=f"agent_{agent_name}"
            )
            tasks.append(task)

        # Wait for all agents to finish in parallel
        print("⏳ Waiting for all agents to finish...")
        agent_responses = await asyncio.gather(*tasks, return_exceptions=True)

        # Keep only the successful responses
        successful_responses = []
        for response in agent_responses:
            if isinstance(response, Exception):
                print(f"❌ Agent error: {response}")
            else:
                successful_responses.append(response)

        # Sort responses by agent name for consistency
        successful_responses.sort(key=lambda x: x['agent'])

        # Show short previews of the agent responses
        print("\n📋 AGENT RESPONSE PREVIEWS:")
        print("-"*40)
        for response in successful_responses:
            preview = response['response'][:200] + "..." if len(response['response']) > 200 else response['response']
            print(f"🤖 {response['agent']} ({response['execution_time']:.2f}s): {preview}")

        # Run the critic asynchronously
        print("\n" + "="*60)
        final_response = await self.run_critic_async(user_query, successful_responses)

        return final_response, successful_responses

    def clean_memory(self):
        """Free cached GPU memory."""
        if self.device.type == "cuda":
            torch.cuda.empty_cache()


async def main_async():
    """Main asynchronous entry point."""
    print("🚀 Initializing the asynchronous multi-agent system...")

    try:
        system = AsyncMultiAgentSystem()
    except Exception as e:
        print(f"❌ Initialization error: {e}")
        return

    print("\n" + "="*60)
    print("🎯 THE ASYNCHRONOUS MULTI-AGENT SYSTEM IS READY!")
    print("Available agents:")
    print(" 🔬 Analytical Agent - In-depth analysis")
    print(" 🎨 Creative Agent - Creative solutions")
    print(" ⚙️ Technical Agent - Technical solutions")
    print(" 📊 Strategic Agent - Strategic planning")
    print(" 📚 Research Agent - Research and synthesis")
    print(" 🎯 Critic Agent - Final synthesis")
    print("="*60)
    print("\n💡 All agents run in parallel and asynchronously!")
    print("Enter your query (or 'exit' to quit):")

    while True:
        try:
            # Read user input
            user_input = input("\n> ").strip()

            if user_input.lower() in ['exit', 'quit']:
                print("👋 Shutting down...")
                break

            if not user_input:
                print("⚠️ Please enter a non-empty query.")
                continue

            start_time = time.time()

            # Process the query asynchronously
            final_response, agent_responses = await system.process_query_async(user_input)

            end_time = time.time()

            # Timing statistics
            agent_times = [resp['execution_time'] for resp in agent_responses]
            total_agent_time = sum(agent_times)
            actual_time = end_time - start_time

            # Print the final response
            print("\n" + "="*60)
            print("🎯 FINAL SYNTHESIZED RESPONSE:")
            print("="*60)
            print(final_response)
            print("="*60)
            print(f"⏱️ Total processing time: {actual_time:.2f} seconds")
            print(f"🔥 Summed agent time: {total_agent_time:.2f} seconds")
            print(f"🚀 Speedup from asynchrony: {total_agent_time/actual_time:.2f}x")

            # Free memory
            system.clean_memory()

        except KeyboardInterrupt:
            print("\n\n❌ Interrupted by user.")
            break
        except Exception as e:
            print(f"❌ Unexpected error: {e}")
            system.clean_memory()


def main():
    """Synchronous wrapper that starts the asynchronous system."""
    try:
        asyncio.run(main_async())
    except KeyboardInterrupt:
        print("\n👋 System stopped.")


if __name__ == "__main__":
    main()
``` </pre>
|
|
|
|
|
# A Deep Research system has been developed for our model specifically for the agent system |
|
|
<pre> ```python
import asyncio
import aiohttp
import time
import json
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from urllib.parse import urlencode, urlparse
import re
from bs4 import BeautifulSoup
import logging

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class SearchQuery:
    """Holds information about a single search query."""
    query: str
    purpose: str
    priority: int
    expected_results: int = 3


@dataclass
class WebResult:
    """Holds a single web search result."""
    url: str
    title: str
    snippet: str
    content: str = ""
    relevance_score: float = 0.0
    source_type: str = "web"


@dataclass
class SearchPlan:
    """Holds a search plan."""
    main_query: str
    sub_queries: List[SearchQuery]
    expected_outcome: str
    search_strategy: str
|
|
|
|
|
class IntelligentWebSearchSystem:
    def __init__(self):
        self.session = None
        self.search_engines = {
            'duckduckgo': 'https://duckduckgo.com/html/?q=',
            'bing': 'https://www.bing.com/search?q=',
            'google': 'https://www.google.com/search?q='
        }

        # Meta-prompt for search planning
        self.planning_prompt = """You are an Expert Web Search Planner. Your mission is to create comprehensive search strategies for any user query.
|
|
|
|
|
CRITICAL INSTRUCTIONS: |
|
|
- Always respond in the SAME LANGUAGE as the user's query (Russian/English/etc.) |
|
|
- Create detailed search plans with multiple targeted queries |
|
|
- Focus on gathering comprehensive information from diverse sources |
|
|
- Prioritize queries by importance and relevance |
|
|
|
|
|
PLANNING METHODOLOGY: |
|
|
1. Analyze the user's query to understand: |
|
|
- Core information needs |
|
|
- Context and background requirements |
|
|
- Specific details needed |
|
|
- Current/recent information requirements |
|
|
|
|
|
2. Create a strategic search plan with: |
|
|
- 8-10 targeted search queries |
|
|
- Clear purpose for each query |
|
|
- Priority ranking (1-10) |
|
|
- Expected number of results to examine |
|
|
|
|
|
3. Search strategy should cover: |
|
|
- Direct answers to the main question |
|
|
- Background and context information |
|
|
- Recent developments and news |
|
|
- Technical details and specifications |
|
|
- Alternative perspectives and opinions |
|
|
- Related concepts and comparisons |
|
|
|
|
|
4. Query formulation best practices: |
|
|
- Use specific keywords and phrases |
|
|
- Include relevant technical terms |
|
|
- Consider different phrasings of the same concept |
|
|
- Add date constraints for recent information |
|
|
- Include source-specific searches when relevant |
|
|
|
|
|
RESPONSE FORMAT: |
|
|
Provide a JSON-like structure with: |
|
|
- main_query: The original user query |
|
|
- expected_outcome: What comprehensive answer should be achieved |
|
|
- search_strategy: Overall approach description |
|
|
- sub_queries: List of targeted search queries with purpose and priority |
|
|
|
|
|
Example structure: |
|
|
{ |
|
|
"main_query": "user's original question", |
|
|
"expected_outcome": "comprehensive answer covering all aspects", |
|
|
"search_strategy": "multi-faceted approach covering X, Y, Z", |
|
|
"sub_queries": [ |
|
|
{ |
|
|
"query": "specific search terms", |
|
|
"purpose": "what this search aims to find", |
|
|
"priority": 9, |
|
|
"expected_results": 5 |
|
|
} |
|
|
] |
|
|
}""" |
|
|
|
|
|
    async def __aenter__(self):
        """Open the HTTP session when entering the async context manager."""
        self.session = aiohttp.ClientSession(
            timeout=aiohttp.ClientTimeout(total=30),
            headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            }
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Close the HTTP session."""
        if self.session:
            await self.session.close()

    def create_search_plan(self, user_query: str) -> SearchPlan:
        """Create a search plan from the user's query."""
        print(f"🧠 Creating a search plan for: {user_query}")

        # Basic search plan (a real system would use an AI planner here)
        plan = self._generate_search_plan(user_query)

        print(f"📋 Plan created: {len(plan.sub_queries)} search queries")
        return plan
|
|
|
|
|
    def _generate_search_plan(self, user_query: str) -> SearchPlan:
        """Generate a search plan (simplified version)."""
        # Detect the query type. The trigger words and query suffixes below stay in
        # Russian because they are matched against, and appended to, Russian-language
        # queries; English glosses are given in the comments.
        query_lower = user_query.lower()

        # Base query
        sub_queries = [
            SearchQuery(query=user_query, purpose="Direct answer to the main question", priority=10, expected_results=5)
        ]

        # Add contextual queries
        # Triggers: "what is", "what is this", "definition"
        if any(word in query_lower for word in ['что такое', 'что это', 'определение']):
            sub_queries.extend([
                # Suffixes: "definition", "examples"
                SearchQuery(query=f"{user_query} определение", purpose="Exact definition", priority=9, expected_results=3),
                SearchQuery(query=f"{user_query} примеры", purpose="Practical examples", priority=7, expected_results=3)
            ])

        # Triggers: "how", "way", "method"
        if any(word in query_lower for word in ['как', 'способ', 'метод']):
            sub_queries.extend([
                # Suffixes: "instructions", "tips"
                SearchQuery(query=f"{user_query} инструкция", purpose="Step-by-step instructions", priority=9, expected_results=4),
                SearchQuery(query=f"{user_query} советы", purpose="Practical tips", priority=8, expected_results=3)
            ])

        # Add queries for up-to-date information
        sub_queries.extend([
            # Suffixes: years, "news", "overview"
            SearchQuery(query=f"{user_query} 2024 2025", purpose="Up-to-date information", priority=8, expected_results=3),
            SearchQuery(query=f"{user_query} новости", purpose="Latest news and developments", priority=7, expected_results=3),
            SearchQuery(query=f"{user_query} обзор", purpose="Analytical overviews", priority=6, expected_results=3)
        ])

        # Add alternative phrasings
        sub_queries.extend([
            # Suffixes: "in detail", "advantages disadvantages", "comparison"
            SearchQuery(query=f"{user_query} подробно", purpose="Detailed information", priority=6, expected_results=3),
            SearchQuery(query=f"{user_query} преимущества недостатки", purpose="Analysis of pros and cons", priority=5, expected_results=3),
            SearchQuery(query=f"{user_query} сравнение", purpose="Comparative analysis", priority=5, expected_results=2)
        ])

        # Keep at most the 10 highest-priority queries
        sub_queries = sorted(sub_queries, key=lambda x: x.priority, reverse=True)[:10]

        return SearchPlan(
            main_query=user_query,
            sub_queries=sub_queries,
            expected_outcome=f"Comprehensive information about: {user_query}",
            search_strategy="Multi-faceted search covering definitions, examples, recent developments, and practical applications"
        )
|
|
|
|
|
    async def search_duckduckgo(self, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
        """Search DuckDuckGo."""
        try:
            # urlencode already produces "q=...", so it is appended directly after "?"
            search_url = f"https://duckduckgo.com/html/?{urlencode({'q': query})}"

            async with self.session.get(search_url) as response:
                if response.status == 200:
                    html = await response.text()
                    soup = BeautifulSoup(html, 'html.parser')

                    results = []
                    for result in soup.find_all('div', class_='result')[:max_results]:
                        title_elem = result.find('h2')
                        snippet_elem = result.find('div', class_='result__snippet')
                        link_elem = result.find('a', class_='result__a')

                        if title_elem and link_elem:
                            results.append({
                                'title': title_elem.get_text(strip=True),
                                'url': link_elem.get('href', ''),
                                'snippet': snippet_elem.get_text(strip=True) if snippet_elem else '',
                                'source': 'DuckDuckGo'
                            })

                    return results

            # Non-200 response: return an empty list instead of None
            return []

        except Exception as e:
            logger.error(f"Error searching DuckDuckGo: {e}")
            return []
|
|
|
|
|
    async def search_bing(self, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
        """Search Bing (simplified version)."""
        try:
            # urlencode() already emits "q=<escaped query>", so it must not be
            # prefixed with another "q=".
            search_url = f"https://www.bing.com/search?{urlencode({'q': query})}"

            async with self.session.get(search_url) as response:
                if response.status == 200:
                    html = await response.text()
                    soup = BeautifulSoup(html, 'html.parser')

                    results = []
                    for result in soup.find_all('li', class_='b_algo')[:max_results]:
                        title_elem = result.find('h2')
                        snippet_elem = result.find('div', class_='b_caption')
                        link_elem = title_elem.find('a') if title_elem else None

                        if title_elem and link_elem:
                            results.append({
                                'title': title_elem.get_text(strip=True),
                                'url': link_elem.get('href', ''),
                                'snippet': snippet_elem.get_text(strip=True) if snippet_elem else '',
                                'source': 'Bing'
                            })

                    return results

        except Exception as e:
            logger.error(f"Error searching Bing: {e}")
            return []

        # Non-200 response: return an empty list rather than an implicit None
        return []

    async def fetch_webpage_content(self, url: str, max_length: int = 5000) -> str:
        """Fetch a web page and return its visible text, truncated to max_length."""
        try:
            async with self.session.get(url) as response:
                if response.status == 200:
                    html = await response.text()
                    soup = BeautifulSoup(html, 'html.parser')

                    # Remove scripts and styles
                    for script in soup(["script", "style"]):
                        script.decompose()

                    # Extract the text
                    text = soup.get_text()

                    # Normalize whitespace
                    lines = (line.strip() for line in text.splitlines())
                    chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
                    text = ' '.join(chunk for chunk in chunks if chunk)

                    return text[:max_length]

        except Exception as e:
            logger.error(f"Error fetching {url}: {e}")
            return ""

        # Non-200 response: same empty result as the error path above
        return ""

    async def execute_search_query(self, search_query: SearchQuery) -> List[WebResult]:
        """Run a single search query against all search engines."""
        print(f"🔍 Поиск: {search_query.query} (приоритет: {search_query.priority})")

        # Query the search engines concurrently
        tasks = [
            self.search_duckduckgo(search_query.query, search_query.expected_results),
            self.search_bing(search_query.query, search_query.expected_results)
        ]

        search_results = await asyncio.gather(*tasks, return_exceptions=True)

        # Merge the results, skipping anything that is not a list (e.g. an exception)
        all_results = []
        for results in search_results:
            if isinstance(results, list):
                all_results.extend(results)

        # Drop duplicates by URL, keeping the first occurrence
        unique_results = {}
        for result in all_results:
            url = result.get('url', '')
            if url and url not in unique_results:
                unique_results[url] = result

        # Convert to WebResult objects
        web_results = []
        for result in list(unique_results.values())[:search_query.expected_results]:
            web_result = WebResult(
                url=result['url'],
                title=result['title'],
                snippet=result['snippet'],
                source_type=result.get('source', 'web')
            )
            web_results.append(web_result)

        print(f"✅ Найдено {len(web_results)} результатов для: {search_query.query}")
        return web_results

    async def fetch_detailed_content(self, web_results: List[WebResult]) -> List[WebResult]:
        """Download the page content for every search result."""
        print(f"📄 Загрузка содержимого {len(web_results)} страниц...")

        # Start all downloads first so they run concurrently
        tasks = []
        for result in web_results:
            task = asyncio.create_task(
                self.fetch_webpage_content(result.url),
                name=f"fetch_{result.url}"
            )
            tasks.append((result, task))

        for result, task in tasks:
            try:
                content = await task
                result.content = content
                result.relevance_score = len(content) / 1000  # Crude relevance proxy: content length
                print(f"✅ Загружено: {result.title[:50]}...")
            except Exception as e:
                logger.error(f"Error loading content for {result.url}: {e}")
                # Fall back to the search snippet
                result.content = result.snippet
                result.relevance_score = 0.1

        return web_results

    async def execute_search_plan(self, plan: SearchPlan) -> Dict[str, Any]:
        """Execute a search plan and collect the results."""
        print(f"\n🚀 Выполнение плана поиска для: {plan.main_query}")
        print(f"📊 Запросов в плане: {len(plan.sub_queries)}")
        print("="*60)

        start_time = time.time()

        # Create a task for every search query
        search_tasks = []
        for query in plan.sub_queries:
            task = asyncio.create_task(
                self.execute_search_query(query),
                name=f"search_{query.query}"
            )
            search_tasks.append((query, task))

        # The tasks are already running concurrently; collect their results in plan order
        all_results = []
        for query, task in search_tasks:
            try:
                results = await task
                all_results.extend(results)
            except Exception as e:
                logger.error(f"Error executing search query '{query.query}': {e}")

        print(f"\n📊 Собрано {len(all_results)} результатов поиска")

        # Download the detailed page content
        detailed_results = await self.fetch_detailed_content(all_results)

        # Sort by relevance
        detailed_results.sort(key=lambda x: x.relevance_score, reverse=True)

        end_time = time.time()

        return {
            'plan': plan,
            'results': detailed_results,
            'total_results': len(detailed_results),
            'execution_time': end_time - start_time,
            'queries_executed': len(plan.sub_queries)
        }

    def format_search_results(self, search_data: Dict[str, Any]) -> str:
        """Format the search results as a text report."""
        plan = search_data['plan']
        results = search_data['results']

        output = f"""
🎯 РЕЗУЛЬТАТЫ ИНТЕЛЛЕКТУАЛЬНОГО ПОИСКА
{'='*60}

📝 ИСХОДНЫЙ ЗАПРОС: {plan.main_query}
🎯 ЦЕЛЬ ПОИСКА: {plan.expected_outcome}
📊 СТРАТЕГИЯ: {plan.search_strategy}

📈 СТАТИСТИКА:
• Выполнено запросов: {search_data['queries_executed']}
• Найдено результатов: {search_data['total_results']}
• Время выполнения: {search_data['execution_time']:.2f} секунд

🔍 ВЫПОЛНЕННЫЕ ЗАПРОСЫ:
"""

        for i, query in enumerate(plan.sub_queries, 1):
            output += f" {i}. {query.query} (приоритет: {query.priority}) - {query.purpose}\n"

        output += f"\n📋 ТОП-10 НАИБОЛЕЕ РЕЛЕВАНТНЫХ РЕЗУЛЬТАТОВ:\n{'-'*60}\n"

        for i, result in enumerate(results[:10], 1):
            # Show at most 300 characters of page content per result
            content_preview = result.content[:300] + "..." if len(result.content) > 300 else result.content
            output += f"""
{i}. 📄 {result.title}
🌐 URL: {result.url}
📊 Релевантность: {result.relevance_score:.2f}
📝 Краткое описание: {result.snippet}
📖 Содержимое: {content_preview}
{'-'*40}
"""

        return output

async def main():
    """Entry point: interactive search loop."""
    print("🌐 Система интеллектуального поиска в интернете")
    print("="*60)
    print("💡 Система создает план поиска и выполняет 10 запросов параллельно")
    print("🔍 Каждый запрос обрабатывается в нескольких поисковых системах")
    print("📄 Автоматически загружается содержимое найденных страниц")
    print("="*60)

    async with IntelligentWebSearchSystem() as search_system:
        while True:
            try:
                user_query = input("\n🔍 Введите запрос для поиска (или 'exit' для выхода): ").strip()

                if user_query.lower() in ['exit', 'quit']:
                    print("👋 Завершение работы...")
                    break

                if not user_query:
                    print("⚠️ Пожалуйста, введите непустой запрос.")
                    continue

                # Build the search plan
                plan = search_system.create_search_plan(user_query)

                # Execute the plan
                search_results = await search_system.execute_search_plan(plan)

                # Print the results
                formatted_results = search_system.format_search_results(search_results)
                print(formatted_results)

            except KeyboardInterrupt:
                print("\n\n❌ Прервано пользователем.")
                break
            except Exception as e:
                print(f"❌ Ошибка: {e}")
                logger.error(f"Unexpected error: {e}")


if __name__ == "__main__":
    asyncio.run(main())
```
</pre>
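
The listing above needs network access to run, so its two pure-data steps (keeping the highest-priority queries and dropping duplicate URLs) can be easier to check offline. The sketch below is only an illustration: the `SearchQuery` class here is a stand-in that assumes the four fields used in the listing, and `top_queries` and `dedupe_by_url` are hypothetical helper names, not part of the original code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchQuery:
    """Stand-in for the listing's SearchQuery; only these four fields are assumed."""
    query: str
    purpose: str
    priority: int
    expected_results: int


def top_queries(queries: List[SearchQuery], limit: int = 10) -> List[SearchQuery]:
    """Keep the `limit` highest-priority queries (order is stable for ties)."""
    return sorted(queries, key=lambda q: q.priority, reverse=True)[:limit]


def dedupe_by_url(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result seen for each non-empty URL."""
    unique: Dict[str, Dict[str, str]] = {}
    for result in results:
        url = result.get("url", "")
        if url and url not in unique:
            unique[url] = result
    return list(unique.values())


queries = [
    SearchQuery("python asyncio обзор", "Аналитические обзоры", 6, 3),
    SearchQuery("python asyncio инструкция", "Пошаговые инструкции", 9, 4),
    SearchQuery("python asyncio новости", "Последние новости и развития", 7, 3),
]
plan = top_queries(queries, limit=2)

raw = [
    {"url": "https://example.com/a", "title": "A", "source": "DuckDuckGo"},
    {"url": "https://example.com/a", "title": "A (duplicate)", "source": "Bing"},
    {"url": "", "title": "no link", "source": "Bing"},
    {"url": "https://example.com/b", "title": "B", "source": "Bing"},
]
unique = dedupe_by_url(raw)

print([q.priority for q in plan])
print([r["title"] for r in unique])
```

Because the first occurrence wins, a page returned by both engines keeps the source label of whichever engine was listed first.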

# Arctic AI – the most accurate model with up to 10B parameters created in Russia

🧠 Explainable learning with a critic: GMPO
This architecture targets more explainable and structured reasoning, using RL updates with KL-divergence regularization and feedback from a critic.

🔁 GMPO pipeline (structured policy)
A task is processed in four stages:

G – Generate: the model generates a draft answer

M – Match: the draft is checked against the logic and requirements of the task

P – Plan: the model builds a correction plan

O – Optimize: the improvements are applied to produce the final answer

The whole trajectory {a₀, p, a*} is treated as a single policy rollout.
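
The four stages above can be sketched as a function that chains them and records the trajectory {a₀, p, a*}. This is a toy illustration, not the model's actual implementation: the `Rollout` class, `gmpo_rollout`, and the four string-based stage functions are hypothetical stand-ins for what would be model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    """One trajectory {a0, p, a*} produced by the four stages."""
    draft: str         # a0, output of Generate
    issues: List[str]  # findings of Match
    plan: List[str]    # p, output of Plan
    final: str         # a*, output of Optimize


def gmpo_rollout(
    task: str,
    generate: Callable[[str], str],
    match: Callable[[str, str], List[str]],
    plan: Callable[[str, str, List[str]], List[str]],
    optimize: Callable[[str, List[str]], str],
) -> Rollout:
    """Run Generate, Match, Plan and Optimize in order and keep every intermediate."""
    draft = generate(task)
    issues = match(task, draft)
    steps = plan(task, draft, issues)
    final = optimize(draft, steps)
    return Rollout(draft=draft, issues=issues, plan=steps, final=final)


# Toy stages (assumptions for illustration only)
def generate(task: str) -> str:
    return task.split(":", 1)[1].strip()


def match(task: str, draft: str) -> List[str]:
    issues = []
    if "upper case" in task and not draft.isupper():
        issues.append("not upper case")
    return issues


def plan(task: str, draft: str, issues: List[str]) -> List[str]:
    return ["uppercase" for issue in issues if issue == "not upper case"]


def optimize(draft: str, steps: List[str]) -> str:
    for step in steps:
        if step == "uppercase":
            draft = draft.upper()
    return draft


rollout = gmpo_rollout("Answer in upper case: hello", generate, match, plan, optimize)
print(rollout)
```

Keeping the draft, the issues, and the plan alongside the final answer is what lets an external critic score the reasoning steps and not only the result.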

🧾 Critic module (external evaluator)
Unlike classical GMPO, this setup uses a Critic module, which:

Gives a reward for correctness and reasoning quality

Analyzes the structure of the plan and its logical coherence

Estimates the deviation from the old policy (policy shift)

Returns metadata: error type, plan quality, intuition gap
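
One way to picture what the Critic returns is a small record holding the reward and the metadata fields listed above. The sketch below is an assumption for illustration: `CriticFeedback`, `toy_critic`, the exact-match reward, and the plan-quality rule are all hypothetical, and the policy-shift estimate is left out because it needs access to both policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriticFeedback:
    reward: float               # correctness and reasoning quality
    error_type: Optional[str]   # None when no error was found
    plan_quality: float         # 0.0 (no usable plan) to 1.0
    intuition_gap: float        # |I_model - reward|


def toy_critic(final_answer: str, reference: str,
               plan_steps: int, model_confidence: float) -> CriticFeedback:
    """Score an answer by exact match against a reference (toy rule)."""
    correct = final_answer.strip() == reference.strip()
    reward = 1.0 if correct else 0.0
    return CriticFeedback(
        reward=reward,
        error_type=None if correct else "wrong_answer",
        plan_quality=1.0 if plan_steps > 0 else 0.0,
        intuition_gap=abs(model_confidence - reward),
    )


feedback = toy_critic("HELLO", "HELLO", plan_steps=1, model_confidence=0.75)
print(feedback)
```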

💡 Intuition Alignment
A new signal is introduced: intuition.

The model itself estimates how confident it is in its answer (I_model ∈ [0,1])

This estimate is compared with the actual reward from the critic, which gives the gap:
ΔI = |I_model − r|

The goal is to minimize ΔI, which helps develop metacognition: "how well do I understand what I am doing?"
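
As a worked example of the gap ΔI = |I_model − r|, the sketch below computes it per answer and averages it over a small batch. It assumes the critic reward r is on the same [0, 1] scale as the confidence, which the text does not state explicitly; the function names are hypothetical.

```python
from typing import Sequence


def intuition_gap(confidence: float, reward: float) -> float:
    """Delta-I for one answer: |I_model - r|, with I_model required to be in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return abs(confidence - reward)


def mean_intuition_gap(confidences: Sequence[float], rewards: Sequence[float]) -> float:
    """Average Delta-I over a batch; this is the quantity training would push down."""
    if len(confidences) != len(rewards) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    gaps = [intuition_gap(c, r) for c, r in zip(confidences, rewards)]
    return sum(gaps) / len(gaps)


# Under-confident on a correct answer, well calibrated on the second,
# over-confident on an incorrect third answer.
batch_gap = mean_intuition_gap([0.5, 1.0, 0.25], [1.0, 1.0, 0.0])
print(batch_gap)
```

A gap of zero means the model's self-assessment matched the critic exactly; both over-confidence and under-confidence raise it.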

⚖️ Policy optimization with KL divergence
Training objective:

L(θ) = Eₜ[π(τ)/π_old(τ) ⋅ r(τ) − β⋅D_KL[π(·|s) || π_old(·|s)]]

Where:

θ – the parameters of the LoRA adapters only

β – the KL penalty coefficient

r(τ) – the reward from the critic

D_KL – constrains the updates, keeping the policy close to the reference
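
To make the objective concrete, the sketch below evaluates a single-sample estimate of it for a toy policy over two discrete actions. It is an illustration under simplifying assumptions: a trajectory is reduced to one action, the expectation is replaced by one sample, and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL[p || q] for discrete distributions (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def objective(new_probs: Sequence[float], old_probs: Sequence[float],
              action: int, reward: float, beta: float) -> float:
    """Single-sample estimate: ratio * reward - beta * D_KL[new || old]."""
    ratio = new_probs[action] / old_probs[action]
    return ratio * reward - beta * kl_divergence(new_probs, old_probs)


old = [0.5, 0.5]

# Unchanged policy: ratio is 1 and the KL term is 0, so the value is just the reward.
unchanged = objective(old, old, action=0, reward=1.0, beta=0.1)

# Shifting probability toward the rewarded action raises the ratio term,
# while the KL penalty subtracts a little for moving away from the old policy.
shifted = objective([0.75, 0.25], old, action=0, reward=1.0, beta=0.1)

print(unchanged, shifted)
```

A larger β makes the penalty dominate sooner, which is how the KL term keeps each update close to the previous policy.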

🛠 LoRA-only updates
Only the LoRA adapters are updated

The base model stays frozen

This allows fast and safe fine-tuning without losing already learned knowledge.
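
The idea of updating only the adapters can be sketched with a single linear layer: the base weight W is never modified, and the trainable low-rank pair (A, B) adds a correction scale · B(A x). This is a minimal pure-Python illustration with hypothetical names, not the training code used for the model; real adapters use random initialization for A and a tensor library.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0):
        self.weight = weight                                # frozen base weight
        d_out, d_in = len(weight), len(weight[0])
        self.a = [[0.01] * d_in for _ in range(rank)]       # constant init for the demo
        self.b = [[0.0] * rank for _ in range(d_out)]       # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_parameters(self) -> List[Matrix]:
        """Only the adapter matrices are exposed to the optimizer."""
        return [self.a, self.b]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
x = [1.0, 1.0]
before = layer.forward(x)   # equals the base model output because B is all zeros

layer.b[0][0] = 1.0         # stand-in for one optimizer step on the adapter
after = layer.forward(x)

print(before, after, layer.weight)
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and removing the adapter restores the original behavior at any time.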