---
license: mit
base_model:
- FacebookAI/xlm-roberta-large
language:
- ru
tags:
- Reasoning
- Logical-Analysis
- Text-Classification
- AI-Safety
- Evaluation
- Judge-model
- Argumentation
---

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-model-blue)](https://huggingface.co/skatzR/RQA-X1.1)

# 🧠 RQA — Reasoning Quality Analyzer (R1)

**RQA** is a **judge model** designed to evaluate the *quality of reasoning in text*. It does **not** generate, rewrite, or explain content — instead, it **assesses whether a text contains logical problems** and, if so, **what kind**.

> **RQA is a judge, not a teacher and not a generator.**

---

## 🔍 What Problem Does RQA Solve?

Texts written by humans or LLMs can:

- sound coherent,
- use correct vocabulary,
- appear persuasive,

…but still contain **logical problems** that are:

- implicit,
- structural,
- hidden in argumentation.

**RQA focuses strictly on reasoning quality**, not on style, sentiment, or factual correctness.

---

## 🧩 Model Overview

| Property | Value |
|--------|------|
| **Model Type** | Judge / Evaluator |
| **Base Encoder** | [XLM-RoBERTa Large](https://huggingface.co/FacebookAI/xlm-roberta-large) |
| **Pooling** | Mean pooling |
| **Heads** | 2 (binary + multi-label) |
| **Language** | Russian 🇷🇺 |
| **License** | MIT |

---

## 🧠 What the Model Predicts

RQA produces **two independent signals** that are combined at inference time:

### 1️⃣ Logical Issue Detection (Binary)

- `has_issue ∈ {false, true}`
- Calibrated probability available
- Designed to answer: **“Does this text contain a reasoning problem?”**

### 2️⃣ Error Type Signals (Multi-label)

The model estimates probabilities for specific error types:

- `false_causality`
- `unsupported_claim`
- `overgeneralization`
- `missing_premise`
- `contradiction`
- `circular_reasoning`

⚠️ **Important**
Error type probabilities are **diagnostic signals**, not mandatory labels. They are surfaced **only if `has_issue == true`** during inference.
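This gating of error types on the binary verdict can be sketched in a few lines. The threshold of 0.5 and the function name `gate_errors` are illustrative assumptions; the repository's `inference.py` is the authoritative implementation.

```python
import math

ERROR_TYPES = [
    "false_causality", "unsupported_claim", "overgeneralization",
    "missing_premise", "contradiction", "circular_reasoning",
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_errors(issue_logit, error_logits, threshold=0.5):
    """Surface error-type probabilities only when the binary head fires.

    The 0.5 threshold is a placeholder, not the model's calibrated value.
    """
    p_issue = sigmoid(issue_logit)
    has_issue = p_issue >= threshold
    # Error-type signals stay hidden for texts judged clean.
    errors = {}
    if has_issue:
        errors = {name: sigmoid(z) for name, z in zip(ERROR_TYPES, error_logits)}
    return {"has_issue": has_issue, "p_issue": p_issue, "errors": errors}
```

Note that the error heads always produce logits; the gating only controls whether they are *reported*, which is what keeps clean texts from carrying spurious error labels.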
---

## 🟡 Hidden Logical Problems (Key Concept)

RQA explicitly distinguishes between:

### 🔴 Explicit Logical Errors

Clearly identifiable fallacies:

- invalid causal inference
- circular reasoning
- contradictions
- unsupported claims

### 🟡 Hidden Logical Problems

Texts that are:

- argumentative or persuasive,
- structurally incomplete,
- reliant on implicit assumptions,

but **do not contain a cleanly classifiable fallacy**.

Examples:

- missing or unstated premises
- rhetorical generalizations
- context-dependent claims

Hidden problems are **not misclassifications** — they are an **intended diagnostic category**.

---

## ⚖️ Inference Logic (Important)

The model applies **decision logic on top of the raw logits**:

- The binary head decides **whether a problem exists**
- The error heads provide **type-level evidence**
- If `has_issue == false` but the error probabilities are non-zero, the text may be flagged as **borderline** or a **hidden problem**

This prevents:

- false positive error labels,
- incoherent outputs,
- over-triggering on clean factual texts.
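The borderline rule above — binary head says "clean", but some error head still shows non-trivial evidence — can be sketched as follows. Both thresholds are invented for illustration and are not the model's calibrated values.

```python
def flag_borderline(p_issue, error_probs, issue_threshold=0.5,
                    evidence_threshold=0.3):
    """Resolve disagreement between the binary and error heads.

    Thresholds are illustrative placeholders; the reference inference
    script defines the actual decision rule.
    """
    if p_issue >= issue_threshold:
        # Binary head fired: treat as a detected issue.
        return "issue"
    if max(error_probs) >= evidence_threshold:
        # Heads disagree: no hard label, but evidence of a problem.
        return "borderline"
    return "clean"
```

A text is only ever assigned hard error labels via the first branch, which is how the model avoids attaching error types to texts it considers clean.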
---

## 🏗️ Architecture Details

- **Encoder**: XLM-RoBERTa Large (pretrained weights preserved)
- **Pooling**: Mean pooling (robust for long texts)
- **Two independent projections**:
  - binary reasoning head
  - multi-label error head
- Separate dropout and projections to reduce negative transfer

---

## 🎓 Training Philosophy

### 🔒 Strict Data Contract

- Logical texts **contain no errors**
- Hidden-problem texts **contain no explicit fallacies**
- Invalid samples are **removed**, not auto-corrected

### ⚖️ Balanced Difficulty

- Hidden problems make up ≤ **30%** of problematic texts
- Prevents collapse into vague uncertainty detection

### 🎯 Loss Design

- Binary BCE for issue detection
- Masked multi-label loss for error types
- Stability-oriented multi-task optimization

---

## 🌡️ Confidence Calibration

RQA applies **post-hoc temperature scaling**:

- Separate calibration for:
  - `has_issue`
  - each error type
- Enables:
  - meaningful probabilities
  - safe threshold tuning
  - production use without retraining

---

## 🚀 Intended Use

### ✅ Recommended for:

- Reasoning quality evaluation
- LLM output auditing
- AI safety pipelines
- Argumentation analysis
- Pre-filtering / routing systems

### ❌ Not intended for:

- Text generation
- Error correction
- Explanation or tutoring
- Grammar or style analysis
- Fact checking

---

## 🧪 Model Behavior

- Conservative by design
- Optimized for **low false positives**
- Explicitly robust to:
  - topic changes
  - writing style
  - emotional tone

RQA judges **logical structure**, not persuasion quality.

---

## 📚 Training Data (High-level)

- **Custom-built dataset**
- **Thousands of long-form argumentative texts**
- **Multiple domains and reasoning styles**
- Carefully controlled balance of:
  - logical texts
  - explicit errors
  - hidden problems

> The dataset was designed specifically for **judge behavior**, not for text generation.
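The per-head temperature scaling mentioned in the calibration section divides each logit by a fitted scalar before the sigmoid. A minimal sketch, with made-up temperature values for illustration (the real values are fitted on held-out data and shipped with the model):

```python
import math

def calibrated_prob(logit, temperature):
    """Temperature-scaled sigmoid.

    T > 1 softens overconfident logits toward 0.5; T < 1 sharpens them.
    Each head (has_issue and every error type) gets its own fitted T.
    """
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# Illustrative, invented temperatures — NOT the model's fitted values.
TEMPERATURES = {"has_issue": 1.6, "false_causality": 1.3}
```

Because scaling is monotonic, it never changes the argmax decision at a fixed threshold of 0.5 — it only makes the reported probabilities meaningful for threshold tuning.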
---

## ⚠️ Limitations

- Logical validity ≠ factual correctness
- Purely descriptive texts may still trigger *diagnostic signals*
- Highly rhetorical or persuasive texts can be flagged as **hidden problems**
- Philosophical disagreement is **not always** a logical error

---

## 🧩 Philosophy

> **Good reasoning is not about sounding convincing —
> it is about what actually follows from what.**

RQA is built around this principle.

---

## 🔧 Implementation Details

- Custom Hugging Face architecture (`modeling_rqa.py`)
- Requires `trust_remote_code=True`
- Uses `safetensors` weights
- No `.bin` weights (this is expected behavior)

---

## 🚀 Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True
).to(device)

model.eval()
```

---

## 🧠 Reference Inference Logic

RQA is designed to be used with **explicit post-processing logic**, including:

- temperature scaling
- thresholding
- disagreement diagnostics
- hidden-problem detection

A **fully working reference implementation** is provided here:

👉 **[📄 inference.py](https://huggingface.co/skatzR/RQA-X1.1/blob/main/inference.py) — Reference Inference Implementation**

---

## ✅ Example

Input text (Russian): *«После того как в городе открыли новый торговый центр, увеличилось количество разводов. Следовательно, открытие торгового центра разрушает семьи.»* — “After a new shopping mall opened in the city, the number of divorces increased. Therefore, opening the mall destroys families.”

```
🔎 Issue detected: YES (100.00%)

❌ Explicit logical errors:
   • False causality — 95.95%

📊 Disagreement: 0.034
```

---

## 📜 License

MIT

---