arXiv:2601.20055

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Published on Jan 27
Submitted by Vikash Singh on Jan 29
Abstract

A neurosymbolic framework integrates large language models with SMT solvers to enhance logical correctness through verification-guided iterative refinement, utilizing formal semantic equivalence checking, semantic routing, and minimal correction subsets for precise error localization and consensus verification.

AI-generated summary

Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.
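The consensus step described above compares candidates at the logic level rather than the surface level: two autoformalizations are treated as the same answer if their formulas are semantically equivalent, even when their text differs. As a minimal sketch (not the paper's implementation, which works over first-order logic with an SMT solver), propositional equivalence can be decided by checking that the two formulas agree under every truth assignment, i.e. that their XOR is unsatisfiable:

```python
from itertools import product

def equivalent(f, g, variables):
    """Logic-level equivalence: f and g agree under every assignment.
    Equivalently, f XOR g is unsatisfiable. A brute-force stand-in for
    the SMT-based semantic equivalence check described in the paper."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if f(assignment) != g(assignment):
            return False
    return True

# Two surface-different formalizations of the same claim, plus a distractor.
f1 = lambda a: a["q"] if a["p"] else True  # "if p then q"
f2 = lambda a: (not a["p"]) or a["q"]      # "not p, or q"
f3 = lambda a: a["p"] and a["q"]           # a non-equivalent candidate

print(equivalent(f1, f2, ["p", "q"]))  # True: same logic, different surface form
print(equivalent(f1, f3, ["p", "q"]))  # False
```

A surface-form metric such as exact match or BLEU would score `f1` and `f2` as different, which is precisely the syntactic bias the formal equivalence check eliminates.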

Community

Paper author and submitter:

In this paper, we introduce VERGE, a neuro-symbolic framework designed to bridge the semantic gap between LLM fluency and formal correctness by addressing the "all-or-nothing" limitation of traditional solvers. A key innovation is Semantic Routing, which dynamically classifies atomic claims so that non-formalizable natural language can still be handled: verifiable logic is sent to SMT solvers (Z3) for proof, while ambiguous or commonsense claims are routed to a consensus-based "soft verifier," preventing the system from breaking on vague predicates. Furthermore, we replace generic error signals with actionable feedback via Minimal Correction Subsets (MCS), which pinpoint the precise subset of conflicting claims to revise. This enables the model to iteratively refine its reasoning from "probabilistic" to "provable," with an 18.7% performance uplift.
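The MCS idea above can be sketched in a few lines. Assuming a toy propositional setting (the paper formalizes claims in first-order logic and uses Z3), a standard construction is to grow a maximal satisfiable subset (MSS) of claims; the excluded claims form a minimal correction subset, i.e. the exact claims whose revision restores consistency:

```python
from itertools import product

def satisfiable(claims, variables):
    """Brute-force SAT check: does some assignment satisfy every claim?"""
    return any(
        all(pred(dict(zip(variables, vals))) for _, pred in claims)
        for vals in product([False, True], repeat=len(variables))
    )

def minimal_correction_subset(claims, variables):
    """Grow a maximal satisfiable subset (MSS); the claims left out form
    a minimal correction subset (MCS): removing them restores consistency.
    Adding constraints can only preserve or cause unsatisfiability, so a
    claim rejected once can never be added later."""
    mss, mcs = [], []
    for claim in claims:
        if satisfiable(mss + [claim], variables):
            mss.append(claim)
        else:
            mcs.append(claim)
    return [name for name, _ in mcs]

# Three atomic claims that are jointly inconsistent.
claims = [
    ("p",      lambda a: a["p"]),
    ("p -> q", lambda a: (not a["p"]) or a["q"]),
    ("not q",  lambda a: not a["q"]),
]
print(minimal_correction_subset(claims, ["p", "q"]))  # ['not q']
```

Instead of a bare "verification failed" signal, the refinement loop can now tell the model exactly which claim to revise, which is what turns the solver's binary verdict into actionable feedback.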
