Title: Superposition for Lambda-Free Higher-Order Logic
URL Source: https://arxiv.org/html/2005.02094
Submitted May 07, 2020; published Apr. 12, 2021.
This article is an extended version of Bentkamp et al., “Superposition for lambda-free higher-order logic” [BBCW18].

Superposition for Lambda-Free Higher-Order Logic

Alexander Bentkamp and Jasmin Blanchette (Vrije Universiteit Amsterdam, Department of Computer Science, De Boelelaan 1111, 1081 HV Amsterdam, The Netherlands; a.bentkamp@vu.nl, j.c.blanchette@vu.nl), Simon Cruanes (Aesthetic Integration, 600 Congress Ave., Austin, Texas, 78701, USA; simon@imandra.ai), and Uwe Waldmann (Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany; uwe@mpi-inf.mpg.de)

Abstract.
We introduce refutationally complete superposition calculi for intensional and extensional clausal 𝜆 -free higher-order logic, two formalisms that allow partial application and applied variables. The calculi are parameterized by a term order that need not be fully monotonic, making it possible to employ the 𝜆 -free higher-order lexicographic path and Knuth–Bendix orders. We implemented the calculi in the Zipperposition prover and evaluated them on Isabelle/HOL and TPTP benchmarks. They appear promising as a stepping stone towards complete, highly efficient automatic theorem provers for full higher-order logic.
Key words and phrases: superposition calculus, Boolean-free lambda-free higher-order logic, refutational completeness
- Introduction
Superposition is a highly successful calculus for reasoning about first-order logic with equality. We are interested in graceful generalizations to higher-order logic: calculi that, as much as possible, coincide with standard superposition on first-order problems and that scale up to arbitrary higher-order problems.
As a stepping stone towards full higher-order logic, in this article we restrict our attention to a clausal 𝜆 -free fragment of polymorphic higher-order logic that supports partial application and application of variables (Section 2). This formalism is expressive enough to permit the axiomatization of higher-order combinators such as 𝗉𝗈𝗐 : 𝛱 𝛼 . 𝗇𝖺𝗍 → ( 𝛼 → 𝛼 ) → 𝛼 → 𝛼 (intended to denote the iterated application ℎⁿ 𝑥 ):
𝗉𝗈𝗐 ⟨ 𝛼 ⟩ 𝖹𝖾𝗋𝗈 ℎ ≈ 𝗂𝖽 ⟨ 𝛼 ⟩
𝗉𝗈𝗐 ⟨ 𝛼 ⟩ ( 𝖲𝗎𝖼𝖼 𝑛 ) ℎ 𝑥 ≈ ℎ ( 𝗉𝗈𝗐 ⟨ 𝛼 ⟩ 𝑛 ℎ 𝑥 )
Conventionally, functions are applied without parentheses and commas, and variables are italicized. Notice the variable number of arguments to 𝗉𝗈𝗐 ⟨ 𝛼 ⟩ and the application of ℎ . The expressiveness of full higher-order logic can be recovered by introducing 𝖲𝖪 -style combinators to represent 𝜆 -abstractions and proxies for the logical symbols [Ker91, Rob70].
A widespread technique to support partial application and application of variables in first-order logic is to make all symbols nullary and to represent application of functions by a distinguished binary symbol 𝖺𝗉𝗉 : 𝛱 𝛼 , 𝛽 . 𝖿𝗎𝗇 ( 𝛼 , 𝛽 ) × 𝛼 → 𝛽 , where 𝖿𝗎𝗇 is an uninterpreted binary type constructor. Following this scheme, the higher-order term 𝖿 ( ℎ 𝖿 ) , where 𝖿 : 𝜅 → 𝜅 ′ , is translated to 𝖺𝗉𝗉 ( 𝖿 , 𝖺𝗉𝗉 ( ℎ , 𝖿 ) ) —or rather 𝖺𝗉𝗉 ⟨ 𝜅 , 𝜅 ′ ⟩ ( 𝖿 , 𝖺𝗉𝗉 ⟨ 𝖿𝗎𝗇 ( 𝜅 , 𝜅 ′ ) , 𝜅 ⟩ ( ℎ , 𝖿 ) ) if we specify the type arguments. We call this the applicative encoding. The existence of such a reduction to first-order logic explains why 𝜆 -free higher-order terms are also called “applicative first-order terms.” Unlike for full higher-order logic, most general unifiers are unique for our 𝜆 -free fragment, just as they are for applicatively encoded first-order terms.
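As an illustration of the applicative encoding, here is a small sketch of our own (not code from any of the cited tools); type arguments are omitted, terms are nested tuples in spine notation, and `app` plays the role of the distinguished application symbol:

```python
# A sketch of the applicative encoding. A lambda-free higher-order term
# is a string (symbol or variable) or a tuple (head, arg1, ..., argn)
# in spine notation; the encoding curries every application into the
# distinguished binary symbol `app`.

def encode(term):
    if isinstance(term, str):
        return term
    head, *args = term
    result = encode(head)
    for arg in args:              # f a b  ~>  app(app(f, a), b)
        result = ("app", result, encode(arg))
    return result

# The higher-order term f (h f) becomes app(f, app(h, f)):
print(encode(("f", ("h", "f"))))  # ('app', 'f', ('app', 'h', 'f'))
```

Already on this toy example, the head symbol 𝖿 disappears under two layers of 𝖺𝗉𝗉, which is precisely the clutter discussed next.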
Although the applicative encoding is complete [Ker91] and is employed fruitfully in tools such as HOLyHammer and Sledgehammer [BKPU16], it suffers from a number of weaknesses, all related to its gracelessness. Transforming all the function symbols into constants considerably restricts what can be achieved with term orders; for example, argument tuples cannot easily be compared using different methods for different symbols [Kop12, Sect. 2.3.1]. In a prover, the encoding also clutters the data structures, slows down the algorithms, and neutralizes the heuristics that look at the terms’ root symbols. But our chief objection is the sheer clumsiness of encodings and their poor integration with interpreted symbols. And they quickly accumulate; for example, using the traditional encoding of polymorphism relying on a distinguished binary function symbol 𝗍 [BBPS16, Sect. 3.3] in conjunction with the applicative encoding, the term 𝖲𝗎𝖼𝖼 𝑥 becomes 𝗍 ( 𝗇𝖺𝗍 , 𝖺𝗉𝗉 ( 𝗍 ( 𝖿𝗎𝗇 ( 𝗇𝖺𝗍 , 𝗇𝖺𝗍 ) , 𝖲𝗎𝖼𝖼 ) , 𝗍 ( 𝗇𝖺𝗍 , 𝑥 ) ) ) . The term’s simple structure is lost in translation.
Hybrid schemes have been proposed to strengthen the applicative encoding: If a given symbol always occurs with at least 𝑘 arguments, these can be passed directly [MP08]. However, this relies on a closed-world assumption: that all terms that will ever be compared arise in the initial problem. This noncompositionality conflicts with the need for complete higher-order calculi to synthesize arbitrary terms during proof search [BM14]. As a result, hybrid encodings are not an ideal basis for higher-order automated reasoning.
Instead, we propose to generalize the superposition calculus to intensional and extensional clausal 𝜆 -free higher-order logic. For the extensional version of the logic, the property ( ∀ 𝑥 . ℎ 𝑥 ≈ 𝑘 𝑥 ) ⟹ ℎ ≈ 𝑘 holds for all functions ℎ , 𝑘 of the same type. For each logic, we present two calculi (Section 3). The intensional calculi perfectly coincide with standard superposition on first-order clauses; the extensional calculi depend on an extra axiom.
Superposition is parameterized by a term order, which is used to prune the search space. If we assume that the term order is a simplification order enjoying totality on ground terms (i.e., terms containing no term or type variables), the standard calculus rules and completeness proof can be lifted verbatim. The only necessary changes concern the basic definitions of terms and substitutions. However, there is one monotonicity property that is hard to obtain unconditionally: compatibility with arguments. It states that 𝑠 ′ ≻ 𝑠 implies 𝑠 ′ 𝑡 ≻ 𝑠 𝑡 for all terms 𝑠 , 𝑠 ′ , 𝑡 such that 𝑠 𝑡 and 𝑠 ′ 𝑡 are well typed. Blanchette, Waldmann, and colleagues recently introduced graceful generalizations of the lexicographic path order (LPO) [BWW17] and the Knuth–Bendix order (KBO) [BBWW17] with argument coefficients, but they both lack this property. For example, given a KBO with 𝗀 ≻ 𝖿 , it may well be that 𝗀 𝖺 ≺ 𝖿 𝖺 if 𝖿 has a large enough multiplier on its argument.
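The failure of compatibility with arguments can be checked with a toy weight computation. The symbols, weights, and coefficients below are our own illustrative assumptions, not values from the cited papers:

```python
# KBO compares weights first. With an argument coefficient of 2 on f's
# argument, f a outweighs g a even though g > f in the precedence, so
# g > f does not imply g a > f a: compatibility with arguments fails.

WEIGHT = {"f": 1, "g": 1, "a": 1}
COEFF = {"f": [2], "g": [1]}      # per-argument multipliers

def kbo_weight(term):
    if isinstance(term, str):
        return WEIGHT[term]
    head, *args = term
    return WEIGHT[head] + sum(c * kbo_weight(a)
                              for c, a in zip(COEFF[head], args))

print(kbo_weight(("f", "a")), kbo_weight(("g", "a")))  # 3 2
```

Since 3 > 2, a KBO with these parameters makes 𝖿 𝖺 larger than 𝗀 𝖺 even though 𝗀 ≻ 𝖿.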
Our superposition calculi are designed to be refutationally complete for such nonmonotonic orders (Section 4). To achieve this, they include an inference rule for argument congruence, which derives 𝐶 ∨ 𝑠 𝑥 ≈ 𝑡 𝑥 from 𝐶 ∨ 𝑠 ≈ 𝑡 . The redundancy criterion is defined in such a way that the larger, derived clause is not subsumed by the premise. In the completeness proof, the most difficult case is the one that normally excludes superposition at or below variables using the induction hypothesis. With nonmonotonicity, this approach no longer works, and we propose two alternatives: Either perform some superposition inferences into higher-order variables or “purify” the clauses to circumvent the issue. We refer to the corresponding calculi as nonpurifying and purifying.
The calculi are implemented in the Zipperposition prover [Cru17] (Section 5). We evaluate them on first- and higher-order Isabelle/HOL [BN10] and TPTP benchmarks [SSCB12, SBBT09] and compare them with the applicative encoding (Section 6). We find that there is a substantial cost associated with the applicative encoding, that nonmonotonicity is not particularly expensive, and that the nonpurifying calculi outperform the purifying calculi.
An earlier version of this work was presented at IJCAR 2018 [BBCW18]. This article extends the conference paper with detailed soundness and completeness proofs and more explanations. Because the selection restrictions on the purifying calculi were too weak, our claim of refutational completeness in the conference version was not entirely correct. We have now strengthened the selection restrictions accordingly. Moreover, we have extended the logic with polymorphism, leading to minor modifications to the calculus. We have also simplified the presentation of the clausal fragment of the logic that interests us. In particular, we have removed mandatory arguments. The redundancy criterion also differs slightly from the conference version. Finally, we have updated the empirical evaluation to reflect recent improvements in the Zipperposition prover.
Since the publication of the conference paper, two research groups have built on our work. Bhayat and Reger [BR20] extended our intensional nonpurifying calculus to a version of higher-order logic with combinators. They use it with a nonmonotonic order that orients the defining equations of 𝖲𝖪 -style combinators. Bentkamp et al. [BBT + 21] extended our extensional nonpurifying calculus to a calculus on 𝜆 -terms. For them, support for nonmonotonic orders is crucial as well because no ground-total simplification order exists for 𝜆 -terms up to 𝛽 -conversion. Based on these two extensions of our calculi, the provers Zipperposition and Vampire achieved first and third place, respectively, in the higher-order category of the 2020 edition of the CADE ATP System Competition (CASC-J10).
- Logic
Our logic is intended as an intermediate step on the way towards full higher-order logic (also called simple type theory) [Chu40, GM93]. Refutational completeness of calculi for higher-order logic is usually stated in terms of Henkin semantics [Hen50, BM14], in which the universes used to interpret functions need only contain the functions that can be expressed as terms. Since the terms of 𝜆 -free higher-order logic exclude 𝜆 -abstractions, in “ 𝜆 -free Henkin semantics” the universes interpreting functions can be even smaller. In that sense, our semantics resemble Henkin prestructures [Lei94, Sect. 5.4]. In contrast to other higher-order logics [Vää19], there are no comprehension principles, and we disallow nesting of Boolean formulas inside terms.
2.1. Syntax
We fix a set Σ 𝗍𝗒 of type constructors with arities and a set 𝒱 𝗍𝗒 of type variables. We require at least one nullary type constructor and a binary type constructor → to be present in Σ 𝗍𝗒 . We inductively define a 𝜆 -free higher-order type to be either a type variable 𝛼 ∈ 𝒱 𝗍𝗒 or of the form 𝜅 ( 𝜏 ¯ 𝑛 ) for an 𝑛 -ary type constructor 𝜅 ∈ Σ 𝗍𝗒 and types 𝜏 ¯ 𝑛 . Here and elsewhere, we write 𝑎 ¯ 𝑛 or 𝑎 ¯ to abbreviate the tuple ( 𝑎 1 , … , 𝑎 𝑛 ) or product 𝑎 1 × ⋯ × 𝑎 𝑛 , for 𝑛 ≥ 0 . We write 𝜅 for 𝜅 ( ) and 𝜏 → 𝜐 for → ( 𝜏 , 𝜐 ) . A type declaration is an expression of the form 𝛱 𝛼 ¯ 𝑚 . 𝜏 (or simply 𝜏 if 𝑚 = 0 ), where all type variables occurring in 𝜏 belong to 𝛼 ¯ 𝑚 .
We fix a set Σ of symbols with type declarations, written as 𝖿 : 𝛱 𝛼 ¯ 𝑚 . 𝜏 or 𝖿 , and a set 𝒱 of typed variables, written as 𝑥 : 𝜏 or 𝑥 . We require Σ to contain a symbol with type declaration 𝛱 𝛼 . 𝛼 , to ensure that the (Herbrand) domain of every type is nonempty. The sets ( Σ 𝗍𝗒 , Σ ) form the logic’s signature. We reserve the letters 𝑠 , 𝑡 , 𝑢 , 𝑣 , 𝑤 for terms and 𝑥 , 𝑦 , 𝑧 for variables and write : 𝜏 to indicate their type. The set of 𝜆 -free higher-order terms is defined inductively as follows. Every variable 𝑥 : 𝜏 ∈ 𝒱 is a term. If 𝖿 : 𝛱 𝛼 ¯ 𝑚 . 𝜏 is a symbol and 𝜐 ¯ 𝑚 are types, then 𝖿 ⟨ 𝜐 ¯ 𝑚 ⟩ : 𝜏 { 𝛼 ¯ 𝑚 ↦ 𝜐 ¯ 𝑚 } is a term. If 𝑡 : 𝜏 → 𝜐 and 𝑢 : 𝜏 , then 𝑡 𝑢 : 𝜐 is a term, called an application. Nonapplication terms are called heads. Application is left-associative, and correspondingly the function type constructor → is right-associative. Using the spine notation [CP03], terms can be decomposed in a unique way as a head 𝑡 applied to zero or more arguments: 𝑡 𝑠 1 … 𝑠 𝑛 or 𝑡 𝑠 ¯ 𝑛 (abusing notation). A term is ground if it is built without using type or term variables.
Substitution and unification are generalized in the obvious way, without the difficulties caused by 𝜆 -abstractions. A substitution has the form { 𝛼 ¯ 𝑚 , 𝑥 ¯ 𝑛 ↦ 𝜐 ¯ 𝑚 , 𝑠 ¯ 𝑛 } , where each 𝑥 𝑗 has type 𝜏 𝑗 and each 𝑠 𝑗 has type 𝜏 𝑗 { 𝛼 ¯ 𝑚 ↦ 𝜐 ¯ 𝑚 } , mapping 𝑚 type variables to 𝑚 types and 𝑛 term variables to 𝑛 terms. A unifier of two terms 𝑠 and 𝑡 is a substitution 𝜌 such that 𝑠 𝜌 = 𝑡 𝜌 . A most general unifier mgu ( 𝑠 , 𝑡 ) of two terms 𝑠 and 𝑡 is a unifier 𝜎 of 𝑠 and 𝑡 such that for every other unifier 𝜃 , there exists a substitution 𝜌 such that 𝛼 𝜃 = 𝛼 𝜎 𝜌 and 𝑥 𝜃 = 𝑥 𝜎 𝜌 for all 𝛼 ∈ 𝒱 𝗍𝗒 and all 𝑥 ∈ 𝒱 . As in first-order logic, the most general unifier is unique up to variable renaming. For example, mgu ( 𝑥 𝖻 𝑧 , 𝖿 𝖺 𝑦 𝖼 ) = { 𝑥 ↦ 𝖿 𝖺 , 𝑦 ↦ 𝖻 , 𝑧 ↦ 𝖼 } , and mgu ( 𝑦 ( 𝖿 𝖺 ) , 𝖿 ( 𝑦 𝖺 ) ) = { 𝑦 ↦ 𝖿 } , assuming that the types of the unified subterms are equal.
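Because there are no 𝜆 -abstractions, unification behaves exactly as for applicatively encoded first-order terms. The following minimal sketch is our own illustration (the set `VARS` fixing the variable names, the `'@'` application tag, and the `app` helper are assumptions of the sketch); it reproduces the first example above:

```python
# Syntactic unification for lambda-free higher-order terms in curried
# form: a term is a variable, a constant, or a pair ('@', s, t) for the
# application s t. Untyped for simplicity.

VARS = {"x", "y", "z"}

def walk(t, subst):                    # chase variable bindings
    while isinstance(t, str) and t in VARS and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):               # occurs check
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst=None):
    """Return an mgu as a dict, or None if s and t are not unifiable."""
    subst = dict(subst or {})
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str) and s in VARS:
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str) and t in VARS:
        return unify(t, s, subst)
    if isinstance(s, tuple) and isinstance(t, tuple):   # both applications
        subst = unify(s[1], t[1], subst)
        return None if subst is None else unify(s[2], t[2], subst)
    return None                                          # head clash

def app(*ts):    # left-associative application: app(f, a, b) = (f a) b
    t = ts[0]
    for u in ts[1:]:
        t = ("@", t, u)
    return t

# mgu(x b z, f a y c) = {x |-> f a, y |-> b, z |-> c}:
print(unify(app("x", "b", "z"), app("f", "a", "y", "c")))
```

The second example, mgu ( 𝑦 ( 𝖿 𝖺 ) , 𝖿 ( 𝑦 𝖺 ) ) = { 𝑦 ↦ 𝖿 } , goes through the same code unchanged.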
An equation 𝑠 ≈ 𝑡 is formally an unordered pair of terms 𝑠 and 𝑡 of the same type. A literal is an equation or a negated equation, written 𝑠 ≉ 𝑡 . A clause 𝐿 1 ∨ ⋯ ∨ 𝐿 𝑛 is a finite multiset of literals 𝐿 𝑗 . The empty clause is written as ⊥ .
2.2. Semantics
A type interpretation ℐ 𝗍𝗒 = ( 𝒰 , 𝒥 𝗍𝗒 ) is defined as follows. The set 𝒰 is a nonempty collection of nonempty sets, called universes. The function 𝒥 𝗍𝗒 associates a function 𝒥 𝗍𝗒 ( 𝜅 ) : 𝒰 ⁿ → 𝒰 with each 𝑛 -ary type constructor 𝜅 . A type valuation 𝜉 is a function that maps every type variable to a universe. The denotation of a type for a type interpretation ℐ 𝗍𝗒 and a type valuation 𝜉 is defined by ⟦ 𝛼 ⟧ ℐ 𝗍𝗒 𝜉 = 𝜉 ( 𝛼 ) and ⟦ 𝜅 ( 𝜏 ¯ ) ⟧ ℐ 𝗍𝗒 𝜉 = 𝒥 𝗍𝗒 ( 𝜅 ) ( ⟦ 𝜏 ¯ ⟧ ℐ 𝗍𝗒 𝜉 ) . Here and elsewhere, we abuse notation by applying an operation on a tuple when it must be applied elementwise; thus, ⟦ 𝜏 ¯ 𝑛 ⟧ ℐ 𝗍𝗒 𝜉 stands for ⟦ 𝜏 1 ⟧ ℐ 𝗍𝗒 𝜉 , … , ⟦ 𝜏 𝑛 ⟧ ℐ 𝗍𝗒 𝜉 .
A type valuation 𝜉 can be extended to be a valuation by additionally assigning an element 𝜉 ( 𝑥 ) ∈ ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 to each variable 𝑥 : 𝜏 . An interpretation function 𝒥 for a type interpretation ℐ 𝗍𝗒 associates with each symbol 𝖿 : 𝛱 𝛼 ¯ 𝑚 . 𝜏 and universe tuple 𝑈 ¯ 𝑚 ∈ 𝒰 ᵐ a value 𝒥 ( 𝖿 , 𝑈 ¯ 𝑚 ) ∈ ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 , where 𝜉 is the type valuation that maps each 𝛼 𝑖 to 𝑈 𝑖 . Loosely following Fitting [Fit02, Sect. 2.5], an extension function ℰ associates to any pair of universes 𝑈 1 , 𝑈 2 ∈ 𝒰 a function ℰ 𝑈 1 , 𝑈 2 : 𝒥 𝗍𝗒 ( → ) ( 𝑈 1 , 𝑈 2 ) → ( 𝑈 1 → 𝑈 2 ) . Together, a type interpretation, an interpretation function, and an extension function form an interpretation ℐ = ( 𝒰 , 𝒥 𝗍𝗒 , 𝒥 , ℰ ) .

An interpretation is extensional if ℰ 𝑈 1 , 𝑈 2 is injective for all 𝑈 1 , 𝑈 2 . Both intensional and extensional logics are widely used for interactive theorem proving; for example, Coq’s calculus of inductive constructions is intensional [BC04], whereas Isabelle/HOL is extensional [NPW02]. The semantics is standard if ℰ 𝑈 1 , 𝑈 2 is bijective for all 𝑈 1 , 𝑈 2 .
For an interpretation ℐ = ( 𝒰 , 𝒥 𝗍𝗒 , 𝒥 , ℰ ) and a valuation 𝜉 , the denotation of a term is defined as follows: For variables 𝑥 , let ⟦ 𝑥 ⟧ ℐ 𝜉 = 𝜉 ( 𝑥 ) . For symbols 𝖿 , let ⟦ 𝖿 ⟨ 𝜏 ¯ ⟩ ⟧ ℐ 𝜉 = 𝒥 ( 𝖿 , ⟦ 𝜏 ¯ ⟧ ℐ 𝗍𝗒 𝜉 ) . For applications 𝑠 𝑡 of a term 𝑠 : 𝜏 → 𝜐 to a term 𝑡 : 𝜏 , let 𝑈 1 = ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 , 𝑈 2 = ⟦ 𝜐 ⟧ ℐ 𝗍𝗒 𝜉 , and ⟦ 𝑠 𝑡 ⟧ ℐ 𝜉 = ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑠 ⟧ ℐ 𝜉 ) ( ⟦ 𝑡 ⟧ ℐ 𝜉 ) . If 𝑡 is a ground term, we also write ⟦ 𝑡 ⟧ ℐ for the denotation of 𝑡 because it does not depend on the valuation.
An equation 𝑠 ≈ 𝑡 is true in ℐ for 𝜉 if ⟦ 𝑠 ⟧ ℐ 𝜉 = ⟦ 𝑡 ⟧ ℐ 𝜉 ; otherwise, it is false. A disequation 𝑠 ≉ 𝑡 is true if 𝑠 ≈ 𝑡 is false. A clause is true if at least one of its literals is true. The interpretation ℐ is a model of a clause 𝐶 , written ℐ ⊧ 𝐶 , if 𝐶 is true in ℐ for all valuations 𝜉 . It is a model of a set of clauses if it is a model of all contained clauses.
For example, given the signature ( { 𝜅 , → } , { 𝖺 : 𝜅 } ) and a variable ℎ : 𝜅 → 𝜅 , the clause ℎ 𝖺 ≉ 𝖺 has an extensional model with 𝒰 = { 𝑈 1 , 𝑈 2 } , 𝑈 1 = { 𝑎 , 𝑏 } ( 𝑎 ≠ 𝑏 ), 𝑈 2 = { 𝑓 } , 𝒥 𝗍𝗒 ( 𝜅 ) = 𝑈 1 , 𝒥 𝗍𝗒 ( → ) ( 𝑈 1 , 𝑈 1 ) = 𝑈 2 , 𝒥 ( 𝖺 ) = 𝑎 , and ℰ 𝑈 1 , 𝑈 1 ( 𝑓 ) ( 𝑎 ) = ℰ 𝑈 1 , 𝑈 1 ( 𝑓 ) ( 𝑏 ) = 𝑏 .
- The Calculi
We introduce four versions of the Boolean-free 𝜆 -free higher-order superposition calculus, articulated along two axes: intensional versus extensional, and nonpurifying versus purifying. To avoid repetitions, our presentation unifies them into a single framework.
3.1. The Inference Rules
To support nonmonotonic term orders, we restrict superposition inferences to green subterms, which are defined inductively as follows:
Definition \thethm (Green subterms and contexts).
A term 𝑡 ′ is a green subterm of 𝑡 if 𝑡 = 𝑡 ′ or if 𝑡 = 𝑠 𝑢 ¯ and 𝑡 ′ is a green subterm of 𝑢 𝑖 for some 𝑖 . We write 𝑠 ⟨ 𝑢 ⟩ to indicate that the subterm 𝑢 of 𝑠 [ 𝑢 ] is a green subterm. Correspondingly, we call the context 𝑠 ⟨ ⟩ around a green subterm a green context.

By this definition, 𝖿 and 𝖿 𝖺 are subterms of 𝖿 𝖺 𝖻 , but not green subterms. The green subterms of 𝖿 𝖺 𝖻 are 𝖺 , 𝖻 , and 𝖿 𝖺 𝖻 itself. Thus, [ ] 𝖺 𝖻 and [ ] 𝖻 are not green contexts, but 𝖿 [ ] 𝖻 , 𝖿 𝖺 [ ] , and [ ] are.
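Operationally, the green subterms of a term are the term itself and, recursively, the green subterms of its arguments, never its head or proper prefixes. A small sketch of our own (terms in spine notation as nested Python tuples):

```python
# Green-subterm enumeration. A term is a string (head) or a tuple
# (head, arg1, ..., argn) in spine notation.

def green_subterms(t):
    """Yield t itself and, recursively, the green subterms of its
    arguments -- but never its head or its applied prefixes."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from green_subterms(arg)

# Green subterms of f a b are f a b, a, and b -- not f or f a:
print(list(green_subterms(("f", "a", "b"))))  # [('f', 'a', 'b'), 'a', 'b']
```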
The calculi are parameterized by a partial order ≻ on terms that

– is well founded on ground terms;
– is total on ground terms;
– has the subterm property on ground terms;
– is compatible with green contexts on ground terms: 𝑡 ′ ≻ 𝑡 implies 𝑠 ⟨ 𝑡 ′ ⟩ ≻ 𝑠 ⟨ 𝑡 ⟩ ;
– is stable under grounding substitutions: 𝑡 ≻ 𝑠 implies 𝑡 𝜃 ≻ 𝑠 𝜃 for all substitutions 𝜃 grounding 𝑡 and 𝑠 .
The order need not be compatible with arguments: 𝑠 ′ ≻ 𝑠 need not imply 𝑠 ′ 𝑡 ≻ 𝑠 𝑡 , even on ground terms. The literal and clause orders are defined from ≻ as multiset extensions in the standard way [BG94]. Despite their names, the term, literal, and clause orders need not be transitive on nonground entities.
The 𝜆 -free higher-order generalizations of LPO [BWW17] and KBO [BBWW17] as well as EPO [Ben18] fulfill these requirements, with the caveat that they are defined on untyped terms. To use them on polymorphic terms, we can encode type arguments as term arguments and ignore the remaining type information.
Literal selection is supported. The selection function maps each clause 𝐶 to a subclause of 𝐶 consisting of negative literals. A literal 𝐿 is (strictly) eligible w.r.t. a substitution 𝜎 in 𝐶 if it is selected in 𝐶 or there are no selected literals in 𝐶 and 𝐿 𝜎 is (strictly) maximal in 𝐶 𝜎 . If 𝜎 is the identity substitution, we leave it implicit.
The following four rules are common to all four calculi. We regard positive and negative superposition as two cases of the same rule:

Sup: from 𝐷 = 𝐷 ′ ∨ 𝑡 ≈ 𝑡 ′ and 𝐶 = 𝐶 ′ ∨ 𝑠 ⟨ 𝑢 ⟩ ≈̇ 𝑠 ′ , derive ( 𝐷 ′ ∨ 𝐶 ′ ∨ 𝑠 ⟨ 𝑡 ′ ⟩ ≈̇ 𝑠 ′ ) 𝜎

where ≈̇ denotes ≈ or ≉ and the following conditions are fulfilled:

1. 𝜎 = mgu ( 𝑡 , 𝑢 ) ;
2. 𝑡 𝜎 ⋠ 𝑡 ′ 𝜎 ;
3. 𝑠 ⟨ 𝑢 ⟩ 𝜎 ⋠ 𝑠 ′ 𝜎 ;
4. 𝑡 ≈ 𝑡 ′ is strictly eligible w.r.t. 𝜎 in 𝐷 ;
5. 𝐶 𝜎 ⋠ 𝐷 𝜎 ;
6. 𝑠 ⟨ 𝑢 ⟩ ≈̇ 𝑠 ′ is eligible w.r.t. 𝜎 in 𝐶 and, if positive, even strictly eligible;
7. the variable condition must hold, which is specified individually for each calculus below.

In each calculus, we will define the variable condition to coincide with the condition “ 𝑢 ∉ 𝒱 ” if the premises are first-order.
The equality resolution and equality factoring rules are almost identical to their standard counterparts:

ERes: from 𝐶 = 𝐶 ′ ∨ 𝑠 ≉ 𝑠 ′ , derive 𝐶 ′ 𝜎 , where

1. 𝜎 = mgu ( 𝑠 , 𝑠 ′ ) ;
2. 𝑠 ≉ 𝑠 ′ is eligible w.r.t. 𝜎 in 𝐶 .

EFact: from 𝐶 = 𝐶 ′ ∨ 𝑠 ′ ≈ 𝑡 ′ ∨ 𝑠 ≈ 𝑡 , derive ( 𝐶 ′ ∨ 𝑡 ≉ 𝑡 ′ ∨ 𝑠 ≈ 𝑡 ′ ) 𝜎 , where

1. 𝜎 = mgu ( 𝑠 , 𝑠 ′ ) ;
2. 𝑠 𝜎 ⋠ 𝑡 𝜎 ;
3. 𝑠 ≈ 𝑡 is eligible w.r.t. 𝜎 in 𝐶 .
The following argument congruence rule compensates for the limitation that the superposition rule applies only to green subterms:

ArgCong: from 𝐶 = 𝐶 ′ ∨ 𝑠 ≈ 𝑠 ′ , derive 𝐶 ′ 𝜎 ∨ 𝑠 𝜎 𝑥 ¯ ≈ 𝑠 ′ 𝜎 𝑥 ¯ , where

1. 𝑠 ≈ 𝑠 ′ is strictly eligible w.r.t. 𝜎 in 𝐶 ;
2. 𝑥 ¯ is a nonempty tuple of distinct fresh variables;
3. 𝜎 is the most general type substitution that ensures well-typedness of the conclusion.

In particular, if 𝑠 takes 𝑚 arguments, there are 𝑚 ArgCong conclusions for this literal, for which 𝜎 is the identity and 𝑥 ¯ is a tuple of 1, …, 𝑚 − 1 , or 𝑚 variables. If the result type of 𝑠 is a type variable, we have in addition infinitely many ArgCong conclusions, for which 𝜎 instantiates the type variable in the result type of 𝑠 with 𝛼 ¯ 𝑘 → 𝛽 for some 𝑘 > 0 and fresh type variables 𝛼 ¯ 𝑘 and 𝛽 and for which 𝑥 ¯ is a tuple of 𝑚 + 𝑘 variables. In practice, the enumeration of the infinitely many conclusions must be interleaved with other inferences via some form of dovetailing.
For the intensional nonpurifying calculus, the variable condition of the Sup rule is as follows:

Either 𝑢 ∉ 𝒱 or there exists a grounding substitution 𝜃 with 𝑡 𝜎 𝜃 ≻ 𝑡 ′ 𝜎 𝜃 and 𝐶 𝜎 𝜃 ≺ 𝐶 { 𝑢 ↦ 𝑡 ′ } 𝜎 𝜃 .

This condition generalizes the standard condition that 𝑢 ∉ 𝒱 . The two coincide if 𝐶 is first-order or if the term order is monotonic. In some cases involving nonmonotonicity, the variable condition effectively mandates Sup inferences at variable positions of the right premise, but never below them. We will call these Sup inferences at variables.
For the extensional nonpurifying calculus, the variable condition uses the following definition.
Definition \thethm.
A term of the form 𝑥 𝑠 ¯ 𝑛 , for 𝑛 ≥ 0 , jells with a literal 𝑡 ≈ 𝑡 ′ ∈ 𝐷 if 𝑡 = 𝑡 ~ 𝑦 ¯ 𝑛 and 𝑡 ′ = 𝑡 ~ ′ 𝑦 ¯ 𝑛 for some terms 𝑡 ~ , 𝑡 ~ ′ and distinct variables 𝑦 ¯ 𝑛 that do not occur elsewhere in 𝐷 .
Using the naming convention from the definition above for 𝑡 ~ ′ , the variable condition can be stated as follows:

If 𝑢 has a variable head 𝑥 and jells with the literal 𝑡 ≈ 𝑡 ′ ∈ 𝐷 , there must exist a grounding substitution 𝜃 with 𝑡 𝜎 𝜃 ≻ 𝑡 ′ 𝜎 𝜃 and 𝐶 𝜎 𝜃 ≺ 𝐶 ′′ 𝜎 𝜃 , where 𝐶 ′′ = 𝐶 { 𝑥 ↦ 𝑡 ~ ′ } .
If 𝐶 is first-order, this amounts to 𝑢 ∉ 𝒱 . Since the order is compatible with green contexts, the substitution 𝜃 can exist only if 𝑥 occurs applied in 𝐶 .
Moreover, the extensional nonpurifying calculus has one additional rule, the positive extensionality rule, and one axiom, the extensionality axiom. The rule is
PosExt: from 𝐶 ′ ∨ 𝑠 𝑥 ¯ ≈ 𝑠 ′ 𝑥 ¯ , derive 𝐶 ′ ∨ 𝑠 ≈ 𝑠 ′ , where

1. 𝑥 ¯ is a tuple of distinct variables that do not occur in 𝐶 ′ , 𝑠 , or 𝑠 ′ ;
2. 𝑠 𝑥 ¯ ≈ 𝑠 ′ 𝑥 ¯ is strictly eligible in the premise.
The extensionality axiom uses a polymorphic Skolem symbol 𝖽𝗂𝖿𝖿 : 𝛱 𝛼 , 𝛽 . ( 𝛼 → 𝛽 ) → ( 𝛼 → 𝛽 ) → 𝛼 characterized by the axiom
𝑥 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ≉ 𝑦 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ∨ 𝑥 ≈ 𝑦 (Ext)
Unlike the nonpurifying calculi, the purifying calculi never perform superposition at variables. Instead, they rely on purification [Bra75, DH86, SS89, SL91] (also called abstraction) to circumvent nonmonotonicity. The idea is to rename apart problematic occurrences of a variable 𝑥 in a clause to 𝑥 1 , … , 𝑥 𝑛 and to add purification literals 𝑥 1 ≉ 𝑥 , …, 𝑥 𝑛 ≉ 𝑥 to connect the new variables to 𝑥 . We must then ensure that all clauses are purified, by processing the initial clause set and the conclusion of every inference or simplification.
In the intensional purifying calculus, the purification pure ( 𝐶 ) of clause 𝐶 is defined as the result of the following procedure. Choose a variable 𝑥 that occurs applied in 𝐶 and also unapplied in a literal of 𝐶 that is not of the form 𝑥 ≉ 𝑦 . If no such variable exists, terminate. Otherwise, replace all unapplied occurrences of 𝑥 in 𝐶 by a fresh variable 𝑥 ′ and add the purification literal 𝑥 ′ ≉ 𝑥 . Then repeat these steps with another variable. The procedure terminates because the number of variables that can be chosen decreases with each step. For example,

pure ( 𝑥 𝖺 ≈ 𝑥 𝖻 ∨ 𝖿 𝑥 ≈ 𝗀 𝑥 ) = 𝑥 𝖺 ≈ 𝑥 𝖻 ∨ 𝖿 𝑥 ′ ≈ 𝗀 𝑥 ′ ∨ 𝑥 ≉ 𝑥 ′
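A simplified one-pass sketch of this procedure, in our own notation (terms in spine notation as nested tuples, literals as triples (polarity, lhs, rhs)); the check that the unapplied occurrence lies outside literals of the form 𝑥 ≉ 𝑦 is omitted for brevity:

```python
# Intensional purification, simplified: a variable that occurs both
# applied and unapplied has its unapplied occurrences renamed to x',
# and the purification literal x' != x (negative) is appended.

def kinds(t, x):                     # occurrence kinds of x in term t
    if t == x:
        return {"unapplied"}
    if not isinstance(t, tuple):
        return set()
    ks = {"applied"} if t[0] == x else set()
    return ks.union(*(kinds(a, x) for a in t[1:]))

def replace_unapplied(t, x, x2):     # rename unapplied occurrences only
    if t == x:
        return x2
    if not isinstance(t, tuple):
        return t
    return (t[0],) + tuple(replace_unapplied(a, x, x2) for a in t[1:])

def pure(clause, variables):
    clause = list(clause)
    for x in variables:
        occ = set().union(*(kinds(s, x) | kinds(t, x) for _, s, t in clause))
        if {"applied", "unapplied"} <= occ:
            x2 = x + "'"                       # fresh variable x'
            clause = [(pol, replace_unapplied(s, x, x2),
                            replace_unapplied(t, x, x2))
                      for pol, s, t in clause]
            clause.append((False, x2, x))      # purification literal x' != x
    return clause

# pure(x a = x b  \/  f x = g x) adds f x' = g x' and x' != x:
print(pure([(True, ("x", "a"), ("x", "b")),
            (True, ("f", "x"), ("g", "x"))], ["x"]))
```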
The variable condition is standard:
The term 𝑢 is not a variable.
The conclusion 𝐶 of ArgCong is changed to pure ( 𝐶 ) ; the other rules preserve purity of their premises.
In the extensional purifying calculus, pure ( 𝐶 ) is defined as follows. Choose a variable 𝑥 occurring in green subterms 𝑥 𝑢 ¯ and 𝑥 𝑣 ¯ in literals of 𝐶 that are not of the form 𝑥 ≉ 𝑦 , where 𝑢 ¯ and 𝑣 ¯ are distinct (possibly empty) term tuples. If no such variable exists, terminate. Otherwise, replace all green subterms 𝑥 𝑣 ¯ with 𝑥 ′ 𝑣 ¯ , where 𝑥 ′ is fresh, and add the purification literal 𝑥 ′ ≉ 𝑥 . Then repeat the procedure until no variable fulfilling the requirements is left. The procedure terminates because the number of variables fulfilling the requirements does not increase and with each step, the number of distinct tuples 𝑢 ¯ and 𝑣 ¯ fulfilling the requirements decreases for the chosen variable 𝑥 . For example,

pure ( 𝑥 𝖺 ≈ 𝑥 𝖻 ∨ 𝖿 𝑥 ≈ 𝗀 𝑥 ) = 𝑥 𝖺 ≈ 𝑥 ′ 𝖻 ∨ 𝖿 𝑥 ′′ ≈ 𝗀 𝑥 ′′ ∨ 𝑥 ′ ≉ 𝑥 ∨ 𝑥 ′′ ≉ 𝑥
Like the extensional nonpurifying calculus, this calculus also contains the PosExt rule and axiom (Ext) introduced above. The variable condition is as follows:
Either 𝑢 has a nonvariable head or 𝑢 does not jell with the literal 𝑡 ≈ 𝑡 ′ ∈ 𝐷 .
The conclusion 𝐸 of each rule is changed to pure ( 𝐸 ) , except for PosExt, which preserves purity.
Finally, we impose further restrictions on literal selection. In the nonpurifying calculi, a literal must not be selected if 𝑥 𝑢 ¯ is a maximal term of the clause and the literal contains a green subterm 𝑥 𝑣 ¯ with 𝑣 ¯ ≠ 𝑢 ¯ . In the purifying calculi, a literal must not be selected if it contains a variable of functional type. These restrictions are needed in our completeness proof to show that superposition inferences below variables are redundant. For the purifying calculi, the restriction is also needed to avoid inferences whose conclusion is identical to their premise, such as
ERes: from 𝑧 𝖺 ≈ 𝖻 ∨ 𝑧 ≉ 𝑥 ∨ 𝖿 ≉ 𝑥 , derive 𝑧 𝖺 ≈ 𝖻 ∨ 𝑧 ′ ≉ 𝖿 ∨ 𝑧 ′ ≉ 𝑧
where 𝖿 ≉ 𝑥 is selected. For the nonpurifying calculi, it might be possible to avoid the restriction at the cost of a more elaborate argument.
Example \thethm.
We illustrate the different variable conditions with a few examples. Consider a left-to-right LPO [BWW17] instance with precedence 𝗁 ≻ 𝗀 ≻ 𝖿 ≻ 𝖼 ≻ 𝖻 ≻ 𝖺 . Given the clauses 𝐷 and 𝐶 of a superposition inference in which the grayed subterms are unified, we list below whether each calculus’s variable condition is fulfilled (✓) or not.
| 𝐷 | 𝐶 | intensional nonpurifying | extensional nonpurifying | intensional purifying | extensional purifying |
| --- | --- | :-: | :-: | :-: | :-: |
| 𝗁 ≈ 𝗀 | 𝖿 𝑦 ≈ 𝖼 |  |  |  |  |
| 𝖿 ( 𝗁 𝖺 ) ≈ 𝗁 | 𝗀 𝑦 ≈ 𝑦 𝖻 | ✓ | ✓ |  |  |
| 𝗁 𝑥 ≈ 𝖿 𝑥 | 𝗀 ( 𝑦 𝖻 ) 𝑦 ≈ 𝖺 | ✓ |  | ✓ |  |
| 𝑥 ≈ 𝖼 ∨ 𝗁 𝑥 ≈ 𝖿 𝑥 | 𝗀 ( 𝑦 𝖻 ) 𝑦 ≈ 𝖺 | ✓ | ✓ | ✓ | ✓ |
For the purifying calculi, the clauses would actually undergo purification first, but this has no impact on the variable conditions. In the last row, the term 𝑦 𝖻 does not jell with 𝗁 𝑥 ≈ 𝖿 𝑥 ∈ 𝐷 because 𝑥 occurs also in the first literal of 𝐷 .
Remark \thethm.
In descriptions of first-order logic with equality, the property 𝑦 ≈ 𝑦 ′ ⟹ 𝖿 ( 𝑥 ¯ , 𝑦 , 𝑧 ¯ ) ≈ 𝖿 ( 𝑥 ¯ , 𝑦 ′ , 𝑧 ¯ ) is often referred to as “function congruence.” It seems natural to use the same label for the higher-order version 𝑡 ≈ 𝑡 ′ ⟹ 𝑠 𝑡 ≈ 𝑠 𝑡 ′ and to call the companion property 𝑠 ≈ 𝑠 ′ ⟹ 𝑠 𝑡 ≈ 𝑠 ′ 𝑡 “argument congruence,” whence the name ArgCong for our inference rule. This nomenclature is far from universal; for example, the Isabelle/HOL theorem fun_cong captures argument congruence and arg_cong captures function congruence.
3.2. Rationale for the Inference Rules
A key restriction of all four calculi is that they superpose only at green subterms, mirroring the term order’s compatibility with green contexts. The ArgCong rule then makes it possible to simulate superposition at nongreen subterms. However, in conjunction with the Sup rules, ArgCong can exhibit an unpleasant behavior, which we call argument congruence explosion:
On the left, ArgCong derives 𝗀 𝑥 ≈ 𝖿 𝑥 from 𝗀 ≈ 𝖿 , and a Sup inference into ℎ 𝖺 ≉ 𝖻 yields 𝖿 𝖺 ≉ 𝖻 . On the right, ArgCong derives 𝗀 𝑥 𝑦 𝑧 ≈ 𝖿 𝑥 𝑦 𝑧 from 𝗀 ≈ 𝖿 , and a Sup inference into ℎ 𝖺 ≉ 𝖻 yields 𝖿 𝑥 𝑦 𝖺 ≉ 𝖻 .
In both derivation trees, the higher-order variable ℎ is effectively the target of a Sup inference. Such derivations essentially amount to superposition at variable positions (as shown on the left) or even superposition below variable positions (as shown on the right), both of which can be extremely prolific. In standard superposition, the explosion is averted by the condition on the Sup rule that 𝑢 ∉ 𝒱 . In the extensional purifying calculus, the variable condition tests that either 𝑢 has a nonvariable head or 𝑢 does not jell with the literal 𝑡 ≈ 𝑡 ′ ∈ 𝐷 , which prevents derivations such as the above. In the corresponding nonpurifying calculus, some such derivations may need to be performed when the term order exhibits nonmonotonicity for the terms of interest.
In the intensional calculi, the explosion can arise because the variable conditions are weaker. The following example shows that the intensional nonpurifying calculus would be incomplete if we used the variable condition of the extensional nonpurifying calculus.
Example \thethm.
Consider a left-to-right LPO [BWW17] instance with precedence 𝗁 ≻ 𝗀 ≻ 𝖿 ≻ 𝖻 ≻ 𝖺 , and consider the following unsatisfiable clause set:
𝗁 𝑥 ≈ 𝖿 𝑥
𝗀 ( 𝑥 𝖻 ) 𝑥 ≈ 𝖺
𝗀 ( 𝖿 𝖻 ) 𝗁 ≉ 𝖺
The only possible inference is a Sup inference of the first into the second clause, but the variable condition of the extensional nonpurifying calculus is not met.
It is unclear whether the variable condition of the intensional purifying calculus could be strengthened, but our completeness proof suggests that it cannot.
The variable conditions in the extensional calculi are designed to prevent the argument congruence explosion shown above, but since they consider only the shape of the clauses, they might also block Sup inferences whose side premises do not originate from ArgCong. This is why we need the PosExt rule.
Example \thethm.
In the following unsatisfiable clause set, the only possible inference from these clauses in the extensional nonpurifying calculus is PosExt, showing its necessity:
𝗀 𝑥 ≈ 𝖿 𝑥
𝗀 ≉ 𝖿
𝑥 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ≉ 𝑦 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ∨ 𝑥 ≈ 𝑦
The same argument applies for the purifying calculus with the difference that the third clause must be purified.
Due to nonmonotonicity, for refutational completeness we need either to purify the clauses or to allow some superposition at variable positions, as mandated by the respective variable conditions. Without either of these measures, at least the extensional calculi and presumably also the intensional calculi would be incomplete, as the next example demonstrates.
Example \thethm.
Consider the following clause set:
𝗄 ( 𝗀 𝑥 ) ≈ 𝗄 ( 𝑥 𝖻 )
𝗄 ( 𝖿 ( 𝗁 𝖺 ) 𝖻 ) ≉ 𝗄 ( 𝗀 𝗁 )
𝖿 ( 𝗁 𝖺 ) ≈ 𝗁
𝖿 ( 𝗁 𝖺 ) 𝑥 ≈ 𝗁 𝑥
𝑥 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ≉ 𝑦 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ∨ 𝑥 ≈ 𝑦
Using a left-to-right LPO [BWW17] instance with precedence 𝗄 ≻ 𝗁 ≻ 𝗀 ≻ 𝖿 ≻ 𝖻 ≻ 𝖺 , this clause set is saturated w.r.t. the extensional purifying calculus when omitting purification. It also quickly saturates using the extensional nonpurifying calculus when omitting Sup inferences at variables. By contrast, the intensional calculi derive ⊥ , even without purification and without Sup inferences at variables, because of the less restrictive variable conditions.
This raises the question as to whether the intensional calculi actually need to purify or to perform Sup inferences at variables. We conjecture that omitting purification and Sup inferences at variables in the intensional calculi is complete when redundant clauses are kept but that it is incomplete in general.
We initially considered inference rules instead of axiom (Ext). However, we did not find a set of inference rules that is complete and leads to fewer inferences than (Ext). We considered the PosExt rule described above in combination with the following rule:
NegExt: from 𝐶 ∨ 𝑠 ≉ 𝑡 , derive 𝐶 ∨ 𝑠 ( 𝗌𝗄 ⟨ 𝛼 ¯ ⟩ 𝑥 ¯ 𝑛 ) ≉ 𝑡 ( 𝗌𝗄 ⟨ 𝛼 ¯ ⟩ 𝑥 ¯ 𝑛 )
where 𝗌𝗄 is a fresh Skolem symbol and 𝛼 ¯ and 𝑥 ¯ 𝑛 are the type and term variables occurring free in the literal 𝑠 ≉ 𝑡 . However, these two rules do not suffice for a refutationally complete calculus, as the following example demonstrates:
Example \thethm.
Consider the clause set
𝖿 𝑥 ≈ 𝖺
𝗀 𝑥 ≈ 𝖺
𝗁 𝖿 ≈ 𝖻
𝗁 𝗀 ≉ 𝖻
Assuming that all four equations are oriented from left to right, this set is saturated w.r.t. the extensional calculi if (Ext) is replaced by NegExt; yet it is unsatisfiable in an extensional logic.
Example \thethm.
A significant advantage of our calculi over the use of standard superposition on applicatively encoded problems is the flexibility they offer in orienting equations. The following equations provide two definitions of addition on Peano numbers:
𝖺𝖽𝖽 𝖫 𝖹𝖾𝗋𝗈 𝑦 ≈ 𝑦
𝖺𝖽𝖽 𝖱 𝑥 𝖹𝖾𝗋𝗈 ≈ 𝑥
𝖺𝖽𝖽 𝖫 ( 𝖲𝗎𝖼𝖼 𝑥 ) 𝑦 ≈ 𝖺𝖽𝖽 𝖫 𝑥 ( 𝖲𝗎𝖼𝖼 𝑦 )
𝖺𝖽𝖽 𝖱 𝑥 ( 𝖲𝗎𝖼𝖼 𝑦 ) ≈ 𝖺𝖽𝖽 𝖱 ( 𝖲𝗎𝖼𝖼 𝑥 ) 𝑦
Let 𝖺𝖽𝖽 𝖫 ( 𝖲𝗎𝖼𝖼^100 𝖹𝖾𝗋𝗈 ) 𝗇 ≉ 𝖺𝖽𝖽 𝖱 𝗇 ( 𝖲𝗎𝖼𝖼^100 𝖹𝖾𝗋𝗈 ) be the negated conjecture, where 𝖲𝗎𝖼𝖼^100 abbreviates 100 nested applications of 𝖲𝗎𝖼𝖼 . With LPO, we can use a left-to-right comparison for 𝖺𝖽𝖽 𝖫 ’s arguments and a right-to-left comparison for 𝖺𝖽𝖽 𝖱 ’s arguments to orient all four equations from left to right. Then the negated conjecture can be rewritten to 𝖲𝗎𝖼𝖼^100 𝗇 ≉ 𝖲𝗎𝖼𝖼^100 𝗇 by simplification (demodulation), and ⊥ can be derived with a single inference. If we use the applicative encoding instead, there is no instance of LPO or KBO that can orient both recursive equations from left to right. For at least one of the two sides of the negated conjecture, simplification is replaced by 100 Sup inferences, which is much less efficient, especially in the presence of additional axioms.
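The simplification argument can be replayed in a few lines of Python (a toy illustration, not prover code; the tuple encoding of terms and the names `add_l`, `add_r` are ours): applying the oriented recursive equations as left-to-right rewrite rules normalizes both sides of the negated conjecture to the same term.

```python
# Both recursive equations, oriented left to right, act as rewrite rules;
# each side of the negated conjecture then normalizes by plain rewriting
# (demodulation), with no superposition inference needed.

Zero = ("Zero",)
def Succ(t): return ("Succ", t)

def add_l(x, y):
    # addL Zero y ≈ y;  addL (Succ x) y ≈ addL x (Succ y)
    while x != Zero:
        x, y = x[1], Succ(y)
    return y

def add_r(x, y):
    # addR x Zero ≈ x;  addR x (Succ y) ≈ addR (Succ x) y
    while y != Zero:
        x, y = Succ(x), y[1]
    return x

def succs(k, t):
    # build Succ^k t
    for _ in range(k):
        t = Succ(t)
    return t

n = ("n",)   # an uninterpreted constant standing in for n
lhs = add_l(succs(100, Zero), n)   # addL (Succ^100 Zero) n
rhs = add_r(n, succs(100, Zero))   # addR n (Succ^100 Zero)
assert lhs == rhs == succs(100, n)  # both sides normalize to Succ^100 n
```

With the applicative encoding, one of the two rule orientations is lost, and the corresponding side must instead be rewritten step by step via Sup inferences.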
12.3. Soundness
To show the inferences’ soundness, we need the substitution lemma for our logic:
Lemma \thethm (Substitution lemma).
Let ℐ = ( 𝒰 , 𝒥 𝗍𝗒 , 𝒥 , ℰ ) be a 𝜆 -free higher-order interpretation. Then ⟦ 𝜏 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 ′ and ⟦ 𝑡 𝜌 ⟧ ℐ 𝜉 = ⟦ 𝑡 ⟧ ℐ 𝜉 ′ for all terms 𝑡 , all types 𝜏 , and all substitutions 𝜌 , where 𝜉 ′ ( 𝛼 ) = ⟦ 𝛼 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 for all type variables 𝛼 and 𝜉 ′ ( 𝑥 ) = ⟦ 𝑥 𝜌 ⟧ ℐ 𝜉 for all term variables 𝑥 .
Proof.
First, we prove that ⟦ 𝜏 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 ′ by induction on the structure of 𝜏 . If 𝜏 = 𝛼 is a type variable, then ⟦ 𝛼 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = 𝜉 ′ ( 𝛼 ) = ⟦ 𝛼 ⟧ ℐ 𝗍𝗒 𝜉 ′ . If 𝜏 = 𝜅 ( 𝜐 ¯ ) for some type constructor 𝜅 and types 𝜐 ¯ , then ⟦ 𝜅 ( 𝜐 ¯ ) 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = 𝒥 𝗍𝗒 ( 𝜅 ) ( ⟦ 𝜐 ¯ 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 ) = 𝒥 𝗍𝗒 ( 𝜅 ) ( ⟦ 𝜐 ¯ ⟧ ℐ 𝗍𝗒 𝜉 ′ ) = ⟦ 𝜅 ( 𝜐 ¯ ) ⟧ ℐ 𝗍𝗒 𝜉 ′ , where the second step holds by the induction hypothesis.
Next, we prove ⟦ 𝑡 𝜌 ⟧ ℐ 𝜉 = ⟦ 𝑡 ⟧ ℐ 𝜉 ′ by structural induction on 𝑡 . If 𝑡 = 𝑦 , then by the definition of the denotation of a variable, ⟦ 𝑦 𝜌 ⟧ ℐ 𝜉 = 𝜉 ′ ( 𝑦 ) = ⟦ 𝑦 ⟧ ℐ 𝜉 ′ . If 𝑡 = 𝖿 ⟨ 𝜏 ¯ ⟩ , then by the definition of the term denotation, ⟦ 𝖿 ⟨ 𝜏 ¯ ⟩ 𝜌 ⟧ ℐ 𝜉 = 𝒥 ( 𝖿 , ⟦ 𝜏 ¯ 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 ) = 𝒥 ( 𝖿 , ⟦ 𝜏 ¯ ⟧ ℐ 𝗍𝗒 𝜉 ′ ) = ⟦ 𝖿 ⟨ 𝜏 ¯ ⟩ ⟧ ℐ 𝜉 ′ . If 𝑡 = 𝑢 𝑣 , then by the definition of the term denotation, ⟦ ( 𝑢 𝑣 ) 𝜌 ⟧ ℐ 𝜉 = ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑢 𝜌 ⟧ ℐ 𝜉 ) ( ⟦ 𝑣 𝜌 ⟧ ℐ 𝜉 ) = ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑢 ⟧ ℐ 𝜉 ′ ) ( ⟦ 𝑣 ⟧ ℐ 𝜉 ′ ) = ⟦ 𝑢 𝑣 ⟧ ℐ 𝜉 ′ , where the second step holds by the induction hypothesis, 𝑢 is of type 𝜏 → 𝜐 , 𝑈 1 = ⟦ 𝜏 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = ⟦ 𝜏 ⟧ ℐ 𝗍𝗒 𝜉 ′ , and 𝑈 2 = ⟦ 𝜐 𝜌 ⟧ ℐ 𝗍𝗒 𝜉 = ⟦ 𝜐 ⟧ ℐ 𝗍𝗒 𝜉 ′ . ∎
Lemma \thethm.
If ℐ ⊧ 𝐶 for some interpretation ℐ and some clause 𝐶 , then ℐ ⊧ 𝐶 𝜌 for all substitutions 𝜌 .
Proof.
We need to show that 𝐶 𝜌 is true in ℐ for all valuations 𝜉 . Given a valuation 𝜉 , define 𝜉 ′ as in Lemma 12.3. Then, by Lemma 12.3, a literal in 𝐶 𝜌 is true in ℐ for 𝜉 if and only if the corresponding literal in 𝐶 is true in ℐ for 𝜉 ′ . There must be at least one such literal because ℐ ⊧ 𝐶 and hence 𝐶 is in particular true in ℐ for 𝜉 ′ . Therefore, 𝐶 𝜌 is true in ℐ for 𝜉 . ∎
Theorem \thethm (Soundness of the intensional calculi).
The inference rules Sup, ERes, EFact, and ArgCong are sound (even without the variable condition and the side conditions on order and eligibility).
Proof.
We fix an inference and an interpretation ℐ that is a model of the premises. We need to show that it is also a model of the conclusion.
From the definition of the denotation of a term, it is obvious that congruence holds at all subterms in our logic. By Lemma 12.3, ℐ is a model of the 𝜎 -instances of the premises as well, where 𝜎 is the substitution used for the inference. Fix a valuation 𝜉 . By making case distinctions on the truth in ℐ under 𝜉 of the literals of the 𝜎 -instances of the premises, using the conditions that 𝜎 is a unifier, and applying congruence, it follows that the conclusion is also true in ℐ under 𝜉 . ∎
Theorem \thethm (Soundness of the extensional calculi).
The inference rules Sup, ERes, EFact, ArgCong, and PosExt are sound w.r.t. extensional interpretations (even without the variable condition and the side conditions on order and eligibility).
Proof.
We only need to prove PosExt sound. For the other rules, we can proceed as in Theorem 12.3. By induction on the length of 𝑥 ¯ , it suffices to prove PosExt sound for one variable 𝑥 instead of a tuple 𝑥 ¯ . We fix an inference and an extensional interpretation ℐ that is a model of the premise 𝐶 ′ ∨ 𝑠 𝑥 ≈ 𝑠 ′ 𝑥 . We need to show that it is also a model of the conclusion 𝐶 ′ ∨ 𝑠 ≈ 𝑠 ′ .
Let 𝜉 be a valuation. If 𝐶 ′ is true in ℐ under 𝜉 , the conclusion is clearly true as well. Otherwise 𝐶 ′ is false in ℐ under 𝜉 , and also under 𝜉 [ 𝑥 ↦ 𝑎 ] for all 𝑎 because 𝑥 does not occur in 𝐶 ′ . Since the premise is true in ℐ , 𝑠 𝑥 ≈ 𝑠 ′ 𝑥 must be true in ℐ under 𝜉 [ 𝑥 ↦ 𝑎 ] for all 𝑎 . Hence, for appropriate universes 𝑈 1 , 𝑈 2 , we have ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑠 ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] ) ( 𝑎 ) = ⟦ 𝑠 𝑥 ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] = ⟦ 𝑠 ′ 𝑥 ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] = ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑠 ′ ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] ) ( 𝑎 ) . Since 𝑠 and 𝑠 ′ do not contain 𝑥 , ⟦ 𝑠 ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] and ⟦ 𝑠 ′ ⟧ ℐ 𝜉 [ 𝑥 ↦ 𝑎 ] do not depend on 𝑎 . Thus, ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑠 ⟧ ℐ 𝜉 ) = ℰ 𝑈 1 , 𝑈 2 ( ⟦ 𝑠 ′ ⟧ ℐ 𝜉 ) . Since ℐ is extensional, ℰ 𝑈 1 , 𝑈 2 is injective and hence ⟦ 𝑠 ⟧ ℐ 𝜉 = ⟦ 𝑠 ′ ⟧ ℐ 𝜉 . It follows that 𝑠 ≈ 𝑠 ′ is true in ℐ under 𝜉 , and so is the entire conclusion of the inference. ∎
A problem expressed in higher-order logic must be transformed into clausal normal form before the calculi can be applied. This process works as in the first-order case, except for skolemization. The issue is that skolemization, when performed naively, is unsound for higher-order logic with a Henkin semantics [Mil87, §6], because it introduces new functions that can be used to instantiate variables.
The core of this article is not affected by this because the problems are given in clausal form. For the implementation, we claim soundness only w.r.t. models that satisfy the axiom of choice, which is the semantics mandated by the TPTP THF format [SBBT09]. By contrast, refutational completeness holds w.r.t. arbitrary models as defined above. Alternatively, skolemization can be made sound by introducing mandatory arguments as described by Miller [Mil87, §6] and in the conference version of this article [BBCW18].
This issue also affects axiom (Ext) because it contains the Skolem symbol 𝖽𝗂𝖿𝖿 . As a consequence, (Ext) does not hold in all extensional interpretations. The extensional calculi are thus only sound w.r.t. interpretations in which (Ext) holds. However, we can prove that (Ext) is compatible with our logic:
Theorem \thethm.
Axiom (Ext) is satisfiable.
Proof.
For a given signature, let ℐ = ( 𝒰 , 𝒥 𝗍𝗒 , 𝒥 , ℰ ) be a Herbrand interpretation. That is, we define 𝒰 to contain the set 𝒰 𝜏 of all terms of type 𝜏 for each ground type 𝜏 , we define 𝒥 𝗍𝗒 by 𝒥 𝗍𝗒 ( 𝜅 ) ( 𝜏 ¯ ) = 𝜅 ( 𝜏 ¯ ) , we define 𝒥 by 𝒥 ( 𝖿 , 𝒰 𝜏 ¯ ) = 𝖿 ⟨ 𝜏 ¯ ⟩ , and we define ℰ by ℰ 𝒰 𝜏 , 𝒰 𝜐 ( 𝖿 ) ( 𝖺 ) = 𝖿 𝖺 . Then ℰ 𝒰 𝜏 , 𝒰 𝜐 is clearly injective and hence ℐ is extensional. To show that ℐ ⊧ ( Ext ) , we need to show that (Ext) is true under all valuations. Let 𝜉 be a valuation. If 𝑥 ≈ 𝑦 is true under 𝜉 , (Ext) is also true. Otherwise 𝑥 ≈ 𝑦 is false under 𝜉 , and hence 𝜉 ( 𝑥 ) ≠ 𝜉 ( 𝑦 ) . Then we have ⟦ 𝑥 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ⟧ ℐ 𝜉 = ( 𝜉 ( 𝑥 ) ) ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ ( 𝜉 ( 𝑥 ) ) ( 𝜉 ( 𝑦 ) ) ) ≠ ( 𝜉 ( 𝑦 ) ) ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ ( 𝜉 ( 𝑥 ) ) ( 𝜉 ( 𝑦 ) ) ) = ⟦ 𝑦 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ⟧ ℐ 𝜉 . Therefore, 𝑥 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) ≉ 𝑦 ( 𝖽𝗂𝖿𝖿 ⟨ 𝛼 , 𝛽 ⟩ 𝑥 𝑦 ) is true in ℐ under 𝜉 and so is (Ext). ∎
14.1. The Redundancy Criterion
For our calculi, a redundant clause cannot simply be defined as a clause whose ground instances are entailed by smaller ( ≺ ) ground instances of existing clauses, because this would make all ArgCong inferences redundant. Our solution is to base the redundancy criterion on a weaker ground logic—ground monomorphic first-order logic—in which argument congruence does not hold. This logic also plays a central role in our refutational completeness proof.
We employ an encoding ℱ to translate ground 𝜆 -free higher-order terms into ground first-order terms. It indexes each symbol occurrence with its type arguments and its term argument count. Thus, ℱ ( 𝖿 ) = 𝖿 0 , ℱ ( 𝖿 𝖺 ) = 𝖿 1 ( 𝖺 0 ) , and ℱ ( 𝗀 ⟨ 𝜅 ⟩ ) = 𝗀 0 𝜅 . This is enough to disable argument congruence; for example, { 𝖿 ≈ 𝗁 , 𝖿 𝖺 ≉ 𝗁 𝖺 } is unsatisfiable, whereas its encoding { 𝖿 0 ≈ 𝗁 0 , 𝖿 1 ( 𝖺 0 ) ≉ 𝗁 1 ( 𝖺 0 ) } is satisfiable. For clauses built from fully applied ground terms, the two logics are isomorphic, as we would expect from a graceful generalization.
Given a 𝜆 -free higher-order signature ( Σ 𝗍𝗒 , Σ ) , we define a first-order signature ( Σ 𝗍𝗒 , Σ GF ) as follows. The type constructors Σ 𝗍𝗒 are the same in both signatures, but → is uninterpreted in first-order logic. For each symbol 𝖿 : 𝛱 𝛼 ¯ 𝑚 . 𝜏 1 → ⋯ → 𝜏 𝑛 → 𝜏 in Σ , where 𝜏 is not functional, we introduce symbols 𝖿 𝑙 𝜐 ¯ 𝑚 ∈ Σ GF with argument types 𝜏 ¯ 𝑙 𝜎 and return type ( 𝜏 𝑙 + 1 → ⋯ → 𝜏 𝑛 → 𝜏 ) 𝜎 , where 𝜎 = { 𝛼 ¯ 𝑚 ↦ 𝜐 ¯ 𝑚 } , for each tuple of ground types 𝜐 ¯ 𝑚 and each 𝑙 ∈ { 0 , … , 𝑛 } . For example, let Σ = { 𝖺 : 𝜅 , 𝗀 : 𝜅 → 𝜅 → 𝜅 } . The corresponding first-order signature is Σ GF = { 𝖺 0 : 𝜅 , 𝗀 0 : 𝜅 → 𝜅 → 𝜅 , 𝗀 1 : 𝜅 ⇒ 𝜅 → 𝜅 , 𝗀 2 : 𝜅 , 𝜅 ⇒ 𝜅 } , where 𝖿 : 𝜏 ¯ ⇒ 𝜐 denotes a first-order function symbol 𝖿 with argument types 𝜏 ¯ and return type 𝜐 , and → is an uninterpreted binary type constructor. The term ℱ ( 𝗀 𝖺 𝖺 ) = 𝗀 2 ( 𝖺 0 , 𝖺 0 ) has type 𝜅 , and ℱ ( 𝗀 𝖺 ) = 𝗀 1 ( 𝖺 0 ) has type 𝜅 → 𝜅 .
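The effect of ℱ on ground terms can be sketched in a few lines of Python (our own illustration; type arguments are omitted for brevity, and terms are written in spine form as tuples):

```python
# Sketch of the encoding F: each occurrence of a symbol is indexed by its
# term argument count, so partial applications of the same symbol become
# unrelated first-order symbols and argument congruence is disabled.

def encode(term):
    """Map a ground lambda-free HO term ("f", arg1, ..., argk) in spine
    form to a first-order term with an indexed head symbol."""
    head, *args = term
    return (f"{head}{len(args)}",) + tuple(encode(a) for a in args)

assert encode(("f",)) == ("f0",)
assert encode(("f", ("a",))) == ("f1", ("a0",))
assert encode(("g", ("a",), ("a",))) == ("g2", ("a0",), ("a0",))
# After encoding, f ≈ h no longer entails f a ≈ h a, because "f0" and
# "f1" are distinct symbols.
```

This makes concrete why the encoded clause set { 𝖿 0 ≈ 𝗁 0 , 𝖿 1 ( 𝖺 0 ) ≉ 𝗁 1 ( 𝖺 0 ) } is satisfiable even though its preimage is not.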
Thus, we consider three levels of logics: the 𝜆 -free higher-order level H over a given signature ( Σ 𝗍𝗒 , Σ ) , the ground 𝜆 -free higher-order level GH , corresponding to H ’s ground fragment, and the ground monomorphic first-order level GF over the signature ( Σ 𝗍𝗒 , Σ GF ) defined above. We use 𝒯 H , 𝒯 GH , and 𝒯 GF to denote the respective sets of terms, 𝒯𝑦 H , 𝒯𝑦 GH , and 𝒯𝑦 GF to denote the respective sets of types, and 𝒞 H , 𝒞 GH , and 𝒞 GF to denote the respective sets of clauses. In the purifying calculi, we exceptionally let 𝒞 H denote the set of purified clauses. Each of the three levels has an entailment relation ⊧ . A clause set 𝑁 1 entails a clause set 𝑁 2 , denoted 𝑁 1 ⊧ 𝑁 2 , if any model of 𝑁 1 is also a model of 𝑁 2 . On H and GH , we use 𝜆 -free higher-order models for the intensional calculi and extensional 𝜆 -free higher-order models for the extensional calculi; on GF , we use first-order models. This machinery may seem excessive, but it is essential to define redundancy of clauses and inferences properly, and it will play an important role in the refutational completeness proof (Section 16).
The three levels are connected by two functions, 𝒢 and ℱ :
Definition \thethm (Grounding function 𝒢 on terms and clauses).
The grounding function 𝒢 maps terms 𝑡 ∈ 𝒯 H to the set of their ground instances—i.e., the set of all 𝑡 𝜃 ∈ 𝒯 GH where 𝜃 is a substitution. It also maps clauses 𝐶 ∈ 𝒞 H to the set of their ground instances—i.e., the set of all 𝐶 𝜃 ∈ 𝒞 GH where 𝜃 is a substitution.
Definition \thethm (Encoding ℱ on terms and clauses).
The encoding ℱ : 𝒯 GH → 𝒯 GF is recursively defined as ℱ ( 𝖿 ⟨ 𝜐 ¯ 𝑚 ⟩ 𝑢 ¯ 𝑙 ) = 𝖿 𝑙 𝜐 ¯ 𝑚 ( ℱ ( 𝑢 ¯ 𝑙 ) ) . The encoding ℱ is extended to map from 𝒞 GH to 𝒞 GF by mapping each literal and each side of a literal individually.
The encoding ℱ is bijective with inverse ℱ − 1 . Using ℱ − 1 , the order ≻ on 𝒯 GH can be transferred to 𝒯 GF by defining 𝑡 ≻ 𝑠 as equivalent to ℱ − 1 ( 𝑡 ) ≻ ℱ − 1 ( 𝑠 ) . The property that ≻ on clauses is the multiset extension of ≻ on literals, which in turn is the multiset extension of ≻ on terms, is maintained because ℱ − 1 maps the multiset representations elementwise.
Schematically, the three levels are connected as follows:
[Diagram: the higher-order level H is linked to the ground higher-order level GH by the grounding function 𝒢 , and GH is linked to the ground first-order level GF by the encoding ℱ .]
Crucially, green subterms in 𝒯 GH correspond to subterms in 𝒯 GF (Lemma 14.1), whereas nongreen subterms in 𝒯 GH are not subterms at all in 𝒯 GF . For example, 𝖺 is a green subterm of 𝖿 𝖺 , and correspondingly ℱ ( 𝖺 ) = 𝖺 0 is a subterm of ℱ ( 𝖿 𝖺 ) = 𝖿 1 ( 𝖺 0 ) . On the other hand, 𝖿 is not a green subterm of 𝖿 𝖺 , and correspondingly ℱ ( 𝖿 ) = 𝖿 0 is not a subterm of ℱ ( 𝖿 𝖺 ) = 𝖿 1 ( 𝖺 0 ) .
To state the correspondence between green subterms in 𝒯 GH and subterms in 𝒯 GF explicitly, we define positions of green subterms as follows. A term 𝑠 ∈ 𝒯 H is a green subterm at position 𝜀 of 𝑠 . If a term 𝑠 ∈ 𝒯 H is a green subterm at position 𝑝 of 𝑢 𝑖 for some 1 ≤ 𝑖 ≤ 𝑛 , then 𝑠 is a green subterm at position 𝑖 . 𝑝 of 𝖿 ⟨ 𝜏 ¯ ⟩ 𝑢 ¯ 𝑛 and of 𝑥 𝑢 ¯ 𝑛 . For 𝒯 GF , positions are defined as usual in first-order logic.
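The inductive definition above can be rendered as a short Python sketch (an illustration using our own tuple representation of terms in spine form): every term is a green subterm of itself at 𝜀 , and green positions descend only into argument positions, never into the head.

```python
def green_subterms(term, pos=()):
    """Yield (position, subterm) pairs for all green subterms of a term
    in spine form ("f", u1, ..., un). The empty tuple plays the role of
    the position epsilon; head symbols are not green subterms."""
    yield pos, term
    head, *args = term
    for i, u in enumerate(args, start=1):
        yield from green_subterms(u, pos + (i,))

t = ("f", ("h", ("a",)), ("b",))       # the term f (h a) b
positions = dict(green_subterms(t))
assert positions[()] == t              # t itself, at position epsilon
assert positions[(1,)] == ("h", ("a",))
assert positions[(1, 1)] == ("a",)
assert positions[(2,)] == ("b",)
# The head f (and any partial application of it) is never enumerated,
# matching the claim that nongreen subterms have no F-image as subterms.
```

Composing this with the encoding ℱ gives exactly the correspondence stated in Lemma 14.1: green positions in 𝒯 GH coincide with first-order positions in 𝒯 GF .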
Lemma \thethm.
Let 𝑠 , 𝑡 ∈ 𝒯 GH . We have ℱ ( 𝑡 [ 𝑠 ] 𝑝 ) = ℱ ( 𝑡 ) [ ℱ ( 𝑠 ) ] 𝑝 . In other words, 𝑠 is a green subterm of 𝑡 at position 𝑝 if and only if ℱ ( 𝑠 ) is a subterm of ℱ ( 𝑡 ) at position 𝑝 .
Proof.
By induction on 𝑝 . If 𝑝 = 𝜀 , then 𝑠 = 𝑡 [ 𝑠 ] 𝑝 . Hence ℱ ( 𝑡 [ 𝑠 ] 𝑝 ) = ℱ ( 𝑠 ) = ℱ ( 𝑡 ) [ ℱ ( 𝑠 ) ] 𝑝 . If 𝑝 = 𝑖 . 𝑝 ′ , then 𝑡 [ 𝑠 ] 𝑝 = 𝖿 ⟨ 𝜏 ¯ ⟩ 𝑢 ¯ 𝑛 with 𝑢 𝑖 = 𝑢 𝑖 [ 𝑠 ] 𝑝 ′ . Applying ℱ , we obtain by the induction hypothesis that ℱ ( 𝑢 𝑖 ) = ℱ ( 𝑢 𝑖 ) [ ℱ ( 𝑠 ) ] 𝑝 ′ . Therefore, ℱ ( 𝑡 [ 𝑠 ] 𝑝 ) = 𝖿 𝑛 𝜏 ¯ ( ℱ ( 𝑢 1 ) , … , ℱ ( 𝑢 𝑖 − 1 ) , ℱ ( 𝑢 𝑖 ) [ ℱ ( 𝑠 ) ] 𝑝 ′ , ℱ ( 𝑢 𝑖 + 1 ) , … , ℱ ( 𝑢 𝑛 ) ) . It follows that ℱ ( 𝑡 [ 𝑠 ] 𝑝 ) = ℱ ( 𝑡 ) [ ℱ ( 𝑠 ) ] 𝑝 . ∎
Corollary \thethm.
Given 𝑠 , 𝑡 ∈ 𝒯 GF , we have ℱ − 1 ( 𝑡 [ 𝑠 ] 𝑝 ) = ℱ − 1 ( 𝑡 ) [ ℱ − 1 ( 𝑠 ) ] 𝑝 .
Lemma \thethm.
Well-foundedness, totality, compatibility with contexts, and the subterm property hold for ≻ on 𝒯 GF .
Proof.
Compatibility with contexts: We must show that 𝑠 ≻ 𝑠 ′ implies 𝑡 [ 𝑠 ] 𝑝 ≻ 𝑡 [ 𝑠 ′ ] 𝑝 for terms 𝑡 , 𝑠 , 𝑠 ′ ∈ 𝒯 GF . Assuming 𝑠 ≻ 𝑠 ′ , we have ℱ − 1 ( 𝑠 ) ≻ ℱ − 1 ( 𝑠 ′ ) . By compatibility with green contexts on 𝒯 GH , it follows that ℱ − 1 ( 𝑡 ) [ ℱ − 1 ( 𝑠 ) ] 𝑝 ≻ ℱ − 1 ( 𝑡 ) [ ℱ − 1 ( 𝑠 ′ ) ] 𝑝 . By Corollary 14.1, we have 𝑡 [ 𝑠 ] 𝑝 ≻ 𝑡 [ 𝑠 ′ ] 𝑝 .
Well-foundedness: Assume that there exists an infinite descending chain 𝑡 1 ≻ 𝑡 2 ≻ ⋯ in 𝒯 GF . By applying ℱ − 1 , we then obtain the chain ℱ − 1 ( 𝓉 1 ) ≻ ℱ − 1 ( 𝓉 2 ) ≻ ⋯ in 𝒯 GH , contradicting well-foundedness on 𝒯 GH .
Totality: Let 𝑠 , 𝑡 ∈ 𝒯 GF . Then ℱ − 1 ( 𝓉 ) and ℱ − 1 ( 𝓈 ) must be comparable by totality in 𝒯 GH . Hence, 𝑡 and 𝑠 are comparable.
Subterm property: By Corollary 14.1 and the subterm property on 𝒯 GH , we have ℱ − 1 ( 𝑡 [ 𝑠 ] 𝑝 ) = ℱ − 1 ( 𝑡 ) [ ℱ − 1 ( 𝑠 ) ] 𝑝 ≻ ℱ − 1 ( 𝑠 ) . Hence, 𝑡 [ 𝑠 ] 𝑝 ≻ 𝑠 . ∎
In standard superposition, redundancy relies on the entailment relation ⊧ on ground clauses. We will define redundancy on GH and H in the same way, but using GF ’s entailment relation. This notion of redundancy gracefully generalizes the first-order notion, without making all ArgCong inferences redundant.
The standard redundancy criterion for standard superposition cannot justify subsumption deletion. Following Waldmann et al. [WTRB20], we incorporate subsumption into the redundancy criterion. A clause 𝐶 subsumes 𝐷 if there exists a substitution 𝜎 such that 𝐶 𝜎 ⊆ 𝐷 . A clause 𝐶 strictly subsumes 𝐷 if 𝐶 subsumes 𝐷 but 𝐷 does not subsume 𝐶 . Let ⊐ stand for “is strictly subsumed by”. Using the applicative encoding, it is easy to show that ⊐ is well founded because strict subsumption is well founded in first-order logic.
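A naive version of the subsumption test can be sketched in Python (our own toy representation: variables are strings, compound terms are tuples, and literals are (polarity, lhs, rhs) triples; the multiset subtlety and the unorientedness of ≈ are ignored here):

```python
# Clause C subsumes D if some substitution sigma makes every literal of
# C·sigma occur in D. We search for sigma by matching C's literals against
# D's, threading a growing substitution through the search.

def match_term(p, t, s):
    """Extend substitution s so that p·s == t, or return None."""
    if isinstance(p, str):                    # p is a variable
        if p in s:
            return s if s[p] == t else None
        s = dict(s); s[p] = t
        return s
    if isinstance(t, str) or p[0] != t[0] or len(p) != len(t):
        return None
    for pa, ta in zip(p[1:], t[1:]):
        s = match_term(pa, ta, s)
        if s is None:
            return None
    return s

def match_lit(pl, tl, s):
    if pl[0] != tl[0]:                        # polarities must agree
        return None
    s = match_term(pl[1], tl[1], s)
    return None if s is None else match_term(pl[2], tl[2], s)

def subsumes(c, d, s=None, i=0):
    s = {} if s is None else s
    if i == len(c):
        return True
    return any((s2 := match_lit(c[i], lit, s)) is not None
               and subsumes(c, d, s2, i + 1)
               for lit in d)

# f(x) ≈ x subsumes f(a) ≈ a ∨ g(b) ≉ b via {x ↦ a}, but not conversely:
C = [("+", ("f", "x"), "x")]
D = [("+", ("f", ("a",)), ("a",)), ("-", ("g", ("b",)), ("b",))]
assert subsumes(C, D) and not subsumes(D, C)
```

Real provers additionally respect literal multiplicities and symmetry of equations; the point here is only the shape of the definition 𝐶 𝜎 ⊆ 𝐷 .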
We define the sets of redundant clauses w.r.t. a given clause set as follows:
– Given 𝐶 ∈ 𝒞 GF and 𝑁 ⊆ 𝒞 GF , let 𝐶 ∈ 𝐺𝐹𝑅𝑒𝑑 C ( 𝑁 ) if { 𝐷 ∈ 𝑁 ∣ 𝐷 ≺ 𝐶 } ⊧ 𝐶 .
– Given 𝐶 ∈ 𝒞 GH and 𝑁 ⊆ 𝒞 GH , let 𝐶 ∈ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) if ℱ ( 𝐶 ) ∈ 𝐺𝐹𝑅𝑒𝑑 C ( ℱ ( 𝑁 ) ) .
– Given 𝐶 ∈ 𝒞 H and 𝑁 ⊆ 𝒞 H , let 𝐶 ∈ 𝐻𝑅𝑒𝑑 C ( 𝑁 ) if for every 𝐷 ∈ 𝒢 ( 𝐶 ) , we have 𝐷 ∈ 𝐺𝐻𝑅𝑒𝑑 C ( 𝒢 ( 𝑁 ) ) or there exists 𝐶 ′ ∈ 𝑁 such that 𝐶 ⊐ 𝐶 ′ and 𝐷 ∈ 𝒢 ( 𝐶 ′ ) .
Along with the three levels of logics, we consider three inference systems: 𝐻𝐼𝑛𝑓 , 𝐺𝐻𝐼𝑛𝑓 , and 𝐺𝐹𝐼𝑛𝑓 . 𝐻𝐼𝑛𝑓 is one of the four variants of the inference system described in Section 12.1. For uniformity, we regard axiom (Ext) as a premise-free inference rule Ext whose conclusion is (Ext). In the purifying calculi, the conclusion of Ext must be purified. 𝐺𝐻𝐼𝑛𝑓 consists of all Sup, ERes, and EFact inferences from 𝐻𝐼𝑛𝑓 whose premises and conclusion are ground, a premise-free rule GExt whose infinitely many conclusions are the ground instances of (Ext), and the following ground variant of ArgCong:
GArgCong: from the premise 𝐶 ′ ∨ 𝑠 ≈ 𝑠 ′ , derive the conclusion 𝐶 ′ ∨ 𝑠 𝑢 ¯ 𝑛 ≈ 𝑠 ′ 𝑢 ¯ 𝑛 ,
where 𝑠 ≈ 𝑠 ′ is strictly eligible in the premise and 𝑢 ¯ 𝑛 is a nonempty tuple of ground terms. 𝐺𝐹𝐼𝑛𝑓 contains all Sup, ERes, and EFact inferences from 𝐺𝐻𝐼𝑛𝑓 translated by ℱ . It coincides exactly with standard first-order superposition. Given a Sup, ERes, or EFact inference 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 , let ℱ ( 𝜄 ) denote the corresponding inference in 𝐺𝐹𝐼𝑛𝑓 .
Each of the three inference systems is parameterized by a selection function. For 𝐻𝐼𝑛𝑓 , we globally fix a selection function 𝐻𝑆𝑒𝑙 . For 𝐺𝐻𝐼𝑛𝑓 and 𝐺𝐹𝐼𝑛𝑓 , we need to consider different selection functions 𝐺𝐻𝑆𝑒𝑙 and 𝐺𝐹𝑆𝑒𝑙 . We write 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 for 𝐺𝐻𝐼𝑛𝑓 and 𝐺𝐹𝐼𝑛𝑓 𝐺𝐹𝑆𝑒𝑙 for 𝐺𝐹𝐼𝑛𝑓 to make the dependency on the respective selection functions explicit. Let 𝒢 ( 𝐻𝑆𝑒𝑙 ) denote the set of all selection functions on 𝒞 GH such that for each clause 𝐶 ∈ 𝒞 GH , there exists a clause 𝐷 ∈ 𝒞 H with 𝐶 ∈ 𝒢 ( 𝐷 ) and corresponding selected literals. For each selection function 𝐺𝐻𝑆𝑒𝑙 on 𝒞 GH , via the bijection ℱ , we obtain a corresponding selection function on 𝒞 GF , which we denote by ℱ ( 𝐺𝐻𝑆𝑒𝑙 ) .
Notation \thethm.
Given an inference 𝜄 , we write 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) for the tuple of premises, 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) for the main (i.e., rightmost) premise, 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) for the conclusion before purification, and 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) for the conclusion after purification. For the nonpurifying calculi, 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) = 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) simply denotes the conclusion.
Definition \thethm (Encoding ℱ on inferences).
Given a Sup, ERes, or EFact inference 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 , let ℱ ( 𝜄 ) ∈ 𝐺𝐹𝐼𝑛𝑓 denote the inference defined by 𝑝𝑟𝑒𝑚𝑠 ( ℱ ( 𝜄 ) ) = ℱ ( 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) ) and 𝑐𝑜𝑛𝑐𝑙 ( ℱ ( 𝜄 ) ) = ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) .
Definition \thethm (Grounding function 𝒢 on inferences).
Given a selection function 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) and a non-PosExt inference 𝜄 ∈ 𝐻𝐼𝑛𝑓 , we define the set 𝒢 𝐺𝐻𝑆𝑒𝑙 ( 𝜄 ) of ground instances of 𝜄 to be all inferences 𝜄 ′ ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 such that 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ′ ) = 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) 𝜃 and 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ′ ) = 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) 𝜃 for some grounding substitution 𝜃 . For PosExt inferences 𝜄 , which cannot be grounded, we let 𝒢 𝐺𝐻𝑆𝑒𝑙 ( 𝜄 ) = 𝑢𝑛𝑑𝑒𝑓 .
This will map Sup to Sup, EFact to EFact, ERes to ERes, Ext to GExt, and ArgCong to GArgCong inferences, but it is also possible that 𝒢 𝐺𝐻𝑆𝑒𝑙 ( 𝜄 ) is the empty set for some inferences 𝜄 .
We define the sets of redundant inferences w.r.t. a given clause set as follows:
– Given 𝜄 ∈ 𝐺𝐹𝐼𝑛𝑓 𝐺𝐹𝑆𝑒𝑙 and 𝑁 ⊆ 𝒞 GF , let 𝜄 ∈ 𝐺𝐹𝑅𝑒𝑑 I 𝐺𝐹𝑆𝑒𝑙 ( 𝑁 ) if 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) ∩ 𝐺𝐹𝑅𝑒𝑑 C ( 𝑁 ) ≠ ∅ or { 𝐷 ∈ 𝑁 ∣ 𝐷 ≺ 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) } ⊧ 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) .
– Given 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 and 𝑁 ⊆ 𝒞 GH , let 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 ( 𝑁 ) if
  – 𝜄 is not a GArgCong or GExt inference and ℱ ( 𝜄 ) ∈ 𝐺𝐹𝑅𝑒𝑑 I ℱ ( 𝐺𝐻𝑆𝑒𝑙 ) ( ℱ ( 𝑁 ) ) ; or
  – the calculus is nonpurifying and 𝜄 is a GArgCong or GExt inference and 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ∈ 𝑁 ∪ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) ; or
  – the calculus is purifying and 𝜄 is a GArgCong or GExt inference and ℱ ( 𝑁 ) ⊧ ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) .
– Given 𝜄 ∈ 𝐻𝐼𝑛𝑓 and 𝑁 ⊆ 𝒞 H , let 𝜄 ∈ 𝐻𝑅𝑒𝑑 I ( 𝑁 ) if
  – 𝜄 is not a PosExt inference and 𝒢 𝐺𝐻𝑆𝑒𝑙 ( 𝜄 ) ⊆ 𝐺𝐻𝑅𝑒𝑑 I ( 𝒢 ( 𝑁 ) ) for all 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) ; or
  – 𝜄 is a PosExt inference and 𝒢 ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) ⊆ 𝒢 ( 𝑁 ) ∪ 𝐺𝐻𝑅𝑒𝑑 C ( 𝒢 ( 𝑁 ) ) .
Occasionally, we omit the selection function in the notation when it is irrelevant. A clause set 𝑁 is saturated w.r.t. an inference system and a redundancy criterion ( 𝑅𝑒𝑑 I , 𝑅𝑒𝑑 C ) if every inference from clauses in 𝑁 is in 𝑅𝑒𝑑 I ( 𝑁 ) .
15.1. Simplification Rules
The redundancy criterion ( 𝐻𝑅𝑒𝑑 I , 𝐻𝑅𝑒𝑑 C ) is strong enough to support most of the simplification rules implemented in Schulz’s first-order prover E [Sch02, Sections 2.3.1 and 2.3.2], some only with minor adaptations. Deletion of duplicated literals, deletion of resolved literals, syntactic tautology deletion, negative simplify-reflect, and clause subsumption adhere to our redundancy criterion.
Positive simplify-reflect and equality subsumption are supported by our criterion if they are applied in green contexts. Semantic tautology deletion can be applied as well, but we must use the entailment relation of the GF level—i.e., only rewriting in green contexts can be used to establish the entailment. Similarly, rewriting of positive and negative literals (demodulation) can only be applied in green contexts. Moreover, for positive literals, the rewriting clause must be smaller than the rewritten clause—a condition that is also necessary with the standard first-order redundancy criterion but not always fulfilled by Schulz’s rule. As for destructive equality resolution, even in first-order logic the rule cannot be justified with the standard redundancy criterion, and it is unclear whether it preserves refutational completeness.
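The restriction to green contexts can be illustrated as follows (a toy Python sketch with our own term representation, not E's or Zipperposition's implementation): a ground unit equation may rewrite a term at the top or inside argument positions, but never at the head of an application.

```python
# Demodulation restricted to green contexts: rewrite term by the oriented
# ground rule l -> r, descending only into argument positions of the
# spine-form tuple ("f", u1, ..., un); head symbols are never replaced.

def rewrite_green(term, l, r):
    if term == l:               # the whole term is a green subterm (position epsilon)
        return r
    head, *args = term
    return (head,) + tuple(rewrite_green(a, l, r) for a in args)

g_a = ("g", ("a",))             # the term g a
# g ≈ f may NOT rewrite g a: the head g is not a green subterm ...
assert rewrite_green(g_a, ("g",), ("f",)) == g_a
# ... but a ≈ b rewrites the green subterm a:
assert rewrite_green(g_a, ("a",), ("b",)) == ("g", ("b",))
```

Rewriting under the head, as in the first call, is exactly what must instead go through an ArgCong instance such as 𝗀 𝑥 ≈ 𝖿 𝑥 .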
- Refutational Completeness
Besides soundness, the most important property of the four calculi introduced in Section 12.1 is refutational completeness. We will prove the static and dynamic refutational completeness of 𝐻𝐼𝑛𝑓 w.r.t. ( 𝐻𝑅𝑒𝑑 I , 𝐻𝑅𝑒𝑑 C ) , which is defined as follows:
Definition \thethm (Static refutational completeness).
Let 𝐼𝑛𝑓 be an inference system and let ( 𝑅𝑒𝑑 I , 𝑅𝑒𝑑 C ) be a redundancy criterion. The inference system 𝐼𝑛𝑓 is called statically refutationally complete w.r.t. ( 𝑅𝑒𝑑 I , 𝑅𝑒𝑑 C ) if we have 𝑁 ⊧ ⊥ if and only if ⊥ ∈ 𝑁 for every clause set 𝑁 that is saturated w.r.t. 𝐼𝑛𝑓 and 𝑅𝑒𝑑 I .
Definition \thethm (Dynamic refutational completeness).
Let 𝐼𝑛𝑓 be an inference system and let ( 𝑅𝑒𝑑 I , 𝑅𝑒𝑑 C ) be a redundancy criterion. Let ( 𝑁 𝑖 ) 𝑖 be a finite or infinite sequence over sets of clauses. Such a sequence is called a derivation if 𝑁 𝑖 ∖ 𝑁 𝑖 + 1 ⊆ 𝑅𝑒𝑑 C ( 𝑁 𝑖 + 1 ) for all 𝑖 . It is called fair if all 𝐼𝑛𝑓 -inferences from clauses in the limit inferior ⋃ 𝑖 ⋂ 𝑗 ≥ 𝑖 𝑁 𝑗 are contained in ⋃ 𝑖 𝑅𝑒𝑑 I ( 𝑁 𝑖 ) . The inference system 𝐼𝑛𝑓 is called dynamically refutationally complete w.r.t. ( 𝑅𝑒𝑑 C , 𝑅𝑒𝑑 I ) if for every fair derivation ( 𝑁 𝑖 ) 𝑖 such that 𝑁 0 ⊧ ⊥ , we have ⊥ ∈ 𝑁 𝑖 for some 𝑖 .
To circumvent the term order’s potential nonmonotonicity, our Sup inference rule only considers green subterms. This is reflected in our proof by the reliance on the GF level introduced in Section 14.1. The equation 𝗀 0 ≈ 𝖿 0 ∈ 𝒞 GF , which corresponds to the equation 𝗀 ≈ 𝖿 ∈ 𝒞 GH , cannot be used directly to rewrite the clause 𝗀 1 ( 𝖺 0 ) ≉ 𝖿 1 ( 𝖺 0 ) ∈ 𝒞 GF , which corresponds to 𝗀 𝖺 ≉ 𝖿 𝖺 ∈ 𝒞 GH . Instead, we first need to apply ArgCong to derive 𝗀 𝑥 ≈ 𝖿 𝑥 , which after grounding and transfer to GF yields 𝗀 1 ( 𝖺 0 ) ≈ 𝖿 1 ( 𝖺 0 ) . The GF level is a device that enables us to reuse the refutational completeness result for standard (first-order) superposition.
The proof proceeds in three steps, corresponding to the three levels GF , GH , and H introduced in Section 14.1:
(1) We use Bachmair and Ganzinger’s work on the refutational completeness of standard superposition [BG94] to prove the static refutational completeness of 𝐺𝐹𝐼𝑛𝑓 .
(2) From the first-order model constructed in Bachmair and Ganzinger’s proof, we derive a 𝜆 -free higher-order model to prove the static refutational completeness of 𝐺𝐻𝐼𝑛𝑓 .
(3) We use the saturation framework of Waldmann et al. [WTRB20] to lift the static refutational completeness of 𝐺𝐻𝐼𝑛𝑓 to static and dynamic refutational completeness of 𝐻𝐼𝑛𝑓 .
In the first step, since the inference system 𝐺𝐹𝐼𝑛𝑓 is standard ground superposition, we only need to work around minor differences between Bachmair and Ganzinger’s definitions and ours. Given a saturated clause set 𝑁 ⊆ 𝒞 GF with ⊥ ∉ 𝑁 , Bachmair and Ganzinger prove refutational completeness by constructing a term rewriting system 𝑅 𝑁 and showing that it can be viewed as an interpretation that is a model of 𝑁 . This step is exclusively concerned with ground first-order clauses.
In the second step, we derive refutational completeness of 𝐺𝐻𝐼𝑛𝑓 . Given a saturated clause set 𝑁 ⊆ 𝒞 GH with ⊥ ∉ 𝑁 , we use the first-order model 𝑅 ℱ ( 𝑁 ) of ℱ ( 𝑁 ) constructed in step (1) to derive a clausal higher-order interpretation that is a model of 𝑁 . Thanks to saturation w.r.t. GArgCong, the higher-order interpretation can conflate the interpretations of the members 𝖿 0 𝜐 ¯ , … , 𝖿 𝑛 𝜐 ¯ of the same symbol family. In the extensional calculi, saturation w.r.t. GExt can be used to show that the constructed interpretation is extensional.
In the third step, we employ the saturation framework by Waldmann et al. [WTRB20], which is largely based on Bachmair and Ganzinger’s [BG01], to prove 𝐻𝐼𝑛𝑓 refutationally complete. Like Bachmair and Ganzinger’s, the saturation framework allows us to prove the static and dynamic refutational completeness of our calculus on the nonground level. On top of that, it allows us to use the redundancy criterion defined in Section 14.1, which supports deletion of subsumed formulas. Moreover, the saturation framework provides completeness theorems for prover architectures, such as the DISCOUNT loop. The main proof obligation the saturation framework leaves to us is that there exist inferences in 𝐻𝐼𝑛𝑓 corresponding to all nonredundant inferences in 𝐺𝐻𝐼𝑛𝑓 . For monotone term orders, we can avoid Sup inferences into variables 𝑥 by exploiting the clause order’s compatibility with contexts: If 𝑡 ′ ≺ 𝑡 , we have 𝐶 { 𝑥 ↦ 𝑡 ′ } ≺ 𝐶 { 𝑥 ↦ 𝑡 } , which allows us to show that Sup inferences into variables are redundant. This technique fails for variables 𝑥 that occur applied in 𝐶 , because the order lacks compatibility with arguments. This is why the calculi must either purify clauses to make this line of reasoning work again or perform some Sup inferences into variables.
19.1. The Ground First-Order Level
We use Bachmair and Ganzinger’s results on standard superposition [BG94] to prove GF refutationally complete. In the subsequent steps, we will also make use of specific properties of Bachmair and Ganzinger’s model.
Bachmair and Ganzinger’s logic and inference system differ in some details from GF :
– Bachmair and Ganzinger use untyped first-order logic, whereas GF ’s logic is typed. Bachmair and Ganzinger’s proof works verbatim for monomorphic first-order logic as well, but we need to require that the order ≻ has the subterm property to show that there exist no critical pairs in the term rewriting system, as observed by Wand [Wan17, Section 3.2.1].
– In their redundancy criterion for clauses, Bachmair and Ganzinger require that a finite subset of { 𝐷 ∈ 𝑁 ∣ 𝐷 ≺ 𝐶 } entails 𝐶 , whereas we require that { 𝐷 ∈ 𝑁 ∣ 𝐷 ≺ 𝐶 } entails 𝐶 . By compactness of first-order logic, the two criteria are equivalent.
Bachmair and Ganzinger prove refutational completeness for nonground clause sets, but we only require the ground result here.
The basis of Bachmair and Ganzinger’s proof is that a term rewriting system 𝑅 defines an interpretation 𝒯 GF / 𝑅 such that for every ground equation 𝑠 ≈ 𝑡 , we have 𝒯 GF / 𝑅 ⊧ 𝑠 ≈ 𝑡 if and only if 𝑠 ↔ 𝑅 * 𝑡 . Formally, 𝒯 GF / 𝑅 denotes the monomorphic first-order interpretation whose universes 𝑈 𝜏 consist of the 𝑅 -equivalence classes over 𝒯 GF containing terms of type 𝜏 . The interpretation 𝒯 GF / 𝑅 is term-generated—that is, for every element 𝑎 of the universe of this interpretation and for any valuation 𝜉 , there exists a ground term 𝑡 such that ⟦ 𝑡 ⟧ 𝒯 GF / 𝑅 𝜉 = 𝑎 . To lighten notation, we will write 𝑅 to refer to both the term rewriting system 𝑅 and the interpretation 𝒯 GF / 𝑅 .
The term rewriting system is constructed as follows. Let 𝑁 ⊆ 𝒞 GF . We first define sets of rewrite rules 𝐸 𝑁 𝐶 and 𝑅 𝑁 𝐶 for all 𝐶 ∈ 𝑁 by induction on the clause order. Assume that 𝐸 𝑁 𝐷 has already been defined for all 𝐷 ∈ 𝑁 such that 𝐷 ≺ 𝐶 . Then 𝑅 𝑁 𝐶 = ⋃ 𝐷 ≺ 𝐶 𝐸 𝑁 𝐷 . Let 𝐸 𝑁 𝐶 = { 𝑠 → 𝑡 } if the following conditions are met:
(1) 𝐶 = 𝐶 ′ ∨ 𝑠 ≈ 𝑡 ;
(2) 𝑠 ≈ 𝑡 is strictly maximal in 𝐶 ;
(3) 𝑠 ≻ 𝑡 ;
(4) 𝐶 ′ is false in 𝑅 𝑁 𝐶 ;
(5) 𝑠 is irreducible w.r.t. 𝑅 𝑁 𝐶 .
Then 𝐶 is said to produce 𝑠 → 𝑡 . Otherwise, 𝐸 𝑁 𝐶 = ∅ . Finally, 𝑅 𝑁 = ⋃ 𝐷 𝐸 𝑁 𝐷 .
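For intuition, here is a drastically simplified Python rendering of this construction (our own sketch, restricted to ground unit equations, with a naive size-then-lexicographic order standing in for ≻ ; conditions (1), (2), and (4) are trivial for unit clauses):

```python
# Bachmair-Ganzinger-style model construction for ground unit equations:
# process clauses in increasing order; s ≈ t produces the rule s -> t when
# s is greater than t and s is irreducible by the rules produced so far.

def size(t):
    # a simplistic ground term order: compare sizes, then lexicographically
    return (sum(size(a)[0] for a in t[1:]) + 1, t)

def reducible(t, rules):
    # is some subterm of t the left-hand side of an existing rule?
    return any(t == l for l, _ in rules) or any(reducible(a, rules) for a in t[1:])

def construct_R(equations):
    rules = []
    # approximate the clause order on unit clauses by the larger side
    for s, t in sorted(equations, key=lambda e: max(size(e[0]), size(e[1]))):
        if size(s) < size(t):
            s, t = t, s                      # orient the equation (condition 3)
        if s != t and not reducible(s, rules):
            rules.append((s, t))             # the clause is productive (condition 5)
    return rules

a, b, c = ("a",), ("b",), ("c",)
f = lambda x: ("f", x)
R = construct_R([(f(a), b), (a, c), (f(c), b)])
# c -> a is produced first (c is greater than a lexicographically); it makes
# f(c) reducible, so f(c) ≈ b produces nothing, while f(a) ≈ b is productive.
assert R == [(c, a), (f(a), b)]
```

The real construction additionally handles nonunit clauses via conditions (1), (2), and (4), evaluating truth of the remaining literals in the partially built system 𝑅 𝑁 𝐶 .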
We call an inference 𝜄 ∈ 𝐺𝐹𝐼𝑛𝑓 B&G-redundant if some 𝐶 ∈ 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) is true in 𝑅 𝑁 𝐶 or 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) is true in 𝑅 𝑁 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) . We call a set 𝑁 ⊆ 𝒞 GF B&G-saturated if all inferences from 𝑁 are B&G-redundant.
Lemma \thethm.
If ⊥ ∉ 𝑁 and 𝑁 ⊆ 𝒞 GF is saturated w.r.t. 𝐺𝐹𝐼𝑛𝑓 and 𝐺𝐹𝑅𝑒𝑑 I , then 𝑁 is B&G-saturated.
Proof.
Let 𝑁 ⊆ 𝒞 GF be saturated w.r.t. 𝐺𝐹𝐼𝑛𝑓 and 𝐺𝐹𝑅𝑒𝑑 I . To show that 𝑁 is B&G-saturated, let 𝜄 be an inference from 𝑁 . We need to show that 𝜄 is B&G-redundant w.r.t. 𝑁 . We proceed by well-founded induction on 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) w.r.t. ≻ . By the induction hypothesis, for all inferences 𝜄 ′ with 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ′ ) ≺ 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) , 𝜄 ′ is B&G-redundant w.r.t. 𝑁 . By Lemma 5.5 of Bachmair and Ganzinger, 𝜄 is B&G-redundant w.r.t. 𝑁 . ∎
Lemma \thethm.
Let ⊥ ∉ 𝑁 and 𝑁 ⊆ 𝒞 GF be saturated w.r.t. 𝐺𝐹𝐼𝑛𝑓 and 𝐺𝐹𝑅𝑒𝑑 I . If 𝐶 = 𝐶 ′ ∨ 𝑠 ≈ 𝑡 ∈ 𝑁 produces 𝑠 → 𝑡 , then 𝑠 ≈ 𝑡 is strictly eligible in 𝐶 and 𝐶 ′ is false in 𝑅 𝑁 .
Proof.
By Lemma 19.1, 𝑁 is also B&G-saturated. By condition 4, 𝐶 ′ is false in 𝑅 𝑁 𝐶 . Since 𝑠 ≻ 𝑡 by condition 3 and 𝑠 is irreducible w.r.t. 𝑅 𝑁 𝐶 by condition 5, 𝑠 ≈ 𝑡 is also false in 𝑅 𝑁 𝐶 . Hence, 𝐶 is false in 𝑅 𝑁 𝐶 . Using this and conditions 1, 2, 3, and 5, we can apply Lemma 4.11 of Bachmair and Ganzinger using 𝑁 for 𝑁 and for 𝑁 ′ , 𝐶 for 𝐶 and for 𝐷 , 𝑠 for 𝑠 , and 𝑡 for 𝑡 . Part (ii) of that lemma shows that 𝑠 ≈ 𝑡 is strictly eligible in 𝐶 , and part (iv) shows that 𝐶 ′ is false in 𝑅 𝑁 . ∎
Theorem \thethm (Ground first-order static refutational completeness).
The inference system 𝐺𝐹𝐼𝑛𝑓 is statically refutationally complete w.r.t. ( 𝐺𝐹𝑅𝑒𝑑 I , 𝐺𝐹𝑅𝑒𝑑 C ) . More precisely, if 𝑁 ⊆ 𝒞 GF is a clause set saturated w.r.t. 𝐺𝐹𝐼𝑛𝑓 and 𝐺𝐹𝑅𝑒𝑑 I such that ⊥ ∉ 𝑁 , then 𝑅 𝑁 is a model of 𝑁 .
Proof.
By Lemma 19.1, 𝑁 is also B&G-saturated. It follows that 𝑅 𝑁 is a model of 𝑁 , as shown in the proof of Theorem 4.14 of Bachmair and Ganzinger. ∎
19.2. The Ground Higher-Order Level
In this subsection, let 𝐺𝐻𝑆𝑒𝑙 be a selection function on 𝒞 GH , and let 𝑁 ⊆ 𝒞 GH with ⊥ ∉ 𝑁 be a clause set saturated w.r.t. 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 and 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 . Clearly, ℱ ( 𝑁 ) is then saturated w.r.t. 𝐺𝐹𝐼𝑛𝑓 ℱ ( 𝐺𝐻𝑆𝑒𝑙 ) and 𝐺𝐹𝑅𝑒𝑑 I ℱ ( 𝐺𝐻𝑆𝑒𝑙 ) .
We abbreviate 𝑅 ℱ ( 𝑁 ) as 𝑅 . From 𝑅 , we construct a model ℐ GH of 𝑁 . The key properties enabling us to perform this construction are that 𝑅 is term-generated and that the interpretations of the members 𝖿 0 𝜐 ¯ , … , 𝖿 𝑛 𝜐 ¯ of the same symbol family behave in the same way thanks to the ArgCong rule.
Lemma \thethm (Argument congruence).
For terms 𝑠 , 𝑡 , 𝑢 ∈ 𝒯 GH , if ⟦ ℱ ( 𝑠 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 ) ⟧ 𝑅 , then ⟦ ℱ ( 𝑠 𝑢 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 𝑢 ) ⟧ 𝑅 .
Proof.
What we want to show is equivalent to the claim that ℱ ( 𝑠 ) ↔ 𝑅 * ℱ ( 𝑡 ) implies ℱ ( 𝑠 𝑢 ) ↔ 𝑅 * ℱ ( 𝑡 𝑢 ) . By induction on the number of rewrite steps and due to symmetry, it suffices to show that ℱ ( 𝑠 ) → 𝑅 ℱ ( 𝑡 ) implies ℱ ( 𝑠 𝑢 ) ↔ 𝑅 * ℱ ( 𝑡 𝑢 ) . If the rewrite step from ℱ ( 𝑠 ) is below the top level, this is obvious because there is a corresponding rewrite step from ℱ ( 𝑠 𝑢 ) . If it is at the top level, ℱ ( 𝑠 ) → ℱ ( 𝑡 ) must be a rule of 𝑅 . This rule must originate from a productive clause of the form ℱ ( 𝐶 ) = ℱ ( 𝐶 ′ ∨ 𝑠 ≈ 𝑡 ) . By Lemma 19.1, ℱ ( 𝑠 ≈ 𝑡 ) is strictly eligible in ℱ ( 𝐶 ) w.r.t. ℱ ( 𝐺𝐻𝑆𝑒𝑙 ) , and hence 𝑠 ≈ 𝑡 is strictly eligible in 𝐶 w.r.t. 𝐺𝐻𝑆𝑒𝑙 . Moreover, 𝑠 and 𝑡 have functional type. Thus, the following GArgCong inference 𝜄 is applicable:
GArgCong: from the premise 𝐶 ′ ∨ 𝑠 ≈ 𝑡 , derive the conclusion 𝐶 ′ ∨ 𝑠 𝑢 ≈ 𝑡 𝑢 .
By saturation, 𝜄 is redundant w.r.t. 𝑁 —i.e., we have 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ∈ 𝑁 ∪ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) (for the nonpurifying calculi) or ℱ ( 𝑁 ) ⊧ ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) (for the purifying calculi). In both cases, by Theorem 19.1, ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) is then true in 𝑅 . By Lemma 19.1, ℱ ( 𝐶 ′ ) is false in 𝑅 . Therefore, ℱ ( 𝑠 𝑢 ≈ 𝑡 𝑢 ) must be true in 𝑅 . ∎
Lemma \thethm.
For terms 𝑠 , 𝑡 , 𝑢 , 𝑣 ∈ 𝒯 GH , if ⟦ ℱ ( 𝑠 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 ) ⟧ 𝑅 and ⟦ ℱ ( 𝑢 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑣 ) ⟧ 𝑅 , then ⟦ ℱ ( 𝑠 𝑢 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 𝑣 ) ⟧ 𝑅 .
Proof.
By Lemma 19.2, we have ⟦ ℱ ( 𝑠 𝑢 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 𝑢 ) ⟧ 𝑅 . It remains to show that ⟦ ℱ ( 𝑡 𝑢 ) ⟧ 𝑅 = ⟦ ℱ ( 𝑡 𝑣 ) ⟧ 𝑅 . Since 𝑡 is ground, it must be of the form 𝖿 ⟨ 𝜐 ¯ 𝑚 ⟩ 𝑡 ¯ 𝑛 . The interpretation 𝑅 defined above is an interpretation ( 𝑈 , 𝒥 ) in monomorphic first-order logic. Then ⟦ ℱ ( 𝑡 𝑢 ) ⟧ 𝑅 = 𝒥 ( 𝖿 𝑛 + 1 𝜐 ¯ 𝑚 ) ( ⟦ ℱ ( 𝑡 ¯ 𝑛 ) ⟧ 𝑅 , ⟦ ℱ ( 𝑢 ) ⟧ 𝑅 ) = 𝒥 ( 𝖿 𝑛 + 1 𝜐 ¯ 𝑚 ) ( ⟦ ℱ ( 𝑡 ¯ 𝑛 ) ⟧ 𝑅 , ⟦ ℱ ( 𝑣 ) ⟧ 𝑅 ) = ⟦ ℱ ( 𝑡 𝑣 ) ⟧ 𝑅 . ∎
Definition \thethm.
Define a higher-order interpretation ℐ GH = (𝒰 GH, 𝒥 𝗍𝗒 GH, 𝒥 GH, ℰ GH) as follows. The interpretation 𝑅 defined above is an interpretation (𝑈, 𝒥) in monomorphic first-order logic, where 𝑈 𝜏 is its universe for type 𝜏, and 𝒥 is its interpretation function. Let 𝒰 GH = {𝒰 𝜏 ∣ 𝜏 is a ground type}. Let 𝒥 𝗍𝗒 GH(𝜅)(𝒰 𝜏¯) = 𝒰 𝜅(𝜏¯) for all type constructors 𝜅 and type tuples 𝜏¯. Let 𝒥 GH(𝖿, 𝒰 𝜏¯) = 𝒥(𝖿 0 𝜏¯).
Since 𝑅 is term-generated, for every 𝑎 ∈ 𝑈 𝜏→𝜐 and 𝑏 ∈ 𝑈 𝜏, there exist ground terms 𝑠 : 𝜏 → 𝜐 and 𝑢 : 𝜏 such that ⟦ℱ(𝓈)⟧ℛ = 𝒶 and ⟦ℱ(𝓊)⟧ℛ = 𝒷. We define ℰ GH by ℰ GH 𝒰𝜏,𝒰𝜐(𝒶)(𝒷) = ⟦ℱ(𝓈 𝓊)⟧ℛ. By Lemma 19.2, this definition is independent of the choice of 𝑠 and 𝑢.
Lemma \thethm (Model transfer to GH ).
ℐ GH is a model of 𝑁 . In the extensional calculi, ℐ GH is an extensional model of 𝑁 .
Proof.
We first prove by induction on terms 𝑡 ∈ 𝒯 GH that ⟦𝑡⟧ℐ GH = ⟦ℱ(𝓉)⟧ℛ. Let 𝑡 ∈ 𝒯 GH, and assume as the induction hypothesis that ⟦𝑢⟧ℐ GH = ⟦ℱ(𝓊)⟧ℛ for all subterms 𝑢 of 𝑡. If 𝑡 is of the form 𝖿⟨𝜐¯⟩, then
⟦𝑡⟧ℐ GH = 𝒥 GH(𝖿, 𝒰 𝜐¯) = 𝒥(𝖿 0 𝜐¯) = ⟦𝖿 0 𝜐¯⟧ℛ = ⟦ℱ(𝖿⟨𝜐¯⟩)⟧ℛ = ⟦ℱ(𝓉)⟧ℛ
If 𝑡 is an application 𝑡 = 𝑡1 𝑡2, where 𝑡1 is of type 𝜏 → 𝜐, then, using the definition of the term denotation and of ℰ GH, we have
⟦𝑡1 𝑡2⟧ℐ GH = ℰ GH 𝒰𝜏,𝒰𝜐(⟦𝓉1⟧ℐ GH)(⟦𝓉2⟧ℐ GH) = ℰ GH 𝒰𝜏,𝒰𝜐(⟦ℱ(𝓉1)⟧ℛ)(⟦ℱ(𝓉2)⟧ℛ) = ⟦ℱ(𝓉1 𝓉2)⟧ℛ
where the second step holds by the induction hypothesis.
So we have shown that ⟦𝑡⟧ℐ GH = ⟦ℱ(𝓉)⟧ℛ for all terms 𝑡. It follows that a ground equation 𝑠 ≈ 𝑡 is true in ℐ GH if and only if ℱ(𝓈 ≈ 𝓉) is true in 𝑅. Hence a ground clause 𝐶 is true in ℐ GH if and only if ℱ(𝒞) is true in 𝑅. By Theorem 19.1, 𝑅 is a model of ℱ(𝒩). Thus, ℐ GH is a model of 𝑁.
For the extensional calculi, it remains to show that ℐ GH is extensional—i.e., we have to show that for all 𝜏 and 𝜐 and all 𝑎, 𝑏 ∈ 𝑈 𝜏→𝜐, if 𝑎 ≠ 𝑏, then ℰ GH(𝒶) ≠ ℰ GH(𝒷). Since 𝑅 is term-generated, there are terms 𝑠, 𝑡 ∈ 𝒯 GF such that ⟦𝑠⟧𝑅 = 𝑎 and ⟦𝑡⟧𝑅 = 𝑏. By what we have shown above, it follows that ⟦𝑠′⟧ℐ GH = 𝑎 and ⟦𝑡′⟧ℐ GH = 𝑏 for 𝑠′ = ℱ−1(𝓈) and 𝑡′ = ℱ−1(𝓉).
Since 𝑁 is saturated, the GExt inference that generates the clause
𝐶 = 𝑠′ (𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝑠′ 𝑡′) ≉ 𝑡′ (𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝑠′ 𝑡′) ∨ 𝑠′ ≈ 𝑡′
is redundant—i.e., 𝐶 ∈ 𝑁 ∪ 𝐺𝐻𝑅𝑒𝑑 C(𝑁) (in the nonpurifying calculi) or ℱ(𝒩) ⊧ ℱ(𝒞) (in the purifying calculi). In both cases, it follows that 𝑅 ⊧ ℱ(𝒞) by Theorem 19.1 and thus ℐ GH ⊧ 𝐶 by what we have shown above. The second literal of 𝐶 is false in ℐ GH because ⟦𝑠′⟧ℐ GH = 𝑎 ≠ 𝑏 = ⟦𝑡′⟧ℐ GH. So the first literal of 𝐶 must be true in ℐ GH and thus
ℰ GH(𝒶)(⟦𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝓈′ 𝓉′⟧ℐ GH) = ⟦𝑠′ (𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝑠′ 𝑡′)⟧ℐ GH ≠ ⟦𝑡′ (𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝑠′ 𝑡′)⟧ℐ GH = ℰ GH(𝒷)(⟦𝖽𝗂𝖿𝖿⟨𝜏, 𝜐⟩ 𝓈′ 𝓉′⟧ℐ GH)
It follows that ℰ GH ( 𝒶 ) ≠ ℰ GH ( 𝒷 ) . ∎
We summarize the results of this subsection in the following theorem:
Theorem \thethm (Ground static refutational completeness).
Let 𝐺𝐻𝑆𝑒𝑙 be a selection function on 𝒞 GH . Then the inference system 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 is statically refutationally complete w.r.t. ( 𝐺𝐻𝑅𝑒𝑑 I , 𝐺𝐻𝑅𝑒𝑑 C ) . That is, if 𝑁 ⊆ 𝒞 GH is saturated w.r.t. 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 and 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 , then 𝑁 ⊧ ⊥ if and only if ⊥ ∈ 𝑁 .
The construction of ℐ GH relies on the specific properties of 𝑅 . It would not work with an arbitrary first-order interpretation. Transforming a 𝜆 -free higher-order interpretation into a first-order interpretation is easier:
Lemma \thethm.
Given an interpretation ℐ on GH , there exists an interpretation ℐ GF on GF such that for any clause 𝐶 ∈ 𝒞 GH the truth values of 𝐶 in ℐ and of ℱ ( 𝒞 ) in ℐ GF coincide.
Proof.
Let ℐ = (𝒰, 𝒥 𝗍𝗒, 𝒥, ℰ) be a 𝜆-free higher-order interpretation. Let 𝒰 𝜏 GF = ⟦𝜏⟧ℐ 𝗍𝗒 be the first-order type universe for the ground type 𝜏. For a symbol 𝖿 𝑙 𝜐¯𝑚 ∈ Σ GF and universe elements 𝑎¯𝑙, let 𝒥 GF(𝖿 𝓁 𝜐¯𝓂)(𝒶¯𝓁) = ⟦𝖿⟨𝜐¯𝓂⟩ 𝓍¯𝓁⟧ℐ {𝓍¯𝓁 ↦ 𝒶¯𝓁}. This defines an interpretation ℐ GF = (𝒰 GF, 𝒥 GF) on GF.
We need to show that for any 𝐶 ∈ 𝒞 GH, ℐ ⊧ 𝐶 if and only if ℐ GF ⊧ ℱ(𝒞). It suffices to show that ⟦𝑡⟧ℐ = ⟦ℱ(𝓉)⟧ℐ GF for all terms 𝑡 ∈ 𝒯 GH. We prove this by induction on 𝑡. Since 𝑡 is ground, it must be of the form 𝖿⟨𝜐¯𝑚⟩ 𝑠¯𝑙. Then ℱ(𝓉) = 𝖿 𝓁 𝜐¯𝓂(ℱ(𝓈¯𝓁)) and hence
⟦ℱ(𝓉)⟧ℐ GF = 𝒥 GF(𝖿 𝓁 𝜐¯𝓂)(⟦ℱ(𝓈¯𝓁)⟧ℐ GF) = 𝒥 GF(𝖿 𝓁 𝜐¯𝓂)(⟦𝓈¯𝓁⟧ℐ) = ⟦𝖿⟨𝜐¯𝓂⟩ 𝓈¯𝓁⟧ℐ = ⟦𝓉⟧ℐ
using the induction hypothesis for the second step and the definition of 𝒥 GF and Lemma 12.3 for the third step. ∎
19.3. The Nonground Higher-Order Level
To lift the result to the nonground level, we employ the saturation framework of Waldmann et al. [WTRB20]. It is easy to see that the entailment relation ⊧ on GH is a consequence relation in the sense of the framework. It remains to show that our redundancy criterion on GH is a redundancy criterion in the sense of the framework and that 𝒢 is a grounding function in the sense of the framework:
Lemma \thethm.
The redundancy criterion for GH is a redundancy criterion in the sense of the framework.
Proof.
We must prove the conditions (R1) to (R4) defined by Waldmann et al., which, adapted to our context, state the following for all clause sets 𝑁 , 𝑁 ′ ⊆ 𝒞 GH :
(R1)
if 𝑁 ⊧ ⊥ , then 𝑁 ∖ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) ⊧ ⊥ ;
(R2)
if 𝑁 ⊆ 𝑁 ′ , then 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) ⊆ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ′ ) and 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ) ⊆ 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ′ ) ;
(R3)
if 𝑁 ′ ⊆ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) , then 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ) ⊆ 𝐺𝐻𝑅𝑒𝑑 C ( 𝑁 ∖ 𝑁 ′ ) and 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ) ⊆ 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ∖ 𝑁 ′ ) ;
(R4)
if 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 and 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ∈ 𝑁 , then 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ) .
(R1) It suffices to show that 𝑁 ∖ 𝐺𝐻𝑅𝑒𝑑 C(𝑁) ⊧ 𝑁 for 𝑁 ⊆ 𝒞 GH. Let ℐ be a model of 𝑁 ∖ 𝐺𝐻𝑅𝑒𝑑 C(𝑁). In the extensional calculi, let ℐ be extensional. Then by Lemma 19.2, there exists a model ℐ GF of ℱ(𝒩 ∖ 𝐺𝐻𝑅𝑒𝑑 C(𝒩)) = ℱ(𝒩) ∖ 𝐺𝐹𝑅𝑒𝑑 C(ℱ(𝒩)). By Lemma 5.2 of Bachmair and Ganzinger, this is also a model of ℱ(𝒩). By Lemma 19.2, it follows that ℐ ⊧ 𝑁.
(R2) By Lemma 5.6(i) of Bachmair and Ganzinger, this holds on GF . For clauses and all inferences except GArgCong and GExt, this implies that it holds on GH as well because ℱ is a redundancy-preserving bijection between 𝒞 GH and 𝒞 GF and between these inferences. For GArgCong and GExt inferences, it holds because it holds on clauses.
(R3) The proof is analogous to (R2), with Lemma 5.6(ii) of Bachmair and Ganzinger instead of Lemma 5.6(i).
(R4) We must show that for all inferences with 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ∈ 𝑁 , we have 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I ( 𝑁 ) . Since 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ≺ 𝑚𝑝𝑟𝑒𝑚 ( 𝜄 ) for all 𝜄 ∈ 𝐺𝐹𝐼𝑛𝑓 , this holds on GF . For all inferences except GArgCong and GExt, since ℱ is a bijection preserving redundancy, it follows that it also holds on GH . For GArgCong and GExt inferences, it holds by definition. ∎
Lemma \thethm.
The grounding functions 𝒢 𝐺𝐻𝑆𝑒𝑙 for 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) are grounding functions in the sense of the framework.
Proof.
We must prove the conditions (G1) to (G3) defined by Waldmann et al., which, adapted to our context, state the following:
(G1)
𝒢(⊥) = {⊥};
(G2)
for every 𝐶 ∈ 𝒞 GH, if ⊥ ∈ 𝒢(𝒞), then 𝐶 = ⊥;
(G3)
for every 𝜄 ∈ 𝐻𝐼𝑛𝑓, if 𝒢 𝐺𝐻𝑆𝑒𝑙(𝜄) ≠ 𝑢𝑛𝑑𝑒𝑓, then 𝒢 𝐺𝐻𝑆𝑒𝑙(𝜄) ⊆ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙(𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄))).
Clearly, 𝐶 = ⊥ if and only if ⊥ ∈ 𝒢(𝒞) if and only if 𝒢(𝒞) = {⊥}, proving (G1) and (G2). For (G3), we have to show for all non-PosExt inferences 𝜄 ∈ 𝐻𝐼𝑛𝑓 that 𝒢 𝐺𝐻𝑆𝑒𝑙(𝜄) ⊆ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙(𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄))). Let 𝜄 ∈ 𝐻𝐼𝑛𝑓 and 𝜄′ ∈ 𝒢 𝐺𝐻𝑆𝑒𝑙(𝜄). By the definition of 𝒢 𝐺𝐻𝑆𝑒𝑙, there exists a grounding substitution 𝜃 such that 𝑝𝑟𝑒𝑚𝑠(𝜄′) = 𝑝𝑟𝑒𝑚𝑠(𝜄)𝜃 and 𝑐𝑜𝑛𝑐𝑙(𝜄′) = 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃. We want to show that 𝜄′ ∈ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙(𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄))).
If 𝜄′ is not a GArgCong or GExt inference, by the definition of inference redundancy, it suffices to show that {𝐷 ∈ ℱ(𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄))) ∣ 𝒟 ≺ 𝑚𝑝𝑟𝑒𝑚(ℱ(𝜄′))} ⊧ 𝑐𝑜𝑛𝑐𝑙(ℱ(𝜄′)). We define a substitution 𝜃′ that extends 𝜃 to all variables in 𝑐𝑜𝑛𝑐𝑙(𝜄). Due to purification, the clause 𝑐𝑜𝑛𝑐𝑙(𝜄) may contain variables not present in 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄). For each such variable 𝑥′, let 𝑥 be the variable in 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄) that 𝑥′ stems from and define 𝑥′𝜃′ = 𝑥𝜃. Then the clause ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) differs from the clause ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄′)) = ℱ(𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) only in some additional grounded purification literals, which all have the form 𝑡 ≉ 𝑡 and are thus trivially false in any interpretation. Hence, ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) ⊧ ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄′)). Since one of the variables of a purification literal must appear applied in the clause, for each grounded purification literal 𝑡 ≉ 𝑡 the term 𝑡 must be smaller than the maximal term of the clause ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄′)).
If no literals are selected in 𝑚𝑝𝑟𝑒𝑚(ℱ(𝜄′)), inspection of the rules in 𝐺𝐹𝐼𝑛𝑓 shows that ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) ≺ 𝑚𝑝𝑟𝑒𝑚(ℱ(𝜄′)). Otherwise, 𝜄′ can only be an ERes inference or a Sup inference into a negative literal. If it is an ERes inference, due to the selection restrictions, the substitution 𝜎 used in 𝜄 is the identity for all variables of functional type. Therefore, applying 𝜎 cannot trigger any purification and hence ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) = ℱ(𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) ≺ 𝑚𝑝𝑟𝑒𝑚(ℱ(𝜄′)). If 𝜄′ is a Sup inference into a negative literal, due to the selection restrictions, the substitution 𝜎 = mgu(𝑡, 𝑢) used in 𝜄 is the identity for all variables of functional type that stem from the main premise. Therefore the variables from the main premise 𝐶 need not be purified. The variables from the side premise 𝐷 might need to be purified, yielding purification literals of the form 𝑥 ≉ 𝑦 where 𝑥𝜃′ = 𝑦𝜃′. Then 𝑥 or 𝑦 must appear applied in 𝐷 and hence 𝑥𝜃′ is smaller than 𝑡𝜃′. Again, it follows that ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) ≺ 𝑚𝑝𝑟𝑒𝑚(ℱ(𝜄′)).
This proves { 𝐷 ∈ ℱ ( 𝒢 ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) ) ∣ 𝒟 ≺ 𝑚𝑝𝑟𝑒𝑚 ( ℱ ( 𝜄 ′ ) ) } ⊧ 𝑐𝑜𝑛𝑐𝑙 ( ℱ ( 𝜄 ′ ) ) .
In the nonpurifying calculi, if 𝜄′ is a GArgCong or GExt inference, it suffices to show that 𝑐𝑜𝑛𝑐𝑙(𝜄′) ∈ 𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄)). This holds because 𝑐𝑜𝑛𝑐𝑙(𝜄′) = 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃 = 𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃. In the purifying calculi, if 𝜄′ is a GArgCong or GExt inference, we must show that ℱ(𝒢(𝑐𝑜𝑛𝑐𝑙(𝜄))) ⊧ ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄′)). Defining 𝜃′ as above, we have ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄)𝜃′) ⊧ ℱ(𝑐𝑜𝑛𝑐𝑙(𝜄′)), as desired. ∎
To lift the completeness result of the previous section to the nonground calculus 𝐻𝐼𝑛𝑓 , we employ Theorem 14 of Waldmann et al., which, adapted to our context, is stated as follows. The theorem uses the notation 𝐼𝑛𝑓 ( 𝑁 ) to denote the set of 𝐼𝑛𝑓 -inferences whose premises are in 𝑁 , for an inference system 𝐼𝑛𝑓 and a clause set 𝑁 . Moreover, it uses the Herbrand entailment ⊧ 𝐺 on 𝒞 H , which is defined as 𝑁 1 ⊧ 𝐺 𝑁 2 if 𝒢 ( 𝒩 1 ) ⊧ 𝒢 ( 𝒩 2 ) .
Theorem \thethm (Lifting theorem).
If 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 is statically refutationally complete w.r.t. ( 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 , 𝐺𝐻𝑅𝑒𝑑 C ) for every 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) , and if for every 𝑁 ⊆ 𝒞 H that is saturated w.r.t. 𝐻𝐼𝑛𝑓 and 𝐻𝑅𝑒𝑑 I there exists a 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) such that 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 ( 𝒢 ( 𝒩 ) ) ⊆ 𝒢 𝐺𝐻𝑆𝑒𝑙 ( 𝐻𝐼𝑛𝑓 ( 𝒩 ) ) ∪ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 ( 𝒢 ( 𝒩 ) ) , then 𝐻𝐼𝑛𝑓 is statically refutationally complete w.r.t. ( 𝐻𝑅𝑒𝑑 I , 𝐻𝑅𝑒𝑑 C ) and ⊧ 𝐺 .
Proof.
This is essentially Theorem 14 of Waldmann et al. We take H for 𝐅, GH for 𝐆, and 𝒢(𝐻𝑆𝑒𝑙) for 𝑄. It is easy to see that the entailment relation ⊧ on GH is a consequence relation in the sense of the framework. By Lemmas 19.3 and 19.3, (𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙, 𝐺𝐻𝑅𝑒𝑑 C) is a redundancy criterion in the sense of the framework, and the 𝒢 𝐺𝐻𝑆𝑒𝑙 are grounding functions in the sense of the framework, for all 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢(𝐻𝑆𝑒𝑙). The redundancy criterion (𝐻𝑅𝑒𝑑 I, 𝐻𝑅𝑒𝑑 C) matches exactly the intersected lifted redundancy criterion 𝑅𝑒𝑑 ∩ 𝒢,⊐ of Waldmann et al. Theorem 14 of Waldmann et al. states the theorem only for ⊐ = ∅. By Lemma 16 of Waldmann et al., it also holds if ⊐ ≠ ∅. ∎
Let 𝑁 ⊆ 𝒞 H be a clause set saturated w.r.t. 𝐻𝐼𝑛𝑓 and 𝐻𝑅𝑒𝑑 I . We assume that 𝐻𝑆𝑒𝑙 fulfills the selection restrictions introduced in Section 12.1. For the above theorem to apply, we need to show that there exists a selection function 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) such that all inferences 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 with 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) ∈ 𝒢 ( 𝒩 ) are liftable or redundant. Here, by liftable, we mean that 𝜄 is a 𝒢 𝐺𝐻𝑆𝑒𝑙 -ground instance of a 𝐻𝐼𝑛𝑓 -inference from 𝑁 ; by redundant, we mean that 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 ( 𝒢 ( 𝒩 ) ) .
To choose the right selection function 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) , we observe that each ground clause 𝐶 ∈ 𝒢 ( 𝒩 ) must have at least one corresponding clause 𝐷 ∈ 𝑁 such that 𝐶 is a ground instance of 𝐷 . We choose one of them for each 𝐶 ∈ 𝒢 ( 𝒩 ) , which we denote by 𝒢 − 1 ( 𝒞 ) . Then let 𝐺𝐻𝑆𝑒𝑙 select those literals in 𝐶 that correspond to the literals selected by 𝐻𝑆𝑒𝑙 in 𝒢 − 1 ( 𝒞 ) . Given this selection function 𝐺𝐻𝑆𝑒𝑙 , we can show that all inferences from 𝒢 ( 𝒩 ) are liftable or redundant.
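The choice of 𝐺𝐻𝑆𝑒𝑙 can be pictured operationally. The following minimal Python sketch represents clauses as tuples of literals and selection functions as maps to sets of literal indices; the encoding and all names are illustrative, not Zipperposition's:

```python
# A ground selection function GHSel induced by a nonground selection
# function hsel and a fixed choice function g_inv mapping each ground
# clause C to one nonground clause of which C is an instance (with the
# same literal order). GHSel selects in C exactly the literal positions
# that hsel selects in g_inv(C).

def make_ghsel(g_inv, hsel):
    def ghsel(ground_clause):
        return hsel(g_inv(ground_clause))
    return ghsel

# Toy use: the nonground clause "x != y | f x = y" with its first literal
# selected; the ground instance keeps the selection at position 0.
nonground = ("x != y", "f x = y")
ground = ("a != b", "f a = b")
ghsel = make_ghsel({ground: nonground}.get, {nonground: {0}}.get)
```

With this construction, eligibility in a ground clause automatically mirrors eligibility in its chosen nonground counterpart.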
All non-Sup inferences in 𝐺𝐻𝐼𝑛𝑓 are liftable (Lemma 19.3). For Sup, some inferences are liftable (Lemma 19.3) and some are redundant (Lemma 19.3). As in standard superposition, Sup inferences into positions below variables are redundant. The variable condition of each of the four calculi is designed to cover the nonredundant Sup inferences into positions of variable-headed terms, which makes these inferences liftable.
Lemma \thethm.
Let 𝜎 be the most general unifier of 𝑠 and 𝑠′. Let 𝜃 be an arbitrary unifier of 𝑠 and 𝑠′. Then 𝜎𝜃 = 𝜃.
Proof.
Like in first-order logic, we can assume that 𝜎 is idempotent without loss of generality [Fit96, Corollary 7.2.11]. Since 𝜎 is most general, there exists a substitution 𝜌 such that 𝜎𝜌 = 𝜃. Therefore, by idempotence, 𝜎𝜃 = 𝜎𝜎𝜌 = 𝜎𝜌 = 𝜃. ∎
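The idempotence argument can be replayed concretely. Below is a small Python sketch of syntactic first-order unification; the term encoding and function names are ours, purely for illustration:

```python
# Toy syntactic unification illustrating the lemma above: for an idempotent
# most general unifier sigma and any unifier theta, applying sigma and then
# theta equals applying theta alone. Variables are strings starting with
# "?"; constants and applications are tuples ("f", arg1, ..., argn).

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def apply(subst, t):
    """Apply a substitution (a dict) exhaustively to a term."""
    if is_var(t):
        return apply(subst, subst[t]) if t in subst else t
    return (t[0],) + tuple(apply(subst, a) for a in t[1:])

def compose(s1, s2):
    """compose(s1, s2) behaves as: first s1, then s2."""
    out = {x: apply(s2, u) for x, u in s1.items()}
    for x, u in s2.items():
        out.setdefault(x, u)
    return out

def occurs(x, t):
    return t == x if is_var(t) else any(occurs(x, a) for a in t[1:])

def unify(s, t, subst=None):
    """Robinson-style unification; returns an idempotent mgu."""
    subst = {} if subst is None else subst
    s, t = apply(subst, s), apply(subst, t)
    if s == t:
        return subst
    if is_var(s):
        if occurs(s, t):
            raise ValueError("occurs check")
        return compose(subst, {s: t})
    if is_var(t):
        return unify(t, s, subst)
    if s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
        return subst
    raise ValueError("clash")

# s1 = f(x, g(y)) and s2 = f(g(y), x): the mgu is {x -> g(y)}.
s1 = ("f", "?x", ("g", "?y"))
s2 = ("f", ("g", "?y"), "?x")
sigma = unify(s1, s2)
theta = {"?x": ("g", ("a",)), "?y": ("a",)}  # another unifier of s1, s2
```

Here `compose(sigma, sigma) == sigma` (idempotence) and `compose(sigma, theta) == theta`, matching the lemma.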
Lemma \thethm (Lifting of ERes, EFact, GArgCong, and GExt).
All ERes, EFact, GArgCong, and GExt inferences are liftable.
Proof.
ERes: Let 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 be an ERes inference with 𝑝𝑟𝑒𝑚𝑠(𝜄) ∈ 𝒢(𝒩). Then 𝜄 is of the form
𝐶𝜃 = 𝐶′𝜃 ∨ 𝑠𝜃 ≉ 𝑠′𝜃
───────────────────── ERes
𝐶′𝜃
where 𝒢−1(𝒞𝜃) = 𝒞 = 𝒞′ ∨ 𝓈 ≉ 𝓈′ and the literal 𝑠𝜃 ≉ 𝑠′𝜃 is eligible w.r.t. 𝐺𝐻𝑆𝑒𝑙. Let 𝜎 = mgu(𝑠, 𝑠′). It follows that 𝑠 ≉ 𝑠′ is eligible in 𝐶 w.r.t. 𝜎 and 𝐻𝑆𝑒𝑙. Moreover, 𝑠𝜃 and 𝑠′𝜃 are unifiable and ground, and therefore 𝑠𝜃 = 𝑠′𝜃. Thus, the following inference 𝜄′ ∈ 𝐻𝐼𝑛𝑓 is applicable:
𝐶′ ∨ 𝑠 ≉ 𝑠′
───────────────────── ERes
𝓅𝓊𝓇ℯ(𝐶′𝜎)
(where 𝓅𝓊𝓇ℯ is the identity in the nonpurifying calculi). By Lemma 19.3, we have 𝐶′𝜎𝜃 = 𝐶′𝜃. Hence, 𝜄 is the 𝜃-ground instance of 𝜄′ and is therefore liftable.
EFact: Analogously, if 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 is an EFact inference with 𝑝𝑟𝑒𝑚𝑠(𝜄) ∈ 𝒢(𝒩), then 𝜄 is of the form
𝐶𝜃 = 𝐶′𝜃 ∨ 𝑠′𝜃 ≈ 𝑡′𝜃 ∨ 𝑠𝜃 ≈ 𝑡𝜃
──────────────────────────────── EFact
𝐶′𝜃 ∨ 𝑡𝜃 ≉ 𝑡′𝜃 ∨ 𝑠𝜃 ≈ 𝑡′𝜃
where 𝒢−1(𝒞𝜃) = 𝒞 = 𝒞′ ∨ 𝓈′ ≈ 𝓉′ ∨ 𝓈 ≈ 𝓉, the literal 𝑠𝜃 ≈ 𝑡𝜃 is eligible in 𝐶 w.r.t. 𝐺𝐻𝑆𝑒𝑙, and 𝑠𝜃 ⊀ 𝑡𝜃. Let 𝜎 = mgu(𝑠, 𝑠′). Hence, 𝑠 ≈ 𝑡 is eligible in 𝐶 w.r.t. 𝜎 and 𝐻𝑆𝑒𝑙. We have 𝑠 ⊀ 𝑡. Moreover, 𝑠𝜃 and 𝑠′𝜃 are unifiable and ground. Hence, 𝑠𝜃 = 𝑠′𝜃. Thus, the following inference 𝜄′ ∈ 𝐻𝐼𝑛𝑓 is applicable:
𝐶′ ∨ 𝑠′ ≈ 𝑡′ ∨ 𝑠 ≈ 𝑡
──────────────────────────────── EFact
𝓅𝓊𝓇ℯ((𝐶′ ∨ 𝑡 ≉ 𝑡′ ∨ 𝑠 ≈ 𝑡′)𝜎)
By Lemma 19.3, we have 𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄′)𝜃 = 𝑐𝑜𝑛𝑐𝑙(𝜄). Hence, 𝜄 is the 𝜃-ground instance of 𝜄′ and is therefore liftable.
GArgCong: Let 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 be a GArgCong inference with 𝑝𝑟𝑒𝑚𝑠(𝜄) ∈ 𝒢(𝒩). Then 𝜄 is of the form
𝐶𝜃 = 𝐶′𝜃 ∨ 𝑠𝜃 ≈ 𝑠′𝜃
───────────────────────── GArgCong
𝐶′𝜃 ∨ 𝑠𝜃 𝑢¯𝑛 ≈ 𝑠′𝜃 𝑢¯𝑛
where 𝒢−1(𝒞𝜃) = 𝒞 = 𝒞′ ∨ 𝓈 ≈ 𝓈′, the literal 𝑠𝜃 ≈ 𝑠′𝜃 is strictly eligible w.r.t. 𝐺𝐻𝑆𝑒𝑙, and 𝑠𝜃 and 𝑠′𝜃 are of functional type. It follows that 𝑠 and 𝑠′ have either a functional or a polymorphic type. Let 𝜎 be the most general substitution such that 𝑠𝜎 and 𝑠′𝜎 take 𝑛 arguments. Then 𝑠 ≈ 𝑠′ is eligible in 𝐶 w.r.t. 𝜎 and 𝐻𝑆𝑒𝑙. Hence the following inference 𝜄′ ∈ 𝐻𝐼𝑛𝑓 is applicable:
𝐶′ ∨ 𝑠 ≈ 𝑠′
───────────────────────── ArgCong
𝓅𝓊𝓇ℯ(𝐶′𝜎 ∨ 𝑠𝜎 𝑥¯𝑛 ≈ 𝑠′𝜎 𝑥¯𝑛)
Then 𝜄 is a ground instance of 𝜄′ and is therefore liftable.
GExt: The conclusion of a GExt inference in 𝐺𝐻𝐼𝑛𝑓 is by definition a ground instance of the conclusion of the Ext inference in 𝐻𝐼𝑛𝑓 before purification. Hence, the GExt inference is a ground instance of the Ext inference. Therefore it is liftable. ∎
Lemma \thethm (Lifting of Sup).
Let 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 be a Sup inference
𝐷′𝜃 ∨ 𝑡𝜃 ≈ 𝑡′𝜃    𝐶′𝜃 ∨ 𝑠𝜃⟨𝑡𝜃⟩𝑝 ≈̇ 𝑠′𝜃
──────────────────────────────────────── Sup
𝐷′𝜃 ∨ 𝐶′𝜃 ∨ 𝑠𝜃⟨𝑡′𝜃⟩𝑝 ≈̇ 𝑠′𝜃
where 𝒢−1(𝒟𝜃) = 𝒟 = 𝒟′ ∨ 𝓉 ≈ 𝓉′ and 𝒢−1(𝒞𝜃) = 𝒞 = 𝒞′ ∨ 𝓈 ≈̇ 𝓈′. Suppose that the position 𝑝 exists as a green subterm in 𝑠. Let 𝑢 be the green subterm of 𝑠 at that position and 𝜎 = mgu(𝑡, 𝑢) (which exists since 𝜃 is a unifier). If the variable condition holds for 𝐶, 𝑡, 𝑡′, 𝑢, and 𝜎, then 𝜄 is liftable.
Proof.
The inference conditions of 𝜄 can be lifted to 𝐷 and 𝐶 . That 𝑡 𝜃 ≈ 𝑡 ′ 𝜃 is strictly eligible in 𝐷 𝜃 w.r.t. 𝐺𝐻𝑆𝑒𝑙 implies that 𝑡 ≈ 𝑡 ′ is strictly eligible in 𝐷 w.r.t. 𝜎 and 𝐻𝑆𝑒𝑙 . If 𝑠 𝜃 ≈ ˙ 𝑠 ′ 𝜃 is (strictly) eligible in 𝐶 𝜃 w.r.t. 𝐺𝐻𝑆𝑒𝑙 , then 𝑠 ≈ ˙ 𝑠 ′ is (strictly) eligible in 𝐶 w.r.t. 𝜎 and 𝐻𝑆𝑒𝑙 . Moreover, 𝐷 𝜃 ⋡ 𝐶 𝜃 implies 𝐷 ⋡ 𝐶 , 𝑡 𝜃 ⊀ 𝑡 ′ 𝜃 implies 𝑡 ⊀ 𝑡 ′ , and 𝑠 𝜃 ⊀ 𝑠 ′ 𝜃 implies 𝑠 ⊀ 𝑠 ′ .
By assumption, 𝑝 is a position of 𝑠 and the variable condition holds. Thus, the following inference 𝜄′ ∈ 𝐻𝐼𝑛𝑓 is applicable:
𝐷′ ∨ 𝑡 ≈ 𝑡′    𝐶′ ∨ 𝑠⟨𝑢⟩𝑝 ≈̇ 𝑠′
──────────────────────────────────────── Sup
𝓅𝓊𝓇ℯ((𝐷′ ∨ 𝐶′ ∨ 𝑠⟨𝑡′⟩𝑝 ≈̇ 𝑠′)𝜎)
By Lemma 19.3, we have (𝑝𝑟𝑒𝑐𝑜𝑛𝑐𝑙(𝜄′))𝜃 = 𝑐𝑜𝑛𝑐𝑙(𝜄). Hence, 𝜄 is the 𝜃-ground instance of 𝜄′ and is therefore liftable. ∎
The other Sup inferences might not be liftable, but they are redundant:
Lemma \thethm.
Let 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 be a Sup inference
𝐷′𝜃 ∨ 𝑡𝜃 ≈ 𝑡′𝜃    𝐶′𝜃 ∨ 𝑠𝜃⟨𝑡𝜃⟩𝑝 ≈̇ 𝑠′𝜃
──────────────────────────────────────── Sup
𝐷′𝜃 ∨ 𝐶′𝜃 ∨ 𝑠𝜃⟨𝑡′𝜃⟩𝑝 ≈̇ 𝑠′𝜃
where 𝒢−1(𝒟𝜃) = 𝒟 = 𝒟′ ∨ 𝓉 ≈ 𝓉′ and 𝒢−1(𝒞𝜃) = 𝒞 = 𝒞′ ∨ 𝓈 ≈̇ 𝓈′. Suppose that Lemma 19.3 does not apply—either because the position 𝑝 is below a variable in 𝑠 or because the variable condition does not hold. Then 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙(𝒢(𝒩)).
Proof.
By the definition of 𝐺𝐻𝑅𝑒𝑑 I , to show 𝜄 ∈ 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 ( 𝒢 ( 𝒩 ) ) , it suffices to prove that { 𝐸 ∈ ℱ ( 𝒢 ( 𝒩 ) ) ∣ ℰ ≺ ℱ ( 𝒞 𝜃 ) } ⊧ ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) . Let ℐ be a first-order model of all 𝐸 ∈ ℱ ( 𝒢 ( 𝒩 ) ) with 𝐸 ≺ ℱ ( 𝒞 𝜃 ) . We must show that ℐ ⊧ ℱ ( 𝑐𝑜𝑛𝑐𝑙 ( 𝜄 ) ) . If ℐ ⊧ ℱ ( 𝒟 ′ 𝜃 ) , this is obvious. So we further assume that ℐ ⊧̸ ℱ ( 𝒟 ′ 𝜃 ) . Since 𝐷 𝜃 ≺ 𝐶 𝜃 by the Sup inference conditions, it follows that ℐ ⊧ ℱ ( 𝓉 𝜃 ≈ 𝓉 ′ 𝜃 ) . By congruence, it suffices to show ℐ ⊧ ℱ ( 𝒞 𝜃 ) . We proceed by a case distinction on the two possible reasons why Lemma 19.3 does not apply:
Case 1: The position 𝑝 is below a variable in 𝑠 . Then 𝑡 𝜃 is a proper green subterm of 𝑥 𝜃 and hence a green subterm of 𝑥 𝜃 𝑤 ¯ for any arguments 𝑤 ¯ . Let 𝑣 be the term that we obtain by replacing 𝑡 𝜃 by 𝑡 ′ 𝜃 in 𝑥 𝜃 at the relevant position. It follows from our assumptions about ℐ that ℐ ⊧ ℱ ( 𝓉 𝜃 ≈ 𝓉 ′ 𝜃 ) , and by congruence, ℐ ⊧ ℱ ( 𝓍 𝜃 𝓌 ¯ ≈ 𝓋 𝓌 ¯ ) for any arguments 𝑤 ¯ . Hence, ℐ ⊧ ℱ ( 𝒞 𝜃 ) if and only if ℐ ⊧ ℱ ( 𝒞 { 𝓍 ↦ 𝓋 } 𝜃 ) . By the inference conditions we have 𝑡 𝜃 ≻ 𝑡 ′ 𝜃 , which implies ℱ ( 𝒞 𝜃 ) ≻ ℱ ( 𝒞 { 𝓍 ↦ 𝓋 } 𝜃 ) by compatibility with green contexts. Therefore, we have ℐ ⊧ ℱ ( 𝒞 { 𝓍 ↦ 𝓋 } 𝜃 ) and hence ℐ ⊧ ℱ ( 𝒞 𝜃 ) .
Case 2: The variable condition does not hold. In the extensional calculi, it follows that 𝑢 has a variable head and jells with 𝑡 ≈ 𝑡′. By Definition 12.1, this means that 𝑢, 𝑡, and 𝑡′ have the following form: 𝑢 = 𝑥 𝑣¯𝑛 for some variable 𝑥 and a tuple of terms 𝑣¯𝑛 of length 𝑛 ≥ 0; 𝑡 = 𝑡̃ 𝑥¯𝑛 and 𝑡′ = 𝑡̃′ 𝑥¯𝑛, where 𝑥¯𝑛 are variables that do not occur elsewhere in 𝐷.
For the intensional calculi, we have 𝑢 ∈ 𝒱. Thus, 𝑢, 𝑡, and 𝑡′ can be written in the same form as described above for the extensional calculi, with 𝑛 = 0.
Case 2.1 (Purifying calculi): First, we assume that 𝑥 occurs only with arguments 𝑣¯𝑛 in 𝐶. For the intensional calculus, this must be the case because 𝑛 = 0 and hence 𝑥 can only occur without arguments by the definition of 𝓅𝓊𝓇ℯ and the literal selection restriction. Define a substitution 𝜃′ by 𝑥𝜃′ = 𝑡̃′𝜃 and 𝑦𝜃′ = 𝑦𝜃 for all other variables 𝑦. Since 𝑡𝜃 ≻ 𝑡′𝜃 by the inference conditions, we have 𝐶𝜃 ≻ 𝐶𝜃′. Moreover, 𝐶𝜃′ ∈ 𝒢(𝒩). Then ℐ ⊧ ℱ(𝒞𝜃) by congruence, because ℐ ⊧ ℱ(𝒞𝜃′) and ℐ ⊧ ℱ(𝓉𝜃 ≈ 𝓉′𝜃).
Now we assume that 𝑥 occurs with arguments other than 𝑣¯𝑛 in 𝐶. This can only happen in the extensional calculus, and by the selection restrictions, 𝑠𝜃 ≈̇ 𝑠′𝜃 must not be selected in 𝐶𝜃. Therefore, 𝑠𝜃 is the maximal term in 𝐶𝜃. Then 𝑠 ≠ 𝑥 and hence 𝑣¯𝑛 ≠ 𝜀, because otherwise 𝑠𝜃 = 𝑥𝜃 would be smaller than the applied occurrence of 𝑥𝜃 in 𝐶𝜃.
Define a substitution 𝜃′′ such that 𝑥𝜃′′ = 𝑡̃′𝜃; 𝑦𝜃′′ = 𝑡̃′𝜃 for other variables 𝑦 if 𝑦𝜃 = 𝑠𝜃 and 𝐶 contains the literal 𝑥 ≉ 𝑦; and 𝑦𝜃′′ = 𝑦𝜃 otherwise.
We show that 𝐶 𝜃 ≻ 𝐶 𝜃 ′′ by proving that no literal of 𝐶 𝜃 ′′ is larger than the maximal literal 𝑠 𝜃 ≈ ˙ 𝑠 ′ 𝜃 of 𝐶 𝜃 and that 𝑠 𝜃 ≈ ˙ 𝑠 ′ 𝜃 appears more often in 𝐶 𝜃 than in 𝐶 𝜃 ′′ .
For a literal of the form 𝑥 ≉ 𝑦, we have 𝑥𝜃′′ ≺ 𝑠𝜃 and 𝑦𝜃′′ ≺ 𝑠𝜃. For literals that are not of this form, by the definition of 𝓅𝓊𝓇ℯ in the extensional calculus, 𝑥 always occurs with arguments 𝑣¯𝑛. Hence these literals are equal or smaller in 𝐶𝜃′′ compared with 𝐶𝜃, because 𝑥𝜃′′ 𝑣¯𝑛 ≺ 𝑥𝜃 𝑣¯𝑛 and 𝑦𝜃′′ ⪯ 𝑦𝜃. Therefore, no literal of 𝐶𝜃′′ is larger than the maximal literal 𝑠𝜃 ≈̇ 𝑠′𝜃 of 𝐶𝜃. Moreover, these inequalities show that every occurrence of 𝑠𝜃 ≈̇ 𝑠′𝜃 in 𝐶𝜃′′ stems from an occurrence of 𝑠𝜃 ≈̇ 𝑠′𝜃 in 𝐶𝜃 that in turn stems from a literal in 𝐶 without the variable 𝑥. Since at least one occurrence of 𝑠𝜃 ≈̇ 𝑠′𝜃 in 𝐶𝜃 stems from a literal in 𝐶 containing 𝑥, 𝑠𝜃 ≈̇ 𝑠′𝜃 appears more often in 𝐶𝜃 than in 𝐶𝜃′′. This concludes the argument that 𝐶𝜃 ≻ 𝐶𝜃′′. It follows that ℐ ⊧ ℱ(𝒞𝜃′′).
We need to show that ℐ ⊧ ℱ ( 𝒞 𝜃 ) . There is a PosExt inference from 𝐷 to 𝐷 ′ ∨ 𝑡 ~ ≈ 𝑡 ~ ′ . This inference is in 𝐻𝑅𝑒𝑑 I ( 𝑁 ) because 𝑁 is saturated. Therefore, 𝐷 ′ 𝜃 ∨ 𝑡 ~ 𝜃 ≈ 𝑡 ~ ′ 𝜃 is in 𝒢 ( 𝒩 ) ∪ 𝐺𝐻𝑅𝑒𝑑 C ( 𝒢 ( 𝒩 ) ) . It follows that ℐ ⊧ ℱ ( 𝒟 ′ 𝜃 ∨ 𝓉 ~ 𝜃 ≈ 𝓉 ~ ′ 𝜃 ) because this clause is smaller than ℱ ( 𝒟 ′ 𝜃 ) and hence smaller than ℱ ( 𝒞 𝜃 ) . Since ℱ ( 𝒟 ′ 𝜃 ) is false in ℐ , we have ℐ ⊧ ℱ ( 𝓉 ~ 𝜃 ≈ 𝓉 ~ ′ 𝜃 ) .
For every literal of the form 𝑥 ≉ 𝑦, where 𝑦𝜃 = 𝑠𝜃, the variable 𝑦 can only occur without arguments in 𝐶 because of the maximality of 𝑠𝜃. We distinguish two cases. If for every literal of the form 𝑥 ≉ 𝑦 where 𝑦𝜃 = 𝑠𝜃, we have ℐ ⊧ ℱ(𝓎𝜃′′ ≈ 𝓎𝜃), then ℐ ⊧ ℱ(𝒞𝜃) by congruence. If for some literal of the form 𝑥 ≉ 𝑦 where 𝑦𝜃 = 𝑠𝜃, we have ℐ ⊧ ℱ(𝓎𝜃′′ ≉ 𝓎𝜃), then ℐ ⊧ ℱ(𝓎𝜃 ≉ 𝓍𝜃) because 𝑦𝜃′′ = 𝑡̃′𝜃, ℐ ⊧ ℱ(𝓉̃′𝜃 ≈ 𝓉̃𝜃), and 𝑡̃𝜃 = 𝑥𝜃. Hence a literal of ℱ(𝒞𝜃) is true in ℐ and therefore ℐ ⊧ ℱ(𝒞𝜃).
Case 2.2 (Nonpurifying calculi): Since the variable condition does not hold, we have 𝐶𝜃 ⪰ 𝐶′′𝜃, where 𝐶′′ = 𝐶{𝑥 ↦ 𝑡̃′}. We cannot have 𝐶𝜃 = 𝐶′′𝜃 because 𝑥𝜃 = 𝑡̃𝜃 ≠ 𝑡̃′𝜃 and 𝑥 occurs in 𝐶. Hence, we have 𝐶𝜃 ≻ 𝐶′′𝜃.
By the definition of ℐ , 𝐶 𝜃 ≻ 𝐶 ′′ 𝜃 implies ℐ ⊧ ℱ ( 𝒞 ′′ 𝜃 ) . We will use equalities that are true in ℐ to rewrite ℱ ( 𝒞 𝜃 ) into ℱ ( 𝒞 ′′ 𝜃 ) , which implies ℐ ⊧ ℱ ( 𝒞 𝜃 ) by congruence.
By saturation of 𝑁, for any well-typed 𝑚-tuple of fresh variables 𝑧¯, we can use a PosExt inference with premise 𝐷 (if 𝑛 > 𝑚), an ArgCong inference with premise 𝐷 (if 𝑛 < 𝑚), or 𝐷 itself (if 𝑛 = 𝑚) to show that 𝒢(𝒟′ ∨ 𝓉̃ 𝓏¯ ≈ 𝓉̃′ 𝓏¯) ⊆ 𝒢(𝒩) ∪ 𝐺𝐻𝑅𝑒𝑑 C(𝒢(𝒩)). Hence, 𝐷′𝜃 ∨ 𝑡̃𝜃 𝑢¯ ≈ 𝑡̃′𝜃 𝑢¯ is in 𝒢(𝒩) ∪ 𝐺𝐻𝑅𝑒𝑑 C(𝒢(𝒩)) for any ground arguments 𝑢¯.
We observe that whenever 𝑡̃𝜃 𝑢¯ and 𝑡̃′𝜃 𝑢¯ are smaller than the maximal term of 𝐶𝜃 for some arguments 𝑢¯, we have
(†) ℐ ⊧ ℱ(𝓉̃𝜃 𝓊¯) ≈ ℱ(𝓉̃′𝜃 𝓊¯)
To show this, we assume that 𝑡 ~ 𝜃 𝑢 ¯ and 𝑡 ~ ′ 𝜃 𝑢 ¯ are smaller than the maximal term of 𝐶 𝜃 and we distinguish two cases: If 𝑡 𝜃 is smaller than the maximal term of 𝐶 𝜃 , all terms in 𝐷 ′ 𝜃 are smaller than the maximal term of 𝐶 𝜃 and hence 𝐷 ′ 𝜃 ∨ 𝑡 ~ 𝜃 𝑢 ¯ ≈ 𝑡 ~ ′ 𝜃 𝑢 ¯ ≺ 𝐶 𝜃 . If, on the other hand, 𝑡 𝜃 is equal to the maximal term of 𝐶 𝜃 , 𝑡 ~ 𝜃 𝑢 ¯ and 𝑡 ~ ′ 𝜃 𝑢 ¯ are smaller than 𝑡 𝜃 . Hence 𝑡 ~ 𝜃 𝑢 ¯ ≈ 𝑡 ~ ′ 𝜃 𝑢 ¯ ≺ 𝑡 𝜃 ≈ 𝑡 ′ 𝜃 and 𝐷 ′ 𝜃 ∨ 𝑡 ~ 𝜃 𝑢 ¯ ≈ 𝑡 ~ ′ 𝜃 𝑢 ¯ ≺ 𝐷 𝜃 ≺ 𝐶 𝜃 . In both cases, since ℱ ( 𝒟 ′ 𝜃 ) is false in ℐ by assumption, ℐ ⊧ ℱ ( 𝓉 ~ 𝜃 𝓊 ¯ ) ≈ ℱ ( 𝓉 ~ ′ 𝜃 𝓊 ¯ ) .
We proceed by a case distinction on whether 𝑠𝜃 appears in a selected or in a maximal literal of 𝐶𝜃. In both cases we provide an algorithm that establishes the equivalence of 𝐶𝜃 and 𝐶′′𝜃 via rewriting using (†). This might seem trivial at first sight, but we can only use the equations (†) if 𝑡̃𝜃 𝑢¯ and 𝑡̃′𝜃 𝑢¯ are smaller than the maximal term of 𝐶𝜃. Moreover, 𝑢¯ might itself contain positions where we want to rewrite, so the order of rewriting matters.
Case 2.2.1: 𝑠𝜃 is the maximal side of a maximal literal of 𝐶𝜃. Then, since 𝐶𝜃 ≻ 𝐶′′𝜃, every term in 𝐶𝜃 and in 𝐶′′𝜃 is smaller than or equal to 𝑠𝜃. Let 𝐶0 and 𝐶̃0 be the clauses resulting from rewriting ℱ(𝓉𝜃) → ℱ(𝓉′𝜃) wherever possible in ℱ(𝒞𝜃) and ℱ(𝒞′′𝜃), respectively. Since ℱ(𝓉𝜃) is a subterm of ℱ(𝓈𝜃), now every term in 𝐶0 and 𝐶̃0 is strictly smaller than ℱ(𝓈𝜃).
We define 𝐶 1 , 𝐶 2 , … inductively as follows: Given 𝐶 𝑖 , choose a subterm of the form ℱ ( 𝓉 ~ 𝜃 𝓊 ¯ ) where 𝑡 ~ 𝜃 𝑢 ¯ ≻ 𝑡 ~ ′ 𝜃 𝑢 ¯ or of the form ℱ ( 𝓉 ~ ′ 𝜃 𝓊 ¯ ) where 𝑡 ~ ′ 𝜃 𝑢 ¯ ≻ 𝑡 ~ 𝜃 𝑢 ¯ . Let 𝐶 𝑖 + 1 be the clause resulting from rewriting that subterm ℱ ( 𝓉 ~ 𝜃 𝓊 ¯ ) to ℱ ( 𝓉 ~ ′ 𝜃 𝓊 ¯ ) or that subterm ℱ ( 𝓉 ~ ′ 𝜃 𝓊 ¯ ) to ℱ ( 𝓉 ~ 𝜃 𝓊 ¯ ) in 𝐶 𝑖 , depending on which term was chosen. Analogously, we define 𝐶 ~ 1 , 𝐶 ~ 2 , … by applying the same algorithm to 𝐶 ~ 0 . In both cases, the process terminates because ≻ is compatible with green contexts and well founded. Let 𝐶 * and 𝐶 ~ * be the respective final clauses.
The algorithm preserves the invariant that every term in 𝐶 𝑖 and 𝐶 ~ 𝑖 is strictly smaller than 𝑠 𝜃 . By congruence via (†), applied at every step of the algorithm, we know that 𝐶 * and ℱ ( 𝒞 𝜃 ) are equivalent in ℐ and that 𝐶 ~ * and ℱ ( 𝒞 ′′ 𝜃 ) are equivalent in ℐ as well.
We show that 𝐶* = 𝐶̃*. Assume that 𝐶* ≠ 𝐶̃*. The algorithm preserves a second invariant, namely that ℱ−1(𝒞𝒾) and ℱ−1(𝒞̃𝒿) are equal except for positions where one contains 𝑡̃𝜃 and the other one contains 𝑡̃′𝜃. Consider a deepest position where ℱ−1(𝒞*) and ℱ−1(𝒞̃*) are different. The respective position in 𝐶* and 𝐶̃* then contains ℱ(𝓉̃𝜃 𝓊¯) and ℱ(𝓉̃′𝜃 𝓊¯) or vice versa. The arguments 𝑢¯ must be equal because we consider a deepest position. But then 𝑡̃𝜃 𝑢¯ ≻ 𝑡̃′𝜃 𝑢¯ or 𝑡̃𝜃 𝑢¯ ≺ 𝑡̃′𝜃 𝑢¯, which is impossible since the algorithm terminated in 𝐶* and 𝐶̃*. This shows that 𝐶* = 𝐶̃*. Hence ℱ(𝒞𝜃) and ℱ(𝒞′′𝜃) are equivalent, which proves ℐ ⊧ ℱ(𝒞𝜃).
Case 2.2.2: 𝑠 𝜃 is the maximal side of a selected literal of 𝐶 𝜃 . Then, by the selection restrictions, 𝑥 cannot be the head of a maximal literal of 𝐶 .
At every position where 𝑥 𝑢 ¯ occurs in 𝐶 with some (or no) arguments 𝑢 ¯ , we rewrite ( 𝑡 ~ 𝑢 ¯ ) 𝜃 to ( 𝑡 ~ ′ 𝑢 ¯ ) 𝜃 in 𝐶 𝜃 if ( 𝑡 ~ 𝑢 ¯ ) 𝜃 ≻ ( 𝑡 ~ ′ 𝑢 ¯ ) 𝜃 . We start with the innermost occurrences of 𝑥 , so that the order of the two terms at one step does not change by later rewriting. Analogously, at every position where 𝑥 𝑢 ¯ occurs in 𝐶 with some (or no) arguments 𝑢 ¯ , we rewrite ( 𝑡 ~ ′ 𝑢 ¯ ) 𝜃 to ( 𝑡 ~ 𝑢 ¯ ) 𝜃 in 𝐶 ′′ 𝜃 if ( 𝑡 ~ ′ 𝑢 ¯ ) 𝜃 ≻ ( 𝑡 ~ 𝑢 ¯ ) 𝜃 , again starting with the innermost occurrences.
We never rewrite at the top level of the maximal term of 𝐶𝜃 or 𝐶′′𝜃 because 𝑥 cannot be the head of a maximal literal of 𝐶. The two resulting clauses are identical because 𝐶𝜃 and 𝐶′′𝜃 only differ at positions where 𝑥 occurs in 𝐶. The rewritten terms are all smaller than the maximal term of 𝐶𝜃. With (†), this implies that ℐ ⊧ ℱ(𝒞𝜃) by congruence. ∎
With these properties of our inference systems in place, the saturation framework’s lifting theorem (Theorem 19.3) guarantees static and dynamic refutational completeness of 𝐻𝐼𝑛𝑓 w.r.t. 𝐻𝑅𝑒𝑑 I . However, this theorem gives us refutational completeness w.r.t. the Herbrand entailment ⊧ 𝐺 , defined as 𝑁 1 ⊧ 𝐺 𝑁 2 if 𝒢 ( 𝒩 1 ) ⊧ 𝒢 ( 𝒩 2 ) , whereas our semantics is Tarski entailment ⊧ , defined as 𝑁 1 ⊧ 𝑁 2 if any model of 𝑁 1 is a model of 𝑁 2 . The following lemma repairs this mismatch:
Lemma \thethm.
For 𝑁 ⊆ 𝒞 H , we have 𝑁 ⊧ 𝐺 ⊥ if and only if 𝑁 ⊧ ⊥ .
Proof.
By Lemma 12.3, any model of 𝑁 is also a model of 𝒢(𝒩)—i.e., 𝑁 ⊧̸ ⊥ implies 𝑁 ⊧̸ 𝐺 ⊥. For the other direction, we need to show that 𝑁 ⊧̸ 𝐺 ⊥ implies 𝑁 ⊧̸ ⊥. Assume that 𝑁 ⊧̸ 𝐺 ⊥—i.e., 𝒢(𝒩) ⊧̸ ⊥. Then there is a model ℐ of 𝒢(𝒩). We must show that there exists a model of 𝑁—i.e., 𝑁 ⊧̸ ⊥. Let ℐ′ be an interpretation derived from ℐ by removing all universes that are not the denotation of a type in 𝒯𝓎 GH and removing all domain elements that are not the denotation of a term in 𝒯 GH, making ℐ′ term-generated. Clearly, in our clausal logic, this leaves the denotations of terms and the truth of ground clauses unchanged. Thus, ℐ′ ⊧ 𝒢(𝒩). We will show that ℐ′ ⊧ 𝑁. Let 𝐶 ∈ 𝑁. We want to show that 𝐶 is true in ℐ′ for all valuations 𝜉. Fix a valuation 𝜉. By construction, for each variable 𝑥, there exists a ground term 𝑠𝑥 such that ⟦𝑠𝑥⟧ℐ′ = 𝜉(𝑥). Let 𝜌 be the substitution that maps every free variable 𝑥 in 𝐶 to 𝑠𝑥. Then 𝜉(𝑥) = ⟦𝑠𝑥⟧ℐ′ = ⟦𝑥𝜌⟧ℐ′ for all 𝑥. By treating the type variables of 𝐶 in the same way, we can also achieve that 𝜉(𝛼) = ⟦𝛼𝜌⟧ℐ′ for all type variables 𝛼. By Lemma 12.3, ⟦𝑡𝜌⟧ℐ′ = ⟦𝑡⟧ℐ′ 𝜉 for all terms 𝑡 and ⟦𝜏𝜌⟧ℐ′ = ⟦𝜏⟧ℐ′ 𝜉 for all types 𝜏. Hence, 𝐶𝜌 and 𝐶 have the same truth value in ℐ′ for 𝜉. Since ℐ′ ⊧ 𝒢(𝒩), 𝐶𝜌 is true in ℐ′ and thus 𝐶 is true in ℐ′ as well. ∎
Theorem \thethm (Static refutational completeness).
The inference system 𝐻𝐼𝑛𝑓 is statically refutationally complete w.r.t. ( 𝐻𝑅𝑒𝑑 I , 𝐻𝑅𝑒𝑑 C ) . That is, if 𝑁 ⊆ 𝒞 H is a clause set saturated w.r.t. 𝐻𝐼𝑛𝑓 and 𝐻𝑅𝑒𝑑 I , then we have 𝑁 ⊧ ⊥ if and only if ⊥ ∈ 𝑁 .
Proof.
We apply Theorem 19.3. By Theorem 19.2, 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 is statically refutationally complete for all 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) . By Lemmas 19.3, 19.3, and 19.3, for every saturated 𝑁 ⊆ 𝒞 H , there exists a selection function 𝐺𝐻𝑆𝑒𝑙 ∈ 𝒢 ( 𝐻𝑆𝑒𝑙 ) such that all inferences 𝜄 ∈ 𝐺𝐻𝐼𝑛𝑓 𝐺𝐻𝑆𝑒𝑙 with 𝑝𝑟𝑒𝑚𝑠 ( 𝜄 ) ∈ 𝒢 ( 𝒩 ) either are 𝒢 𝐺𝐻𝑆𝑒𝑙 -ground instances of 𝐻𝐼𝑛𝑓 -inferences from 𝑁 or belong to 𝐺𝐻𝑅𝑒𝑑 I 𝐺𝐻𝑆𝑒𝑙 ( 𝒢 ( 𝒩 ) ) .
Theorem 19.3 implies that if 𝑁 ⊆ 𝒞 H is a clause set saturated w.r.t. 𝐻𝐼𝑛𝑓 and 𝐻𝑅𝑒𝑑 I , then 𝑁 ⊧ 𝐺 ⊥ if and only if ⊥ ∈ 𝑁 . By Lemma 19.3, this also holds for the Tarski entailment ⊧ . That is, if 𝑁 ⊆ 𝒞 H is a clause set saturated w.r.t. 𝐻𝐼𝑛𝑓 and 𝐻𝑅𝑒𝑑 I , then 𝑁 ⊧ ⊥ if and only if ⊥ ∈ 𝑁 . ∎
From static refutational completeness, we can easily derive dynamic refutational completeness.
Theorem \thethm (Dynamic refutational completeness).
The inference system 𝐻𝐼𝑛𝑓 is dynamically refutationally complete w.r.t. ( 𝐻𝑅𝑒𝑑 I , 𝐻𝑅𝑒𝑑 C ) , as defined in Definition 17.
Proof.
By Theorem 17 of Waldmann et al., this follows from Theorem 19.3 and Lemma 19.3. ∎
- Implementation
Zipperposition [Cru15, Cru17] is an open source superposition-based theorem prover written in OCaml (https://github.com/sneeuwballen/zipperposition). It was initially designed for polymorphic first-order logic with equality, as embodied by TPTP TF1 [BP13]. We will refer to this implementation as Zipperposition’s first-order mode. Later, Cruanes extended the prover with a pragmatic higher-order mode with support for 𝜆-abstractions and extensionality, without any completeness guarantees. We have now also implemented complete 𝜆-free higher-order modes based on the four calculi described in this article.
The pragmatic higher-order mode provided a convenient basis to implement our calculi. It employs higher-order term and type representations and orders. Its ad hoc calculus extensions are similar to our calculi. They include an ArgCong-like rule and a PosExt-like rule, and Sup inferences are performed only at green subterms. One of the bugs we found during our implementation work occurred because argument positions shift when applied variables are instantiated. We resolved this by numbering argument positions in terms from right to left.
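The shifting-positions bug can be made concrete with a tiny example; the list encoding of argument tuples below is purely illustrative:

```python
# Why argument positions are numbered from right to left: instantiating an
# applied variable adds new arguments on the left, so positions counted
# from the left shift, while positions counted from the right are stable.

def pos_from_left(args, a):
    return args.index(a) + 1

def pos_from_right(args, a):
    return len(args) - args.index(a)

before = ["a", "b"]           # arguments in the term  x a b
after = ["c", "a", "b"]       # after x := f c, the term is  f c a b

# "a" shifts from position 1 to 2 counted from the left ...
shift = (pos_from_left(before, "a"), pos_from_left(after, "a"))
# ... but stays at position 2 counted from the right.
stable = (pos_from_right(before, "a"), pos_from_right(after, "a"))
```

Indexing positions from the right therefore keeps them valid across instantiation of applied variables.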
To implement the 𝜆 -free higher-order mode, we restricted the unification algorithm to non- 𝜆 -abstractions. To satisfy the requirements on selection, we avoid selecting literals that contain higher-order variables. To comply with our redundancy notion, we disabled rewriting of nongreen subterms. To improve term indexing of higher-order terms, we replaced the imperfect discrimination trees by fingerprint indices [Sch12]. To speed up the computation of the Sup conditions, we omit the condition 𝐶 𝜎 ⋠ 𝐷 𝜎 in the implementation, at the cost of performing some additional inferences.
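To give a rough idea of how fingerprint indexing prefilters unification candidates, here is a hedged sketch loosely following Schulz [Sch12]; the term representation (nested tuples, uppercase strings as variables), the sampled positions, and the compatibility test are simplified stand-ins for the real data structures.

```python
# Hedged sketch of fingerprint indexing [Sch12]. A fingerprint samples a few
# fixed positions of a term; each sample is a symbol, A (variable at the
# position), B (position below a variable), or N (position does not exist).
A, B, N = "A", "B", "N"

def is_var(t):
    return isinstance(t, str)

def sample(t, pos):
    """Fingerprint feature of term t at position pos (a tuple of argument indices)."""
    for i in pos:
        if is_var(t):
            return B            # position lies below a variable
        if i >= len(t) - 1:
            return N            # position does not exist
        t = t[i + 1]
    return A if is_var(t) else t[0]

POSITIONS = [(), (0,), (1,)]    # sample the root and its first two arguments

def fingerprint(t):
    return tuple(sample(t, p) for p in POSITIONS)

def compatible(f, g):
    """Necessary (not sufficient) condition for unifiability of the indexed terms."""
    for x, y in zip(f, g):
        if x == B or y == B:
            continue                  # below a variable: anything is possible
        if x == N or y == N:
            if x != y:
                return False          # existing feature vs. missing position clash
            continue
        if x == A or y == A:
            continue                  # a variable matches any existing subterm
        if x != y:
            return False              # two distinct symbols clash
    return True

t1 = ("f", ("a",), "X")               # f a X
t2 = ("f", ("b",), ("c",))            # f b c  -- clashes with t1 at position 1
t3 = ("f", "Y", ("c",))               # f Y c  -- possibly unifiable with t1
assert compatible(fingerprint(t1), fingerprint(t3))
assert not compatible(fingerprint(t1), fingerprint(t2))
```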
For the purifying calculi, we implemented purification as a simplification rule. This ensures that it is applied aggressively on all clauses, whether initial clauses from the problem or clauses produced during saturation, before any inferences are performed.
For the nonpurifying calculi, we added the possibility to perform Sup inferences at variable positions. This means that variables must be indexed as well. In addition, we modified the variable condition. Depending on the term ordering, it may be expensive or even impossible to decide whether there exists a grounding substitution 𝜃 with 𝑡 𝜎 𝜃 ≻ 𝑡 ′ 𝜎 𝜃 and 𝐶 𝜎 𝜃 ≺ 𝐶 ′′ 𝜎 𝜃 . We overapproximate the condition as follows: (1) check whether 𝑥 appears with different arguments in the clause 𝐶 ; (2) use a term-order-specific algorithm to determine whether there might exist a grounding substitution 𝜃 and terms 𝑢 ¯ such that 𝑡 𝜎 𝜃 ≻ 𝑡 ′ 𝜎 𝜃 and 𝑡 𝜎 𝜃 𝑢 ¯ ≺ 𝑡 ′ 𝜎 𝜃 𝑢 ¯ ; and (3) check whether 𝐶 𝜎 ⋡ 𝐶 ′′ 𝜎 . If these three conditions apply, we conclude that there might exist a ground substitution 𝜃 witnessing nonmonotonicity.
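Check (1) can be sketched as follows, assuming a hypothetical representation of terms as nested tuples whose heads are symbol or variable names; checks (2) and (3) depend on the term order and are not shown.

```python
# Hedged sketch of check (1) of the overapproximated variable condition:
# does variable x occur applied to two different argument tuples in the clause?

def occurrences(term, x):
    """Yield the argument tuples with which variable x occurs in term."""
    head, args = term[0], term[1:]
    if head == x:
        yield args
    for a in args:
        yield from occurrences(a, x)

def applied_with_different_args(clause_terms, x):
    seen = set()
    for t in clause_terms:
        for args in occurrences(t, x):
            seen.add(args)
            if len(seen) > 1:
                return True          # x occurs with two distinct argument tuples
    return False

# C = { f (X a) ≈ b,  X ≈ c }  --  X occurs applied to 'a' and unapplied
assert applied_with_different_args([("f", ("X", ("a",))), ("b",), ("X",), ("c",)], "X")
assert not applied_with_different_args([("f", ("X", ("a",)))], "X")
```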
For the extensional calculi, we add axiom (Ext) to the clause set. To curb the explosion associated with extensionality, this axiom and all clauses derived from it are penalized by the clause selection heuristic. We also added the NegExt rule described in Section 12.2, which resembles Vampire’s extensionality resolution rule [GKKV14].
The ArgCong rule can have infinitely many conclusions on polymorphic clauses. To capture this in the implementation, we store these infinite sequences of conclusions as finite instructions describing how to obtain the actual clauses. In addition to the usual active and passive sets of the DISCOUNT loop architecture [ADF95], we use a set of scheduled inferences that stores these instructions. To achieve dynamic completeness of our prover architecture, we visit the scheduled inferences in this additional set and the clauses in the passive set in a fair manner. Waldmann et al. [WTRB20, Example 34] and Bentkamp et al. [BBT + 21, Section 6] describe this architecture in more detail.
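The fairness scheme can be sketched as a toy model (not the actual DISCOUNT loop): scheduled inferences are lazy streams, and the main loop dequeues one item at a time, taking a single conclusion from a stream before re-enqueuing it, so every conclusion is eventually visited.

```python
# Toy sketch: infinitely many ArgCong conclusions are stored as a lazy
# generator ("instruction"), and the loop round-robins between passive
# clauses and scheduled streams so that no conclusion is postponed forever.
from collections import deque
from itertools import count

def argcong_conclusions(clause_id):
    # Stand-in for the infinite sequence of ArgCong conclusions of one clause;
    # in the real prover each element is a clause, here it is just a label.
    return ((clause_id, n) for n in count(1))

def fair_main_loop(passive, scheduled, steps):
    """Round-robin between passive clauses and scheduled inference streams."""
    visited = []
    queue = deque([("clause", c) for c in passive] +
                  [("stream", s) for s in scheduled])
    for _ in range(steps):
        kind, item = queue.popleft()
        if kind == "clause":
            visited.append(item)
        else:
            visited.append(next(item))   # take one conclusion, re-enqueue stream
            queue.append(("stream", item))
    return visited

out = fair_main_loop(["C1", "C2"], [argcong_conclusions("C3")], steps=6)
# Both passive clauses and several ArgCong instances of C3 are visited:
assert "C1" in out and "C2" in out and ("C3", 3) in out
```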
Using Zipperposition, we can quantify the disadvantage of the applicative encoding on Example 12.2. A well-chosen KBO instance with argument coefficients allows Zipperposition to derive ⊥ in 4 iterations of the prover’s main loop and 0.03 s. KBO or LPO with default settings needs 203 iterations and 0.4 s, whereas KBO or LPO on the applicatively encoded problem needs 203 iterations and more than 1 s due to the larger terms.
- Evaluation
We evaluated Zipperposition’s implementation of our four calculi on Sledgehammer-generated Isabelle/HOL benchmarks [BN10] and on benchmarks from the TPTP library [SSCB12, SBBT09]. Our experimental data is available online (https://doi.org/10.5281/zenodo.3992618). We used the development version of Zipperposition, revision 2031e216 (https://github.com/sneeuwballen/zipperposition/tree/2031e216c1941acd76187882a073e8f1e533).
The Sledgehammer benchmarks, corresponding to Isabelle’s Judgment Day suite, were regenerated to target clausal 𝜆 -free higher-order logic. They comprise 2506 problems in total, divided into two groups based on the number of Isabelle facts (lemmas, definitions, etc.) selected for inclusion in each problem: either 16 facts (SH16) or 256 facts (SH256). The problems were generated by encoding 𝜆 -expressions as 𝜆 -lifted supercombinators [MP08].
From the TPTP library, we collected 708 first-order problems in TFF format and 717 higher-order problems in THF format, both groups containing both monomorphic and polymorphic problems. We excluded all problems that contain interpreted arithmetic symbols, the symbols (@@+), (@@-), (@+), (@-), (&), or tuples, as well as the SYN000 problems, which are only intended to test the parser, and problems whose clausal normal form takes longer than 15 s to compute or falls outside the 𝜆 -free fragment described in Section 9.
We want to answer the following questions:
(1) What is the overhead of our calculi on first-order benchmarks?
(2) Do the calculi outperform the applicative encoding?
(3) Do the purifying or the nonpurifying calculi achieve better results?
(4) What is the cost of nonmonotonicity?
Since the present work is only a stepping stone towards a prover for full higher-order logic, it would be misleading to compare this prototype with state-of-the-art higher-order provers that support a stronger logic. Many of the higher-order problems in the TPTP library are in fact satisfiable for our 𝜆 -free logic, even though they may be unsatisfiable for full higher-order logic and labeled as such in the TPTP.
To answer questions (1) and (3), we ran Zipperposition’s first-order mode on the first-order benchmarks and the purifying and nonpurifying modes on all benchmarks. To answer question (2), we implemented an applicative encoding mode in Zipperposition, which performs the applicative encoding after the clausal normal form transformation and then proceeds with Zipperposition’s first-order mode. The encoding makes all function symbols nullary and replaces all applications with a polymorphic binary 𝖺𝗉𝗉 symbol.
We instantiated all four calculi with three term orders: LPO [BWW17], KBO [BBWW17] (without argument coefficients), and EPO [Ben18]. Among these, LPO is the only nonmonotonic order and therefore the most relevant option to evaluate our calculi, which are designed to cope with nonmonotonicity. To answer question (4), we also include the monotone orders KBO and EPO. EPO is an order designed to resemble LPO while fulfilling the requirements of a ground-total simplification order on 𝜆 -free terms. KBO and EPO serve as a yardstick to assess the cost of nonmonotonicity. With these monotone orders, no superposition inferences at variables are necessary and thus the nonpurifying calculi become a straightforward generalization of the standard superposition calculus with the caveat that it may be more efficient to superpose at nongreen subterms directly instead of relying on the ArgCong rule. On first-order benchmarks and in the applicative encoding mode, all three orders are monotone because they are monotone on first-order terms.
Figure 1 summarizes, for the intensional calculi, the number of solved satisfiable and unsatisfiable problems within 180 s, and the time taken to show unsatisfiability. Figure 2 presents the corresponding data for the extensional calculi. The average time is computed over the problems that all configurations for the respective benchmark set and term order found to be unsatisfiable within the time limit. The evaluation was carried out on StarExec Iowa [SST14] using Intel Xeon E5-2609 0 CPUs clocked at 2.40 GHz.
Figure 1. Evaluation of the intensional calculi

| | # sat LPO | KBO | EPO | # unsat LPO | KBO | EPO | ⌀ time LPO | KBO | EPO |
|---|---|---|---|---|---|---|---|---|---|
| SH16 applicative encoding | 111 | 189 | 65 | 373 | 382 | 157 | 0.9 | 1.2 | 10.7 |
| SH16 nonpurifying calculus | 136 | 165 | 133 | 383 | 385 | 381 | 0.4 | 0.3 | 0.0 |
| SH16 purifying calculus | 82 | 98 | 82 | 363 | 363 | 355 | 1.3 | 2.0 | 0.0 |
| SH256 applicative encoding | 1 | 1 | 1 | 471 | 488 | 36 | 9.4 | 8.7 | 63.8 |
| SH256 nonpurifying calculus | 1 | 1 | 1 | 543 | 554 | 498 | 2.3 | 2.3 | 0.1 |
| SH256 purifying calculus | 1 | 1 | 1 | 523 | 528 | 484 | 2.6 | 3.4 | 0.5 |
| TFF first-order mode | 0 | 0 | 0 | 212 | 229 | 107 | 1.9 | 2.3 | 1.5 |
| TFF applicative encoding | 0 | 0 | 0 | 180 | 205 | 21 | 7.0 | 10.0 | 4.6 |
| TFF nonpurifying calculus | 0 | 0 | 0 | 210 | 229 | 105 | 1.9 | 2.4 | 1.5 |
| TFF purifying calculus | 0 | 0 | 0 | 211 | 229 | 105 | 2.1 | 2.6 | 1.6 |
| THF applicative encoding | 127 | 115 | 111 | 523 | 522 | 428 | 0.9 | 0.6 | 0.8 |
| THF nonpurifying calculus | 111 | 114 | 112 | 529 | 527 | 516 | 0.3 | 0.3 | 0.0 |
| THF purifying calculus | 108 | 109 | 108 | 528 | 526 | 514 | 0.3 | 0.5 | 0.0 |

Figure 2. Evaluation of the extensional calculi

| | # sat LPO | KBO | EPO | # unsat LPO | KBO | EPO | ⌀ time LPO | KBO | EPO |
|---|---|---|---|---|---|---|---|---|---|
| SH16 applicative encoding | 79 | 152 | 48 | 379 | 386 | 157 | 1.2 | 1.3 | 11.4 |
| SH16 nonpurifying calculus | 103 | 131 | 95 | 386 | 393 | 387 | 0.4 | 0.1 | 0.0 |
| SH16 purifying calculus | 32 | 57 | 32 | 367 | 365 | 363 | 2.0 | 1.7 | 0.0 |
| SH256 applicative encoding | 1 | 1 | 1 | 462 | 486 | 36 | 7.5 | 9.4 | 63.8 |
| SH256 nonpurifying calculus | 1 | 1 | 1 | 548 | 572 | 504 | 1.9 | 2.1 | 0.1 |
| SH256 purifying calculus | 1 | 1 | 1 | 512 | 529 | 482 | 2.2 | 5.0 | 0.1 |
| TFF first-order mode | 0 | 0 | 0 | 212 | 229 | 107 | 1.9 | 2.5 | 1.5 |
| TFF applicative encoding | 0 | 0 | 0 | 178 | 202 | 21 | 7.9 | 11.7 | 4.7 |
| TFF nonpurifying calculus | 0 | 0 | 0 | 207 | 229 | 106 | 2.1 | 3.0 | 1.5 |
| TFF purifying calculus | 0 | 0 | 0 | 210 | 229 | 105 | 2.2 | 3.2 | 1.6 |
| THF applicative encoding | 108 | 109 | 105 | 526 | 527 | 436 | 0.9 | 0.6 | 1.1 |
| THF nonpurifying calculus | 106 | 108 | 107 | 539 | 535 | 526 | 0.3 | 0.3 | 0.0 |
| THF purifying calculus | 96 | 97 | 96 | 530 | 529 | 519 | 0.3 | 0.6 | 0.0 |
The experimental results on the TFF part of the TPTP library confirm that our calculi handle the vast majority of problems that are solvable in first-order mode gracefully, and thus that the overhead is minimal, answering question (1). On first-order problems, the calculi are occasionally at variance with the first-order mode, due to the interaction of ArgCong with polymorphic types and due to the extensionality axiom (Ext). In contrast, the applicative encoding is comparatively inefficient on problems that are already first-order. For LPO, the success rate decreases by around 15%, and the average time to show unsatisfiability triples.
The SH16 benchmarks consist mostly of small higher-order problems. The small number of axioms benefits the applicative encoding enough to outperform the purifying calculi but not the nonpurifying ones. The SH256 benchmarks are also higher-order but much larger. Such problems are underrepresented in the TPTP library. On these, our calculi clearly outperform the applicative encoding, answering question (2) decisively. This is hardly surprising given that the proving effort is dominated by first-order reasoning, which they can perform gracefully.
The THF benchmarks generally require more sophisticated higher-order reasoning than the Sledgehammer benchmarks, as observed by Sultana, Blanchette, and Paulson [SBP13, Section 5]. On these benchmarks, the empirical results are less clear; the applicative encoding and our calculi are roughly neck-and-neck. The nonpurifying calculi detect unsatisfiability slightly more frequently, whereas the applicative encoding tends to find more saturations. It seems that, due to the large amount of higher-order reasoning necessary to solve TPTP problems, the advantage of our calculi on the first-order parts of the derivation is not a decisive factor on these benchmarks.
Concerning question (3), the nonpurifying calculi outperform their purifying relatives across all benchmarks. The raw data show that on most benchmark sets, the problems solved by the nonpurifying calculi are almost a superset of the problems solved by the purifying calculi. Only on the SH256 benchmarks do the purifying calculi solve a fair number of problems that the nonpurifying calculi cannot (11 problems for the intensional calculi with LPO and 9 problems for the extensional calculi with LPO).
KBO tends to have a slight advantage over LPO on all benchmark sets. But the gap between KBO and LPO is not larger on the higher-order benchmarks than on TFF. Since LPO is monotonic on first-order terms but nonmonotonic on higher-order terms, whereas KBO is monotonic on both, the best answer we can give to question (4) is that no substantial cost seems to be associated with nonmonotonicity. In particular, for the nonpurifying calculi, the additional superposition inferences at variables necessary with LPO do not have a negative impact on the overall performance. EPO generally performs worse than the other two orders, with the exception of the nonpurifying calculus on SH16 benchmarks, where it is roughly neck-and-neck with LPO. This suggests that for small, mildly higher-order problems, EPO can be a viable LPO-like complement to KBO if one considers the effort to implement our calculi too high.
- Discussion and Related Work
Our calculi join a long list of extensions and refinements of superposition. Among the most closely related is Peltier’s [Pel16] Isabelle/HOL formalization of the refutational completeness of a superposition calculus that operates on 𝜆 -free higher-order terms and that is parameterized by a monotonic term order. Extensions with polymorphism and induction, independently developed by Cruanes [Cru15, Cru17] and Wand [Wan17], contribute to increasing the power of automatic provers. Detection of inconsistencies in axioms, as suggested by Schulz et al. [SSUP17], is important for large axiomatizations.
Also of interest is Bofill and Rubio’s [BR13] integration of nonmonotonic orders in ordered paramodulation, a precursor of superposition. Their work is a veritable tour de force, but it is also highly complicated and restricted to ordered paramodulation. Since lack of compatibility with arguments is only a mild form of nonmonotonicity, it seemed preferable to start with superposition, enrich it with an ArgCong rule, and tune the side conditions until we obtained a complete calculus.
Most complications can be avoided by using a monotonic order such as KBO without argument coefficients. However, coefficients can be useful to help achieve compatibility with 𝛽 -reduction. For example, the term 𝜆 𝑥 . 𝑥 + 𝑥 could be treated as a constant with a coefficient of 2 on its argument and a heavy weight to ensure ( 𝜆 𝑥 . 𝑥 + 𝑥 ) 𝑦 ≻ 𝑦 + 𝑦 . Although they do not use argument coefficients, the recently developed 𝜆 -superposition calculus by Bentkamp et al. [BBT + 21] and combinatory superposition calculus by Bhayat and Reger [BR20] need a nonmonotonic order to cope with 𝛽 -reduction. They are modeled after our extensional nonpurifying and intensional nonpurifying calculi, respectively.
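To illustrate, here is a minimal sketch of KBO-style weight computation with argument coefficients; the weights and coefficients are hypothetical choices, and "lam" stands for the constant abbreviating 𝜆 𝑥 . 𝑥 + 𝑥.

```python
# Hedged sketch: a KBO-style weight where each function symbol may multiply
# the weight of its i-th argument by a coefficient. With a heavy weight and
# coefficient 2 on "lam", the redex (λx. x + x) y outweighs its reduct y + y.
WEIGHT = {"lam": 10, "+": 1, "y": 1}     # hypothetical symbol weights
COEF = {"lam": [2], "+": [1, 1]}         # coefficient 2 on lam's argument

def weight(term):
    head, args = term[0], term[1:]
    w = WEIGHT.get(head, 1)
    coefs = COEF.get(head, [1] * len(args))
    return w + sum(k * weight(a) for k, a in zip(coefs, args))

# (λx. x + x) y  vs.  y + y : weight 10 + 2*1 = 12 against 1 + 1 + 1 = 3
assert weight(("lam", ("y",))) > weight(("+", ("y",), ("y",)))
```

The weight comparison is only the first stage of KBO; a full implementation would fall back to precedence and lexicographic comparison on equal weights.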
Many researchers have proposed or used encodings of higher-order logic constructs into first-order logic, including Robinson [Rob70], Kerber [Ker91], Dougherty [Dou93], Dowek et al. [DHK95], Hurd [Hur03], Meng and Paulson [MP08], Obermeyer [Obe09], and Czajka [Cza16]. Encodings of types, such as those by Bobot and Paskevich [BP11] and Blanchette et al. [BBPS16], are also crucial to obtain a sound encoding of higher-order logic. These ideas are implemented in proof assistant tools such as HOLyHammer and Sledgehammer [BKPU16].
In the term rewriting community, 𝜆 -free higher-order logic is known as applicative first-order logic. First-order rewrite techniques can be applied to this logic via the applicative encoding. However, there are similar drawbacks as in theorem proving to having 𝖺𝗉𝗉 as the only nonnullary symbol. Hirokawa et al. [HMZ13] propose a technique that resembles our mapping ℱ to avoid these drawbacks.
Another line of research has focused on the development of automated proof procedures for higher-order logic. Robinson’s [Rob69], Andrews’s [And71], and Huet’s [Hue73] pioneering work stands out. Andrews [And01] and Benzmüller and Miller [BM14] provide excellent surveys. The competitive higher-order automatic theorem provers include Leo-II [BPTF08] (based on RUE resolution), Satallax [Bro12] (based on a tableau calculus and a SAT solver), agsyHOL [Lin14] (based on a focused sequent calculus and a generic narrowing engine), Leo-III [SB18] (based on a pragmatic higher-order version of ordered paramodulation with no completeness guarantees), CVC4 and veriT [BREO + 19] (both based on satisfiability modulo theories), and Vampire [BR19, BR20] (based on superposition and 𝖲𝖪 -style combinators). The Isabelle proof assistant [NPW02] (which includes a tableau reasoner and a rewriting engine) and its Sledgehammer subsystem have also participated in the higher-order division of the CADE ATP System Competition.
Zipperposition is a convenient vehicle for experimenting and prototyping because it is easier to understand and modify than highly optimized C or C++ provers. Our medium-term goal is to design higher-order superposition calculi, implement them in state-of-the-art provers such as E [Sch13], SPASS [WDF + 09], and Vampire [KV13], and integrate these into proof assistants to provide a high level of automation. With its stratified architecture, Otter- 𝜆 [Bee04] is perhaps the closest to what we are aiming at, but it is limited to second-order logic and offers no completeness guarantees. As a first step, Vukmirović, Blanchette, Cruanes, and Schulz [VBCS19] have generalized E’s data structures and algorithms to clausal 𝜆 -free higher-order logic, assuming a monotonic KBO [BBWW17].
- Conclusion
We presented four superposition calculi for intensional and extensional clausal 𝜆 -free higher-order logic and proved them refutationally complete. The calculi nicely generalize standard superposition and are compatible with our 𝜆 -free higher-order LPO and KBO. Especially on large problems, our experiments confirm what one would naturally expect: that native support for partial application and applied variables outperforms the applicative encoding.
The new calculi reduce the gap between proof assistants based on higher-order logic and superposition provers. We can use them to reason about arbitrary higher-order problems by axiomatizing suitable combinators. But perhaps more importantly, they appear promising as a stepping stone towards complete, highly efficient automatic theorem provers for full higher-order logic. Indeed, the subsequent work by Bentkamp et al. [BBT + 21], which introduces support for 𝜆 -expressions, and Bhayat and Reger [BR20], which works with 𝖲𝖪 -style combinators, is largely based on our nonpurifying calculi.
Acknowledgment.
We are grateful to the maintainers of StarExec for letting us use their service. We want to thank Petar Vukmirović for his work on Zipperposition and for his advice. We thank Christoph Benzmüller, Sander Dahmen, Johannes Hölzl, Anders Schlichtkrull, Stephan Schulz, Alexander Steen, Geoff Sutcliffe, Andrei Voronkov, Daniel Wand, Christoph Weidenbach, and the participants in the 2017 Dagstuhl Seminar on Deduction beyond First-Order Logic for stimulating discussions. We also want to thank Christoph Benzmüller, Ahmed Bhayat, Wan Fokkink, Alexander Steen, Mark Summerfield, Sophie Tourret, and the anonymous reviewers for suggesting several textual improvements. Bentkamp and Blanchette’s research has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka).
- References

[ADF95] Jürgen Avenhaus, Jörg Denzinger, and Matthias Fuchs. DISCOUNT: A system for distributed equational deduction. In Jieh Hsiang, editor, RTA-95, volume 914 of LNCS, pages 397–402. Springer, 1995.
[And71] Peter B. Andrews. Resolution in type theory. J. Symb. Log., 36(3):414–432, 1971.
[And01] Peter B. Andrews. Classical type theory. In John Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume II, pages 965–1007. Elsevier and MIT Press, 2001.
[BBCW18] Alexander Bentkamp, Jasmin Christian Blanchette, Simon Cruanes, and Uwe Waldmann. Superposition for lambda-free higher-order logic. In Didier Galmiche, Stephan Schulz, and Roberto Sebastiani, editors, IJCAR 2018, volume 10900 of LNCS, pages 28–46. Springer, 2018.
[BBPS16] Jasmin Christian Blanchette, Sascha Böhme, Andrei Popescu, and Nicholas Smallbone. Encoding monomorphic and polymorphic types. Log. Meth. Comput. Sci., 12(4:13):1–52, 2016.
[BBT+21] Alexander Bentkamp, Jasmin Blanchette, Sophie Tourret, Petar Vukmirović, and Uwe Waldmann. Superposition with lambdas. J. Autom. Reason., 2021.
[BBWW17] Heiko Becker, Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand. A transfinite Knuth–Bendix order for lambda-free higher-order terms. In Leonardo de Moura, editor, CADE-26, volume 10395 of LNCS, pages 432–453. Springer, 2017.
[BC04] Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. Springer, 2004.
[Bee04] Michael Beeson. Lambda logic. In David A. Basin and Michaël Rusinowitch, editors, IJCAR 2004, volume 3097 of LNCS, pages 460–474. Springer, 2004.
[Ben18] Alexander Bentkamp. Formalization of the embedding path order for lambda-free higher-order terms. Archive of Formal Proofs, 2018. http://isa-afp.org/entries/Lambda_Free_EPO.html.
[BG94] Leo Bachmair and Harald Ganzinger. Rewrite-based equational theorem proving with selection and simplification. J. Log. Comput., 4(3):217–247, 1994.
[BG01] Leo Bachmair and Harald Ganzinger. Resolution theorem proving. In John Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume I, pages 19–99. Elsevier and MIT Press, 2001.
[BKPU16] Jasmin Christian Blanchette, Cezary Kaliszyk, Lawrence C. Paulson, and Josef Urban. Hammering towards QED. J. Formaliz. Reas., 9(1):101–148, 2016.
[BM14] Christoph Benzmüller and Dale Miller. Automation of higher-order logic. In Jörg H. Siekmann, editor, Computational Logic, volume 9 of Handbook of the History of Logic, pages 215–254. Elsevier, 2014.
[BN10] Sascha Böhme and Tobias Nipkow. Sledgehammer: Judgement Day. In Jürgen Giesl and Reiner Hähnle, editors, IJCAR 2010, volume 6173 of LNCS, pages 107–121. Springer, 2010.
[BP11] François Bobot and Andrei Paskevich. Expressing polymorphic types in a many-sorted language. In Cesare Tinelli and Viorica Sofronie-Stokkermans, editors, FroCoS 2011, volume 6989 of LNCS, pages 87–102. Springer, 2011.
[BP13] Jasmin Christian Blanchette and Andrei Paskevich. TFF1: The TPTP typed first-order form with rank-1 polymorphism. In Maria Paola Bonacina, editor, CADE-24, volume 7898 of LNCS, pages 414–420. Springer, 2013.
[BPTF08] Christoph Benzmüller, Lawrence C. Paulson, Frank Theiss, and Arnaud Fietzke. LEO-II—A cooperative automatic theorem prover for higher-order logic. In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, IJCAR 2008, volume 5195 of LNCS, pages 162–170. Springer, 2008.
[BR13] Miquel Bofill and Albert Rubio. Paramodulation with non-monotonic orderings and simplification. J. Autom. Reason., 50(1):51–98, 2013.
[BR19] Ahmed Bhayat and Giles Reger. Restricted combinatory unification. In Pascal Fontaine, editor, CADE-27, volume 11716 of LNCS, pages 74–93. Springer, 2019.
[BR20] Ahmed Bhayat and Giles Reger. A combinator-based superposition calculus for higher-order logic. In Nicolas Peltier and Viorica Sofronie-Stokkermans, editors, IJCAR 2020, Part I, volume 12166 of LNCS, pages 278–296. Springer, 2020.
[Bra75] D. Brand. Proving theorems with the modification method. SIAM J. Comput., 4:412–430, 1975.
[BREO+19] Haniel Barbosa, Andrew Reynolds, Daniel El Ouraoui, Cesare Tinelli, and Clark Barrett. Extending SMT solvers to higher-order logic. In Pascal Fontaine, editor, CADE-27, volume 11716 of LNCS, pages 35–54. Springer, 2019.
[Bro12] Chad E. Brown. Satallax: An automatic higher-order prover. In Bernhard Gramlich, Dale Miller, and Uli Sattler, editors, IJCAR 2012, volume 7364 of LNCS, pages 111–117. Springer, 2012.
[BWW17] Jasmin Christian Blanchette, Uwe Waldmann, and Daniel Wand. A lambda-free higher-order recursive path order. In Javier Esparza and Andrzej S. Murawski, editors, FoSSaCS 2017, volume 10203 of LNCS, pages 461–479. Springer, 2017.
[Chu40] Alonzo Church. A formulation of the simple theory of types. J. Symb. Log., 5(2):56–68, 1940.
[CP03] Iliano Cervesato and Frank Pfenning. A linear spine calculus. J. Log. Comput., 13(5):639–688, 2003.
[Cru15] Simon Cruanes. Extending Superposition with Integer Arithmetic, Structural Induction, and Beyond. Ph.D. thesis, École polytechnique, 2015.
[Cru17] Simon Cruanes. Superposition with structural induction. In Clare Dixon and Marcelo Finger, editors, FroCoS 2017, volume 10483 of LNCS, pages 172–188. Springer, 2017.
[Cza16] Łukasz Czajka. Improving automation in interactive theorem provers by efficient encoding of lambda-abstractions. In Jeremy Avigad and Adam Chlipala, editors, CPP 2016, pages 49–57. ACM, 2016.
[DH86] Vincent J. Digricoli and Malcolm C. Harrison. Equality-based binary resolution. J. ACM, 33(2):253–289, 1986.
[DHK95] Gilles Dowek, Thérèse Hardin, and Claude Kirchner. Higher-order unification via explicit substitutions (extended abstract). In LICS ’95, pages 366–374. IEEE Computer Society, 1995.
[Dou93] Daniel J. Dougherty. Higher-order unification via combinators. Theor. Comput. Sci., 114(2):273–298, 1993.
[Fit96] Melvin Fitting. First-Order Logic and Automated Theorem Proving. Springer-Verlag, 2nd edition, 1996.
[Fit02] Melvin Fitting. Types, Tableaus, and Gödel’s God. Kluwer, 2002.
[GKKV14] Ashutosh Gupta, Laura Kovács, Bernhard Kragl, and Andrei Voronkov. Extensional crisis and proving identity. In Franck Cassez and Jean-François Raskin, editors, ATVA 2014, volume 8837 of LNCS, pages 185–200. Springer, 2014.
[GM93] M. J. C. Gordon and T. F. Melham, editors. Introduction to HOL: A Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993.
[Hen50] Leon Henkin. Completeness in the theory of types. J. Symb. Log., 15(2):81–91, 1950.
[HMZ13] Nao Hirokawa, Aart Middeldorp, and Harald Zankl. Uncurrying for termination and complexity. J. Autom. Reason., 50(3):279–315, 2013.
[Hue73] Gérard P. Huet. A mechanization of type theory. In Nils J. Nilsson, editor, IJCAI-73, pages 139–146. William Kaufmann, 1973.
[Hur03] Joe Hurd. First-order proof tactics in higher-order logic theorem provers. In Myla Archer, Ben Di Vito, and César Muñoz, editors, Design and Application of Strategies/Tactics in Higher Order Logics, NASA Technical Reports, pages 56–68, 2003.
[Ker91] Manfred Kerber. How to prove higher order theorems in first order logic. In John Mylopoulos and Raymond Reiter, editors, IJCAI-91, pages 137–142. Morgan Kaufmann, 1991.
[Kop12] Cynthia Kop. Higher Order Termination: Automatable Techniques for Proving Termination of Higher-Order Term Rewriting Systems. Ph.D. thesis, Vrije Universiteit Amsterdam, 2012.
[KV13] Laura Kovács and Andrei Voronkov. First-order theorem proving and Vampire. In Natasha Sharygina and Helmut Veith, editors, CAV 2013, volume 8044 of LNCS, pages 1–35. Springer, 2013.
[Lei94] Daniel Leivant. Higher order logic. In Dov M. Gabbay, Christopher J. Hogger, J. A. Robinson, and Jörg H. Siekmann, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 2, Deduction Methodologies, pages 229–322. Oxford University Press, 1994.
[Lin14] Fredrik Lindblad. A focused sequent calculus for higher-order logic. In Stéphane Demri, Deepak Kapur, and Christoph Weidenbach, editors, IJCAR 2014, volume 8562 of LNCS, pages 61–75. Springer, 2014.
[Mil87] Dale A. Miller. A compact representation of proofs. Studia Logica, 46(4):347–370, 1987.
[MP08] Jia Meng and Lawrence C. Paulson. Translating higher-order clauses to first-order clauses. J. Autom. Reason., 40(1):35–60, 2008.
[NPW02] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002.
[Obe09] Fritz H. Obermeyer. Automated Equational Reasoning in Nondeterministic 𝜆 -Calculi Modulo Theories ℋ *. Ph.D. thesis, Carnegie Mellon University, 2009.
[Pel16] Nicolas Peltier. A variant of the superposition calculus. Archive of Formal Proofs, 2016.
[Rob69] J. A. Robinson. Mechanizing higher order logic. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 4, pages 151–170. Edinburgh University Press, 1969.
[Rob70] J. A. Robinson. A note on mechanizing higher order logic. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 5, pages 121–135. Edinburgh University Press, 1970.
[SB18] Alexander Steen and Christoph Benzmüller. The higher-order prover Leo-III. In Didier Galmiche, Stephan Schulz, and Roberto Sebastiani, editors, IJCAR 2018, volume 10900 of LNCS, pages 108–116. Springer, 2018.
[SBBT09] Geoff Sutcliffe, Christoph Benzmüller, Chad E. Brown, and Frank Theiss. Progress in the development of automated theorem proving for higher-order logic. In Renate A. Schmidt, editor, CADE-22, volume 5663 of LNCS, pages 116–130. Springer, 2009.
[SBP13] Nik Sultana, Jasmin Christian Blanchette, and Lawrence C. Paulson. LEO-II and Satallax on the Sledgehammer test bench. J. Applied Logic, 11(1):91–102, 2013.
[Sch02] Stephan Schulz. E—a brainiac theorem prover. AI Commun., 15(2–3):111–126, 2002.
[Sch12] Stephan Schulz. Fingerprint indexing for paramodulation and rewriting. In Bernhard Gramlich, Dale Miller, and Uli Sattler, editors, IJCAR 2012, volume 7364 of LNCS, pages 477–483. Springer, 2012.
[Sch13] Stephan Schulz. System description: E 1.8. In Kenneth L. McMillan, Aart Middeldorp, and Andrei Voronkov, editors, LPAR-19, volume 8312 of LNCS, pages 735–743. Springer, 2013.
[SL91] Wayne Snyder and Christopher Lynch. Goal directed strategies for paramodulation. In Ronald V. Book, editor, RTA-91, volume 488 of LNCS, pages 150–161. Springer, 1991.
[SS89] Manfred Schmidt-Schauß. Unification in a combination of arbitrary disjoint equational theories. J. Symb. Comput., 8:51–99, 1989.
[SSCB12] Geoff Sutcliffe, Stephan Schulz, Koen Claessen, and Peter Baumgartner. The TPTP typed first-order form with arithmetic. In Nikolaj Bjørner and Andrei Voronkov, editors, LPAR-18, volume 7180 of LNCS, pages 406–419. Springer, 2012.
[SST14] Aaron Stump, Geoff Sutcliffe, and Cesare Tinelli. StarExec: A cross-community infrastructure for logic solving. In Stéphane Demri, Deepak Kapur, and Christoph Weidenbach, editors, IJCAR 2014, volume 8562 of LNCS, pages 367–373. Springer, 2014.
[SSUP17] Stephan Schulz, Geoff Sutcliffe, Josef Urban, and Adam Pease. Detecting inconsistencies in large first-order knowledge bases. In Leonardo de Moura, editor, CADE-26, volume 10395 of LNCS, pages 310–325. Springer, 2017.
[Vää19] Jouko Väänänen. Second-order and higher-order logic. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, fall 2019 edition, 2019.
[VBCS19] Petar Vukmirović, Jasmin Christian Blanchette, Simon Cruanes, and Stephan Schulz. Extending a brainiac prover to lambda-free higher-order logic. In Tomas Vojnar and Lijun Zhang, editors, TACAS 2019, volume 11427 of LNCS, pages 192–210. Springer, 2019.
[Wan17] Daniel Wand. Superposition: Types and Polymorphism. Ph.D. thesis, Universität des Saarlandes, 2017.
[WDF+09] Christoph Weidenbach, Dilyana Dimova, Arnaud Fietzke, Rohit Kumar, Martin Suda, and Patrick Wischnewski. SPASS version 3.5. In Renate A. Schmidt, editor, CADE-22, volume 5663 of LNCS, pages 140–145. Springer, 2009.
[WTRB20] Uwe Waldmann, Sophie Tourret, Simon Robillard, and Jasmin Blanchette. A comprehensive framework for saturation theorem proving. In Nicolas Peltier and Viorica Sofronie-Stokkermans, editors, IJCAR 2020, Part I, volume 12166 of LNCS, pages 316–334. Springer, 2020.