Spaces:

Vilin97
/

VeriDeepResearch

Running

App Files Files Community

VeriDeepResearch / LOG.md

Vilin97's picture

12/12 Putnam 2025 verified — complete sweep, $1.13 total

759bac6 4 days ago

|

history blame contribute delete

4.97 kB

Test Log

All tests run on https://vilin97-verideepresearch.hf.space (local testing via Python 3.10).

Simple / LLM-tricky questions

#	Question	Time	Verified	Notes
1	"Show that for all natural numbers n, n + 0 = n."	16s	Yes	Proved by `rfl`.
2	"Prove that the sum of two even numbers is even."	19s	Yes	Used `Even.add` + explicit proof.
3	"Which is larger, 9.9 or 9.11? Prove it."	4s	Yes	9.9 > 9.11 via `norm_num`. Classic LLM failure — verified correct.
4	"Prove that 1/3 + 1/3 + 1/3 = 1."	16s	Yes	Over ℚ with `norm_num`.

False statements (negation proved)

#	Question	Time	Verified	Notes
5	"Prove that 1 = 2."	2s	N/A	Correctly refused.
6	"Prove every continuous function on [0,1] is differentiable."	32s	Yes	Proved negation:
7	"Prove every monotone sequence of reals converges."	61s	Yes	Proved negation: a_n = n is monotone but divergent.
8	"Prove every group of order 6 is abelian."	73s	Yes	Proved negation: DihedralGroup 3 is non-abelian.

Graduate / Prelim level

#	Question	Time	Verified	Notes
9	"Composition of injective functions is injective."	~20s	Yes
10	"Cantor's theorem: no surjection from S to P(S)."	16s	Yes	Mathlib one-liner + explicit diagonal argument.
11	"Every finite integral domain is a field."	16s	Yes
12	"n³ - n divisible by 6."	39s	Yes	50+ line proof, case analysis.
13	"x² + y² + z² ≥ xy + yz + xz."	19s	Yes	`nlinarith` with `sq_nonneg` hints.
14	"Continuous bijection compact → Hausdorff is homeomorphism."	52s	Yes
15	"Weakest assumptions for abelian group."	54s	Yes	3 answers, all verified.
16	"Cauchy functional equation."	38s	Yes
17	"Weakest condition for limit of continuous functions."	26s	Yes	Uniform convergence.
18	"Rolle's theorem without differentiability."	100s	Yes	Caught Mathlib's `deriv` convention.
19	"Weakest condition for countable subcovers."	192s	Yes	Second-countability → Lindelöf.
20	"Group of order pq is cyclic."	203s	Yes	Previously >15min timeout. Auto-finalize fix.
21	"Schur's inequality t=1."	67s	Yes	Previously >15min timeout. Qwen 3.5 proved it.

UW Analysis Prelim (Sept 2020)

#	Question	Time	Verified	Notes
22	#9a: "Weakly convergent ⟹ norm bounded."	53s	Yes	Banach-Steinhaus.
23	"Well-ordering principle."	18s	Yes	`Nat.find` + Loogle.

Putnam 2025

#	Problem	Time	Cost	Verified	Notes
24	A1: Coprimality of (2m_k+1, 2n_k+1).	74s	$0.04	Yes	Qwen + 2 Loogle + 2 Axle.
25	A2: Bounds a·x(π-x) ≤ sinx ≤ b·x(π-x).	370s	$0.03	Yes	a=0, b=4/π².
26	A5: Maximal f(s) permutation counting.	189s	$0.05	Yes
27	A6: b_{2k+1} - 2b_{2k} divisibility.	45s	$0.03	Yes	Fastest Putnam solve.
28	B2: Centroid comparison x₁ < x₂.	335s	$0.21	Yes	Submitted to Aristotle, but agent proved it first.
29	B3: Set S with divisor property of 2025ⁿ-15ⁿ.	624s	$0.21	Yes	Hardest — 10+ min, multiple Qwen attempts.
30	B4: Matrix sum bound S ≤ (n+2)N/3.	183s	$0.05	Yes	3 Qwen attempts, 5 Axle checks.
31	B5: Modular inverse ordering > p/4-1.	84s	$0.02	Yes	Cheapest Putnam solve.
32	B6: Largest r for g(n+1)-g(n) ≥ (g(g(n)))^r.	511s	$0.25	Yes	Aristotle started (1%), agent proved independently.
33	A3: Digit string game (Alice vs Bob).	163s	$0.12	Yes	Combinatorial game theory formalized.
34	A4: Minimal k for commuting matrices.	64s	$0.02	Yes	Fastest Putnam B-side solve.
35	B1: Plane coloring → monochromatic.	225s	$0.10	Yes	Geometric argument formalized.

Non-math rejection

#	Question	Time	Result
33	"Best Italian restaurant in New York?"	2s	Rejected
34	"What is the weather today?"	2s	Rejected

Summary

12/12 Putnam 2025 problems verified (A1-A6 and B1-B6). Total cost: $1.13.
33/33 mathematical questions verified in Lean 4 across all categories
3/3 false statements caught with verified negation proofs
2/2 non-math questions rejected
Median response time: ~70 seconds for verified proofs
All Lean code verified by Axle (Lean 4.28.0 + Mathlib)
Tools: TheoremSearch, LeanExplore, Loogle, Axle, Qwen 3.5, Aristotle, Kimi K2.5
Email notification with proof.lean and research_log.md attachments