- A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study the universality hypothesis by examining how small neural networks learn to implement group composition. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory. We then show that networks consistently learn this algorithm by reverse engineering model logits and weights, and confirm our understanding using ablations. By studying networks of differing architectures trained on various groups, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned -- as well as the order they develop -- are arbitrary. 3 authors · Feb 6, 2023
- Feature emergence via margin maximization: case studies in algebraic tasks Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies. 5 authors · Nov 13, 2023
- Learning words in groups: fusion algebras, tensor ranks and grokking In this work, we demonstrate that a simple two-layer neural network with standard activation functions can learn an arbitrary word operation in any finite group, provided sufficient width is available and exhibits grokking while doing so. To explain the mechanism by which this is achieved, we reframe the problem as that of learning a particular 3-tensor, which we show is typically of low rank. A key insight is that low-rank implementations of this tensor can be obtained by decomposing it along triplets of basic self-conjugate representations of the group and leveraging the fusion structure to rule out many components. Focusing on a phenomenologically similar but more tractable surrogate model, we show that the network is able to find such low-rank implementations (or approximations thereof), thereby using limited width to approximate the word-tensor in a generalizable way. In the case of the simple multiplication word, we further elucidate the form of these low-rank implementations, showing that the network effectively implements efficient matrix multiplication in the sense of Strassen. Our work also sheds light on the mechanism by which a network reaches such a solution under gradient descent. 3 authors · Sep 8
- A Group with Exactly One Noncommutator The question of whether there exists a finite group of order at least three in which every element except one is a commutator has remained unresolved in group theory. In this article, we address this open problem by developing an algorithmic approach that leverages several group theoretic properties of such groups. Specifically, we utilize a result of Frobenius and various necessary properties of such groups, combined with Plesken and Holt's extensive enumeration of finite perfect groups, to systematically examine all finite groups up to a certain order for the desired property. The computational core of our work is implemented using the computer system GAP (Groups, Algorithms, and Programming). We discover two nonisomorphic groups of order 368,640 that exhibit the desired property. Our investigation also establishes that this order is the minimum order for such a group to exist. As a result, this study provides a positive answer to Problem 17.76 in the Kourovka Notebook. In addition to the algorithmic framework, this paper provides a structural description of one of the two groups found. 2 authors · Nov 1
- Fixed point conditions for non-coprime actions In the setting of finite groups, suppose J acts on N via automorphisms so that the induced semidirect product Nrtimes J acts on some non-empty set Omega, with N acting transitively. Glauberman proved that if the orders of J and N are coprime, then J fixes a point in Omega. We consider the non-coprime case and show that if N is abelian and a Sylow p-subgroup of J fixes a point in Omega for each prime p, then J fixes a point in Omega. We also show that if N is nilpotent, Nrtimes J is supersoluble, and a Sylow p-subgroup of J fixes a point in Omega for each prime p, then J fixes a point in Omega. 1 authors · Aug 23, 2023
- Flat matrix models for quantum permutation groups We study the matrix models pi:C(S_N^+)to M_N(C(X)) which are flat, in the sense that the standard generators of C(S_N^+) are mapped to rank 1 projections. Our first result is a generalization of the Pauli matrix construction at N=4, using finite groups and 2-cocycles. Our second result is the construction of a universal representation of C(S_N^+), inspired from the Sinkhorn algorithm, that we conjecture to be inner faithful. 2 authors · Feb 14, 2016
1 Faster Algorithms for Structured Matrix Multiplication via Flip Graph Search We give explicit low-rank bilinear non-commutative schemes for multiplying structured n times n matrices with 2 leq n leq 5, which serve as building blocks for recursive algorithms with improved multiplicative factors in asymptotic complexity. Our schemes are discovered over F_2 or F_3 and lifted to Z or Q. Using a flip graph search over tensor decompositions, we derive schemes for general, upper-triangular, lower-triangular, symmetric, and skew-symmetric inputs, as well as products of a structured matrix with its transpose. In particular, we obtain 4 times 4 rank-34 schemes: (i) multiplying a general matrix by its transpose using 10 recursive calls, improving the factor from 26/41 (0.634) to 8/13 (0.615); and (ii) multiplying an upper-triangular matrix by a general matrix using 12 recursive calls, improving the factor from 8/13 (0.615) to 22/37 (0.595). Additionally, using F_3 flip graphs, we discover schemes over Q that fundamentally require the inverse of 2, including a 2 times 2 symmetric-symmetric multiplication of rank 5 and a 3 times 3 skew-symmetric-general multiplication of rank 14 (improving upon AlphaTensor's 15). 3 authors · Nov 13
- Knowledge Graph Embedding by Normalizing Flows A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy the expressive power of complex random variables (i.e., expressiveness). The core idea is that we embed entities/relations as elements of a symmetric group, i.e., permutations of a set. Permutations of different sets can reflect different properties of embedding. And the group operation of symmetric groups is easy to compute. In specific, we show that the embedding of many existing models, point vectors, can be seen as elements of a symmetric group. To reflect uncertainty, we first embed entities/relations as permutations of a set of random variables. A permutation can transform a simple random variable into a complex random variable for greater expressiveness, called a normalizing flow. We then define scoring functions by measuring the similarity of two normalizing flows, namely NFE. We construct several instantiating models and prove that they are able to learn logical rules. Experimental results demonstrate the effectiveness of introducing uncertainty and our model. The code is available at https://github.com/changyi7231/NFE. 3 authors · Sep 30, 2024
- FiniteFieldSolve: Exactly Solving Large Linear Systems in High-Energy Theory Large linear systems play an important role in high-energy theory, appearing in amplitude bootstraps and during integral reduction. This paper introduces FiniteFieldSolve, a general-purpose toolkit for exactly solving large linear systems over the rationals. The solver interfaces directly with Mathematica, is straightforward to install, and seamlessly replaces Mathematica's native solvers. In testing, FiniteFieldSolve is approximately two orders of magnitude faster than Mathematica and uses an order of magnitude less memory. The package also compares favorably against other public solvers in FiniteFieldSolve's intended use cases. As the name of the package suggests, solutions are obtained via well-known finite field methods. These methods suffer from introducing an inordinate number of modulo (or integer division) operations with respect to different primes. By automatically recompiling itself for each prime, FiniteFieldSolve converts the division operations into much faster combinations of instructions, dramatically improving performance. The technique of compiling the prime can be applied to any finite field solver, where the time savings will be solver dependent. The operation of the package is illustrated through a detailed example of an amplitude bootstrap. 1 authors · Nov 2, 2023
- An Algorithm for Computing with Brauer's Group Equivariant Neural Network Layers The learnable, linear neural network layers between tensor power spaces of R^{n} that are equivariant to the orthogonal group, O(n), the special orthogonal group, SO(n), and the symplectic group, Sp(n), were characterised in arXiv:2212.08630. We present an algorithm for multiplying a vector by any weight matrix for each of these groups, using category theoretic constructions to implement the procedure. We achieve a significant reduction in computational cost compared with a naive implementation by making use of Kronecker product matrices to perform the multiplication. We show that our approach extends to the symmetric group, S_n, recovering the algorithm of arXiv:2303.06208 in the process. 1 authors · Apr 27, 2023
- EinHops: Einsum Notation for Expressive Homomorphic Operations on RNS-CKKS Tensors Fully Homomorphic Encryption (FHE) is an encryption scheme that allows for computation to be performed directly on encrypted data, effectively closing the loop on secure and outsourced computing. Data is encrypted not only during rest and transit, but also during processing. However, FHE provides a limited instruction set: SIMD addition, SIMD multiplication, and cyclic rotation of 1-D vectors. This restriction makes performing multi-dimensional tensor operations challenging. Practitioners must pack these tensors into 1-D vectors and map tensor operations onto this one-dimensional layout rather than their traditional nested structure. And while prior systems have made significant strides in automating this process, they often hide critical packing decisions behind layers of abstraction, making debugging, optimizing, and building on top of these systems difficult. In this work, we approach multi-dimensional tensor operations in FHE through Einstein summation (einsum) notation. Einsum notation explicitly encodes dimensional structure and operations in its syntax, naturally exposing how tensors should be packed and transformed. We decompose einsum expressions into a fixed set of FHE-friendly operations. We implement our design and present EinHops, a minimalist system that factors einsum expressions into a fixed sequence of FHE operations. EinHops enables developers to perform encrypted tensor operations using FHE while maintaining full visibility into the underlying packing strategy. We evaluate EinHops on a range of tensor operations from a simple transpose to complex multi-dimensional contractions. We show that the explicit nature of einsum notation allows us to build an FHE tensor system that is simple, general, and interpretable. We open-source EinHops at the following repository: https://github.com/baahl-nyu/einhops. 3 authors · Jul 10
- Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products Developing equivariant neural networks for the E(3) group plays an important role in modeling 3D data across real-world applications. Enforcing this equivariance primarily involves the tensor products of irreducible representations (irreps). However, the computational complexity of such operations increases significantly as higher-order tensors are used. In this work, we propose a systematic approach to substantially accelerate the computation of the tensor products of irreps. We mathematically connect the commonly used Clebsch-Gordan coefficients to the Gaunt coefficients, which are integrals of products of three spherical harmonics. Through Gaunt coefficients, the tensor product of irreps becomes equivalent to the multiplication between spherical functions represented by spherical harmonics. This perspective further allows us to change the basis for the equivariant operations from spherical harmonics to a 2D Fourier basis. Consequently, the multiplication between spherical functions represented by a 2D Fourier basis can be efficiently computed via the convolution theorem and Fast Fourier Transforms. This transformation reduces the complexity of full tensor products of irreps from O(L^6) to O(L^3), where L is the max degree of irreps. Leveraging this approach, we introduce the Gaunt Tensor Product, which serves as a new method to construct efficient equivariant operations across different model architectures. Our experiments on the Open Catalyst Project and 3BPA datasets demonstrate both the increased efficiency and improved performance of our approach. 3 authors · Jan 18, 2024
- Critical groups and partitions of finite groups We define a class of finite groups based on the properties of the closed twins of their power graphs and study the structure of those groups. As a byproduct, we obtain results about finite groups admitting a partition by cyclic subgroups. 2 authors · Dec 16, 2024
- Actions of nilpotent groups on nilpotent groups For finite nilpotent groups J and N, suppose J acts on N via automorphisms. We exhibit a decomposition of the first cohomology set in terms of the first cohomologies of the Sylow p-subgroups of J that mirrors the primary decomposition of H^1(J,N) for abelian N. We then show that if N rtimes J acts on some non-empty set Omega, where the action of N is transitive and for each prime p a Sylow p-subgroup of J fixes an element of Omega, then J fixes an element of Omega. 1 authors · Jan 25
- Certain residual properties of HNN-extensions with normal associated subgroups Let E be the HNN-extension of a group B with subgroups H and K associated according to an isomorphism varphicolon H to K. Suppose that H and K are normal in B and (H cap K)varphi = H cap K. Under these assumptions, we prove necessary and sufficient conditions for E to be residually a C-group, where C is a class of groups closed under taking subgroups, quotient groups, and unrestricted wreath products. Among other things, these conditions give new facts on the residual finiteness and the residual p-finiteness of the group E. 2 authors · Apr 30
- The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products E(3)-equivariant neural networks have demonstrated success across a wide range of 3D modelling tasks. A fundamental operation in these networks is the tensor product, which interacts two geometric features in an equivariant manner to create new features. Due to the high computational complexity of the tensor product, significant effort has been invested to optimize the runtime of this operation. For example, Luo et al. (2024) recently proposed the Gaunt tensor product (GTP) which promises a significant speedup. In this work, we provide a careful, systematic analysis of a number of tensor product operations. In particular, we emphasize that different tensor products are not performing the same operation. The reported speedups typically come at the cost of expressivity. We introduce measures of expressivity and interactability to characterize these differences. In addition, we realized the original implementation of GTP can be greatly simplified by directly using a spherical grid at no cost in asymptotic runtime. This spherical grid approach is faster on our benchmarks and in actual training of the MACE interatomic potential by 30%. Finally, we provide the first systematic microbenchmarks of the various tensor product operations. We find that the theoretical runtime guarantees can differ wildly from empirical performance, demonstrating the need for careful application-specific benchmarking. Code is available at https://github.com/atomicarchitects/PriceofFreedom. 4 authors · Jun 16
- Generalized Convolution and Efficient Language Recognition Convolution is a broadly useful operation with applications including signal processing, machine learning, probability, optics, polynomial multiplication, and efficient parsing. Usually, however, this operation is understood and implemented in more specialized forms, hiding commonalities and limiting usefulness. This paper formulates convolution in the common algebraic framework of semirings and semimodules and populates that framework with various representation types. One of those types is the grand abstract template and itself generalizes to the free semimodule monad. Other representations serve varied uses and performance trade-offs, with implementations calculated from simple and regular specifications. Of particular interest is Brzozowski's method for regular expression matching. Uncovering the method's essence frees it from syntactic manipulations, while generalizing from boolean to weighted membership (such as multisets and probability distributions) and from sets to n-ary relations. The classic trie data structure then provides an elegant and efficient alternative to syntax. Pleasantly, polynomial arithmetic requires no additional implementation effort, works correctly with a variety of representations, and handles multivariate polynomials and power series with ease. Image convolution also falls out as a special case. 1 authors · Mar 26, 2019
- CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs (Brief version) This is the third paper of the CayleyPy project applying artificial intelligence to problems in group theory. We announce the first public release of CayleyPy, an open source Python library for computations with Cayley and Schreier graphs. Compared with systems such as GAP and Sage, CayleyPy handles much larger graphs and performs several orders of magnitude faster. Using CayleyPy we obtained about 200 new conjectures on Cayley and Schreier graphs, focused on diameters and growth. For many Cayley graphs of symmetric groups Sn we observe quasi polynomial diameter formulas: a small set of quadratic or linear polynomials indexed by n mod s. We conjecture that this is a general phenomenon, giving efficient diameter computation despite the problem being NP hard. We propose a refinement of the Babai type conjecture on diameters of Sn: n^2/2 + 4n upper bounds in the undirected case, compared to previous O(n^2) bounds. We also provide explicit generator families, related to involutions in a square with whiskers pattern, conjectured to maximize the diameter; search confirms this for all n up to 15. We further conjecture an answer to a question posed by V M Glushkov in 1968 on directed Cayley graphs generated by a cyclic shift and a transposition. For nilpotent groups we conjecture an improvement of J S Ellenberg's results on upper unitriangular matrices over Z/pZ, showing linear dependence of diameter on p. Moreover. Some conjectures are LLM friendly, naturally stated as sorting problems verifiable by algorithms or Python code. To benchmark path finding we created more than 10 Kaggle datasets. CayleyPy works with arbitrary permutation or matrix groups and includes over 100 predefined generators. Our growth computation code outperforms GAP and Sage up to 1000 times in speed and size. 49 authors · Sep 23
- Fast Matrix Multiplication via Ternary Meta Flip Graphs Matrix multiplication optimization remains a fundamental challenge in computational mathematics. This work introduces a novel approach that discovers matrix multiplication schemes in the ternary field (Z_T), where coefficients are restricted to {-1, 0, 1} to minimize naive additive complexity. The core of the method is a GPU-accelerated meta flip graph algorithm that maintains ternary safety through specialized arithmetic operations and sign symmetry breaking. Key results include new best ranks for the formats 4 times 5 times 12, 5 times 6 times 10, and 6 times 7 times 9, the independent discovery of 32 schemes in Z_T that match known optimal ranks (including 8 previously known only with rational coefficients), and 30 rank improvements in the binary field. The analysis of 164 known schemes shows that 92 can be implemented in Z_T, while 72 could not be found in the ternary field with current methods, defining the current boundaries of this approach. All software, results, and discovered schemes are provided as open-source. 1 authors · Nov 25
- IterLara: A Turing Complete Algebra for Big Data, AI, Scientific Computing, and Database Lara is a key-value algebra that aims at unifying linear and relational algebra with three types of operation abstraction. The study of Lara's expressive ability reports that it can represent relational algebra and most linear algebra operations. However, several essential computations, such as matrix inversion and determinant, cannot be expressed in Lara. Lara cannot represent global and iterative computation, either. This article proposes IterLara, extending Lara with iterative operators, to provide an algebraic model that unifies operations in general-purpose computing, like big data, AI, scientific computing, and database. We study the expressive ability of Lara and IterLara and prove that IterLara with aggregation functions can represent matrix inversion, determinant. Besides, we demonstrate that IterLara with no limitation of function utility is Turing complete. We also propose the Operation Count (OP) as a metric of computation amount for IterLara and ensure that the OP metric is in accordance with the existing computation metrics. 4 authors · Jul 17, 2023
- Abundance of progression in large set for non commutative semigroup The notion of abundance of certain type of configuration in certain large sets was first proved by Furstenberg and Glazner in 1998. After that many author investigate abundance of different types of configurations in different types of large sets. Hindman, Hosseini, Strauss and Tootkaboni recently introduced another notion of large sets called CR sets. Then Debnath and De proved abundance of arithmetic progression in CR sets for commutative semigroups. In the present article we investigate abundance of progressions in for non-commutative semigroups. 1 authors · Dec 13, 2023
- On Enumerating Higher Bruhat Orders Through Deletion and Contraction The higher Bruhat orders B(n,k) were introduced by Manin-Schechtman to study discriminantal hyperplane arrangements and subsequently studied by Ziegler, who connected B(n,k) to oriented matroids. In this paper, we consider the enumeration of B(n,k) and improve upon Balko's asymptotic lower and upper bounds on |B(n,k)| by a factor exponential in k. A proof of Ziegler's formula for |B(n,n-3)| is given and a bijection between a certain subset of B(n,n-4) and totally symmetric plane partitions is proved. Central to our proofs are deletion and contraction operations for the higher Bruhat orders, defined in analogy with matroids. Dual higher Bruhat orders are also introduced, and we construct isomorphisms relating the higher Bruhat orders and their duals. Additionally, weaving functions are introduced to generalize Felsner's encoding of elements in B(n,2) to all higher Bruhat orders B(n,k). 1 authors · Dec 13, 2024
- On affine spaces of alternating matrices with constant rank Let F be a field, and n geq r>0 be integers, with r even. Denote by A_n(F) the space of all n-by-n alternating matrices with entries in F. We consider the problem of determining the greatest possible dimension for an affine subspace of A_n(F) in which every matrix has rank equal to r (or rank at least r). Recently Rubei has solved this problem over the field of real numbers. We extend her result to all fields with large enough cardinality. Provided that n geq r+3 and |F|geq minbigl(r-1,r{2}+2bigr), we also determine the affine subspaces of rank r matrices in A_n(F) that have the greatest possible dimension, and we point to difficulties for the corresponding problem in the case nleq r+2. 1 authors · Jul 19, 2023
- Inversion of adjunction for quotient singularities III: semi-invariant case We prove the precise inversion of adjunction formula for finite linear group quotients of complete intersection varieties defined by semi-invariant equations. As an application, we prove the semi-continuity of minimal log discrepancies for them. These results extend the results in our first paper, where we prove the same results for complete intersection varieties defined by ``invariant equations". 2 authors · Dec 10, 2023
- Invariant subspaces for finite index shifts in Hardy spaces and the invariant subspace problem for finite defect operators Let mathbb H be the finite direct sums of H^2(mathbb D). In this paper, we give a characterization of the closed subspaces of mathbb H which are invariant under the shift, thus obtaining a concrete Beurling-type theorem for the finite index shift. This characterization presents any such a subspace as the finite intersection, up to an inner function, of pre-images of a closed shift-invariant subspace of H^2(mathbb D) under ``determinantal operators'' from mathbb H to H^2(mathbb D), that is, continuous linear operators which intertwine the shifts and appear as determinants of matrices with entries given by bounded holomorphic functions. With simple algebraic manipulations we provide a direct proof that every invariant closed subspace of codimension at least two sits into a non-trivial closed invariant subspace. As a consequence every bounded linear operator with finite defect has a nontrivial closed invariant subspace. 2 authors · Nov 4, 2024
- Homomorphisms between multidimensional constant-shape substitutions We study a class of Z^{d}-substitutive subshifts, including a large family of constant-length substitutions, and homomorphisms between them, i.e., factors modulo isomorphisms of Z^{d}. We prove that any measurable factor map and even any homomorphism associated to a matrix commuting with the expansion matrix, induces a continuous one. We also get strong restrictions on the normalizer group, proving that any endomorphism is invertible, the normalizer group is virtually generated by the shift action and the quotient of the normalizer group by the automorphisms is restricted by the digit tile of the substitution. 1 authors · Jun 19, 2021
- Complements of finite unions of convex sets Finite unions of convex sets are a central object of study in discrete and computational geometry. In this paper we initiate a systematic study of complements of such unions -- i.e., sets of the form S=R^d setminus (cup_{i=1}^n K_i), where K_i are convex sets. In the first part of the paper we study isolated points in S, whose number is related to the Betti numbers of cup_{i=1}^n K_i and to its non-convexity properties. We obtain upper bounds on the number of such points, which are sharp for n=3 and significantly improve previous bounds of Lawrence and Morris (2009) for all n ll 2^d{d}. In the second part of the paper we study coverings of S by well-behaved sets. We show that S can be covered by at most g(d,n) flats of different dimensions, in such a way that each x in S is covered by a flat whose dimension equals the `local dimension' of S in the neighborhood of x. Furthermore, we determine the structure of a minimum cover that satisfies this property. Then, we study quantitative aspects of this minimum cover and obtain sharp upper bounds on its size in various settings. 2 authors · Aug 26
- Generating functions for some series of characters of classical Lie groups There exist a number of well known multiplicative generating functions for series of Schur functions. Amongst these are some related to the dual Cauchy identity whose expansion coefficients are rather simple, and in some cases periodic in parameters specifying the Schur functions. More recently similar identities have been found involving expansions in terms of characters of the symplectic group. Here these results are extended and generalised to all classical Lie groups. This is done through the derivation of explicit recurrence relations for the expansion coefficients based on the action of the Weyl groups of both the symplectic and orthogonal groups. Copious results are tabulated in the form of explicit values of the expansion coefficients as functions of highest weight parameters. An alternative approach is then based on dual pairs of symplectic and/or orthogonal groups. A byproduct of this approach is that expansions in terms of spin orthogonal group characters can always be recovered from non-spin cases. 1 authors · Mar 1, 2023
1 Rewrite the Stars Recent studies have drawn attention to the untapped potential of the "star operation" (element-wise multiplication) in network design. While intuitive explanations abound, the foundational rationale behind its application remains largely unexplored. Our study attempts to reveal the star operation's ability to map inputs into high-dimensional, non-linear feature spaces -- akin to kernel tricks -- without widening the network. We further introduce StarNet, a simple yet powerful prototype, demonstrating impressive performance and low latency under compact network structure and efficient budget. Like stars in the sky, the star operation appears unremarkable but holds a vast universe of potential. Our work encourages further exploration across tasks, with codes available at https://github.com/ma-xu/Rewrite-the-Stars. 5 authors · Mar 29, 2024
- Distinguishability and linear independence for H-chromatic symmetric functions We study the H-chromatic symmetric functions X_G^H (introduced in (arXiv:2011.06063) as a generalization of the chromatic symmetric function (CSF) X_G), which track homomorphisms from the graph G to the graph H. We focus first on the case of self-chromatic symmetric functions (self-CSFs) X_G^G, making some progress toward a conjecture from (arXiv:2011.06063) that the self-CSF, like the normal CSF, is always different for different trees. In particular, we show that the self-CSF distinguishes trees from non-trees with just one exception, we check using Sage that it distinguishes all trees on up to 12 vertices, and we show that it determines the number of legs of a spider and the degree sequence of a caterpillar given its spine length. We also show that the self-CSF detects the number of connected components of a forest, again with just one exception. Then we prove some results about the power sum expansions for H-CSFs when H is a complete bipartite graph, in particular proving that the conjecture from (arXiv:2011.06063) about p-monotonicity of ω(X_G^H) for H a star holds as long as H is sufficiently large compared to G. We also show that the self-CSFs of complete multipartite graphs form a basis for the ring Λ of symmetric functions, and we give some construction of bases for the vector space Λ^n of degree n symmetric functions using H-CSFs X_G^H where H is a fixed graph that is not a complete graph, answering a question from (arXiv:2011.06063) about whether such bases exist. However, we show that there generally do not exist such bases with G fixed, even with loops, answering another question from (arXiv:2011.06063). We also define the H-chromatic polynomial as an analogue of the chromatic polynomial, and ask when it is the same for different graphs. 2 authors · Nov 11
- Group Equivariant Fourier Neural Operators for Partial Differential Equations We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain. Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using group theory has been studied extensively, how to capture symmetries in the frequency domain is under-explored. In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. The resulting G-FNO architecture generalizes well across input resolutions and performs well in settings with varying levels of symmetry. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS). 6 authors · Jun 9, 2023
- PauliComposer: Compute Tensor Products of Pauli Matrices Efficiently We introduce a simple algorithm that efficiently computes tensor products of Pauli matrices. This is done by tailoring the calculations to this specific case, which allows to avoid unnecessary calculations. The strength of this strategy is benchmarked against state-of-the-art techniques, showing a remarkable acceleration. As a side product, we provide an optimized method for one key calculus in quantum simulations: the Pauli basis decomposition of Hamiltonians. 2 authors · Jan 2, 2023
8 Language Models are Symbolic Learners in Arithmetic Large Language Models (LLMs) are thought to struggle with arithmetic learning due to the inherent differences between language modeling and numerical computation, but concrete evidence has been lacking. This work responds to this claim through a two-side experiment. We first investigate whether LLMs leverage partial products during arithmetic learning. We find that although LLMs can identify some partial products after learning, they fail to leverage them for arithmetic tasks, conversely. We then explore how LLMs approach arithmetic symbolically by breaking tasks into subgroups, hypothesizing that difficulties arise from subgroup complexity and selection. Our results show that when subgroup complexity is fixed, LLMs treat a collection of different arithmetic operations similarly. By analyzing position-level accuracy across different training sizes, we further observe that it follows a U-shaped pattern: LLMs quickly learn the easiest patterns at the first and last positions, while progressively learning the more difficult patterns in the middle positions. This suggests that LLMs select subgroup following an easy-to-hard paradigm during learning. Our work confirms that LLMs are pure symbolic learners in arithmetic tasks and underscores the importance of understanding them deeply through subgroup-level quantification. 5 authors · Oct 20, 2024 2
- On the Topological Complexity of Maps We define and develop a homotopy invariant notion for the topological complexity of a map f:X to Y, denoted TC(f), that interacts with TC(X) and TC(Y) in the same way cat(f) interacts with cat(X) and cat(Y). Furthermore, TC(f) and cat(f) satisfy the same inequalities as TC(X) and cat(X). We compare it to other invariants defined in the papers [15,16,17,18,20]. We apply TC(f) to studying group homomorphisms f:Hto G. 1 authors · Nov 20, 2020
- Finite sums associated with some polynomial identities In this paper, we present a general framework for the derivation of interesting finite combinatorial sums starting with certain classes of polynomial identities. The sums that can be derived involve products of binomial coefficients and also harmonic numbers and squared harmonic numbers. We apply the framework to discuss combinatorial sums associated with some prominent polynomial identities from the recent past. 3 authors · Mar 14
1 A Compositional Atlas for Algebraic Circuits Circuits based on sum-product structure have become a ubiquitous representation to compactly encode knowledge, from Boolean functions to probability distributions. By imposing constraints on the structure of such circuits, certain inference queries become tractable, such as model counting and most probable configuration. Recent works have explored analyzing probabilistic and causal inference queries as compositions of basic operators to derive tractability conditions. In this paper, we take an algebraic perspective for compositional inference, and show that a large class of queries - including marginal MAP, probabilistic answer set programming inference, and causal backdoor adjustment - correspond to a combination of basic operators over semirings: aggregation, product, and elementwise mapping. Using this framework, we uncover simple and general sufficient conditions for tractable composition of these operators, in terms of circuit properties (e.g., marginal determinism, compatibility) and conditions on the elementwise mappings. Applying our analysis, we derive novel tractability conditions for many such compositional queries. Our results unify tractability conditions for existing problems on circuits, while providing a blueprint for analysing novel compositional inference queries. 4 authors · Dec 6, 2024
- Constructing Invariant and Equivariant Operations by Symmetric Tensor Network Design of neural networks that incorporate symmetry is crucial for geometric deep learning. Central to this effort is the development of invariant and equivariant operations. This works presents a systematic method for constructing valid invariant and equivariant operations. It can handle inputs and outputs in the form of Cartesian tensors with different rank, as well as spherical tensors with different types. In addition, our method features a graphical representation utilizing the symmetric tensor network, which simplifies both the proofs and constructions related to invariant and equivariant functions. We also apply this approach to design the equivariant interaction message for the geometry graph neural network, and equivariant machine learning model to learn the constitutive law of materials. 5 authors · Aug 17
- A Categorical Framework for Learning Generalised Tree Automata Automata learning is a popular technique used to automatically construct an automaton model from queries. Much research went into devising ad hoc adaptations of algorithms for different types of automata. The CALF project seeks to unify these using category theory in order to ease correctness proofs and guide the design of new algorithms. In this paper, we extend CALF to cover learning of algebraic structures that may not have a coalgebraic presentation. Furthermore, we provide a detailed algorithmic account of an abstract version of the popular L* algorithm, which was missing from CALF. We instantiate the abstract theory to a large class of Set functors, by which we recover for the first time practical tree automata learning algorithms from an abstract framework and at the same time obtain new algorithms to learn algebras of quotiented polynomial functors. 5 authors · Jan 16, 2020
- Green functions of Energized complexes If h is a ring-valued function on a simplicial complex G we can define two matrices L and g, where the matrix entries are the h energy of homoclinic intersections. We know that the sum over all h values on G is equal to the sum of the Green matrix entries g(x,y). We also have already seen that that the determinants of L or g are both the product of the h(x). In the case where h(x) is the parity of dimension, the sum of the energy values was the standard Euler characteristic and the determinant was a unit. If h(x) was the unit in the ring then L,g are integral quadratic forms which are isospectral and inverse matrices of each other. We prove here that the quadratic energy expression summing over all pairs h(x)^* h(y) of intersecting sets is a signed sum of squares of Green function entries. The quadratic energy expression is Wu characteristic in the case when h is dimension parity. For general h, the quadratic energy expression resembles an Ising Heisenberg type interaction. The conjugate of g is the inverse of L if h takes unit values in a normed ring or in the group of unitary operators in an operator algebra. 1 authors · Oct 18, 2020
- Lie Group Decompositions for Equivariant Neural Networks Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest G, the exponential map may not be surjective. Further limitations are encountered when G is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups G = GL^{+}(n, R) and G = SL(n, R), as well as their representation as affine transformations R^{n} rtimes G. Invariant integration as well as a global parametrization is realized by decomposing the `larger` groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals. 2 authors · Oct 17, 2023
1 All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems. 3 authors · Sep 13, 2023
- Categorification of Group Equivariant Neural Networks We present a novel application of category theory for deep learning. We show how category theory can be used to understand and work with the linear layer functions of group equivariant neural networks whose layers are some tensor power space of R^{n} for the groups S_n, O(n), Sp(n), and SO(n). By using category theoretic constructions, we build a richer structure that is not seen in the original formulation of these neural networks, leading to new insights. In particular, we outline the development of an algorithm for quickly computing the result of a vector that is passed through an equivariant, linear layer for each group in question. The success of our approach suggests that category theory could be beneficial for other areas of deep learning. 1 authors · Apr 27, 2023
1 Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics With recent dramatic increases in AI system capabilities, there has been growing interest in utilizing machine learning for reasoning-heavy, quantitative tasks, particularly mathematics. While there are many resources capturing mathematics at the high-school, undergraduate, and graduate level, there are far fewer resources available that align with the level of difficulty and open endedness encountered by professional mathematicians working on open problems. To address this, we introduce a new collection of datasets, the Algebraic Combinatorics Dataset Repository (ACD Repo), representing either foundational results or open problems in algebraic combinatorics, a subfield of mathematics that studies discrete structures arising from abstract algebra. Further differentiating our dataset collection is the fact that it aims at the conjecturing process. Each dataset includes an open-ended research-level question and a large collection of examples (up to 10M in some cases) from which conjectures should be generated. We describe all nine datasets, the different ways machine learning models can be applied to them (e.g., training with narrow models followed by interpretability analysis or program synthesis with LLMs), and discuss some of the challenges involved in designing datasets like these. 7 authors · Mar 8
1 Exact Coset Sampling for Quantum Lattice Algorithms We give a simple, fully correct, and assumption-light replacement for the contested "domain-extension" in Step 9 of a recent windowed-QFT lattice algorithm with complex-Gaussian windows~chen2024quantum. The published Step~9 suffers from a periodicity/support mismatch. We present a pair-shift difference construction that coherently cancels all unknown offsets, produces an exact uniform CRT-coset state over Z_{P}, and then uses the QFT to enforce the intended modular linear relation. The unitary is reversible, uses poly(log M_2) gates, and preserves the algorithm's asymptotics. Project Page: https://github.com/yifanzhang-pro/quantum-lattice. 1 authors · Sep 15 2
- Faces of highest weight modules and the universal Weyl polyhedron Let V be a highest weight module over a Kac-Moody algebra g, and let conv V denote the convex hull of its weights. We determine the combinatorial isomorphism type of conv V, i.e. we completely classify the faces and their inclusions. In the special case where g is semisimple, this brings closure to a question studied by Cellini-Marietti [IMRN 2015] for the adjoint representation, and by Khare [J. Algebra 2016; Trans. Amer. Math. Soc. 2017] for most modules. The determination of faces of finite-dimensional modules up to the Weyl group action and some of their inclusions also appears in previous work of Satake [Ann. of Math. 1960], Borel-Tits [IHES Publ. Math. 1965], Vinberg [Izv. Akad. Nauk 1990], and Casselman [Austral. Math. Soc. 1997]. For any subset of the simple roots, we introduce a remarkable convex cone which we call the universal Weyl polyhedron, which controls the convex hulls of all modules parabolically induced from the corresponding Levi factor. Namely, the combinatorial isomorphism type of the cone stores the classification of faces for all such highest weight modules, as well as how faces degenerate as the highest weight gets increasingly singular. To our knowledge, this cone is new in finite and infinite type. We further answer a question of Michel Brion, by showing that the localization of conv V along a face is always the convex hull of the weights of a parabolically induced module. Finally, as we determine the inclusion relations between faces representation-theoretically from the set of weights, without recourse to convexity, we answer a similar question for highest weight modules over symmetrizable quantum groups. 2 authors · Oct 31, 2016
2 Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing and reasoning tasks. However, their performance in the foundational domain of arithmetic remains unsatisfactory. When dealing with arithmetic tasks, LLMs often memorize specific examples rather than learning the underlying computational logic, limiting their ability to generalize to new problems. In this paper, we propose a Composable Arithmetic Execution Framework (CAEF) that enables LLMs to learn to execute step-by-step computations by emulating Turing Machines, thereby gaining a genuine understanding of computational logic. Moreover, the proposed framework is highly scalable, allowing composing learned operators to significantly reduce the difficulty of learning complex operators. In our evaluation, CAEF achieves nearly 100% accuracy across seven common mathematical operations on the LLaMA 3.1-8B model, effectively supporting computations involving operands with up to 100 digits, a level where GPT-4o falls short noticeably in some settings. 6 authors · Oct 10, 2024
- Complexity of counting points on curves and the factor P_1(T) of the zeta function of surfaces This article concerns the computational complexity of a fundamental problem in number theory: counting points on curves and surfaces over finite fields. There is no subexponential-time algorithm known and it is unclear if it can be NP-hard. Given a curve, we present the first efficient Arthur-Merlin protocol to certify its point-count, its Jacobian group structure, and its Hasse-Weil zeta function. We extend this result to a smooth projective surface to certify the factor P_{1}(T), corresponding to the first Betti number, of the zeta function; by using the counting oracle. We give the first algorithm to compute P_{1}(T) that is poly(log q)-time if the degree D of the input surface is fixed; and in quantum poly(Dlog q)-time in general. Our technique in the curve case, is to sample hash functions using the Weil and Riemann-Roch bounds, to certify the group order of its Jacobian. For higher dimension varieties, we first reduce to the case of a surface, which is fibred as a Lefschetz pencil of hyperplane sections over P^{1}. The formalism of vanishing cycles, and the inherent big monodromy, enable us to prove an effective version of Deligne's `theoreme du pgcd' using the hard-Lefschetz theorem and an equidistribution result due to Katz. These reduce our investigations to that of computing the zeta function of a curve, defined over a finite field extension F_{Q}/F_{q} of poly-bounded degree. This explicitization of the theory yields the first nontrivial upper bounds on the computational complexity. 3 authors · Nov 4
1 Galois Theory These are the notes for an undergraduate course at the University of Edinburgh, 2021-2023. Assuming basic knowledge of ring theory, group theory and linear algebra, the notes lay out the theory of field extensions and their Galois groups, up to and including the fundamental theorem of Galois theory. Also included are a section on ruler and compass constructions, a proof that solvable polynomials have solvable Galois groups, and the classification of finite fields. 1 authors · Aug 14, 2024
- Higher Categories and Slices of Globular Operads In an unpublished preprint batanin, Batanin conjectures that it is possible to take `slices' of a globular operad, thereby isolating the algebraic structure in each dimension. It was further hypothesised that the slices of a globular operad for some theory of higher category contain essential information about those higher categories, namely whether or not they are equivalent to the fully weak variety. In this paper, we use the theory of presentations for globular operads developed in Me to provide a concrete definition of slices, and calculate the slices for several key theories of n-category. 1 authors · May 24, 2023
- Connecting Permutation Equivariant Neural Networks and Partition Diagrams We show how the Schur-Weyl duality that exists between the partition algebra and the symmetric group results in a stronger theoretical foundation for characterising all of the possible permutation equivariant neural networks whose layers are some tensor power of the permutation representation M_n of the symmetric group S_n. In doing so, we unify two separate bodies of literature, and we correct some of the major results that are now widely quoted by the machine learning community. In particular, we find a basis of matrices for the learnable, linear, permutation equivariant layer functions between such tensor power spaces in the standard basis of M_n by using an elegant graphical representation of a basis of set partitions for the partition algebra and its related vector spaces. Also, we show how we can calculate the number of weights that must appear in these layer functions by looking at certain paths through the McKay quiver for M_n. Finally, we describe how our approach generalises to the construction of neural networks that are equivariant to local symmetries. 1 authors · Dec 16, 2022
1 FIMO: A Challenge Formal Dataset for Automated Theorem Proving We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes. 12 authors · Sep 8, 2023
- Automorphisms and subdivisions of Helly graphs We study Helly graphs of finite combinatorial dimension, i.e. whose injective hull is finite-dimensional. We describe very simple fine simplicial subdivisions of the injective hull of a Helly graph, following work of Lang. We also give a very explicit simplicial model of the injective hull of a Helly graphs, in terms of cliques which are intersections of balls. We use these subdivisions to prove that any automorphism of a Helly graph with finite combinatorial dimension is either elliptic or hyperbolic. Moreover, every such hyperbolic automorphism has an axis in an appropriate Helly subdivision, and its translation length is rational with uniformly bounded denominator. 1 authors · Jul 1, 2023
- EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the demand for tensor computations has also increased significantly. To meet this demand, several research institutions have started developing dedicated hardware for tensor computations. To further improve the computational performance of tensor process units, we have reexamined the issue of computation reuse that was previously overlooked in existing architectures. As a result, we propose a novel EN-T architecture that can reduce chip area and power consumption. Furthermore, our method is compatible with existing tensor processing units. We evaluated our method on prevalent microarchitectures, the results demonstrate an average improvement in area efficiency of 8.7\%, 12.2\%, and 11.0\% for tensor computing units at computational scales of 256 GOPS, 1 TOPS, and 4 TOPS, respectively. Similarly, there were energy efficiency enhancements of 13.0\%, 17.5\%, and 15.5\%. 6 authors · Apr 18, 2024
- Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions z = a , x + b , y ;mod; p labeled by the vector (a, b) in Z_p^2. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization phase is transient, necessitating early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing the highly structured representations in both phases; and discuss the learnt algorithm. 4 authors · Jun 4, 2024
9 Group Downsampling with Equivariant Anti-aliasing Downsampling layers are crucial building blocks in CNN architectures, which help to increase the receptive field for learning high-level features and reduce the amount of memory/computation in the model. In this work, we study the generalization of the uniform downsampling layer for group equivariant architectures, e.g., G-CNNs. That is, we aim to downsample signals (feature maps) on general finite groups with anti-aliasing. This involves the following: (a) Given a finite group and a downsampling rate, we present an algorithm to form a suitable choice of subgroup. (b) Given a group and a subgroup, we study the notion of bandlimited-ness and propose how to perform anti-aliasing. Notably, our method generalizes the notion of downsampling based on classical sampling theory. When the signal is on a cyclic group, i.e., periodic, our method recovers the standard downsampling of an ideal low-pass filter followed by a subsampling operation. Finally, we conducted experiments on image classification tasks demonstrating that the proposed downsampling operation improves accuracy, better preserves equivariance, and reduces model size when incorporated into G-equivariant networks 2 authors · Apr 24 2
- Preservation of Loewy Diagrams Under Exact Functors We derive sufficient conditions for exact functors on locally finite abelian categories to preserve Loewy diagrams of objects. We apply our results to determine sufficient conditions for induction functors associated to simple current extensions of vertex algebras to preserve Loewy diagrams. 1 authors · May 1, 2023
- Fault-Tolerant Strassen-Like Matrix Multiplication In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We have realized that using two different algorithms, new check relations arise resulting in more local computations. These local computations are found using computer aided search. To improve performance, special parity (extra) sub-matrix multiplications (PSMMs) are generated (two of them) at the expense of increasing communication/computation cost of the system. Our preliminary results demonstrate that the proposed method outperforms a Strassen-like algorithm with two copies and secures a very close performance to three copy version using only 2 PSMMs, reducing the total number of compute nodes by around 24\% i.e., from 21 to 16. 2 authors · Oct 10, 2022
- Categories of Differentiable Polynomial Circuits for Machine Learning Reverse derivative categories (RDCs) have recently been shown to be a suitable semantic framework for studying machine learning algorithms. Whereas emphasis has been put on training methodologies, less attention has been devoted to particular model classes: the concrete categories whose morphisms represent machine learning models. In this paper we study presentations by generators and equations of classes of RDCs. In particular, we propose polynomial circuits as a suitable machine learning model. We give an axiomatisation for these circuits and prove a functional completeness result. Finally, we discuss the use of polynomial circuits over specific semirings to perform machine learning with discrete values. 2 authors · Mar 12, 2022
- The algebra of higher homotopy operations We explain how the simplicial higher-order unstable homotopy operations defined in [BBS2] may be composed and inserted one in another, thus forming a coherent if complicated algebraic structure. 3 authors · Jul 22, 2023
- Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of O(n|N|) (where n is the string length and |N| is the size of the nonterminal set) and more space-efficient by a factor of O(|Gamma|) (where |Gamma| is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and time-efficient than the algorithm of Alonso et al. (2001) by factors of O(|Gamma|^2) and O(|Gamma|^3), respectively. Finally, we give the first PAA stringsum and allsum algorithms. 4 authors · Oct 23, 2023
- On Signs of eigenvalues of Modular forms satisfying Ramanujan Conjecture Let F in S_{k_1}(Gamma^{(2)}(N_1)) and G in S_{k_2}(Gamma^{(2)}(N_2)) be two Siegel cusp forms over the congruence subgroups Gamma^{(2)}(N_1) and Gamma^{(2)}(N_2) respectively. Assume that they are Hecke eigenforms in different eigenspaces and satisfy the Generalized Ramanujan Conjecture. Let lambda_F(p) denote the eigenvalue of F with respect to the Hecke operator T(p). In this article, we compute a lower bound for the density of the set of primes, { p : lambda_F(p) lambda_G(p) < 0 }. 1 authors · Dec 12, 2024
- Is Computational Complexity a Barrier to Manipulation? When agents are acting together, they may need a simple mechanism to decide on joint actions. One possibility is to have the agents express their preferences in the form of a ballot and use a voting rule to decide the winning action(s). Unfortunately, agents may try to manipulate such an election by misreporting their preferences. Fortunately, it has been shown that it is NP-hard to compute how to manipulate a number of different voting rules. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. To address this issue, I suggest studying empirically if computational complexity is in practice a barrier to manipulation. The basic tool used in my investigations is the identification of computational "phase transitions". Such an approach has been fruitful in identifying hard instances of propositional satisfiability and other NP-hard problems. I show that phase transition behaviour gives insight into the hardness of manipulating voting rules, increasing concern that computational complexity is indeed any sort of barrier. Finally, I look at the problem of computing manipulation of other, related problems like stable marriage and tournament problems. 1 authors · Jul 5, 2010
- SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups Finite symmetric groups S_n are essential in fields such as combinatorics, physics, and chemistry. However, learning a probability distribution over S_n poses significant challenges due to its intractable size and discrete nature. In this paper, we introduce SymmetricDiffusers, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over S_n by decomposing it into learning simpler transitions of the reverse diffusion using deep neural networks. We identify the riffle shuffle as an effective forward transition and provide empirical guidelines for selecting the diffusion length based on the theory of random walks on finite groups. Additionally, we propose a generalized Plackett-Luce (PL) distribution for the reverse transition, which is provably more expressive than the PL distribution. We further introduce a theoretically grounded "denoising schedule" to improve sampling and learning efficiency. Extensive experiments show that our model achieves state-of-the-art or comparable performances on solving tasks including sorting 4-digit MNIST images, jigsaw puzzles, and traveling salesman problems. Our code is released at https://github.com/DSL-Lab/SymmetricDiffusers. 3 authors · Oct 3, 2024
- Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which n component operators in the finite sum are ``on average'' either cocoercive or Lipschitz continuous and monotone, with parameter L. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is mathcal{O}( n + nLvarepsilon^{-1}), which improves upon existing methods by a factor up to n. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal. 3 authors · Oct 4, 2023
- On the Hasse principle for divisibility in elliptic curves Let p be a prime number and n a positive integer. Let E be an elliptic curve defined over a number field k. It is known that the local-global divisibility by p holds in E/k, but for powers of p^n counterexamples may appear. The validity or the failing of the Hasse principle depends on the elliptic curve E and the field k and, consequently, on the group Gal(k(E[p^n])/k). For which kind of these groups does the principle hold? For which of them can we find a counterexample? The answer to these questions was known for n=1,2, but for ngeq 3 they were still open. We show some conditions on the generators of Gal(k(E[p^n])/k) implying an affirmative answer to the local-global divisibility by p^n in E over k, for every ngeq 2. We also prove that these conditions are necessary by producing counterexamples in the case when they do not hold. These last results generalize to every power p^n, a result obtained by Ranieri for n=2. 2 authors · Nov 3
- Exploiting Movable Logical Qubits for Lattice Surgery Compilation Lattice surgery with two-dimensional quantum error correcting codes is among the leading schemes for fault-tolerant quantum computation, motivated by superconducting hardware architectures. In conventional lattice surgery compilation schemes, logical circuits are compiled following a place-and-route paradigm, where logical qubits remain statically fixed in space throughout the computation. In this work, we introduce a paradigm shift by exploiting movable logical qubits via teleportation during the logical lattice surgery CNOT gate. Focusing on lattice surgery with the color code, we propose a proof-of-concept compilation scheme that leverages this capability. Numerical simulations show that the proposed approach can substantially reduce the routed circuit depth compared to standard place-and-route compilation techniques. Our results demonstrate that optimizations based on movable logical qubits are not limited to architectures with physically movable qubits, such as neutral atoms or trapped ions - they are also readily applicable to superconducting quantum hardware. An open-source implementation of our method is available on GitHub https://github.com/munich-quantum-toolkit/qecc. 4 authors · Dec 3
- Product representation of perfect cubes Let F_{k,d}(n) be the maximal size of a set {A}subseteq [n] such that the equation \[a_1a_2\dots a_k=x^d, \; a_1<a_2<\ldots<a_k\] has no solution with a_1,a_2,ldots,a_kA and integer x. Erdos, S\'ark\"ozy and T. S\'os studied F_{k,2}, and gave bounds when k=2,3,4,6 and also in the general case. We study the problem for d=3, and provide bounds for k=2,3,4,6 and 9, furthermore, in the general case, as well. In particular, we refute an 18 years old conjecture of Verstra\"ete. We also introduce another function f_{k,d} closely related to F_{k,d}: While the original problem requires a_1, ldots , a_k to all be distinct, we can relax this and only require that the multiset of the a_i's cannot be partitioned into d-tuples where each d-tuple consists of d copies of the same number. 5 authors · May 20, 2024
- Construction of simplicial complexes with prescribed degree-size sequences We study the realizability of simplicial complexes with a given pair of integer sequences, representing the node degree distribution and the facet size distribution, respectively. While the s-uniform variant of the problem is NP-complete when s geq 3, we identify two populations of input sequences, most of which can be solved in polynomial time using a recursive algorithm that we contribute. Combining with a sampler for the simplicial configuration model [J.-G. Young et al., Phys. Rev. E 96, 032312 (2017)], we facilitate the efficient sampling of simplicial ensembles from arbitrary degree and size distributions. We find that, contrary to expectations based on dyadic networks, increasing the nodes' degrees reduces the number of loops in simplicial complexes. Our work unveils a fundamental constraint on the degree-size sequences and sheds light on further analysis of higher-order phenomena based on local structures. 1 authors · May 31, 2021
3 Group Representational Position Encoding We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in SO(d) and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group GL. In Multiplicative GRAPE, a position n in Z (or t in R) acts as G(n)=exp(n,ω,L) with a rank-2 skew generator L in R^{d times d}, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the d/2 planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at O(d) and O(r d) cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE. math-ai · Dec 8 2
- Geometric Clifford Algebra Networks We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. GCANs are based on symmetry group transformations using geometric (Clifford) algebras. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the Pin(p,q,r) group. We then propose the concept of group action layers, which linearly combine object transformations using pre-specified group actions. Together with a new activation and normalization scheme, these layers serve as adjustable geometric templates that can be refined via gradient descent. Theoretical advantages are strongly reflected in the modeling of three-dimensional rigid body transformations as well as large-scale fluid dynamics simulations, showing significantly improved performance over traditional methods. 5 authors · Feb 13, 2023
- Private Machine Learning in TensorFlow using Secure Computation We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning. 8 authors · Oct 18, 2018
- Brauer's Group Equivariant Neural Networks We provide a full characterisation of all of the possible group equivariant neural networks whose layers are some tensor power of R^{n} for three symmetry groups that are missing from the machine learning literature: O(n), the orthogonal group; SO(n), the special orthogonal group; and Sp(n), the symplectic group. In particular, we find a spanning set of matrices for the learnable, linear, equivariant layer functions between such tensor power spaces in the standard basis of R^{n} when the group is O(n) or SO(n), and in the symplectic basis of R^{n} when the group is Sp(n). 1 authors · Dec 16, 2022
- Finite random iterated function systems do not always satisfy Bowen's formula In this paper, we provide a finite random iterated function system satisfying the open set condition, for which the random version of Bowen's formula fails to hold. This counterexample shows that analogous results established for random recursive constructions are not always obtained for random iterated function systems. 1 authors · Sep 2
- Models of Abelian varieties over valued fields, using model theory Given an elliptic curve E over a perfect defectless henselian valued field (F,val) with perfect residue field k_F and valuation ring O_F, there exists an integral separated smooth group scheme E over O_F with Etimes_{Spec O_F}Spec Fcong E. If char(k_F)neq 2,3 then one can be found over O_{F^{alg}} such that the definable group E(O) is the maximal generically stable subgroup of E. We also give some partial results on general Abelian varieties over F. The construction of E is by means of generating a birational group law over O_F by the aid of a generically stable generic type of a definable subgroup of E. 1 authors · Mar 28, 2023
- Generative Logic: A New Computer Architecture for Deterministic Reasoning and Knowledge Generation We present Generative Logic (GL), a deterministic architecture that begins from user-supplied axiomatic definitions -- written in a minimalist Mathematical Programming Language (MPL) -- and systematically explores their deductive neighborhood. Definitions are compiled into a distributed grid of simple Logic Blocks (LBs) that exchange messages; any time several expressions unify under an inference rule, a new fact is emitted with full provenance to its sources, yielding replayable, auditable proof graphs. A prototype software implementation instantiates the workflow on first-order Peano arithmetic. Starting only from the Peano axioms, GL enumerates candidate implications, applies normalization and type filters, and automatically reconstructs machine-checkable proofs of foundational arithmetic laws including associativity and commutativity of addition, associativity and commutativity of multiplication, and distributivity. Generated proofs export to navigable HTML so that every inference step can be inspected independently. We outline a hardware-software co-design path toward massively parallel realizations and describe prospective integration with probabilistic models (e.g., Large Language Models (LLMs)) for autoformalization and conjecture seeding. The Python and MPL code to reproduce the Peano experiments, along with the full HTML proof graphs, are available in the project's GitHub repository at https://github.com/Generative-Logic/GL/tree/35a111ea9ba53afe051703d6050be0c3923e9724 and are permanently archived at https://doi.org/10.5281/zenodo.16408441. We invite community feedback and collaboration. 1 authors · Jul 25
1 Deep Sets We study the problem of designing models for machine learning tasks defined on sets. In contrast to traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics poczos13aistats, to anomaly detection in piezometer data of embankment dams Jung15Exploration, to cosmology Ntampaka16Dynamical,Ravanbakhsh16ICML1. Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks. We also derive the necessary and sufficient conditions for permutation equivariance in deep models. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection. 6 authors · Mar 10, 2017 1
- The Syntax and Semantics of einsum In 2011, einsum was introduced to NumPy as a practical and convenient notation for tensor expressions in machine learning, quantum circuit simulation, and other fields. It has since been implemented in additional Python frameworks such as PyTorch and TensorFlow, as well as in other programming languages such as Julia. Despite its practical success, the einsum notation still lacks a solid theoretical basis, and is not unified across the different frameworks, limiting opportunities for formal reasoning and systematic optimization. In this work, we discuss the terminology of tensor expressions and provide a formal definition of the einsum language. Based on this definition, we formalize and prove important equivalence rules for tensor expressions and highlight their relevance in practical applications. 4 authors · Sep 24
- Volumes of Nullhomotopies in Nilpotent Spaces The Shadowing Principle of Manin has proved a valuable tool for addressing questions of quantitative topology raised by Gromov in the late 1900s. The principle informally provides a way for bounded algebraic maps between differential graded algebras to be translated into nearby genuine maps between their geometric realizations. We extend this principle to finite towers of principal K(G,n) fibrations, and in particular apply this construction to nilpotent spaces. As a specific application of the extended principle, we provide upper bounds on the asymptotic behavior of volumes of nullhomotopies of Lipschitz maps into nilpotent spaces. We further refine these bounds in the case when c = 1 to nearly meet those of the simply connected setting. We similarly refine these bounds in the event the target space is coformal, and demonstrate that the bounds in this setting are nearly sharp. 1 authors · Sep 30
- How Do Language Models Compose Functions? While large language models (LLMs) appear to be increasingly capable of solving compositional tasks, it is an open question whether they do so using compositional mechanisms. In this work, we investigate how feedforward LLMs solve two-hop factual recall tasks, which can be expressed compositionally as g(f(x)). We first confirm that modern LLMs continue to suffer from the "compositionality gap": i.e. their ability to compute both z = f(x) and y = g(z) does not entail their ability to compute the composition y = g(f(x)). Then, using logit lens on their residual stream activations, we identify two processing mechanisms, one which solves tasks compositionally, computing f(x) along the way to computing g(f(x)), and one which solves them directly, without any detectable signature of the intermediate variable f(x). Finally, we find that which mechanism is employed appears to be related to the embedding space geometry, with the idiomatic mechanism being dominant in cases where there exists a linear mapping from x to g(f(x)) in the embedding spaces. We fully release our data and code at: https://github.com/apoorvkh/composing-functions . 2 authors · Oct 2
1 All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers. With two proposed techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with high accuracy on a wide variety of mental math tasks, where meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information of other tokens in few specific middle layers. Experiments on a variety of models and arithmetic expressions show that this subgraph is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest. 4 authors · Sep 11
- MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics We present miniF2F, a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, Isabelle (partially) and HOL Light (partially) and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3 and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving. 3 authors · Aug 31, 2021