Buckets:

|
download
raw
48 kB

Title: Learnersโ€™ Languages

URL Source: https://arxiv.org/html/2103.01189

Markdown Content: Back to arXiv

This is experimental HTML to improve accessibility. We invite you to report rendering errors. Use Alt+Y to toggle on accessible reporting links and Alt+Shift+Y to toggle off. Learn more about this project and help improve conversions.

Why HTML? Report Issue Back to Abstract Download PDF Abstract 1Introduction 2Constructions in ๐๐จ๐ฅ๐ฒ 3Toposes of learners

HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

failed: dutchcal failed: biblatex

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: arXiv.org perpetual non-exclusive license arXiv:2103.01189v3 [math.CT] null \addbibresource

LibrarySmall.bib

Learnersโ€™ Languages David I. Spivak Topos Institute Berkeley USA david@topos.institute Abstract

In โ€œBackprop as functorโ€, the authors show that the fundamental elements of deep learningโ€”gradient descent and backpropagationโ€”can be conceptualized as a strong monoidal functor ๐๐š๐ซ๐š โข ( ๐„๐ฎ๐œ ) โ†’ ๐‹๐ž๐š๐ซ๐ง from the category of parameterized Euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. It was soon realized that there is an isomorphism ๐‹๐ž๐š๐ซ๐ง โ‰… ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) , where ๐’๐‹๐ž๐ง๐ฌ is the symmetric monoidal category of simple lenses as used in functional programming.

In this note, we observe that ๐’๐‹๐ž๐ง๐ฌ is a full subcategory of ๐๐จ๐ฅ๐ฒ , the category of polynomial functors in one variable, via the functor ๐ด โ†ฆ ๐ด โข ๐“Ž ๐ด . Using the fact that ( ๐๐จ๐ฅ๐ฒ , โŠ— ) is monoidal closed, we show that a map ๐ด โ†’ ๐ต in ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) has a natural interpretation in terms of dynamical systems (more precisely, generalized Moore machines) whose interface is the internal-hom type [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] .

Finally, we review the fact that the category ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  of dynamical systems on any ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ forms a topos, and consider the logical propositions that can be stated in its internal language. We give gradient descent as an example, and we conclude by discussing some directions for future work.

1Introduction

In the paper โ€œBackprop as functorโ€ [fong2019backprop], the authors show that gradient descent and backpropagationโ€”as used in deep learningโ€”can be conceptualized as a strong monoidal functor ๐ฟ : ๐๐š๐ซ๐š โข ( ๐„๐ฎ๐œ ) โ†’ ๐‹๐ž๐š๐ซ๐ง from the category of parameterized euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. Here, ๐๐š๐ซ๐š is a monad on the category of symmetric monoidal categories. It sends ( ๐’ž , ๐ผ , โŠ— ) to a category with the same objects Ob โก ๐๐š๐ซ๐š โข ( ๐’ž ) โ‰” Ob โก ๐’ž , but with hom-sets that include a parameterizing object

๐๐š๐ซ๐š ( ๐’ž ) ( ๐‘ 1 , ๐‘ 2 ) โ‰” { ( ๐‘ , ๐‘“ ) โˆฃ ๐‘ โˆˆ ๐’ž , ๐‘“ : ๐‘ 1 โŠ— ๐‘ โ†’ ๐‘ 2 } / โˆผ

where the parameterizing object ๐‘ is considered up to an equivalence relation โˆผ .1 The composite of ๐‘ 1 โŠ— ๐‘ 1 โ†’ ๐‘ 2 and ๐‘ 2 โŠ— ๐‘ 2 โ†’ ๐‘ 3 in ๐๐š๐ซ๐š โข ( ๐’ž ) has parameterizing object ๐‘ 1 โŠ— ๐‘ 2 and is given by the ordinary composite

๐‘ 1 โŠ— ( ๐‘ 1 โŠ— ๐‘ 2 ) โ†’ ๐‘ 2 โŠ— ๐‘ 2 โ†’ ๐‘ 3 .

The domain of the backpropagation functor, ๐๐š๐ซ๐š โข ( ๐„๐ฎ๐œ ) , is thus the Para construction applied to the Cartesian monoidal category of Euclidean spaces โ„ ๐‘› and smooth maps.

But it was soon realized that ๐‹๐ž๐š๐ซ๐ง is in fact also given by a Para construction, namely there is an isomorphism ๐‹๐ž๐š๐ซ๐ง โ‰… ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) , where ๐’๐‹๐ž๐ง๐ฌ is the symmetric monoidal category of simple lenses as used in functional programming. The objects of ๐’๐‹๐ž๐ง๐ฌ are sets Ob โก ( ๐’๐‹๐ž๐ง๐ฌ )

Ob โก ( ๐’๐ž๐ญ ) , but a morphism consists of a pair of functions

๐’๐‹๐ž๐ง๐ฌ โข ( ๐ด , ๐ต ) :- { ( ๐‘“ 1 , ๐‘“ โ™ฏ ) โˆฃ ๐‘“ 1 : ๐ด โ†’ ๐ต , ๐‘“ โ™ฏ : ๐ด ร— ๐ต โ†’ ๐ด } .

(1)

Thus a map ๐ด โ†’ ๐ต in ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) consists of a set ๐‘ƒ and functions ๐‘“ 1 : ๐ด ร— ๐‘ƒ โ†’ ๐ต and ๐‘“ โ™ฏ : ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐ด ร— ๐‘ƒ . The authors of [fong2019backprop] developed this structure in order to conceptualize the compositional nature of deep learning as comprising a parameterizing set ๐‘ƒ (often called โ€œthe space of weights and biasesโ€) and three functions:

๐ผ

: ๐ด ร— ๐‘ƒ โ†’ ๐ต implement

๐‘ˆ

: ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐‘ƒ update

๐‘…

: ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐ด request (2)

The implement function is a ๐‘ƒ -parameterized function ๐ด โ†’ ๐ต , and the update and request functions take a pair ( ๐‘Ž , ๐‘ ) of โ€œtraining dataโ€ and both updates the parameterโ€”e.g. by gradient descentโ€”and returns an element of the input space ๐ด , which is used to train another such function in the network. This last stepโ€”the requestโ€”is not just found in deep learning as practiced, but is in fact crucial for defining composition.2

But by this point, the notation ( ๐‘ƒ , ๐ผ , ๐‘ˆ , ๐‘… ) of ๐‹๐ž๐š๐ซ๐ง has become heavy and the structure seems to be getting lost. Even knowing ๐‹๐ž๐š๐ซ๐ง โ‰… ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) seems ad-hoc since the morphisms (1) of ๐’๐‹๐ž๐ง๐ฌ areโ€”to this pointโ€”mathematically unmotivated. This is where ๐๐จ๐ฅ๐ฒ comes in.

In this note, we observe that ๐’๐‹๐ž๐ง๐ฌ is a full subcategory of ๐๐จ๐ฅ๐ฒ , the category of polynomial functors in one variable, via the functor ๐ด โ†ฆ ๐ด โข ๐“Ž ๐ด . Using the fact that ( ๐๐จ๐ฅ๐ฒ , โŠ— ) is monoidal closed, we will reconceptualize ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) in terms of polynomial coalgebras, which can be understood as dynamical systems: machines with states that can be observed as โ€œoutputโ€ and updated based on โ€œinputโ€ [jacobs2017introduction]. In particular, a morphism ๐ด โ†’ ๐ต in ๐‹๐ž๐š๐ซ๐ง will be recast as a coalgebra on the internal hom polynomial [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] , and we will explain this in terms of dynamics.

This viewpoint allows us to substantially generalize the construction in ๐‹๐ž๐š๐ซ๐ง , a construction which also appears prominently in the theory of open games [ghani2016compositional]. Perhaps more interestingly, it allows us to use the fact that the category ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  of dynamical systems on any interface ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ forms a topos. A topos is a setting in which one can do dependent type theory and higher-order logic. In fact, the topos of ๐‘ -coalgebras is in some ways as simple as possible: it is not only a copresheaf topos ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  โ‰… ๐’๐ž๐ญ ๐’ž for a certain category ๐’ž , but in fact the site ๐’ž is the free category on a directed graph that weโ€™ll call ๐–ณ๐—‹๐–พ๐–พ ๐‘ . This makes the logic of ๐‘ -coalgebrasโ€”and hence of dynamical systems, learners, and game-playersโ€”quite simple. However, the particular graph ๐–ณ๐—‹๐–พ๐–พ ๐‘ associated to ๐‘ is highly-structured, and we should find that this structure is inherited by the internal language of ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  .

The point is to consider logical propositions that can be stated in the internal language of ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  and to use these propositions in order to constrain the behavior of learners and game-players (categorified as discussed above), and of interaction patterns between dynamical systems more generally. For example, โ€œgradient descent and backpropagationโ€ is a property we can express in the internal language. Note that the term language in the title refers to the internal language of the topos ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  , which can be thought of as a language for specifying or constraining learning algorithms or dynamic organizational patterns more generally.

Plan for the paper

In Section 2 we will discuss various relevant constructions in the category ๐๐จ๐ฅ๐ฒ of polynomial functors in one variable. In particular, we will review its symmetric monoidal closed structure ( ๐๐จ๐ฅ๐ฒ , ๐“Ž , โŠ— , [ โˆ’ , โˆ’ ] ) , its composition monoidal structure ( ๐“Ž , โ— ) , and the notion of coalgebras. We explain how morphisms in ๐‹๐ž๐š๐ซ๐ง can be phrased in terms of coalgebras on internal hom objects, and we reconstruct ๐๐š๐ซ๐š โข ( ๐’๐‹๐ž๐ง๐ฌ ) in these terms.

In Section 3, we first show that the category ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  of ๐‘ -coalgebras for the endofunctor ๐‘ : ๐’๐ž๐ญ โ†’ ๐’๐ž๐ญ is a presheaf topos. We then discuss the internal logic of ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  , and we conclude by giving several directions for future work.

Notation

We denote the category of sets by ๐’๐ž๐ญ ; we generally denote sets with upper-case letters ๐ด , ๐ต , etc. Given a natural number ๐‘ โˆˆ โ„• , we write ๐–ญ โ‰” { 1 , โ€ฆ , ๐‘ } , so 0

โˆ… , 1

{ 1 } , 2

{ 1 , 2 } , etc. Given sets ๐ด , ๐ต , we often write ๐ด โข ๐ต โ‰” ๐ด ร— ๐ต to denote their Cartesian product. We will denote polynomials with lower-case letters, ๐‘ , ๐‘ž , etc.

Acknowledgments

We thank David A. Dalrymple, Dai Girardo, Paul Kreiner, David Jaz Myers, and Alex Zhu for useful conversations. We also acknowledge support from AFOSR grant FA9550-20-10348.

2Constructions in ๐๐จ๐ฅ๐ฒ

In this section we review the category ๐๐จ๐ฅ๐ฒ , for which [kock2012polynomial] is an excellent reference; we also discuss its symmetric monoidal closed structure. Then we discuss polynomial coalgebras and reconceptualize the category ๐‹๐ž๐š๐ซ๐ง and Gavranoviฤ‡โ€™s bicategorical variant, in that language.

2.1Background on ๐๐จ๐ฅ๐ฒ as a monoidal closed category

For any set ๐ด , let ๐“Ž ๐ด : ๐’๐ž๐ญ โ†’ ๐’๐ž๐ญ be the functor represented by ๐ด ; that is, ๐“Ž ๐ด applied to a set ๐‘† is ๐’๐ž๐ญ โข ( ๐ด , ๐‘† ) โ‰… ๐‘† ๐ด . In particular, ๐“Ž โ‰” ๐“Ž 1 is (isomorphic to) the identity functor ๐‘† โ†ฆ ๐‘† and 1 โ‰” ๐“Ž 0 is the constant functor ๐‘† โ†ฆ 1 . Note that ๐“Ž ๐ด โข ( 1 ) โ‰… 1 ๐ด โ‰… 1 for any ๐ด .

The coproduct of functors ๐น and ๐บ , denoted ๐น + ๐บ , is taken pointwise; this means there is a natural isomorphism

( ๐น + ๐บ ) โข ( ๐‘† ) โ‰… ๐น โข ( ๐‘† ) + ๐บ โข ( ๐‘† )

where the coproduct ๐น โข ( ๐‘† ) + ๐บ โข ( ๐‘† ) is taken in ๐’๐ž๐ญ . Similarly, for any set ๐ผ and functors ๐น ๐‘– , one for each ๐‘– โˆˆ ๐ผ , their coproduct is computed pointwise

( โˆ‘ ๐‘– โˆˆ ๐ผ ๐น ๐‘– ) โข ( ๐‘† ) โ‰… โˆ‘ ๐‘– โˆˆ ๐ผ ๐น ๐‘– โข ( ๐‘† ) .

Definition 2.1.

A polynomial functor ๐‘ is any coproduct

๐‘ โ‰” โˆ‘ ๐‘– โˆˆ ๐ผ ๐“Ž ๐‘ โข [ ๐‘– ]

of representable functors, where ๐ผ โˆˆ ๐’๐ž๐ญ and each ๐‘ โข [ ๐‘– ] โˆˆ ๐’๐ž๐ญ are sets. We denote the category of polynomial functors and natural transformations between them by ๐๐จ๐ฅ๐ฒ .

We note that if ๐‘

โˆ‘ ๐‘– โˆˆ ๐ผ ๐“Ž ๐‘ โข [ ๐‘– ] then ๐‘ โข ( 1 ) โ‰… ๐ผ ; hence we can write any ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ in canonical form

๐‘ โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) ๐“Ž ๐‘ โข [ ๐‘– ] .

(3)

We refer to each ๐‘– โˆˆ ๐‘ โข ( 1 ) as a position in ๐‘ and to each ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] as a direction at ๐‘– .

Example 2.2.

We can consider any set ๐‘† as a constant polynomial โˆ‘ ๐‘  โˆˆ ๐‘† ๐“Ž 0 .

We can consider a polynomial ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ as a set (or discrete category) ๐‘ โข ( 1 ) equipped with a functor ๐‘ โข [ โˆ’ ] : ๐‘ โข ( 1 ) โ†’ ๐’๐ž๐ญ . Then a map of polynomials ๐œ‘ : ๐‘ โ†’ ๐‘ž can be identified with a diagram as follows

๐‘ โข ( 1 ) ๐‘ž โข ( 1 ) ๐’๐ž๐ญ ๐œ‘ 1 ๐‘ โข [ โˆ’ ] โ‡ ๐œ‘ โ™ฏ ๐‘ž โข [ โˆ’ ]

(4)

That is, ๐œ‘ can be decomposed into a function ๐œ‘ 1 : ๐‘ โข ( 1 ) โ†’ ๐‘ž โข ( 1 ) on positions, and for every ๐‘– โˆˆ ๐‘ โข ( 1 ) with ๐‘— โ‰” ๐œ‘ 1 โข ( ๐‘– ) , a component function ๐œ‘ ๐‘– โ™ฏ : ๐‘ž โข [ ๐‘— ] โ†’ ๐‘ โข [ ๐‘– ] on directions. This follows from the Yoneda lemma and the universal property of coproducts. We will sometimes use this ( ๐œ‘ 1 , ๐œ‘ โ™ฏ ) : ๐‘ โ†’ ๐‘ž notation below.

Example 2.3.

A morphism ๐ด 1 โข ๐“Ž ๐ด 2 โ†’ ๐ต 1 โข ๐“Ž ๐ต 2 can be identified with a function ๐œ‘ 1 : ๐ด 1 โ†’ ๐ต 1 and a function ๐œ‘ โ™ฏ : ๐ด 1 ร— ๐ต 2 โ†’ ๐ด 2 . That is,

๐๐จ๐ฅ๐ฒ โข ( ๐ด 1 โข ๐“Ž ๐ด 2 , ๐ต 1 โข ๐“Ž ๐ต 2 ) โ‰… ๐ต 1 ๐ด 1 โข ๐ด 2 ๐ด 1 โข ๐ต 2 .

(5) Proposition 2.4.

The composite of polynomial functors ๐‘ , ๐‘ž โˆˆ ๐๐จ๐ฅ๐ฒ , which we denote ๐‘ โ— ๐‘ž , is again polynomial with formula

๐‘ โ— ๐‘ž โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— : ๐‘ โข [ ๐‘– ] โ†’ ๐‘ž โข ( 1 ) ๐“Ž โˆ‘ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] ๐‘ž โข [ ๐‘— โข ๐‘‘ ]

(6)

The composition operation โ— is a (nonsymmetric) monoidal structure on ๐๐จ๐ฅ๐ฒ , with unit ๐“Ž .

Proof.

See page A. โˆŽ

Example 2.5.

If ๐‘

๐“Ž 2 and ๐‘ž

๐“Ž + 1 , then ๐‘ โ— ๐‘ž โ‰… ๐“Ž 2 + 2 โข ๐“Ž + 1 whereas ๐‘ž โ— ๐‘ โ‰… ๐“Ž 2 + 1 .

Example 2.6.

Applying a polynomial ๐‘ to a set ๐‘† is given by composition: ๐‘ โข ( ๐‘† ) โ‰… ๐‘ โ— ๐‘† .

Proposition 2.7 (The symmetric monoidal category ( ๐๐จ๐ฅ๐ฒ , ๐“Ž , โŠ— ) ).

The category ๐๐จ๐ฅ๐ฒ has a symmetric monoidal structure with unit ๐“Ž and monoidal product โŠ— on objects given by the following formula

๐‘ โŠ— ๐‘ž โ‰” โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— โˆˆ ๐‘ž โข ( 1 ) ๐“Ž ๐‘ โข [ ๐‘– ] ร— ๐‘ž โข [ ๐‘— ]

Proof.

See page A. โˆŽ

Proposition 2.8 (Internal hom [ โˆ’ , โˆ’ ] ).

The โŠ— monoidal structure on ๐๐จ๐ฅ๐ฒ is closed; that is, for every ๐‘ , ๐‘ž โˆˆ ๐๐จ๐ฅ๐ฒ there is a polynomial

[ ๐‘ , ๐‘ž ] โ‰” โˆ‘ ๐œ‘ : ๐‘ โ†’ ๐‘ž ๐“Ž โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ]

(7)

for which we have a natural isomorphism

๐๐จ๐ฅ๐ฒ โข ( ๐‘Ÿ โŠ— ๐‘ , ๐‘ž ) โ‰… ๐๐จ๐ฅ๐ฒ โข ( ๐‘Ÿ , [ ๐‘ , ๐‘ž ] ) .

(8) Proof.

See page A. โˆŽ

Example 2.9.

Given sets ๐ด and ๐ต , we use Eqs. 5 and 7 to compute that the internal hom between ๐ด โข ๐“Ž ๐ด and ๐ต โข ๐“Ž ๐ต is

[ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โ‰… ๐ต ๐ด โข ๐ด ๐ด โข ๐ต โข ๐“Ž ๐ด โข ๐ต .

The counit of the adjunction (8) is a natural map eval : ๐‘ โŠ— [ ๐‘ , ๐‘ž ] โ†’ ๐‘ž called evaluation. In very much the same way, it induces two sorts of morphisms we will use later:

[ ๐‘ 1 , ๐‘ž 1 ] โŠ— [ ๐‘ 2 , ๐‘ž 2 ] โ†’ [ ๐‘ 1 โŠ— ๐‘ 2 , ๐‘ž 1 โŠ— ๐‘ž 2 ] and [ ๐‘ , ๐‘ž ] โŠ— [ ๐‘ž , ๐‘Ÿ ] โ†’ [ ๐‘ , ๐‘Ÿ ] .

(9) 2.2Coalgebras, generalized Moore machines, and learners

Coalgebras for endofunctors ๐น : ๐’๐ž๐ญ โ†’ ๐’๐ž๐ญ form a major topic of study [arbib1982parametrized, adamek2005introduction, jacobs2017introduction]. In this section we recall the definition and explain the relevance to dynamical systems (generalized Moore machines) and learners.

Definition 2.10 (Coalgebra).

Given a polynomial ๐‘ , a ๐‘ -coalgebra is a pair ( ๐‘† , ๐›ฝ ) where ๐‘† โˆˆ ๐’๐ž๐ญ and ๐›ฝ : ๐‘† โ†’ ๐‘ โ— ๐‘† . A ๐‘ -coalgebra morphism from ( ๐‘† , ๐›ฝ ) to ( ๐‘† โ€ฒ , ๐›ฝ โ€ฒ ) consists of a function ๐‘“ : ๐‘† โ†’ ๐‘† โ€ฒ such that the following diagram commutes:

๐‘† ๐‘ โ— ๐‘† ๐‘† โ€ฒ ๐‘ โ— ๐‘† โ€ฒ ๐›ฝ ๐‘“ ๐‘ โ— ๐‘“ ๐›ฝ โ€ฒ

(10)

We denote the category of ๐‘ -coalgebras and their morphisms by ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  .

Proposition 2.11.

A ๐‘ -coalgebra ( ๐‘† , ๐›ฝ ) can be identified with a map of polynomials

๐‘† โข ๐“Ž ๐‘† โ†’ ๐‘ .

(11) Proof.

One finds an isomorphism ๐๐จ๐ฅ๐ฒ โข ( ๐‘† , ๐‘ โ— ๐‘† ) โ‰… ๐๐จ๐ฅ๐ฒ โข ( ๐‘† โข ๐“Ž ๐‘† , ๐‘ ) by direct calculation. โˆŽ

Warning 2.12.

Looking at Proposition 2.11, one might be tempted to think that a map of ๐‘ -coalgebras as in (10) can be identified with a commuting triangle

๐‘† โข ๐“Ž ๐‘† ๐‘† โ€ฒ โข ๐“Ž ๐‘† โ€ฒ ๐‘ ?

but this is not the case; for one thing, the marked arrow does not arise from a function ๐‘“ : ๐‘† โ†’ ๐‘† โ€ฒ . The point is, (11) can be misleading when it comes to maps, and hence we will depart from the so-called ๐๐š๐ซ๐š construction for 2-cells. For us, the correct sort of map between ๐‘ -coalgebras is the usual one, as shown in (10).

Proposition 2.13.

For any ๐‘ , ๐‘ž โˆˆ ๐๐จ๐ฅ๐ฒ there is a functor

๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  ร— ๐‘ž โข

โข ๐‚๐จ๐š๐ฅ๐  โ†’ ( ๐‘ โŠ— ๐‘ž ) โข

โข ๐‚๐จ๐š๐ฅ๐ 

making โˆ™

โข ๐‚๐จ๐š๐ฅ๐  a lax monoidal functor ๐๐จ๐ฅ๐ฒ โ†’ ๐‚๐š๐ญ .

Proof.

See page A. โˆŽ

The relevance of coalgebras to dynamics was of interest in the earliest of references we know of, namely [arbib1982parametrized], where they are referred to as codynamics. We will proceed with our own terminology.

Definition 2.14 (Moore machine).

For sets ๐ด , ๐ต , an ( ๐ด , ๐ต ) -Moore machine consists of

โ€ข

a set ๐‘† , elements of which are called states,

โ€ข

a function ๐‘Ÿ : ๐‘† โ†’ ๐ต , called readout, and

โ€ข

a function ๐‘ข : ๐‘† ร— ๐ด โ†’ ๐‘† , called update.

It is further called initialized if it is equipped with an element ๐‘  0 โˆˆ ๐‘† .

With an initialized ( ๐ด , ๐ต ) -Moore machine ( ๐‘† , ๐‘Ÿ , ๐‘ข , ๐‘  0 ) , we can take any ๐ด -stream ๐‘Ž : โ„• โ†’ ๐ด and produce a ๐ต -stream ๐‘ : โ„• โ†’ ๐ต inductively using the formula

๐‘  ๐‘› + 1 โ‰” ๐‘ข โข ( ๐‘  ๐‘› , ๐‘Ž ๐‘› ) and ๐‘ ๐‘› โ‰” ๐‘Ÿ โข ( ๐‘  ๐‘› ) .

Proposition 2.15.

An ( ๐ด , ๐ต ) -Moore machine with states ๐‘† can be identified with a map of polynomials ๐‘† โข ๐“Ž ๐‘† โ†’ ๐ต โข ๐“Ž ๐ด , and hence with a ๐ต โข ๐“Ž ๐ด -coalgebra ๐‘† โ†’ ๐ต โข ๐“Ž ๐ด โ— ๐‘† by Proposition 2.11.

Proof.

The identification uses ๐œ‘ 1 โ‰” ๐‘Ÿ and ๐œ‘ โ™ฏ โ‰” ๐‘ข . โˆŽ

Replacing ๐ต โข ๐“Ž ๐ด with an arbitrary polynomial ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ , we think of ๐‘ -coalgebras as generalized Moore machines. We will refer to them as ๐‘ -dynamical systems and call ๐‘ the interface. Mathematically, given ๐›ฝ : ๐‘† โ†’ ๐‘ โ— ๐‘† , we also get the two-fold composite

๐‘† โ†’ ๐›ฝ ๐‘ โ— ๐‘† โ†’ ๐‘ โ— ๐›ฝ ๐‘ โ— ๐‘ โ— ๐‘†

and indeed the ๐‘› -fold composite ๐‘† โ†’ ๐‘ โŠฒ ๐‘› โ— ๐‘† for any ๐‘› โˆˆ โ„• . The idea is that for every state ๐‘  โˆˆ ๐‘† , we get a position ๐‘Ÿ โข ( ๐‘  ) โˆˆ ๐‘ โข ( 1 ) , and for every direction ๐‘‘ โˆˆ ๐‘ โข [ ๐‘Ÿ โข ( ๐‘  ) ] there, we get a new state ๐‘ข โข ( ๐‘  , ๐‘‘ ) . We thus think of ๐‘ as an interface for the dynamical system: ๐‘ โข ( 1 ) says what the world can see about the current stateโ€”i.e. its outward position ๐‘– โ‰” ๐‘Ÿ โข ( ๐‘  ) โ€”and ๐‘ โข [ ๐‘– ] says what sort of forces or inputs the state can be subjected to.

A map of polynomials ๐œ‘ : ๐‘ โ†’ ๐‘ โ€ฒ is a change of interface. We can transform a ๐‘ -dynamical system into a ๐‘ โ€ฒ -dynamical system that has the same set of states. Indeed, simply compose any ๐‘† โ†’ ๐‘ โ— ๐‘† with ๐œ‘ โ— ๐‘† : ๐‘ โ— ๐‘† โ†’ ๐‘ โ€ฒ โ— ๐‘† .

More generally, a map ๐œ‘ : ๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ โ†’ ๐‘ โ€ฒ allows us to take ๐‘˜ -many dynamical systems ๐‘† 1 โ†’ ๐‘ 1 โ— ๐‘† 1 through ๐‘† ๐‘˜ โ†’ ๐‘ ๐‘˜ โ— ๐‘† ๐‘˜ and use Proposition 2.13 to combine them into a single dynamical system

๐‘† โ†’ ( ๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ ) โ— ๐‘† โ†’ ๐œ‘ ๐‘ โ€ฒ โ— ๐‘†

with interface ๐‘ โ€ฒ and states ๐‘† โ‰” ๐‘† 1 ร— โ‹ฏ ร— ๐‘† ๐‘˜ .

Example 2.16.

Wiring diagrams are one way of combining dynamical systems as above.

๐œ‘

Plant Controller ๐ด ๐ต ๐ถ Controlled_Plant

(12)

In the wiring diagram (12) three boxes are shown: the controller, the plant, and the system; we can consider each as having a monomial interface:

Plant

๐ถ โข ๐“Ž ๐ด โข ๐ต Controller

๐ต โข ๐“Ž ๐ถ Controlled_Plant

๐ถ โข ๐“Ž ๐ด .

(13)

The wiring diagram itself represents a morphism

๐œ‘ : ๐ถ โข ๐“Ž ๐ด โข ๐ต โŠ— ๐ต โข ๐“Ž ๐ถ โ†’ ๐ถ โข ๐“Ž ๐ด

in ๐๐จ๐ฅ๐ฒ . Defining ๐œ‘ requires a function ๐œ‘ 1 : ๐ถ ร— ๐ต โ†’ ๐ถ and a function ๐œ‘ โ™ฏ : ๐ถ ร— ๐ต ร— ๐ด โ†’ ๐ด ร— ๐ต ร— ๐ถ ; the first is projection and the second is an isomorphism. Together these simply say how the wiring diagram shuttles information within the controlled plant. Indeed, the wiring diagram lets us put together dynamics of the controller and the plant to give dynamics for the controlled plant. That is, given Moore machines ๐‘† โข ๐“Ž ๐‘† โ†’ Plant and ๐‘‡ โข ๐“Ž ๐‘‡ โ†’ Controller , we get a Moore machine ๐‘† โข ๐‘‡ โข ๐“Ž ๐‘† โข ๐‘‡ โ†’ Controlled_Plant .

More generally, we can think of transistors in a computer as dynamical systems, and the logic gates, adder circuits, memory circuits, a connected keyboard or monitor, etc. each as a wiring diagram comprising these simpler systems.

Example 2.17.

In Example 2.16 the wiring pattern is fixed, but as we show in [spivak2020poly], ๐๐จ๐ฅ๐ฒ also supports wiring diagrams for dynamical systems that can change their interaction pattern based on their internal states.

We now come to learners. As mentioned in the introduction, the category ๐‹๐ž๐š๐ซ๐ง from [fong2019backprop] is better understood as a bicategory which weโ€™ll denote ๐•ƒ โข ๐ž๐š๐ซ๐ง . Its objects are sets, Ob โก ( ๐•ƒ โข ๐ž๐š๐ซ๐ง )

Ob โก ( ๐’๐ž๐ญ ) , a 1-morphism (learner) from ๐ด to ๐ต consists of a set ๐‘ƒ and maps ๐ผ : ๐ด ร— ๐‘ƒ โ†’ ๐ต and ( ๐‘… , ๐‘ˆ ) : ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐ด ร— ๐‘ƒ , and a 2-morphismโ€”a morphism between learnersโ€”is a function ๐‘“ : ๐‘ƒ โ†’ ๐‘ƒ โ€ฒ making the following squares commute:

๐ด ร— ๐‘ƒ ๐ต ๐ด ร— ๐‘ƒ โ€ฒ ๐ต ๐ผ ๐ด ร— ๐‘“ ๐ผ โ€ฒ             ๐ด ร— ๐ต ร— ๐‘ƒ ๐ด ร— ๐‘ƒ ๐ด ร— ๐ต ร— ๐‘ƒ โ€ฒ ๐ด ร— ๐‘ƒ โ€ฒ ( ๐‘… , ๐‘ˆ ) ๐ด ร— ๐ต ร— ๐‘“ ๐ด ร— ๐‘“ ( ๐‘… โ€ฒ , ๐‘ˆ โ€ฒ )

(14)

We denote the category of learners from ๐ด to ๐ต as ๐•ƒ โข ๐ž๐š๐ซ๐ง โข ( ๐ด , ๐ต ) โˆˆ ๐‚๐š๐ญ .

Proposition 2.18.

For sets ๐ด , ๐ต , there is an equivalence of categories

๐•ƒ โข ๐ž๐š๐ซ๐ง โข ( ๐ด , ๐ต ) โ‰… [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โข

โข ๐‚๐จ๐š๐ฅ๐  .

Proof.

See page A. โˆŽ

We will now give a definition that generalizes the bicategory ๐•ƒ โข ๐ž๐š๐ซ๐ง , give examples, and discuss intuition. In particular, we define a category-enriched operad ๐•† โข ๐ซ๐  , which includes ๐•ƒ โข ๐ž๐š๐ซ๐ง as a full subcategory.

Definition 2.19 (The operad ๐•† โข ๐ซ๐  ).

We define ๐•† โข ๐ซ๐  to be the category-enriched operad defined as follows. The objects of ๐•† โข ๐ซ๐  are polynomials: Ob โก ( ๐•† โข ๐ซ๐  ) โ‰” Ob โก ( ๐๐จ๐ฅ๐ฒ ) . For objects ๐‘ 1 , โ€ฆ , ๐‘ ๐‘˜ , ๐‘ โ€ฒ , the category of maps between them is defined by

๐•† โข ๐ซ๐  โข ( ๐‘ 1 , โ€ฆ , ๐‘ ๐‘˜ ; ๐‘ โ€ฒ ) โ‰” [ ๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ , ๐‘ โ€ฒ ] โข

โข ๐‚๐จ๐š๐ฅ๐  .

For any object ๐‘ , the identity on ๐‘ is given by the [ ๐‘ , ๐‘ ] -coalgebra 1 โ†’ [ ๐‘ , ๐‘ ] โข ( 1 ) โ‰… ๐๐จ๐ฅ๐ฒ โข ( ๐‘ , ๐‘ ) that sends 1 โ†ฆ id ๐‘ .

Given objects ๐‘ 1 , 1 , โ€ฆ , ๐‘ 1 , ๐‘— 1 , โ€ฆ , ๐‘ ๐‘˜ , 1 , โ€ฆ , ๐‘ ๐‘˜ , ๐‘— ๐‘˜ , the composition functor

[ ๐‘ 1 , 1 โŠ— โ‹ฏ โŠ— ๐‘ 1 , ๐‘— 1 , ๐‘ 1 ] โข

โข ๐‚๐จ๐š๐ฅ๐  ร— โ‹ฏ ร— [ ๐‘ ๐‘˜ , 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ , ๐‘— ๐‘˜ , ๐‘ ๐‘˜ ] โข

โข ๐‚๐จ๐š๐ฅ๐ 

ร— [ ๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ , ๐‘ โ€ฒ ]

๐‚๐จ๐š๐ฅ๐  โ†’ [ ๐‘ 1 , 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ , ๐‘— ๐‘˜ , ๐‘ โ€ฒ ]

๐‚๐จ๐š๐ฅ๐ 

is given by repeated application of the maps in (9) and Proposition 2.13.

How do we think of a morphism ( ๐‘† , ๐›ฝ ) : ( ๐‘ 1 , โ€ฆ , ๐‘ ๐‘˜ ) โ†’ ๐‘ โ€ฒ in ๐•† โข ๐ซ๐  ? It is a dynamical system which has a set ๐‘† of states. For every state ๐‘  โˆˆ ๐‘† , we can read out an associated element ๐›ฝ 1 โข ( ๐‘  ) : ๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ โ†’ ๐‘ โ€ฒ ; we can think of this as a wiring diagram as in Example 2.16 or a generalization thereof. That is, the current state ๐‘  dictates an organization pattern ๐›ฝ 1 โข ( ๐‘  ) : how outputs of the internal systems are aggregated and output from the outer interface, and how feedback from outside is distributed internally.

But so far, this is only the readout of ๐›ฝ . Whatโ€™s an input? An input to this system consists of a tuple of outputs ๐‘– โ‰” ( ๐‘– 1 , โ€ฆ , ๐‘– ๐‘˜ ) โˆˆ ๐‘ 1 โข ( 1 ) ร— โ‹ฏ ร— ๐‘ ๐‘˜ โข ( 1 ) , one output for each of the internal systems, together with an input ๐‘‘ โˆˆ ๐‘ โ€ฒ โข [ ๐›ฝ 1 โข ( ๐‘– ) ] to the outer system.

Imagine youโ€™re the officer in charge of an organization: youโ€™re in charge of the system by which your employees and other resources are arranged, how they send information to each other and the outside world, and how the feedback from the outside world is disbursed to the employees and resources. You see what they do, you see how the world responds, and you update your internal state and hence the system itself, however you see fit. In this image, you as the officer are playing the role of ( ๐‘† , ๐›ฝ ) , i.e. a morphism in ๐•† โข ๐ซ๐  . But even a simple logic gate or adder circuit in a computerโ€”something that doesnโ€™t have a changing internal state or update how resources are connectedโ€”counts as a morphism in ๐•† โข ๐ซ๐  . Again, the only difference in that case is that the state set ๐‘† โ‰… 1 , the way the internal resources are connectedโ€”is unchanged by inputs.

Example 2.20.

For any operad, there is an algebra of 0 -ary morphisms. In the case of ๐•† โข ๐ซ๐  , this algebra sends ๐‘ โ†ฆ ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  , the category of dynamical systems on ๐‘ , since the unit of โŠ— is ๐“Ž and [ ๐“Ž , ๐‘ ] โ‰… ๐‘ .

Next weโ€™ll give a mathematical language for describing dynamical systems as in Example 2.20 as well as the generalized learners (or officers) described above.

3Toposes of learners

We ended the previous section by defining the (category-enriched) operad ๐•† โข ๐ซ๐  and explaining how it generalizes the bicategory ๐•ƒ โข ๐ž๐š๐ซ๐ง . In this section we mainly discuss the internal language for each learner. That is, given ๐‘ , ๐‘ โ€ฒ โˆˆ Ob โก ( ๐๐จ๐ฅ๐ฒ )

Ob โก ( ๐•† โข ๐ซ๐  ) , where perhaps ๐‘

๐‘ 1 โŠ— โ‹ฏ โŠ— ๐‘ ๐‘˜ , we discuss the category ๐•† โข ๐ซ๐  โข ( ๐‘ ; ๐‘ โ€ฒ ) of such learners.

Our first job is to show that every such category is a topos; this will give us access to the Mitchell-Benabou language and Kripke-Joyal semanticsโ€”the so-called internal language of the topos and its interpretation [macLane1992sheaves]. We then explain the sorts of thingsโ€”propositionsโ€”that one can express in this language, e.g. the proposition โ€œI will follow the gradient descent algorithmโ€ is a particular case.

3.1The topos of ๐‘ -coalgebras

In this section, we show that for any polynomial ๐‘ , there is a category ๐’ž ๐‘ , called the cofree category on ๐‘ , for which we can find an equivalence

๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  โ‰… ๐’ž ๐‘ โข

โข ๐’๐ž๐ญ

between ๐‘ -coalgebras and functors ๐’ž ๐‘ โ†’ ๐’๐ž๐ญ . In fact, the category ๐’ž ๐‘ is free on a graph, making it quite easy to understand in certain respects.3

Following [nlab:tree], we define a rooted tree to be a graph ๐‘‡ whose free category has an initial object, called the root; the idea is that for any node ๐‘› , there is exactly one path from the root to ๐‘› . We denote the nodes of ๐‘‡ by ๐‘‡ 0 , the root by root ๐‘‡ โˆˆ ๐‘‡ 0 , and for any node ๐‘› โˆˆ ๐‘‡ 0 we denote the set of arrows emanating from ๐‘› by ๐‘‡ โข [ ๐‘› ] . Note that at the target ๐‘› โ€ฒ of any arrow ๐‘Ž โˆˆ ๐‘‡ โข [ ๐‘› ] , there sits another rooted tree with root ๐‘› โ€ฒ ; we denote this tree by cod ๐‘‡ โก ( ๐‘Ž ) .

Definition 3.1 (The graph ๐–ณ๐—‹๐–พ๐–พ ๐‘ of ๐‘ -trees).

For a polynomial ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ , define a ๐‘ -tree to be a tuple ( ๐‘‡ , ๐œ™ 1 , ๐œ™ โ™ฏ ) , where ๐‘‡ is a rooted tree, ๐œ™ 1 : ๐‘‡ 0 โ†’ ๐‘ โข ( 1 ) is a function called the position function, and ๐œ™ ๐‘› โ™ฏ is a bijection

๐œ™ ๐‘› โ™ฏ : ๐‘ โข [ ๐œ™ 1 โข ( ๐‘› ) ] โ†’ โ‰… ๐‘‡ โข [ ๐‘› ]

for each node ๐‘› โˆˆ ๐‘‡ 0 , identifying the set of branches in the tree ๐‘‡ at node ๐‘› with the set of directions in the polynomial ๐‘ at the position ๐œ™ 1 โข ( ๐‘› ) .

We denote by ๐–ณ๐—‹๐–พ๐–พ ๐‘ the graph whose vertex set is the set of ๐‘ -trees, and for which an arrow ๐‘Ž : ๐‘‡ โ†’ ๐‘‡ โ€ฒ is a branch ๐‘Ž โˆˆ ๐‘‡ โข [ root ๐‘‡ ] with ๐‘‡ โ€ฒ

cod ๐‘‡ โก ( ๐‘Ž ) .

Example 3.2.

If ๐‘

๐“Ž ๐ด for a nonempty set ๐ด then there is only one ๐‘ -tree: each node has the unique label ๐‘ โข ( 1 ) โ‰… 1 and ๐ด -many branches.

If ๐‘

{ go } โข ๐“Ž 1 + { stop } โข ๐“Ž 0 โ‰… ๐“Ž + 1 then counting the number of nodes gives a bijection between set of ๐‘ -trees and the set โ„• โˆช { โˆž }

โˆ™ stop , โˆ™ go โ†’ โˆ™ stop , โˆ™ go โ†’ โˆ™ go โ†’ โˆ™ stop , โ€ฆ , โˆ™ go โ†’ โˆ™ go โ†’ โˆ™ go โ†’ โ‹ฏ

Theorem 3.3.

For any polynomial ๐‘ there is an equivalence of categories

๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  โ‰… ๐“๐ซ๐ž๐ž ๐‘ โข

โข ๐’๐ž๐ญ

where ๐“๐ซ๐ž๐ž ๐‘ is the free category on the graph ๐–ณ๐—‹๐–พ๐–พ ๐‘ of ๐‘ -trees.

Proof.

See page A. โˆŽ

โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™ โˆ™
Figure 1:Left: a dynamical system, i.e. coalgebra, for the polynomial ๐‘ โ‰” { โˆ™ , โˆ™ } ๐“Ž 2 + โˆ™ โ‰… 2 ๐“Ž 2 + 1 . Right: the ๐‘ -tree corresponding to the node โˆ™ . 3.2The internal language of ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐ 

For any category ๐’ž , the category ๐’ž โข

โข ๐’๐ž๐ญ of functors ๐’ž โ†’ ๐’๐ž๐ญ forms a topos. In particular, this means that mathematicians have already developed a language and logic that faithfully represents the structures of ๐’ž โข

โข ๐’๐ž๐ญ , and we can import it wholesale; see [fong2019seven, Chapter 7] or [macLane1992sheaves, Chapter VI]. Now that we know from Theorem 3.3 that ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  is a topos for any ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ , we are interested in corresponding language for the topos ๐•ƒ โข ๐ž๐š๐ซ๐ง โข ( ๐ด , ๐ต )

[ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โข

โข ๐‚๐จ๐š๐ฅ๐  of learners, for any sets ๐ด , ๐ต ; hence the title โ€œlearnersโ€™ languages.โ€ However since most of the relevant abstractions work more generally for ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  , weโ€™ll mainly work there.

Not assuming the reader knows topos theory, we will proceed as though we are defining the few relevant concepts from scratch, when in actuality we are merely โ€œreading them offโ€ from the established literature. For example Definition 3.4 simply unpacks the topos-theoretic definition of a logical proposition as a subobject of the terminal object in the topos ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  .

Definition 3.4.

A logical proposition (about ๐‘ -coalgebras) is defined to be a set ๐‘ƒ โІ ๐–ณ๐—‹๐–พ๐–พ ๐‘ of ๐‘ -trees satisfying the condition that if ๐‘‡ โˆˆ ๐‘ƒ is a tree in ๐‘ƒ , then for any direction ๐‘‘ โˆˆ ๐‘‡ โข [ root ๐‘‡ ] , the tree cod ๐‘‡ โก ( ๐‘‘ ) โˆˆ ๐‘ƒ is also a tree in ๐‘ƒ .

Proposition 3.5 gives us an easy way to construct logical propositions about ๐‘ -coalgebras, and hence learners. Namely, it says if we put a condition on the ๐‘ -positions that can show up as labels, and if we put a condition on the codomain map (how directions in the tree lead to new positions), we get a logical proposition. Of course, these arenโ€™t the only ones, but they form a nice special case.

Recall from Definition 3.1 that a ๐‘ -tree is a rooted tree ๐‘‡ equipped with a position function ๐œ™ 1 : ๐‘‡ 0 โ†’ ๐‘ โข ( 1 ) ; we elide the bijections (earlier denoted ๐œ™ ๐‘› โ™ฏ : ๐‘‡ โข [ ๐‘› ] โ‰… ๐‘ โข [ ๐œ™ 1 โข ( ๐‘› ) ] ) in what follows.

Proposition 3.5.

Given ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ , suppose given subsets

๐‘„ โІ ๐‘ โข ( 1 ) and ๐‘… โІ โˆ ๐‘– โˆˆ ๐‘„ โˆ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] ๐‘„ .

Then the following set of trees is a logical proposition:

๐‘ƒ ๐‘„ ๐‘… โ‰” { ๐‘‡ โˆˆ ๐–ณ๐—‹๐–พ๐–พ ๐‘ โˆฃ โˆ€ ( ๐‘– : ๐‘‡ 0 ) . ๐œ™ 1 ( ๐‘– ) โˆˆ ๐‘„ โˆง โˆ€ ( ๐‘‘ : ๐‘‡ [ ๐‘– ] ) . cod ๐‘‡ ( ๐‘‘ ) โˆˆ ( ๐‘… ๐‘– ๐‘‘ ) } .

Proof.

The result is immediate from Definition 3.4. โˆŽ

Example 3.6 (Gradient descent).

The gradient descent, backpropagation algorithm used by each โ€œneuronโ€ in a deep learning architecture can be phrased as a logical proposition about learners. The whole learning architecture is then put together as in [fong2019backprop], or as weโ€™ve explained things above, using the operad ๐•† โข ๐ซ๐  from Definition 2.19.

So suppose a neuron is tasked with learning a function โ„ ๐‘š โ†’ โ„ ๐‘› , and it has a parameter space โ„ ๐‘˜ , i.e. we are given a smooth function ๐‘“ : โ„ ๐‘˜ ร— โ„ ๐‘š โ†’ โ„ ๐‘› . We will define a corresponding logical proposition using Proposition 3.5. Define ๐‘ โˆˆ ๐๐จ๐ฅ๐ฒ by

๐‘ โ‰” [ โ„ ๐‘š โข ๐“Ž โ„ ๐‘š , โ„ ๐‘› โข ๐“Ž โ„ ๐‘› ] โ‰… โˆ‘ ๐‘” : โ„ ๐‘š โข ๐“Ž โ„ ๐‘š โ†’ โ„ ๐‘› โข ๐“Ž โ„ ๐‘› ๐“Ž โ„ ๐‘š ร— โ„ ๐‘› .

Define ๐‘„ โІ { ( ๐‘” 1 , ๐‘” โ™ฏ ) โˆฃ ๐‘” 1 : โ„ ๐‘š โ†’ โ„ ๐‘› , ๐‘” โ™ฏ : โ„ ๐‘š ร— โ„ ๐‘› โ†’ โ„ ๐‘š } by saying that ๐‘” 1 โข ( ๐‘ฅ ) must be of the form ๐‘“ โข ( ๐‘Ž , ๐‘ฅ ) for some ๐‘Ž โˆˆ โ„ ๐‘˜ in the parameter set and that ๐‘” ๐‘ฅ โ™ฏ is given by โ€œpulling back gradient vectorsโ€ using the map on cotangent spaces defined by composing with the derivative of ๐‘” 1 at ๐‘ฅ , in the usual way.

Now given ( ๐‘” 1 , ๐‘” โ™ฏ ) โˆˆ ๐‘„ , we continue with the setup of Proposition 3.5 by defining ๐‘… โข ( ๐‘” 1 , ๐‘” โ™ฏ ) : โ„ ๐‘š ร— โ„ ๐‘› โ†’ ๐‘„ to say how the learner updates its current parameter value ๐‘Ž โˆˆ โ„ ๐‘˜ given an input-output pair; this again is specified by the deep learning algorithm. Typically, it uses a loss function to calculate a cotangent vector at ๐‘“ โข ( ๐‘Ž , ๐‘ฅ ) which is passed back to a cotangent vector at ๐‘Ž , and a dual vector of some โ€œlearning rateโ€ ๐œ– is traversed.

The details are important for implementation, but not for understanding the idea. The idea is that as long as we say what sorts of maps are allowed (smooth maps with reverse derivatives) and how they update, we have defined a logical proposition.

The logical propositions that come from Proposition 3.5 are very special. More generally, one could have a logical proposition like โ€œwhenever I receive two red tokens within three seconds, I will wait five seconds and then send either three blue tokens or two blues and six reds.โ€ As long as this behavior has the โ€œwheneverโ€ flavorโ€”more precisely as long as it satisfies the condition in Definition 3.4โ€”it will be a logical proposition in the topos.

3.3Future work

There are many avenues for future work. One is to give more syntactic languageโ€”beyond the logical symbols true , false , โˆง , โˆจ , โ‡’ , ยฌ , โˆ€ , โˆƒ that exist in any toposโ€”for building logical propositions in the ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  toposes specifically. Another is to understand various modalities in these toposes.

The sort of morphisms between toposes that seem to arise most naturally in this context are not the usual kindโ€”adjoint functors โ„ฐ โ‡† โ„ฐ โ€ฒ for which the left adjoint preserves all finite limits, called geometric morphismsโ€”but instead adjoint functors โ„ฐ โ‡† โ„ฐ โ€ฒ for which the left adjoint preserves all connected limits. Thus another avenue for future research is to consider how logical and type-theoretic statements move between toposes that are connected in this way.

Appendix AProofs Proof of Proposition 2.4.

It is well-known that composition of functors is a monoidal operation, so it suffices to see that the polynomial (6) is the composite of functors ๐‘ , ๐‘ž . To show this, we use the fact that for any set ๐ด we have a bijection ๐“Ž ๐ด โ‰… โˆ ๐‘Ž โˆˆ ๐ด ๐“Ž to calculate the composite

๐‘ โ— ๐‘ž

โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] ๐“Ž โ— โˆ‘ ๐‘— โˆˆ ๐‘ž โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐‘— ] ๐“Ž

โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] โˆ‘ ๐‘— โˆˆ ๐‘ž โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐‘— ] ๐“Ž

โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— : ๐‘ โข [ ๐‘– ] โ†’ ๐‘ž โข ( 1 ) โˆ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐‘— โข ( ๐‘‘ ) ] ๐“Ž โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— : ๐‘ โข [ ๐‘– ] โ†’ ๐‘ž โข ( 1 ) ๐“Ž โˆ‘ ๐‘‘ โˆˆ ๐‘ โข [ ๐‘– ] ๐‘ž โข [ ๐‘— โข ๐‘‘ ]

where the first isomorphism is (3), the second is substitution, the third is the distributive law, and the fourth is properties of exponents. โˆŽ

Proof of Proposition 2.7.

With the formula given, it is clear that the โŠ— -operation is associative (up to isomorphism), and that ๐“Ž , which has ๐“Ž โข ( 1 ) โ‰… 1 and ๐“Ž โข [ 1 ] โ‰… ๐“Ž , is a unit. One can also check that the formula is functorial in ๐‘ and ๐‘ž , completing the proof. โˆŽ

Proof of Proposition 2.8.

The natural isomorphism is given by rearranging terms:

๐๐จ๐ฅ๐ฒ โข ( ๐‘Ÿ โŠ— ๐‘ , ๐‘ž )

โ‰… โˆ ๐‘˜ โˆˆ ๐‘Ÿ โข ( 1 ) โˆ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— โˆˆ ๐‘ž โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐‘— ] ๐‘Ÿ โข [ ๐‘˜ ] ร— ๐‘ โข [ ๐‘– ]

โ‰… โˆ ๐‘˜ โˆˆ ๐‘Ÿ โข ( 1 ) โˆ‘ ๐œ‘ 1 : ๐‘ โข ( 1 ) โ†’ ๐‘ž โข ( 1 ) โˆ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ] ๐‘Ÿ โข [ ๐‘˜ ] ร— ๐‘ โข [ ๐‘– ]

โ‰… โˆ ๐‘˜ โˆˆ ๐‘Ÿ โข ( 1 ) โˆ‘ ๐œ‘ 1 : ๐‘ โข ( 1 ) โ†’ ๐‘ž โข ( 1 ) ( โˆ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ] ๐‘ โข [ ๐‘– ] ) ร— ( โˆ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ] ๐‘Ÿ โข [ ๐‘˜ ] )

โ‰… โˆ ๐‘˜ โˆˆ ๐‘Ÿ โข ( 1 ) โˆ‘ ๐œ‘ : ๐‘ โ†’ ๐‘ž โˆ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ ๐‘’ โˆˆ ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ] ๐‘Ÿ โข [ ๐‘˜ ]

โ‰… ๐๐จ๐ฅ๐ฒ โข ( ๐‘Ÿ , โˆ‘ ๐œ‘ : ๐‘ โ†’ ๐‘ž ๐“Ž โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) ๐‘ž โข [ ๐œ‘ 1 โข ( ๐‘– ) ] ) โ‰… ๐๐จ๐ฅ๐ฒ โข ( ๐‘Ÿ , [ ๐‘ , ๐‘ž ] ) .

In order, these isomorphisms are given by: unfolding the definition of morphisms in ๐๐จ๐ฅ๐ฒ , distributivity, products commuting with products, definition of morphisms in ๐๐จ๐ฅ๐ฒ , rules of exponents, and Eq. 7โ€™s definition of [ ๐‘ , ๐‘ž ] , respectively. โˆŽ

Proof of Proposition 2.13.

We need to give not only the functor ๐œ† : ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  ร— ๐‘ž โข

โข ๐‚๐จ๐š๐ฅ๐  โ†’ ( ๐‘ โŠ— ๐‘ž ) โข

โข ๐‚๐จ๐š๐ฅ๐  , for any ๐‘ , ๐‘ž โˆˆ ๐๐จ๐ฅ๐ฒ but also a functor { 1 } โ†’ ๐“Ž โข

โข ๐‚๐จ๐š๐ฅ๐  , which we can identify with a ๐“Ž -coalgebra; we take the latter to be the unique function 1 โ†’ ๐“Ž โ— 1 . For the former, one could proceed abstractly using the fact that there is a duoidal structure on ๐๐จ๐ฅ๐ฒ

( ๐‘ โ— ๐‘  ) โŠ— ( ๐‘ž โ— ๐‘ก ) โ†’ ( ๐‘ โŠ— ๐‘ž ) โ— ( ๐‘  โŠ— ๐‘ก ) .

Indeed, since for sets ๐‘† , ๐‘‡ we have ๐‘† โŠ— ๐‘‡ โ‰… ๐‘† ร— ๐‘‡ , the result will follow from the properties of duoidal structures (applied in the case where ๐‘  โ‰” ๐‘† and ๐‘ก โ‰” ๐‘‡ are constant polynomials). However, for the readerโ€™s convenience, we will give the map ๐œ† : ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  ร— ๐‘ž โข

โข ๐‚๐จ๐š๐ฅ๐  โ†’ ( ๐‘ โŠ— ๐‘ž ) โข

โข ๐‚๐จ๐š๐ฅ๐  more explicitly.

Given ๐›ฝ : ๐‘† โ†’ ๐‘ โ— ๐‘† and ๐›พ : ๐‘‡ โ†’ ๐‘ž โ— ๐‘‡ , we define a function

๐‘† โข ๐‘‡

โ†’ ( ๐‘ โŠ— ๐‘ž ) โ— ( ๐‘† โข ๐‘‡ ) โ‰… โˆ‘ ๐‘– โˆˆ ๐‘ โข ( 1 ) โˆ‘ ๐‘— โˆˆ ๐‘ž โข ( 1 ) ( ๐‘† โข ๐‘‡ ) ๐‘ โข [ ๐‘– ] ร— ๐‘ž โข [ ๐‘— ]

( ๐‘  , ๐‘ก )

โ†ฆ ( ๐‘– โ‰” ๐›ฝ 1 ( ๐‘  ) , ๐‘— โ‰” ๐›พ 1 ( ๐‘ก ) , ( ๐‘‘ , ๐‘’ ) โ†ฆ ( ๐›ฝ ๐‘– โ™ฏ ( ๐‘‘ ) , ๐›พ ๐‘— โ™ฏ ( ๐‘’ ) ) .

This is natural in ๐‘† , ๐‘‡ , which makes ๐œ† a functor for any ๐‘ , ๐‘ž . One can check that all the axioms of a lax monoidal functor are verified, in the sense that the required diagrams commute up to natural isomorphism. โˆŽ

Proof of Proposition 2.18.

On one hand, an object in ๐•ƒ โข ๐ž๐š๐ซ๐ง โข ( ๐ด , ๐ต ) as described in (2) consists of a set ๐‘ƒ and functions ๐ด ร— ๐‘ƒ โ†’ ๐ต and ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐ด and ๐ด ร— ๐ต ร— ๐‘ƒ โ†’ ๐‘ƒ . On the other hand, we have [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โ‰… ๐ต ๐ด โข ๐ด ๐ด โข ๐ต โข ๐“Ž ๐ด โข ๐ต by Example 2.9, so a coalgebra ๐‘ƒ โ†’ [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โ— ๐‘ƒ consists of a function ๐‘ƒ โ†’ ๐ต ๐ด , a function ๐‘ƒ โ†’ ๐ด ๐ด โข ๐ต , and a function ๐‘ƒ โ†’ ๐‘ƒ ๐ด โข ๐ต . The two descriptions can be identified by currying. The [ ๐ด โข ๐“Ž ๐ด , ๐ต โข ๐“Ž ๐ต ] โข

โข ๐‚๐จ๐š๐ฅ๐  morphisms

๐‘ƒ ๐ต ๐ด โข ๐ด ๐ด โข ๐ต โข ๐‘ƒ ๐ด โข ๐ต ๐‘ƒ โ€ฒ ๐ต ๐ด โข ๐ด ๐ด โข ๐ต โข ( ๐‘ƒ โ€ฒ ) ๐ด โข ๐ต

are easily seen to coincide with those shown in (14). โˆŽ

Proof of Theorem 3.3.

It is well-known that ๐’ž โข

โข ๐’๐ž๐ญ is equivalent to the category of discrete opfibrations over ๐’ž via the category-of-elements construction. When ๐’ž is free on a graph ๐บ , the category of elements for any functor โ„Ž : ๐’ž โ†’ ๐’๐ž๐ญ is also free on a graph, say ๐ป . In this case the opfibration can be identified with a graph homomorphism ๐œ‹ : ๐ป โ†’ ๐บ with the property (โ€œopfibโ€) that for any vertex โ„Ž โˆˆ ๐ป , the function on arrows ๐ป โข [ โ„Ž ] โ†’ โ‰… ๐บ โข [ ๐œ‹ โข ( โ„Ž ) ] induced by ๐œ‹ is a bijection. Under this correspondence, a morphism โ„Ž โ†’ โ„Ž โ€ฒ of copresheaves is identified with a graph homomorphism ๐‘“ : ๐ป โ†’ ๐ป โ€ฒ for which ๐œ‹

๐œ‹ โ€ฒ โˆ˜ ๐‘“ .

Thus we have reduced to showing that there is an equivalence between ๐‘ โข

โข ๐‚๐จ๐š๐ฅ๐  and the category of those graph homomorphisms ๐œ‹ : ๐ป โ†’ ๐–ณ๐—‹๐–พ๐–พ ๐‘ that have the opfib property. Suppose given a ๐‘ -coalgebra ๐›ฝ : ๐‘† โ†’ ๐‘ โ— ๐‘† ; it includes a function ๐›ฝ 1 : ๐‘† โ†’ ๐‘ โข ( 1 ) and for each ๐‘  โˆˆ ๐‘† a function ๐›ฝ ๐‘  โ™ฏ : ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] โ†’ ๐‘† . We define the corresponding graph ๐บ ๐‘† , ๐›ฝ to have vertex set ๐‘† , and each ๐‘  โˆˆ ๐‘† to have ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] -many outgoing arrows; the target of each outgoing arrow ๐‘‘ โˆˆ ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] is defined to be ๐›ฝ ๐‘  โ™ฏ โข ( ๐‘‘ ) . The graph homomorphism ๐œ‹ : ๐บ ๐‘† , ๐›ฝ โ†’ ๐–ณ๐—‹๐–พ๐–พ ๐‘ is defined inductively: for any ๐‘  โˆˆ ๐‘† , the ๐‘ -tree ๐œ‹ โข ( ๐‘  ) has root labeled ๐›ฝ 1 โข ( ๐‘  ) , and for each outgoing branch ๐‘‘ โˆˆ ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] the target vertex is assigned the label ๐›ฝ 1 โข ( ๐‘  โ€ฒ ) , where ๐‘  โ€ฒ โ‰” ๐›ฝ ๐‘  โ™ฏ โข ( ๐‘‘ ) , and for each outgoing branch ๐‘‘ โ€ฒ โˆˆ ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  โ€ฒ ) ] the target vertex is assigned the label ๐›ฝ 1 โข ( ๐‘  โ€ฒโ€ฒ ) where ๐‘  โ€ฒโ€ฒ โ‰” ๐›ฝ ๐‘  โ€ฒ โ™ฏ โข ( ๐‘‘ โ€ฒ ) , and so on. It is clear that ๐œ‹ satisfies the opfib property, since it assigns to each vertex ๐‘  in the graph ๐บ ๐‘† , ๐›ฝ a vertex in ๐–ณ๐—‹๐–พ๐–พ ๐‘ (the ๐‘ -tree ๐œ‹ โข ( ๐‘  ) ) with the same set ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] of outgoing arrows.

Conversely, given a graph homomorphism ๐œ‹ : ๐บ โ†’ ๐–ณ๐—‹๐–พ๐–พ ๐‘ with the opfib property, let ๐‘† ๐บ be the set of vertices in ๐บ . The required coalgebra map ๐›ฝ : ๐‘† ๐บ โ†’ ๐‘ โ— ๐‘† ๐บ consists of a function ๐›ฝ 1 : ๐‘† ๐บ โ†’ ๐‘ โข ( 1 ) and a function ๐›ฝ ๐‘  โ™ฏ : ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] โ†’ ๐‘† ๐บ for every ๐‘  โˆˆ ๐‘† ๐บ . We take the function ๐›ฝ 1 to send vertex ๐‘  to the root label ๐œ™ โข ( root ๐œ‹ โข ( ๐‘  ) ) for tree ๐œ‹ โข ( ๐‘  ) . Since we have a bijection ๐‘ โข [ ๐›ฝ 1 โข ( ๐‘  ) ] โ‰… ๐บ โข [ ๐‘  ] , we can take ๐›ฝ ๐‘  โ™ฏ to simply be the target function ๐บ โข [ ๐‘  ] โ†’ ๐‘† ๐บ for the graph ๐บ .

It is a straightforward calculation to check that these two constructions are mutually inverse, and to check that graph homomorphisms over ๐–ณ๐—‹๐–พ๐–พ ๐‘ correspond bijectively to morphisms of ๐‘ -coalgebras. โˆŽ

\printbibliography Report Issue Report Issue for Selection Generated by L A T E xml Instructions for reporting errors

We are continuing to improve HTML versions of papers, and your feedback helps enhance accessibility and mobile support. To report errors in the HTML that will help us improve conversion and rendering, choose any of the methods listed below:

Click the "Report Issue" button. Open a report feedback form via keyboard, use "Ctrl + ?". Make a text selection and click the "Report Issue for Selection" button near your cursor. You can use Alt+Y to toggle on and Alt+Shift+Y to toggle off accessible reporting links at each section.

Our team has already identified the following issues. We appreciate your time reviewing and reporting rendering errors we may not have found yet. Your efforts will help us improve the HTML versions for all readers, because disability should not be a barrier to accessing research. Thank you for your continued support in championing open access for all.

Have a free development cycle? Help support accessibility at arXiv! Our collaborators at LaTeXML maintain a list of packages that need conversion, and welcome developer contributions.

Xet Storage Details

Size:
48 kB
ยท
Xet hash:
8edf31c226e54277b8b797aa04e83433b6b4b34b131095100757e368a28f7e23

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.