## Bookmarks

18 bookmarks saved.

cLang-8 (“cleaned Lang-8”) is a dataset for grammatical error correction (GEC). The source sentences originate from the popular NAIST Lang-8 Learner Corpora, while the target sentences are generated by our state-of-the-art GEC method called gT5. The method is described in our ACL-IJCNLP 2021 paper.

We survey the mathematical foundations of geometric deep learning, focusing on group equivariant and gauge equivariant neural networks. We develop gauge equivariant convolutional neural networks on arbitrary manifolds M using principal bundles with structure group K and equivariant maps between sections of associated vector bundles. We also discuss group equivariant neural networks for homogeneous spaces M=G/K, which are instead equivariant with respect to the global symmetry G on M. Group equivariant layers can be interpreted as intertwiners between induced representations of G, and we show their relation to gauge equivariant convolutional layers. We analyze several applications of this formalism, including semantic segmentation and object detection networks. We also discuss the case of spherical networks in great detail, corresponding to the case M=S2=SO(3)/SO(2). Here we emphasize the use of Fourier analysis involving Wigner matrices, spherical harmonics and Clebsch-Gordan coefficients for G=SO(3), illustrating the power of representation theory for deep learning.
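As a concrete illustration of the Fourier analysis on S² = SO(3)/SO(2) mentioned above, the spherical harmonics Y_l^m form the Fourier basis for signals on the sphere; the snippet below (a minimal sketch using SciPy, not code from the paper) evaluates them numerically. Note SciPy's convention: `sph_harm(m, l, theta, phi)` with `theta` the azimuthal and `phi` the polar angle.

```python
import numpy as np
from scipy.special import sph_harm

# Spherical harmonics Y_l^m form the Fourier basis on S^2 = SO(3)/SO(2);
# expanding a signal in this basis is the starting point of spherical CNNs.
l, m = 2, 1
theta, phi = 0.4, 1.1            # an arbitrary point on the sphere
y = sph_harm(m, l, theta, phi)   # complex value of Y_l^m at that point

# Y_0^0 is the constant function 1 / (2 * sqrt(pi))
y00 = sph_harm(0, 0, theta, phi)
print(abs(y00))  # ≈ 0.2820947917738781
```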

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory — precisely, the universal algebra of monads valued in a 2-category of parametric maps — as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.

This site contains Lambek's recent papers, nearly all written in this millennium. They are mostly undated, so I have sorted them by subject matter, which mostly breaks up into linguistics, physics, and category theory. Most of the category theory was for the linguistics. Some of the papers are not complete, since I got them from his typist. Some were published (and I have provided the citations) and some were not. Among the unpublished, most are undated and a couple seem to be duplicates, doubtless revisions. If anyone figures out which is the latest version, I will try to remove the duplicates. If I was able to download the actual publication, it appears as a normal citation. If the citation is preceded by "Appeared in:", I was not able to download the actual publication, but I compiled and included the source file for what may not be the final version. Dates of writing or of appearance are provided when discernible. All in all it is a remarkable amount of writing for someone aged between 75 and 90. According to MathSciNet, he had 19 publications between 2001 and 2013.

A pregroup is a partially ordered monoid endowed with two unary operations called left and right adjunction. Pregroups were recently introduced to help with natural language processing, as we illustrate here by looking at small fragments of three modern European languages. As it turns out, the apparently new algebraic concept of a pregroup had been around for some time in recreational mathematics, although not under this name.
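The left and right adjunctions drive a simple rewriting process: adjacent pairs x · xʳ and xˡ · x contract to the unit. The sketch below (representation and names are illustrative, not from the paper) implements this contraction for the classic transitive-sentence type assignment n · (nʳ s nˡ) · n.

```python
# A minimal sketch of pregroup type reduction (the string encoding is
# illustrative). Simple types are strings; adjoints are marked with
# "^l" / "^r". Adjacent pairs x · x^r and x^l · x contract to the unit
# and are deleted.

def reduce_types(types):
    """Repeatedly contract adjacent adjoint pairs; return what remains."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            a, b = types[i], types[i + 1]
            if b == a + "^r" or a == b + "^l":
                del types[i:i + 2]
                changed = True
                break
    return types

# "John likes Mary": n · (n^r s n^l) · n reduces to the sentence type s.
print(reduce_types(["n", "n^r", "s", "n^l", "n"]))  # ['s']
```

A sentence is grammatical in this setting exactly when its concatenated types reduce to the single sentence type s.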

Grammar can be formulated as a kind of substructural propositional logic. In support of this claim, we survey bare Gentzen-style deductive systems and two kinds of non-commutative linear logic: intuitionistic and compact bilinear logic. We also glance at their categorical refinements.

We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the meaning of a well-typed sentence from the meanings of its constituents. Concretely, the type reductions of Pregroups are `lifted' to morphisms in a category, a procedure that transforms meanings of constituents into a meaning of the (well-typed) whole. Importantly, meanings of whole sentences live in a single space, independent of the grammatical structure of the sentence. Hence the inner-product can be used to compare meanings of arbitrary sentences, as it is for comparing the meanings of words in the distributional model. The mathematical structure we employ admits a purely diagrammatic calculus which exposes how the information flows between the words in a sentence in order to make up the meaning of the whole sentence. A variation of our `categorical model' which involves constraining the scalars of the vector spaces to the semiring of Booleans results in a Montague-style Boolean-valued semantics.
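The 'lifting' of pregroup reductions to morphisms can be sketched concretely in vector spaces as tensor contraction. In the toy example below (dimensions and vectors are made up for illustration), a transitive verb of type nʳ s nˡ is an order-3 tensor in N ⊗ S ⊗ N, and the reductions n · nʳ → 1 and nˡ · n → 1 contract the subject and object vectors against it.

```python
import numpy as np

# Sketch of the compositional model in vector spaces: type reductions
# become tensor contractions. All numbers here are random placeholders.
N, S = 4, 3                            # noun-space and sentence-space dims
rng = np.random.default_rng(0)

subj = rng.standard_normal(N)          # meaning vector of the subject noun
obj = rng.standard_normal(N)           # meaning vector of the object noun
verb = rng.standard_normal((N, S, N))  # transitive verb as an order-3 tensor

# Contract: sentence_s = sum_{i,k} subj_i * verb_{i,s,k} * obj_k
sentence = np.einsum("i,isk,k->s", subj, verb, obj)
print(sentence.shape)  # (3,) — every sentence lands in the same space S
```

Because every sentence meaning lands in the same space S regardless of grammatical structure, the inner product between two such vectors compares arbitrary sentences directly, as the abstract describes.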

Supertagging in lexicalized grammar parsing is known as almost parsing (Bangalore and Joshi, 1999), in that each supertag is syntactically informative and most ambiguities are resolved once a correct supertag is assigned to every word. Recently this property has been effectively exploited in A* Combinatory Categorial Grammar (CCG; Steedman (2000)) parsing (Lewis and Steedman, 2014; Lewis et al., 2016), in which the probability of a CCG tree y on a sentence x of length N is the product of the probabilities of the supertags (categories) c_i (a locally factored model):

P(y|x) = ∏_{i ∈ [1,N]} P_tag(c_i | x). (1)

By not modeling every combinatory rule in a derivation, this formulation enables us to employ efficient A* search (see Section 2), which finds the most probable supertag sequence that can build a well-formed CCG tree.
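The locally factored model of Eq. 1 can be sketched as follows (the probabilities and category inventory are made up for illustration). Because P(y|x) factors over words, the unconstrained optimum is just the per-word argmax; A* search adds the constraint that the chosen supertags must actually combine into a well-formed CCG tree.

```python
import math

# Toy per-word supertag distributions P_tag(c | x); values are invented.
p_tag = [
    {"NP": 0.7, "N": 0.3},              # "dogs"
    {"(S\\NP)/NP": 0.6, "S\\NP": 0.4},  # "chase"
    {"NP": 0.8, "N/N": 0.2},            # "cats"
]

# Ignoring the tree constraint, Eq. 1 is maximized word by word.
best_tags = [max(dist, key=dist.get) for dist in p_tag]
log_prob = sum(math.log(dist[t]) for dist, t in zip(p_tag, best_tags))

print(best_tags)                      # ['NP', '(S\\NP)/NP', 'NP']
print(round(math.exp(log_prob), 3))   # 0.7 * 0.6 * 0.8 = 0.336
```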

Although much ambiguity is resolved with this supertagging, some ambiguity still remains. Figure 1 shows an example, where the two CCG parses are derived from the same supertags. Lewis et al.'s approach to this problem resorts to a deterministic rule. For example, Lewis et al. (2016) employ the attach low heuristic, which is motivated by the right-branching tendency of English, and always prioritizes (b) for this type of ambiguity. Though it empirically works well for English, an obvious limitation is that it does not always derive the correct parse; consider the phrase "a house in Paris with a garden", for which the correct parse has the structure corresponding to (a) instead.

In this paper, we provide a way to resolve these remaining ambiguities under the locally factored model, by explicitly modeling bilexical dependencies as shown in Figure 1. Our joint model is still locally factored, so that an efficient A* search can be applied. The key idea is to predict the head of every word independently, as in Eq. 1, with a strong unigram model, for which we utilize the scoring model of recent successful graph-based dependency parsing with LSTMs (Kiperwasser and Goldberg, 2016; Dozat and Manning, 2016). Specifically, we extend the bi-directional LSTM (bi-LSTM) architecture of Lewis et al. (2016), which predicts the supertag of a word, to predict its head at the same time with a bilinear transformation.
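The bilinear head-scoring idea in the style of Kiperwasser & Goldberg (2016) and Dozat & Manning (2016) can be sketched as below; shapes and weights are illustrative placeholders, not the paper's trained parameters. Each word's hidden state is scored against every candidate head via a bilinear form, yielding an independent head distribution per word.

```python
import numpy as np

# Sketch of bilinear head scoring over bi-LSTM states (random stand-ins).
d, n = 8, 5                      # hidden size, sentence length
rng = np.random.default_rng(1)
H = rng.standard_normal((n, d))  # one bi-LSTM state per word
W = rng.standard_normal((d, d))  # bilinear weight matrix

scores = H @ W @ H.T             # scores[i, j] = h_i^T W h_j
scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
p_head = np.exp(scores)
p_head /= p_head.sum(axis=1, keepdims=True)   # head distribution per word

print(p_head.shape)                           # (5, 5)
print(np.allclose(p_head.sum(axis=1), 1.0))   # True
```

Because each row of `p_head` is computed independently of the others, this head model keeps the local factorization that A* search relies on.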

The importance of modeling structures beyond supertags is demonstrated in the performance gain in Lee et al. (2016), which adds a recursive component to the model of Eq. 1. Unfortunately, this formulation loses the efficiency of the original one since it needs to compute a recursive neural network every time it searches for a new node. Our model does not resort to the recursive networks while modeling tree structures via dependencies.

We also extend the tri-training method of Lewis et al. (2016) to learn our model with dependencies from unlabeled data. On English CCGbank test data, our model with this technique achieves 88.8% and 94.0% in terms of labeled and unlabeled F1, which mark the best scores so far.

Besides English, we provide experiments on Japanese CCG parsing. Japanese employs freer word order dominated by case markers, so a deterministic rule such as the attach low method may not work well. We show that this is actually the case; our method outperforms the simple application of Lewis et al. (2016) by a large margin, 10.0 points in terms of clause dependency accuracy.

Static archive of linksunten.archive.indymedia.org

Book recommendations on western mysticism (gnosticism, hermeticism, neo-platonism, ...)

This article is part of the series Understanding ActivityPub, which takes a look at the ActivityPub protocol through the lens of real-world examples. The protocol exchanges are taken from ActivityPub.Academy, a modified Mastodon instance that shows ActivityPub messages in real time (see the announcement post).