LearnAut 2022

9:00 Registration
9:25 Opening remarks
9:30 Invited lecture: Exploiting Hankel Matrices for Diversity Sampling with Small Annotation Budgets
In this talk I will address the annotation data bottleneck in the context of sequence modeling. I will attempt to answer the question: given a budget of N annotations, which samples should be selected for annotation? I will discuss an approach based on searching for diversity in the selected sample, by maximizing the amount of information that is useful for the learning algorithm, or equivalently by minimizing the redundancy of samples in the selection. I will formulate this in the context of spectral learning of Weighted Automata for sequence classification. I will show how we can represent unlabeled data using a Hankel matrix, and exploit the notion of spectral max-volume to find a compact sub-block from which annotation samples are drawn. I will show experiments on sequence classification that confirm that the spectral sampling strategy is efficient and yields good models.
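As a rough illustration of the pipeline sketched above (an empirical Hankel block over prefixes and suffixes, followed by a volume-maximizing sub-block selection), the following Python sketch uses invented helper names (hankel_block, greedy_maxvol_rows) and a pivoted Gram-Schmidt heuristic standing in for the actual max-volume routine; it is not the speaker's implementation.

```python
from collections import Counter
import numpy as np

def hankel_block(sequences, prefixes, suffixes):
    """Empirical Hankel block: H[i, j] = how often the string prefix_i + suffix_j
    occurs among the unlabeled sequences."""
    counts = Counter(sequences)
    H = np.zeros((len(prefixes), len(suffixes)))
    for i, p in enumerate(prefixes):
        for j, s in enumerate(suffixes):
            H[i, j] = counts[p + s]
    return H

def greedy_maxvol_rows(H, budget):
    """Pick `budget` rows with a greedy volume-maximizing heuristic: repeatedly
    select the row with the largest residual after projecting out the rows
    already chosen (pivoted Gram-Schmidt)."""
    selected, R = [], H.astype(float).copy()
    for _ in range(budget):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))
        if norms[i] < 1e-12:           # remaining rows are (numerically) redundant
            break
        selected.append(i)
        v = R[i] / norms[i]
        R = R - np.outer(R @ v, v)     # remove the chosen direction from every row
    return selected
```

The row indices returned by greedy_maxvol_rows would then be mapped back to the corresponding unlabeled sequences, which are the ones sent for annotation.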
10:15 Extending Shinohara's Algorithm for Computing Descriptive (Angluin-Style) Patterns to Subsequence Patterns (link to paper)
(Markus L. Schmid)
The introduction of pattern languages in the seminal work [Angluin, "Finding Patterns Common to a Set of Strings", JCSS 1980] has revived the classical model of inductive inference (learning in the limit, Gold-style learning). In [Shinohara, "Polynomial Time Inference of Pattern Languages and Its Application", 7th IBM Symposium on Mathematical Foundations of Computer Science 1982], a simple and elegant algorithm was introduced that, based on membership queries, computes a pattern that is descriptive for a given sample of input strings (and, consequently, can be employed in strategies for inductive inference). In this paper, we give a brief survey of the recent work [Kleest-Meißner et al., "Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints", ICDT 2022], where the classical concepts of Angluin-style (descriptive) patterns and Shinohara's algorithm are extended to a query class with applications in complex event recognition, a modern topic from databases.
10:35 Coffee break
11:00 On the limit of gradient descent for Simple Recurrent Neural Networks with finite precision (link to paper)
(Rémi Eyraud and Volodimir Mitarchuk)
Despite their great practical successes, understanding the behavior of neural networks remains a topical research issue. In particular, the class of functions learnable under a finite-precision configuration is an open question. In this paper, we propose to study the limits of gradient descent when such a configuration is set for the class of Simple Recurrent Networks. We exhibit conditions under which gradient descent will provably fail and argue that these conditions correspond to classical experimental setups. We also design a class of SRNs based on finite state automata that fulfills the failure requirements.
11:20 Analyzing Büchi Automata with Graph Neural Networks (link to paper)
(Christophe Stammet, Prisca Dotti, Ulrich Ultes-Nitsche and Andreas Fischer)
Büchi automata on infinite words present many interesting problems and are used frequently in program verification and model checking. Many of these problems on Büchi automata are computationally hard, raising the question of whether a learning-based, data-driven analysis might be more efficient than traditional algorithms. Since Büchi automata can be represented by graphs, graph neural networks are a natural choice for such a learning-based analysis. In this paper, we demonstrate how graph neural networks can be used to reliably predict basic properties of Büchi automata when trained on automatically generated random automata datasets.
11:40 Spectral Initialization of Recurrent Neural Networks: Proof of Concept (link to paper)
(Maude Lizaire, Simon Verret and Guillaume Rabusseau)
We propose a new initialization method for recurrent neural networks in the context of language modelling based on their connection with weighted finite automata. In particular, we leverage the fact that linear second-order RNNs and WFAs are equivalent and that WFAs can be learned consistently with the spectral learning algorithm. The spectral initialization consists in setting the weights of the RNN to the spectral solution of the corresponding WFA and then letting these parameters be optimized by gradient descent. As a proof of concept, we tested the spectral initialization on synthetic datasets from the PAutomaC competition, showing that the approach outperforms random initializations even with less training data and independently of the model capacity or the characteristics of the datasets (alphabet size, type and number of states of the generating machine).
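As a hedged sketch of what such an initialization could look like (assuming PyTorch and some spectral learning routine that returns the WFA parameters alpha, per-symbol transition matrices, and omega; names are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class SecondOrderRNN(nn.Module):
    """Linear second-order RNN: the hidden state is updated by a per-symbol
    transition matrix, mirroring a weighted finite automaton."""
    def __init__(self, n_symbols, n_states):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(n_states))                    # initial weights
        self.A = nn.Parameter(torch.zeros(n_symbols, n_states, n_states))   # per-symbol transitions
        self.omega = nn.Parameter(torch.zeros(n_states))                    # final weights

    def forward(self, seq):           # seq: iterable of symbol ids
        h = self.alpha
        for s in seq:
            h = h @ self.A[s]         # linear update (no activation), hence equivalent to a WFA
        return h @ self.omega         # unnormalized sequence score

def spectral_init(rnn, wfa_alpha, wfa_A, wfa_omega):
    """Copy a spectrally learned WFA solution into the RNN before gradient descent."""
    with torch.no_grad():
        rnn.alpha.copy_(torch.as_tensor(wfa_alpha, dtype=torch.float32))
        rnn.A.copy_(torch.as_tensor(wfa_A, dtype=torch.float32))
        rnn.omega.copy_(torch.as_tensor(wfa_omega, dtype=torch.float32))
```

After spectral_init, training proceeds as usual with any optimizer over the same parameters, which is the "initialize with the spectral solution, then fine-tune" scheme the abstract describes.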
12:00 Robust Attack Graph Generation (link to paper)
(Dennis Mouwen, Sicco Verwer and Azqa Nadeem)
We present a method to learn automaton models that are more robust to input modifications. It iteratively aligns sequences to a learned model, modifies the sequences to their aligned versions, and re-learns the model. Automaton learning algorithms are typically very good at modeling the frequent behavior of a software system. Our solution can be used to also learn the behavior present in infrequent sequences, as these will be aligned to the frequent ones represented by the model. We apply our method to the SAGE tool for modeling attacker behavior from intrusion alerts. In experiments, we demonstrate that our algorithm learns models that can handle noise such as symbols added to or removed from sequences. Furthermore, it learns more concise models that fit the training data better.
12:20 Learning regular non-deterministic distributions via non-linear optimization methods (link to paper)
(Wenjing Chu, Shuo Chen and Marcello Bonsangue)
Probabilistic finite automata (PFAs) are recognizers of regular distributions over finite strings, a model widely applied, for example, in speech recognition and biological systems. While the underlying structure of a PFA is just that of a normal automaton, it is well known that PFAs with a non-deterministic underlying structure are more powerful than deterministic ones. In this paper, we concentrate on passively learning non-deterministic PFAs from examples and counterexamples using a two-step procedure: first we learn the underlying structure using an algorithm for learning residual finite state automata (RFSAs), and then we learn the probabilities of states and transitions using three different optimization methods. We experimentally show, on a set of randomly generated probabilistic finite automata, that the models learned using RFSAs combined with a genetic algorithm outperform other existing methods, greatly improving the distance to the target automaton.
12:40 Lunch
14:00 Invited lecture: A Benchmark for the Machine Learning of Regular Languages

This talk presents a new benchmark for machine learning systems on sequence classification. It contains training, development, and test sets for 1980 regular languages drawn from many distinct, well-understood subregular language classes characterized in terms of their logical and algebraic structure, providing the most comprehensive, systematic suite of regular languages we are aware of. For each language, we present three nested training set sizes and four distinct test sets.

This benchmark was constructed for two reasons. First, testing machine learning systems on artificial data sets allows for controlled experiments. For example, in our setting, we consider variables such as alphabet size, minimal DFA size, model signature, and logical complexity to help understand the learning capabilities and limitations of ML systems on learning patterns over sequences. Second, previous artificial regular language benchmarks unevenly represented different kinds of regular languages.

We also examine the performance of different neural networks on this benchmark. We draw many conclusions but here are some highlights. First, across the board, neural networks have difficulty learning periodic regular languages, i.e. those that require monadic second-order logic. Second, they generally perform worse when classifying a string x for which there exists a string y such that x∈L iff y∉L and the string edit distance of x and y equals 1; we refer to such pairs of strings as forming the "border" of the language. Third, neither the size of the minimal DFA nor the size of its syntactic monoid correlates with neural network performance. Fourth, neural networks generally perform worse on classifying strings from languages defined in terms of propositional logic with precedence as opposed to propositional logic with successor.
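To make the "border" notion concrete, here is a small, hedged Python sketch; the membership oracle in_language and the brute-force enumeration of strings at edit distance one are illustrative only, not part of the benchmark.

```python
def edit_distance_one(x, alphabet):
    """All strings obtainable from x by one deletion, substitution, or insertion."""
    out = set()
    for i in range(len(x)):
        out.add(x[:i] + x[i+1:])                                 # deletion
        out.update(x[:i] + a + x[i+1:] for a in alphabet)        # substitution
    for i in range(len(x) + 1):
        out.update(x[:i] + a + x[i:] for a in alphabet)          # insertion
    out.discard(x)
    return out

def on_border(x, in_language, alphabet):
    """x lies on the border of L if some neighbour at edit distance 1 has the opposite membership."""
    return any(in_language(y) != in_language(x) for y in edit_distance_one(x, alphabet))
```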

14:45 Learning from Positive and Negative Examples: New Proof for Binary Alphabets (link to paper)
(Jonas Lingg, Mateus de Oliveira Oliveira and Petra Wolf)
One of the most fundamental problems in computational learning theory is the problem of learning a finite automaton A consistent with a finite set P of positive examples and with a finite set N of negative examples. By consistency, we mean that A accepts all strings in P and rejects all strings in N. It is well known that this problem is NP-complete. In the literature, it is stated that this NP-hardness holds even in the case of a binary alphabet. As a standard reference for this theorem, the work of Gold from 1978 is either cited or adapted. But as a crucial detail, the work of Gold actually considered Mealy machines and not deterministic finite state automata (DFAs) as they are considered nowadays. As Mealy automata are equipped with an output function, they can be more compact than DFAs which accept the same language. We show that this detail prevents the adaptation of Gold's construction for Mealy machines stated in the literature from working in the setting of DFAs. On the other hand, we give a new construction for DFAs over a binary alphabet that uses an approach independent of Gold's construction.
15:05 Towards an AAK Theory Approach to Approximate Minimization in the Multi-Letter Case (link to paper)
(Clara Lacroce, Prakash Panangaden and Guillaume Rabusseau)
We study the approximate minimization problem of weighted finite automata (WFAs): given a WFA, we want to compute its optimal approximation when restricted to a given size. We reformulate the problem as a rank-minimization task in the spectral norm, and propose a framework to apply Adamyan-Arov-Krein (AAK) theory to the approximation problem. This approach has already been successfully applied to the case of WFAs and language modelling black boxes over one-letter alphabets (Balle et al. 2021, Lacroce et al. 2021). Extending the result to multi-letter alphabets requires solving the following two problems. First, we need to reformulate the approximation problem in terms of noncommutative Hankel operators and noncommutative functions, in order to apply results from multivariable operator theory. Second, to obtain the optimal approximation we need a version of noncommutative AAK theory that is constructive. In this paper, we successfully tackle the first step, while the second challenge remains open.
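For context, a hedged paraphrase of the classical one-letter AAK result that this line of work builds on (not the paper's multi-letter formulation): writing sigma_0 >= sigma_1 >= ... for the singular values of a bounded Hankel operator H,

```latex
\sigma_k(H) \;=\; \min\bigl\{\, \|H - G\| \;:\; G \text{ is a Hankel operator},\ \operatorname{rank}(G) \le k \,\bigr\}
```

so the best rank-k approximation in spectral norm can itself be taken to be a Hankel operator, and hence corresponds to a smaller automaton; the second step above asks for a constructive noncommutative analogue of this fact.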
15:25 An Algebraic Approach to Learning and Grounding (link to paper)
(Johanna Björklund, Adam Dahlgren Lindström and Frank Drewes)
We consider the problem of learning the semantics of composite algebraic expressions from examples. The outcome is a versatile framework for studying learning tasks that can be put into the following abstract form: The input is a partial algebra A and a finite set of samples (φ1, O1), (φ2, O2), ..., each consisting of an algebraic term φi and a set of objects Oi. The objective is to simultaneously fill in the missing algebraic operations in A and ground the variables of every φi in Oi, so that the combined value of the terms is optimised. We demonstrate the applicability of this framework through case studies in grammatical inference, picture-language learning, and the grounding of logic scene descriptions.
15:35 Coffee break
16:00 Invited lecture: Reward Machines: Formal Languages and Automata for Reinforcement Learning
Reinforcement Learning (RL) is proving to be a powerful technique for building sequential decision making systems in cases where the complexity of the underlying environment is difficult to model. Two challenges that face RL are reward specification and sample complexity. Specification of a reward function — a mapping from state to numeric value — can be challenging, particularly when reward-worthy behaviour is complex and temporally extended. Further, when reward is sparse, it can require millions of exploratory episodes for an RL agent to converge to a policy of reasonable quality. In this talk I'll show how formal languages and automata can be used to represent complex non-Markovian reward functions. I'll present the notion of a Reward Machine, an automata-based structure that provides a normal form representation for reward functions, exposing function structure in a manner that greatly expedites learning. Finally, I'll also show how these machines can be generated via symbolic planning or learned from data, solving (deep) RL problems that otherwise could not be solved.
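As a toy illustration of the idea (a standard "get coffee, then reach the office" example from the reward-machine literature, not code from the talk), a reward machine can be as simple as a labeled transition table that emits rewards:

```python
# Machine states u0 -> u1 -> u2; transitions fire on high-level events observed
# in the environment, and each transition emits a reward. Conditioning on the
# machine state makes an otherwise non-Markovian reward Markovian.
REWARD_MACHINE = {
    "u0": {"got_coffee": ("u1", 0.0)},   # event -> (next machine state, reward)
    "u1": {"at_office":  ("u2", 1.0)},
    "u2": {},                            # terminal: task completed
}

def step(machine_state, events):
    """Advance the reward machine on the set of events observed this time step."""
    for event, (next_state, reward) in REWARD_MACHINE.get(machine_state, {}).items():
        if event in events:
            return next_state, reward
    return machine_state, 0.0            # no transition fires: stay, zero reward
```

The RL agent then conditions its policy on the pair (environment state, machine state), which restores the Markov property for a temporally extended reward.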
16:45 Learning state machines via efficient hashing of future traces (link to paper)
(Robert Baumgartner and Sicco Verwer)
State machines are popular models for describing and visualizing discrete systems such as software systems, and for representing regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from the beginning and load this data into memory. This makes it hard to apply them to continuously streaming data and results in large memory requirements when dealing with large datasets. In this paper we propose a method to learn state machines from data streams using the count-min sketch data structure to reduce memory requirements. We apply state merging using the well-known red-blue framework to reduce the search space. We implemented our approach in an established framework for learning state machines, and evaluated it on a well-known dataset to provide experimental data, showing the effectiveness of our approach with respect to the quality of the results and run-time.
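For readers unfamiliar with the data structure, here is a minimal count-min sketch in Python; the width, depth, and hashing scheme are arbitrary choices for illustration, not the authors' settings.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in sub-linear memory; estimates only overcount."""
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item):
        for d in range(self.depth):
            h = hashlib.blake2b(f"{d}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for d, idx in enumerate(self._hashes(item)):
            self.table[d][idx] += count

    def estimate(self, item):
        # Each row overestimates, so the minimum over rows is the best guess.
        return min(self.table[d][idx] for d, idx in enumerate(self._hashes(item)))
```

In a streaming state-merging setting, each state could keep such a sketch of hashed future traces, and red-blue merge tests could compare sketch estimates rather than exact counts.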
17:05 Marginal Inference queries in Hidden Markov Models under context-free grammar constraints (link to paper)
(Mohamed Reda Marzouk and Colin de la Higuera)
The primary use of any probabilistic model involving a set of random variables is to run inference and sampling queries on it. Inference queries in classical probabilistic models are concerned with the computation of marginal or conditional probabilities of events given as input. When the probabilistic model is sequential, more sophisticated marginal inference queries involving complex grammars may be of interest in fields such as computational linguistics and NLP. In this work, we address the question of computing the likelihood of context-free grammars (CFGs) in Hidden Markov Models (HMMs). We provide a dynamic programming algorithm for the exact computation of the likelihood for the class of unambiguous context-free grammars. We show that the problem is NP-hard, even with the promise that the input CFG has a degree of ambiguity less than or equal to 2. We then propose a fully polynomial randomized approximation scheme (FPRAS) to approximate the likelihood for the case of polynomially-bounded ambiguous CFGs.
17:25 Spectral Regularization: an Inductive Bias for Sequence Modeling (link to paper)
(Kaiwen Hou and Guillaume Rabusseau)
Various forms of regularization in learning tasks strive for different notions of simplicity. This paper presents a spectral regularization technique, which attaches a unique inductive bias to sequence modeling based on an intuitive concept of simplicity defined in the Chomsky hierarchy. From fundamental connections between Hankel matrices and regular grammars, we propose to use the trace norm of the Hankel matrix, the tightest convex relaxation of its rank, as the spectral regularizer. To cope with the fact that the Hankel matrix is bi-infinite, we propose an unbiased stochastic estimator for its trace norm. Ultimately, we demonstrate experimental results on Tomita grammars, which exhibit the potential benefits of spectral regularization and validate the proposed stochastic estimator.
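A hedged sketch of how such a regularizer could be wired into training: here a finite Hankel sub-block stands in for the bi-infinite matrix, and a plain nuclear norm replaces the paper's unbiased stochastic estimator; the helper names are invented.

```python
import torch

def hankel_subblock(model_score, prefixes, suffixes):
    """H[i, j] = f(prefix_i + suffix_j), where model_score maps a sequence to a scalar tensor."""
    rows = [torch.stack([model_score(p + s) for s in suffixes]) for p in prefixes]
    return torch.stack(rows)

def spectral_penalty(model_score, prefixes, suffixes, lam=1e-3):
    """Trace-norm (nuclear-norm) penalty on a finite Hankel sub-block of the model."""
    H = hankel_subblock(model_score, prefixes, suffixes)
    return lam * torch.linalg.matrix_norm(H, ord="nuc")   # sum of singular values

# total_loss = task_loss + spectral_penalty(model_score, prefixes, suffixes)
```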
17:35 Sequential Density Estimation via Nonlinear Continuous Weighted Finite Automata (link to paper)
(Tianyu Li, Bogdan Mazoure and Guillaume Rabusseau)
Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is probability distribution estimation over sequences of discrete symbols. Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-based models, due to limitations in both the expressiveness of the model and the tractability of approximating density functions via CWFAs. In this paper, we propose a nonlinear extension to the CWFA model to first improve its expressiveness, which we refer to as nonlinear continuous WFAs (NCWFAs). Then we leverage the so-called RNADE method, a well-known density estimator based on neural networks, and propose the RNADE-NCWFA model. The RNADE-NCWFA model computes a density function by design. We show that this model is strictly more expressive than the Gaussian HMM model, which CWFAs cannot approximate. Empirically, we conduct a synthetic experiment using Gaussian HMM generated data. We focus on evaluating the model's ability to estimate densities for sequences of varying lengths (longer than those seen in training). We observe that our model performs the best among the compared baseline methods.
17:45 Towards Efficient Active Learning of PDFA (link to paper)
(Franz Mayr, Sergio Yovine, Federico Pan, Nicolas Basset and Thao Dang)
We propose a new active learning algorithm for PDFA based on three main aspects: a congruence over states which takes into account next-symbol probability distributions, a quantization that copes with differences in distributions, and an efficient tree-based data structure. Experiments showed significant performance gains with respect to reference implementations.
18:05 Closing remarks
18:10 Workshop ends