Thesis Defense - Maude Lizaire
Hello everyone,
You are cordially invited to Maude Lizaire's doctoral thesis defense on September 19 at 1:30 p.m. (hybrid mode).
Title: Connecting Neural Networks, Automata Theory and Tensor Network Methods for Sequence Data Learning
Date: Friday, September 19, 2025, at 1:30 p.m.
Location: Agora, Mila, 6650 rue Saint-Urbain
Jury
| Role | Name |
| --- | --- |
| President | Ioannis Mitliagkas |
| Advisor | Guillaume Rabusseau |
| Jury member | Pierre-Luc Bacon |
| External examiner | Tai-Danae Bradley |
Abstract:
Despite the remarkable progress of deep learning, neural networks remain largely “black-box” models that are challenging to scale sustainably. To address these limitations, this thesis introduces a cross-disciplinary approach to sequence modeling that connects neural networks with automata theory and tensor network methods. Automata theory contributes formal guarantees and tractability, while tensor networks offer efficient representations that mitigate the curse of dimensionality. The starting point of this thesis is the recently uncovered connection between second-order recurrent neural networks (2RNNs; RNNs with multiplicative interactions between the input and the previous hidden state), weighted finite automata (WFAs), and matrix product states (MPS), a tensor network architecture also known as a tensor train.
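For concreteness, a second-order recurrent update can be written as follows; the placement of the nonlinearity and the mode ordering of the weight tensor are illustrative assumptions rather than the thesis's exact parameterization:

$$
(h_t)_i = \sigma\Big(\sum_{j,k} \mathcal{A}_{j,k,i}\,(h_{t-1})_j\,(x_t)_k\Big)
$$

In contrast with a first-order RNN's additive update $h_t = \sigma(W x_t + U h_{t-1})$, every coordinate of the input multiplies every coordinate of the previous state; dropping $\sigma$ yields the linear 2RNNs whose equivalence with WFAs and MPS the thesis builds on.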
Our first contribution introduces a spectral initialization for 2RNNs, which sets the weights of the model to the solution of the spectral learning algorithm for WFAs. This initialization leverages the data before gradient-based training begins, leading to faster convergence and improved performance over random initialization, even with less data or smaller models.
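To give a sense of what the spectral algorithm produces, here is a minimal NumPy sketch; the function name, argument names, and the factorization convention (prefix factor $U\Sigma$, suffix factor $V^\top$) are assumptions for illustration, not the thesis's code:

```python
import numpy as np

def spectral_wfa(H, H_sym, h_pref, h_suff, rank):
    """Minimal sketch of the spectral algorithm for WFAs (hypothetical API).

    H      : Hankel matrix, H[p, s] = f(prefix_p + suffix_s)
    H_sym  : dict of shifted Hankel matrices, H_sym[a][p, s] = f(prefix_p + a + suffix_s)
    h_pref : column of H for the empty suffix, h_pref[p] = f(prefix_p)
    h_suff : row of H for the empty prefix, h_suff[s] = f(suffix_s)
    rank   : number of states of the recovered WFA
    """
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]      # truncated rank-r factorization
    P_pinv = np.linalg.pinv(U * S)                      # pseudo-inverse of prefix factor U @ diag(S)
    A = {a: P_pinv @ Ha @ Vt.T for a, Ha in H_sym.items()}  # one transition matrix per symbol
    alpha = Vt @ h_suff                                 # initial weight vector
    omega = P_pinv @ h_pref                             # final weight vector
    return alpha, A, omega  # f(a1...an) ~= alpha @ A[a1] @ ... @ A[an] @ omega
```

The spectral initialization described above would then copy $(\alpha, \{A_a\}, \omega)$ into the corresponding weights of a linear 2RNN before gradient-based training.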
The second contribution characterizes the effect of second-order interactions on the expressive power of recurrent neural networks. While multiplicative dependencies between the input and the hidden state strictly increase the capacity of RNNs, they come at the cost of a large third-order weight tensor. One approach to mitigating this cost is to parameterize the second-order interactions with a CP decomposition. The resulting model, which we refer to as a CPRNN, is characterized by a rank, and we use this additional hyperparameter to formally compare the expressivity of recurrent architectures with varying degrees of multiplicative interaction. The rank proves to be an effective gauge of the bias–variance trade-off, and we corroborate these theoretical findings empirically with language modeling experiments.
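A minimal sketch of a CPRNN-style cell, assuming the three CP factors are named B, C, and D (hypothetical names) and omitting the bias and first-order terms a complete model would typically include:

```python
import numpy as np

def cprnn_cell(x_t, h_prev, B, C, D):
    """One CPRNN step with A[j, k, i] = sum_r B[j, r] * C[k, r] * D[i, r].

    B : (hidden_dim, rank), C : (input_dim, rank), D : (hidden_dim, rank).
    Storage and compute scale with the rank instead of hidden_dim**2 * input_dim.
    """
    z = (B.T @ h_prev) * (C.T @ x_t)  # rank-dimensional elementwise product
    return np.tanh(D @ z)             # new hidden state
```

Varying the rank interpolates between cheap, low-capacity models and the full second-order interaction, which is the bias–variance gauge referred to above.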
The third contribution studies the role of depth in linear RNNs. Unlike feedforward networks, where stacking layers without nonlinearities collapses to a single linear map, linear RNNs exhibit a subtle interplay between depth and recurrence, which we formally examine in this work. We show that deeper models are strictly more expressive thanks to an increased memory capacity. We extend our analysis to 2RNNs and show that, unlike linear RNNs, their computations are polynomial, with a degree that grows with depth. Empirically, we validate our theory on real and synthetic tasks using RNNs, 2RNNs, and state-space models.
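For concreteness, one standard stacked parameterization, an assumption here rather than the thesis's exact setup, makes the interplay between depth and recurrence visible:

$$
h_t^{(\ell)} = A^{(\ell)} h_{t-1}^{(\ell)} + B^{(\ell)} h_t^{(\ell-1)}, \qquad h_t^{(0)} = x_t
$$

Unrolling over time shows that the top layer computes a linear filter of $x_1, \dots, x_t$ whose coefficients are sums of products of the $A^{(\ell)}$ and $B^{(\ell)}$ matrices, so stacking enlarges the family of realizable filters instead of collapsing to a single linear map as it would in a feedforward network.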