Department of Computer Science and Operations Research

Prédoc III - Mohammad Pezeshki : Dynamics of Learning and Inference in Neural Networks

Title: Dynamics of Learning and Inference in Neural Networks
Jury: Aaron Courville, Yoshua Bengio, Pascal Vincent, and Simon Lacoste-Julien
Location: AA 3195
Date: Wednesday, August 29, at 1:30 p.m.


Abstract:
Neural networks have performed remarkably well on a wide variety of machine learning tasks. However, their underlying dynamics of learning, generalization, and computation are far from fully understood. A recent line of research has been dedicated to studying the fundamentals of learning, with the ambition of pushing the boundaries of neural networks further. Regarding the learning dynamics, in this work we derive exact equations of learning for a non-linear neural network trained with gradient descent on a binary classification task, under simplifying assumptions such as linear separability of the data. Under a specific initialization, we find that the overall learning procedure decomposes into several parallel and independent modes, each of which follows a sigmoidal pattern of learning through time. Regarding the generalization dynamics, we identify a subtle phenomenon in neural networks that we coin gradient starvation: a phenomenon in which the most salient features prevent other features from being learned. We hypothesize that while this phenomenon could naturally protect neural networks from over-fitting, it could also lead to poor generalization. Along this direction, we provide a preliminary theory of the learning speed of each independent mode. Finally, regarding the inference dynamics, we study recurrent neural networks as dynamical systems. For future work, we seek to identify the ties between the structure of the recurrent weights and the patterns of computation. Our preliminary results suggest that, for simple memorization tasks, the structure of the memory can be revealed by decomposing the recurrent weights.
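The sigmoidal, mode-by-mode learning pattern mentioned in the abstract can be illustrated with a small numerical sketch. This is not the talk's derivation; it is a hypothetical toy in the style of deep-linear-network analyses, where a single mode is modeled as a product of two scalar weights `u * v` trained by gradient descent to match a target singular value `s`. With a small, balanced initialization, the effective mode strength `u * v` traces a sigmoid over training time:

```python
import numpy as np

def mode_trajectory(s, eps=1e-3, lr=0.01, steps=2000):
    """Hypothetical toy: one learning 'mode' as a product u*v trained by
    gradient descent on the loss 0.5 * (s - u*v)**2.

    With small balanced initialization u = v = eps, the strength a = u*v
    stays near zero at first, then rises rapidly and saturates at s,
    i.e. a sigmoidal learning curve over time.
    """
    u = v = eps
    traj = []
    for _ in range(steps):
        err = s - u * v               # residual of the mode's target
        u, v = u + lr * err * v, v + lr * err * u  # simultaneous updates
        traj.append(u * v)            # record current mode strength
    return np.array(traj)

traj = mode_trajectory(s=2.0)
# traj starts near 0, grows monotonically, and saturates near s = 2.0
```

Because the initialization is balanced (`u == v` throughout), the dynamics reduce to a single logistic-like ODE, which is why each independent mode exhibits the plateau-then-rapid-rise shape.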

You are cordially invited.

Location: 3195, Pavillon André-Aisenstadt, 2920, Chemin de la Tour, Montréal, Canada