CRM 50th Anniversary Program: Yoshua Bengio - Deep Learning for AI

 Deep Learning for AI

by

Yoshua Bengio

 Université de Montréal

Monday, April 16, 2018, 11:30-12:30, Room 1360, Pavillon André-Aisenstadt

    Université de Montréal, 2920 Chemin de la Tour

 

The lecture will be given in English.

Abstract:

There has been rather impressive progress recently with brain-inspired statistical learning algorithms based on the idea of learning multiple levels of representation, also known as neural networks or deep learning. They shine in artificial intelligence tasks involving perception and generation of sensory data like images or sounds, and to some extent in understanding and generating natural language. We have proposed new generative models which lead to training frameworks very different from the traditional maximum likelihood framework, borrowing instead from game theory. Theoretical understanding of the success of deep learning is work in progress but relies on representation aspects as well as optimization aspects, which interact. At the heart is the ability of these learning mechanisms to capitalize on the compositional nature of the underlying data distributions, meaning that some functions can be represented exponentially more efficiently with deep distributed networks compared to approaches like standard non-parametric methods, which lack both depth and distributed representations. On the optimization side, we now have evidence that local minima (due to the highly non-convex nature of the training objective) may not be as much of a problem as thought a few years ago, and that training with variants of stochastic gradient descent actually helps to quickly find better-generalizing solutions. Finally, new and interesting questions and answers are arising regarding learning theory for deep networks: why even very large networks do not necessarily overfit, and how the representation-forming structure of these networks may give rise to better error bounds which do not depend entirely on the i.i.d. data hypothesis.
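For reference, the game-theoretic training framework mentioned above presumably refers to generative adversarial networks (Goodfellow et al., 2014), in which a generator G and a discriminator D are trained as opponents in a two-player minimax game rather than by maximizing likelihood:

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
\]

Here p_data is the data distribution and p_z a fixed noise distribution: the discriminator is trained to tell real samples from generated ones, while the generator is trained to fool it, so the training signal comes from the adversary rather than from an explicit likelihood.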