By Jeremy Nixon [jnixon2@gmail.com]. Nov. 2017. Updated June 2018.

Overview

Categorization of Breakthroughs / Contents
Major / Minor Researchers List (All appearing on papers)
Genealogy
Sorted Researchers by Paper Count
Deep Learning
- Scalability and Speed
- Convolutional Neural Networks
- Recurrent Neural Networks
- Privacy
- Understanding / Theory
- Regularization
Applications
- Speech Recognition
- Image Categorization
- Image Captioning
- Machine Translation
- Natural Language Understanding
- Multi-Modal
- Pedestrian Detection
- Grasp Detection
- Go
- Video
- Dialogue
- 3D Object Reconstruction
- Speaker Verification
- Health Care
- Theorem Proving
- Music
- Pose Estimation
- Speech Generation
- Super Resolution
- Chemistry
- Robotics
  - Autonomous Vehicles
- Physics
- Device Placement
- Games
- Art
Unsupervised Learning
Attention
Memory
Transfer Learning
Representation Learning
Reinforcement Learning
- Model-Based Reinforcement Learning
- Multi-Task Learning
Metalearning
- Neural Programming
- Hyperparameter Optimization
Generative
- GANs
Interpretability
Tools, Environments & Datasets
Adversarial Examples
Multi-Agent Systems
Variational Inference
Kernel Machines
Collaborative Filtering
Graphical / Relational Learning
Miscellaneous
Deep Learning
- Scalability and Speed
  - Large Scale Distributed Deep Networks
  - Multiframe Deep Neural Networks for Acoustic Modeling
  - Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks
  - TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
  - Distilling the Knowledge in a Neural Network
  - Deep Networks with Large Output Spaces
  - TensorFlow: A System for Large-Scale Machine Learning
  - Revisiting Distributed Synchronous SGD
  - Depthwise Separable Convolutions for Neural Machine Translation
  - Large Scale Distributed Neural Network Training Through Online Distillation
  - Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
- Convolutional Neural Networks
  - Going Deeper with Convolutions [Inception]
  - Rethinking the Inception Architecture for Computer Vision
  - Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
  - Towards Understanding the Invertibility of Convolutional Neural Networks
- Recurrent Neural Networks
  - Sequence to Sequence Learning with Neural Networks
  - Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks
  - Recurrent Neural Network Regularization [Also, Language Modeling]
  - Semi-supervised Sequence Learning
  - Learning to Execute
  - An Empirical Exploration of Recurrent Network Architectures
  - A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
  - Using Fast Weights to Attend to the Recent Past
  - Unsupervised Pre-training for Sequence to Sequence Learning
  - Order Matters: Sequence to Sequence for Sets
  - Multi-Task Sequence to Sequence Learning
  - Generating Sentences from a Continuous Space
  - Exponential expressivity in deep neural networks through transient chaos
  - An Online Sequence-to-Sequence Model Using Partial Conditioning
  - A Neural Transducer
  - Tuning Recurrent Neural Networks with Reinforcement Learning
  - Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
  - SGD Learns the Conjugate Kernel Class of the Network
  - Learning Hierarchical Information Flow with Recurrent Neural Modules
  - Latent Sequence Decompositions
  - Capacity and Trainability in Recurrent Neural Networks
  - Initialization Matters: Orthogonal Predictive State Recurrent Neural Networks
- Privacy
  - Deep Learning with Differential Privacy
  - Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data
  - Glimmers: Resolving the Privacy / Trust Quagmire
  - Scalable Private Learning with PATE
  - Learning Differentially Private Recurrent Language Models [Also, Language Modeling]
- Understanding / Theory
  - Qualitatively Characterizing Neural Network Optimization Problems
  - Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
  - Understanding Deep Learning Requires Rethinking Generalization
  - Sharp Minima Can Generalize for Deep Nets
  - On the Expressive Power of Deep Neural Networks
  - Nonlinear Random Matrix Theory for Deep Learning
  - Mean Field Residual Networks: On the Edge of Chaos
  - Identity Matters in Deep Learning
  - Geometry of Neural Network Loss Surfaces via Random Matrix Theory
  - Explaining the Learning Dynamics of Direct Feedback Alignment
  - Deep Information Propagation
  - The Emergence of Spectral Universality in Deep Networks
  - Sensitivity and Generalization in Neural Networks: An Empirical Study
  - Gradient Descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
  - Deep Neural Networks as Gaussian Processes
  - A Bayesian Perspective on Generalization and Stochastic Gradient Descent
- Regularization
  - Adding Gradient Noise Improves Learning for Very Deep Networks
  - Surprising Properties of Dropout in Deep Networks
  - Regularizing Neural Networks by Penalizing Confident Output Distributions
  - A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
- Training Highly Multiclass Classifiers
- Random Walk Initialization for Training Very Deep Feedforward Networks
- Learning Factored Representations in a Deep Mixture of Experts
- Training Deep Neural Networks on Noisy Labels with Bootstrapping
- Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
- Reward Augmented Maximum Likelihood for Neural Structured Prediction
- MuProp: Unbiased Backpropagation for Stochastic Neural Networks
- Chained predictions using convolutional neural networks
- Training a Subsampling Mechanism in Expectation
- Resurrecting the Sigmoid in deep learning through dynamical isometry: theory and practice
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- On Blackbox Backpropagation and Jacobian Sensing
- Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
- Critical Hyper-Parameters: No Random, No Cry
- Distilling a Neural Network into a Soft Decision Tree
- Categorical Reparameterization with Gumbel-Softmax
- Training Confidence-Calibrated Classifiers For Detecting Out-of-Distribution Samples
- Fidelity-Weighted Learning
- Don’t Decay the Learning Rate, Increase the Batch Size
Applications
- Speech Recognition
  - Deep Neural Networks for Acoustic Modeling in Speech Recognition
  - Application of Pre-trained Deep Neural Networks to Large Vocabulary Speech Recognition
  - On Rectified Linear Units for Speech Processing
  - Multilingual Acoustic Models Using Distributed Deep Neural Networks
  - An Empirical Study of Learning Rates in DNNs for Speech Recognition
  - Word Embeddings for Speech Recognition
  - Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models
  - Learning the Speech Front-end with Raw Waveform CLDNNs
  - Acoustic Modeling for Google Home
  - Multilingual Speech Recognition With a Single End-to-End Model
  - An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model
- Image Categorization
  - Using Web Co-occurrence Statistics for Improving Image Categorization
  - The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition
- Image Captioning
  - Grounded Compositional Semantics for Finding and Describing Images with Sentences
  - Show and Tell: A Neural Image Caption Generator
  - Learning Semantic Relationships for Better Action Retrieval in Images
- Machine Translation
  - Exploiting Similarities among Languages for Machine Translation
  - Addressing the Rare Word Problem in Neural Machine Translation
  - Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
  - Sequence-to-Sequence Models Can Directly Translate Foreign Speech
  - Massive Exploration of Neural Machine Translation Architectures
- Natural Language Understanding
  - Efficient Estimation of Word Representations in Vector Space
  - Distributed Representations of Words and Phrases and Their Compositionality
  - Zero-Shot Learning by Convex Combination of Semantic Embeddings
  - Distributed Representations of Sentences and Documents
  - Sentence Compression by Deletion with LSTMs
  - Grammar as a Foreign Language
  - BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
  - Multilingual Language Processing From Bytes
  - Exploring the Limits of Language Modeling
  - Towards better decoding and language model integration in sequence to sequence models
  - Learning to Skim Text
  - Get To The Point: Summarization with Pointer-Generator Networks
  - Generating Wikipedia by Summarizing Long Sequences
  - An Efficient Framework for Learning Sentence Representations
- Multi-Modal
  - DeViSE: A Deep Visual-Semantic Embedding Model
  - Modulating Early Visual Processing by Language
  - Context-aware Captions from Context-agnostic Supervision
  - Better Text Understanding Through Image-To-Text Transfer
- Pedestrian Detection
  - Real Time Pedestrian Detection with Deep Network Cascades
  - Pedestrian Detection with a Large Field-Of-View Deep Network
- Grasp Detection
  - Real-Time Grasp Detection Using Convolutional Neural Networks
- Go
  - Move Evaluation in Go Using Deep Convolutional Neural Networks
  - Mastering the game of Go with deep neural networks and tree search
- Video
  - Beyond Short Snippets: Deep Networks for Video Classification
- Dialogue
  - A Neural Conversational Model
  - Smart Reply: Automated Response Suggestion for Email
  - Adversarial Evaluation of Dialogue Models
  - Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models
- 3D Object Reconstruction
  - Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
- Speaker Verification
  - End-to-End Text-Dependent Speaker Verification
- Health Care
  - Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs
- Theorem Proving
  - DeepMath - Deep Sequence Models for Premise Selection
  - Deep Network Guided Proof Search
- Music
  - Audio Deepdream: Optimizing Raw Audio with Convolutional Networks
  - Generating Music by Fine-Tuning Recurrent Neural Networks with Reinforcement Learning
  - Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
- Pose Estimation
  - Towards Accurate Multi-person Pose Estimation in the Wild
- Speech Generation
  - Tacotron: Towards End-to-End Speech Synthesis
  - RNN Approaches to Text Normalization: A Challenge
  - On Using Backpropagation for Speech Texture Generation and Voice Conversion
  - Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [Tacotron 2]
- Super Resolution
  - Pixel Recursive Super Resolution
- Chemistry
  - Neural Message Passing for Quantum Chemistry
- Robotics
  - Autonomous Vehicles
- Physics
  - Accelerating Eulerian Fluid Simulation with Convolutional Networks
- Device Placement
  - Device Placement Optimization with Reinforcement Learning
  - A Hierarchical Model for Device Placement
- Games
  - Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
- Art
  - A Neural Representation of Sketch Drawings
Unsupervised Learning
- Building High-level Features Using Large Scale Unsupervised Learning
- Towards Principled Unsupervised Learning
- Time-Contrastive Networks: Self-Supervised Learning from Video
- Stochastic Variational Video Prediction [Also, Model-Based RL]
- Short and Deep: Sketching Neural Networks
- Geometry-Based Next Frame Prediction from Monocular Video
- Decomposing Motion and Content for Natural Video Sequence Prediction
- Cross-View Training for Semi-Supervised Learning
Attention
- On Learning Where to Look
- Pointer Networks
- Attention for Fine-Grained Categorization
- Listen, Attend and Spell
- Collective Entity Resolution with Multi-Focal Attention
- Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
- Online and Linear-Time Attention by Enforcing Monotonic Alignments
- Learning Hard Alignments with Variational Inference [Hard Attention]
- Efficient Attention using a Fixed-Size Memory Representation
- Attention is All You Need
- An Analysis of “Attention” in Sequence-to-Sequence Models
- Monotonic Chunkwise Attention
Memory
- Learning to Remember Rare Events
Transfer Learning
- Net2Net: Accelerating Learning via Knowledge Transfer
- Domain Separation Networks
- Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
- PathNet: Evolution Channels Gradient Descent in Super Neural Networks
- One Model to Learn Them All
- Exploring the structure of a real-time, arbitrary neural artistic stylization network
- A Brief Study of In-Domain Transfer and Learning from Fewer Samples using a Few Simple Priors
Representation Learning
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
- A Learned Representation for Artistic Style
- Learning Latent Permutations with Gumbel-Sinkhorn Networks
Reinforcement Learning
- Model-Based Reinforcement Learning
  - Unsupervised Learning for Physical Interaction through Video Prediction [Also, Robotics]
  - Continuous Deep Q-Learning with Model-based Acceleration
  - Value Prediction Network
  - Learning to Generate Long-term Future via Hierarchical Prediction
  - Discrete Sequential Prediction of Continuous Actions for Deep RL
  - Deep Visual Foresight for Planning Robot Motion [Also, Robotics]
  - Temporal Difference Models: Model-Free Deep RL for Model-Based Control
  - Learning Unsupervised Latent Dynamics Models for Multi-task Continuous Control from Pixels
- Multi-Task Learning
  - Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
- Unsupervised Perceptual Rewards for Imitation Learning
- Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
- Robust Adversarial Reinforcement Learning [Also, Multi-Agent Systems]
- REBAR: Low-Variance, unbiased gradient estimates for discrete latent variable models
- Q-Prop: Sample Efficient Policy Gradient with an Off-Policy Critic
- Particle Value Functions
- Path Integral Guided Policy Search [Also, Robotics]
- Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
- Improving Policy Gradient by Exploring Under-Appreciated Rewards
- Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates
- Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
- Changing Model Behavior at Test Time Using Reinforcement Learning
- Bridging the Gap Between Value and Policy Based Reinforcement Learning
- A comparative study of counterfactual estimators
- PRM-RL: Long Range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
- Path Consistency Learning in Tsallis Entropy Regularized MDPs
- Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
- Deep Bayesian Bandits Showdown
Metalearning
- Neural Programming
- Hyperparameter Optimization
- Learned Optimizers that Scale and Generalize
- HyperNetworks
- Supervised Learning of Unsupervised Learning Rules
- MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
- A Meta-Learning Perspective on Cold-Start Recommendations for Items
- Meta-Learning for Semi-Supervised Few-Shot Classification
- Generalizing Hamiltonian Monte Carlo with Neural Networks
Generative
- GANs
- Experiments in Handwriting with a Neural Network
- From optimal transport to generative modeling: the VEGAN cookbook
- Density Estimation Using Real NVP
- A Neural Representation of Sketch Drawings
- Wasserstein Auto-Encoders
- Stochastic Variational Video Prediction
- Latent Constraints: Learning to Generate Conditionally From Unconditional Generative Models
Interpretability
- Deconvolution and Checkerboard Artifacts
- Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow
- Towards A Rigorous Science of Interpretable Machine Learning
- The (Un)reliability of Saliency Methods
- Input Switched Affine Networks: An RNN Architecture Designed for Interpretability [Also, Recurrent Neural Networks]
- VisualBackProp: Efficient Visualization of CNNs
- Learning How to Explain Neural Networks: PatternNet and PatternAttribution
- Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs [Also, Recurrent Neural Networks]
Tools, Environments & Datasets
- One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Adversarial Examples
- Intriguing Properties of Neural Networks
- Explaining and Harnessing Adversarial Examples
- Virtual Adversarial Training for Semi-Supervised Text Classification
- The Space of Transferable Adversarial Examples
- Adversarial Examples in the Physical World
- Adversarial Training Methods for Semi-Supervised Text Classification
- Adversarial Machine Learning at Scale
- Thermometer Encoding: One Hot Way to Resist Adversarial Examples
- Intriguing Properties of Adversarial Examples
- Ensemble Adversarial Training: Attacks and Defenses
- Adversarial Spheres
Multi-Agent Systems
- Learning to Protect Communications with Adversarial Neural Cryptography
- Adversarial Autoencoders
- XGAN: Unsupervised Image-To-Image Translation for Many-To-Many Mappings
- Supervision via Competition: Robot Adversaries for Learning Tasks
Variational Inference
- Variational Boosting: Iteratively Refining Posterior Approximations
- Reducing Reparameterization Gradient Variance
- Filtering Variational Objectives
Kernel Machines
- Fastfood - Approximating Kernel Expansions in Loglinear Time
- Random Features for Compositional Kernels
- The Geometry of Random Features
Collaborative Filtering
- Local Collaborative Ranking
Graphical / Relational Learning
- Large-Scale Object Classification Using Label Relation Graphs
- Graph Searching Games and Width Measures for Directed Graphs
- Graph Partition Neural Networks for Semi-Supervised Classification
Miscellaneous
- TensorFlow: Learning Functions at Scale
- Deep Learning Games
- Tangent: Automatic Differentiation Using Source Code Transformation in Python
- ExtDict: Extensible Dictionaries for Data and Platform-Aware Large Scale Learning
- Dynamic Routing between Capsules
- Climbing a Shaky Ladder: Better Adaptive Risk Estimation
- Avoiding Discrimination through Causal Reasoning
- Who Said What: Modeling Individual Labelers Improves Classification
- Matrix Capsules with EM Routing
- Graph sketching-based Space-efficient Data Clustering

Major Researchers [10+ Papers / Founding]

Jeff Dean
Samy Bengio
Geoffrey Hinton
~~Andrew Ng~~
Quoc Le
Greg Corrado
Vincent Vanhoucke
Yoram Singer
Ian Goodfellow
~~Tomas Mikolov~~
~~Ilya Sutskever~~
~~Oriol Vinyals~~
~~Marc’Aurelio Ranzato~~
Christian Szegedy
Navdeep Jaitly
Mohammad Norouzi
Lukasz Kaiser
Jonathon Shlens

Minor Researchers

Rajat Monga
Kai Chen
Matthieu Devin
Mark Mao
Andrew Senior
Paul Tucker
Ke Yang
Patrick Nguyen
Dumitru Erhan
Eugene Ie
Andrew Rabinovich
Jon Shlens
Yoram Singer
Ciprian Chelba
Mike Schuster
Qi Ge
Thorsten Brants
Tamas Sarlos
Georg Heigold
Andrea Frome
Maya Gupta
David Sussillo
Dragomir Anguelov
Alexander Toshev
Andrew Dai
Anelia Angelova
Alex Krizhevsky
Lukasz Kaiser
Terry Koo
Slav Petrov
Tara Sainath
Hasim Sak
Pierre Sermanet
Esteban Real
Peter Liu
Sergey Levine
Amit Daniely
Roy Frostig
Martin Abadi
Zhifeng Chen
Yonghui Wu
Dale Schuurmans
Jianmin Chen
Rafal Jozefowicz
Sergey Ioffe
Honglak Lee
Manjunath Kudlur
Karol Kurach
Minh-Thang Luong
John Nahm
Alexander Alemi
Jascha Sohl-Dickstein
Noam Shazeer
David Ha
Shan Carter
Chris Olah
Ignacio Moreno
Douglas Eck
Natasha Jaques
Shixiang Gu
Konstantinos Bousmalis
Francois Chollet
Geoffrey Irving
Amarnag Subramanya
Michael Ringgaard
Fernando Pereira
Adam Roberts
Cinjon Resnick
Anjuli Kannan
Ryan Adams
David Dohan
Luke Metz
Kelvin Xu
Jan Chorowski
Colin Raffel
Dieterich Lawson
George Papandreou
Kevin Murphy
Jonathan Tompson
Olivier Bousquet
Sylvain Gelly
Olivier Teytaud
Damien Vincent
Eric Jang
Jasmine Hsu
Been Kim
Bart van Merrienboer
Alexander Wiltschko
Dan Moldovan
Yuxuan Wang
RJ Skerry-Ryan
James Davidson
Ron Weiss
Kunal Talwar
Barret Zoph
Maithra Raghu
Justin Gilmer
Jeffrey Pennington
Samuel Schoenholz
Gabriel Pereyra
George Tucker
Vineet Gupta
Ryan Dahl
Azalia Mirhoseini
Andy Davis
Ashish Vaswani
Krzysztof Maziarz
Vikas Sindhwani
Irwan Bello
Hugo Larochelle
Vijay Vasudevan
Hieu Pham
Jesse Engel
Denny Britz
Anna Goldie
Connor Schenck
Ruben Villegas
Yuliang Zou
Sungryull Sohn
Danijar Hafner
Alex Irpan
Chung-Cheng Chiu
Kevin Swersky
Olga Wichrowska
Jakob Foerster
Andrew Lampinen
David So
Fred Bertsch
Reza Mahjourian
Yasaman Bahri
Ofir Nachum
Melody Guan
Julian Ibarz
Benoit Steiner
Rasmus Larsen
Ethan Holly
Gal Chechik
Augustus Odena
Christopher Olah
Jasmine Collins
Michal Jastrzebski
Philip Haeusser
Mario Lucic
Richard Sproat
Alexey Kurakin
Takeru Miyato
Kristofer Schlachter
Tomer Koren
Ayush Sekhari
Matthew Kelcey
Laura Downs

Genealogy Founding Teams

Jeff Dean
Samy Bengio
Geoffrey Hinton
~~Andrew Ng~~
Quoc Le
Greg Corrado
Vincent Vanhoucke
Yoram Singer
Ian Goodfellow
~~Tomas Mikolov~~
~~Rajat Monga~~
Kai Chen (Brain NY)
Matthieu Devin
Mark Mao
~~Marc’Aurelio Ranzato (Brain NY)~~
Andrew Senior
Paul Tucker
Ke Yang
Patrick Nguyen
Yoram Singer
Dzmitry Bahdanau

In the early days, the exploration was mainly in scaling deep learning and discovering new applications to speech recognition, image categorization and language modeling.

Tensorflowers: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng

Massive acceleration of Brain papers into ICLR 2016… Tensorflowers start making their way onto papers.

Noisy Counts (Scraped)

[29, ‘Oriol Vinyals’], [27, ‘Samy Bengio’], [23, ‘Ilya Sutskever’], [20, ‘Navdeep Jaitly’], [16, ‘Sergey Levine’], [14, ‘Mohammad Norouzi’], [14, ‘Ian Goodfellow’], [13, ‘Lukasz Kaiser’], [13, ‘Jonathon Shlens’], [12, ‘Vincent Vanhoucke’], [10, ‘Quoc Le’], [10, ‘Geoffrey Hinton’], [9, ‘Dumitru Erhan’], [8, ‘Shixiang Gu’], [8, ‘Rajat Monga’], [8, ‘Honglak Lee’], [8, ‘Greg Corrado’], [8, ‘Christian Szegedy’], [8, ‘Andrew Senior’], [7, ‘Yoram Singer’], [7, ‘Tomas Mikolov’], [7, ‘Sylvain Gelly’], [7, ‘Olivier Bousquet’], [7, ‘Karol Kurach’], [7, ‘Georg Heigold’], [7, ‘Anelia Angelova’], [6, ‘Zhifeng Chen’], [6, ‘Rafal Jozefowicz’], [6, ‘Ofir Nachum’], [6, ‘Matthieu Devin’], [6, ‘Martin Abadi’], [6, ‘James Davidson’], [6, ‘Dieterich Lawson’], [6, ‘Dale Schuurmans’], [5, ‘Yonghui Wu’], [5, ‘Tara Sainath’], [5, ‘Mike Schuster’], [5, ‘Manjunath Kudlur’], [5, ‘Kevin Murphy’], [5, ‘Justin Gilmer’], [5, ‘George Tucker’], [5, ‘Douglas Eck’], [4, ‘Pierre Sermanet’], [4, ‘Noam Shazeer’], [4, ‘Maithra Raghu’], [4, ‘Kunal Talwar’], [4, ‘Kelvin Xu’], [4, ‘Kai Chen’], [4, ‘Jeff Dean’], [4, ‘Jan Chorowski’], [4, ‘Geoffrey Irving’], [4, ‘David Sussillo’], [4, ‘David Ha’], [4, ‘Colin Raffel’], [4, ‘Chris Olah’], [4, ‘Andrea Frome’], [4, ‘Amit Daniely’], [4, ‘Alexander Toshev’], [3, ‘Vikas Sindhwani’], [3, ‘Vijay Vasudevan’], [3, ‘Tomer Koren’], [3, ‘Paul Tucker’], [3, ‘Patrick Nguyen’], [3, ‘Olivier Teytaud’], [3, ‘Natasha Jaques’], [3, ‘Konstantinos Bousmalis’], [3, ‘Julian Ibarz’], [3, ‘Jonathan Tompson’], [3, ‘Jeffrey Pennington’], [3, ‘Hasim Sak’], [3, ‘Denny Britz’], [3, ‘Damien Vincent’], [3, ‘Benoit Steiner’], [3, ‘Barret Zoph’], [3, ‘Azalia Mirhoseini’], [3, ‘Augustus Odena’], [3, ‘Andy Davis’], [3, ‘Andrew Rabinovich’], [3, ‘Alex Krizhevsky’], [2, ‘Vineet Gupta’], [2, ‘Sergey Ioffe’], [2, ‘Ryan Dahl’], [2, ‘Ruben Villegas’], [2, ‘Roy Frostig’], [2, ‘Peter Liu’], [2, ‘Melody Guan’], [2, ‘Luke Metz’], [2, ‘Ke Yang’], [2, ‘Jianmin Chen’], [2, ‘Irwan Bello’],
[2, ‘Hugo Larochelle’], [2, ‘Hieu Pham’], [2, ‘Fred Bertsch’], [2, ‘Francois Chollet’], [2, ‘Esteban Real’], [2, ‘Eric Jang’], [2, ‘Cinjon Resnick’], [2, ‘Been Kim’], [2, ‘Ashish Vaswani’], [2, ‘Anna Goldie’], [2, ‘Anjuli Kannan’], [2, ‘Andrew Dai’], [2, ‘Amarnag Subramanya’], [2, ‘Alexey Kurakin’], [2, ‘Adam Roberts’], [1, ‘Yuxuan Wang’], [1, ‘Yuliang Zou’], [1, ‘Yasaman Bahri’], [1, ‘Thorsten Brants’], [1, ‘Terry Koo’], [1, ‘Tamas Sarlos’], [1, ‘Takeru Miyato’], [1, ‘Sungryull Sohn’], [1, ‘Slav Petrov’], [1, ‘Shan Carter’], [1, ‘Ryan Adams’], [1, ‘Richard Sproat’], [1, ‘Reza Mahjourian’], [1, ‘Rasmus Larsen’], [1, ‘RJ Skerry-Ryan’], [1, ‘Qi Ge’], [1, ‘Philip Haeusser’], [1, ‘Olga Wichrowska’], [1, ‘Michal Jastrzebski’], [1, ‘Mark Mao’], [1, ‘Krzysztof Maziarz’], [1, ‘Kristofer Schlachter’], [1, ‘Kevin Swersky’], [1, ‘Jesse Engel’], [1, ‘Jasmine Hsu’], [1, ‘Jasmine Collins’], [1, ‘Ignacio Moreno’], [1, ‘George Papandreou’], [1, ‘Gal Chechik’], [1, ‘Gabriel Pereyra’], [1, ‘Fernando Pereira’], [1, ‘Eugene Ie’], [1, ‘Ethan Holly’], [1, ‘David Dohan’], [1, ‘Danijar Hafner’], [1, ‘Dan Moldovan’], [1, ‘Connor Schenck’], [1, ‘Ciprian Chelba’], [1, ‘Chung-Cheng Chiu’], [1, ‘Christopher Olah’], [1, ‘Ayush Sekhari’], [1, ‘Andrew Ng’], [1, ‘Andrew Lampinen’], [1, ‘Alex Irpan’],
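
The source does not say how these counts were produced beyond "scraped". The ordering above (descending count, ties in reverse alphabetical order) is consistent with sorting `[count, name]` pairs in reverse, so a minimal sketch of one way to build such a list follows. The function name `count_authors` and the placeholder author names are hypothetical, not taken from the original.

```python
from collections import Counter


def count_authors(author_lists):
    """Return [count, name] pairs, sorted descending by count, then by name.

    Each element of author_lists is the author list of one paper.
    A name repeated within a single paper is counted once for that paper.
    """
    counts = Counter(
        name for authors in author_lists for name in set(authors)
    )
    # Reverse sort on [count, name] gives highest counts first and breaks
    # ties in reverse alphabetical order, matching the list above.
    return sorted(([n, name] for name, n in counts.items()), reverse=True)


# Hypothetical sample input: one author list per paper (placeholder names).
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author B", "Author A"],
]
noisy_counts = count_authors(papers)
print(noisy_counts)
```

Counting by exact string is what makes such counts noisy: spelling variants of one person (for example "Chris Olah" and "Christopher Olah" above) are tallied separately unless names are normalized first.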