Jonathan Frankle - Chief Scientist at MosaicML

Faster Neural Network Training, Algorithmically
In my current capacity as Chief Scientist at MosaicML, I am developing ways to change the neural network training algorithm to improve the efficiency of training. The cost of training state-of-the-art neural networks is increasing exponentially, and hardware and compiler improvements alone are insufficient to counterbalance this trend. I believe we instead need to fundamentally change the underlying training algorithms. Training is an approximate computing problem; there is nothing sacred about the math or training recipes we use today. This line of work leverages empirical analysis of the training dynamics of real-world networks to change the math behind training in ways that improve efficiency without affecting quality. At MosaicML, we have developed dozens of speedup methods that improve the efficiency of training standard models for computer vision and natural language processing. All of these methods are available open-source in our Composer PyTorch trainer, and each one is described in the Composer documentation. You can interactively explore the results of applying these speedup methods to standard benchmarks in the MosaicML Explorer. Our best recipes speed up ResNet-50 on ImageNet by 7x, DeepLabv3 on ADE20K by 5x, BERT pre-training by 2x, and GPT language modeling by 2x while maintaining the same quality as the baselines. For an overview of our research approach and how we evaluate speedups at MosaicML, see my blog post on the subject.
The Lottery Ticket Hypothesis
My main line of research during my PhD was on the lottery ticket hypothesis. This line of research focuses on understanding how large neural networks need to be to train in practice. We have long known that we can make neural networks much smaller after they have been trained. In this line of work, I showed that they can be equally small for much or all of training. This research has revealed new insights into how neural networks learn and offered opportunities for practical efficiency improvements.
2021	On the Predictability of Pruning Across Scales	ICML
	Pruning Neural Networks at Initialization: Why are we missing the mark?	ICLR
	Studying the Consistency and Composability of Lottery Tickets	ICLR Workshop
	Reconciling Sparse and Structured Pruning: A Study of Block Sparsity	ICLR Workshop
	Examining the Role of Normalization in the Lottery Ticket Hypothesis	ICLR Workshop
2020	The Lottery Ticket Hypothesis for Pre-Trained BERT Networks	NeurIPS
	Linear Mode Connectivity and the Lottery Ticket Hypothesis	ICML
	Comparing Fine-Tuning and Rewinding in Neural Network Pruning	ICLR Oral
	The Early Phase of Neural Network Training	ICLR
	What is the State of Neural Network Pruning?	MLSys
2019	Stabilizing the Lottery Ticket Hypothesis/The LTH at Scale	arXiv
	The Lottery Ticket Hypothesis	ICLR Best Paper
My open-source library for conducting research on the lottery ticket hypothesis is called OpenLTH. This is my current working codebase for this line of research. It is written for PyTorch, and it includes the components necessary to reproduce the main experiments from my work on the lottery ticket hypothesis. For an updated version of the codebase that supports experiments on pruning neural networks at initialization and early in training, see the supplemental materials accompanying Pruning Neural Networks at Initialization: Why are we missing the mark? on OpenReview.
Science of Deep Learning
More broadly, I am interested in understanding the behavior of practical neural networks empirically. For all the extraordinary advances neural networks have enabled in recent years, our understanding of how and what they learn remains limited. I study these questions from a scientific perspective, posing hypotheses and performing large-scale experiments to empirically evaluate them. I believe we can improve our knowledge of neural networks by scientifically examining how they behave in practice.
2022	What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?	ICML
2021	Training BatchNorm and Only BatchNorm	ICLR
2020	Revisiting "Qualitatively Characterizing Neural Network Loss Landscapes"	NeurIPS Workshop
	Trade-offs of Local SGD at Scale	NeurIPS Workshop
	Are All Negatives Created Equal in Contrastive Instance Discrimination?	arXiv
2019	Dissecting Pruned Neural Networks	ICLR Workshop
Technology Policy
During my year as Staff Technologist at the Center on Privacy and Technology at Georgetown Law, I studied police use of face recognition technology. I also collaborated with Prof. Paul Ohm on both scholarship and teaching. In the years since, I have served as an invited expert at the OECD, contributing to the OECD AI Principles and follow-up work.
In Progress	Computer Programming for Lawyers	Course & Textbook
2018	Desirable Inefficiency	Florida Law Review
2016	The Perpetual Lineup: Unregulated Police Face Recognition in America	Investigative Report
	How Russia's New Facial Recognition App Could End Online Anonymity	The Atlantic
	Facial-Recognition Software Might Have a Racial Bias Problem	The Atlantic
Earlier Research
During the early part of my PhD, I studied cryptography and technology policy. My research on using zero-knowledge arguments to facilitate novel tradeoffs between secrecy and accountability in the court system (Practical Accountability of Secret Processes) was published in USENIX Security 2018. During my master's degree, I studied programming language theory with Prof. David Walker. My thesis work was on type-directed program synthesis, and I published an extended version of the thesis (Example-Directed Synthesis: A Type-Theoretic Interpretation) in POPL 2016.