Mike Gimelfarb
  1. Email
  3. GitHub
  3. Google Scholar
  4. ResearchGate


I'm currently a postdoctoral fellow in the D3M Lab led by Prof. Scott Sanner at the University of Toronto. My work focuses on leveraging structure for planning and reinforcement learning (RL), using tools such as probabilistic programming languages (e.g., RDDL). I am also actively looking for continuing postdoctoral and industry research positions in reinforcement learning and dynamic decision making.

Prior to this, I completed my PhD under the supervision of Scott Sanner and Prof. Chi-Guhn Lee in the same department in December 2022. My thesis focused on making modern RL more efficient through the transfer of prior knowledge (e.g., skills, demonstrations) from multiple sources. I developed novel applications of Bayesian inference to robustly distinguish good knowledge sources from bad ones, and I tackled the problem of transfer learning for risk-sensitive agents. My research has been published at top AI/ML conferences such as NeurIPS, UAI, AAAI, and ICLR. I also completed an internship at DeepMind in 2022, and I was a post-graduate affiliate of the Vector Institute from 2020 to 2022.

Before that, I completed my master's degree (MASc) at U of T under the supervision of Prof. Michael J. Kim. My thesis focused on theoretical developments of Thompson sampling for queueing and admission control problems with demand uncertainty. I received my bachelor's degree in Business Administration (BBA) from the Schulich School of Business in 2014, graduating with distinction.

I enjoy reading books on cognitive science and play classical piano in my spare time.

Selected Publications

pyRDDLGym: From RDDL to Gym Environments
Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner
arXiv, 2023

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
ICLR, 2023 (forthcoming)

A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs
Noah Patton, Jihwan Jeong, Michael Gimelfarb, Scott Sanner
AAAI, 2022

Risk-Aware Transfer in Reinforcement Learning using Successor Features
Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee
NeurIPS, 2021

End-to-End Risk-Aware Planning by Gradient Descent
Noah Patton, Jihwan Jeong, Michael Gimelfarb, Scott Sanner
ICAPS Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning, 2021

Bayesian Experience Reuse for Learning from Multiple Demonstrators
Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
IJCAI, 2021

Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts
Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
UAI, 2021

ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning
Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
UAI, 2019

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach
Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
NeurIPS, 2018

Thompson Sampling for the Control of a Queue with Demand Uncertainty
Michael Gimelfarb, Michael J. Kim
Master's Thesis, 2017

Selected Projects

A reusable implementation of successor features for transfer in deep reinforcement learning in TensorFlow and Keras.
Michael Gimelfarb

A general framework for building and training constructive neural networks in TensorFlow and Keras.
Michael Gimelfarb
