Michael Gimelfarb - Personal Website

Summary

I am a researcher in reinforcement learning, a branch of artificial intelligence that studies how an algorithm (agent) should interact with a dynamic environment (task) to achieve a specific goal. My current research tackles the following questions:

Transfer Learning: How can information learned by one agent on one task help other agents solve related tasks, and what information should be transferred?
Automated Planning: How can optimal actions be found efficiently when the observations and interactions are high-dimensional and complex?
Offline RL: How can agents be evaluated and optimized without costly interactions, using data from prior interactions collected by another source?

Biography

I am currently a postdoctoral AI researcher in the RVL Lab led by Florian Shkurti at the University of Toronto, working on offline RL. In 2023, I was a researcher in the D3M Lab led by Scott Sanner, where I worked on automated planning and the development of the Python planning toolkit pyRDDLGym (which has now been included in scikit-decide!). In 2022, I interned at Google DeepMind, UK. I completed my PhD in Industrial Engineering at the University of Toronto in 2022, advised by Scott Sanner and Chi-Guhn Lee, and my master’s degree in the same department in 2017, under the supervision of Michael Jong Kim. I received my Bachelor’s degree in Business Administration (BBA) from the Schulich School of Business.

I enjoy photography, outdoor sports, and playing classical piano.

News

Jun 13, 2025	Our paper on trajectory stitching using diffusion models was accepted to RobotEvaluation@RSS 2025.
May 20, 2025	Our paper on bounded error policy optimization was accepted to CPAIOR 2025, and our paper on symbolic DP for policy transfer was published in AAAI 2025.
Mar 25, 2024	The results of the 2023 International Planning Competition were published in AI Magazine.
Feb 19, 2024	Version 2.0 of pyRDDLGym was published.
Feb 12, 2024	Our paper on JaxPlan and GurobiPlan was published in ICAPS.