Quick facts
- London
Apply by: 2025-02-22
ML Research Engineer
About the role:
The job opening is part of a research project funded by the ARIA program “Scaling Compute”, which aims to bring the cost of AI hardware down by >1000x. The project is about Equilibrium Propagation (EP), an alternative training framework to backpropagation (BP) that is compatible with analog computing hardware (i.e. fast and energy-efficient hardware). Specifically, the project aims to demonstrate through simulations that EP can be a viable alternative to BP for solving modern ML tasks on analog computing platforms. In this position, you will help develop a software framework for EP in PyTorch. This framework, which will support both hardware and software simulations, will enable scaling of EP to large networks and datasets, making the core experiments of the research project possible. (A minimal sketch of EP's two-phase update appears at the end of this posting.)

Responsibilities:
- Developing a software framework for simulations of EP (in PyTorch), building upon the one available at this link
- Developing unit tests and establishing a working pipeline so that we can safely contribute to the framework as we scale it
- Making the framework parallelizable across multiple GPUs (parallelization across mini-batches of data, parallelization over the computation of different equilibrium states of EP, etc.); see the data-parallel sketch at the end of this posting
- Developing tools to store experimental results in an organized way, analyze and visualize the data/results, and schedule experiments in advance (to make optimal use of our GPUs)
- Conducting ML research related to the software framework, including benchmarking EP against networks of equivalent size trained with backpropagation
- Integrating new models and use cases into the framework (e.g. meta-learning and energy transformers), as well as new algorithms from the literature on “bilevel optimization”
- The possibility to collaborate (both internally and externally), write research articles, and present them at conferences

Qualifications:
- MS or PhD in Computer Science, Machine Learning, or a similar field, or equivalent education and experience
- Experience building and distributing software libraries (including developing code with unit tests and collaborating on GitHub)
- Experience with deep learning frameworks such as PyTorch, JAX, or TensorFlow
- Experience implementing and training large models (e.g. ResNets, diffusion models, and transformers) on GPU clusters
- Experience in distributed computing

Preferred Qualifications:
- Understanding of deep learning models such as ResNets, diffusion models, and transformers
- Familiarity with Bilevel Optimization
- Familiarity with Equilibrium Propagation (EP)
- Familiarity with Modern Hopfield Networks
- Familiarity with Meta-Learning
- Familiarity with the hardware, data, and environmental constraints associated with analog computing systems
- A top-tier publication record in Machine Learning conferences and journals
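For context, here is a minimal, hedged sketch of EP's two-phase (free/nudged) training loop on a toy one-hidden-layer energy model. The quadratic Hopfield-style energy, the hyperparameters, and all names (energy, relax, ep_step, beta) are illustrative assumptions, not the project's actual framework.

```python
# Minimal Equilibrium Propagation (EP) sketch on a toy one-hidden-layer
# energy model. The quadratic energy, update-rule signs, and all
# hyperparameters below are illustrative assumptions.
import torch

torch.manual_seed(0)
n_in, n_hid, n_out = 4, 8, 2
W1 = torch.randn(n_in, n_hid) * 0.1   # input -> hidden couplings
W2 = torch.randn(n_hid, n_out) * 0.1  # hidden -> output couplings
rho = torch.sigmoid                   # activation applied to states

def energy(x, h, y_hat):
    # Hopfield-style energy: quadratic containment minus pairwise couplings.
    return (0.5 * (h ** 2).sum() + 0.5 * (y_hat ** 2).sum()
            - (rho(x) @ W1 * rho(h)).sum()
            - (rho(h) @ W2 * rho(y_hat)).sum())

def relax(x, y, beta, steps=50, lr=0.1):
    # Relax the state variables (h, y_hat) toward an equilibrium of
    # F = E + beta * C by gradient descent on the states (not the weights).
    h = torch.zeros(n_hid, requires_grad=True)
    y_hat = torch.zeros(n_out, requires_grad=True)
    for _ in range(steps):
        F = energy(x, h, y_hat) + beta * 0.5 * ((y_hat - y) ** 2).sum()
        gh, gy = torch.autograd.grad(F, (h, y_hat))
        with torch.no_grad():
            h -= lr * gh
            y_hat -= lr * gy
    return h.detach(), y_hat.detach()

def ep_step(x, y, beta=0.5, lr=0.05):
    # One EP update: free phase (beta = 0), nudged phase (beta > 0),
    # then a local contrastive weight update (dE/dW at the two equilibria).
    global W1, W2
    h0, y0 = relax(x, y, beta=0.0)
    hb, yb = relax(x, y, beta=beta)
    W1 += lr / beta * (torch.outer(rho(x), rho(hb)) - torch.outer(rho(x), rho(h0)))
    W2 += lr / beta * (torch.outer(rho(hb), rho(yb)) - torch.outer(rho(h0), rho(y0)))

x = torch.randn(n_in)
y = torch.tensor([1.0, 0.0])
for _ in range(20):
    ep_step(x, y)
```

Each step updates the weights purely from the contrast between the two equilibria, using only locally available quantities; this locality is what makes EP attractive for analog hardware.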
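The multi-GPU responsibility could, for instance, begin with data parallelism over mini-batch shards. The sketch below averages per-rank EP-style parameter updates with torch.distributed.all_reduce, mirroring gradient averaging in data-parallel BP training; local_ep_update is a hypothetical placeholder, not part of any existing codebase.

```python
import os
import torch
import torch.distributed as dist

def local_ep_update(params, batch):
    # Hypothetical placeholder: run the free and nudged phases on this
    # rank's shard of the mini-batch and return per-parameter update
    # tensors (see the EP sketch above). Here it just returns zeros.
    return [torch.zeros_like(p) for p in params]

def ep_data_parallel_step(params, batch, lr=0.05):
    updates = local_ep_update(params, batch)
    world = dist.get_world_size()
    for u in updates:
        # Average the contrastive updates across ranks, analogous to
        # gradient averaging in standard data-parallel BP training.
        dist.all_reduce(u, op=dist.ReduceOp.SUM)
        u /= world
    with torch.no_grad():
        for p, u in zip(params, updates):
            p += lr * u

if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=<n_gpus> ep_parallel.py
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
    params = [torch.randn(8, 8, device=device) * 0.1]
    batch = torch.randn(32, 8, device=device)  # this rank's shard of the data
    ep_data_parallel_step(params, batch)
    dist.destroy_process_group()
```

Parallelizing over the different equilibrium states of EP (e.g. computing the free and nudged phases on separate devices) would follow the same collective-communication pattern.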