Quick facts

    • Évry-Courcouronnes

Apply by: 2024-08-27

PhD Position F/M: Fine-grain energy consumption measurement of HPC task-based programs

Published 2024-06-28

Context and advantages of the position

This thesis takes place in the context of NumPEx, a key national project whose goal is to co-design the software stack for the exascale era and to prepare applications accordingly.

This thesis will be co-supervised by Inria Benagil (located in Evry) and Inria STORM (located in Bordeaux). Beyond the supervision, collaborations with the other NumPEx consortium partners are to be expected.

Assigned mission

The power consumption of supercomputers is, and will remain, a major concern. For instance, Frontier, the fastest supercomputer in the world, consumes around 20 MW. Reducing the power consumption of HPC applications is therefore essential.

The first step towards reducing the power consumption of programs is being able to monitor their energy consumption. Servers usually contain wattmeters able to measure the power consumption of the CPU, the memory, the GPU, etc. However, these wattmeters only provide coarse-grain energy measurements, with a typical measurement period of tens of milliseconds. During one such period, the application may execute hundreds of tasks. As a result, analyzing the power consumption of an application at the microsecond scale is tedious.

As part of the PEPR NumPEx, we investigate ways to reduce the energy consumption of parallel applications running on supercomputers.

Main activities

The goal of this PhD is to investigate fine-grain energy measurement in StarPU. StarPU is a task-based runtime system that executes microsecond-scale tasks on CPUs and GPUs. Since StarPU executes many instances of a few types of tasks, it should be possible to build an energy consumption model for each type of task. This model can then be provided to StarPU so that task scheduling takes into account both the performance of tasks and their energy consumption.

The proposed approach would be to measure the energy consumption of a server (its CPU, GPU, etc.) at coarse grain (typically, one sample every 20 ms), and to log which tasks were executed during each sampling period. By repeating this many times, it should be possible to solve a linear system that models the energy consumption of microsecond-scale tasks.
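To make the linear-system idea concrete, here is a minimal sketch in plain Python. It is not the method the thesis will necessarily adopt: it assumes that the energy measured over one sampling period is simply the sum, over task types, of (number of tasks executed) times (energy per task of that type), and it ignores idle power and tasks that straddle two periods. The function names `estimate_task_energy` and `solve_linear_system`, as well as the synthetic per-task energies, are hypothetical and chosen only for illustration. The unknowns are estimated by least squares through the normal equations.

```python
import random


def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not separate the task types")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_task_energy(counts, energies):
    """Least-squares estimate of the energy per task, for each task type.

    counts[i][j]: number of tasks of type j executed during sample i
    energies[i]:  energy measured during sample i (joules)
    Assumed model: energies[i] ~= sum_j counts[i][j] * x[j].
    Solves the normal equations (C^T C) x = C^T e.
    """
    k = len(counts[0])
    ctc = [[sum(row[p] * row[q] for row in counts) for q in range(k)]
           for p in range(k)]
    cte = [sum(row[p] * e for row, e in zip(counts, energies))
           for p in range(k)]
    return solve_linear_system(ctc, cte)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ground truth: joules per task for three task types.
    true_energy = [2e-4, 5e-4, 1e-3]
    # 200 synthetic samples, each logging how many tasks of each type ran.
    counts = [[rng.randint(50, 300) for _ in true_energy] for _ in range(200)]
    # Synthetic wattmeter readings with about 1% multiplicative noise.
    energies = [sum(n * e for n, e in zip(row, true_energy)) * (1 + rng.gauss(0, 0.01))
                for row in counts]
    print(estimate_task_energy(counts, energies))
```

The estimate is only meaningful when the mix of task types varies from one sample to the next; if every sample contains the same proportions of tasks, the system is singular and the individual contributions cannot be separated.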

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

    1st and 2nd year: 2100€ gross/month