Quick facts

    • Dundee

Apply by: 2025-03-04

PhD Student Vacancy: EastBio - Identification of target genes for control of economically important plant pathogens using large foundation models (LFMs)

Published: 2025-01-03

Objectives: (1) Develop computational models that capture the complex gene regulation of fungal species; (2) Use public condition-dependent gene essentiality data to fine-tune the models to identify genes crucial for pathogen control; (3) Validate the model predictions using unseen Syngenta gene essentiality data.

Background: Large foundation models (LFMs) are deep learning models trained on broad collections of text, images or other data, and they underpin many current artificial intelligence applications, such as ChatGPT. Recently, LFMs have also shown promising results in biology, drawing parallels between language (word sequences) and cells (gene and protein sequences). LFMs can also capture, and make predictions about, gene regulatory relationships in a specific context (e.g. cell type, tissue or developmental stage). In addition, transfer learning allows LFMs to be repurposed for different tasks through fine-tuning, which requires only minimal computational effort and training data. Using LFMs to decipher gene regulation in humans has shown great potential in perturbation studies, including gene deletions (refs. 1-4). This project will apply this powerful, cutting-edge technology by building LFMs of fungal gene regulatory networks from public -omics data and, through transfer learning, predicting essential genes.

Project plan: Using S. cerevisiae, S. pombe and N. crassa as test cases, high-quality -omics data (bulk and single-cell) will be aggregated from public databases. The diversity of the dataset (different cell types, stresses and life-cycle stages) will allow context-specific gene regulation to be captured. Transformer-based deep learning models (as employed in ChatGPT and similar tools) will be used both to distil the essential signals from each "-ome" and to integrate omics data of different modalities (e.g. transcriptomics and proteomics data). The plan has three steps: (1) Develop and train LFMs using masked self-supervised learning, in which the expression values of a percentage of randomly selected genes (e.g. 10%) are masked and then predicted from the expression of the remaining genes. The cutting-edge AI computing infrastructure at the James Hutton Institute (53 NVIDIA GPUs with 3.6 TB of memory, 500K CUDA cores and 1.9 petaflops of processing power in total) will be used to construct LFMs that integrate different -omics data types; (2) Apply transfer learning to fine-tune the LFMs to predict essential genes across a variety of fungal species; (3) Validate the results using unseen in-house essentiality data.
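The masked self-supervised objective in step (1) can be sketched in a few lines. This is a minimal, hypothetical illustration under stated assumptions, not the project's method: the function names, the sentinel mask value and the "mean of the visible genes" predictor (a stand-in for the transformer) are all invented for the example. It shows only the core idea: hide about 10% of one cell's gene expression values, predict them from the visible genes, and compute the loss on the hidden positions alone.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sentinel marking a hidden value; assumes real expression values are non-negative.
MASK_VALUE = -1.0


def mask_expression(expr: Sequence[float], mask_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[float], List[int]]:
    """Hide a random subset of genes (about mask_fraction of them) in one expression profile."""
    n = len(expr)
    # Mask at least one gene and leave at least one visible (requires n >= 2).
    n_mask = max(1, min(n - 1, round(n * mask_fraction)))
    rng = random.Random(seed)
    masked_idx = sorted(rng.sample(range(n), n_mask))
    masked = list(expr)
    for i in masked_idx:
        masked[i] = MASK_VALUE
    return masked, masked_idx


def baseline_predict(masked: Sequence[float], masked_idx: Sequence[int]) -> List[float]:
    """Stand-in for the transformer: fill each masked gene with the mean of the visible genes."""
    hidden = set(masked_idx)
    visible = [v for i, v in enumerate(masked) if i not in hidden]
    mean = sum(visible) / len(visible)
    return [mean if i in hidden else v for i, v in enumerate(masked)]


def masked_mse(pred: Sequence[float], target: Sequence[float],
               masked_idx: Sequence[int]) -> float:
    """Reconstruction loss computed only over the masked positions."""
    return sum((pred[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


if __name__ == "__main__":
    rng = random.Random(42)
    expr = [rng.uniform(0.0, 10.0) for _ in range(50)]  # synthetic profile of 50 genes
    masked, idx = mask_expression(expr, mask_fraction=0.1, seed=1)
    pred = baseline_predict(masked, idx)
    print(f"masked {len(idx)} of {len(expr)} genes; loss = {masked_mse(pred, expr, idx):.3f}")
```

In an actual LFM the baseline predictor would be replaced by a transformer that reads the visible genes and outputs values for the masked ones. Restricting the loss to the masked positions is what pushes the model to learn relationships between genes rather than simply copying its input.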

Outcome & impact: The project is expected to (1) create scalable LFMs capable of capturing the gene regulatory relationships of fungal species in a context-specific manner; (2) use transfer learning to predict essential genes for target identification; (3) equip Syngenta with LFM-based methodologies to address a wide range of biological research questions extending beyond gene essentiality.

This project is a CASE (collaborative with industry) project with Syngenta Limited: https://www.syngenta.co.uk/jealotts-hill. The supervisors at Syngenta are Helena Saunders and Oscar Charles, Oscar.charles@syngenta.

The EastBio partnership offers fully funded, competition-based studentships. Funding covers Home (UK) fees, a stipend at the UKRI norm level (£19,327 for 2024/2025) and project costs. Application guidance can be found on the EastBio website (How to Apply | Biology). Information on UKRI-BBSRC can be found on the UKRI (UK Research and Innovation) website.
