Fact Sheet

Artificial Intelligence, Machine Learning and Genomics

With increasing complexity in genomic data, researchers are turning to artificial intelligence and machine learning as ways to identify meaningful patterns for healthcare and research purposes.

The Big Picture

  • The genomics field continues to expand the use of computational methods such as artificial intelligence and machine learning to improve our understanding of hidden patterns in large and complex genomics data sets from basic and clinical research projects.
     
  • Machine learning analyses could benefit disease research and genomic tools like CRISPR.
     
  • NHGRI is identifying and shaping its unique role in the convergence of genomic and machine learning research.

What is artificial intelligence?

There are many definitions for artificial intelligence (AI). One such definition for AI is “AI is a science and a set of computational technologies that are inspired by — but typically operate quite differently from — the ways people use their nervous systems and bodies to sense, learn, reason, and take action.” AI can be created as software or tools that are able to mimic human intelligence in certain contexts or even exceed it in others.

To be able to build AI, scientists need large, well-explained datasets that first help them understand the techniques and processes used by humans to analyze and interpret complex scenarios. The field of AI is a dynamic one, and researchers are consistently developing new techniques and tools.


What are machine learning and deep learning?

 

Machine learning (ML) and deep learning are fields of study frequently mentioned in the context of AI. Machine learning is a subfield of AI, and deep learning is in turn a subfield of machine learning. Machine learning is a process by which machines learn patterns from a given dataset without being explicitly programmed on what to learn.

 

Machines can usually learn in either a supervised or unsupervised manner. Under supervised learning, scientists provide machines with separate training and test datasets. The training data has defined categories (e.g., people with coronary heart disease and those without) that the machine can use to infer hidden qualities of the data and distinguish the categories from each other. The machine then applies this knowledge to the test data, which checks how well it performs on examples it has not seen, and it can go on to make informed predictions (e.g., which people in a population are likely to develop coronary heart disease).
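The supervised workflow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a method used by NHGRI: it assumes a very simple "nearest centroid" classifier, and the two measurements, the labels and every data value are invented for the example and carry no clinical meaning.

```python
# Hypothetical toy data: (resting_heart_rate, cholesterol) -> category label.
# All values are made up for illustration only.
training_data = [
    ((62.0, 170.0), "no_chd"),
    ((65.0, 180.0), "no_chd"),
    ((70.0, 175.0), "no_chd"),
    ((85.0, 250.0), "chd"),
    ((90.0, 260.0), "chd"),
    ((88.0, 240.0), "chd"),
]
test_data = [
    ((64.0, 172.0), "no_chd"),
    ((89.0, 255.0), "chd"),
]


def fit_centroids(examples):
    """Learn one average point (centroid) per labeled category."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def predict(centroids, features):
    """Assign the category whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# "Training": summarize the labeled examples.
centroids = fit_centroids(training_data)

# "Testing": predict labels for held-out examples and compare with the truth.
predictions = [predict(centroids, features) for features, _ in test_data]
accuracy = sum(
    predicted == label for predicted, (_, label) in zip(predictions, test_data)
) / len(test_data)
print(predictions, accuracy)
```

Real studies use far more examples and features, and usually more capable models, but the split between labeled training data and held-out test data is the same idea.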

 

In an unsupervised learning setting, the data have no predefined categories. The machine instead looks for patterns in large datasets on its own, for example by grouping similar samples together, without requiring humans to label the data first.
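As a rough illustration of unsupervised learning, the sketch below groups unlabeled numbers into clusters using a basic k-means procedure. The choice of k-means, the function name `k_means_1d` and the sample values (imagined as expression levels of one gene across six samples) are assumptions made for this example only.

```python
def k_means_1d(values, k=2, iterations=20):
    """Group one-dimensional values into k clusters without using any labels."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest center.
        for value in values:
            nearest = min(range(k), key=lambda j: abs(value - centers[j]))
            groups[nearest].append(value)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            sum(group) / len(group) if group else centers[j]
            for j, group in enumerate(groups)
        ]
    return centers, groups


# Hypothetical, unlabeled measurements (invented values).
expression_levels = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, groups = k_means_1d(expression_levels, k=2)
print(centers, groups)
```

No category names are supplied anywhere; the two groups emerge only from how close the values are to one another.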

 

Deep learning is a relatively modern technique used to implement machine learning, and it can be applied in both supervised and unsupervised settings. The learning is called “deep” because the algorithms are artificial neural networks built from many layers. A deep learning algorithm takes a dataset and finds patterns and critical information using a computing system loosely inspired by how a human brain’s neurons interact with each other. These networks simulate the brain’s ability to weigh the importance of some data versus others and to handle bias.
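To make the idea of an artificial neural network more concrete, here is a small sketch of a single artificial neuron and a tiny two-layer network. It assumes the common textbook formulation (a weighted sum of inputs plus an offset, passed through a sigmoid function), and it reads "bias" as that per-neuron offset term, which is an interpretation rather than something the fact sheet spells out. The weights are hand-picked for illustration; in real deep learning they are learned from data.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weigh each input by its importance, add the bias, then squash the total."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron (weights are arbitrary)."""
    hidden = [
        neuron(inputs, [2.0, -1.0], 0.0),
        neuron(inputs, [-1.5, 1.0], 0.5),
    ]
    return neuron(hidden, [1.0, -1.0], 0.0)


print(tiny_network([1.0, 0.0]))
print(tiny_network([0.0, 1.0]))
```

Stacking many such layers, and adjusting the weights and biases automatically during training, is what turns this toy into a deep learning model.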

 


 

Why is there a need for AI/ML in genomics?

 

As of 2021, 20 years have passed since the landmark completion of the draft human genome sequence. This milestone has led to the generation of an extraordinary amount of genomic data. Estimates predict that genomics research will generate between 2 and 40 exabytes of data within the next decade.

 


 

DNA sequencing and other biological techniques will continue to increase the number and complexity of such data sets. This is why genomics researchers need AI/ML-based computational tools that can handle, extract and interpret the valuable information hidden within this large trove of data.

Infographic illustrating the concept of large numbers and prefixes (from left to right):

 

  • Thousand: Kilo
  • Million: Mega
  • Billion: Giga
  • Trillion: Tera
  • Quadrillion: Peta
  • Quintillion: Exa
  • Sextillion: Zetta
  • Septillion: Yotta
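
To put the prefixes above in perspective, the short sketch below converts the estimated 2 to 40 exabytes of genomic data into other units. It assumes decimal (power-of-ten) prefixes, as in the list above; the helper names `to_base_units` and `convert` are made up for this example.

```python
# Power of ten for each prefix in the list above.
PREFIX_EXPONENTS = {
    "kilo": 3,    # thousand
    "mega": 6,    # million
    "giga": 9,    # billion
    "tera": 12,   # trillion
    "peta": 15,   # quadrillion
    "exa": 18,    # quintillion
    "zetta": 21,  # sextillion
    "yotta": 24,  # septillion
}


def to_base_units(amount, prefix):
    """Express a prefixed amount in base units, e.g. exabytes in bytes."""
    return amount * 10 ** PREFIX_EXPONENTS[prefix]


def convert(amount, from_prefix, to_prefix):
    """Convert an amount from one prefix to another."""
    shift = PREFIX_EXPONENTS[from_prefix] - PREFIX_EXPONENTS[to_prefix]
    return amount * 10 ** shift if shift >= 0 else amount / 10 ** (-shift)


low_bytes = to_base_units(2, "exa")
high_terabytes = convert(40, "exa", "tera")
print(low_bytes, high_terabytes)
```

Under these assumptions, 2 exabytes is 2 followed by 18 zeros in bytes, and 40 exabytes is 40 million terabytes.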

What are some ways in which AI/ML are being used in genomics?

Although the use of AI/ML tools in genomics is still at an early stage, researchers have already benefited from developing programs that assist in specific ways.

Some examples include:

These are just a few ways by which AI/ML methods are helping predict and identify hidden patterns in genomic data. Scientists are also using AI/ML to predict future variations in the genomes of the influenza and SARS-CoV-2 viruses to assist public health efforts.


What is NHGRI’s role in bringing AI, machine learning and genomics together?

NHGRI’s Genomic Data Science Working Group collaborates closely with NIH and other academic institutes to define critical areas in genomics for AI and machine learning. The group is also helping to define NHGRI’s unique role in enabling machine learning research to assist in both genomic sciences and genomic medicine. In April 2021, NHGRI hosted a virtual workshop on machine learning in genomics, which highlighted a wide range of promising advances at the intersection of artificial intelligence and genomics research.

NHGRI is also a key part of NIH’s new Common Fund program, Bridge to Artificial Intelligence (Bridge2AI). The goal of the Bridge2AI program is to act as a launchpad for the widespread adoption of AI in tackling complex biomedical and precision medicine challenges.

Apart from being part of Bridge2AI, NHGRI also independently funds research at the intersection of AI/ML and genomics, and is particularly focused on ensuring that the genomic data used in AI and deep learning programs fairly and ethically reflect the diversity of the human species. NHGRI also supports research on the ethical, legal and social implications of the use of AI/ML in genomics.

Last updated: January 12, 2022