Deep Learning for Medicine

How Biomedicine and Personalized Healthcare is Going to Be Disrupted by AI and Deep Learning

Imagine a world where people can be notified to have a blood and urine test, all of which has been determined necessary by an AI system which analyzes your health care records, , genome data, medical images, historical data monitoring your heart rate, blood pressure and more. The recommendation given by this system is based on the observational data of millions of patients including members of her family tree, on top of millions of cell biology datasets with quadrillions of training cases!

The blood test has detected an alteration in your transcriptome, or the sum total of all messenger RNA molecules expressed from the body’s genes. The urine test also detects a corresponding/matching alteration in the metabolome, or the sum of all metabolites (a substance necessary for the function of metabolism), which suggests she has a neuromuscular degenerative disorder.

You aren’t surprised, since at birth you could’ve edited the associated pathogenic variant in the uterus, but an AI system warned of a high probability of side effects. Finally, you are offered a genetic medicine which has been engineered according to your personal data, genome and transcriptome.

After a year of administering the medicine, as the device records information such as arm strength, all evidence of neuromuscular degeneration is now gone. The data from her case is fed back into the system to provide better personalized treatment for other people across the world.

Based off this story I told, you might think I’m a science-fiction writer. But this sceneario, outlined by “Deep Learning in Biomedicine” in Nature Biotechnology, is a possible interpretation of how medicine will change drastically in the years to come. But why and how is machine learning and deep learning likely going to have such as massive impact on the future of our healthcare system?

The Problem Today: Healthcare isn’t personalized

Right now, a major part of our healthcare system isn’t being properly utilized to create better treatment plans is our genome, transcriptome and metabolome. One of the largest reasons why genome data, and biomedical data in general isn’t being used patient by patient for diagnosis and treatment is the fact that it’s uninterpretable by the naked eye.

Geneticist Eric Lander, after the completion of the 13-year long Human Genome Project (HGP), once said, “Genome. Bought the book, Hard to read.” But he establishes a key point: Humans aren’t good at reading the genome, predicting target-drug reactions, or understanding methylation patterns in cFDNA. It’s clear that for biomedicine, we need to use computational approaches such as AI to help solve this problem. In this article, I’m going to be breaking down the applications of deep learning + ML into one huge problem: understanding genetic data. There are many other examples surrounding topics like medical imaging or drug discovery, and in future articles I’ll be going into current models for using deep learning and ML in these fields!

Applications to Genetic Data

One of the largest problems in genetics is figuring out the relation between the genotype and phenotype. The genotype is simply the set of genes in our DNA which code for a specific trait, whereas the phenotype is the physical expression or characteristics. The phenotype is influenced both by differences in the genotype as well as its surrounding environment. The model adopted by most researchers for solving this issue requires us to collect enough individuals to establish an association between a variation at a genetic location and the phenotype, statistically.

Once we identify specific genetic variants, we can pinpoint relevant biological pathways and suggest targets for drug discovery. The total results can be used to model disease risk caused by genetic variants, using a linear additive model called the polygenic risk score. This is a number based on variations in multiple fixed locations across chromosomes (genetic loci) and their associated weights. It is one of the best ways to predict the trait that can be made when taking variation and multiple genetic variants into account. This model can be used for extremely complex genetic traits or disorders such as height or schizophrenia. This is because in disorders like schizophrenia, common genetic variation (which can be statistically reinforced) is the cause for a large chunk of phenotype variance. Using statistics and the polygenic risk score, we can help find a way to explain the genotype-phenotype relation.

An example of deep learning being applied to locate genetic variants (which is required to use polygenic risk score) is DeepBind, which predicts transcription factor and RNA-binding protein binding from DNA and RNA sequences. Once we train this model to achieve a high accuracy in predicting binding affinity (likelihood of protein binding), we can edit certain nucleotides in the sequence (according to common genetic variants) and use the model to see if they positively or negatively impact binding. I implemented and trained this model using Tensorflow, so if you’d like to learn more about this, click this link!

However, in order for deep learning models to learn from DNA and other genomic data, we must process it in such a way that models can learn from it, using things such as vectors and matrices. Most models that learn from DNA sequence use something called one-hot encoding, which uses four channels for the four possible nucleotides A, C, G, and T.

However, there are a variety of ways to achieve a DNA one-hot encoding using TensorFlow operations, and I’ll be creating a video on this topic soon. If you’d like to understand some common algorithms for achieving this, check out this Medium article by Dr. Hannes Bretschneider.


In the upcoming years, medical practice is going to be disrupted by the exponential growth in genetic, molecular, biometric and chemical data being collected using sensors and other medical devices. As the cost of DNA sequences as become affordable, our genome looks to be one of the major ways of creating personalized medicine. However, understanding our genome and its processes will likely be hard to accomplish, without the help of computational methods such as AI. Biology is complex for humans to understand, and deep learning is one of the most promising ways of understanding the mass of data. Hopefully, deep learning will become widely accepted in the current healthcare system and clinical routine.

Key Takeaways

  1. Healthcare today isn’t very personalized; it revolves around the statistics of other patients.
  2. Using AI and the exponential growth of healthcare data, we have one of the largest opportunities to change this.
  3. Biology is complex and hard to understand. Deep learning represents one of the best ways to understand our genome, detect genetic variants and interpret the genotype-phenotype gap.

Next steps

If you enjoyed this article, be sure to follow these steps to keep in touch with my future projects and articles!

  1. Connect with me on Linkedin, to hear about my future developments and future projects. I’m currently looking into cfDNA and identifying unique biomarkers for early cancer diagnosis.
  2. My website is now up with all my content, as well as my Github.
  3. Be sure to subscribe to my monthly newsletter to see new projects, conferences I go to, and articles I put out!
  4. Feel free to email me at to talk about this project and more!

16-year old machine learning developer interested in hard-tech, biology, and philosophy.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store