Need a Sherlock Holmes to solve a protein’s 3D structure? Ask AlphaFold

>>> Proteins, proteins everywhere…

Proteins are the employees of the cell, working to keep it alive. Each protein’s specific function is determined by its structural shape, which is derived from instructions in the amino acid (AA) sequence encoded in our genes. For example, antibody proteins are Y-shaped; this hook-like shape allows them to latch onto pathogens (e.g. viruses, bacteria), detecting and tagging them for extermination.

To understand how these employees go from an AA sequence to their energy-efficient 3D structure, the following video will be helpful. In summary, biochemists describe protein structure at four distinct levels: the primary structure, which consists of the AA sequence; the secondary structure, consisting of repeating local structures (α-helices, β-sheets, and turns) formed from this AA sequence and held together by chemical bonds called hydrogen bonds; the tertiary structure, the overall shape of a single polypeptide chain (a long AA chain) determined by non-local chemical interactions; and possibly a quaternary structure, if the protein is made of more than one polypeptide chain.

Elucidating the shape of a protein is an important scientific challenge because diseases like diabetes, Alzheimer’s, and cystic fibrosis arise from the misfolding of specific protein structures. The protein folding problem is to find the right protein structure amidst many structural possibilities. Knowledge of protein structure would allow us to combat deadly human diseases and to use this knowledge in biotechnology to produce new proteins with functions such as plastic degradation.

Currently, the accurate experimental methods for determining protein shape rely on laborious, lengthy, and costly processes (Figure 1). Therefore, biologists are turning to AI to help diminish these factors and speed up scientific discoveries, with the potential of saving lives and bettering the environment.

protein_experimentals
Figure 1. Experimental Techniques to Determine Protein 3D Structure. (A) X-Ray Crystallography. An X-ray beam is shot through a protein crystal, obtained under specific chemical conditions, and the resulting diffraction pattern is used to analyse the location of electrons and decipher the protein model (image); (B) Cryo-Electron Microscopy (Cryo-EM). When biomolecules (i.e. proteins) do not want to crystallize, cryo-EM allows small to large biomolecules and their specific functions to be visualized, although in a costly manner (image); (C) Nuclear Magnetic Resonance (NMR). NMR allows the structure and conformational changes to be analysed but is limited to small, soluble proteins (image).

 

“The success of our first foray into protein folding is indicative of how machine learning systems can integrate diverse sources of information to help scientists come up with creative solutions to complex problems at speed”

These were the words of the developers at Google’s DeepMind after project AlphaFold, which aims to use machine learning to predict 3D protein structure solely from the amino acid sequence (from scratch), won the biennial global Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition in 2018. CASP is used as a gold standard for assessing new methods for protein structure prediction, and AlphaFold showed “unprecedented progress” by accurately predicting 25 out of the 43 proteins in the set (proteins whose 3D structures had been obtained by conventional experimental means but not made public), compared to the second-placed team, which predicted only 3 out of the 43.

Previous deep learning efforts attempting what AlphaFold does have focused on secondary structure prediction using recurrent neural networks, which does not yield the tertiary and/or quaternary structure needed for the full 3D protein shape, owing to the complexity of predicting tertiary structure from scratch.

AlphaFold is composed of deep neural networks trained to 1) predict protein properties, namely the distances between pairs of AAs and the angles of the chemical bonds connecting them, and 2) combine the predicted distance probabilities for every pair of protein residues into a score and use gradient descent, a mathematical method widely used in machine learning to make small incremental improvements, to arrive at the most accurate structure prediction (Figure 2).

deepmind
Figure 2. DeepMind AlphaFold Methodology (source).
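To make the gradient-descent step above concrete, here is a minimal, purely illustrative sketch (not DeepMind’s code): assuming we already have a matrix of predicted inter-residue distances, we treat the mismatch between a candidate structure and those predictions as a score and nudge the 3D coordinates downhill until the score stops improving.

```python
import numpy as np

# Toy illustration of the idea behind AlphaFold's final optimisation step
# (NOT DeepMind's code): given predicted pairwise distances between residues,
# score a candidate structure by how badly it violates those predictions and
# minimise that score with plain gradient descent.

rng = np.random.default_rng(0)
n_res = 20
# Pretend these came from a trained network; here they are random placeholders.
pred_dist = rng.uniform(4.0, 15.0, size=(n_res, n_res))
pred_dist = (pred_dist + pred_dist.T) / 2
np.fill_diagonal(pred_dist, 0.0)

coords = rng.normal(size=(n_res, 3))  # random starting 3D structure

def score_and_grad(x):
    diff = x[:, None, :] - x[None, :, :]                  # pairwise coordinate differences
    dist = np.linalg.norm(diff, axis=-1) + np.eye(n_res)  # +I avoids divide-by-zero on the diagonal
    err = dist - pred_dist
    np.fill_diagonal(err, 0.0)
    score = np.sum(err ** 2)                              # total violation of the predictions
    grad = 4.0 * np.sum((err / dist)[:, :, None] * diff, axis=1)
    return score, grad

lr = 1e-3
for _ in range(500):                                      # small incremental improvements
    score, grad = score_and_grad(coords)
    coords -= lr * grad
print(f"final mismatch score: {score:.2f}")
```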

Even though there is much more work to do before AI can accurately and reliably solve the protein folding problem and speed up solutions to some of the world’s gravest problems, AlphaFold is undoubtedly a step in the right direction.

You with Alzheimer’s 6 years from now?

>>> tic, toc, time’s up… \n

Alzheimer’s is the most common type of dementia, a set of brain disorders that result in the loss of brain function. To give some statistics highlighting the problem we face: 1 in 3 UK citizens will develop dementia during their lifetime, with a 62% chance that it will be Alzheimer’s, and it is the 6th leading cause of death in the USA.

The problem is that it is a multi-factorial disease, as there are many factors influencing its development, e.g. reactive oxygen species, plaque aggregation, and protein malfunction. But these are just the tip of the iceberg: at the heart of the activities leading to Alzheimer’s there is a dysregulation (dyshomeostasis) of key biological transition metals such as Cu2+ and Zn2+ that are vital to maintaining regular brain function and preventing dementia. These factors contribute to the fact that there is no cure, and thus we are racing against the clock to diagnose it as early as possible and slow its progress.

ad_brains
Alzheimer’s (left) versus normal brain (right). Source.

Radiologists use Positron Emission Tomography (PET) scans to try to detect Alzheimer’s. PET allows molecular events to be monitored as the disease evolves, through the detection of positron emission from radioactive isotopes such as 18F. This isotope is attached to a version of glucose (18F-FDG); because glucose is the primary source of energy for brain cells, this allows their visualization. As brain cells become diseased, the amount of glucose they take up decreases compared to normal brain cells. To aid in the war against time, Dr. Jae Ho Sohn combined machine learning with neuroimaging in the following article.

“One of the difficulties of Alzheimer’s disease is that by the time all the clinical symptoms manifest and we can make a definitive diagnosis, too many neurons have died, making it essentially irreversible.”

Jae Ho Sohn, MD, MS

 

Debriefing the Article “A Deep Learning Model to Predict a Diagnosis of Alzheimer’s Disease by Using 18F-FDG PET of the Brain” by Sohn et al.

Objective. To develop a deep learning algorithm to forecast a diagnosis of Alzheimer’s disease (AD), mild cognitive impairment (MCI), or neither (non-AD/MCI) for patients undergoing 18F-FDG PET brain imaging, and to compare the results with those of conventional radiologic readers.

Reasoning. Humans are poor at detecting slow, global changes across images, and deep learning may help address the complexity of imaging data; it has already been applied to aid the detection of breast cancer using mammography, pulmonary nodules using CT, and hip osteoarthritis using radiography.

Methodology. Sohn et al. trained a convolutional neural network with the Inception V3 architecture using 90% (1921 imaging studies, 899 patients) of the total imaging studies from patients enrolled in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) who had either AD, MCI, or neither. The trained algorithm was then tested on the remaining 10% (188 imaging studies, 103 patients) of the ADNI images (labelled the ADNI test set), and on an independent set from 40 patients not in ADNI. To further assess the proficiency of this method, results from the trained algorithm were compared to those of radiological readers.
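As a rough sketch of what such a training setup might look like (this is not the authors’ code; the image size, preprocessing, and training details below are assumptions for illustration only), a pretrained Inception V3 backbone can be fine-tuned to classify each imaging study into one of the three groups:

```python
import tensorflow as tf

NUM_CLASSES = 3  # AD, MCI, non-AD/MCI

# Pretrained Inception V3 feature extractor; PET volumes would need to be
# converted into suitably sized 2D inputs (an assumption of this sketch).
base = tf.keras.applications.InceptionV3(
    weights="imagenet",
    include_top=False,
    input_shape=(299, 299, 3),
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_ds / val_ds would come from the ~90% / ~10% split of ADNI studies
# described above; they are placeholders here.
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```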

Results. The algorithm was able to predict with high accuracy which patients were diagnosed with AD (92% in the ADNI test set and 98% in the independent test set), with MCI (63% in the ADNI test set and 52% in the independent test set), and with non-AD/MCI (73% in the ADNI test set and 84% in the independent test set). It outperformed three radiology readers in ROC space when forecasting the final AD diagnosis.
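For readers unfamiliar with “ROC space”, the comparison works roughly as follows (a toy example with made-up numbers, not the study’s data): each threshold on the model’s predicted AD probability gives one sensitivity/specificity pair, and the curve traced out by all thresholds can be compared against individual human readers.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and scores for illustration only (not the study's data).
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                          # 1 = later received a final AD diagnosis
y_score = [0.90, 0.20, 0.80, 0.70, 0.40, 0.10, 0.95, 0.30]  # model's predicted probability of AD

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))

# At each threshold: sensitivity = TPR, specificity = 1 - FPR.
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  sensitivity={t:.2f}  specificity={1 - f:.2f}")
```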

Limitations. The independent test set was small (n=40), was not from a clinical trial, and excluded data from patients with non-AD neurodegenerative conditions and disorders like stroke that can affect memory function. The training of the algorithm was based solely on ADNI data and is thus limited by the ADNI patient population, which did not include patients with non-AD neurodegenerative diseases. The algorithm made its predictions in a manner distinct from human expert approaches, and the MCI and non-AD/MCI diagnoses were unstable compared with the AD diagnosis, with accuracy depending on the follow-up time.

Conclusion. The deep learning algorithm trained on 18F-FDG PET images achieved 82% specificity at 100% sensitivity in predicting AD, an average of 75.8 months (~6 years) before the final diagnosis. It has the potential to diagnose Alzheimer’s 6 years in advance in the clinic, but further validation and analysis are needed given the limitations mentioned above.

 

Can AI discriminate against minorities?

>>> Hello, World!

Much has been going on since I recently made my first post, most notably the use of the new genome editing technique CRISPR-Cas9 by He Jiankui, a scientist at the Southern University of Science and Technology of China, to alter the DNA of embryos from seven couples, leading to the birth of genetically modified twin girls. This research has been called “monstrous”, “crazy”, and “a grave abuse of human rights” by the scientific community worldwide, and the universities involved have stated that they had no knowledge of this research being performed under their institutions.

This research, however, has one positive aspect: it highlights the inadequate regulation of some novel technological innovations, which urgently needs to be addressed for the benefit of society and the advancement of science and technology.

Currently, Artificial Intelligence (AI) is also on the rise, being implemented in many fields, especially healthcare. As with CRISPR-Cas9, the developers and users of this technology need to take a moment to step back from the technological upheaval and look at their innovation through an ethical lens, to see, address, and prepare for the potential negatives and ethical conflicts.

This post is the first in a two-post series covering major ethical issues around the use of AI in healthcare that need to be taken into consideration.

ethics_cte

 

AI algorithms can discriminate against minorities

The food that enables AI algorithms to function, especially machine learning and deep learning, is large data sets, which are taken as input, processed, and used to deliver conclusions based solely on those data. For example, a company could use AI to recommend the best candidate to hire by feeding the algorithm data about previously successful candidates and letting it draw a conclusion.

Applying these algorithms to decisions about human matters is tricky, however, as the data need to reflect our diversity. If they do not, the algorithm’s recommendation can be biased; other sources of bias include human prejudices inherent in the data and the intentional embedding of bias into the algorithm by a prejudiced developer.
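A toy sketch makes the first point concrete (the data, groups, and model here are entirely synthetic, invented for illustration): when one group dominates the training data and the relationship between features and outcome differs across groups, the model tends to fit the majority pattern and perform worse on the underrepresented group.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

def make_group(n, shift):
    # Each group has a slightly different relationship between features and outcome.
    X = rng.normal(size=(n, 3)) + shift
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.5 * shift).astype(int)
    return X, y

# The majority group dominates the training data; the minority group is barely represented.
X_maj, y_maj = make_group(1000, shift=0.0)
X_min, y_min = make_group(30, shift=1.0)

model = LogisticRegression().fit(
    np.vstack([X_maj, X_min]), np.concatenate([y_maj, y_min])
)

# Evaluate separately on fresh samples from each group.
X_maj_test, y_maj_test = make_group(500, shift=0.0)
X_min_test, y_min_test = make_group(500, shift=1.0)
print("majority-group accuracy:", accuracy_score(y_maj_test, model.predict(X_maj_test)))
print("minority-group accuracy:", accuracy_score(y_min_test, model.predict(X_min_test)))
```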

AI has already been shown to reflect the biases in its training data in non-medical fields. For example, AI algorithms designed to help American judges make sentencing decisions by predicting an offender’s tendency to re-offend have shown an alarming amount of bias against African-Americans.

reoffend
Bernard Parker, left, was rated high risk; Dylan Fugett was rated low risk. (source)

Healthcare delivery itself varies by race, and an algorithm designed to make a healthcare decision will be biased if few (or no) genetic studies have been done in certain populations. An example of this is the attempt to use data from the Framingham Heart Study to predict cardiovascular disease risk in non-white populations, which has led to biased results, with both overestimation and underestimation of risk.

Not everyone may benefit equally from AI in healthcare, as AI may be ineffective where data are scarce. As a result, this may affect people with rare medical conditions, or others underrepresented in clinical trials and research data, such as Black, Asian, and minority ethnic populations.

As the House of Lords Select Committee on AI cautions, the datasets used to train AI systems usually represent the wider population poorly, so these systems can potentially make unjust decisions that reflect societal prejudice.

AI algorithms can be malicious   

In addition, there is the ethical issue that developers of AI may have negative or malicious  intentions when making the software. After all, if everybody had good intentions the world would certainly be a better place.

Take, for example, the recent high-profile cases of Uber and Volkswagen. Uber’s machine learning tool Greyball allowed the company to predict which ride hailers might be undercover law-enforcement officers, allowing it to bypass local regulations. In the case of Volkswagen, the company developed an algorithm that made its vehicles reduce their nitrogen oxide emissions only during emission tests.

hackjer

Private AI companies working with healthcare institutions might create an algorithm better suited to the monetary interests of the institutions than to the monetary and care interests of the patient. This is a particular concern in the USA, where there is a continuous tension between improving health and generating profit, and where the makers of the algorithms are unlikely to be the ones delivering bedside care. In addition, AI could be used for cyber-attacks, robbery, and revealing information about a person’s health without their knowledge.

These potential negatives need to be acknowledged and addressed in the implementation of AI in any field, especially healthcare. In the upcoming post, I will discuss the effects of AI on patients and healthcare professionals, breaches of patient data privacy, and AI reliability and safety.

 

AI in healthcare: for better or for worse?

>>> Hello, World!

In this century of technological advancement, there has been much hype over the emerging field of artificial intelligence (AI), defined as intelligence exhibited by computational means rather than by the natural world, i.e. humans.

AI has gained popularity following innovative applications in fields such as the automotive, finance, military, and healthcare industries.

However, as with any emerging technology, ethical and controversial issues arise. Questions over whether artificial intelligence will “take over the world” by, for example, replacing industry sectors with robotics, or over the uncontrolled use of AI for military purposes, are current hot topics of debate.

The media, literature, and particularly the film industry, with movies such as “I, Robot” and “The Terminator”, have certainly expanded our imaginations as to the potential negatives in the field.

Adding fuel to the fire, recent comments from Tesla and SpaceX CEO Elon Musk stating that “A.I. is far more dangerous than nukes” and thus needs to be proactively regulated ignite reasonable worries over the use of AI applications.

In healthcare and medical research, however, far from robots replacing human physicians in the foreseeable future, AI devices have been helping physicians and scientists save lives and develop new medical treatments.

AI is going to lead to the full understanding of human biology and give us the means to fully address human disease.

–Thomas Chittenden, VP of Statistical Sciences at WuXi NextCODE

A shift in the use of AI in medical research occurred on 12 June 2007 with Adam, a scientific robot developed by researchers at the UK universities of Aberystwyth and Cambridge, able to produce hypotheses about which genes encode key enzymes that speed up (catalyse) reactions in the brewer’s yeast Saccharomyces cerevisiae, and to test these hypotheses experimentally and robotically. Researchers then individually tested Adam’s hypotheses about the role of 19 genes and found that 9 were new and accurate, while only 1 was incorrect.

Adam set the precedent for the team to develop a more advanced scientific robot called Eve, which helped identify triclosan, an ingredient found in toothpaste, as a potential anti-malarial drug against drug-resistant malaria parasites, which contribute to an estimated malaria mortality of 1.2 million annually.

Eve screened thousands of compounds against specific yeast strains whose essential growth genes had been replaced with equivalent genes from either malaria parasites or humans, in order to find compounds that decreased or stopped the growth of strains dependent on the malaria genes but not the human genes (to avoid human toxicity). As a result, triclosan was found to halt the activity of the DHFR enzyme necessary for malaria survival, even in pyrimethamine-resistant malaria strains.

Eve_robot
The scientific robot “Eve” (source)

Without Eve, it is likely that the research would have still been in progress at this stage and taken years to arrive at the published result, which is what usually happens in the drug discovery field.

On average, it takes at least 10 years of arduous research and an estimated US $2.6 billion to make a drug, with a high percentage of this money spent on drug therapies that fail. AI has the potential to reduce these time, money, and research inefficiencies.

In the clinic, AI tools can use algorithms to assist physicians with the high volume of patient data, provide up-to-date medical information, reduce therapeutic error, and use this information to provide clinical assistance and diagnosis with over 90% accuracy. The diagram below provides some insight into how AI is structured and gives examples of its applications in medicine, based on the detailed information published in Jiang et al.

AI_paint
Insight into AI structure and examples of medical applications

Alongside its advantages, applying AI in healthcare also raises ethical issues and analytical concerns, which will be discussed in future posts.

However, far from being a robotic disaster, AI has proved valuable for the development of human medicine and health.

As Suchi Saria, a professor of computer science and director of the Machine Learning and Health Lab at Johns Hopkins University, explains in her TEDx talk, AI is already saving lives by detecting symptoms 12-24 hours before a doctor could.

AI in healthcare undoubtedly sets the precedent for a new future in medicine.