Decoding disease mutations and designing experiments
Many lipid-binding proteins are implicated in neurological, immune, endocrine, and metabolic diseases, as well as in retinal degeneration, hearing loss, and cancer. Yet in many cases it has not been adequately studied how the gene mutations found in patients alter lipid binding, lipid-dependent structural changes, membrane localization, and signal transduction.
We aim to decode such disease-related mutations from the as-yet-untested viewpoint of "abnormal lipid binding." That is, we use AI to predict how a disease mutation changes a protein’s three-dimensional structure, its lipid-binding domains, and its lipid-contacting residues, and distill these into experimentally testable hypotheses.
In this approach, we examine wild-type and mutant proteins at the structural level: which lipids tend to bind which regions, which amino acids are involved in lipid recognition, and how mutations change binding domains and contact residues. These comparisons tie our hypotheses about disease mechanisms to specific residues and domains that can be tested experimentally. If an abnormality caused by a disease mutation turns out to stem from dysregulated lipid binding or lipid signaling, this may also point to new therapeutic strategies that intervene in lipid metabolism.
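As an illustration of how a wild-type versus mutant comparison could be quantified, the sketch below asks whether one residue contacts a lipid at a different frequency in two sets of predicted structures. It is a minimal example under stated assumptions, not the laboratory's actual pipeline: the inputs are hypothetical 0/1 arrays (one entry per predicted model, 1 meaning the residue is in contact with the lipid), and the permutation test is only one of several statistical tests that could be used here.

```python
import numpy as np

def permutation_pvalue(wt_contacts, mut_contacts, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in contact frequency.

    wt_contacts, mut_contacts: hypothetical 0/1 sequences, one value per
    predicted model (1 = residue contacts the lipid in that model).
    """
    wt = np.asarray(wt_contacts, dtype=float)
    mut = np.asarray(mut_contacts, dtype=float)
    observed = abs(wt.mean() - mut.mean())

    pooled = np.concatenate([wt, mut])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the labels: under the null hypothesis, wild-type and
        # mutant models are interchangeable.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:wt.size].mean() - shuffled[wt.size:].mean())
        if diff >= observed - 1e-12:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)

# Toy data (invented for illustration): contact in 90 of 100 wild-type
# models but only 10 of 100 mutant models.
wt_toy = [1] * 90 + [0] * 10
mut_toy = [1] * 10 + [0] * 90
print(permutation_pvalue(wt_toy, mut_toy))
```

A small p-value here only says that the two prediction ensembles differ. Whether the mutation changes lipid binding in the real protein still has to be tested experimentally.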
The arrival of structure-prediction AI, and its limits
Structure-prediction AIs such as AlphaFold3 can now predict candidate complex structures that include not only proteins but also nucleic acids, small molecules, and lipids. For lipid–protein interactions too, it may be possible to examine which pocket a lipid is placed in and which amino acids it lies close to.
On the other hand, the output of structure-prediction AI is not in itself experimental proof. Predictions vary from run to run, and even combinations that do not actually bind can yield plausible-looking complex structures. A single prediction therefore cannot be taken at face value as the correct answer.
Bringing the practices of experimental science into computation
So in our laboratory we incorporate into structure-prediction analysis the ideas of repetition, statistics, and controls that are routine in experimental biochemistry. For the same protein–lipid combination, we run thousands of repeated predictions and treat the resulting candidate structures as an ensemble.
For each structure, we analyze the distances between lipids and amino-acid residues, contact frequencies, the reproducibility of binding domains, and differences from control lipids. This lets us extract recurring lipid-binding modes and interaction candidates characteristic of particular lipids, rather than configurations obtained by chance.
In short, we evaluate the many candidate structures proposed by deep learning with the mindset of experimental science, and narrow down the candidate amino acids, binding domains, and mutants relevant to lipid binding.
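The ensemble idea described in this section can be sketched in a few lines of Python. The example assumes that the minimum distance between the lipid and each residue has already been measured in every predicted model and stored as a models-by-residues array. The function names, the 4.0 Å contact cutoff, and the 0.3 frequency-difference threshold are illustrative assumptions, not values taken from our protocol.

```python
import numpy as np

def contact_frequency(min_dist, cutoff=4.0):
    """Fraction of models in which each residue lies within `cutoff` Å.

    min_dist: array of shape (n_models, n_residues) holding the minimum
    lipid-residue distance in each predicted model (assumed precomputed).
    """
    min_dist = np.asarray(min_dist, dtype=float)
    return (min_dist <= cutoff).mean(axis=0)

def enriched_residues(target_dist, control_dist, cutoff=4.0, min_diff=0.3):
    """Residues contacted more often by the target lipid than by a control.

    Returns (residue_index, target_frequency, control_frequency) tuples,
    sorted by decreasing frequency difference.
    """
    f_target = contact_frequency(target_dist, cutoff)
    f_control = contact_frequency(control_dist, cutoff)
    diff = f_target - f_control
    idx = np.flatnonzero(diff >= min_diff)
    idx = idx[np.argsort(-diff[idx], kind="stable")]
    return [(int(i), float(f_target[i]), float(f_control[i])) for i in idx]

# Toy ensemble (invented numbers): 4 models x 3 residues, distances in Å.
target = [[3.0, 8.0, 3.5],
          [3.2, 9.0, 5.0],
          [3.9, 7.5, 3.8],
          [6.0, 8.2, 3.1]]
control = [[3.5, 9.0, 7.0],
           [3.8, 8.0, 6.5],
           [3.0, 9.5, 8.0],
           [3.6, 7.0, 3.9]]
print(enriched_residues(target, control))
```

In this toy data, residue 0 is close to both lipids, so it is not reported. Only residue 2, contacted in three of four target models but one of four control models, passes the threshold. This is the role the control lipid plays: it filters out residues that any lipid-like ligand would be placed against.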
Using generative AI and supercomputers
Extracting atom-to-atom distances, contact frequencies, the positions of binding domains, and structural changes caused by mutations from many predicted structures requires analysis programs tailored to the purpose. In our laboratory, we use generative AI to create and revise such Python scripts and to organize analysis procedures. By clearly stating in words what we want to analyze and having the AI assist with drafting code, fixing errors, and improving workflows, even wet-lab researchers and students who are not programming specialists can more easily assemble analyses suited to their own research goals.
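As a concrete example of the kind of script involved, the following sketch computes, for one predicted structure, the minimum atom-to-atom distance between a lipid and each protein residue. It assumes the coordinates have already been read from the prediction output into NumPy arrays. File parsing is omitted, and the function and variable names are hypothetical.

```python
import numpy as np

def min_distance_per_residue(protein_xyz, residue_ids, lipid_xyz):
    """Minimum lipid-atom distance (Å) for each protein residue.

    protein_xyz: (n_protein_atoms, 3) coordinates
    residue_ids: (n_protein_atoms,) residue number of each protein atom
    lipid_xyz:   (n_lipid_atoms, 3) coordinates
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    lipid_xyz = np.asarray(lipid_xyz, dtype=float)
    residue_ids = np.asarray(residue_ids)

    # All pairwise protein-lipid distances via broadcasting:
    # shape (n_protein_atoms, n_lipid_atoms).
    deltas = protein_xyz[:, None, :] - lipid_xyz[None, :, :]
    dist = np.linalg.norm(deltas, axis=-1)

    # Closest lipid atom for each protein atom, then closest atom per residue.
    per_atom = dist.min(axis=1)
    return {int(r): float(per_atom[residue_ids == r].min())
            for r in np.unique(residue_ids)}

# Toy coordinates (invented): two atoms in residue 10, one in residue 11.
protein = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
residues = [10, 10, 11]
lipid = [[0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
print(min_distance_per_residue(protein, residues, lipid))
```

Applying a function like this to every model in an ensemble yields a models-by-residues distance table, from which contact frequencies and mutation-induced changes can be tabulated. For large proteins, the full pairwise distance matrix can be replaced with a spatial search structure to save memory.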
Furthermore, we use generative AI not merely as a coding aid but as a support technology for deciding which lipids, which amino-acid residues, which mutants, and which experimental systems to prioritize for verification. Large-scale structure prediction and molecular-dynamics simulations are run on the campus supercomputer TSUBAME4.0, and the resulting numerical data are tabulated and visualized with Python, R, and structure-viewing software.