Talk:AlphaFold

This is the talk page for discussing improvements to the AlphaFold article.
This is not a forum for general discussion of the article's subject.

A fact from AlphaFold appeared on Wikipedia's Main Page in the Did you know column on 8 February 2021 (check views). The text of the entry was as follows:

Did you know... that DeepMind's protein-folding program AlphaFold 2 has made significant progress towards solving a decades-old grand challenge of biology?

A record of the entry may be seen at Wikipedia:Recent additions/2021/February. The nomination discussion and review may be seen at Template:Did you know nominations/AlphaFold.

This article is rated C-class on Wikipedia's content assessment scale.
It is of interest to multiple WikiProjects.

Molecular Biology: COMPBIO

This article is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the importance scale.
This article is supported by the Computational Biology task force (assessed as High-importance).

Computing Low‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-importance on the project's importance scale.

Computer science Low‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Low-importance on the project's importance scale.

Google Low‑importance

This article is within the scope of WikiProject Google, a collaborative effort to improve the coverage of Google and related topics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Low-importance on the project's importance scale.

Yesterday DeepMind deprecated old weights (only for Multimer)

So AF 2.3.0 is trained on many more proteins than before and has some new hacks. https://github.com/deepmind/alphafold/commit/9b18d6a966b9b08b2095dd77d8414a68d3d31fc9

We have fine-tuned new AlphaFold-Multimer weights using identical model architecture but a new training cutoff of 2021-09-30. Previously released versions of AlphaFold and AlphaFold-Multimer were trained using PDB structures with a release date before 2018-04-30, a cutoff date chosen to coincide with the start of the 2018 CASP13 assessment. The new training cutoff represents ~30% more data to train AlphaFold and more importantly includes much more data on large protein complexes. The new training cutoff includes 4× the number of electron microscopy structures and in aggregate twice the number of large structures (more than 2,000 residues). Due to the significant increase in the number of large structures, we are also able to increase the size of training crops (subsets of the structure used to train AlphaFold) from 384 to 640 residues. These new AlphaFold-Multimer models are expected to be substantially more accurate on large protein complexes even though we use the same model architecture and training methodology as our previously released AlphaFold-Multimer paper. 109.252.170.50 (talk) 22:17, 12 December 2022 (UTC)

CASP15 presentations: all top-5 winners (excluding RNA) are AF2-based

Ground truth: https://predictioncenter.org/casp15/TARGETS_PDB/ (not all targets, no RNA)

https://predictioncenter.org/casp15/doc/presentations/

Video will also be available later. No (!) AF 2.3 with new weights (some are here: https://github.com/deepmind/alphafold/blob/9b18d6a966b9b08b2095dd77d8414a68d3d31fc9/docs/casp15_predictions.zip)

OpenFold participated too. 109.252.170.50 (talk) 22:25, 12 December 2022 (UTC)

AlphaFold fixes ancient DNA problem by reading proteins

By directly reading proteins from a Genyornis newtoni egg. https://www.pnas.org/doi/10.1073/pnas.2109326119 109.252.170.50 (talk) 22:29, 12 December 2022 (UTC)

"Responses" section

Almost all of the "Responses" section corresponds to a short period of time in late 2020 after AlphaFold 2 was unveiled but before technical details were given and the code made open-source. This is too focused, speculative, and of relatively little interest nowadays. I propose to remove almost all of this section, and to update the "AlphaFold 2, 2020" subsection with recent content. Alenoach (talk) 06:42, 3 May 2024 (UTC)

Yes, I think this section might be removed, or at least shortened and rewritten. My very best wishes (talk) 21:27, 12 May 2024 (UTC)

Thanks for the response. By the way, are you sure that the section "Protein folding problem" needed to be removed? My impression is that it was a useful introduction for readers who don't know what protein folding is, and that it explained the "historical" context and the methods that were used before AlphaFold. I don't think this section really creates the confusion, suggested in the edit summary, that AlphaFold simulates the process of protein folding. Do you agree? Alenoach (talk) 22:13, 12 May 2024 (UTC)

OK. I self-reverted. 15:24, 13 May 2024 (UTC)

I shortened the section. Alenoach (talk) 22:59, 12 May 2024 (UTC)

Looks good. My very best wishes (talk) 15:24, 13 May 2024 (UTC)

I think it might nevertheless still be useful to mention that there were these concerns when AF2 was first released (including perhaps the Spiegel quote), but with the release of the code and with the experience of use, those criticisms have largely gone away. At the moment I find the latest version a bit unbalanced in respect of the initial reaction -- citing the most enthusiastic puff pieces, but not those with reservations. I think we actually would make the positivity about AF2 more credible by citing that there were some initial reservations, but those have not lasted. (Apart from any that have?) Jheald (talk) 21:03, 13 May 2024 (UTC)

Sure, you can add back some content on the reservations, as long as it's interesting and understandable for readers, and not too outdated. Alenoach (talk) 21:15, 13 May 2024 (UTC)

I agree this section sounds like an advertisement (it should not), but Alenoach did good work by removing parts that are definitely outdated after the just-released AF-3 and the two latest versions of AF-Multimer (can be found in colabfold), which are significantly better than the first Multimer version (the one with frequent overlaps of atoms used here).

This is complicated, and should be considered separately for monomeric proteins, protein complexes and complexes with ligands:

Monomeric structures. One central issue was nicely illustrated by Figure 1 in this article ("The good, the bad and the ugly"), i.e. 30% of sequence is predicted with a low confidence score and should be discarded (the "dark matter") - Fig. 2 in the same paper; note these are monomeric structures. Even though the paper was published in 2021, this is still the main issue of AF (and probably also of proteins themselves) - for monomers and complexes.
Complexes. That issue is worse for complexes, since they are typically determined with lower "protein-protein" scores ("ipTM" scores, see here). Actually, the ipTM scores are usually in the low or medium range, and the precision for large complexes is mediocre, meaning that different sets of residues from two subunits interact in the experimental (correct) structure and in the AFM-modeled structure, as one can judge by calculating the well-known DockQ score [1]. Some people are trying to generate as many divergent models as possible using different AFM versions (e.g. ptm, v2 and v3 from Colabfold) and select the best of them using available experimental data, including experimental structures of partial complexes.
Complexes with non-protein ligands. There is currently a single article in Nature by the authors of AF-3; there are no independent assessments. My very best wishes (talk) 22:04, 13 May 2024 (UTC)
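
For illustration only, here is a minimal Python sketch of the two numbers discussed in the list above: the fraction of residues predicted with low confidence, and a score for choosing among several multimer models. It makes several assumptions that are not in the comments above: the pLDDT cutoff of 70 for "low confidence", the hypothetical `Model` container and model names, and the made-up score values. The weighting 0.8 × ipTM + 0.2 × pTM is the model-ranking confidence reported for AlphaFold-Multimer. Selecting models against experimental data, as described above, is not reproduced here; this ranks on predicted scores only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    """Hypothetical container for one predicted complex (not an AlphaFold API)."""
    name: str           # illustrative label for the run that produced the model
    ptm: float          # predicted TM-score, 0..1
    iptm: float         # interface predicted TM-score, 0..1
    plddt: List[float]  # per-residue confidence, 0..100


def low_confidence_fraction(plddt: List[float], cutoff: float = 70.0) -> float:
    """Fraction of residues whose pLDDT is below the cutoff (assumed: 70)."""
    if not plddt:
        return 0.0
    return sum(1 for p in plddt if p < cutoff) / len(plddt)


def ranking_confidence(model: Model) -> float:
    """Weighted multimer confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * model.iptm + 0.2 * model.ptm


def rank_models(models: List[Model]) -> List[Model]:
    """Return models sorted from highest to lowest ranking confidence."""
    return sorted(models, key=ranking_confidence, reverse=True)


# Made-up values, purely to show the calculation.
models = [
    Model("model_a", ptm=0.70, iptm=0.30, plddt=[90, 85, 40, 65]),
    Model("model_b", ptm=0.60, iptm=0.55, plddt=[88, 72, 71, 50]),
    Model("model_c", ptm=0.65, iptm=0.50, plddt=[92, 80, 75, 74]),
]
best = rank_models(models)[0]
```

A high ranking confidence does not by itself mean the interface is right; as noted above, the chosen model still has to be checked against experimental data.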

P.S. Right now there are many new publications beyond CASPs that assess various AF versions for predicting mutations, multiple conformations, complexes, etc. But this would take a lot of time to include in the page. Overall, AF can produce a lot of interesting structures that look real, but one must verify each specific model very carefully through massive use of all available experimental data on mutations, structures, functions, complexes, etc. See Hallucination (artificial intelligence); AF may have the same issue [2]. My very best wishes (talk) 19:16, 13 May 2024 (UTC)

How AF2 works?

Another thing that would be good to see updated is the discussion on how AF2 works.

The present text in the article was written just after AF2 was first unveiled, when details were limited. Over time AF2 became much better understood, and by the time the source code was released, the code was considered to be substantially what was by then expected. IMO indicating how AF2 achieved what it did is really important, but our article doesn't really reflect the understanding that developed; in some regards what we currently give is at best unhelpful, at worst substantially misleading.

Alas, I don't have any time I can put into this at the moment, but IMO this is another section that could do with a substantial review / rework / rewrite (with sources). Thx, Jheald (talk) 19:11, 14 May 2024 (UTC)

I see, you are talking about section AlphaFold#AlphaFold_2,_2020. I do not think this is misleading, but you are very welcome to improve it. I just checked AF-3. The server is easy to use and very fast (they have made the Colabfold Google labs server defunct). There is no dramatic improvement compared to the latest AF2 version for large complexes, only small improvements in some cases. Cases like a homodimer of [3] are still a mess. The set of ligands is ridiculously insufficient; the set of PTMs is better, although also rather incomplete. Possibly a breakthrough for RNA and DNA complexes, but I did not check those. My very best wishes (talk) 15:34, 20 May 2024 (UTC)

Honestly, you seem to have much more expertise in this domain than most of us, so it's a bit hard to really make sense of your comments. I have the same impression as Jheald that this part needs to be updated, but I also don't feel knowledgeable enough. If you are motivated, feel free to modify the article directly, while keeping it relatively easy to understand for readers (mostly well-educated outsiders I guess). No obligation of course, and thanks for your work. Alenoach (talk) 18:28, 26 May 2024 (UTC)

To avoid WP:OR, we need to borrow/summarize a simplified explanation from sources, such as [4], [5] (few first paragraphs), [6]. But ultimately, this is just a "black box"; a user will not have the slightest idea why exactly such and such a structure has been generated. The result does depend on the quality of the input, such as the multiple sequence alignment (MSA) (because the correlations in MSAs play a role), and on the existence of similar structures in the PDB, which affect the parameters obtained during training of the model. Perhaps I will add something later. My very best wishes (talk) 21:19, 27 May 2024 (UTC)
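
To make "correlations in MSAs" concrete, here is a small Python sketch that computes the mutual information between two alignment columns, a classical measure of co-variation. This is an illustration under stated assumptions, not a description of what AlphaFold does internally: the toy alignment, the function names, and the choice of mutual information as the measure are all introduced here for the example.

```python
import math
from collections import Counter
from typing import List


def column(msa: List[str], i: int) -> List[str]:
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (made up): columns 0 and 1 always change together,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_msa = [
    "AKLE",
    "AKLD",
    "GRLE",
    "GRLD",
]

covarying = mutual_information(toy_msa, 0, 1)    # columns that change together
independent = mutual_information(toy_msa, 0, 3)  # columns that vary independently
```

With a shallow or poorly aligned MSA there are too few sequences to estimate such statistics reliably, which is one way to see why input quality matters.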

P.S. It does provide significant improvement for protein complexes in cases when some proteins in a complex (such as Tyrosine-protein kinase Lck) have important PTMs, such as lipidation. One must include such PTMs when modeling with AF3. Overall, this is a fantastic modeling tool, but one that requires significant biological expertise to interpret the results of calculations, verification of every model through comparison with experimental data, sampling (e.g. by running calculations with AF2), sometimes modeling of protein complexes in pieces (e.g. complexes of large transmembrane Tyr kinases), etc. My very best wishes (talk) 16:00, 22 May 2024 (UTC)