disclaimer: as i’m an organic chemist and this is inorganic chemistry, with all their techniques working in solid state, this is slightly out of my ballpark

some time ago i found this el reg story which claims that AI predicted new compounds, another AI predicted synthesis of these compounds, robotic chemist cooked them and according to another AI, some of that, most of the time, worked and provided new compounds, 43 of them, with 70%+ success rate, all in 17 days.

as you can expect

we might be running into rapidly compounding garbage in garbage out problems.

paper was published in nature, so great success, at least according to google’s press release. google was very proud of it because model in question runs on deepmind. however, after some time, per el reg:

Google has, to us, distanced itself a little from the Berkeley study, telling The Register that the materials produced by the A-Lab were proposed by the university’s researchers. The web giant’s reps said the Berkeley scientists “checked their predictions using a Google DeepMind tool,” ie: GNoME.

For what it’s worth, at the time of the Nature paper going live, Google boasted in an announcement that the Berkeley-DeepMind study “shows how our AI predictions can be leveraged for autonomous material synthesis.” Two people at DeepMind, who are listed as co-authors of the Nature paper, are credited for using Google AI in the “filtering pipeline for novel-materials identification.”

this is because at some point somebody with actual domain knowledge looked into it all in detail and things started looking weird to them. things looked so weird to them that it all resulted in preprint which states that none of these compounds are actually new, only 3 of 58 syntheses were successful, of which only 1 has convincing receipts to back it up, most of the time whatever they made was dirty (up to 4 separate known compounds identified per sample), fit to experimental data was mediocre to nonexistent, which taken together means that the premise of the first paper doesn’t hold at all. this preprint is written clearly, without unnecessary jargon, and as authors state it right at the beginning, it was written in such a way to be accessible to “multi-disciplinary” (ie not exactly experts in field) audience

of these three compounds that were actually synthesized, two were discovered between the time when google took snapshot of xrd database (2021) and now (2024). just one year of powering automated wisdom woodchipper per single inorganic compound. how efficient!

the problem 1:

el reg has a snippet that explains it well: ai model predicted a material with higher ordering, but what was formed irl and what is known in literature, sometimes from 70s, sometimes from 2003, has some degree of disorder in these sites. elreg explanation:

“On the computational side, they couldn’t deal with something called ‘compositional disorder,’ which is a very important feature of inorganic materials. A crystal is an ordered arrangement of atoms. But even within that order there can be disorder. Imagine you have a set of children’s building blocks, all the same size and shape, and they are arranged in a perfectly ordered pattern on the floor. The blocks are like atoms in a crystal,” Professor Palgrave told us.

“But now imagine that there are two colors of block, red and blue. We have an ordered pattern of colors, say alternating red, blue, red, blue etc. You might end up with a chess board type arrangement. But it is also possible for the colors to be mixed up randomly. In this case the blocks themselves are ordered, but the colors are disordered.”
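to make the blocks picture concrete, here is a toy sketch (mine, not from the article or the preprint). it builds two floors with identical 50:50 composition, one chessboard and one shuffled, and counts how many neighbouring pairs have different colours. the grid size, the seed and all function names are made up for illustration, and a real crystal is obviously not a 2D grid of toy blocks.

```python
import random

def checkerboard(n):
    """Ordered colouring: red (0) and blue (1) alternate in both directions."""
    return [[(i + j) % 2 for j in range(n)] for i in range(n)]

def shuffled(n, seed=0):
    """Same blocks, same 50:50 composition, but colours placed at random."""
    colours = [k % 2 for k in range(n * n)]
    random.Random(seed).shuffle(colours)
    return [colours[i * n:(i + 1) * n] for i in range(n)]

def unlike_neighbour_fraction(grid):
    """Fraction of nearest-neighbour pairs with different colours (periodic edges)."""
    n = len(grid)
    unlike = 0
    for i in range(n):
        for j in range(n):
            unlike += grid[i][j] != grid[i][(j + 1) % n]  # neighbour to the right
            unlike += grid[i][j] != grid[(i + 1) % n][j]  # neighbour below
    return unlike / (2 * n * n)

ordered = checkerboard(20)      # even n, so the pattern also wraps around cleanly
disordered = shuffled(20)

print(unlike_neighbour_fraction(ordered))     # every neighbour pair is red-blue
print(unlike_neighbour_fraction(disordered))  # roughly half of the pairs
```

both floors have the same blocks in the same positions and the same overall composition; only the colour pattern differs, which is exactly the kind of difference that is easy to assume away in a model.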

why did it happen? maybe because someone cut corners along the way, because simulating the ordered structure is much easier and somebody just had a genius idea that it’s the same thing anyway. i suspect that getting out of this problem involves throwing much larger pieces, spanning multiple unit cells, into DFT, because now we have to deal with some degree of disorder. probably there’s some nice trickery to deal with this problem, but it wasn’t used for whatever reason. this shouldn’t have happened if there was an actual crystallographer on the team

the problem 2:

the data they have is powder xrd, which means it can’t be really interpreted directly and instead what is needed is a fit to known or predicted compounds. as it happens, additional ordering predicted by ai provides new testable prediction: sometimes there should have been additional peak in pxrd, but it doesn’t appear where it should. sometimes when disorder/order happens between metals that are sufficiently similar, difference in pxrd is also negligible and so authors of preprint state that some better proof, that is one using different technique or better quality data is needed to tell which is which. otherwise, if synthetic procedure is almost exactly the same as one from paper published 40 years ago, why should product be different?
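a back-of-envelope sketch of why ordering is testable in one case and nearly invisible in another. this is my own toy model, not anything from the preprint: a 1D cell with two sites at x = 0 and x = 1/2, and atomic number used as a crude stand-in for x-ray scattering power (both are assumptions, real form factors depend on angle and the real structures are 3D with more atoms). with A and B ordered on the two sites, odd-h reflections are the "extra" peaks; with A and B randomly mixed, both sites scatter like the average atom and those peaks vanish. the Mg/Ni pair comes from the MgTi2NiO6 example quoted further down; Ti/Cr is just an illustrative pair of similar metals.

```python
def intensity(h, f_site0, f_site1):
    """|F(h)|^2 for a two-site cell with atoms at x = 0 and x = 1/2.

    The phase factor exp(2*pi*i*h*x) at x = 1/2 equals (-1)**h for integer h.
    """
    structure_factor = f_site0 + f_site1 * (-1) ** h
    return structure_factor ** 2

def ordered_ratio(z_a, z_b):
    """Extra (h = 1) peak relative to the main (h = 2) peak, A and B fully ordered."""
    return intensity(1, z_a, z_b) / intensity(2, z_a, z_b)

def disordered_ratio(z_a, z_b):
    """Same ratio when both sites hold a random 50:50 mix of A and B."""
    mean = (z_a + z_b) / 2
    return intensity(1, mean, mean) / intensity(2, mean, mean)

# atomic numbers: Mg = 12, Ni = 28, Ti = 22, Cr = 24
print(ordered_ratio(12, 28))     # about 0.16, a clearly visible extra peak
print(disordered_ratio(12, 28))  # zero, the extra peak is absent
print(ordered_ratio(22, 24))     # about 0.002, easily lost in noisy powder data
```

in this toy model the extra peak scales as (Z_A - Z_B)^2, so a missing peak for a pair like Mg/Ni argues against ordering, while for neighbours in the periodic table powder xrd alone can hardly tell which is which.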

(this is not my field, but in my field, generally, a single technique is not enough to show that your compound is what you claim it is. usually two different ones are required, like NMR and MS, or one to confirm identity and another to confirm purity. maybe it’s not the case here)

additional issue is that authors used another ai to interpret pxrd data, which for some mysterious reason always conformed to what they wanted to. for example, in case of phosphate series there was a possibility of overfitting model to what they wanted to get, but in all cases preprint shows clearly that all of these compounds are already known and provides better fits to experimental data.

i wanted to write an elaborate sneer about this shitshow, but i don’t have to. authors of preprint already did that, so i’ll just paste some snippets from it:

We discuss all 43 synthetic products and point out four common shortfalls in the analysis. These errors unfortunately lead to the conclusion that no new materials have been discovered in that work.

Many aspects of this work are impressive: the fact that robots can take over labor intensive steps, that AI can predict reasonable synthetic routes based on literature precedent, and that a full circle of materials synthesis and characterization without human intervention can be carried out. Unfortunately, we found that the central claim of the A-lab paper, namely that a large number of previously unknown materials were synthesized, does not hold. As we will explain below, we believe that at time of publication, none of the materials produced by A-lab were new: the large majority were misclassified, and a smaller number were correctly identified but already known.

Notably, all these materials are related to the famous “Naples Yellow” pigment, which derives from Pb2Sb2O7.27 Variants of Naples Yellow, including those with Sn(IV) substitution on the B site, were used by the ancient Egyptians, and have been lost and then rediscovered periodically throughout history, by different ancient civilisations, in the middle ages, at various points in the renaissance, and most recently by the A-lab.

Within the 36 samples classified as successes, we found that the analysis presented for 35 of them suffered from one or more of the error types described below.

  1. Very poor and obviously incorrect fits. This means models that are such poor fits to the data, often missing intense diffraction peaks, that they cannot be relied upon either for proof of the structure of the compounds, nor their purity. The poor fitting leads to the inability to identify impurity phases. Since the authors aim to have >50 wt% of their product, it is important to identify what other materials are present in order to assess if the 50% threshold has been met. (emphasis mine) Additionally, the presence of unreacted starting materials is symptomatic of an incomplete reaction and incorrect reaction conditions. This error type is present in 18/36 compounds.
  2. Using different structures for refinement than were claimed in the paper. In several cases the CIF supplied in the SI is not the same structure (or composition) as that claimed in the main paper. In several examples even the space group between the two differs. An example is Mg3NiO4 which we discuss below. This error is present in 8/36 compounds.
  3. No evidence for cation ordering. The most common error is prediction of compounds which are ordered versions of known disordered compounds. For example, as we will show in detail below, the existence of MgTi2NiO6 is claimed, which is the same as the known ilmenite structure of the same composition, but the predicted structure has ordered Mg and Ni cations, whereas the known structure has those cations disordered. However, no consideration is given by the authors to the possibility that they may have in fact made the known disordered compound instead of their intended compound. We show below that this is in fact the most likely situation. This error type is present in 24/36 compounds.
  4. Reporting existing compounds as new. In several cases the claimed new compounds are in fact already reported in the ICSD. This error type is present in 3/36 compounds.
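quick arithmetic on the tallies quoted above, as a sketch. the numbers are the ones from the preprint excerpt; the short labels and function names are mine. the point is that the four counts add up to more than the 35 flagged samples, so a good share of them must have more than one problem at once.

```python
TOTAL = 36     # samples the original paper classified as successes
FLAGGED = 35   # of those, samples with at least one error type

# counts as quoted in the preprint excerpt; labels shortened by me
ERROR_COUNTS = {
    "poor or incorrect fits": 18,
    "refined structure differs from claimed one": 8,
    "no evidence for cation ordering": 24,
    "existing compound reported as new": 3,
}

def shares(counts, total):
    """Fraction of all samples affected by each error type."""
    return {name: n / total for name, n in counts.items()}

def extra_flags(counts, flagged):
    """Flags left over after assigning one error type to every flagged sample."""
    return sum(counts.values()) - flagged

for name, share in shares(ERROR_COUNTS, TOTAL).items():
    print(f"{name}: {share:.0%}")

print(extra_flags(ERROR_COUNTS, FLAGGED))  # 53 flags on 35 samples leaves 18 extra
```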

oh and also no actual experimental data was available, authors of preprint dug it out of charts in pdfs and still got better fits than whatever the third ai cooked

For the analysis, the original published experimental XRD patterns were obtained by digitalizing the data provided in the A-lab paper supplementary information using GetData Graph Digitalizer. […] This process is certainly not ideal and yields data of lower quality than the original. Nevertheless, we found it was possible to carry out Rietveld refinement on these datasets […] We do not claim our fits are definitive or cannot be improved upon, but we highlight in each case the features that make us believe the fits we propose are superior to those provided in the original paper.

The compound K2TiCr(PO4)3 was predicted to exist as a new cubic phase in the space group P 213. Fig. 10(d) shows our refinement of the provided PXRD pattern, using known cubic K2Ti2(PO4)3 (P 213; ICSD # 202888) and Cr2O3, a common impurity in high temperature synthesis of oxides containing chromium.34 The refinement provided in the A-lab paper had several unfitted peaks, which all correspond to the Cr2O3 impurity phase as marked by red arrows in Fig. 10(c).

The example of K2TiCr(PO4)3 shows that there are serious issues with the supposed synthesis of the phosphates, in fact we could index and preliminarily match all 18 PXRD patterns to materials that are reported in the ICSD […] We consider it to be the responsibility of the authors of the A-lab paper4 to unambiguously prove the synthesis of the target materials in all cases and will refrain from providing alternative refinements of all 43 materials in this comment. We will however discuss each compound and possible alternatives briefly below.

In our view, three materials have been successfully synthesized as predicted. All of them, however, have been reported in the literature before. They are MnAgO2, Y3In2Ga3O12 and CaFe2(PO4)2O, which have been reported in the following references respectively.42–44 Of those CaFe2(PO4)2O seems to have been convincingly synthesized based on the provided PXRD data, whereas the other two’s PXRD patterns are fitted so poorly that it is difficult to state whether the materials indeed have been synthesized.

but gotta give it to them

In any case, the compounds in question were reported relatively recently, between 2021 and 2023. In fact, the authors of the Google DeepMind paper3 clarified that they took snapshots of the ICSD in 2021 and thus did not include materials discovered since in their training set. They rightfully view it as a success that materials they predicted based on a 2021 snapshot were since discovered.

Since we raised issues in the paper shortly after publication, the Ceder group has conceded that A-lab does not live up to human standards, but still claim that “the system offers a rapid way to prove that a substance can be made — before human chemists take over to improve the synthesis and study the material in more detail.”45 We hope that our comment made it clear that this statement is not justified - the A-lab paper does not provide proof that the new materials can be made.

now, tell me, how on god’s green earth is this nature paper still up and not retracted? it absolutely would be if the authors weren’t partially automated, lots of papers were retracted for less. apparently you can commit any volume of scientific misconduct if you smear it in enough hype. in the meantime, developments were spun for nontechnical audience, stocks pumped and deals signed, and when it all turns out to be trash, people who funded it all and burned some square kilometers of amazon just to train and then run three-layered “ai chemist” that only spouts garbage “distance themselves from findings”. whew i’ve never known it was that easy!

update: typos, wording

update 2: no matter what LIES computational/quantum/theoretical chemists tell you, chemistry is still an experimental science. i’ve also noticed there’s no actual experimental data, which is weird and this thing alone could be very well grounds for retraction. you’d expect a section in supplementary information of something like:

Compound 1a: In round bottom flask, 2a (XX mg, YY mmol, 0.5M), 3a (XX mg, YY mmol, W.W eq), catalyst 4j (XX mg, YY umol, W mol %) and toluene (XX ml) were placed. Homogenous reaction mixture was heated to 90C for 4h, washed with water, dried and subjected to column chromatography providing product 5a (XX mg, YY mmol, W% yield)

follows full set of analytical data necessary to confirm identity and purity of compound.

so, actual instructions needed to replicate their findings: synthesis, purification method if any, and all analytical data to check if they match. this supplementary info can easily run into 50-100 pages. i’ve only seen one xlsx file with one compound per line and “success” or “failure”, this hardly makes it work. also xrd data and simulations mean that there are pretty pictures to include, why wouldn’t you make it into a nice readable pdf (i have some suspicions as to why)

  • @blakestaceyA
    9 months ago

    “The AI has to work — we trained it on an extensive corpus of papers by Jan Hendrik Schön!”

    • @skillissuer@discuss.tchncs.de (OP)
      9 months ago

      my favourite case of retraction of a paper to date was the case of hexacyclinol. at this time new natural product, someone determined its structure, proposed and conducted its total synthesis, barely 39 steps, all alone, with slight help of 5 lab techs, for work that would usually yield multiple PhD degrees, at address that houses in real life yoga studio in Berlin. some steps were, to put it mildly, calling in question all known laws of chemistry, and no intermediate had HNMR reported because “solvent peak was added at incorrect shift” (residual solvent shift is a motherfucking physical property of a solvent, it’s physically impossible and makes no sense). ALL data was cooked. few years later someone revised the original structure that matched what was known of related fungi, conducted a very clever six step synthesis with 38% total yield, this time with all proof and analytical data that matched reality http://ccc.chem.pitt.edu/wipf/Current Literature/Adam_3.pdf

  • @skillissuer@discuss.tchncs.de (OP)
    9 months ago

    (the real answer is that reviews can take months and retractions can take years, but here the damage was already done within a week)

  • @zogwarg
    9 months ago

    It underscores a bit of a universal delusion with Potemkin shitbots in general. People lauding the outputs, into languages (visual art, foreign language, programming, apparently inorganic chemistry, …) that they don’t speak, and since it passes the first glance test, they don’t even think to look twice.

    I think this is actually the prime reason (subconsciously or otherwise) they chose Japan for their main SORA video: the overall exotic nature decreases the uncanny valley factor.

    • @skillissuer@discuss.tchncs.de (OP)
      9 months ago

      it’s worse than that, even tailor-made model can manufacture mountains of shite results

      is there an overrepresentation of weebs in openai circles? does this video look uncanny to east asians? are they trying to market to east asian customers? there are some other explanations

      • @zogwarg
        9 months ago

        Saying “prime reason” is maybe overselling it a bit ^^.

        I think the fact it’s Japan lets them a bit too uncritically act amazed though, for the lady in red one: the location is reminiscent of Shibuya but doesn’t exist and doesn’t make sense, the text on billboards and signs is scribbly gibberish, and the woman’s face changes over the course of the video.

        (Everyone should be unsettled by the godawful lack of perspective)

        General weebishness is probably a bigger factor, though I wish I could gatekeep them out of that descriptor.

  • @bitofhope
    9 months ago

    I distance myself from the findings presented in my texts last night where the article claimed I “need u pleasee come back”, postulated you are a “hrartless bicht” and concluded that further inquiry is required into the question of whether you would be willing to “send nudes”. I distinctly remember my drinking buddies who peer reviewed them unanimously recommending publication.

    • @skillissuer@discuss.tchncs.de (OP)
      9 months ago

      google just wants to be proud of their smol uwu plagiarism machine, like soccer mom putting diploma on fridge (it didn’t work this time)

  • @swlabr
    9 months ago

    Oh so we’re just gonna drown in AI GIGO science now too? I guess it’s not enough that we are igniting the atmosphere.

    • @skillissuer@discuss.tchncs.de (OP)
      9 months ago

      has been happening for some time already albeit at smaller scale. there was a project during covid aiming for an open-sourced covid antiviral and it got mismanaged, perhaps intentionally as some key people were in ML, by relying heavily on ai-generated dreck. anything of value was actually made by humans, and were resources distributed differently, maybe they would have developed this thing half a year sooner or so. but this is material for another sneer

      • @swlabr
        9 months ago

        has been happening

        Yes absolutely, just now it’s A C C E L E R A T I N G