Scientists working in the field of organic chemistry are always looking for new molecules, which they create and then study using magnetic resonance. The standards used to transcribe the collected data are, however, specific to each laboratory or publication, making it difficult to export the information electronically and thus for the scientific community to use it. An international team headed by chemists from the University of Geneva (UNIGE) has developed a new common electronic language built around two main features: it translates the data of each molecule in exactly the same way, and it makes the data simple to export from one information system to another. This means that chemists everywhere can access the data easily and reuse it directly, resulting in significant time savings for future research. This study, published in the journal Magnetic Resonance in Chemistry (Wiley), paves the way for creating an international, open-access database and specific tools, including artificial intelligence analysis.
Organic chemists create new molecules based on carbon atoms; these molecules are so small, however, that the chemists cannot see what they synthesise. Researchers use magnetic resonance to verify the composition of these molecules made “blind”: every atom that makes up the molecule emits a signal, whose frequency is translated into a spectrum that the chemists can then decode. To determine the structure of a molecule, a chemist must be able to “read” its magnetic resonance spectra.
Magnetic resonance gets up to speed
Chemists have a specific vocabulary for describing spectra and detailing the resonance of the atoms. But the way the raw data is translated into a written language varies depending on the individual laboratory, the software used and the particular publication. In short, there is no database available for assigned molecular structures or any uniformity in the way the spectra are processed and the data attributed to them. “That’s why it is very difficult to re-use data generated by other laboratories,” explains Damien Jeannerat, a researcher in the Department of Organic Chemistry in UNIGE’s Faculty of Science. “So, we came up with the idea of devising a single electronic language that can be used to switch from one system to another without losing any precision, and to build an international, open-access database.”
NMReDATA: the one and only language
The UNIGE chemists teamed up with specialists in the field and introduced a new electronic language that can serve as the standard for processing organic molecule data. “Our new format, called NMReDATA, operates according to a system of labels that are assigned to each item of data extracted from the spectra in a defined order, and which can be easily read by a computer,” says Marion Pupier, a chemical engineer in the Department of Organic Chemistry at UNIGE. The frequency of each atom will be described in a sequence showing the chemical shift, the number of atoms, the couplings, the interatomic correlations and finally the assignments. “Until now, everyone has used their own sequence to transmit the same information, making electronic transfer from one computer to another impossible and forcing the researchers to monitor and constantly reorganise the information. But there will be no need to do this with our system, thanks to the uniform nature of the language,” continues Damien Jeannerat.
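To make the idea of labelled items in a fixed order more concrete, here is a minimal Python sketch. It is not the NMReDATA format itself: the label names, the `;` and `=` separators, and the `Signal`, `encode` and `decode` names are all hypothetical, chosen only to mirror the sequence described above (chemical shift, number of atoms, couplings, correlations, assignment).

```python
from dataclasses import dataclass

# Hypothetical labels, in the order given in the article:
# chemical shift, number of atoms, couplings, correlations, assignment.
FIELD_ORDER = ("shift_ppm", "atom_count", "couplings_hz", "correlations", "assignment")


@dataclass
class Signal:
    """One signal extracted from a spectrum (illustrative structure only)."""
    shift_ppm: float
    atom_count: int
    couplings_hz: tuple = ()
    correlations: tuple = ()
    assignment: str = ""


def encode(signal):
    """Write a signal as 'label=value' items, always in FIELD_ORDER."""
    parts = []
    for name in FIELD_ORDER:
        value = getattr(signal, name)
        if isinstance(value, tuple):
            value = ",".join(str(v) for v in value)
        parts.append(f"{name}={value}")
    return ";".join(parts)


def decode(line):
    """Read a line produced by encode() back into a Signal."""
    items = dict(item.split("=", 1) for item in line.split(";"))
    return Signal(
        shift_ppm=float(items["shift_ppm"]),
        atom_count=int(items["atom_count"]),
        couplings_hz=tuple(float(v) for v in items["couplings_hz"].split(",") if v),
        correlations=tuple(v for v in items["correlations"].split(",") if v),
        assignment=items["assignment"],
    )
```

Because every writer uses the same labels in the same order, a record written by one program can be read back by another without manual reorganisation, which is the property the researchers describe.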
Creating an international, open-access database
The idea of a common electronic language is closely linked to the desire to create open-access databases. “This would enable chemists to find the exact composition of the molecules they’re studying without having to re-do the work that has already been done in the past,” says Marion Pupier. The information will be visible and available anywhere and at any time, saving considerable time and money for organic chemistry research.
All that now remains is to disseminate the new format and to establish it as the norm for publishing articles in the major international journals. “We hope that all the software will be fully operational in around a year, and that NMReDATA will be used by everyone,” says Jeannerat by way of conclusion.