‘Tortured phrases’ give away fabricated research papers

In April 2021, a series of strange phrases in journal articles piqued the interest of a group of computer scientists. They could not understand why researchers would use the terms ‘counterfeit consciousness’, ‘profound neural organization’ and ‘colossal information’ in place of the more widely recognized terms ‘artificial intelligence’, ‘deep neural network’ and ‘big data’.

Further investigation revealed that these strange terms — which they dub “tortured phrases” — are probably the result of automated translation or software that attempts to disguise plagiarism. And they seem to be rife in computer-science papers.

Research-integrity sleuths say that Cabanac and his colleagues have uncovered a new type of fabricated research paper, and that their work, posted in a preprint on arXiv on 12 July¹, might expose only the tip of the iceberg when it comes to the literature affected.

To get a sense of how many papers are affected, the researchers ran a search for several tortured phrases in journal articles indexed in the citation database Dimensions. They found more than 860 publications that included at least one of the phrases, 31 of which were published in a single journal: Microprocessors and Microsystems.
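The kind of phrase search described above can be illustrated with a minimal Python sketch. It assumes the texts are already available locally as strings (the study itself queried the Dimensions database, whose interface is not shown here) and uses only the three tortured phrases quoted at the start of this article. `TORTURED_PHRASES`, `find_tortured_phrases` and `count_flagged` are hypothetical names chosen for illustration, not the researchers' code.

```python
import re

# Illustrative list only: the three tortured phrases quoted in this article,
# mapped to the established terms they appear to replace.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound neural organization": "deep neural network",
    "colossal information": "big data",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, expected term) pairs found in text, ignoring case."""
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Word boundaries avoid partial-word matches; \s+ tolerates line breaks
        # or repeated spaces between the words of a phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, expected))
    return hits


def count_flagged(documents):
    """Count documents that contain at least one tortured phrase."""
    return sum(1 for doc in documents if find_tortured_phrases(doc))


if __name__ == "__main__":
    # Made-up example abstracts, not taken from any real paper.
    abstracts = [
        "We apply counterfeit consciousness to colossal information.",
        "A deep neural network trained on big data.",
    ]
    for abstract in abstracts:
        print(find_tortured_phrases(abstract))
    print(count_flagged(abstracts))
```

Exact matching of this kind finds only substitutions that are already on the list, so any tortured phrase that has not yet been catalogued slips through.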

“It harms science. You cannot trust these papers, so we need to find them and retract them,” says Guillaume Cabanac, a computer scientist at the University of Toulouse, France, who worked on the study.

Suspecting that the tortured phrases are the result of automated translation or software that rewrites existing text, Cabanac and colleagues ran a selection of abstracts from Microprocessors and Microsystems and other journals through a detector that can identify whether texts have been generated by the artificial-intelligence language model GPT. Manual checks of the Microprocessors and Microsystems papers flagged by the detector revealed “critical flaws” in some of them, such as nonsensical text, as well as plagiarized text and images.

To dig deeper, the group downloaded all papers published in Microprocessors and Microsystems between 2018 and 2021, a time frame they chose because an upgraded version of GPT was released in 2019. On the basis of various factors, they identified around 500 “questionable articles”. Their analysis revealed that papers published after February 2021 had, on average, an acceptance time five times shorter than that of papers published before that date. A high proportion of these papers came from authors in China. And a subset of papers had identical submission, revision and acceptance dates; most of these appeared in special issues of the journal. This is suspicious, the authors say. Unlike standard issues, which are overseen by the editor-in-chief, special issues are usually proposed and overseen by a guest editor and focus on a specific area of research.
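The date-based signals described above (shorter acceptance times after a cutoff, and identical submission, revision and acceptance dates concentrated in special issues) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis: the `Paper` record, its field names and the helper functions are hypothetical, and reading “after February 2021” as a cutoff of 1 March 2021 is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Paper:
    # Hypothetical record layout; real journal metadata would need to be parsed into it.
    title: str
    submitted: date
    revised: date
    accepted: date
    published: date
    special_issue: bool


def acceptance_days(p: Paper) -> int:
    """Days between submission and acceptance."""
    return (p.accepted - p.submitted).days


def has_identical_dates(p: Paper) -> bool:
    """True when submission, revision and acceptance fall on the same day."""
    return p.submitted == p.revised == p.accepted


def mean_acceptance_before_after(papers, cutoff: date):
    """Mean acceptance time in days for papers published before vs. on or after cutoff.

    A group with no papers yields None instead of a mean.
    """
    before = [acceptance_days(p) for p in papers if p.published < cutoff]
    after = [acceptance_days(p) for p in papers if p.published >= cutoff]
    return (mean(before) if before else None, mean(after) if after else None)


def special_issue_share(papers):
    """Fraction of identical-date papers that appeared in special issues (None if there are none)."""
    flagged = [p for p in papers if has_identical_dates(p)]
    if not flagged:
        return None
    return sum(p.special_issue for p in flagged) / len(flagged)


if __name__ == "__main__":
    # Made-up records for illustration only; not data from the study.
    papers = [
        Paper("A", date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1), date(2020, 4, 1), False),
        Paper("B", date(2021, 5, 10), date(2021, 5, 10), date(2021, 5, 10), date(2021, 6, 1), True),
    ]
    # Assumption: "after February 2021" read as published on or after 1 March 2021.
    print(mean_acceptance_before_after(papers, date(2021, 3, 1)))
    print(special_issue_share(papers))
```

Heuristics like these only single out papers for a closer look; on their own they say nothing about whether a given paper is fabricated.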

Microprocessors and Microsystems was not the only affected title — the researchers also found evidence of tortured phrases in papers published in hundreds of other journals. “Preliminary probes show that several thousands of papers with tortured phrases are indexed in major databases,” they write, adding that “other tortured phrases related to the concepts of other scientific fields are yet to be exposed”.

Special-issue investigation

Around the time that Cabanac and his colleagues first noticed the tortured phrases, and unbeknown to them, the editor of Microprocessors and Microsystems began having concerns about the integrity and rigour of peer review for papers that had been published in some of the journal’s special issues.

The journal’s publisher, Elsevier, launched an investigation. This is still under way, but in mid-July the publisher added expressions of concern to more than 400 papers that appeared across six special issues of the journal.

The expressions of concern say that the papers in the affected special issues of Microprocessors and Microsystems are being “independently re-assessed” one by one, and that the journal will give further updates on their status once the investigations have concluded.

The publisher adds that a “configuration error in the editorial system” at the journal meant that neither the editor-in-chief nor the editor designated to handle the papers received them for approval as they should have. “This configuration error was a temporary issue due to system migration and was corrected as soon as it was discovered,” says the notice.

A spokesperson for Elsevier told Nature in a statement that the Microprocessors and Microsystems investigation has found that the authors probably used reverse-translation software to disguise plagiarism, and that this is the likely source of the tortured phrases.

The investigation has also revealed that 49 papers flagged as suspicious by Cabanac and his colleagues and published in standard issues of the journal were originally submitted to its special issues and were accepted by guest editors, “but were subsequently published in regular issues, at the authors’ request”, the statement says. These papers are already part of Elsevier’s investigation, it adds.

Elisabeth Bik, a research-integrity analyst in California known for her skill in spotting duplicated images in papers, says that the findings of Cabanac’s research are “shocking”. “This is a very new and disturbing type of fabricated paper,” she adds.

Jennifer Byrne, a molecular-oncology researcher at the University of Sydney, Australia, who also works on spotting fabricated papers, says that this is probably the tip of the iceberg because the researchers looked in depth at only one journal from one publisher. “These papers were also found because they were of very poor quality, but there could be more plausible AI-generated papers within the literature that are harder to detect,” she adds.