In 2017, 'fake news' was chosen as the new word of the year by the Language Council of Norway. But what are the linguistic features of fake news, and can fake news be uncovered on the basis of linguistic traits? Linguist Silje Susanne Alvestad has examined this in the project 'Fakespeak - the language of fake news'. She and her research colleagues have investigated the language of fake news in English, Russian and Norwegian.
The project draws, among other things, on research from the University of Birmingham, which examined the articles of former New York Times journalist Jayson Blair. He lost his job in 2003 after it was revealed that he had fabricated news stories.
"The researchers compared his true and false articles to see whether they found differences. An interesting finding was that he predominantly wrote in the present tense when he was lying, and in the past tense when he was writing genuine news," Alvestad said.
More informal style in untrue articles
The researchers also found differences in the use of pronouns. Furthermore, the genuine articles had a higher average word length, and the fabricated texts had a more conversational and informal style, with extensive use of so-called emphatic expressions such as 'truly', 'really' and 'most'.
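To make these surface features concrete, the sketch below computes two of them, average word length and the rate of emphatic expressions, for a short text. It is an illustration only; the word list is a hypothetical stand-in based on the examples above, not the inventory used in the Fakespeak project.

```python
# Illustrative only: average word length and emphatic-expression rate.
# The EMPHATICS set is a hypothetical stand-in, taken from the examples in the article.
import re

EMPHATICS = {"truly", "really", "most"}

def surface_features(text: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return {"avg_word_length": 0.0, "emphatics_per_100_words": 0.0}
    avg_len = sum(len(w) for w in words) / len(words)
    emphatic_rate = 100 * sum(w in EMPHATICS for w in words) / len(words)
    return {"avg_word_length": round(avg_len, 2),
            "emphatics_per_100_words": round(emphatic_rate, 2)}

print(surface_features("This is truly, really the most shocking story ever told."))
```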
Alvestad and her colleagues have compared the Blair texts with similar corpora in which the same person has written both genuine and fake news. They have found that the linguistic features of fake news vary according to the writer's motivation for deceiving readers.
"Blair says in his autobiography that his motivation was primarily money, and we found, for example, that his fabricated news contained few metaphors. When the motivation is ideological, on the other hand, more metaphors are used, often from domains such as sport and war," Alvestad said.
More categorical in fake news
Another key finding is that fake news can have a more categorical tone. The researchers have examined stance, that is, the ways in which the writer expresses attitudes, perceptions and thoughts.
"In fake news, the writer often gives the impression of being absolutely certain that what is being reported is true. This is called 'epistemic certainty'. There is an overrepresentation of expressions of such certainty, for example 'obviously', 'evidently', 'as a matter of fact', and so on. This tendency is stronger in Russian language texts than in English," Alvestad said.
"We asked ourselves whether there is a universal language of fake news. We concluded that there isn't. The linguistic features of fake news vary within individual languages and between languages. They depend on context and culture," Alvestad said.
A fact-checking tool developed
This variation makes it even more challenging to develop fact-checking tools for fake news based on linguistic features. Developing such a tool together with computer scientists from the research institute SINTEF was one of the project's goals. The researchers have nevertheless managed to build one, and it can be tested on SINTEF's website.
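The article does not describe how the tool works internally, but the general idea, linguistic features feeding a text classifier, can be sketched. The example below is purely illustrative and assumes a labeled corpus of genuine and fake texts; the texts, labels and feature choice are invented placeholders, not the Fakespeak/SINTEF implementation.

```python
# Hypothetical sketch of a feature-based classifier for genuine (0) vs fake (1) news.
# Not the Fakespeak/SINTEF tool; the training texts below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The committee published its report on Tuesday.",                 # genuine (placeholder)
    "Officials confirmed the figures in a written statement.",        # genuine (placeholder)
    "Obviously, this truly shocking scandal proves everything.",      # fake (placeholder)
    "As a matter of fact, the real story is being hidden from you.",  # fake (placeholder)
]
labels = [0, 0, 1, 1]

# Word unigrams and bigrams stand in for the stylistic cues discussed above.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Evidently, the truth is being kept from the public."]))
```

With a realistic corpus, the features could instead be the stance markers, emphatic expressions and other traits the researchers describe, rather than raw n-grams.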
"From a linguistic perspective, we have been critical of the fact that the definition of fake news has, in practice, encompassed too many genres. That means one cannot quite know what the differences between fake and genuine news are due to. Good and balanced datasets are needed to develop robust fact-checking tools, together with a targeted and sophisticated linguistic approach," Alvestad argued.
Disinformation from AI
While the researchers have been working on the Fakespeak project, developments in artificial intelligence (AI) have accelerated and changed the landscape of fake news. This laid the groundwork for the project NxtGenFake, which is about identifying AI-generated disinformation. Alvestad and the other researchers use material from Fakespeak to find linguistic features of AI-generated disinformation.
"Purely fabricated news, which there was quite a lot of six or seven years ago, may not have great impact. A bigger problem is fake news that can be a mix of true and false," Alvestad said.
In NxtGenFake, the researchers move away from the term 'fake news' and talk about disinformation instead.
"Some of the information is true, but the whole truth is not included. It is sharpened, often placed in the wrong context, and frequently overlaps with propaganda. This mix means it easily escapes the radar of online verification mechanisms and makes it particularly challenging."
Less variation in AI-generated propaganda
The NxtGenFake project will run until 2029, but the researchers already have some findings. For example, there is less variation in the use of persuasive techniques in AI-generated propaganda compared with propaganda written by humans.
"Two types of techniques stand out in the AI-generated texts. One is what we call Appeal to Authority, which concerns reference to the source of the information. We notice that these references are generic, that is, they typically appear in the indefinite form. It might, for example, say 'according to researchers' or 'experts believe'. Large language models likely make such moves because they have no relationship to the world and do not know what is true and what is not. In this way, the claims become very difficult, if not impossible, to verify."
The other technique concerns how the texts end: AI-generated news with propagandistic elements ends differently from propaganda produced by humans, often with formulations the researchers call Appeal to Values. Here the argument is that something must be done to ensure, for example, increased growth, greater fairness or greater public trust.
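As a purely illustrative sketch of the generic-attribution pattern described above, a simple rule can flag indefinite source phrases of the kind quoted by Alvestad; the phrase patterns below are hypothetical examples, not the project's actual annotation scheme.

```python
# Illustrative only: flag generic, indefinite source attributions such as
# "according to researchers" or "experts believe". The pattern list is hypothetical.
import re

GENERIC_ATTRIBUTION = re.compile(
    r"\b(according to (researchers|experts|analysts|officials)"
    r"|(researchers|experts|analysts|officials) (say|believe|warn|agree))\b",
    re.IGNORECASE,
)

def find_generic_attributions(text: str) -> list[str]:
    return [match.group(0) for match in GENERIC_ATTRIBUTION.finditer(text)]

sample = "According to researchers, trust is collapsing. Experts believe action is needed."
print(find_generic_attributions(sample))
```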
AI-generated disinformation preferred
How, then, do people respond to AI-generated disinformation compared with disinformation written by humans? The researchers conducted a test in which American participants were asked to rate AI-generated texts and texts written by humans on three parameters: credibility, emotional appeal and informativeness. The participants did not know the source of each text.
The AI-generated disinformation was rated both more credible and more informative than disinformation written by humans. The researchers also asked which of the excerpts the respondents would prefer to continue reading, and significantly more people said they would continue with the AI-generated texts.
"We were not surprised that the respondents preferred the AI-generated texts. But I was personally a little surprised that the AI-generated texts did not score highly on emotional appeal. Instead, they were perceived as both more informative and more credible than texts written by humans," Alvestad said.
This suggests that AI-generated disinformation may be harder to detect. Large language models can wrap misinformation and disinformation in genres we trust by default. Alvestad believes it is important that we are aware of this.
"I hope the project results can help raise awareness of the risks associated with large language models, especially at a time when such tools are increasingly being adopted."