Math-Driven Generative AI Demands New Knowledge for Safe Use

ORNL's Suhas Sreehari explains the algebraic and topological foundations of representation systems, used in generative AI technology such as large language models. Credit: Lena Shoemaker/ORNL, U.S. Dept. of Energy

Cybersecurity experts typically advise users on how to stay safe online, such as by protecting personal information and keeping passwords secret. But in the age of easy access to generative AI software, Suhas Sreehari, an applied mathematician at the Department of Energy's Oak Ridge National Laboratory, has identified misconceptions about generative AI that could lead to unintentionally bad outcomes for users.

"It's an exciting time to now have generative AI at our fingertips and for advancements in the software to be enormous steps forward in capability," Sreehari said. Understanding how a subset of internet data represents human knowledge and experience as a whole provides insight into how users should interpret the limitations of AI tools and protect themselves when using it.

More than a search engine, but not by much

When a user submits a search to a search engine, like Google or Bing, the responses are links to websites and blogs. The user searches through the links to find the answer they are looking for.

Generative AI software, such as OpenAI's ChatGPT or Google's Bard, instead synthesizes information to provide an actual answer to a question, within a given or perceived context. The power of this software lies in the speed with which it appears to respond, as if a person had researched the whole internet and typed a reply within a matter of seconds.

Generative AI is a mathematically formulated, data-driven technology, Sreehari said. "Large language models, a popular form of generative AI, are data representation systems. Extremely large neural networks compress text and data from the internet to broadly depict information. Basically, results given by generative AI are a representation of what the actual answer could be; in fact, these tools often simply predict the next word. An algorithm as simple as predicting the next 'best' word can be powerful enough to generate large amounts of text and write entire reports, code, and books."
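
The generate-by-repeated-prediction loop is easy to see in miniature. The toy sketch below, with an invented two-sentence corpus, predicts each next word from simple bigram counts; production models replace the counting with enormous neural networks, but the "predict, append, repeat" idea Sreehari describes is the same:

```python
from collections import Counter, defaultdict

# A toy bigram model, purely for illustration: real LLMs use large neural
# networks, but the core loop of "predict the next word, append, repeat"
# is the same idea. The tiny corpus below is invented.
corpus = (
    "the model predicts the next word and the model repeats "
    "the prediction to generate the next sentence"
).split()

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`, or None."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

# Greedy generation: start with a seed word and keep predicting.
generated = ["the"]
for _ in range(8):
    word = predict_next(generated[-1])
    if word is None:
        break
    generated.append(word)

print(" ".join(generated))
```

On a corpus this small, greedy prediction quickly falls into a repetitive loop; scale the data and the model up by many orders of magnitude, and the same mechanism produces fluent paragraphs.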

When a user queries generative AI, the results pull from neural networks trained on online content such as forums, blogs, books, professional webpages, and journal articles. ChatGPT, for example, was initially trained on data posted before September 2021. While the results draw on the vast information available online, the model makes a statistical guess about the most relevant answer in a given context.

Although the underlying framework is mathematical, the results are not delivered with a confidence index. The user has no way of knowing whether the information contains errors.
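
Under the hood, a model does compute probabilities for its candidate next words; a typical chat interface simply never surfaces them. A minimal sketch, with invented numbers, makes the gap visible:

```python
import math
import random

# Hypothetical raw scores (logits) a model might assign to candidate next
# words. The words and numbers are invented for illustration; no real
# model is being queried here.
logits = {"Paris": 4.1, "Lyon": 2.3, "London": 1.9, "Berlin": 0.8}

# Softmax converts raw scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {word: math.exp(score) / total for word, score in logits.items()}

# The model samples one word from that distribution...
word = random.choices(list(probs), weights=list(probs.values()))[0]

# ...but a chat interface typically shows only the word, not the
# probabilities, so the user never sees a confidence estimate.
print("shown to the user:", word)
print("hidden internally:", {w: round(p, 3) for w, p in probs.items()})
```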

A dilemma of ethic proportions

Sreehari said the way results are written can lead some to think AI is almost human. "Generative AI presents the response to your question with shocking speed and in complete sentences and, in many cases, the responses are fairly relevant, if not wholly accurate," he said. But thinking there is a person or human-like consciousness worthy of trust can steer users to give up more information than intended.

In other cases, a user may recognize there isn't a person answering their question, which may lead to a false sense of security. A person may not be responding to a user's query, but there are people working for the company supporting the application. "What could start out as asking innocent questions could lead to giving away personal information. It's easy to cross the line, and then your data can be, and most likely will be, used by the company that owns the software to train the next generation of such models," Sreehari added.

Morality and ethics are another space where generative AI software can be deceptively dangerous. Sreehari said large language models can sometimes be "unhinged," projecting unjustified confidence even when an answer is wrong or unethical. These models lack yet another human trait: a moral compass.

"Companies can hard code boundaries that align with societal views, but it's not possible to account for every type of ethical dilemma. In any case, hand-tweaking algorithms to comply to social norms cannot replace a moral and ethical framework built from the ground up, nor is it a scalable solution" he said.

Sreehari recognizes that generative AI can be helpful, but in some cases the user must share sensitive information to get an answer to a query.

"Since ethics are controlled by the company that makes the software, it's important for users to recognize the value of their information and take precautions to protect it, even when a threat isn't obvious," he said.

AI is a tool, not a work replacement

As an educated user of large language models, Sreehari said this new technology is more akin to an eloquent search engine and less a "let me write this for you" tool. "It's a repository for information, and it's more cohesive in how it presents it."

But he cautions against building a dependence on it: "If you trust the technology and it fails, who will be held accountable for the outcome? It's like having a blind-spot monitor on your car. It's great when it works, but you should still always check over your shoulder for cars," he said.

In professional settings, Sreehari recommends that universities and organizations create guidelines for using generative AI as part of students' and professionals' work. Putting acceptable boundaries around use while the technology is still new will help protect well-intentioned users as the software advances.

Sreehari posted an LLM demo of himself interacting with generative AI. At minute 8:46, he prompts the software, "There are two candidates, Lisa and George. Lisa is 24 and has a master's degree from Yale. George is 23 (almost 24) and has a master's degree from Illinois. They both seem motivated, and their resumes and experiences are quite similar. I am very confused whom to hire. What's your recommendation?"

Watch the clip to see why the software recommended Lisa. Would you trust it?

UT-Battelle manages ORNL for the Department of Energy's Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science. - Liz Neunsinger
