Five Questions For: James Grimmelmann

James Grimmelmann, the Tessler Family Professor of Digital and Information Law at Cornell Tech and Cornell Law School, studies how software regulation shapes freedom, wealth, and power.

An article he co-authored in August with A. Feder Cooper, Ph.D. '24, "The Files Are in the Computer," was cited in a landmark German court ruling against OpenAI on Nov. 11.


The decision found that ChatGPT infringed songwriters' copyrights by memorizing and reproducing lyrics without a license - a conclusion heavily informed by Cooper and Grimmelmann's analysis of how memorization constitutes copying under copyright law.

In the Q&A below, Grimmelmann shares why memorization matters, what this ruling means for generative AI, and where the conversation goes next.

Why does memorization matter in copyright law?

When an AI model memorizes some of the data it was trained on, the model can contain an infringing copy of that data, in violation of copyright law.

Usually, generative AI models are supposed to learn patterns from their training data. It's the ability to find and make use of those patterns that makes them more than just fancy search engines, and that helps to justify letting AI companies train on copyrighted works. But when a model memorizes a specific work - as one of Meta's Llama models has memorized 42 percent of the first Harry Potter novel - the case that this constitutes copyright infringement is much stronger.

Why is this court ruling significant?

The decision is one of the first to apply European copyright law to generative AI.

How did your work influence the court's decision?

The plaintiffs - songwriters and lyricists - showed that ChatGPT could reproduce substantial chunks of their lyrics. The court relied on our article, "The Files Are in the Computer," to help explain why these outputs weren't a coincidence, but instead meant that the model had memorized the lyrics.

What does this decision signal for the future of generative AI, both in Europe and globally?

This ruling suggests that, for their systems to be legal, AI companies will have to take responsibility for preventing their models from generating infringing outputs. The E.U. and the U.S. have moderately different copyright laws, but this seems like a rule that could also be adopted here.

What questions are you tackling next in this space?

I'm thinking about how much responsibility AI companies should have to prevent their users from putting their systems to harmful uses, like fraud and deepfakes.

Grace Stanley is the staff writer-editor for Cornell Tech.
