Duke Initiative Targets Big Data Bottlenecks for Secondary Use

Duke Engineering launched the Beyond the Horizon initiative to provide interdisciplinary teams with substantial investment to begin pursuing extremely high-risk, high-reward projects that have the potential for deep, transformative societal impact. Six proposals that will play key roles in shaping Duke Engineering's future research and teaching profile were selected for an initial round of funding. Each plays to the school's unique strengths and holds the promise of helping to define the future of its field.

Consider, for a moment, all the data you generated and shared on the way to work or school today: the music you streamed, the traffic delay that registered on your GPS, the latte that you paid for through an app. Those data points might not seem valuable after your immediate needs have passed, but Duke Professor of Electrical and Computer Engineering Jian Pei says the opposite is true.

"The power of big data is in its secondary use," said Pei. The widespread adoption of smartphones and time-stamped GPS travel points, for example, has created valuable data about traffic patterns at intersections at specific times of the day-including where people are likely to slow down for traffic or to purchase a beverage.

In a traditional data market scenario, a third party might pay for this data and use it to generate targeted ads. That's of little benefit to the user. But according to Pei, current data markets are more complicated than a straightforward monetary purchase. "New opportunities actually compensate users with improved services, and that's what makes data markets interesting," said Pei.

A large set of GPS data generated by an entire community could, for example, help traffic engineers model more efficient roadways and eliminate traffic snarls altogether. That's a huge benefit to the user, but one that is challenging to quantify.

Another challenge is that the growing AI economy depends on the extensive sharing and integration of data and AI models at the organizational level, and companies tend to keep their data to themselves. Pei wants to change that with a project he is calling DAIMS2: data and AI model sharing, discovery, and integration in markets.

"Within institutions like universities, there are mechanisms for sharing. But when we cross boundaries into different kinds of organizations, it's hard-there are no incentives to share data, and privacy is usually cited as a concern," said Pei. "Data and AI model sharing, discovery and integration have become a painful bottleneck and a grand opportunity for a major breakthrough."

"Data and AI model sharing, discovery and integration have become a painful bottleneck and a grand opportunity for a major breakthrough."

Sharing models would not only represent the opportunity to learn from peers but would also prevent the duplication of work performed by multiple organizations.

From across Duke, Pei has assembled a team of experts with the goal of demonstrating that data markets can be shaped into fair, ethical, secure and efficient exchanges. His Duke Co-PIs and collaborators include ECE colleagues Yiran Chen, Michael Reiter and Cynthia Rudin; CEE faculty member Mark Borsuk; computer science faculty member Kamesh Munagala; and Fuqua School of Business faculty member Ali Makhdoumi.

These collaborators bring expertise in leveraging next-generation communication networks, building interpretable and privacy-preserving AI, and computational economics and information market mechanism design.

In the first year of the project, the group intends to foster the data market research community at Duke, beginning with fundamental components and principles.

Because the idea of AI model markets is just emerging, Pei expects the project to be highly impactful for the discipline. But he emphasizes that society could benefit, too, in areas like healthcare and environmental management. "We have lots of data from North America that have contributed to good machine learning models to manage air pollution," said Pei. "There's not much data from Africa yet and it's not well shared. We have the opportunity to extend the power of AI to other areas, but sharing and appropriate compensation are key."
