WASHINGTON - The Center for AI Standards and Innovation (CAISI) at the Department of Commerce's National Institute of Standards and Technology (NIST) evaluated AI models from the People's Republic of China (PRC) developer DeepSeek and found they lag behind U.S. models in performance, cost, security and adoption.
"Thanks to President Trump's AI Action Plan, the Department of Commerce and NIST's Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI," said Secretary of Commerce Howard Lutnick. "The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn't just technical. It shows why relying on foreign AI is dangerous and shortsighted. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI."
The CAISI evaluation also notes that the DeepSeek models' shortcomings related to security and censorship of model responses may pose a risk to application developers, consumers and U.S. national security. Despite these risks, DeepSeek is a leading developer and has contributed to a rapid increase in the global use of models from the PRC.
CAISI's experts evaluated three DeepSeek models (R1, R1-0528 and V3.1) and four U.S. models (OpenAI's GPT-5, GPT-5-mini and gpt-oss, and Anthropic's Opus 4) across 19 benchmarks spanning a range of domains. These evaluations include state-of-the-art public benchmarks as well as private benchmarks built by CAISI in partnership with academic institutions and other federal agencies.
The evaluation from CAISI responds to President Donald Trump's America's AI Action Plan, which directs CAISI to conduct research and publish evaluations of frontier models from the PRC. CAISI is also tasked with assessing: the capabilities of U.S. and adversary AI systems; the adoption of foreign AI systems; the state of international AI competition; and potential security vulnerabilities and malign foreign influence arising from the use of adversaries' AI systems.
CAISI serves as industry's primary point of contact within the U.S. government to facilitate testing, collaborative research, and best practice development related to commercial AI systems, and is a key element in NIST's efforts to secure and advance American leadership in AI.
Key Findings
DeepSeek performance lags behind the best U.S. reference models.
The best U.S. model outperforms the best DeepSeek model (DeepSeek V3.1) across almost every benchmark. The gap is largest for software engineering and cyber tasks, where the best U.S. model evaluated solves over 20% more tasks than the best DeepSeek model.
DeepSeek models cost more to use than comparable U.S. models.
One U.S. reference model costs, on average, 35% less than the best DeepSeek model to achieve a similar level of performance across all 13 performance benchmarks tested.
DeepSeek models are far more susceptible to agent hijacking attacks than frontier U.S. models.
Agents based on DeepSeek's most secure model (R1-0528) were, on average, 12 times more likely than evaluated U.S. frontier models to follow malicious instructions designed to derail them from user tasks. Hijacked agents sent phishing emails, downloaded and ran malware, and exfiltrated user login credentials, all in a simulated environment.
DeepSeek models are far more susceptible to jailbreaking attacks than U.S. models.
DeepSeek's most secure model (R1-0528) responded to 94% of overtly malicious requests when a common jailbreaking technique was used, compared with 8% for U.S. reference models.
DeepSeek models advance Chinese Communist Party (CCP) narratives.
DeepSeek models echoed four times as many inaccurate and misleading CCP narratives as U.S. reference models did.
Adoption of PRC models has greatly increased since DeepSeek R1 was released.
The release of DeepSeek R1 has driven adoption of PRC models across the AI ecosystem. Downloads of DeepSeek models on model-sharing platforms have increased nearly 1,000% since January 2025.