Gaps in SEER Database May Skew Cancer Data

American College of Surgeons

Key Takeaways

  • The SEER database may be missing information on high-risk, underserved patients, which can make overall outcomes appear better than they really are.
  • Patients treated at centers not accredited by the American College of Surgeons Commission on Cancer (CoC) were roughly two to three times as likely to have missing data in the SEER database as those treated at CoC-accredited facilities.
  • Study authors suggest using multiple data sources for research, such as both SEER and the National Cancer Database.

CHICAGO (May 14, 2026) — A significant number of cancer patients — particularly those with more advanced cancers who are more likely to receive care at community hospitals, safety net hospitals, and rural medical centers — may have incomplete case information in the Surveillance, Epidemiology, and End Results (SEER) database, according to a study published in the Journal of the American College of Surgeons. This finding, the researchers note, warrants careful consideration when interpreting studies that rely on SEER data.

These missing cases create "blind spots" in the data, said senior study author Schelomo Marmor, PhD, MPH, a professor in the department of surgery, division of surgical oncology and real-world data lead at the Center for Learning Health System Sciences at the University of Minnesota in Minneapolis (UMN).

Dr. Marmor and McKenzie White, MD, a complex general surgical oncology fellow at Moffitt Cancer Center in Tampa, Florida, and formerly at UMN, studied four types of cancer: breast, pancreas, colon, and non-small cell lung cancer (NSCLC).

"Patients with missing data had meaningfully lower rates of survival," Dr. Marmor said. "That was not a coincidence. These are not minor statistical loose ends. They are a high-risk, underserved population that effectively disappears from the scientific record every time a study excludes incomplete cases."

Study Findings

The study evaluated how population-based studies that used the SEER database handled the missing patient data. The study findings indicate that SEER studies that exclude patients with missing data may overestimate survival outcomes and exclude at-risk or underserved patient populations.

The study analyzed 328,030 patients with one of the four cancers and stage I–IV disease entered in SEER from 2018 to 2020. The vast majority of patients were treated at American College of Surgeons Commission on Cancer (CoC)-accredited cancer centers: 82% with breast cancer, 83% with pancreatic cancer, 75% with colon cancer, and 80% with NSCLC. SEER draws on 18 population-based registries across 22 geographic regions, captures all cancer cases from both CoC-accredited and non-CoC-accredited centers, and covers about half of the United States population.

Patients treated at centers that were not CoC accredited were roughly two to three times as likely to have missing data as those treated at CoC-accredited facilities: 23% vs. 9% for breast, 36% vs. 14% for pancreas, 30% vs. 13% for colon, and 42% vs. 13% for NSCLC.
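The relative likelihood behind those pairs can be checked with simple division. The short Python sketch below does only that, using the percentages reported above; the variable and function names are illustrative and do not come from the study.

```python
# Missing-data rates (percent) reported in the article, by cancer type:
# (non-CoC-accredited centers, CoC-accredited centers)
missing_pct = {
    "breast": (23, 9),
    "pancreas": (36, 14),
    "colon": (30, 13),
    "NSCLC": (42, 13),
}


def rate_ratio(non_coc_pct, coc_pct):
    """Ratio of the missing-data rate at non-CoC centers to the rate at CoC centers."""
    return non_coc_pct / coc_pct


# One ratio per cancer type (hypothetical helper names, for illustration only).
ratios = {cancer: rate_ratio(*pair) for cancer, pair in missing_pct.items()}

for cancer, ratio in ratios.items():
    print(f"{cancer}: {ratio:.1f}x")
```

On the reported figures, the ratios work out to about 2.6 for breast, 2.6 for pancreas, 2.3 for colon, and 3.2 for NSCLC. These are unadjusted ratios of rounded percentages, not the adjusted estimates a full analysis would report.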

"Think of it this way: if you set out to understand how cancer treatments perform across the entire country, but your data systematically leaves out the sickest patients, the oldest patients, and those from rural or underserved communities, then what you're left with is a portrait of cancer care that looks far more optimistic than reality," Dr. Marmor said. "The hardest cases weren't randomly absent; they were quietly excluded, and the database never flagged them as missing. The database results look much rosier than reality, and that's because the most difficult cases were quietly left out."

Patients with missing data had significantly lower three-year overall survival: 63% vs. 81% for breast; 5% vs. 12% for pancreas; 42% vs. 61% for colon; and 17% vs. 27% for NSCLC. Overall, the proportion of missing data ranged from 12% for breast cancer to 19% for NSCLC.
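To see how excluding incomplete cases can inflate a headline survival figure, the reported numbers can be blended in a rough back-of-the-envelope calculation. The sketch below is an illustration, not the study's method: it assumes the group-level survival percentages can be combined as a simple weighted average, and it covers only breast cancer and NSCLC because those are the only two cancers with an overall missing-data proportion stated above. All names in the code are hypothetical.

```python
# Figures reported in the article (percent). Only breast cancer and NSCLC
# have an overall missing-data proportion stated, so only those are used.
reported = {
    # cancer: (share with missing data,
    #          3-year survival if data missing,
    #          3-year survival if data complete)
    "breast": (12, 63, 81),
    "NSCLC": (19, 17, 27),
}


def blended_survival(missing_share, surv_missing, surv_complete):
    """Weighted average of survival across both groups (all values in percent).

    Assumption: group-level survival percentages can be combined as a simple
    weighted average. The study itself may have used other estimators.
    """
    w = missing_share / 100
    return w * surv_missing + (1 - w) * surv_complete


results = {}
for cancer, (share, s_miss, s_comp) in reported.items():
    blended = blended_survival(share, s_miss, s_comp)
    gap = s_comp - blended  # how far the complete-case figure sits above the blend
    results[cancer] = (blended, gap)
    print(f"{cancer}: complete-case {s_comp}%, blended {blended:.1f}%, gap {gap:.1f} points")
```

Under that assumption, a breast cancer analysis limited to complete cases would report 81% three-year survival where the blended figure is about 78.8%, and an NSCLC analysis would report 27% where the blended figure is about 25.1%.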

This is the first study to show that patients with missing data in SEER are treated mostly at centers that are not CoC accredited, Dr. Marmor said.

"Our manuscript shows that patients with missing data are more likely to be older, from rural areas or socioeconomically disadvantaged backgrounds, the very patients who already face barriers to preventive care, who are more often diagnosed at aggressive stages, and who have less access to high-quality cancer treatment," Dr. Marmor said. "When their records are dropped from an analysis, we don't just lose data points; we lose the clinical and human reality of cancer in America."

Implications for Building AI Models

"Population-based registries like SEER are the foundation on which AI-driven cancer research is built. These registries capture longitudinal real-world data and are really the backbone that enable these types of AI advances, but if we move deeper into an era of AI and machine learning-driven oncology research, the issue of missing data becomes even more consequential because AI does not fix the missing data by default," he said. "AI learns from us how we handle it and if we're systematically excluding those patients, we're teaching AI our own blind spots."

The study suggested that cancer researchers use multiple data sources, for example both SEER and the National Cancer Database, a hospital-based registry that captures cases from CoC-accredited centers.

Strengths of the study are the large size of the dataset and its use of clinically validated methods to adjust for age and proportional hazard, Dr. Marmor said. A key limitation is that the study could not determine why specific data were missing. "Was it inadequate documentation? Understaffed registries? Changes in coding systems? Or disease severity itself?" he said. "That's an important question for future research, because understanding why data goes missing is the first step to ensuring we stop losing these patients from the scientific conversation."

Study co-authors are Saranya Prathibha, MD; Qianyun Luo; David Brauer, MD, MPHS, FACS; Jacob S. Ankeny, MD, FACS; Jane YC Hui, MD, MS, FACS, FRCSC; Paolo Goffredo, MD, FACS; Todd M. Tuttle, MD, MS, FACS; and Eric H. Jensen, MD, FACS.

This work was first presented at the Society of Surgical Oncology Annual Meeting in 2023, as an e-poster at the Minnesota Society for Clinical Oncology 2023 Spring Meeting, and as an oral presentation at the 2023 Fall Meeting of the Minnesota Surgical Society, a chapter of the ACS. It was then updated with an additional year of data in 2025.

The study is published as an article in press on the JACS website.

Citation: White MJ, Prathibha S, Luo Q, et al. Missing, but not Forgotten: Commission on Cancer Center Accreditation and the Impact of Missing Data in SEER Studies. Journal of the American College of Surgeons, 2026. DOI: 10.1097/XCS.0000000000001866
