At the upcoming Flash Memory Summit, Los Alamos National Laboratory and SK hynix, an AI memory provider, will demonstrate a novel Object Computational Storage (OCS) system running real scientific data analysis with ParaView and the Visualization Toolkit (VTK), a widely adopted scientific visualization software stack. The demonstration will show the stack pushing queries down to the OCS. It illustrates how reducing data near NVM Express (NVMe) storage improves efficiency, makes very large-scale data analysis possible on modest resources (even desktops), and shortens time to insight.
"Our large-scale indexing efforts, fueled by industry-standard ecosystems like the Apache columnar analytics [ecosystem], are showing great results," said Gary Grider, High Performance Computing division leader at Los Alamos.
Los Alamos has focused on moving large-scale simulation input/output (I/O) from application-unique formats to record- or column-based formats, and on performing analytics with tools from the big data/analytics community. Working with industry partners, the Laboratory has shown on multiple occasions that indexing simulation output speeds up analysis, because a query then reads only a reduced subset of the data. Pushing such indexing capabilities closer to the storage devices can cut data movement by orders of magnitude.
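As a rough illustration of how an index produces data reduction on query, the sketch below keeps a minimum/maximum entry for each chunk of a column and skips any chunk whose range cannot satisfy the query. Parquet's per-row-group column statistics follow a similar idea. The names `Chunk`, `build_chunks` and `query_range` are hypothetical. This is a minimal model of the concept, not the Laboratory's indexing code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One block of a column plus its min/max index entry."""
    values: list
    lo: float
    hi: float


def build_chunks(column, chunk_size):
    """Split a column into fixed-size chunks and record each chunk's min and max."""
    chunks = []
    for start in range(0, len(column), chunk_size):
        vals = column[start:start + chunk_size]
        chunks.append(Chunk(vals, min(vals), max(vals)))
    return chunks


def query_range(chunks, lo, hi):
    """Return values in [lo, hi] and how many chunks actually had to be read."""
    hits, chunks_read = [], 0
    for c in chunks:
        if c.hi < lo or c.lo > hi:
            continue  # the index proves no match, so the chunk is never read
        chunks_read += 1
        hits.extend(v for v in c.values if lo <= v <= hi)
    return hits, chunks_read


# Usage: a 1,000-value column split into 10 chunks of 100.
column = [float(i) for i in range(1000)]
chunks = build_chunks(column, 100)
hits, n_read = query_range(chunks, 250.0, 349.0)  # reads 2 of the 10 chunks
```

The benefit grows with how well the data is clustered. If matching values are spread across every chunk, the min/max ranges overlap the query and nothing can be skipped.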
SK hynix has enabled an industry-standard pushdown reduction function, offloaded from an S3-based storage server to an OCS over the NVMe protocol, using analytics ecosystem tools such as DuckDB, Apache Parquet, Substrait and Apache Arrow. The demonstration uses ParaView/VTK, modified to take advantage of pushdown, to query extreme-scale data stored in Parquet files on the OCS, reducing data movement. Innovations like the OCS system can make standard HPC scientific analysis tools more efficient by leveraging the analytics community's ecosystem.
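The sketch below contrasts the two data paths in plain Python, with JSON standing in for the wire format. Without pushdown, the device ships every record and the host filters. With pushdown, the filter and column projection run next to the storage, so only the matching rows cross the link. It is a hypothetical model. `device_scan`, `host_query`, `hot_cells` and the record layout are invented for illustration, and the code does not use the NVMe, S3, Substrait or DuckDB interfaces the demonstration relies on. In a real system the query would presumably travel as a serialized plan rather than as a Python callable.

```python
import json

# Hypothetical table of simulation output held on a computational storage device.
RECORDS = [
    {"cell": i, "temp": 300.0 + (i % 50), "pressure": 1.0 + i * 1e-3, "density": 0.5}
    for i in range(10_000)
]


def device_scan(columns=None, predicate=None):
    """Runs on the storage side; returns the encoded bytes that cross the link.

    With no columns/predicate this is a plain read and every record is shipped.
    With them, filtering and projection happen before anything is sent.
    """
    rows = RECORDS if predicate is None else [r for r in RECORDS if predicate(r)]
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]
    return json.dumps(rows).encode()


def host_query(payload, columns, predicate):
    """Runs on the analysis host: decode everything, then filter and project locally."""
    rows = json.loads(payload)
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


def hot_cells(r):
    return r["temp"] >= 345.0


cols = ["cell", "temp"]

# Without pushdown: ship everything, reduce on the host.
full = device_scan()
no_pushdown = host_query(full, cols, hot_cells)

# With pushdown: reduce on the device, ship only the answer.
reduced = device_scan(cols, hot_cells)
with_pushdown = json.loads(reduced)

print(f"bytes moved without pushdown: {len(full)}")
print(f"bytes moved with pushdown:    {len(reduced)}")
```

Both paths return the same answer. Only the volume of data that leaves the device differs, and in this toy table the pushdown payload is a small fraction of the full one because the query keeps 10% of the rows and 2 of the 4 columns.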
"Through the close collaboration between Los Alamos National Lab's high-performance computing expertise and SK hynix's advanced storage solutions, we have been able to pioneer an innovative object-based computational storage system," said Sungsoo Ryu, head of memory systems research at SK hynix. "This novel approach to data processing minimizes redundant data transfers between analytics applications and storage, and lightens the storage software stack. This accelerates the performance of data-intensive applications such as big data analytics, artificial intelligence and more. SK hynix is striving to develop an analytics ecosystem in collaboration with industry partners."