LOS ALAMOS, N.M., Jan. 28, 2020—Just over a year after Los Alamos National Laboratory launched the Efficient Mission Centric Computing Consortium (EMC3), 15 companies, universities and federal organizations are now working together to explore new ways to make extreme-scale computers more efficient.
“In the first year of EMC3 we have already seen efficiency improvements to HPC in a number of areas, including the world’s first NVMe-based hardware-accelerated compressed parallel filesystem, in-situ analysis enabled on network adapters for a real simulation code, identifying issues with file system metadata performance in the Linux Kernel, record-setting in situ simulation output indexing, demonstrating file-system metadata indexing, and more,” said Gary Grider, High Performance Computing division leader at Los Alamos National Laboratory. “We look forward to welcoming more members over the next year and collaboratively investigating ways to realize greater efficiency as well as a new year of productive collaborations with our treasured existing members.”
Efficiency does not simply mean spending money to attain more power, cooling, or flops; eventually, more efficient HPC solutions will be needed. The co-design of applications and hardware has largely become a matter of fitting applications to the latest hardware trends in industry. EMC3 seeks real co-design, Grider said, which has been dubbed “codesign2,” in which experts in current and emerging applications, computer scientists, system architects, and hardware, software, and infrastructure engineers work together in a balanced manner to reach higher degrees of performance efficiency, application efficiency, and workload efficiency. In essence, EMC3 is furthering the move from architectures guiding the applications to applications guiding the architectures for the most demanding needs.
The consortium’s primary focus is on the most demanding multi-physics applications involving largely unstructured/sparse problems that require a balance of compute, memory size, memory bandwidth, memory latency, network, and I/O. Its members include mission HPC users, developers and technologists.
The following organizations are part of EMC3:
- British Petroleum
- Cray – prototyping large-scale file system metadata management
- DDN – exploration of massively parallel related failure management
- Eideticom – exploring high-bandwidth I/O computational storage offloads
- Exten Technologies – NVMe software solutions
- Marvell – enhancing core computing, memory, and chip interconnect capabilities
- Mellanox – exploring utilization of processing in the network fabric
- nCorium – exploring high-bandwidth memory-system computing offloads
- NetApp – data and metadata services and management
- Parallel Data Lab at Carnegie Mellon University
- Rockport Networks – switchless networking
- Texas A&M University
- University of Chicago
EMC3 will continue to support and encourage joint collaborations. Over the summer of 2019, DDN, Cray, and Mellanox collaborated with students at Los Alamos’ Ultra Scale Research Center on joint areas of interest related to efficiency. This year, EMC3 will influence future supercomputing hardware and system software in many mission-centric areas.
“The focus will remain on nurturing architecture, component, workflow, infrastructure, and applications-algorithm areas that can improve the overall efficiency of our supercomputers,” Grider said. “Together, as a consortium, we can pursue greater efficiency for systems that feature multi-link scale, very unstructured and irregular memory access, and have many simultaneously running scientific packages that are mission critical for our EMC3 HPC site members and at Los Alamos to run at immense scale for many months.”