Autograph: Faster, More Accurate Compute Framework

Intelligent Computing

Nowadays, compute-intensive programs, such as those for training artificial intelligence and machine learning models, are used extensively. Modern compilers use vectorization techniques to exploit parallel processing capabilities and improve the performance of such programs. A group of scientists from the University of Southern California, Cisco AI Research, and Intel Labs designed a data-driven, graph-based learning framework for automatic vectorization called AutoGraph, which uses deep reinforcement learning to train an intelligent agent to learn an optimal policy. AutoGraph greatly outperformed other approaches across different datasets. The work was published on June 2, 2025, in an article titled "A Graph-Based Learning Framework for Compiler Loop Auto-Vectorization" in Intelligent Computing, a Science Partner Journal.

On PolyBench, AutoGraph achieved 2.49× higher accuracy than NeuroVectorizer and a 1.16× higher geometric speedup. On NPB, AutoGraph was 1.90× more accurate than NeuroVectorizer and 1.05× faster. On SPEC 2006, AutoGraph delivered 1.31× better accuracy than NeuroVectorizer and was 1.18× faster. To better illustrate the performance improvement, the authors conducted a case study on kernels from the NPB benchmark. For each kernel, the vectorization and interleaving factors predicted by AutoGraph were collected and compared with those selected by the -O3 baseline. AutoGraph often identified better configurations, leading to considerable speedups. On completely new datasets (GCC and MiBench), AutoGraph achieved 2.72× higher accuracy than NeuroVectorizer. Finally, the authors evaluated AutoGraph on different CPU platforms, further demonstrating the framework's effectiveness and superior auto-vectorization capability across architectures.
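The "geometric speedup" figures above summarize many per-kernel measurements with a geometric mean, the standard way to aggregate speedup ratios across a benchmark suite. A minimal sketch of that calculation, using made-up per-kernel speedups (not numbers from the paper):

```python
import math

def geometric_mean_speedup(speedups):
    """Aggregate per-kernel speedup ratios with the geometric mean,
    the usual summary statistic for benchmark-suite results."""
    return math.prod(speedups) ** (1 / len(speedups))

# Hypothetical per-kernel speedups over the -O3 baseline (illustrative only)
kernel_speedups = [1.25, 1.10, 0.98, 1.40]
print(round(geometric_mean_speedup(kernel_speedups), 3))
```

The geometric mean is preferred over the arithmetic mean here because speedups are ratios: one outlier kernel cannot dominate the summary, and the result is independent of which configuration is treated as the baseline.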

AutoGraph combines graph neural networks with deep reinforcement learning. The authors use graph neural networks to automatically extract loops, construct dependency graphs, and learn structured representations that capture both the structural dependencies of the computation graph and the semantics of the code. The deep reinforcement learning component then predicts the optimal vectorization and interleaving factors and injects the corresponding vectorization pragmas into the code to achieve better performance. As demonstrated in the experiments, the new framework guides the compiler toward smarter choices than either the default -O3 baseline or the earlier machine learning method NeuroVectorizer.
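The pragma-injection step can be pictured concretely. Clang exposes real loop hints, `#pragma clang loop vectorize_width(N) interleave_count(M)`, that tell the vectorizer which factors to use. The sketch below is a hypothetical helper (not the authors' implementation) showing how a predicted factor pair could be inserted above a target loop in C source:

```python
def inject_vectorization_pragma(c_source: str, loop_line: int,
                                vf: int, intf: int) -> str:
    """Insert a Clang loop-vectorization pragma directly above the loop
    starting at loop_line (1-indexed). vf is the predicted vectorization
    factor, intf the predicted interleaving factor."""
    pragma = (f"#pragma clang loop vectorize_width({vf}) "
              f"interleave_count({intf})")
    lines = c_source.splitlines()
    lines.insert(loop_line - 1, pragma)
    return "\n".join(lines)

src = "for (int i = 0; i < n; i++)\n    a[i] = b[i] + c[i];"
print(inject_vectorization_pragma(src, loop_line=1, vf=4, intf=2))
```

After injection, an ordinary compile with Clang at -O3 picks up the hint, so the learned policy steers vectorization without any compiler modification.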

In the future, the authors may focus on developing frameworks that can work with datasets containing diverse kernels and label distributions to increase robustness and generalizability. Frameworks with the ability to optimize not only loops but also straight-line sequential code—an area handled by superword-level parallelism vectorization—will also be explored in future studies.
