Abstract: Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power overhead, significantly ...
We took this version of HeCBench and are modifying it to build the CUDA and OMP codes to gather their roofline performance data. So far we have a large portion of the CUDA and OMP codes building ...
Sparse Autoencoders (SAEs) have recently gained attention as a means to improve the interpretability and steerability of Large Language Models (LLMs), both of which are essential for AI safety. In ...
Sparse matrix-matrix multiplication (SpMM) is a crucial kernel in various applications, including sparse deep neural networks [1]–[6], graph analytics [7], triangle counting [8], and linear algebra ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results