• CUDA

  • Referenced in 1325 articles [sw03258]
  • developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math ... libraries, and tools for debugging and optimizing the performance of your applications. You’ll also...
• Mint

  • Referenced in 6 articles [sw12752]
  • enjoy the performance benefits of hand coded CUDA without becoming entangled in the details. Mint ... source-to-source translator that generates optimized CUDA C from traditional C source. The translator ... deliver performance competitive with painstakingly hand-optimized CUDA. We show that ... performance obtained from aggressively optimized CUDA on the 200 series NVIDIA GPUs. Our optimizations target...
• clSpMV

  • Referenced in 10 articles [sw12638]
  • higher performance compared to the vendor optimized CUDA implementation of the proposed hybrid sparse format...
• Copperhead

  • Referenced in 5 articles [sw30955]
  • times fewer lines of code than CUDA, and the compiler generates efficient code, yielding ... performance of hand-crafted, well optimized CUDA code...
• BSGP

  • Referenced in 5 articles [sw08995]
  • similar or better performance than well-optimized CUDA programs, while the source code complexity...
• SPHEROS

  • Referenced in 4 articles [sw29066]
  • CUDA featuring the Thrust library, and optimized CUDA kernels for both compute-bound and memory...
• Gaalop

  • Referenced in 27 articles [sw00313]
  • algebra. We present Gaalop (Geometric algebra algorithms optimizer), our tool for high-performance computing based ... FPGA (field-programmable gate arrays) or the CUDA technology from NVIDIA. We describe the concepts ... future perspectives of Gaalop dealing with optimized software implementations, hardware implementations, and mixed solutions...
• QUDA

  • Referenced in 11 articles [sw14040]
  • GPUs), leveraging NVIDIA’s CUDA platform. The current release includes optimized Dirac operators and solvers...
• WIGEON

  • Referenced in 4 articles [sw24823]
  • series of techniques to optimize the CUDA implementation, especially in the memory access pattern...
• QMCPACK

  • Referenced in 7 articles [sw12953]
  • utilizes a fully hybrid (OpenMP, CUDA)/MPI approach to optimize memory usage and to take...
• PolyTop++

  • Referenced in 4 articles [sw17735]
  • structural topology optimization using polygonal meshes. It consists of a C++ and CUDA (a parallel ... code by Talischi et al. (Struct Multidiscip Optim 45(3):329–357 2012b). PolyTop ... programming language and the CUDA model to design algorithms with efficient memory management, capable...
• Halide

  • Referenced in 7 articles [sw22108]
  • Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines ... Compiler targets include x86/SSE, ARM v7/NEON, CUDA, and OpenCL...
• cuFFT

  • Referenced in 24 articles [sw11258]
  • This document describes cuFFT, the NVIDIA® CUDA™ Fast Fourier Transform (FFT) product. It consists ... parallelism of the GPU in a highly optimized and tested FFT library. The cuFFT product...
• KBLAS

  • Referenced in 4 articles [sw17481]
  • provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs ... overhead of memory accesses, a double-buffering optimization technique is employed to overlap data motion...
• cuHE

  • Referenced in 1 article [sw14879]
  • Homomorphic Encryption Accelerator Library. We introduce a CUDA GPU library to accelerate evaluations with homomorphic ... polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory ... thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage ... compare the performance of the proposed CUDA library we implemented two applications: the Prince block...
• ginSODA

  • Referenced in 1 article [sw37678]
  • hashing, ginSODA automatically builds highly optimized binaries for the CUDA architecture, preventing code re-compilation...
• NMF-mGPU

  • Referenced in 1 article [sw26273]
  • Unified Device Architecture) framework for GPU Computing. CUDA represents a GPU device as a programmable ... mGPU has been explicitly optimized for the different existing CUDA architectures. Finally, NMF-mGPU also...
• Kokkos

  • Referenced in 28 articles [sw20455]
  • neighbor lists. The layout is chosen to optimize performance on different platforms. Again this functionality ... These are OpenMP (for many-core CPUs), Cuda (for NVIDIA GPUs), and OpenMP (for Intel...
• trng

  • Referenced in 10 articles [sw07529]
  • easy to use and has been speed optimized. Its implementation does not depend ... environment, e.g. Message Passing Standard, OpenMP or CUDA. All generators that are implemented by TRNG...
• CP2K

  • Referenced in 8 articles [sw15391]
  • level spectroscopy, energy minimization, and transition state optimization using NEB or dimer method. (Detailed overview ... combination of multi-threading, MPI, and CUDA. It is freely available under the GPL license...