• CUDA

  • Referenced in 1058 articles [sw03258]
  • environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes...
  • OpenGL

  • Referenced in 124 articles [sw06740]
  • interact with a graphics processing unit (GPU) to achieve hardware-accelerated rendering. OpenGL was developed...
  • CUBLAS

  • Referenced in 56 articles [sw06880]
  • computational resources of NVIDIA Graphics Processing Unit (GPU), but does not auto-parallelize across multiple ... required matrices and vectors in the GPU memory space, fill them with data, call ... then upload the results from the GPU memory space back to the host. The CUBLAS ... writing and retrieving data from the GPU...
  • Nektar++

  • Referenced in 43 articles [sw11964]
  • elliptic finite element method to the GPU and perform a case study for a particular ... This study provides comparison between CPU and GPU implementations of the method as well ... method is well-suited for GPU implementation, obtaining total speedups on the order...
  • MAGMA

  • Referenced in 44 articles [sw12741]
  • heterogeneous/hybrid architectures, starting with current “Multicore+GPU” systems. The MAGMA research is based ... algorithms and frameworks for hybrid manycore and GPU systems that can enable applications to fully...
  • SPIRAL

  • Referenced in 41 articles [sw00903]
  • variety of platforms including SSE, multicore, Cell, GPU, distributed memory parallel processors, and FPGA...
  • Theano

  • Referenced in 38 articles [sw05894]
  • integration with numpy, transparent use of a GPU, efficient symbolic differentiation, speed and stability optimizations...
  • GPGPU

  • Referenced in 20 articles [sw09105]
  • computation on graphics hardware. The graphics processor (GPU) on today’s commodity video cards ... review the tools, perils, and strategies in GPU programming. We present analysis of GPU performance...
  • StarPU

  • Referenced in 31 articles [sw14216]
  • Opteron processors. Other architectures, featuring GPU accelerators, are expected to appear in the near future...
  • MUMMER

  • Referenced in 31 articles [sw17256]
  • Biology paper. We have also developed a GPU accelerated version of MUMmer called MUMmerGPU...
  • GTEngine

  • Referenced in 30 articles [sw24041]
  • supports high-performance computing using general-purpose GPU programming (GPGPU). SIMD code is also available...
  • cuFFT

  • Referenced in 21 articles [sw11258]
  • interface for computing FFTs on an NVIDIA GPU, which allows users to quickly leverage ... floating-point power and parallelism of the GPU in a highly optimized and tested...
  • GPUTeraSort

  • Referenced in 14 articles [sw12706]
  • algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some ... intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory ... fast and dedicated memory interface on the GPU along with the main memory interface ... communication bandwidth between the CPU and the GPU, and reduces the data communication between...
  • Keras

  • Referenced in 26 articles [sw15491]
  • output training). Runs seamlessly on CPU and GPU. Read the documentation at Keras.io. Keras...
  • cuRAND

  • Referenced in 16 articles [sw11536]
  • Number Generation library (cuRAND) delivers high performance GPU-accelerated random number generation (RNG). The cuRAND ... within your CUDA functions/kernels running on the GPU. A variety of RNG algorithms and distribution...
  • CULA

  • Referenced in 9 articles [sw12745]
  • CULA: Hybrid GPU accelerated linear algebra routines. The modern graphics processing unit (GPU) found ... processing power of the GPU. Our work is on CULA, a GPU accelerated implementation ... like system solution and least squares. The GPU execution model featured by NVIDIA GPUs based ... linear algebra map extremely well to the GPU and others map poorly. CPUs...
  • VTune

  • Referenced in 18 articles [sw08852]
  • rich set of data to tune CPU & GPU compute performance, multi-core scalability, bandwidth...
  • PyTorch

  • Referenced in 18 articles [sw20939]
  • Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework...
  • BADMM

  • Referenced in 16 articles [sw20288]
  • massive parallelism and can easily run on GPU. BADMM is several times faster than highly...
  • GPUVerify

  • Referenced in 8 articles [sw11260]
  • step semantics for analysis and verification of GPU kernels. We study semantics of GPU kernels ... novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs ... result induces a method that allows GPU kernels with arbitrary reducible control flow graphs ... open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow which...