• CULA

  • Referenced in 11 articles [sw12745]
  • CULA: Hybrid GPU accelerated linear algebra routines. The modern graphics processing unit (GPU) found ... processing power of the GPU. Our work is on CULA, a GPU accelerated implementation ... like system solution and least squares. The GPU execution model featured by NVIDIA GPUs based ... linear algebra map extremely well to the GPU and others map poorly. CPUs...
  • VTune

  • Referenced in 19 articles [sw08852]
  • rich set of data to tune CPU & GPU compute performance, multi-core scalability, bandwidth...
  • GPUVerify

  • Referenced in 10 articles [sw11260]
  • step semantics for analysis and verification of GPU kernels. We study semantics of GPU kernels ... novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs ... result induces a method that allows GPU kernels with arbitrary reducible control flow graphs ... open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow which...
  • AmgX

  • Referenced in 10 articles [sw13440]
  • AmgX: a library for GPU accelerated algebraic multigrid and preconditioned iterative methods. The solution ... AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned ... $\times$ speedup on a single GPU against a competitive implementation on the CPU. As will...
  • GPU Quicksort

  • Referenced in 8 articles [sw12707]
  • GPU-quicksort, a practical quicksort algorithm for graphics processors. In this article, we describe ... GPU-quicksort, an efficient quicksort algorithm suitable for highly parallel multicore graphics processors. Quicksort ... general-purpose computations on graphical processors, GPU-quicksort performs better than the fastest-known sorting...
  • GAMER

  • Referenced in 10 articles [sw10937]
  • GAMER is a GPU-accelerated Adaptive MEsh Refinement Code for astrophysical applications. Currently the code ... Hydrodynamics with self-gravity; A variety of GPU-accelerated hydrodynamic and Poisson solvers; Hybrid OpenMP/MPI/GPU...
  • KinectFusion

  • Referenced in 10 articles [sw22783]
  • KinectFusion, as well as the novel GPU-based pipeline are described in full. Uses ... shown. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly...
  • GPUTop

  • Referenced in 8 articles [sw23299]
  • Linear elasticity is solved entirely on the GPU by a matrix-free conjugate gradient method ... products entirely on the graphics card. The GPU code is found to be extremely efficient ... core shared memory CPU system. CPU and GPU implementations show different performance bottlenecks. The sources...
  • SOFA

  • Referenced in 13 articles [sw07014]
  • methods, OpenGL viewing, and many other features. GPU implementations are available for some force fields...
  • ITER-REF

  • Referenced in 13 articles [sw10290]
  • Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results...
  • TheLMA

  • Referenced in 9 articles [sw12960]
  • TheLMA project: Multi-GPU implementation of the lattice Boltzmann method. In this paper, we describe ... implementation of a multi-graphical processing unit (GPU) fluid flow solver based on the lattice...
  • CholQR

  • Referenced in 9 articles [sw13049]
  • with a different graphics processing unit (GPU) demonstrate that the overhead of using the double ... result, with a latest NVIDIA GPU, the mixed-precision CholQR was only $1.4\times$ slower...
  • BinaryNet

  • Referenced in 9 articles [sw35872]
  • least, we wrote a binary matrix multiplication GPU kernel with which it is possible ... times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy...
  • Sailfish

  • Referenced in 7 articles [sw16828]
  • Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method. We present Sailfish ... CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code ... distributed environment, as well as the GPU implementation and optimization of many different LBM models ... performance benchmarks spanning the last three NVIDIA GPU generations (Tesla, Fermi, Kepler), which we hope...
  • GPflow

  • Referenced in 12 articles [sw21518]
  • software testing and is able to exploit GPU hardware...
  • NTRUEncrypt

  • Referenced in 7 articles [sw14148]
  • implemented for the first time on a GPU using the CUDA platform. As is shown ... encryptions/decryptions in parallel. Using a modern GTX280 GPU a throughput ... than a recent AES implementation on a GPU...
  • Zippy

  • Referenced in 6 articles [sw34497]
  • Framework for Computation and Visualization on a GPU Cluster. Due to its high performance/cost ratio ... GPU cluster is an attractive platform for large scale general-purpose computation and visualization applications ... high performance general-purpose computation on GPU clusters remains a complex problem. In this paper ... solution to this problem. It abstracts the GPU cluster programming with a two-level parallelism...
  • FE-gMG

  • Referenced in 8 articles [sw10365]
  • relying on unstructured grids. We augment our GPU- and multicore-oriented implementation technique based ... average of 5 on a single GPU over a multithreaded CPU code in our benchmarks...
  • EdgeCS

  • Referenced in 11 articles [sw14134]
  • standard PC. Finally, the algorithm is GPU friendly...
  • MatConvNet

  • Referenced in 11 articles [sw15651]
  • supports efficient computation on CPU and GPU, allowing to train complex models on large datasets...