-
CUDA
- Referenced in 1325 articles
[sw03258]
- developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math ... libraries, and tools for debugging and optimizing the performance of your applications. You’ll also...
-
Mint
- Referenced in 6 articles
[sw12752]
- enjoy the performance benefits of hand coded CUDA without becoming entangled in the details. Mint ... source-to-source translator that generates optimized CUDA C from traditional C source. The translator ... deliver performance competitive with painstakingly hand-optimized CUDA. We show that ... performance obtained from aggressively optimized CUDA on the 200 series NVIDIA GPUs. Our optimizations target...
-
clSpMV
- Referenced in 10 articles
[sw12638]
- higher performance compared to the vendor optimized CUDA implementation of the proposed hybrid sparse format...
-
Copperhead
- Referenced in 5 articles
[sw30955]
- times fewer lines of code than CUDA, and the compiler generates efficient code, yielding ... performance of hand-crafted, well optimized CUDA code...
-
BSGP
- Referenced in 5 articles
[sw08995]
- similar or better performance than well-optimized CUDA programs, while the source code complexity...
-
SPHEROS
- Referenced in 4 articles
[sw29066]
- CUDA featuring the Thrust library, and optimized CUDA kernels for both compute-bound and memory...
-
Gaalop
- Referenced in 27 articles
[sw00313]
- algebra. We present Gaalop (Geometric algebra algorithms optimizer), our tool for high-performance computing based ... FPGA (field-programmable gate arrays) or the CUDA technology from NVIDIA. We describe the concepts ... future perspectives of Gaalop dealing with optimized software implementations, hardware implementations, and mixed solutions...
-
QUDA
- Referenced in 11 articles
[sw14040]
- GPUs), leveraging NVIDIA’s CUDA platform. The current release includes optimized Dirac operators and solvers...
-
WIGEON
- Referenced in 4 articles
[sw24823]
- series of techniques to optimize the CUDA implementation, especially in the memory access pattern...
-
QMCPACK
- Referenced in 7 articles
[sw12953]
- utilizes a fully hybrid (OpenMP,CUDA)/MPI approach to optimize memory usage and to take...
-
PolyTop++
- Referenced in 4 articles
[sw17735]
- structural topology optimization using polygonal meshes. It consists of a C++ and CUDA (a parallel ... code by Talischi et al. (Struct Multidiscip Optim 45(3):329–357 2012b). PolyTop ... programming language and the CUDA model to design algorithms with efficient memory management, capable...
-
Halide
- Referenced in 7 articles
[sw22108]
- Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines ... Compiler targets include x86/SSE, ARM v7/NEON, CUDA, and OpenCL...
-
cuFFT
- Referenced in 24 articles
[sw11258]
- This document describes cuFFT, the NVIDIA® CUDA™ Fast Fourier Transform (FFT) product. It consists ... parallelism of the GPU in a highly optimized and tested FFT library. The cuFFT product...
-
KBLAS
- Referenced in 4 articles
[sw17481]
- provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs ... overhead of memory accesses, a double-buffering optimization technique is employed to overlap data motion...
-
cuHE
- Referenced in 1 article
[sw14879]
- Homomorphic Encryption Accelerator Library. We introduce a CUDA GPU library to accelerate evaluations with homomorphic ... polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory ... thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage ... compare the performance of the proposed CUDA library we implemented two applications: the Prince block...
-
ginSODA
- Referenced in 1 article
[sw37678]
- hashing, ginSODA automatically builds highly optimized binaries for the CUDA architecture, preventing code re-compilation...
-
NMF-mGPU
- Referenced in 1 article
[sw26273]
- Unified Device Architecture) framework for GPU Computing. CUDA represents a GPU device as a programmable ... mGPU has been explicitly optimized for the different existing CUDA architectures. Finally, NMF-mGPU also...
-
Kokkos
- Referenced in 28 articles
[sw20455]
- neighbor lists. The layout is chosen to optimize performance on different platforms. Again this functionality ... These are OpenMP (for many-core CPUs), CUDA (for NVIDIA GPUs), and OpenMP (for Intel...
-
trng
- Referenced in 10 articles
[sw07529]
- easy to use and has been speed optimized. Its implementation does not depend ... environment, e.g. Message Passing Standard, OpenMP or CUDA. All generators that are implemented by TRNG...
-
CP2K
- Referenced in 8 articles
[sw15391]
- level spectroscopy, energy minimization, and transition state optimization using NEB or dimer method. (Detailed overview ... combination of multi-threading, MPI, and CUDA. It is freely available under the GPL license...