# gpu

Here are 3,981 public repositories matching this topic...

deepflowio / deepflow

✨ Zero-code distributed tracing and profiling, observability via eBPF 🚀

kubernetes gpu cuda wasm apm profiling distributed-tracing service-map opentelemetry llm

Updated Jun 1, 2024
Go

cjmcv / PocketAI

A Portable Toolkit for deploying Edge AI and HPC (opencl, vulkan, simd, task scheduling)

hpc gpu vulkan opencl cuda heterogeneous task-scheduling

Updated Jun 1, 2024
C

mosecorg / mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

python rust machine-learning deep-learning mxnet tensorflow gpu cv pytorch tts hacktoberfest model-serving nerual-network machine-learning-platform jax mlops llm llm-serving

Updated Jun 1, 2024
Python

royalgraphx / DarwinKVM

An advanced guide to run Mac OS / OS X / macOS on QEMU/KVM with libvirtd/Virt-Manager. Includes various write-ups for deep customization.

macos documentation osx amd gpu virtualization hypervisor intel machines kvm virtual hackintosh efi passthrough libvirtd opencore virtmanager

Updated Jun 1, 2024
Shell

rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU

machine-learning information-retrieval statistics clustering gpu distance cuda sparse nearest-neighbors similarity-search vector-similarity anns vector-search llm vector-store neighborhood-methods

Updated Jun 1, 2024
Cuda

rapidsai / cudf

cuDF - GPU DataFrame Library

python data-science cpp gpu arrow pydata cuda pandas data-analysis dask dataframe rapids cudf

Updated Jun 1, 2024
C++

intel / neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Updated Jun 1, 2024
C++

pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch

machine-learning mobile embedded deep-learning neural-network gpu tensor

Updated Jun 1, 2024
C++

isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing

Updated Jun 1, 2024
C++

pytorch / serve

Serve, optimize and scale PyTorch models in production

docker kubernetes machine-learning cpu deep-learning metrics gpu optimization pytorch serving mlops

Updated Jun 1, 2024
Java

neka-nat / cupoch

Robotics with GPU computing

python robotics gpu voxel cuda pathfinding point-cloud collision-detection ros registration gpgpu distance-transform odometry jetson pybind11 visual-odometry occupancy-grid-map triangle-mesh

Updated Jun 1, 2024
C++

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

python machine-learning deep-learning neural-network gpu numpy autograd tensor

Updated Jun 1, 2024
Python

cuteday / KiRaRay

KiRaRay is a simple interactive ray-tracing renderer using OptiX7/8.

cpp gpu cuda renderer raytracing optix

Updated Jun 1, 2024
C++

gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.

rust opengl metal gpu vulkan d3d12 hacktoberfest webgpu

Updated Jun 1, 2024
Rust

microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

machine-learning compression deep-learning gpu inference pytorch zero data-parallelism model-parallelism mixture-of-experts pipeline-parallelism billion-parameters trillion-parameters

Updated Jun 1, 2024
Python

Dolkar / Tephra

A modern, high-performance C++17 graphics and compute library based on Vulkan

performance gpu graphics high-performance rendering vulkan gpgpu low-level graphics-library

Updated Jun 1, 2024
C++

CVCUDA / CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

python machine-learning cloud computer-vision cpp gpu cuda image-processing nvidia bytedance cv-cuda

Updated May 31, 2024
C++

ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.

python machine-learning amd gpu assembly opencl dnn matrix-multiplication neural-networks gpu-acceleration blas hip gpu-computing tensors tensor-contraction gemm radeon auto-tuning

Updated May 31, 2024
Python

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8

Updated Jun 1, 2024
Python

NVIDIA / MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

hpc gpu cuda gpgpu gpu-computing

Updated May 31, 2024
C++
