I will compile and optimize MediaPipe for your ARM device with GPU acceleration

Richter


About this gig

MediaPipe doesn't ship ARM64 wheels. I build them with GPU acceleration.

I compile from source with Bazel, patched for ARM Mali GPUs with headless EGL/GBM support. You get a pip-installable .whl with a working GPU delegate: no X11, no display server, no Docker GPU headaches.

What you get:

Custom .whl for your ARM board + Python + MediaPipe version

GPU delegate via EGL/GBM (truly headless)

Install script + verification test

Benchmark report (CPU vs GPU, latency + throughput)
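To illustrate what a latency and throughput comparison involves, here is a minimal sketch of a timing harness. It is not the benchmark script behind the report; `benchmark` and `fake_inference` are hypothetical names, and `fake_inference` is only a stand-in for a real MediaPipe call on one frame.

```python
import statistics
import time


def benchmark(fn, frames, warmup=5):
    """Time fn(frame) for every frame and summarize latency and throughput."""
    # Warm-up runs are excluded so one-time setup cost does not skew the stats.
    for frame in frames[:warmup]:
        fn(frame)

    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        fn(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.fmean(latencies_ms)
    return {
        "frames": len(latencies_ms),
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        # Throughput in frames per second, derived from the mean latency.
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_inference(frame):
    # Hypothetical placeholder workload; swap in the real per-frame call.
    return sum(frame)


if __name__ == "__main__":
    frames = [list(range(1000))] * 50
    print(benchmark(fake_inference, frames))
```

Running the same harness once against a CPU pipeline and once against a GPU pipeline gives two comparable summaries.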

Verified platforms:

RK3576 (Mali-G52), primary dev board

RK3588 (Mali-G610)

Raspberry Pi 5 (VideoCore VII)

Any ARM64 Linux with Mali/VideoCore GPU + DDK

Benchmark: https://asciinema.org/a/Mv4LEGvaroBSs6oJ

Why this matters:

Stock: CPU-only, 100+ms/frame on ARM

My build: GPU-accelerated, 44ms/frame (2.3x faster)

Headless: Docker, CI/CD, server rack

No NPU SDK needed: standard GPU drivers only

What I need:

Board model + OS (Ubuntu, Debian, Yocto)

Python version (3.10/3.11/3.12)

Modules: Pose, Face, Hand, Holistic, or all
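One way to collect most of these details is a short script run on the target board. This is only a sketch: `describe_target` is a hypothetical helper, and reading `/proc/device-tree/model` assumes a device-tree based Linux system, so the board model may come back empty elsewhere.

```python
import platform
import sys


def describe_target():
    """Collect board, OS, and Python details from the current machine."""
    # /etc/os-release lookup; freedesktop_os_release() needs Python 3.10+.
    try:
        os_name = platform.freedesktop_os_release().get("PRETTY_NAME", "")
    except (OSError, AttributeError):
        os_name = ""

    # Assumption: device-tree Linux boards expose their model string here.
    try:
        with open("/proc/device-tree/model", "r", errors="replace") as fh:
            board = fh.read().strip("\x00\n ")
    except OSError:
        board = ""

    return {
        "board": board or "unknown",
        "machine": platform.machine(),  # e.g. "aarch64" on 64-bit ARM Linux
        "os": os_name or platform.system(),
        "kernel": platform.release(),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    }


if __name__ == "__main__":
    for key, value in describe_target().items():
        print(f"{key}: {value}")
```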

Contact me before ordering if your setup is unusual, and I'll confirm compatibility.

Model expertise
- Custom model development
Industry
- Transportation & automotive
Programming language
- C
- C++
- Python
Language
- Chinese (Simplified)
- English
- German
Technical expertise
- Machine learning (Supervised, Unsupervised, Reinforcement)
- Deep learning (Neural networks, GANs)
- Computer Vision (Object detection, Image recognition)
- Algorithm development and optimization

Get to know Richter

Richter

4.8 (4)

From: China
Member since: Oct 2024
Last delivery: 1 year
Languages: English, Chinese, German

I build computer vision systems that ship — on NVIDIA CUDA servers and ARM edge. Not demos. Production. 6 projects deployed in 12 months: YOLO detection + tracking on CUDA and NPU (17x speedup), multi-camera RTSP pipelines with FFmpeg hardware decoding, MediaPipe GPU compiled from source for ARM Mali (2.3x faster, headless), PyTorch custom model training, and rPPG contactless vital signs from video. Stack: Python, C++, PyTorch, OpenCV, CUDA, ONNX, YOLO, Docker. GPUs: RTX 4060 Ti, Hailo-8L NPU, Mali-G52. 3600+ lines in a real school. 20K+ lines in a shipping edge AI product.

My Portfolio

FAQ

Q: Why can't I just pip install mediapipe on my ARM board?

A: Google only publishes x86_64 wheels. ARM64/aarch64 has no official wheel. You must compile from source using Bazel, which requires ~30GB build space and 1-2 hours. I've already solved the hard parts (EGL/GBM patching, Bazel config for ARM, GPU driver linking).

Q: What's the difference between CPU and GPU build?

A: The CPU build uses XNNPACK for inference: ~100 ms per frame on RK3576. The GPU build runs on the Mali GPU via EGL/OpenGL ES: ~44 ms per frame. Same accuracy, same model, 2.3x faster. The GPU build also frees the CPU for other tasks (video decoding, API serving).
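For reference, the arithmetic behind those figures can be checked in a few lines. The two latencies are the ones quoted above; the helper names `fps` and `speedup` are illustrative only.

```python
def fps(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


cpu_ms, gpu_ms = 100.0, 44.0  # figures quoted for RK3576

print(f"CPU: {fps(cpu_ms):.1f} FPS")                # CPU: 10.0 FPS
print(f"GPU: {fps(gpu_ms):.1f} FPS")                # GPU: 22.7 FPS
print(f"Speedup: {speedup(cpu_ms, gpu_ms):.2f}x")   # Speedup: 2.27x
```

The unrounded ratio is 2.27, which rounds to the quoted 2.3x.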

Q: Do you provide the source patches?

A: Premium package includes all Bazel BUILD files, CMake patches, and EGL/GBM modifications as a patch set you can reapply to future MediaPipe versions. Basic and Standard include only the compiled wheel.

Q: Will it work in Docker?

A: Yes. The GPU build opens the DRM render node /dev/dri/renderD128, which Docker can expose to the container via --device. I provide a tested Dockerfile in the Standard and Premium packages.
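As a rough sketch of the `--device` approach, the snippet below checks that the render node is accessible on the host and assembles a `docker run` command that passes it through. The image name `mediapipe-gpu:latest` and both function names are hypothetical, and this is not the Dockerfile shipped with the packages.

```python
import os
import shlex

RENDER_NODE = "/dev/dri/renderD128"


def render_node_usable(path=RENDER_NODE):
    """Return True if the render node exists and is readable and writable."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def docker_run_command(image, render_node=RENDER_NODE):
    """Build a docker run argv that maps the render node into the container."""
    return [
        "docker", "run", "--rm",
        "--device", f"{render_node}:{render_node}",
        image,
    ]


if __name__ == "__main__":
    if not render_node_usable():
        print(f"warning: {RENDER_NODE} is missing or not accessible")
    # Hypothetical image name, used only for illustration.
    print(shlex.join(docker_run_command("mediapipe-gpu:latest")))
```

Depending on the host, the user inside the container may also need membership in the group that owns the render node.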

Q: How long does the build take on my hardware?

A: Compilation happens on MY hardware (I have the toolchain ready). You receive the finished .whl file. Installation on your device takes ~30 seconds via pip install.
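After pip install, a quick sanity check is to confirm that the package is importable and report its version. This is a generic sketch using only the standard library, not the verification test delivered with the gig; `installed_version` is a hypothetical helper.

```python
import importlib.metadata
import importlib.util


def installed_version(package="mediapipe"):
    """Return the installed version string, or None if not importable."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found for this name.
        return "unknown"


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("mediapipe is not installed in this environment")
    else:
        print(f"mediapipe version: {version}")
```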
