RunMat
GitHub
Back to Blog

How to Use GPU in MATLAB Without CUDA: Apple Silicon, NVIDIA, AMD & Intel

Published01/28/2026
Updated 04/14/2026
10 min read

MATLAB has exactly one path to GPU acceleration: the Parallel Computing Toolbox, a paid add-on. It comes with two requirements that often get conflated. The GPU has to be NVIDIA (the hardware), and the system has to have CUDA installed (the software), NVIDIA's proprietary compute runtime that the toolbox calls into. They aren't the same thing, but the toolbox needs both. If your machine has an Apple Silicon, AMD, or Intel GPU, neither requirement is met and gpuArray returns "No supported GPU device was found." This guide covers what actually works on each vendor, the gpuArray setup for NVIDIA users, and how to get GPU acceleration without CUDA.

TL;DR

  • Apple Silicon Mac (M1/M2/M3/M4): MATLAB's gpuArray returns "No supported GPU device was found" because CUDA doesn't run on Apple GPUs. RunMat uses Metal and runs the same MATLAB-syntax code on any M-series Mac.
  • AMD (Radeon, Instinct) or Intel (Arc, integrated): Same story. The Parallel Computing Toolbox only targets NVIDIA CUDA. RunMat uses Vulkan (Linux) and DirectX 12 (Windows) so these GPUs work without CUDA.
  • NVIDIA without the Parallel Computing Toolbox: You don't need the paid add-on to use your NVIDIA GPU. RunMat reaches NVIDIA through Vulkan, no CUDA toolkit required.
  • NVIDIA with the toolbox: The standard gpuArray pattern works and is well-documented by MathWorks. The short summary is further down.
  • FP32 is the safe first choice for any GPU path. Apple's Metal doesn't support FP64 at all.

What GPU do you have?

MATLAB's GPU support and the path you'll take depend almost entirely on what's inside your machine. Start here:

GPU vendorMATLAB (Parallel Computing Toolbox)Without CUDA (RunMat)
NVIDIA (GeForce, RTX, Tesla)Yes, with toolbox license ($)Yes, via Vulkan
Apple Silicon (M1/M2/M3/M4)NoYes, via Metal
AMD (Radeon, Instinct)NoYes, via Vulkan / DirectX 12
Intel (Arc, integrated)NoYes, via Vulkan / DirectX 12

The Parallel Computing Toolbox needs both NVIDIA hardware and CUDA. Without an NVIDIA card and a valid CUDA driver/toolkit, gpuArray and every GPU-enabled function are unavailable, regardless of how capable your GPU is.

Jump to the section that matches your hardware:


MATLAB GPU on Apple Silicon (M1, M2, M3, M4)

Every Mac sold since late 2020 has an Apple Silicon chip with a capable GPU that MATLAB can't touch. Apple's M-series GPUs deliver serious single-precision throughput:

ChipFP32 throughput (approx)
M1~2.6 TFLOPS
M2~3.6 TFLOPS
M3~4.1 TFLOPS
M3 Max (40-core GPU)~14 TFLOPS
M4~4.6 TFLOPS
M4 Max (40-core GPU)~18 TFLOPS

Numbers are public estimates; exact figures vary with GPU core count and sustained clocks. Even the base chips sit in the same league as a mid-range discrete GPU, idle in MATLAB's eyes.

Why MATLAB GPU doesn't work on Mac

MATLAB's GPU path is built on CUDA. NVIDIA stopped shipping Mac-compatible GPUs and drivers in 2016, and Apple moved the entire Mac line to its own ARM-based chips in 2020. There is no CUDA runtime for Apple Silicon and no indication either company plans to change that.

If you open MATLAB on an M-series Mac and try the GPU path, you'll get:

>> gpuDeviceCount
ans = 0

>> gpuArray(rand(1000))
Error using gpuArray
No supported GPU device was found on this computer.

MathWorks confirms this in their GPU support by release documentation: only NVIDIA GPUs with CUDA compute capability 3.5+ are supported.

What Mac users actually do

The workarounds are unappealing. MATLAB Online doesn't include GPU support in the standard tier, so you'd need a cloud instance with an NVIDIA GPU attached, which means paying for both the MATLAB license and the cloud compute. Remote desktop into a Windows or Linux box with an NVIDIA card works but adds latency and hardware cost. Most Mac-based MATLAB users just run on CPU and accept slower runtimes on large array workloads.

GPU acceleration on Mac with RunMat

RunMat uses Apple's Metal API directly, so M1/M2/M3/M4 GPUs are addressable with the same MATLAB-syntax code you'd write for CPU. The computation that fails with gpuArray on Mac runs with GPU acceleration in RunMat without any changes:

rng(0);
x = rand(10_000_000, 1, 'single');
y = sin(x) .* x + 0.5;
m = mean(y, 'all');
fprintf("m = %.6f\n", double(m));

What's missing is gpuArray, gather, CUDA, and any vendor check. RunMat's runtime inspects the computation shape and decides per-operation whether to run on CPU or GPU. Large, contiguous elementwise chains get fused into a single GPU kernel. Small or irregular work stays on CPU.

Where M-series GPUs shine

The workloads MATLAB users most often wish they could GPU-accelerate on Mac are exactly the ones Metal handles well: image and signal processing (big FFTs, 2D convolutions), large elementwise math pipelines (physical simulations, Monte Carlo), and dense linear algebra on single-precision data. The unified memory architecture on Apple Silicon also means the CPU-GPU "transfer" is effectively a memory fence, not a DMA copy, which softens one of the traditional GPU penalties covered further down.

One thing to know about precision

Metal doesn't implement FP64. If your code explicitly depends on double precision, parts of the pipeline will run on CPU in FP64 and only the FP32-safe portions will reach the GPU. For most scientific and engineering workloads FP32 is fine; for legacy numerics that require FP64 (certain accumulators, ill-conditioned solves), check whether reformulating to mixed precision is acceptable before expecting a GPU speedup.

Trying it

You can run the code block above in RunMat without installing anything at runmat.com/sandbox (WebGPU). For native Metal on your own Mac, install RunMat and run your .m file with runmat run your_script.m.


MATLAB GPU on AMD and Intel

AMD and Intel GPUs are in the same position as Apple Silicon. MATLAB's Parallel Computing Toolbox is NVIDIA-only, and neither AMD's ROCm stack nor Intel's oneAPI stack plugs in as a substitute. "ROCm for MATLAB" and "oneAPI for MATLAB" are frequent searches with no official answer: the toolbox simply doesn't have a backend for either.

On a system with only an AMD or Intel GPU, gpuDeviceCount returns 0 and gpuArray errors with "No supported GPU device was found," identical to the Apple Silicon case.

What each vendor brings

  • AMD Radeon RX (7900 XTX, 9070, etc.): strong FP32 throughput, great value per TFLOP, common on Linux workstations. Unused by MATLAB.
  • AMD Instinct (MI250, MI300): datacenter-grade FP32 and FP64. MATLAB sees none of it.
  • Intel Arc (A770, A750, B580): newer discrete GPUs aimed at compute alongside gaming. Same story.
  • Intel integrated (Iris Xe, UHD): not performance monsters, but capable of meaningful speedups on elementwise pipelines where array size is big enough. MATLAB ignores them completely.

GPU acceleration without CUDA

RunMat reaches these GPUs through the graphics APIs they already support: Vulkan on Linux (AMD and Intel) and DirectX 12 on Windows (AMD and Intel). No CUDA, no ROCm install, no oneAPI toolchain. The same MATLAB-syntax code runs unchanged:

rng(0);
x = rand(10_000_000, 1, 'single');
y = sin(x) .* x + 0.5;
m = mean(y, 'all');
fprintf("m = %.6f\n", double(m));

If you have multiple GPUs (e.g. an integrated Intel plus a discrete AMD on Linux), set RUNMAT_ACCEL_WGPU_POWER=high to prefer the discrete card or low to favour integrated. The default is auto, which lets wgpu pick based on the system's own hint.

The install link covers Windows and Linux builds. If you just want to confirm things work before installing, the browser sandbox uses WebGPU, which dispatches to your AMD or Intel GPU through the same underlying drivers.


If you have NVIDIA + the Parallel Computing Toolbox

The official gpuArray path works and is well-documented. You need an NVIDIA GPU with CUDA compute capability 3.5 or higher (most cards from 2012 onward, full list on NVIDIA's CUDA GPUs page), a compatible CUDA driver, and a Parallel Computing Toolbox license. Confirm MATLAB sees the GPU with gpuDeviceCount and gpuDevice. If either misbehaves, the usual culprits are a driver/MATLAB version mismatch, a missing toolbox license, or integrated graphics showing up instead of the discrete card.

The short form of the gpuArray workflow is upload-once, compute, gather-once:

rng(0);
x = gpuArray.rand(10_000_000, 1, 'single');
y = sin(x) .* x + 0.5;
m = mean(y, 'all');
fprintf("m = %.6f\n", gather(m));

For the full GPU-enabled function list, driver version matrix, and per-function examples, MathWorks maintains the authoritative references: Run Built-In Functions on a GPU and GPU Support by Release.


Two GPU traps that apply to any vendor

Whether you're using gpuArray on NVIDIA, Metal on Mac, or Vulkan/DX12 on AMD or Intel, the same two patterns cause most "my GPU is slower than my CPU" surprises.

The first is running on arrays that are too small. GPU kernel launches have fixed overhead. A million elements is usually enough to see a speedup on elementwise math; 10K usually isn't. Code that runs a loop over thousands of small arrays almost always loses to the CPU. The fix is to batch: replace the loop with one larger array and run the computation once.

% Overhead-heavy shape
acc = single(0);
for i = 1:2000
    x = rand(4096, 1, 'single');
    acc = acc + sum(sin(x) .* x + 0.5, 'all');
end
% Better shape for any GPU
X = rand(4096, 2000, 'single');
acc = sum(sin(X) .* X + 0.5, 'all');

The second is hidden transfers and syncs. Every gather, every fprintf or disp on a GPU value, every plot inside a hot loop forces a round-trip to the CPU. Keep inspection outside your timed region, and gather exactly once when you actually need the result on the host.

Precision choice, kernel fusion, and memory layout are refinements on those two ideas. For the MATLAB-specific version see MathWorks' Measure and Improve GPU Performance.


GPU acceleration without CUDA: how RunMat works

MATLAB's GPU path needs three things you may not have: CUDA, an NVIDIA GPU, and the Parallel Computing Toolbox. RunMat drops all three. It runs MATLAB-syntax .m files on Apple Silicon, AMD and Intel, and NVIDIA through a single runtime built on wgpu (the WebGPU standard), which dispatches to Metal on macOS, DirectX 12 on Windows, Vulkan on Linux, and WebGPU in the browser. No CUDA toolkit, no ROCm, no vendor-specific annotations in your code.

The runtime inspects each computation (array sizes, operation types, data dependencies) and decides per-operation whether to run on CPU or GPU. Large, contiguous elementwise chains get fused into a single GPU kernel so one launch handles what would otherwise be many. Small or irregular work stays on CPU. Your .m file doesn't change.

Correctness guarantees

Speed only matters if the numbers are right. Every GPU-accelerated builtin in RunMat is parity-tested against its CPU reference on every merge: the GPU kernel must reproduce the CPU result within a documented tolerance (1e-9 for f64, 1e-3 for f32) or CI fails the build. The parity tests are cargo test files in the public repository, so you can reproduce them without a MATLAB license. The full coverage table and per-builtin test links live on the Correctness & Trust page.

GPU-resident plotting

The "avoid transfers" principle extends to visualization. RunMat's plotting renders directly from GPU memory with zero copy between the computation and the chart, which matters most on pipelines that compute millions of points per frame. See the MATLAB plotting guide for runnable examples.

Where to run it

EnvironmentGPU pathBest for
Browser (runmat.com/sandbox)WebGPUTrying code with no install
CLI (runmat run script.m)Metal / DX12 / VulkanScripts, benchmarks, CI, max performance
Desktop app (coming soon)Metal / DX12 / VulkanFull IDE + full GPU headroom

For benchmarks against MATLAB, PyTorch, and NumPy, see Introducing RunMat Accelerate.


FAQ: common GPU + MATLAB questions

Can I use MATLAB GPU on a Mac?

No. MATLAB's gpuArray requires NVIDIA CUDA, which does not exist on Apple Silicon. If you try gpuArray on an M-series Mac, you'll get "No supported GPU device was found." RunMat uses Metal on macOS, so M1/M2/M3/M4 Macs get GPU acceleration with the same code. See MATLAB GPU on Apple Silicon.

Does MATLAB support M1/M2/M3/M4 GPU acceleration?

No. The Parallel Computing Toolbox is built on NVIDIA CUDA and does not support Apple's Metal API. M-series GPUs are powerful but MATLAB cannot use them. RunMat targets Metal directly. See MATLAB GPU on Apple Silicon.

Can MATLAB use an AMD GPU?

No. MATLAB requires NVIDIA CUDA. AMD GPUs (Radeon, Instinct) are not supported by the Parallel Computing Toolbox. RunMat uses Vulkan (Linux) and DirectX 12 (Windows) for AMD. See MATLAB GPU on AMD and Intel.

Can MATLAB use an Intel GPU?

No. Intel GPUs (integrated or Arc) are not supported by the Parallel Computing Toolbox. RunMat supports Intel GPUs through Vulkan and DirectX 12. See MATLAB GPU on AMD and Intel.

Do I need CUDA even if I have an NVIDIA GPU?

For MATLAB, yes. The Parallel Computing Toolbox requires CUDA. With RunMat, no: NVIDIA GPUs are accessed through Vulkan, so you get GPU acceleration without installing CUDA drivers or buying the toolbox.

Can I use GPU without the Parallel Computing Toolbox?

In MATLAB, no. The toolbox is required and is a paid add-on. RunMat includes GPU acceleration by default, free, even on NVIDIA hardware. See GPU acceleration without CUDA and free MATLAB alternatives.

Is there a free way to get GPU acceleration with MATLAB code?

MATLAB's GPU path requires the Parallel Computing Toolbox ($), CUDA, and an NVIDIA GPU. RunMat is free and accelerates on any vendor without any of those. See GPU acceleration without CUDA.

What GPU do I need for MATLAB?

NVIDIA only, with CUDA compute capability 3.5+. See NVIDIA + the Parallel Computing Toolbox and NVIDIA's CUDA GPUs page. If you want GPU acceleration on non-NVIDIA hardware or without the toolbox, RunMat works on any modern GPU.

Do I need to install CUDA or vendor drivers to use RunMat with an AMD or Intel GPU?

No. RunMat reaches AMD and Intel GPUs through Vulkan (Linux) and DirectX 12 (Windows), which are already present on most systems. There's no CUDA, no ROCm, and no oneAPI toolchain to install.

How do I try GPU acceleration on my Mac without installing anything?

Open runmat.com/sandbox in a WebGPU-capable browser (Safari 18+, Chrome 113+, Edge 113+, Firefox 139+) and paste your .m code. The sandbox dispatches to Metal on macOS automatically. For the native CLI on an M-series Mac, install RunMat and run runmat run your_script.m.

Why is my GPU slower than my CPU?

Usually the arrays are too small, you're doing many tiny steps, or you're transferring often (e.g. gather or printing in a loop). Batch into larger arrays and call gather only once at the end. See Two GPU traps that apply to any vendor.

Should I use single or double precision on GPU?

Use what your numerics need. Single (FP32) is faster, uses half the memory, and is the right default for most workloads. Double precision is still available in MATLAB on NVIDIA (performance then depends on the GPU's FP64 capability) and in RunMat on NVIDIA/AMD/Intel. Apple's Metal does not support FP64 at all, so on Mac FP32 is the only GPU option.

What's the simplest rule for GPU performance?

Make the work big and contiguous, and avoid transfers. Everything else is a refinement.


Enjoyed this post? Join the newsletter

Monthly updates on RunMat, Rust internals, and performance tips.

Try RunMat for free

Open the sandbox and start running MATLAB code in seconds. No account required.