
Category: Math: Reduction
Auto GPU

RunMat automatically offloads this function to the GPU when it estimates a speedup, without requiring explicit gpuArray inputs.


std — Standard deviation of scalars, vectors, matrices, or N-D tensors with MATLAB-compatible options.

std(x) measures the spread of the elements in x. By default RunMat matches MATLAB’s sample definition (dividing by n-1) and works along the first non-singleton dimension.
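
To make the two normalisations concrete, the sketch below recomputes both results by hand and compares them with std. It only illustrates the textbook formula; it does not describe how RunMat evaluates std internally, and the variable names are placeholders.

```matlab
x  = [1 2 3 4 5];
n  = numel(x);
mu = sum(x) / n;                 % arithmetic mean (3)
ss = sum((x - mu).^2);           % sum of squared deviations (10)

sSample = sqrt(ss / (n - 1));    % sample definition, the default: 1.5811
sPop    = sqrt(ss / n);          % population definition: 1.4142

% Both differences should be zero up to floating-point rounding.
disp(abs(sSample - std(x)));
disp(abs(sPop - std(x, 1)));
```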

Syntax

S = std(A)
S = std(A, w)
S = std(A, w, dim)
S = std(A, w, "all")
S = std(A, w, vecdim)
S = std(___, nanflag)
  • A is the numeric array to summarise. Logical inputs are promoted to double; complex inputs are not supported yet.
  • w is the normalisation flag: 0 (or []) divides by n - 1 for the sample / Bessel-corrected estimator (MATLAB's default), while 1 divides by n for the population estimator. MATLAB also documents w as a weight vector; weighted standard deviations are reported as not-yet-implemented in RunMat.
  • dim picks the dimension to reduce (default: the first non-singleton dimension). Use "all" to collapse every dimension into a scalar, or vecdim (a vector such as [1 3]) to reduce several axes in one call.
  • nanflag is an optional 'omitnan' or 'includenan' token (default 'includenan') controlling how NaN values are treated.
  • Returns S, the standard deviation, with the reduced dimension collapsed to length 1.
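
The effect of each signature on the output shape can be checked with size. This is a small sketch that assumes MATLAB-compatible shape rules, as the page describes; the 3 × 4 × 2 test array is arbitrary.

```matlab
A = rand(3, 4, 2);                        % 3 rows, 4 columns, 2 pages

szDefault = size(std(A));                 % first non-singleton dim (rows): [1 4 2]
szDim2    = size(std(A, 0, 2));           % reduce along columns: [3 1 2]
szVecdim  = size(std(A, 0, [1 3]));       % reduce rows and pages: [1 4]
szAll     = size(std(A, 0, 'all'));       % every dimension: [1 1]
```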

How std works

  • std(X) on an m × n matrix returns a 1 × n row vector with the sample standard deviation of each column.
  • std(X, 1) switches to population normalisation (n in the denominator). Use std(X, 0) or std(X, []) to keep the default sample behaviour.
  • std(X, flag, dim) lets you pick both the normalisation (flag = 0 sample, 1 population, or []) and the dimension to reduce. std(X, flag, 'all') collapses every dimension, while std(X, flag, vecdim) accepts a dimension vector such as [1 3] and reduces all listed axes in a single call. Multi-axis reductions execute on the host today when the active GPU provider cannot fuse them.
  • Strings like 'omitnan' and 'includenan' decide whether NaN values are skipped or propagated.
  • Optional out-type arguments ('double', 'default', 'native', or 'like', prototype) mirror MATLAB behaviour. 'native' rounds scalar integer results back to their original class; 'like' mirrors both the numeric class and device residency of prototype (complex prototypes yield complex outputs with zero imaginary parts).
  • Logical inputs are promoted to double precision before reduction so that results follow MATLAB’s numeric rules.
  • Empty slices return NaN with MATLAB-compatible shapes. Scalars return 0, regardless of the normalisation mode.
  • Dimensions greater than ndims(X) leave the input untouched.
  • Weighted standard deviations (flag as a vector) are not implemented yet; RunMat reports a descriptive error when they are requested. Complex tensors are not currently supported; convert them to real magnitudes manually before calling std.
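
The edge cases in the list above can be exercised directly. The values in the comments follow from the rules as stated on this page (scalars give 0, empty slices give NaN, logical inputs are promoted to double); treat them as a sketch of the documented behaviour rather than a test log.

```matlab
s1 = std(7);                          % scalar input: 0
s2 = std(7, 1);                       % still 0 under population normalisation
e  = std(zeros(0, 3));                % three empty columns: 1-by-3 NaN
L  = std([true false true true]);     % promoted to [1 0 1 1]: 0.5
```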

How RunMat runs std on the GPU

When RunMat Accelerate is active, device-resident tensors remain on the GPU whenever the provider implements the relevant hooks. Providers that expose reduce_std_dim/reduce_std execute the reduction in-place on the device; the default WGPU backend currently supports two-dimensional inputs, single-axis reductions, and 'includenan' only. Whenever 'omitnan', multi-axis reductions, or unsupported shapes are requested, RunMat transparently gathers the data to the host, computes the result there, and then applies the requested output template ('native', 'like') before returning.
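
As a rough illustration of the paragraph above, the calls below differ only in where the work is expected to run. The comments restate the conditions given in the text for the default WGPU backend; which path is actually taken depends on the active provider, so read them as assumptions rather than guarantees.

```matlab
G = gpuArray(rand(1024, 512));        % explicit upload; optional under Auto GPU

colStd  = std(G);                     % 2-D input, single axis, 'includenan':
                                      % eligible for the device reduction
rowStd  = std(G, 0, 2, 'omitnan');    % 'omitnan': gathered and computed on the host
bothStd = std(G, 0, [1 2]);           % multi-axis: gathered and computed on the host

colHost = gather(colStd);             % bring the 1-by-512 result back explicitly
```

The numerical results should be the same on either path; only the residency of intermediate data changes.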

GPU memory and residency

Usually you do not need to call gpuArray manually. The fusion planner keeps tensors on the GPU across fused expressions and gathers them only when necessary. For explicit control or MATLAB compatibility, you can still call gpuArray/gather yourself.

Examples

Sample standard deviation of a vector

x = [1 2 3 4 5];
s = std(x);                 % uses flag = 0 (sample) by default

Expected output:

s = 1.5811

Population standard deviation of each column

A = [1 3 5; 2 4 6];
spop = std(A, 1);           % divide by n instead of n-1

Expected output:

spop = [0.5 0.5 0.5]

Collapsing every dimension at once

B = reshape(1:12, [3 4]);
overall = std(B, 0, 'all')

Expected output:

overall = 3.6056

Reducing across multiple dimensions

C = cat(3, [1 2; 3 4], [5 6; 7 8]);
sliceStd = std(C, [], [1 3]);   % keep columns, reduce rows & pages

Expected output:

sliceStd = [2.5820 2.5820]

Ignoring NaN values

D = [1 NaN 3; 2 4 NaN];
rowStd = std(D, 0, 2, 'omitnan')

Expected output:

rowStd = [1.4142; 1.4142]

Matching a prototype using 'like'

proto = gpuArray(single(42));
G = gpuArray(rand(1024, 512));
spread = std(G, 1, 'all', 'like', proto);
answer = gather(spread)

Preserving default behaviour with an empty normalisation flag

C = [1 2; 3 4];
rowStd = std(C, [], 2)

Expected output:

rowStd = [0.7071; 0.7071]

How RunMat validates std

std uses the shared reduction infrastructure in runmat-runtime, which implements a two-pass Welford-style algorithm on the CPU and a matching reduction kernel on the GPU. The reduction-parity test suite runs the same inputs through both paths and asserts agreement within tolerance on every merge.
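
For readers unfamiliar with the name, Welford's method updates a running mean and a running sum of squared deviations one element at a time, which avoids the cancellation that the naive sum-of-squares formula can suffer. The loop below is a minimal textbook sketch of that update; it is not RunMat's actual kernel, and the names count, mu, and m2 are illustrative.

```matlab
x = [1 2 3 4 5];

count = 0;                            % elements seen so far
mu    = 0;                            % running mean
m2    = 0;                            % running sum of squared deviations

for k = 1:numel(x)
    count = count + 1;
    delta = x(k) - mu;                % deviation from the previous mean
    mu    = mu + delta / count;       % updated mean
    m2    = m2 + delta * (x(k) - mu); % combines old and new deviations
end

sSample = sqrt(m2 / (count - 1));     % divide by n-1: 1.5811
sPop    = sqrt(m2 / count);           % divide by n:   1.4142
```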

See Correctness & Trust for the full methodology and coverage table.

FAQ

What values can I pass as the normalisation flag?

Use 0 (or []) for the sample definition, 1 for population. RunMat rejects non-scalar weight vectors and reports that weighted standard deviations are not implemented yet.
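
Until weight vectors are supported, a weighted standard deviation can be computed by hand from elementwise operations. The sketch below follows MATLAB's documented definition (weights normalised to sum to 1, deviations taken from the weighted mean); the data and weights are made up for illustration.

```matlab
x = [2 4 6];
w = [1 2 1];                               % nonnegative weights, one per element

w  = w / sum(w);                           % normalise so the weights sum to 1
mu = sum(w .* x);                          % weighted mean: 4
sWeighted = sqrt(sum(w .* (x - mu).^2));   % weighted standard deviation: 1.4142
```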

How can I collapse multiple dimensions?

Pass a vector of dimensions such as std(A, [], [1 3]). You can also use 'all' to collapse every dimension into a single scalar.

How do 'omitnan' and 'includenan' work?

'omitnan' skips NaN values; if every element in a slice is NaN, the result for that slice is NaN. 'includenan' (the default) returns NaN for any slice that contains at least one NaN.

What do 'native' and 'like' do?

'native' rounds scalar results back to the input’s integer class (multi-element outputs stay in double precision for now), while 'double'/'default' keep double precision. 'like', prototype mirrors both the numeric class and the device residency of prototype, including GPU tensors; complex prototypes produce complex outputs with zero imaginary parts.

What happens if I request a dimension greater than ndims(X)?

RunMat returns the input unchanged so that MATLAB-compatible code relying on that behaviour continues to work.

Are complex inputs supported?

Not yet. RunMat currently requires real inputs for std. Convert complex data to magnitude or separate real/imaginary parts before calling the builtin.
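
A minimal sketch of both workarounds, using abs, real, and imag; the sample values are arbitrary, and which reduction is appropriate depends on what "spread" should mean for your data.

```matlab
z = [1+2i, 3+6i];

sMag  = std(abs(z));      % spread of the magnitudes: 3.1623
sReal = std(real(z));     % spread of the real parts: 1.4142
sImag = std(imag(z));     % spread of the imaginary parts: 2.8284
```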

What does std compute by default?

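By default, MATLAB's std returns the sample standard deviation, dividing by n - 1 (Bessel's correction). This differs from NumPy's np.std, which defaults to the population formula (dividing by n, i.e. ddof = 0) and, when no axis is given, reduces over the flattened array. To match NumPy's default in MATLAB, use std(A, 1) for a vector or std(A, 1, 'all') for a matrix or N-D array. To match MATLAB's default in NumPy, use np.std(A, ddof=1) for a vector, adding axis=0 for a matrix so that each column is reduced.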

How do I compute population standard deviation (divide by n) in MATLAB?

Pass 1 as the normalisation flag: std(A, 1) divides by n instead of n - 1. The same flag works with the other signatures: std(A, 1, dim), std(A, 1, 'all'), or std(A, 1, vecdim). Use std(A, 0) or std(A, []) to keep the default sample behaviour.
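
The population flag combined with each dimension argument, as a short sketch on a 2 × 3 matrix; the values in the comments were worked out by hand from the divide-by-n formula.

```matlab
A = [1 3 5; 2 4 6];

byCol = std(A, 1);            % each column, n = 2: [0.5 0.5 0.5]
byRow = std(A, 1, 2);         % each row, n = 3:    [1.6330; 1.6330]
whole = std(A, 1, 'all');     % all six elements:   1.7078
both  = std(A, 1, [1 2]);     % same as 'all' for a 2-D input
```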

How do I handle NaN values when computing std?

Add 'omitnan' as the final argument to skip missing values: std(A, 0, 1, 'omitnan') (or std(A, 'omitnan') if you accept the default normalisation and dimension). Under the default 'includenan', any slice that contains at least one NaN produces NaN in the corresponding output element. If an entire slice is NaN under 'omitnan', the result for that slice is NaN.
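
The two flags side by side, including a row made up entirely of NaN. This is a small sketch of the behaviour described above, with an arbitrary test matrix.

```matlab
D = [1 NaN 3; NaN NaN NaN];

keepNaN = std(D, 0, 2);               % default 'includenan': [NaN; NaN]
skipNaN = std(D, 0, 2, 'omitnan');    % row 1 uses [1 3]: 1.4142; row 2 stays NaN
```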


Open-source implementation

Unlike proprietary runtimes, every RunMat function is open-source. Read exactly how std works, line by line, in Rust.
