regexp — Match text with regular expressions and return indices, matches, tokens, names, or splits in MATLAB-compatible forms.
regexp(text, pattern) searches character vectors, string arrays, and cell arrays of text and can return start/end indices, matched substrings, capture tokens, token extents, named tokens, or split text. Options such as 'once' and 'emptymatch' follow MATLAB-compatible behavior.
Syntax
out = regexp(subject, pattern)
out = regexp(subject, pattern, options...)
[start,end_idx,match,tokens,names,split] = regexp(subject, pattern, options...)
Inputs
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| subject | Any | Yes | — | Input text (char/string/string-array/cellstr). |
| pattern | StringScalar | Yes | — | Regular-expression pattern. |
| options | Any | Variadic | — | Output selectors and regexp options. |
Returns
| Name | Type | Description |
|---|---|---|
| out | Any | Primary regexp output (depends on options and requested output count). |
| start | Any | 1-based match start indices. |
| end_idx | Any | 1-based inclusive match end indices. |
| match | Any | Matched substrings. |
| tokens | Any | Capture-group token outputs. |
| names | Any | Named capture-group outputs. |
| split | Any | Split output around regex matches. |
Returned values from regexp depend on how many outputs the caller requests.
Errors
| Identifier | When | Message |
|---|---|---|
| RunMat:regexp:InvalidArgument | Input/options are malformed or unsupported. | regexp: invalid argument |
| RunMat:regexp:PatternInvalid | Pattern cannot be compiled as a regular expression. | regexp: invalid regular expression pattern |
| RunMat:regexp:Internal | Internal regexp output assembly fails. | regexp: internal operation failed |
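The identifiers in the table make it possible to tell a bad pattern apart from other failures. The following is a minimal sketch, assuming RunMat exposes the identifier through the standard identifier field of the caught error object (this page does not confirm that); userPattern is a hypothetical variable standing in for untrusted input.

```matlab
% Hypothetical input: the group is opened but never closed.
userPattern = '(unclosed';

try
    idx = regexp('some text', userPattern);
catch err
    % Identifier copied from the Errors table above. That it appears in
    % err.identifier is an assumption, not something this page states.
    if strcmp(err.identifier, 'RunMat:regexp:PatternInvalid')
        idx = [];          % treat an uncompilable pattern as "no matches"
    else
        rethrow(err);      % anything else is unexpected, so let it propagate
    end
end
```

Rethrowing unknown identifiers keeps RunMat:regexp:InvalidArgument and RunMat:regexp:Internal visible instead of silently mapping them to an empty result.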
How regexp works
- Single character vectors and string scalars return a numeric row vector of 1-based match start indices by default.
- String arrays and cell arrays always produce cell outputs that mirror the input shape, with each cell holding the result for the corresponding element.
- 'match' returns matched substrings, 'tokens' returns nested cells of capture-group substrings, 'tokenExtents' returns n × 2 double matrices with start/end indices for each token, 'names' returns scalar struct values keyed by named tokens, and 'split' yields the text segments between matches.
- 'once' stops after the first match (per element), and every requested output honours that limit.
- 'emptymatch','remove' (default) filters zero-length matches; 'emptymatch','allow' keeps them so callers can observe optional patterns.
- 'forceCellOutput' forces cell-array containers even for scalar inputs so downstream code can rely on uniform dimensions.
- MATLAB-compatible 'warnings','on'/'off' flags are accepted but currently informational only.
- 'matchcase' and 'ignorecase' toggle case sensitivity, 'lineanchors' makes ^ and $ match at line boundaries, and 'dotall'/'dotExceptNewline' control whether . matches newlines, mirroring MATLAB flags.
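A short sketch of three of the options listed above, on one input. The values in the comments are what MATLAB-compatible semantics imply for this input; they were not produced by running RunMat, and the variable names are illustrative only.

```matlab
str = 'Foo=1, bar=22';

% 'tokenExtents': one cell per match, each holding an n-by-2 matrix whose
% rows are the [start end] indices of that match's capture groups.
ext = regexp(str, '(\w+)=(\d+)', 'tokenExtents');
% ext{1} is [1 3; 5 5]   ('Foo' and '1')
% ext{2} is [8 10; 12 13] ('bar' and '22')

% 'ignorecase': the upper-case pattern still matches 'Foo' at index 1.
starts = regexp(str, 'FOO', 'ignorecase');

% 'once': only the first match is reported, so 'match' yields '1'
% rather than a cell holding both '1' and '22'.
first = regexp(str, '\d+', 'match', 'once');
```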
Does RunMat run regexp on the GPU?
regexp executes entirely on the CPU and is registered as an acceleration sink. If any argument resides on the GPU, the runtime gathers it before evaluation, computes all requested outputs on the host, and returns host-side containers. Providers do not implement custom hooks for this builtin, so no GPU kernels are required or invoked.
Examples
Find all 1-based match positions in a character vector
idx = regexp('abracadabra', 'a')
Expected output:
idx =
1 4 6 8 11
Return matched substrings using 'match'
matches = regexp('abc123xyz', '\d+', 'match')
Expected output:
matches =
1×1 cell array
{'123'}
Extract capture tokens
tokens = regexp('2024-03-14', '(\d{4})-(\d{2})-(\d{2})', 'tokens');
year = tokens{1}{1};
month = tokens{1}{2};
day = tokens{1}{3}
Expected output:
year =
'2024'
month =
'03'
day =
'14'
Split a string array around commas
parts = regexp(["a,b,c"; "1,2,3"], ',', 'split')
Expected output:
parts =
2×1 cell array
{1×3 cell}
{1×3 cell}
Return only the first match with 'once'
first_idx = regexp('abababa', 'ba', 'once')
Expected output:
first_idx =
2
Work with named tokens
matches = regexp('X=42; Y=7;', '(?<name>[A-Z])=(?<value>\d+)', 'names');
values = cellfun(@(s) str2double(s.value), matches)
Expected output:
values =
42 7
Keep zero-length matches with 'emptymatch','allow'
idx = regexp('aba', 'b*', 'emptymatch', 'allow')
Expected output:
idx =
1 2 3 4
Using regexp with coding agents
Open a RunMat example with live inputs, then ask the agent to explain how regexp changes the result.
Run a small regexp example, explain the result, then change one input and compare the output.
FAQ
What outputs does regexp return by default?
With a single output argument, regexp returns a numeric row vector of 1-based match starts. When the call site asks for multiple outputs (e.g. [startIdx, endIdx, matchStr] = regexp(...)), RunMat returns match starts, match ends, and matched substrings in that order, just like MATLAB.
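As a small illustration of the default order described above, the following sketch requests three outputs without any flags. The variable names are arbitrary, and the values in the comments follow from MATLAB-compatible semantics for this input.

```matlab
% No output flags: the outputs are starts, ends, and matched substrings.
[startIdx, endIdx, matchStr] = regexp('one two', '\w+');
% startIdx is [1 5]
% endIdx   is [3 7]  (inclusive end positions)
% matchStr is {'one', 'two'}

% The indices can be used to slice the original text directly.
second = 'one two';
second = second(startIdx(2):endIdx(2));   % 'two'
```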
How can I request tokens or splits instead of indices?
Specify the desired output types as string flags, for example regexp(str, pat, 'match'), regexp(str, pat, 'tokens'), or regexp(str, pat, 'split'). Multiple flags combine, so regexp(str, pat, 'match', 'tokens') returns both outputs.
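A minimal sketch of combining two flags in one call. It assumes the outputs come back in the order the flags are written, which is MATLAB's documented behaviour and is taken here to carry over to RunMat.

```matlab
% Two flags, two outputs: matched text first, capture tokens second.
[m, t] = regexp('k1=v1;k2=v2', '(\w+)=(\w+)', 'match', 'tokens');
% m is {'k1=v1', 'k2=v2'}
% t is {{'k1', 'v1'}, {'k2', 'v2'}}

% Tokens are nested: the outer index selects the match, the inner the group.
firstKey   = t{1}{1};   % 'k1'
secondValue = t{2}{2};  % 'v2'
```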
Does regexp support case-insensitive matching?
Yes. Use 'ignorecase' (or call regexpi) to enable case-insensitive matching, and 'matchcase' to revert to the default case-sensitive behaviour.
How are string arrays and cell arrays handled?
For string arrays and cell arrays of char vectors, every output is a cell array whose shape matches the input. Each cell contains the result for the corresponding element, which mirrors MATLAB's container semantics.
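The container rule can be sketched with a two-element cell array, where one element has matches and the other has none. The variable names are illustrative.

```matlab
c = {'a1b22', 'no digits'};

% One result cell per input element, in the same 1-by-2 shape as c.
idx = regexp(c, '\d+');
% idx{1} is [2 4]  (starts of '1' and '22')
% idx{2} is empty  (no digits in the second element)

% Elements without matches can be detected with isempty.
hasDigits = ~cellfun(@isempty, idx);   % [true false]
```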
How do zero-length matches behave?
By default ('emptymatch','remove'), zero-length matches are filtered out so loops do not stall. Specify 'emptymatch','allow' to keep them, matching MATLAB's 'emptymatch' flag.
Can I force cell output even for character vectors?
Yes. Pass 'forceCellOutput' to force the outputs into cell arrays, which is useful when writing code that handles both scalar and array inputs uniformly.
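A brief sketch of the difference the flag makes for a single character vector. The comments describe MATLAB-compatible behaviour for this input, and the variable names are arbitrary.

```matlab
% Without the flag, a char-vector input yields a plain numeric result.
idxPlain = regexp('abc123', '\d+');                    % 4

% With the flag, the same result is wrapped in a 1-by-1 cell.
idxCell = regexp('abc123', '\d+', 'forceCellOutput');  % {4}

% Downstream code can then index the result the same way it would
% for a cell-array or string-array input.
firstStart = idxCell{1}(1);
```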
Does regexp run on the GPU?
No. RunMat executes regexp on the CPU. If inputs reside on the GPU, it gathers them first, computes every requested output on the host, and returns host-side containers.
What happens when I request more outputs than I selected via flags?
RunMat follows MATLAB's rules: if you do not supply explicit output flags, the default multi-output order is start indices, end indices, and matched substrings. Extra requested outputs beyond what you specified become numeric zeros.
Open-source implementation
Unlike proprietary runtimes, every RunMat function is open-source. Read exactly how regexp is executed, line by line, in Rust.
- View the source for regexp in Rust on GitHub
- Learn how the RunMat runtime works
- Found a bug? Open an issue with a minimal reproduction.
About RunMat
RunMat is an open-source runtime that executes MATLAB-syntax code blazing fast on any GPU. It is licensed under the Apache 2.0 license.
- RunMat automatically optimizes your math for GPU execution on Apple, Nvidia, and AMD hardware. No code changes needed. Simulations that took hours now take minutes.
- Start running code in seconds. RunMat runs in the browser, on the desktop, or from the CLI. No license server, no IT ticket.