Filesystem Abstraction

runmat-filesystem gives the runtime one filesystem API across native CLI execution, sandboxed hosts, WASM hosts, and the remote project filesystem. Runtime code calls the same provider-neutral functions for load, save, fopen, dir, which, plotting exports, replay artifacts, datasets, and source loading.

The provider boundary is deliberately below MATLAB builtins. Builtins should not know whether a path resolves to local disk, a browser-provided filesystem, a sandbox root, or a remote object store behind RunMat Server.

Architecture

           ┌────────────────────┐
           │ RunMat Runtime     │
           │ (wasm/native)      │
           ├────────────────────┤
           │ Virtual FS (VFS)   │  <-- filesystem abstraction
           │  • open/read/write │
           │  • metadata        │
           │  • directory ops   │
           └────────┬───────────┘
                    │
        ┌───────────┴─────────────────────────────────────────────────────┐
        │                Backends                                         │
        │                                                                 │
   ┌────┴────┐   ┌────────────┐   ┌─────────────┐   ┌─────────────────┐   │
   │ Native  │   │ Browser    │   │ Desktop     │   │ Remote          │   │
   │ Std FS  │   │ Storage    │   │ Host Proxy  │   │ Gateway         │   │
   └─────────┘   └────────────┘   └─────────────┘   └─────────────────┘   │
   - std::fs      - IndexedDB       - Native shell    - Signed URL fetch  │
   - mmap         - OPFS            - Native FS       - Chunked streaming │
   - async disk   - In-memory       - Cache layer     - Cred mgmt         │
        │                │                 │                 │            │
        └────────────────┴─────────────────┴─────────────────┴────────────┘

The active provider is stored as a global Arc<dyn FsProvider> behind a lock. set_provider installs a provider for the process, while replace_provider and with_provider_override install scoped providers for tests or host-specific execution. On WASM, relative paths are resolved against a runtime-managed current directory; on native targets, current-directory behavior delegates to std::env.
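The global-provider pattern described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the one-method FsProvider trait, FixedProvider, the slot helper, and the exact signatures of set_provider and with_provider_override are assumptions made for the example.

```rust
use std::sync::{Arc, OnceLock, RwLock};

// Hypothetical one-method stand-in for the real FsProvider trait.
trait FsProvider: Send + Sync {
    fn read(&self, path: &str) -> std::io::Result<Vec<u8>>;
}

// Toy provider that returns the same bytes for every path.
struct FixedProvider(&'static [u8]);

impl FsProvider for FixedProvider {
    fn read(&self, _path: &str) -> std::io::Result<Vec<u8>> {
        Ok(self.0.to_vec())
    }
}

// Process-wide slot holding the active provider behind a lock.
fn slot() -> &'static RwLock<Arc<dyn FsProvider>> {
    static SLOT: OnceLock<RwLock<Arc<dyn FsProvider>>> = OnceLock::new();
    SLOT.get_or_init(|| {
        let default: Arc<dyn FsProvider> = Arc::new(FixedProvider(b"default"));
        RwLock::new(default)
    })
}

// Callers clone the Arc so the lock is not held during I/O.
fn current_provider() -> Arc<dyn FsProvider> {
    slot().read().unwrap().clone()
}

// Installs a provider for the rest of the process.
fn set_provider(provider: Arc<dyn FsProvider>) {
    *slot().write().unwrap() = provider;
}

// Swaps a provider in, runs `f`, then restores the previous provider.
// Assumption: this sketch does not restore the provider if `f` panics.
fn with_provider_override<R>(provider: Arc<dyn FsProvider>, f: impl FnOnce() -> R) -> R {
    let previous = std::mem::replace(&mut *slot().write().unwrap(), provider);
    let result = f();
    *slot().write().unwrap() = previous;
    result
}
```

Cloning the Arc out of the lock keeps the critical section short, so a slow read on one thread does not block another thread from installing or overriding a provider.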

Provider Contract

FsProvider covers file handles, whole-file operations, metadata, directory traversal, path normalization, mutation, and dataset chunk endpoints.

| Operation group | Methods |
| --- | --- |
| File handles | open, open_async, FileHandle::{Read, Write, Seek, flush_async, sync_all_async} |
| Whole-file I/O | read, write, remove_file |
| Metadata | metadata, symlink_metadata, canonicalize |
| Directories | read_dir, create_dir, create_dir_all, remove_dir, remove_dir_all, rename |
| Permissions | set_readonly |
| Batched reads | read_many |
| Dataset objects | data_manifest_descriptor, data_chunk_upload_targets, data_upload_chunk |

OpenOptions mirrors the standard read/write/append/truncate/create/create-new flags. File wraps the provider-specific FileHandle and exposes normal Read, Write, and Seek, plus async flush and sync entrypoints.

The default read_many implementation is sequential and tolerant of misses: each path returns a ReadManyEntry with bytes or an error string. WASM hosts can override readMany to batch requests in JavaScript.
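A sequential, miss-tolerant default of this kind might look like the sketch below. The field layout of ReadManyEntry, the reduced FsProvider trait, and MemoryProvider are assumptions for illustration; the real types may differ.

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical shape: either `bytes` or `error` is set for each path.
#[derive(Debug, PartialEq)]
struct ReadManyEntry {
    path: String,
    bytes: Option<Vec<u8>>,
    error: Option<String>,
}

// Hypothetical reduced trait with a provided (default) read_many.
trait FsProvider {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;

    // One read per path, in order. A failed read is recorded as an error
    // string on its entry instead of aborting the whole batch.
    fn read_many(&self, paths: &[&str]) -> Vec<ReadManyEntry> {
        paths
            .iter()
            .map(|path| match self.read(path) {
                Ok(bytes) => ReadManyEntry {
                    path: path.to_string(),
                    bytes: Some(bytes),
                    error: None,
                },
                Err(err) => ReadManyEntry {
                    path: path.to_string(),
                    bytes: None,
                    error: Some(err.to_string()),
                },
            })
            .collect()
    }
}

// In-memory provider used only to exercise the default method.
struct MemoryProvider(HashMap<String, Vec<u8>>);

impl FsProvider for MemoryProvider {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.0.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("{path}: not found"))
        })
    }
}
```

A provider that can batch requests, such as one backed by a JavaScript host, would override read_many and keep the same per-path result shape.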

Providers

| Provider | Target | Role |
| --- | --- | --- |
| NativeFsProvider | Native | Direct std::fs wrapper for local CLI and test execution. |
| SandboxFsProvider | Native | Resolves every incoming path under a fixed root and virtualizes returned paths back to sandbox paths. |
| PlaceholderFsProvider | WASM default | Fails with Unsupported until a host installs a real provider. |
| JsFsProvider | WASM host integration | Installed by runmat-wasm; delegates file calls to JavaScript callbacks such as readFile, writeFile, metadata, readDir, and optional readMany. |
| RemoteFsProvider | Native and WASM variants | Talks to RunMat Server filesystem endpoints and signed object-store URLs. |

The sandbox provider is used when a host wants local std::fs behavior without giving MATLAB code access to arbitrary host paths. It normalizes .. and absolute paths under the configured root, creates parent directories for writes, and returns virtualized paths from directory listings and canonicalization.
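The path handling described above could be approximated with a purely lexical mapping like the one below. Both helper names (resolve_in_sandbox and virtualize) are hypothetical, and the sketch assumes Unix-style paths and ignores symlinks, which a real sandbox would also need to consider.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical helper: maps a guest path to a host path that stays under `root`.
fn resolve_in_sandbox(root: &Path, guest: &str) -> PathBuf {
    let mut parts: Vec<String> = Vec::new();
    for component in Path::new(guest).components() {
        match component {
            Component::Normal(name) => parts.push(name.to_string_lossy().into_owned()),
            // `..` drops one segment but can never climb above the sandbox root.
            Component::ParentDir => {
                parts.pop();
            }
            // A leading `/`, a drive prefix, and `.` contribute nothing.
            Component::RootDir | Component::Prefix(_) | Component::CurDir => {}
        }
    }
    let mut host = root.to_path_buf();
    host.extend(parts);
    host
}

// Hypothetical inverse: turns a host path back into the guest-visible path,
// or returns None when the host path is outside the sandbox root.
fn virtualize(root: &Path, host: &Path) -> Option<String> {
    let relative = host.strip_prefix(root).ok()?;
    let segments: Vec<String> = relative
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    Some(format!("/{}", segments.join("/")))
}
```

With a root of /srv/box, both /etc/passwd and ../../etc/passwd resolve to /srv/box/etc/passwd, and a directory listing entry at /srv/box/out/x.csv is reported back to MATLAB code as /out/x.csv.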

Builtin Integration

Most filesystem behavior exposed to MATLAB code lives in runtime builtins, but those builtins use the provider API instead of std::fs.

| Builtin family | Filesystem dependency |
| --- | --- |
| Text and binary file I/O | fopen, fread, fwrite, fprintf, fclose, feof, frewind, fileread, filewrite use OpenOptions and File. |
| MAT files | load and save open provider-backed files, so MAT I/O works against local, sandboxed, WASM, or remote storage. |
| Tabular I/O | csvread, csvwrite, dlmread, dlmwrite, readmatrix, and writematrix use provider-backed handles and metadata checks. |
| REPL filesystem commands | cd, pwd, dir, ls, mkdir, rmdir, delete, copyfile, movefile, addpath, rmpath, genpath, savepath, tempname use provider metadata, directory, rename, removal, and current-directory APIs. |
| Introspection | exist and which combine workspace lookup, builtin metadata, class lookup, and provider-backed path search. |
| Images and plotting | imread, imshow, and print read or write through the provider; print writes to a temporary path and renames into place. |
| Datasets and replay | Runtime dataset APIs and plotting replay use provider reads, writes, batched reads, manifests, and chunk upload targets. |
| Source loading | ExecutionRequest with SourceInput::Path reads the script through read_to_string_async. |

exist and which are worth calling out because they are not simple metadata checks. They first consider workspace variables and registered builtins, then use path-search helpers to check files, folders, class folders, classdef files, MEX/P-code/Simulink/library extensions, and canonical paths through the provider.

Remote Filesystem

RemoteFsProvider maps provider calls onto RunMat Server project endpoints. Metadata and directory operations go through JSON API routes, while large file payloads are moved through chunked server reads or signed object-store URLs.

Remote configuration controls transport behavior:

| Config field | Default | Meaning |
| --- | --- | --- |
| chunk_bytes | 16 MB | Unit of ranged reads and chunked writes. |
| parallel_requests | 4 | Number of worker threads used for native remote chunk transfers. |
| direct_read_threshold_bytes | 64 MB | Files at or above this size use a signed download URL and ranged reads. |
| direct_write_threshold_bytes | 64 MB | Files at or above this size use upload-session or multipart upload flow. |
| shard_threshold_bytes | 4 GB | Files at or above this size are written as shard objects plus a manifest. |
| shard_size_bytes | 512 MB | Target size for each shard. |
| retry_max_attempts | 5 | Retries transient HTTP failures. |

Transient 429, 500, 502, 503, and 504 responses are retried with capped exponential backoff.
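As a rough sketch of that retry policy, the code below doubles the delay after each failed attempt and clamps it at a ceiling. The base delay, the cap, the function names, and the reading of retry_max_attempts as a total attempt count are all assumptions; the source does not state them.

```rust
use std::time::Duration;

// Status codes the text lists as transient.
fn is_transient(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

// Calls `request` until it returns a non-transient status or the attempt
// budget is used up. `request` stands in for one HTTP call and returns its status.
fn send_with_retry(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut request: impl FnMut() -> u16,
) -> u16 {
    let mut status = request();
    let mut attempt = 0;
    while is_transient(status) && attempt + 1 < max_attempts {
        std::thread::sleep(backoff_delay(attempt, base, cap));
        attempt += 1;
        status = request();
    }
    status
}
```

The saturating arithmetic keeps the delay well defined even for large attempt numbers, so the cap always wins instead of an overflow.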

Parallel Remote Reads

Native remote reads are designed to keep the network busy. After metadata determines file length, the provider splits the file into chunk_bytes tasks. A scoped worker pool with up to parallel_requests threads drains a shared queue. Each worker downloads a chunk either from /fs/read or from a signed URL range request. Chunks are stored by index and copied into the final buffer at their byte offsets after all workers finish.
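The read path above can be illustrated with standard-library scoped threads. This is a sketch under assumptions: plan_chunks and read_parallel are hypothetical names, and the fetch closure stands in for either a /fs/read call or a signed-URL range request.

```rust
use std::sync::Mutex;
use std::thread;

// Splits `len` bytes into (index, offset, size) tasks of at most `chunk_bytes`.
fn plan_chunks(len: u64, chunk_bytes: u64) -> Vec<(usize, u64, u64)> {
    assert!(chunk_bytes > 0, "chunk_bytes must be positive");
    let mut tasks = Vec::new();
    let mut offset = 0;
    while offset < len {
        let size = chunk_bytes.min(len - offset);
        let index = tasks.len();
        tasks.push((index, offset, size));
        offset += size;
    }
    tasks
}

// Downloads all chunks with up to `parallel_requests` workers and reassembles them.
fn read_parallel<F>(len: u64, chunk_bytes: u64, parallel_requests: usize, fetch: F) -> Vec<u8>
where
    F: Fn(u64, u64) -> Vec<u8> + Sync,
{
    let tasks = plan_chunks(len, chunk_bytes);
    let task_count = tasks.len();
    let queue = Mutex::new(tasks.into_iter());
    let slots: Mutex<Vec<Option<Vec<u8>>>> = Mutex::new(vec![None; task_count]);

    thread::scope(|scope| {
        for _ in 0..parallel_requests.clamp(1, task_count.max(1)) {
            scope.spawn(|| loop {
                // Hold the queue lock only long enough to take one task.
                let next = queue.lock().unwrap().next();
                let Some((index, offset, size)) = next else { break };
                let bytes = fetch(offset, size);
                // Store the chunk by index; order of completion does not matter.
                slots.lock().unwrap()[index] = Some(bytes);
            });
        }
    });

    // All workers have finished: copy each chunk to its byte offset.
    let mut out = vec![0u8; len as usize];
    for (index, chunk) in slots.into_inner().unwrap().into_iter().enumerate() {
        let chunk = chunk.expect("every task stores a chunk");
        let start = index * chunk_bytes as usize;
        out[start..start + chunk.len()].copy_from_slice(&chunk);
    }
    out
}
```

With the documented defaults, a 100 MB file would become seven tasks (six of 16 MB and one of 4 MB) drained by four workers.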

For files at or above direct_read_threshold_bytes, the provider switches to a single signed download URL. It still splits that object into range requests and fetches those ranges in parallel. This keeps the payload off the API server while still making use of the available client-to-object-store bandwidth.

Sharded files are represented by a small manifest whose hash is manifest:v1. The provider reads the manifest, downloads each shard, and concatenates shard payloads in manifest order. Each shard is itself read through the same raw-file path, so large shard downloads still use chunk/range parallelism.

Parallel Remote Writes

Small unsharded writes use /fs/write with offsets and final=true on the last chunk. That path is sequential because it writes through the gateway.

Larger writes use upload sessions. The provider computes a full-file SHA-256, asks the server to start an upload session, describes every chunk with offset, size, and chunk hash, receives upload targets, then runs parallel_requests worker threads that PUT chunk slices directly to those targets. After all workers finish, it completes the session with the expected size, hash, and chunk count.

If the newer upload-session endpoint is unavailable, the provider falls back to a legacy multipart flow: create multipart upload, presign each part, upload parts in parallel, collect ETags, sort parts by part number, and complete multipart upload.

Very large writes use sharding before the upload-session decision. The provider writes each shard under /.runmat/shards/<uuid>, then writes a manifest to the requested path. This keeps individual object writes bounded while preserving a single logical file at the provider API.
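The order of those decisions can be summarized in a small planning function. This is a sketch only: RemoteConfig here mirrors just three of the documented fields, WritePlan and plan_write are hypothetical names, and the defaults are assumed to be binary megabytes and gigabytes.

```rust
const MB: u64 = 1024 * 1024;
const GB: u64 = 1024 * MB;

// Hypothetical mirror of three documented config fields.
struct RemoteConfig {
    direct_write_threshold_bytes: u64,
    shard_threshold_bytes: u64,
    shard_size_bytes: u64,
}

impl Default for RemoteConfig {
    fn default() -> Self {
        Self {
            direct_write_threshold_bytes: 64 * MB,
            shard_threshold_bytes: 4 * GB,
            shard_size_bytes: 512 * MB,
        }
    }
}

#[derive(Debug, PartialEq)]
enum WritePlan {
    // Sequential /fs/write chunks through the gateway.
    Gateway,
    // Parallel PUTs to upload targets (or the legacy multipart fallback).
    UploadSession,
    // Shard objects plus a manifest at the requested path.
    Sharded { shard_sizes: Vec<u64> },
}

fn plan_write(len: u64, config: &RemoteConfig) -> WritePlan {
    // Sharding is decided first, before the upload-session check.
    if len >= config.shard_threshold_bytes {
        let mut shard_sizes = Vec::new();
        let mut remaining = len;
        while remaining > 0 {
            let size = remaining.min(config.shard_size_bytes);
            shard_sizes.push(size);
            remaining -= size;
        }
        return WritePlan::Sharded { shard_sizes };
    }
    if len >= config.direct_write_threshold_bytes {
        WritePlan::UploadSession
    } else {
        WritePlan::Gateway
    }
}
```

In this sketch each shard would then go through the same decision on its own, so a 512 MB shard would presumably take the upload-session path.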

Dataset Chunk Contract

Dataset APIs use provider-neutral data-contract types instead of normal file paths:

| Type | Purpose |
| --- | --- |
| DataManifestRequest | Fetches dataset manifest metadata for a path and optional version. |
| DataManifestDescriptor | Returns schema version, format, dataset ID, update timestamp, and transaction sequence. |
| DataChunkUploadRequest | Requests upload targets for an array's chunk objects. |
| DataChunkUploadTarget | Describes the method, URL, headers, and key for uploading one chunk. |

Native providers map these to local manifest and chunk files. Remote providers map them to /data/manifest, /data/chunks/upload-targets, and direct chunk upload URLs. Runtime dataset builtins can therefore use the same code path for local, sandboxed, and remote datasets.

Boundaries

The filesystem crate does not parse MATLAB file formats and does not implement MATLAB path semantics by itself. MAT parsing, CSV formatting, path search, class lookup, and REPL command behavior live in runtime builtins. The filesystem layer only provides consistent storage operations and enough provider-specific extension points for fast remote data movement.