RunMat Filesystem

This guide explains how the RunMat filesystem works, how to choose a backend, and how to use the remote filesystem backed by RunMat Server.

Overview

RunMat’s filesystem keeps scripts consistent across laptop, browser, desktop, and cloud without requiring runtime scripts to rewrite load or save. You can scale from a local sandbox to petabytes in the cloud without changing your code.

           ┌────────────────────┐
           │ RunMat Runtime     │
           │ (wasm/native)      │
           ├────────────────────┤
           │ Virtual FS (VFS)   │  <-- filesystem abstraction
           │  • open/read/write │
           │  • metadata        │
           │  • directory ops   │
           └────────┬───────────┘
                    │
        ┌───────────┴─────────────────────────────────────────────────────┐
        │                Backends                                         │
        │                                                                 │
   ┌────┴────┐   ┌────────────┐   ┌─────────────┐   ┌─────────────────┐   │
   │ Native  │   │ Browser    │   │ Desktop     │   │ Remote          │   │
   │ Std FS  │   │ Storage    │   │ Host Proxy  │   │ Gateway         │   │
   └─────────┘   └────────────┘   └─────────────┘   └─────────────────┘   │
   - std::fs      - IndexedDB       - Native shell    - Signed URL fetch  │
   - mmap         - OPFS            - Native FS       - Chunked streaming │
   - async disk   - In-memory       - Cache layer     - Cred mgmt         │
        │                │                 │                 │            │
        └────────────────┴─────────────────┴─────────────────┴────────────┘

Backends include:

Native — local development and low‑latency work with OS filesystem performance.
Browser — zero‑install demos and lightweight workflows where portability matters most.
Desktop — browser UX with native disk performance and enterprise policies.
Remote (RunMat Server) — multi‑TB/PB data, collaboration, and high‑throughput I/O at scale.

Backend details

Native

Best for local development, single‑node workloads, and low‑latency iteration
Uses the OS filesystem and page cache for peak local performance
Ideal for small to mid‑sized datasets and quick iteration loops

Browser

Best for zero‑install usage, onboarding, and lightweight workflows
Storage is sandboxed and portable across sessions
Great for demos and education; not optimized for very large datasets

Desktop

Best for teams that want a desktop UX with native filesystem access
Uses a privileged host process for high‑performance local reads/writes
Ideal when you need native disk access with a sandboxed UI

Remote (RunMat Server)

Use RunMat Server when data outgrows local disks or when collaboration and repeatability matter. It stays fast at terabyte‑to‑petabyte scale while preserving I/O semantics within code executing in the runtime.

You get:

High throughput for big reads and writes
Elastic scale without managing storage infrastructure
Copy‑on‑write updates so large datasets evolve without full rewrites
Versioned datasets when you need traceability, with smart defaults to control storage costs
Content-addressed blobs with ETags derived from hash + size for integrity

Using the filesystem from RunMat scripts

% Write a dataset
data = rand(1, 1000)
save("/data/example.mat", "data")

% Read it back
load("/data/example.mat")

If the runtime is configured with a remote filesystem provider, these calls read and write to the remote storage automatically.

For portable path assembly, use fullfile to join segments with the platform-specific separator:

rawPath = fullfile("data", "raw", "sample.dat");
fid = fopen(rawPath, "w"); fclose(fid);

Using the CLI with the remote filesystem

Authenticate and select a project

runmat login
runmat org list
runmat project list --org <org-id>
runmat project select <project-id>

You can pass a private server URL with --server. RunMat defaults to https://api.runmat.com if omitted.

Run scripts with the remote filesystem

runmat remote run /script.m

Basic filesystem operations

runmat fs ls /data
runmat fs read /data/example.mat --output example.mat
runmat fs write /data/example.mat ./example.mat
runmat fs mkdir /data/new --recursive
runmat fs rm /data/example.mat

Selecting a project

You can select a project by:

running runmat project select <project-id>
passing --project <project-id> to the command
setting the environment variable RUNMAT_PROJECT_ID to the project ID
providing a project ID when logging in with runmat login --project <project-id>

Versioning policy

Versioning lets you restore a previous file or dataset state without copying data yourself.

What users should expect:

Source/code and small files are versioned by default.
Large datasets are versioned when they are sharded or explicitly configured.
Restoring a version is instant because it switches the active version pointer.

Storage behavior:

Versioned files keep their previous blobs.
Non‑versioned updates keep only the latest blob.
Sharded datasets always version the manifest, so old datasets remain recoverable.

When history is pruned:

If a file is not configured for versioning, every new write replaces the previous data and the older history is removed automatically.
When versioning is enabled, RunMat keeps history based on a max‑versions policy. Defaults come from the server plan, and you can override per project.
- Retention policy is enforced by the background cleanup job and applies after version creation.
Versions referenced by snapshots are never pruned.

Using version history from the CLI

runmat fs history /data/example.mat
runmat fs restore <version-id>
runmat fs history-delete <version-id>

Snapshots (project history)

Snapshots give you a fast, durable “project checkpoint” without duplicating data. They are ideal for:

Marking a dataset or model before a risky migration
Creating a reproducible baseline before experiments
Capturing a stable project state for handoff or review

Snapshots are a single-parent chain (like a simple git history). Restoring a snapshot rewires file pointers back to the recorded versions with zero-copy behavior. Snapshots are only removed when explicitly deleted, and versions referenced by snapshots are never pruned.

Tags let you attach stable names (like baseline or release-2026-01) to any snapshot for quick retrieval.

runmat fs snapshot-create --message "baseline" --tag baseline
runmat fs snapshot-list
runmat fs snapshot-restore <snapshot-id>
runmat fs snapshot-tag-list
runmat fs snapshot-tag-set <snapshot-id> release-2026-01

Git sync

RunMat exposes a minimal git-compatible workflow backed by snapshots. It lets you clone a project into a git working tree, pull new snapshot history, and push linear commits back to the server.

Git sync is fast-forward only and currently supports a single branch (refs/heads/main).

runmat fs git-clone ./project-repo
cd project-repo
runmat fs git-pull
runmat fs git-push

Git export

Snapshots can be exported to a git fast-import stream, so you can materialize a git history for backup, sharing, or downstream tooling. Each snapshot becomes a commit, and tags map to git tags.

curl -L "$RUNMAT_SERVER_URL/v1/projects/$RUNMAT_PROJECT_ID/fs/snapshots/<snapshot-id>/git-export" \
  -H "authorization: Bearer $RUNMAT_API_KEY" \
  -o snapshot.fast-import

git init export
cd export
git fast-import < ../snapshot.fast-import
git log --oneline

Retention settings

runmat project retention get
runmat project retention set 50

Scaling to petabytes

RunMat Server lets you work with massive datasets without re‑architecting your code. It streams only what you need, parallelizes reads and writes, and supports shard‑based datasets so updates are incremental — not all‑or‑nothing. That means a 1 GB update inside a multi‑PB dataset is still fast and cost‑efficient.

Sharded datasets and manifests

Sharding splits very large files into smaller pieces (shards) so RunMat can stream them efficiently and update only what changes.

When sharding applies:

Large datasets above the shard threshold (default: 4 GB).
Workloads that need fast random access or partial updates.

Why it matters:

Reads are parallelized across shards for high throughput.
Updates rewrite only the touched shards, not the entire dataset.

Implementation details you may see in diagnostics:

The manifest is stored at the dataset path and tagged with hash=manifest:v1.
Shards are stored under /.runmat/shards/<uuid> and streamed in order.
The server computes content hashes; the only client-provided hash is the manifest:v1 marker.

Manifest schema:

{
  "version": 1,
  "total_size": 123456,
  "shard_size": 536870912,
  "shards": [
    { "path": "/.runmat/shards/<uuid>", "size": 536870912 }
  ]
}

Manifest workflows

runmat fs manifest-history /data/dataset
runmat fs manifest-restore <version-id>
runmat fs manifest-update /data/dataset --base-version <version-id> --manifest ./manifest.json

For high-throughput ingestion, /fs/manifest/urls returns presigned URLs for each shard so clients can download in parallel without routing through the server.

When should I use Remote?

You need to share datasets across teams or regions
Your data is too large for local disk
You want fast, parallel I/O without managing storage
You need versioned datasets and reproducible workflows

RunMat Server Configuration

Value	Description	Required	Default
`RUNMAT_SERVER_URL`	Base API URL	No	`https://api.runmat.com`
`RUNMAT_API_KEY`	API key or access token	Yes (unless using `runmat login`)	None
`RUNMAT_ORG_ID`	Default org	No	None
`RUNMAT_PROJECT_ID`	Default project	No	None
`RUNMAT_FS_SHARD_THRESHOLD_BYTES`	Size at which sharding begins	No	`4294967296` (4 GB)
`RUNMAT_FS_SHARD_SIZE_BYTES`	Shard size for large datasets	No	`536870912` (512 MB)
`RUNMAT_FS_VERSION_RETENTION_MAX_VERSIONS`	Default history limit per file (0 = unlimited)	No	`0`