RunMat Filesystem

This guide explains how the RunMat filesystem works, how to choose a backend, and how to use the remote filesystem backed by RunMat Server.

Overview

RunMat’s filesystem keeps scripts consistent across laptop, browser, desktop, and cloud: the same load and save calls work on every backend, so scripts do not need to be rewritten. You can scale from a local sandbox to petabytes in the cloud without changing your code.

           ┌────────────────────┐
           │ RunMat Runtime     │
           │ (wasm/native)      │
           ├────────────────────┤
           │ Virtual FS (VFS)   │  <-- filesystem abstraction
           │  • open/read/write │
           │  • metadata        │
           │  • directory ops   │
           └────────┬───────────┘
        ┌───────────┴─────────────────────────────────────────────────────┐
        │                Backends                                         │
        │                                                                 │
   ┌────┴────┐   ┌────────────┐   ┌─────────────┐   ┌─────────────────┐   │
   │ Native  │   │ Browser    │   │ Desktop     │   │ Remote          │   │
   │ Std FS  │   │ Storage    │   │ Host Proxy  │   │ Gateway         │   │
   └─────────┘   └────────────┘   └─────────────┘   └─────────────────┘   │
   - std::fs      - IndexedDB       - Native shell    - Signed URL fetch  │
   - mmap         - OPFS            - Native FS       - Chunked streaming │
   - async disk   - In-memory       - Cache layer     - Cred mgmt         │
        │                │                 │                 │            │
        └────────────────┴─────────────────┴─────────────────┴────────────┘

Backends include:

  • Native — local development and low‑latency work with OS filesystem performance.
  • Browser — zero‑install demos and lightweight workflows where portability matters most.
  • Desktop — browser UX with native disk performance and enterprise policies.
  • Remote (RunMat Server) — multi‑TB/PB data, collaboration, and high‑throughput I/O at scale.

Backend details

Native

  • Best for local development, single‑node workloads, and low‑latency iteration
  • Uses the OS filesystem and page cache for peak local performance
  • Ideal for small to mid‑sized datasets and quick iteration loops

Browser

  • Best for zero‑install usage, onboarding, and lightweight workflows
  • Storage is sandboxed and portable across sessions
  • Great for demos and education; not optimized for very large datasets

Desktop

  • Best for teams that want a desktop UX with native filesystem access
  • Uses a privileged host process for high‑performance local reads/writes
  • Ideal when you need native disk access with a sandboxed UI

Remote (RunMat Server)

Use RunMat Server when data outgrows local disks or when collaboration and repeatability matter. It stays fast at terabyte-to-petabyte scale, and code running in the runtime keeps the same I/O semantics.

You get:

  • High throughput for big reads and writes
  • Elastic scale without managing storage infrastructure
  • Copy‑on‑write updates so large datasets evolve without full rewrites
  • Versioned datasets when you need traceability, with smart defaults to control storage costs
  • Content-addressed blobs with ETags derived from hash + size for integrity
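
The integrity bullet above can be illustrated with a short Python sketch. The source does not name the hash function or the ETag layout, so SHA-256 and the "<hex digest>-<size>" format are assumptions made only for illustration, and content_etag and verify_blob are hypothetical helper names, not part of any RunMat API.

```python
import hashlib


def content_etag(data: bytes) -> str:
    """Build an illustrative ETag from a content hash and the byte length.

    Assumption: SHA-256 and the "<hex digest>-<size>" layout are placeholders;
    the real RunMat Server format is not described in this guide.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest}-{len(data)}"


def verify_blob(data: bytes, expected_etag: str) -> bool:
    """Return True when the received bytes match the expected ETag."""
    return content_etag(data) == expected_etag


if __name__ == "__main__":
    blob = b"example payload"
    etag = content_etag(blob)
    print(verify_blob(blob, etag))         # True: content is unchanged
    print(verify_blob(blob + b"!", etag))  # False: content was modified
```

Including the size alongside the hash gives a cheap first check (a length mismatch can be rejected before hashing), which is one plausible reason to combine the two.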

Using the filesystem from RunMat scripts

% Write a dataset
data = rand(1, 1000);  % trailing semicolon suppresses printing all 1000 values
save("/data/example.mat", "data")

% Read it back
load("/data/example.mat")

If the runtime is configured with a remote filesystem provider, these calls read from and write to remote storage automatically.

For portable path assembly, use fullfile to join segments with the platform-specific separator:

rawPath = fullfile("data", "raw", "sample.dat");
fid = fopen(rawPath, "w");  % fid is -1 if the file cannot be opened (for example, data/raw is missing)
if fid ~= -1, fclose(fid); end

Using the CLI with the remote filesystem

Authenticate and select a project

runmat login
runmat org list
runmat project list --org <org-id>
runmat project select <project-id>

You can pass a private server URL with --server. RunMat defaults to https://api.runmat.com if omitted.

Run scripts with the remote filesystem

runmat remote run /script.m

Basic filesystem operations

runmat fs ls /data
runmat fs read /data/example.mat --output example.mat
runmat fs write /data/example.mat ./example.mat
runmat fs mkdir /data/new --recursive
runmat fs rm /data/example.mat

Selecting a project

You can select a project by:

  • running runmat project select <project-id>
  • passing --project <project-id> to the command
  • setting the environment variable RUNMAT_PROJECT_ID to the project ID
  • providing a project ID when logging in with runmat login --project <project-id>

Versioning policy

Versioning lets you restore a previous file or dataset state without copying data yourself.

What users should expect:

  • Source/code and small files are versioned by default.
  • Large datasets are versioned when they are sharded or explicitly configured.
  • Restoring a version copies no data; it only switches the active version pointer, so it completes quickly regardless of file size.

Storage behavior:

  • Versioned files keep their previous blobs.
  • Non‑versioned updates keep only the latest blob.
  • Sharded datasets always version the manifest, so earlier dataset states remain recoverable.

When history is pruned:

  • If a file is not configured for versioning, every new write replaces the previous data and the older history is removed automatically.
  • When versioning is enabled, RunMat keeps history based on a max-versions policy. Defaults come from the server plan, and you can override them per project.
    • The retention policy is enforced by a background cleanup job that runs after version creation, so a file may temporarily hold more versions than the limit until the job runs.
  • Versions referenced by snapshots are never pruned.
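
The pruning rules above can be sketched in Python. This is a toy model, not the server's cleanup job: prune_versions is a hypothetical name, and the choice that snapshot-pinned versions are kept in addition to the newest max_versions entries (rather than counted against the limit) is an assumption. The meaning of 0 as "unlimited" follows the RUNMAT_FS_VERSION_RETENTION_MAX_VERSIONS setting.

```python
from typing import Iterable, List, Set, Tuple


def prune_versions(version_ids: Iterable[str], max_versions: int,
                   pinned: Set[str]) -> Tuple[List[str], List[str]]:
    """Split versions (ordered oldest -> newest) into (kept, removed).

    Assumptions for illustration: the newest `max_versions` entries are kept,
    versions referenced by a snapshot (`pinned`) are always kept, and a limit
    of 0 means unlimited history.
    """
    versions = list(version_ids)
    if max_versions == 0:
        return versions, []
    # Index of the oldest version that still falls inside the limit.
    cutoff = max(len(versions) - max_versions, 0)
    kept, removed = [], []
    for index, version in enumerate(versions):
        if index >= cutoff or version in pinned:
            kept.append(version)
        else:
            removed.append(version)
    return kept, removed


if __name__ == "__main__":
    kept, removed = prune_versions(["v1", "v2", "v3", "v4", "v5"], 2, {"v2"})
    print(kept)     # ['v2', 'v4', 'v5']
    print(removed)  # ['v1', 'v3']
```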

Using version history from the CLI

runmat fs history /data/example.mat
runmat fs restore <version-id>
runmat fs history-delete <version-id>

Snapshots (project history)

Snapshots give you a fast, durable “project checkpoint” without duplicating data. They are ideal for:

  • Marking a dataset or model before a risky migration
  • Creating a reproducible baseline before experiments
  • Capturing a stable project state for handoff or review

Snapshots form a single-parent chain (like a linear git history). Restoring a snapshot rewires file pointers back to the recorded versions without copying data. Snapshots are only removed when explicitly deleted, and versions referenced by snapshots are never pruned.

Tags let you attach stable names (like baseline or release-2026-01) to any snapshot for quick retrieval.

runmat fs snapshot-create --message "baseline" --tag baseline
runmat fs snapshot-list
runmat fs snapshot-restore <snapshot-id>
runmat fs snapshot-tag-list
runmat fs snapshot-tag-set <snapshot-id> release-2026-01
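
To make the pointer-based behavior concrete, here is a minimal in-memory model in Python. It is a sketch under assumptions, not the RunMat Server implementation: ProjectModel, Snapshot, and their fields are hypothetical names, and the model assumes a restore replaces the whole table of active file pointers with the one recorded in the snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    parent_id: Optional[str]  # single parent, so snapshots form a linear chain
    message: str
    files: Dict[str, str]     # path -> version id recorded at snapshot time


class ProjectModel:
    """Toy model of snapshots, tags, and active version pointers."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}       # path -> active version id
        self.snapshots: Dict[str, Snapshot] = {}
        self.tags: Dict[str, str] = {}         # tag name -> snapshot id
        self.head: Optional[str] = None        # most recently created snapshot

    def snapshot_create(self, snapshot_id: str, message: str,
                        tag: Optional[str] = None) -> Snapshot:
        # Record the current pointers; no file data is duplicated.
        snap = Snapshot(snapshot_id, self.head, message, dict(self.active))
        self.snapshots[snapshot_id] = snap
        self.head = snapshot_id
        if tag is not None:
            self.tags[tag] = snapshot_id
        return snap

    def snapshot_restore(self, ref: str) -> None:
        # Accept either a tag name or a snapshot id.
        snapshot_id = self.tags.get(ref, ref)
        # Only pointers change; the referenced versions already exist.
        self.active = dict(self.snapshots[snapshot_id].files)


if __name__ == "__main__":
    project = ProjectModel()
    project.active["/data/example.mat"] = "version-1"
    project.snapshot_create("snap-1", "baseline", tag="baseline")
    project.active["/data/example.mat"] = "version-2"
    project.snapshot_restore("baseline")
    print(project.active)  # {'/data/example.mat': 'version-1'}
```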

Git sync

RunMat exposes a minimal git-compatible workflow backed by snapshots. It lets you clone a project into a git working tree, pull new snapshot history, and push linear commits back to the server.

Git sync is fast-forward only and currently supports a single branch (refs/heads/main).

runmat fs git-clone ./project-repo
cd project-repo
runmat fs git-pull
runmat fs git-push

Git export

Snapshots can be exported to a git fast-import stream, so you can materialize a git history for backup, sharing, or downstream tooling. Each snapshot becomes a commit, and tags map to git tags.

curl -L "$RUNMAT_SERVER_URL/v1/projects/$RUNMAT_PROJECT_ID/fs/snapshots/<snapshot-id>/git-export" \
  -H "authorization: Bearer $RUNMAT_API_KEY" \
  -o snapshot.fast-import

git init export
cd export
git fast-import < ../snapshot.fast-import
git log --oneline

Retention settings

runmat project retention get
runmat project retention set 50

Scaling to petabytes

RunMat Server lets you work with massive datasets without re-architecting your code. It streams only what you need, parallelizes reads and writes, and supports shard-based datasets so updates are incremental rather than all-or-nothing. A 1 GB update inside a multi-PB dataset rewrites only the shards it touches, so it stays fast and cost-efficient.

Sharded datasets and manifests

Sharding splits very large files into smaller pieces (shards) so RunMat can stream them efficiently and update only what changes.

When sharding applies:

  • Large datasets above the shard threshold (default: 4294967296 bytes, i.e. 4 GiB).
  • Workloads that need fast random access or partial updates.

Why it matters:

  • Reads are parallelized across shards for high throughput.
  • Updates rewrite only the touched shards, not the entire dataset.
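
The "only the touched shards" point comes down to simple arithmetic, sketched below in Python. It assumes fixed-size shards laid out contiguously in order and an update described by a byte offset and length; touched_shards is a hypothetical helper, and the default shard size is taken from RUNMAT_FS_SHARD_SIZE_BYTES.

```python
DEFAULT_SHARD_SIZE = 536870912  # default RUNMAT_FS_SHARD_SIZE_BYTES


def touched_shards(offset: int, length: int,
                   shard_size: int = DEFAULT_SHARD_SIZE) -> range:
    """Return the indices of the shards overlapped by a byte-range update.

    Assumption: shard i covers bytes [i * shard_size, (i + 1) * shard_size).
    """
    if offset < 0 or shard_size <= 0:
        raise ValueError("offset must be >= 0 and shard_size must be > 0")
    if length <= 0:
        return range(0)
    first = offset // shard_size
    last = (offset + length - 1) // shard_size
    return range(first, last + 1)


if __name__ == "__main__":
    one_gib = 2 ** 30
    print(list(touched_shards(0, one_gib)))    # [0, 1]: aligned update
    print(list(touched_shards(100, one_gib)))  # [0, 1, 2]: unaligned update
```

Under these assumptions a 1 GiB update rewrites two or three 512 MiB shards, however large the full dataset is.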

Implementation details you may see in diagnostics:

  • The manifest is stored at the dataset path and tagged with hash=manifest:v1.
  • Shards are stored under /.runmat/shards/<uuid> and streamed in order.
  • The server computes content hashes; the only client-provided hash is the manifest:v1 marker.

Manifest schema:

{
  "version": 1,
  "total_size": 1073741824,
  "shard_size": 536870912,
  "shards": [
    { "path": "/.runmat/shards/<uuid>", "size": 536870912 },
    { "path": "/.runmat/shards/<uuid>", "size": 536870912 }
  ]
}
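
As an illustration of how the fields in this schema relate to each other, the Python sketch below builds and checks a manifest. It is not the server's logic: build_manifest and validate_manifest are hypothetical helpers, the locally generated UUIDs stand in for whatever shard identifiers the server assigns, and the rule that only the final shard may be smaller than shard_size is an assumption.

```python
import json
import uuid


def build_manifest(total_size: int, shard_size: int) -> dict:
    """Split total_size bytes into shard entries following the schema above."""
    if total_size < 0 or shard_size <= 0:
        raise ValueError("total_size must be >= 0 and shard_size must be > 0")
    shards = []
    remaining = total_size
    while remaining > 0:
        size = min(shard_size, remaining)  # the last shard may be shorter
        # Placeholder path: a locally generated UUID, for illustration only.
        shards.append({"path": f"/.runmat/shards/{uuid.uuid4()}", "size": size})
        remaining -= size
    return {
        "version": 1,
        "total_size": total_size,
        "shard_size": shard_size,
        "shards": shards,
    }


def validate_manifest(manifest: dict) -> bool:
    """Check that the shard sizes add up and none exceeds shard_size."""
    sizes = [shard["size"] for shard in manifest["shards"]]
    return (sum(sizes) == manifest["total_size"]
            and all(0 < size <= manifest["shard_size"] for size in sizes))


if __name__ == "__main__":
    manifest = build_manifest(total_size=1342177280, shard_size=536870912)
    print(json.dumps(manifest, indent=2))
    print(validate_manifest(manifest))  # True
```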

Manifest workflows

runmat fs manifest-history /data/dataset
runmat fs manifest-restore <version-id>
runmat fs manifest-update /data/dataset --base-version <version-id> --manifest ./manifest.json

For high-throughput ingestion, /fs/manifest/urls returns presigned URLs for each shard so clients can download in parallel without routing through the server.
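
A client-side sketch of the parallel download, in Python, might look like the following. The response format of /fs/manifest/urls is not described here, so the code assumes the caller has already extracted an ordered list of presigned URLs; download_shards and fetch_url are hypothetical names, and the fetch function is injectable so the example runs without network access.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, Iterable
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Download one shard over HTTP(S) and return its bytes."""
    with urlopen(url) as response:
        return response.read()


def download_shards(urls: Iterable[str], out: BinaryIO,
                    fetch: Callable[[str], bytes] = fetch_url,
                    max_workers: int = 8) -> int:
    """Fetch shards concurrently and write them to `out` in manifest order.

    Returns the number of bytes written. Executor.map yields results in input
    order, so shards are reassembled correctly even if they finish out of order.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in pool.map(fetch, urls):
            written += out.write(chunk)
    return written


if __name__ == "__main__":
    # Stand-in for real presigned URLs: each fake "URL" returns its own name.
    fake_urls = ["shard-0;", "shard-1;", "shard-2;"]
    buffer = io.BytesIO()
    total = download_shards(fake_urls, buffer, fetch=lambda u: u.encode())
    print(total, buffer.getvalue())
```

This version holds each completed shard in memory until it is written. With 512 MiB shards a real client would more likely stream each response to disk, which this sketch leaves out for brevity.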

When should I use Remote?

  • You need to share datasets across teams or regions
  • Your data is too large for local disk
  • You want fast, parallel I/O without managing storage
  • You need versioned datasets and reproducible workflows

RunMat Server Configuration

Variable                                  Description                                     Required                         Default
----------------------------------------  ----------------------------------------------  -------------------------------  ----------------------
RUNMAT_SERVER_URL                         Base API URL                                    No                               https://api.runmat.com
RUNMAT_API_KEY                            API key or access token                         Yes (unless using runmat login)  None
RUNMAT_ORG_ID                             Default org                                     No                               None
RUNMAT_PROJECT_ID                         Default project                                 No                               None
RUNMAT_FS_SHARD_THRESHOLD_BYTES           Size at which sharding begins                   No                               4294967296 (4 GiB)
RUNMAT_FS_SHARD_SIZE_BYTES                Shard size for large datasets                   No                               536870912 (512 MiB)
RUNMAT_FS_VERSION_RETENTION_MAX_VERSIONS  Default history limit per file (0 = unlimited)  No                               0
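
For tooling that needs the same settings, the defaults listed above can be mirrored in a small Python loader. This only illustrates the documented variable names and defaults; load_runmat_config and the dictionary keys are hypothetical, and how RunMat itself parses these variables is not described here.

```python
import os
from typing import Mapping, Optional


def load_runmat_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented RUNMAT_* variables, applying the listed defaults."""
    env = os.environ if env is None else env
    return {
        "server_url": env.get("RUNMAT_SERVER_URL", "https://api.runmat.com"),
        # No default: required unless credentials come from `runmat login`.
        "api_key": env.get("RUNMAT_API_KEY"),
        "org_id": env.get("RUNMAT_ORG_ID"),
        "project_id": env.get("RUNMAT_PROJECT_ID"),
        "shard_threshold_bytes": int(
            env.get("RUNMAT_FS_SHARD_THRESHOLD_BYTES", 4294967296)),
        "shard_size_bytes": int(
            env.get("RUNMAT_FS_SHARD_SIZE_BYTES", 536870912)),
        # 0 means unlimited history.
        "max_versions": int(
            env.get("RUNMAT_FS_VERSION_RETENTION_MAX_VERSIONS", 0)),
    }


if __name__ == "__main__":
    config = load_runmat_config({"RUNMAT_PROJECT_ID": "demo-project"})
    print(config["server_url"])             # https://api.runmat.com
    print(config["project_id"])             # demo-project
    print(config["shard_threshold_bytes"])  # 4294967296
```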