Weights Studio Guide

Weights Studio is the visual frontend for Weightslab experiments. It connects to your running Weightslab backend over gRPC-Web via Envoy, and gives you interactive control over samples, tags, discard/restore actions, training/audit mode, and training signal plots.

Architecture

Figure: Weights Studio architecture.

Runtime path:

  1. Browser UI (Vite app)

  2. Envoy proxy (gRPC-Web bridge)

  3. Weightslab Python gRPC service

Project location

Weights Studio source lives in:

  • ../weights_studio

Key files:

  • Docker compose: ../weights_studio/docker/docker-compose.yml

  • Envoy config: ../weights_studio/envoy/envoy.yaml

  • Frontend entrypoint: ../weights_studio/src/main.ts

  • UI layout: ../weights_studio/index.html

Quick start (Docker)

  1. Start your Weightslab backend (gRPC on host, default port 50051).

  2. Load environment variables from ../weights_studio/docker/.env.

  3. Start the studio stack from ../weights_studio/docker (docker compose up -d), which runs:

    • Envoy

    • Frontend (Vite)

  4. Open Weights Studio in your browser (http://localhost:5173 with the default VITE_PORT).

Docker services and ports

From ../weights_studio/docker/docker-compose.yml and ../weights_studio/envoy/envoy.yaml:

  • Frontend: VITE_PORT (default 5173)

  • Envoy gRPC-Web endpoint: ENVOY_PORT (default 8080)

  • Envoy admin: ENVOY_ADMIN_PORT (default 9901)

  • Backend target from Envoy: host.docker.internal:50051

Default values in ../weights_studio/docker/.env:

  • VITE_PORT=5173

  • WS_SERVER_HOST=localhost

  • WS_SERVER_PORT=8080

  • WS_SERVER_PROTOCOL=http

  • ENVOY_PORT=8080

  • ENVOY_ADMIN_PORT=9901

  • GRPC_BACKEND_PORT=50051
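
To inspect these defaults from a script, the file can be read into a dictionary. The following is a minimal sketch, assuming the file contains only simple KEY=VALUE lines (no quoting or variable expansion); parse_env is a hypothetical helper and not part of Weights Studio.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# The default values listed above, as they would appear in docker/.env.
DEFAULT_ENV = """\
VITE_PORT=5173
WS_SERVER_HOST=localhost
WS_SERVER_PORT=8080
WS_SERVER_PROTOCOL=http
ENVOY_PORT=8080
ENVOY_ADMIN_PORT=9901
GRPC_BACKEND_PORT=50051
"""

defaults = parse_env(DEFAULT_ENV)
```

With the defaults, WS_SERVER_PORT and ENVOY_PORT hold the same value, which is what lets the frontend reach Envoy without further configuration.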

How the frontend endpoint is built

In src/main.ts, the UI builds the server URL from:

  • WS_SERVER_PROTOCOL

  • WS_SERVER_HOST

  • WS_SERVER_PORT

This URL is used as the gRPC-Web base URL for ExperimentServiceClient.
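
The composition can be illustrated outside the frontend. The sketch below is written in Python for consistency with the other examples in this guide; the real logic lives in src/main.ts and may differ in detail (for example in how it handles missing values). build_server_url is a hypothetical name, and the fallback values are the documented .env defaults.

```python
def build_server_url(env):
    """Join protocol, host, and port into a base URL (illustrative only)."""
    protocol = env.get("WS_SERVER_PROTOCOL", "http")
    host = env.get("WS_SERVER_HOST", "localhost")
    port = env.get("WS_SERVER_PORT", "8080")
    return f"{protocol}://{host}:{port}"


local_url = build_server_url({})
cloud_url = build_server_url({
    "WS_SERVER_PROTOCOL": "https",
    "WS_SERVER_HOST": "studio.your-domain.com",
    "WS_SERVER_PORT": "443",
})
```

Under these assumptions the default configuration resolves to the Envoy listener on localhost, and the cloud values resolve to the public HTTPS hostname.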

Environment/configuration checklist

Backend:

  • Ensure Weightslab serves gRPC and listens on 0.0.0.0:50051, so the Envoy container can reach it through host.docker.internal.

Envoy:

  • Ensure the cluster in envoy.yaml points to the host backend: host.docker.internal:50051.

Frontend:

  • Ensure the frontend points to Envoy (default http://localhost:8080).

Sanity checks:

  • Studio reachable at http://localhost:5173.

  • Envoy admin reachable at http://localhost:9901.
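
The sanity checks can be scripted as plain TCP reachability probes. This is a minimal sketch using only the standard library; it confirms that something is listening on each default port, not that the service behind it is healthy. port_open and check_all are hypothetical helper names.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from docker/.env; adjust if you changed them.
CHECKS = {
    "frontend": ("localhost", 5173),
    "envoy": ("localhost", 8080),
    "envoy admin": ("localhost", 9901),
    "backend gRPC": ("localhost", 50051),
}


def check_all(checks=CHECKS):
    """Map each service name to True (listening) or False (unreachable)."""
    return {name: port_open(host, port) for name, (host, port) in checks.items()}
```

Calling check_all() on the Docker host returns one boolean per service, which makes it easy to see which hop of the browser, Envoy, backend path is missing.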

Server integration (AWS example)

This section describes a practical cloud deployment path for Weightslab + Weights Studio on AWS.

Two common deployment options

  1. EC2 (fastest to start)

    • Run Weightslab backend process on the VM (port 50051).

    • Run Envoy + frontend via Docker Compose.

    • Use one security group to expose only HTTPS (and optionally admin ports privately).

  2. ECS/Fargate (container-native)

    • Service A: frontend container.

    • Service B: Envoy container.

    • Service C: Weightslab backend container (or external training worker).

    • Route through ALB + target groups.

Minimum port plan (AWS)

  • Public ingress:

    • 443 (HTTPS to frontend/ALB)

  • Internal service ports:

    • 8080 (Envoy gRPC-Web listener)

    • 50051 (Weightslab backend gRPC)

    • 9901 (Envoy admin, keep private)

    • 5173 (frontend dev port; avoid exposing directly in production)

Security group guidance

  • Allow 443 from trusted client CIDRs (or internet if required).

  • Restrict 50051 and 8080 to VPC/internal security groups.

  • Restrict 9901 to private admin subnet/bastion only.

  • Do not expose backend gRPC directly to the public internet.
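
These rules can be checked mechanically before applying them. The sketch below assumes ingress rules are represented as simple (port, CIDR) pairs, which is a simplification for illustration and not the shape of any AWS API response; find_violations is a hypothetical helper.

```python
import ipaddress

# Ports that this guide says must stay internal.
PRIVATE_PORTS = {8080, 50051, 9901, 5173}


def is_public(cidr):
    """True when the CIDR covers the whole address space (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def find_violations(rules):
    """Return the (port, cidr) pairs that open a private port to the internet."""
    return [(port, cidr) for port, cidr in rules
            if port in PRIVATE_PORTS and is_public(cidr)]


rules = [
    (443, "0.0.0.0/0"),     # public HTTPS: allowed
    (8080, "10.0.0.0/16"),  # Envoy, VPC only: allowed
    (50051, "0.0.0.0/0"),   # backend gRPC open to the internet: violation
]
violations = find_violations(rules)
```

Only the last rule is reported, since it exposes the backend gRPC port publicly.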

TLS and domains

  • Terminate TLS at ALB using ACM certificates.

  • Use a DNS record (Route 53) for the studio hostname.

  • Forward ALB target traffic to frontend service.

  • Keep Envoy/backend internal where possible.

Environment mapping (cloud)

For Weights Studio frontend + Envoy alignment, set environment variables consistently with your deployed endpoints:

# frontend resolves gRPC-web through Envoy
WS_SERVER_PROTOCOL=https
WS_SERVER_HOST=studio.your-domain.com
WS_SERVER_PORT=443

# envoy / backend internal wiring
ENVOY_PORT=8080
ENVOY_ADMIN_PORT=9901
GRPC_BACKEND_PORT=50051

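The gRPC-Web requests are issued by the browser, so the WS_SERVER_* endpoint must be reachable from the client. If Envoy is internal-only and the frontend is public, route gRPC-Web traffic arriving on the public hostname through your load balancer to the Envoy listener (port 8080).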

AWS deployment checklist

  1. Provision VPC/subnets and security groups.

  2. Deploy backend gRPC service and verify :50051 internally.

  3. Deploy Envoy and verify routing to backend.

  4. Deploy frontend with correct WS_SERVER_* values.

  5. Attach ALB + ACM certificate and configure HTTPS listener.

  6. Validate UI actions (query, tagging, discard/restore, plots).

  7. Add monitoring/logging (CloudWatch metrics/logs).

Operational notes

  • Prefer long-running backend workers for stable interactive sessions.

  • Keep checkpoint/log storage durable (EBS/EFS/S3 strategy).

  • If using autoscaling, ensure session and backend availability are handled explicitly for active users.

  • For multi-environment setups, keep per-environment .env templates versioned and reviewed.

Concrete EC2 + Docker Compose + systemd recipe

Use this pattern for a simple single-VM production-like deployment.

  1. Provision EC2

  • Ubuntu 22.04 (or similar), attached security group.

  • Open only 443 publicly.

  • Keep 50051, 8080, 9901 private (VPC/admin only).

  2. Install runtime dependencies

  • Docker Engine + Docker Compose plugin

  • Python environment for your Weightslab backend process

  3. Configure environment

In weights_studio/docker/.env (or environment management equivalent):

VITE_PORT=5173
WS_SERVER_PROTOCOL=https
WS_SERVER_HOST=studio.your-domain.com
WS_SERVER_PORT=443

ENVOY_PORT=8080
ENVOY_ADMIN_PORT=9901
GRPC_BACKEND_PORT=50051

  4. Start backend service

Start Weightslab gRPC in your training/runtime process and ensure it binds to a reachable interface (for example 0.0.0.0:50051).

  5. Start studio containers

From weights_studio/docker:

docker compose up -d

  6. Add process supervision (systemd)

Use systemd to ensure services restart on reboot/failure.

Example unit for the studio compose stack, saved as /etc/systemd/system/weights-studio.service so that the unit name matches the commands below. Adjust WorkingDirectory to the location of your checkout:

[Unit]
Description=Weights Studio (Docker Compose)
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
WorkingDirectory=/opt/weights_studio/docker
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
RemainAfterExit=yes
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable weights-studio
sudo systemctl start weights-studio

  7. Put HTTPS in front (ALB + ACM)

  • Attach ACM certificate to ALB listener on 443.

  • Route DNS (Route 53) to ALB.

  • Forward traffic to frontend target.

  8. Validate end-to-end

  • Open studio URL.

  • Verify sample query, tag/discard actions, and plot refresh.

  • Verify backend logs and Envoy admin stats.
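
For the Envoy side of this validation, the admin interface serves counters as plain text at /stats, one "name: value" pair per line. The sketch below reads them into a dictionary. It assumes the admin port is reachable from where the script runs; the cluster name "backend" in the sample is a placeholder, since the actual name is defined in envoy.yaml.

```python
from urllib.request import urlopen


def parse_stats(text):
    """Keep the integer-valued 'name: value' lines; skip histograms and others."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def fetch_stats(admin_url="http://localhost:9901"):
    """Download and parse the stats page from the Envoy admin endpoint."""
    with urlopen(f"{admin_url}/stats", timeout=5) as resp:
        return parse_stats(resp.read().decode("utf-8"))


# Sample text in the /stats format; "backend" is a placeholder cluster name.
sample = (
    "cluster.backend.upstream_cx_total: 3\n"
    "cluster.backend.upstream_rq_total: 12\n"
    "http.ingress.downstream_rq_time: P0(nan,0) P25(nan,0)\n"
)
parsed = parse_stats(sample)
```

A request counter that increases while you use the UI suggests that Envoy is forwarding traffic to the backend.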

UI controls and actions

Top header controls

  • Dark mode toggle: switch light/dark theme.

  • Refresh button: manually refresh dynamic stats in visible grid.

  • Refresh config popover:

    • Data auto-refresh enable/disable + interval

    • Plot auto-refresh enable/disable + interval

    • Clear cache and reload page

  • Training button (Resume/Pause): toggles is_training via backend command.

  • Mode selector (dropdown next to training button):

    • train mode

    • audit mode (sets auditorMode)

Left panel

  • Training card:

    • Training state pill (running/paused/pending)

    • Connection status text

    • Live metrics and progress

  • Tags card:

    • Tag chips

    • New tag input

    • Painter toggle

    • Add/remove painter mode switch

  • Details card:

    • Grid settings (cell size + resolution + apply)

    • Segmentation overlays (Raw, GT, Pred, Diff, Split view)

    • Split colors

    • Metadata field toggles

Grid interactions

  • Drag selection rectangle (multi-select).

  • Ctrl multi-select support.

  • Right-click context menu actions:

    • Manage tags

    • Remove all tags

    • Discard selected samples

    • Restore selected samples

The UI pauses training before data-modifying actions to keep edits safe.
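
The pause-before-edit pattern can be sketched as follows. FakeBackend is a stand-in written for this example and does not reflect the real client API; the sketch also leaves training paused after the edit, because this guide does not state whether the UI resumes automatically.

```python
class FakeBackend:
    """Stand-in for a backend client; records calls for illustration."""

    def __init__(self):
        self.is_training = True
        self.log = []

    def set_training(self, value):
        self.is_training = value
        self.log.append(("set_training", value))

    def discard(self, uids):
        self.log.append(("discard", tuple(uids)))


def apply_edit(backend, edit):
    """Pause training if it is running, then apply the data-modifying edit."""
    if backend.is_training:
        backend.set_training(False)
    edit(backend)


backend = FakeBackend()
apply_edit(backend, lambda b: b.discard([101, 102]))
```

The call log shows the pause recorded before the discard, which is the ordering that keeps an edit from racing with a training step.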

Bottom bar

  • Batch slider for navigation over samples.

  • Start/end batch index labels.

  • Total and active sample counters.

Image detail modal

  • Large image preview with previous/next navigation.

  • Zoom in/out/reset controls.

  • Metadata detail panel.

  • Volumetric support with Z-slice slider when applicable.

Signal plots

Per-signal cards include:

  • Reset zoom

  • CSV export

  • JSON export

  • Settings (curve color, smoothing, std band, markers)
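
To make the smoothing and std band settings concrete, here is one common way to compute them: a trailing-window mean with a band of one standard deviation on each side. This guide does not document which algorithm Weights Studio uses, so treat this only as an illustration of the concept; smooth_with_band is a hypothetical name.

```python
from statistics import mean, pstdev


def smooth_with_band(values, window=5):
    """Return (mean, lower, upper) per point over a trailing window."""
    points = []
    for i in range(len(values)):
        # Use up to `window` most recent values, fewer at the start.
        chunk = values[max(0, i - window + 1): i + 1]
        center = mean(chunk)
        spread = pstdev(chunk)
        points.append((center, center - spread, center + spread))
    return points


band = smooth_with_band([1.0, 3.0], window=2)
```

A larger window gives a smoother curve at the cost of reacting more slowly to real changes in the signal.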

Right-click menu on plots includes:

  • Reset X/Y/all zoom

  • Change curve color

  • Load weights at clicked step

  • Hide/unhide curve

  • Break by slices

  • Copy chart image

  • Save chart image

Weightslab CLI console (dev)

The Weightslab CLI console is a local developer REPL for inspecting and controlling a running experiment through the global ledger.

  • Transport: local TCP text commands with JSON responses.

  • Intended scope: development/debug only.

  • Security model: localhost binding by default, plain-text protocol.

How to start it

From your training script (recommended):

import weightslab as wl

wl.serve(serving_grpc=True, serving_cli=True)
wl.keep_serving()

Standalone server:

python -m weightslab.backend.cli serve --host localhost --port 60000

Connect a client manually:

python -m weightslab.backend.cli client --host localhost --port 60000

If no port is provided (or port is 0), the server picks a free port.
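
For scripted access, a small client can send one command and decode the JSON reply. The sketch below assumes newline-delimited framing (one text line out, one JSON line back); the wire framing is not specified in this guide, so verify it against the bundled client before relying on it. All function names here are hypothetical.

```python
import json
import socket


def encode_command(command):
    """Encode a text command as one UTF-8 line (assumed framing)."""
    return command.strip().encode("utf-8") + b"\n"


def decode_reply(line):
    """Decode one JSON reply line into a Python object."""
    return json.loads(line)


def send_command(host, port, command, timeout=5.0):
    """Send a single command and return the decoded JSON response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(command))
        with sock.makefile("rb") as reader:
            return decode_reply(reader.readline())
```

With a console listening on port 60000, send_command("localhost", 60000, "status") would return the status snapshot as a dictionary, provided the framing assumption holds.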

Console actions and commands

Discovery and help:

  • help / h / ?: show all command syntaxes and examples.

  • status: compact snapshot of models, loaders, optimizers, hyperparams.

  • dump: sanitized ledger dump (dataloaders, optimizers, hyperparams).

Training control:

  • pause (or p): pause training and set is_training=False.

  • resume (or r): resume training and set is_training=True.

Registry inspection:

  • list_models

  • list_optimizers

  • list_loaders

  • plot_model [model_name]: prints model architecture text tree.

Sample-level dataset operations:

  • list_uids [loader_name] [--discarded] [--limit N]

  • discard <uid> [uid2 ...] [--loader loader_name]

  • undiscard <uid> [uid2 ...] [--loader loader_name]

  • add_tag <uid> <tag> [--loader loader_name]
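
When driving the console from a script, it helps to compose these command lines with small helpers that follow the syntax above. This is a minimal sketch; the helper names are hypothetical, and the loader name "train" and the UIDs are placeholders.

```python
def list_uids_command(loader=None, discarded=False, limit=None):
    """Build: list_uids [loader_name] [--discarded] [--limit N]"""
    parts = ["list_uids"]
    if loader:
        parts.append(loader)
    if discarded:
        parts.append("--discarded")
    if limit is not None:
        parts += ["--limit", str(limit)]
    return " ".join(parts)


def discard_command(uids, loader=None):
    """Build: discard <uid> [uid2 ...] [--loader loader_name]"""
    if not uids:
        raise ValueError("discard needs at least one uid")
    parts = ["discard", *map(str, uids)]
    if loader is not None:
        parts += ["--loader", loader]
    return " ".join(parts)


listing = list_uids_command("train", discarded=True, limit=10)
discard = discard_command(["a1", "b2"], loader="train")
```

The resulting strings can be typed into the client or sent programmatically.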

Hyperparameter operations:

  • hp: list hyperparameter sets.

  • hp <name>: show one set.

  • set_hp [hp_name] <key.path> <value>: update one key path.
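
A dotted key path such as optimizer.lr addresses one value inside a nested hyperparameter set. The sketch below shows the general idea with plain dictionaries. It is not the Weightslab implementation, and whether set_hp creates missing intermediate keys, as this sketch does, is an assumption.

```python
def set_by_path(config, key_path, value):
    """Set config[a][b][c] = value for key_path 'a.b.c', creating dicts as needed."""
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


hyperparams = {"optimizer": {"lr": 0.01}}
set_by_path(hyperparams, "optimizer.lr", 0.001)
set_by_path(hyperparams, "data.batch_size", 32)
```

Only the addressed leaf changes; sibling keys in the same set are left untouched.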

Model architecture operation:

  • operate [model_name] <op_type:int> <layer_id:int> <nb|[list]>

Session control:

  • exit / quit: close client session.

  • clear / cls: client-side terminal clear (not sent to server).

Developer notes

  • Prefer CLI for quick diagnosis and manual interventions.

  • Keep CLI port private (localhost or private subnet only).

  • Use Weights Studio for richer visual workflows; use CLI for low-latency command-driven operations.

Weightslab workflow recommendation

Typical loop for productive usage:

  1. Start Weightslab backend and studio stack.

  2. Monitor training metrics and sample-level signals.

  3. Use grid metadata + modal details to inspect hard or noisy samples.

  4. Tag slices (e.g., outliers, hard cases).

  5. Discard low-value samples and continue training.

  6. Use train/audit mode toggles to inspect safely without weight updates.

  7. Use plot controls to inspect branch transitions and checkpoint behavior.

Troubleshooting

  • Studio loads but no data: check backend gRPC is running and Envoy target is reachable.

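  • Connection/reset errors: verify that ENVOY_PORT matches the port the frontend targets (WS_SERVER_PORT) and that Envoy forwards to the backend port (default 50051).

  • Wrong endpoint: verify WS_SERVER_HOST, WS_SERVER_PORT, and WS_SERVER_PROTOCOL in the Docker environment (../weights_studio/docker/.env).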

  • No plot updates: verify plot auto-refresh setting and backend logger data availability.