Weights Studio Guide
====================

Weights Studio is the visual frontend for Weightslab experiments.

It connects to your running Weightslab backend over gRPC-Web via Envoy, and gives you interactive control over samples, tags, discard/restore actions, training/audit mode, and training signal plots.

Architecture
------------

.. image:: _static/weights_studio_architecture.png
   :alt: Weights Studio architecture
   :width: 100%

Runtime path:

1. Browser UI (Vite app)
2. Envoy proxy (gRPC-Web bridge)
3. Weightslab Python gRPC service

Project location
----------------

Weights Studio source lives in:

- ``../weights_studio``

Key files:

- Docker compose: ``../weights_studio/docker/docker-compose.yml``
- Envoy config: ``../weights_studio/envoy/envoy.yaml``
- Frontend entrypoint: ``../weights_studio/src/main.ts``
- UI layout: ``../weights_studio/index.html``

Quick start (Docker)
--------------------

1. Start your Weightslab backend (gRPC on host, default port ``50051``).
2. Load environment variables from ``../weights_studio/docker/.env``.
3. Start studio stack from ``../weights_studio/docker``:

   - Envoy
   - Frontend (Vite)

4. Open Weights Studio in your browser.

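
Before opening the browser, it can help to confirm that each piece of the stack is accepting TCP connections. The following is a minimal sketch using only the Python standard library; the helper names ``port_is_open`` and ``check_stack`` are hypothetical and not part of Weightslab, and the ports are the documented defaults (``50051`` backend, ``8080`` Envoy, ``5173`` frontend). An open port only shows that something is listening, not that the gRPC service behind it is healthy.

.. code-block:: python

   import socket

   # Documented default ports; adjust if your .env overrides them.
   DEFAULT_PORTS = {
       "backend gRPC": 50051,
       "Envoy gRPC-Web": 8080,
       "frontend": 5173,
   }


   def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
       """Return True if a TCP connection to host:port succeeds within timeout."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           # Covers refused connections, timeouts, and unresolved hosts.
           return False


   def check_stack(host: str = "localhost", ports: dict = DEFAULT_PORTS) -> dict:
       """Map each service name to whether its port accepts connections."""
       return {name: port_is_open(host, port) for name, port in ports.items()}

Calling ``check_stack()`` returns a dictionary keyed by service name, so a ``False`` entry points at the component to start or reconfigure first.
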

Docker services and ports
-------------------------

From ``../weights_studio/docker/docker-compose.yml`` and ``../weights_studio/envoy/envoy.yaml``:

- Frontend: ``VITE_PORT`` (default ``5173``)
- Envoy gRPC-Web endpoint: ``ENVOY_PORT`` (default ``8080``)
- Envoy admin: ``ENVOY_ADMIN_PORT`` (default ``9901``)
- Backend target from Envoy: ``host.docker.internal:50051``

Default values in ``../weights_studio/docker/.env``:

- ``VITE_PORT=5173``
- ``WS_SERVER_HOST=localhost``
- ``WS_SERVER_PORT=8080``
- ``WS_SERVER_PROTOCOL=http``
- ``ENVOY_PORT=8080``
- ``ENVOY_ADMIN_PORT=9901``
- ``GRPC_BACKEND_PORT=50051``

How the frontend endpoint is built
----------------------------------

In ``src/main.ts``, the UI builds the server URL from:

- ``WS_SERVER_PROTOCOL``
- ``WS_SERVER_HOST``
- ``WS_SERVER_PORT``

This URL is used as gRPC-Web base URL for ``ExperimentServiceClient``.

Environment/configuration checklist
-----------------------------------

Backend:

- Ensure Weightslab serves gRPC and listens on host ``0.0.0.0:50051``.

Envoy:

- Ensure ``envoy.yaml`` cluster points to host backend: ``host.docker.internal:50051``.

Frontend:

- Ensure frontend points to Envoy (default ``http://localhost:8080``).

Sanity checks:

- Studio reachable at ``http://localhost:5173``.
- Envoy admin reachable at ``http://localhost:9901``.

Server integration (AWS example)
--------------------------------

This section describes a practical cloud deployment path for Weightslab + Weights Studio on AWS.

Recommended production architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **UI**: Weights Studio frontend behind HTTPS (ALB).
- **gRPC-Web bridge**: Envoy service reachable by frontend.
- **Backend**: Weightslab training service (Python gRPC).
- **Storage**: EBS/EFS/S3 for logs/checkpoints depending on your workflow.

Two common deployment options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **EC2 (fastest to start)**

   - Run Weightslab backend process on the VM (port ``50051``).
   - Run Envoy + frontend via Docker Compose.
   - Use one security group to expose only HTTPS (and optionally admin ports privately).

2. **ECS/Fargate (container-native)**

   - Service A: frontend container.
   - Service B: Envoy container.
   - Service C: Weightslab backend container (or external training worker).
   - Route through ALB + target groups.

Minimum port plan (AWS)
~~~~~~~~~~~~~~~~~~~~~~~

- Public ingress:

  - ``443`` (HTTPS to frontend/ALB)

- Internal service ports:

  - ``8080`` (Envoy gRPC-Web listener)
  - ``50051`` (Weightslab backend gRPC)
  - ``9901`` (Envoy admin, keep private)
  - ``5173`` (frontend dev port; avoid exposing directly in production)

Security group guidance
~~~~~~~~~~~~~~~~~~~~~~~

- Allow ``443`` from trusted client CIDRs (or internet if required).
- Restrict ``50051`` and ``8080`` to VPC/internal security groups.
- Restrict ``9901`` to private admin subnet/bastion only.
- Do not expose backend gRPC directly to the public internet.

TLS and domains
~~~~~~~~~~~~~~~

- Terminate TLS at ALB using ACM certificates.
- Use a DNS record (Route 53) for the studio hostname.
- Forward ALB target traffic to frontend service.
- Keep Envoy/backend internal where possible.

Environment mapping (cloud)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

For Weights Studio frontend + Envoy alignment, set environment variables consistently with your deployed endpoints:

.. code-block:: ini

   # frontend resolves gRPC-web through Envoy
   WS_SERVER_PROTOCOL=https
   WS_SERVER_HOST=studio.your-domain.com
   WS_SERVER_PORT=443

   # envoy / backend internal wiring
   ENVOY_PORT=8080
   ENVOY_ADMIN_PORT=9901
   GRPC_BACKEND_PORT=50051

If Envoy is internal-only and frontend is public, ensure frontend requests are routed to the Envoy endpoint through your internal load-balancing design.

AWS deployment checklist
~~~~~~~~~~~~~~~~~~~~~~~~

1. Provision VPC/subnets and security groups.
2. Deploy backend gRPC service and verify ``:50051`` internally.
3. Deploy Envoy and verify routing to backend.
4. Deploy frontend with correct ``WS_SERVER_*`` values.
5. Attach ALB + ACM certificate and configure HTTPS listener.
6. Validate UI actions (query, tagging, discard/restore, plots).
7. Add monitoring/logging (CloudWatch metrics/logs).

Operational notes
~~~~~~~~~~~~~~~~~

- Prefer long-running backend workers for stable interactive sessions.
- Keep checkpoint/log storage durable (EBS/EFS/S3 strategy).
- If using autoscaling, ensure session and backend availability are handled explicitly for active users.
- For multi-environment setups, keep per-environment ``.env`` templates versioned and reviewed.

Concrete EC2 + Docker Compose + systemd recipe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this pattern for a simple single-VM production-like deployment.

1. Provision EC2

   - Ubuntu 22.04 (or similar), attached security group.
   - Open only ``443`` publicly.
   - Keep ``50051``, ``8080``, ``9901`` private (VPC/admin only).

2. Install runtime dependencies

   - Docker Engine + Docker Compose plugin
   - Python environment for your Weightslab backend process

3. Configure environment

   In ``weights_studio/docker/.env`` (or environment management equivalent):

   .. code-block:: ini

      VITE_PORT=5173
      WS_SERVER_PROTOCOL=https
      WS_SERVER_HOST=studio.your-domain.com
      WS_SERVER_PORT=443
      ENVOY_PORT=8080
      ENVOY_ADMIN_PORT=9901
      GRPC_BACKEND_PORT=50051

4. Start backend service

   Start Weightslab gRPC in your training/runtime process and ensure it binds to a reachable interface (for example ``0.0.0.0:50051``).

5. Start studio containers

   From ``weights_studio/docker``:

   .. code-block:: bash

      docker compose up -d

6. Add process supervision (systemd)

   Use systemd to ensure services restart on reboot/failure.

   Example unit for studio compose stack:

   .. code-block:: ini

      [Unit]
      Description=Weights Studio (Docker Compose)
      Requires=docker.service
      After=docker.service

      [Service]
      Type=oneshot
      WorkingDirectory=/opt/weights_studio/docker
      ExecStart=/usr/bin/docker compose up -d
      ExecStop=/usr/bin/docker compose down
      RemainAfterExit=yes
      TimeoutStartSec=0

      [Install]
      WantedBy=multi-user.target

   Enable and start:

   .. code-block:: bash

      sudo systemctl daemon-reload
      sudo systemctl enable weights-studio
      sudo systemctl start weights-studio

7. Put HTTPS in front (ALB + ACM)

   - Attach ACM certificate to ALB listener on ``443``.
   - Route DNS (Route 53) to ALB.
   - Forward traffic to frontend target.

8. Validate end-to-end

   - Open studio URL.
   - Verify sample query, tag/discard actions, and plot refresh.
   - Verify backend logs and Envoy admin stats.

UI controls and actions
-----------------------

Top header controls
~~~~~~~~~~~~~~~~~~~

- **Dark mode toggle**: switch light/dark theme.
- **Refresh button**: manually refresh dynamic stats in visible grid.
- **Refresh config popover**:

  - Data auto-refresh enable/disable + interval
  - Plot auto-refresh enable/disable + interval
  - Clear cache and reload page

- **Training button** (Resume/Pause): toggles ``is_training`` via backend command.
- **Mode selector** (dropdown next to training button):

  - ``train`` mode
  - ``audit`` mode (sets ``auditorMode``)

Left panel
~~~~~~~~~~

- **Training card**:

  - Training state pill (running/paused/pending)
  - Connection status text
  - Live metrics and progress

- **Tags card**:

  - Tag chips
  - New tag input
  - Painter toggle
  - Add/remove painter mode switch

- **Details card**:

  - Grid settings (cell size + resolution + apply)
  - Segmentation overlays (Raw, GT, Pred, Diff, Split view)
  - Split colors
  - Metadata field toggles

Grid interactions
~~~~~~~~~~~~~~~~~

- Drag selection rectangle (multi-select).
- ``Ctrl`` multi-select support.
- Right-click context menu actions:

  - Manage tags
  - Remove all tags
  - Discard selected samples
  - Restore selected samples

The UI pauses training before data-modifying actions to keep edits safe.

Bottom bar
~~~~~~~~~~

- Batch slider for navigation over samples.
- Start/end batch index labels.
- Total and active sample counters.

Image detail modal
~~~~~~~~~~~~~~~~~~

- Large image preview with previous/next navigation.
- Zoom in/out/reset controls.
- Metadata detail panel.
- Volumetric support with Z-slice slider when applicable.

Signal plots
~~~~~~~~~~~~

Per-signal cards include:

- Reset zoom
- CSV export
- JSON export
- Settings (curve color, smoothing, std band, markers)

Right-click menu on plots includes:

- Reset X/Y/all zoom
- Change curve color
- Load weights at clicked step
- Hide/unhide curve
- Break by slices
- Copy chart image
- Save chart image

Weightslab CLI console (dev)
----------------------------

The Weightslab CLI console is a local developer REPL for inspecting and controlling a running experiment through the global ledger.

- Transport: local TCP text commands with JSON responses.
- Intended scope: development/debug only.
- Security model: localhost binding by default, plain-text protocol.

How to start it
~~~~~~~~~~~~~~~

From your training script (recommended):

.. code-block:: python

   import weightslab as wl

   wl.serve(serving_grpc=True, serving_cli=True)
   wl.keep_serving()

Standalone server:

.. code-block:: bash

   python -m weightslab.backend.cli serve --host localhost --port 60000

Connect a client manually:

.. code-block:: bash

   python -m weightslab.backend.cli client --host localhost --port 60000

If no port is provided (or port is ``0``), the server picks a free port.

Console actions and commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Discovery and help:

- ``help`` / ``h`` / ``?``: show all command syntaxes and examples.
- ``status``: compact snapshot of models, loaders, optimizers, hyperparams.
- ``dump``: sanitized ledger dump (dataloaders, optimizers, hyperparams).

Training control:

- ``pause`` (or ``p``): pause training and set ``is_training=False``.
- ``resume`` (or ``r``): resume training and set ``is_training=True``.

Registry inspection:

- ``list_models``
- ``list_optimizers``
- ``list_loaders``
- ``plot_model [model_name]``: prints model architecture text tree.

Sample-level dataset operations:

- ``list_uids [loader_name] [--discarded] [--limit N]``
- ``discard [uid2 ...] [--loader loader_name]``
- ``undiscard [uid2 ...] [--loader loader_name]``
- ``add_tag [--loader loader_name]``

Hyperparameter operations:

- ``hp``: list hyperparameter sets.
- ``hp ``: show one set.
- ``set_hp [hp_name] ``: update one key path.

Model architecture operation:

- ``operate [model_name] ``

Session control:

- ``exit`` / ``quit``: close client session.
- ``clear`` / ``cls``: client-side terminal clear (not sent to server).

Developer notes
~~~~~~~~~~~~~~~

- Prefer CLI for quick diagnosis and manual interventions.
- Keep CLI port private (localhost or private subnet only).
- Use Weights Studio for richer visual workflows; use CLI for low-latency command-driven operations.

Weightslab workflow recommendation
----------------------------------

Typical loop for productive usage:

1. Start Weightslab backend and studio stack.
2. Monitor training metrics and sample-level signals.
3. Use grid metadata + modal details to inspect hard or noisy samples.
4. Tag slices (e.g., outliers, hard cases).
5. Discard low-value samples and continue training.
6. Use train/audit mode toggles to inspect safely without weight updates.
7. Use plot controls to inspect branch transitions and checkpoint behavior.

Troubleshooting
---------------

- Studio loads but no data: check backend gRPC is running and Envoy target is reachable.
- Connection/reset errors: verify ``ENVOY_PORT`` and backend port mapping.
- Wrong endpoint: verify ``WS_SERVER_HOST/PORT/PROTOCOL`` in docker environment.
- No plot updates: verify plot auto-refresh setting and backend logger data availability.
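
When chasing a wrong-endpoint problem, it can be useful to see which URL a given ``.env`` would produce. The sketch below parses ``KEY=VALUE`` text and joins the three ``WS_SERVER_*`` values as ``protocol://host:port``. That join format is an assumption inferred from the documented default ``http://localhost:8080``; the actual logic lives in ``src/main.ts`` and may differ. The function names ``parse_env`` and ``build_server_url`` are hypothetical.

.. code-block:: python

   def parse_env(text: str) -> dict:
       """Parse KEY=VALUE lines, skipping blank lines and # comments."""
       values = {}
       for raw in text.splitlines():
           line = raw.strip()
           if not line or line.startswith("#") or "=" not in line:
               continue
           key, _, value = line.partition("=")
           values[key.strip()] = value.strip()
       return values


   def build_server_url(env: dict) -> str:
       """Join WS_SERVER_* values; fall back to the documented defaults."""
       protocol = env.get("WS_SERVER_PROTOCOL", "http")
       host = env.get("WS_SERVER_HOST", "localhost")
       port = env.get("WS_SERVER_PORT", "8080")
       # Assumed format: protocol://host:port (see the note above).
       return f"{protocol}://{host}:{port}"

Passing the contents of ``docker/.env`` through ``parse_env`` and then ``build_server_url`` gives a URL to compare against the address where Envoy is actually listening.
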