Configuration¶
WeightsLab and Weights Studio are configured entirely through environment variables.
Copy the .env file in the repository root and adjust the values for your setup.
All variables are optional; the default shown in each table is used when a variable is unset.
# From the repository root
cp .env .env.local # or just edit .env directly
# Then export or load it before starting your training script
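If you prefer to load the file from Python rather than exporting it in the shell, the step above can be sketched as follows. This is a minimal illustration, not WeightsLab's own loader: the helper name load_env_file is hypothetical, and it only handles plain KEY=VALUE lines (no inline comments or multi-line values).

```python
import os
from pathlib import Path


def load_env_file(path, environ=None, override=False):
    """Copy KEY=VALUE pairs from `path` into `environ` (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # Variables already present in the environment win unless override=True.
        if override or key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded
```

Call load_env_file(".env.local") at the top of the training script, before any WeightsLab component reads its settings.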
WeightsLab (Python backend)¶
Logging¶
| Variable | Default | Description |
|---|---|---|
| | | Log level for all WeightsLab Python components. Accepted values: |
| | | Write logs to a rotating file in addition to stdout. Set to |
| | (training script dir) | Root directory where training log snapshots are saved. Defaults to a |
| | | Output format for audit logs tracking all user interactions through gRPC. Accepted values: |
CLI Server¶
| Variable | Default | Description |
|---|---|---|
| | | Host the CLI inspection server binds to. |
| | | Port the CLI inspection server listens on. |
gRPC Server¶
| Variable | Default | Description |
|---|---|---|
| | | Host the gRPC server binds to. |
| | | Port the gRPC server listens on. |
| | | Maximum message size (bytes) for gRPC send and receive (default 256 MB). Increase when transferring large weight tensors or image batches. |
| | (thread-pool size) | Maximum number of RPCs handled simultaneously. Leave unset to match the worker thread count. |
| | (gRPC default) | Override the C-core gRPC log verbosity ( |
| | | Enables TLS on the backend gRPC socket. Set to |
| | | Base directory used for default TLS file lookup when the per-file |
| | | Path to backend private key file (PEM). |
| | | Path to backend server certificate file (PEM). |
| | | Path to CA certificate used to validate mTLS client certificates. |
| | | Requires client certificates (mTLS) when set. |
| | (unset) | Optional shared token accepted from gRPC metadata headers ( |
| | (unset) | Comma-separated token list for rotation support. |
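The two token rows above describe a single shared token plus a comma-separated list that allows rotation. A minimal sketch of how such a check could work is shown below. The function names are hypothetical, the inputs are passed as plain strings because this page does not show how the backend reads them, and treating "no token configured" as "authentication disabled" is an assumption of this sketch.

```python
import hmac


def parse_token_list(raw):
    """Split a comma-separated token string, dropping blanks and whitespace."""
    return [t.strip() for t in (raw or "").split(",") if t.strip()]


def is_token_accepted(presented, single_token=None, token_list=None):
    """Return True if `presented` matches any configured token."""
    accepted = parse_token_list(token_list)
    if single_token:
        accepted.append(single_token.strip())
    if not accepted:
        # Assumption: with no token configured, token auth is simply off.
        return True
    if not presented:
        return False
    matched = False
    for candidate in accepted:
        # compare_digest avoids leaking match position through timing.
        if hmac.compare_digest(presented.encode(), candidate.encode()):
            matched = True
    return matched
```

During a rotation, list both the old and the new token until every client has switched, then drop the old one.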
Backend startup validates these security inputs at boot time and fails fast with an explicit error when required TLS files are missing or invalid.
When hyperparameters/config are registered, WeightsLab resolves gRPC TLS settings with config-first precedence:
TLS flags: grpc_tls_enabled, then GRPC_TLS_ENABLED; grpc_tls_require_client_auth, then GRPC_TLS_REQUIRE_CLIENT_AUTH.
TLS paths (when TLS is enabled): grpc_tls_*_file -> GRPC_TLS_*_FILE -> grpc_tls_cert_dir -> GRPC_TLS_CERT_DIR -> default ~/certs.
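The precedence chain above can be sketched as a pair of small resolver functions. This is an illustration under stated assumptions, not the backend's actual code: the function names, the set of strings treated as true, and the default_filename parameter (the file name looked up inside the certificate directory) are all hypothetical.

```python
from pathlib import Path

# Assumption: which strings count as "enabled" is not specified on this page.
_TRUE = {"1", "true", "yes", "on"}


def resolve_flag(name, config, env):
    """Registered config key first, then the upper-cased environment variable."""
    value = config[name] if name in config else env.get(name.upper(), "")
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in _TRUE


def resolve_tls_path(kind, default_filename, config, env):
    """Follow: per-file config -> per-file env -> cert dir config -> cert dir env -> ~/certs."""
    key = f"grpc_tls_{kind}_file"
    explicit = config.get(key) or env.get(key.upper())
    if explicit:
        return Path(explicit).expanduser()
    cert_dir = (
        config.get("grpc_tls_cert_dir")
        or env.get("GRPC_TLS_CERT_DIR")
        or "~/certs"
    )
    return Path(cert_dir).expanduser() / default_filename
```

Passing the config mapping and the environment as arguments keeps the lookup order easy to test without touching the real process environment.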
Watchdog¶
The watchdog is a background daemon thread that monitors both the main
weightslab_rlock and all in-flight gRPC RPCs. When a lock or RPC is held
longer than GRPC_WATCHDOG_STUCK_SECONDS it is flagged as stuck. For locks,
the holding thread receives a _WatchdogInterrupt (a BaseException
subclass) that unwinds the stack and releases the lock via finally /
with. For RPCs, the gRPC server is restarted after
GRPC_WATCHDOG_RESTART_THRESHOLD consecutive unhealthy polls.
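Two ideas in the paragraph above lend themselves to a short sketch: why the interrupt derives from BaseException, and how a consecutive-poll counter decides on a restart. The class below is a stand-in for _WatchdogInterrupt, and RestartCounter is a hypothetical name. For simplicity the interrupt is raised in the same thread; how the real watchdog delivers it to the holding thread is not described here. Resetting the counter after a restart request is also an assumption.

```python
class WatchdogInterrupt(BaseException):
    """Stand-in for _WatchdogInterrupt. Not an Exception subclass on purpose."""


def guarded_work(lock, events):
    """Hold `lock`, get interrupted, and let `with` / `finally` clean up."""
    with lock:
        try:
            raise WatchdogInterrupt("lock held too long")
        except Exception:
            # Never runs: a broad `except Exception` does not catch BaseException.
            events.append("swallowed")
        finally:
            events.append("cleanup")


class RestartCounter:
    """Request a restart after `threshold` consecutive unhealthy polls."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unhealthy_polls = 0

    def record_poll(self, healthy):
        """Record one poll result; return True when a restart should be requested."""
        if healthy:
            self.unhealthy_polls = 0  # a healthy poll breaks the streak
            return False
        self.unhealthy_polls += 1
        if self.unhealthy_polls >= self.threshold:
            self.unhealthy_polls = 0  # assumption: start counting afresh
            return True
        return False
```

Because handler code commonly uses `except Exception`, deriving from BaseException lets the interrupt unwind the whole stack while `with` and `finally` still release the lock.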
| Variable | Default | Description |
|---|---|---|
| | | If set to Example (PowerShell): |
| | | Seconds a lock or in-flight RPC must be held before being flagged as stuck. Also used as the lock-acquisition timeout inside gRPC handlers ( |
| | | How often (seconds) the watchdog polls for stuck locks and RPCs. |
| | | Number of consecutive unhealthy watchdog polls before requesting a gRPC server restart. |
| | | Maximum number of in-flight RPC entries printed in watchdog log messages. |
| | | If set to |
Data and Cache¶
| Variable | Default | Description |
|---|---|---|
| | | Persist sample predictions in HDF5 format alongside the JSON log. Set to |
| | | Maximum number of entries held in the in-memory preview image cache. |
| | | Pre-load the image overview index on startup. Set to |
| | | Longest-edge pixel size used when generating preview thumbnails. |
| | | Number of samples per internal processing chunk. |
| | | Bounded wait (milliseconds) before generating missing preview entries on demand when the preview cache is still warming up. Goal: reduce duplicate image decode/resize work during startup. Clamped to |
| | | Synchronise hyperparameters with the UI on every training step. Set to |
| | | How per-instance (per-annotation) numeric columns are folded into a single per-sample scalar when the |
| | | Maximum number of points returned per curve in the break-by-slices plot. In this view the backend aggregates the matching samples into a single mean curve per experiment (mean of the metric across the tagged samples at each step) rather than streaming one curve per sample, so a long run (e.g. 10k tagged samples × 10k steps) sends one curve instead of millions of points. If that mean curve still has more steps than this cap, it is uniformly downsampled, keeping the first and last point and an evenly spaced subset in between (no values are interpolated or invented). Set to |
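The aggregation and downsampling described in the last row can be sketched in a few lines. This is an illustrative version under simple assumptions (all per-sample curves have the same length, and the cap is at least 2). The function names are hypothetical and the backend's exact index selection may differ.

```python
def mean_curve(per_sample_curves):
    """Average several equal-length curves step by step into one curve."""
    count = len(per_sample_curves)
    return [sum(values) / count for values in zip(*per_sample_curves)]


def downsample_uniform(points, max_points):
    """Keep the first and last point plus an evenly spaced subset in between."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2 in this sketch")
    n = len(points)
    if n <= max_points:
        return list(points)  # already within the cap, nothing to drop
    step = (n - 1) / (max_points - 1)
    # Only existing indices are selected, so no value is interpolated.
    indices = sorted({round(i * step) for i in range(max_points)})
    return [points[i] for i in indices]
```

For example, capping a 10-point curve at 4 points keeps indices 0, 3, 6 and 9.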
Evaluation Mode¶
| Variable | Default | Description |
|---|---|---|
| | | Dynamic timeout factor for user-triggered evaluation passes. Timeout is computed as |
| | | Minimum timeout floor (seconds) applied to evaluation runs. |
| | | Optional absolute timeout override in seconds. |
AI / LLM API Keys¶
These keys are required only when using the agentic data-query features.
| Variable | Default | Description |
|---|---|---|
| | (empty) | OpenRouter API key, required for cloud agent setup in Weights Studio. |
Agent Configuration¶
These variables control how the data-query agent finds its YAML configuration. The agent supports two provider families:
ollama for local inference
openrouter for cloud-hosted models
| Variable | Default | Description |
|---|---|---|
| | (empty) | Optional directory override for |
Agent config lookup order¶
1. <AGENT_CONFIG_PATH>/agent_config.yaml (if AGENT_CONFIG_PATH is set)
2. Repository-level agent_config.yaml
3. Package-level agent_config.yaml
4. Current working directory agent_config.yaml
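The lookup order above amounts to a first-match search over candidate paths. A minimal sketch is shown below. The function name and the repo_root / package_dir parameters are hypothetical, since this page does not say how WeightsLab locates those directories, and falling through to the next candidate when the override directory has no file is an assumption.

```python
import os
from pathlib import Path

CONFIG_NAME = "agent_config.yaml"


def find_agent_config(repo_root, package_dir, env=None, cwd=None):
    """Return the first existing agent_config.yaml in lookup order, else None."""
    env = os.environ if env is None else env
    candidates = []
    override = env.get("AGENT_CONFIG_PATH")
    if override:
        # 1. Explicit directory override, only when the variable is set.
        candidates.append(Path(override) / CONFIG_NAME)
    candidates.append(Path(repo_root) / CONFIG_NAME)         # 2. repository level
    candidates.append(Path(package_dir) / CONFIG_NAME)       # 3. package level
    candidates.append(Path(cwd or Path.cwd()) / CONFIG_NAME)  # 4. working directory
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```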
Example¶
export AGENT_CONFIG_PATH=/opt/weightslab/config
# WeightsLab will look for:
# /opt/weightslab/config/agent_config.yaml
Agent Provider Setup¶
The runtime agent is configured from agent_config.yaml plus optional
environment variables such as OPENROUTER_API_KEY.
Supported YAML keys¶
| Key | Example | Description |
|---|---|---|
| | | Active provider. Common values: |
| | | Local Ollama model name. |
| | | Ollama host. |
| | | Ollama HTTP port used by WeightsLab. |
| | | Default OpenRouter model. |
| | | OpenRouter-compatible base URL. |
| | | Request timeout in seconds for OpenRouter calls. |
| | (secret) | Optional API key in YAML. Prefer environment variables or UI init when possible. |
| | | If enabled, WeightsLab also tries the local Ollama provider as fallback. |
Local Ollama example¶
Use this mode when you want the agent available immediately at backend startup.
agent:
  provider: ollama
  ollama_model: llama3.2:3b
  ollama_host: localhost
  ollama_port: 11435
  fallback_to_local: false
Operational steps:
1. Install Ollama.
2. Pull a model, for example ollama pull llama3.2:3b.
3. Start the Ollama server.
4. Start WeightsLab.
5. Open Weights Studio and query the agent directly.
Cloud OpenRouter example¶
Use this mode when you want hosted models and interactive setup from Weights Studio.
agent:
  provider: openrouter
  openrouter_model: meta-llama/llama-3.3-70b-instruct
  fallback_to_local: false
Recommended secret handling:
export OPENROUTER_API_KEY=your_openrouter_key
Weights Studio commands¶
When using Weights Studio, the agent bar supports these runtime commands:
/init: Opens the OpenRouter onboarding flow. Users can enter an API key manually or use the OAuth flow, then select a model.
/model: Opens the model browser and switches the active OpenRouter model without requiring a full reinitialization.
/reset: Clears the current runtime connection state and returns the agent to the uninitialized state.
Notes¶
The default OpenRouter model is meta-llama/llama-3.3-70b-instruct.
The model browser fetches the available models from OpenRouter using the configured API key.
Connection and model-change actions are recorded in the agent history as log-style entries.
/reset clears the current runtime agent state. If your startup config is local-only and you want that provider back immediately, restart the backend.
Testing¶
| Variable | Default | Description |
|---|---|---|
| | | Hard timeout (seconds) applied to each unit test via the |
Weights Studio (frontend)¶
Backend Connection¶
| Variable | Default | Description |
|---|---|---|
| | | Hostname of the WeightsLab gRPC backend, used by the Envoy proxy. |
| | | Port of the WeightsLab gRPC backend, used by the Envoy proxy. |
| | | Hostname of the Envoy proxy the browser connects to. |
| | | Port Envoy listens on for HTTPS / gRPC-Web traffic. |
| | | Envoy admin interface port (metrics, health checks). Bound to loopback and not published by Docker Compose by default. |
Vite Dev Server¶
| Variable | Default | Description |
|---|---|---|
| | | Host the Vite development server binds to. |
| | | Port the Vite development server listens on. |
Runtime (import.meta.env)¶
These variables are injected into the browser bundle at build / dev time.
| Variable | Default | Description |
|---|---|---|
| | | Hostname (usually the Envoy proxy) the browser uses to reach the backend. |
| | | Port the browser uses to reach the backend. |
| | | Protocol ( |
| | | Enables sandbox / demo mode, which disables all write operations in the UI. Set to |
| | (device-adaptive) | Number of image batches to prefetch in the grid view. Leave unset to use the automatic 1-3 value derived from device capabilities. |
| | | Maximum number of metadata histogram bars shown above the sample slider. Values above |
| | (prefetch + 4) | Maximum number of images held in the WebSocket image cache. |
| | | Maximum memory (MB) for the grid-view image tile cache. |
| | | Maximum memory (MB) for the full-resolution modal image cache. |