After a container update, browsers serve stale app.js and lang/*.json
from cache, causing old UI code and missing translations to appear.
- serve_index() now reads the git commit hash and injects ?v=COMMIT into
all static asset URLs (app.js, i18n.js, styles.css) in index.html
- window.STATIC_VERSION is injected into the page so i18n.js can append
the same version to lang/*.json fetch calls
- index.html itself is served with Cache-Control: no-cache so the browser
always revalidates it and picks up new asset URLs on next load
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a confirmation modal when clicking Redeploy that lets the user choose:
- Keep Data: containers are recreated without wiping the instance directory.
NetBird database, peer configs, and encryption keys are preserved.
- Fresh Deploy: full undeploy (removes all data) then redeploy from scratch.
Backend changes:
- POST /customers/{id}/deploy accepts keep_data query param (default false)
- When keep_data=true, undeploy_customer is skipped entirely
- deploy_customer now reuses existing npm_proxy_id/stream_id when the
deployment record is still present (avoids duplicate NPM proxy entries)
- DNS record creation is skipped on keep_data redeploy (already exists)
Frontend changes:
- customerAction('deploy') opens the redeploy modal instead of calling API
- showRedeployModal(id) shows the two-option confirmation card dialog
- confirmRedeploy(keepData) calls the API with the correct parameter
- i18n keys added in en.json and de.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Docker Hub REST API returns per-platform manifest digests, while
docker image inspect RepoDigests stores the manifest list digest.
These two values never match, causing update_available to always be
True even after a fresh pull.
Fix: use registry-1.docker.io/v2/{name}/manifests/{tag} with anonymous
auth and read the Docker-Content-Digest response header, which is the
exact same digest that docker pull stores in RepoDigests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dashboard: update badge (orange) injected lazily into customer Status cell
after table renders via GET /monitoring/customers/local-update-status
(local-only Docker inspect, no Hub call on every page load)
- Customer detail Deployment tab: "Update Images" button with spinner,
shows success/error inline without page reload
- Monitoring Update All: now synchronous + sequential (one customer at a
time), shows live spinner + per-customer results table on completion
- Settings > Docker Images: "Pull from Docker Hub" button with spinner
and inline status message
- /monitoring/customers/local-update-status: new lightweight endpoint
(no network, pure local Docker inspect)
- /monitoring/customers/update-all: removed BackgroundTasks, now awaits
each customer sequentially and returns detailed per-customer results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New image_service.py: Docker Hub digest check (no pull), local digest/ID
comparison, pull_all_images, per-customer container image status, and
update_customer_containers (docker compose up -d, data-safe)
- Monitoring endpoints: GET /images/check (hub vs local + per-customer
needs_update), POST /images/pull (background), POST /customers/update-all
- Deployment endpoint: POST /{id}/update-images (single-customer update)
- Monitoring page: "NetBird Container Updates" card with Check / Pull / Update
All buttons; image status table and per-customer update table with inline
update buttons
- i18n: added keys in en.json and de.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Backend: add admin-only guard + role validation to PUT /users/{id}
- Backend: prevent admins from changing their own role
- Frontend: role toggle button (person-check / person-dash) per user row
- Frontend: admin badge green, viewer badge secondary, ldap badge blue
- i18n: add makeAdmin / makeViewer translations (de + en)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In stop_customer, start_customer and restart_customer the local variable
'customer' was referenced on the instance_dir line before it was assigned
(it was only queried after the docker compose call). This caused an
UnboundLocalError (HTTP 500) on every stop/start/restart action.
Fix: move the customer query to the top of each function alongside the
deployment and config queries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Register SOURCE_DIR as git safe.directory before pulling so the
process (root inside container) can access repos owned by a host user
- Run 'git fetch --tags' after pull so git describe always finds the
latest tag for version.json — git pull does not reliably fetch all tags
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The project name was hardcoded as 'netbirdmsp-appliance' but Docker Compose
derives the project name from the install directory name ('netbird-msp').
This caused Phase A to build an image under the wrong project name and
Phase B to start the replacement container under a mismatched project,
leaving the old container running indefinitely.
Fix: read the 'com.docker.compose.project' label from the running container
at update time. Both Phase A (build) and Phase B (docker compose up) now
use the detected project name. Falls back to SOURCE_DIR basename if the
inspect fails.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bake version info (commit, branch, date) into /app/version.json at build time
via Docker ARG GIT_COMMIT/GIT_BRANCH/GIT_COMMIT_DATE
- Mount source directory as /app-source for in-container git operations
- Add git config safe.directory for /app-source (ownership mismatch fix)
- Add SystemConfig fields: git_repo_url, git_branch, git_token_encrypted
- Add DB migrations for the three new columns
- Add git_token encryption in update_settings() handler
- New endpoints:
GET /api/settings/version — current version + latest from Gitea API
POST /api/settings/update — DB backup + git pull + docker compose rebuild
- New service: app/services/update_service.py
get_current_version() — reads /app/version.json
check_for_updates() — queries Gitea API for latest commit on branch
backup_database() — timestamped SQLite copy to /app/backups/
trigger_update() — git pull + fire-and-forget compose rebuild
- New script: update.sh — SSH-based manual update with health check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Windows DNS (WinRM):
- New dns_service.py: create/delete A-records via PowerShell over WinRM (NTLM)
- Idempotent create (removes existing record first), graceful delete
- DNS failures are non-fatal — deployment continues, error logged
- test-dns endpoint: GET /api/settings/test-dns
- Integrated into deploy_customer() and undeploy_customer()
LDAP / Active Directory auth:
- New ldap_service.py: service-account bind + user search + user bind (ldap3)
- Optional AD group restriction via ldap_group_dn
- Login flow: LDAP first → local fallback (prevents admin lockout)
- LDAP users auto-created with auth_provider="ldap" and role="viewer"
- test-ldap endpoint: GET /api/settings/test-ldap
- reset-password/reset-mfa guards extended to block LDAP users
All credentials (dns_password, ldap_bind_password) encrypted with Fernet.
New DB columns added via backwards-compatible migrations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CORS: remove allow_origins=["*"]; restrict to ALLOWED_ORIGINS env var
(comma-separated list); default is no cross-origin access. Removed
allow_credentials=True and method/header wildcards.
- Security headers middleware: add X-Content-Type-Options, X-Frame-Options,
X-XSS-Protection, Referrer-Policy, Strict-Transport-Security to all
responses.
- users.py: guard POST /api/users so only users with role="admin" can
create new accounts (prevents privilege escalation by non-admin roles).
- auth.py: remove raw exception detail from Azure AD 500 response to
avoid leaking internal error messages / stack traces to clients.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract shared SlowAPI limiter to app/limiter.py to break circular
import between app.main and app.routers.auth
- Seed default SystemConfig row (id=1) on first DB init so settings
page works out of the box
- Make all docker_service.compose_* functions async (run_in_executor)
so long docker pulls/stops no longer block the async event loop
- Propagate async to netbird_service stop/start/restart and await
callers in deployments router
- Move customer delete to BackgroundTasks so the HTTP response returns
immediately and avoids frontend "Network error" on slow machines
- docker-compose: add :z SELinux labels, mount docker.sock directly,
add security_opt label:disable for socket access, extra_hosts for
host.docker.internal, enable DELETE/VOLUMES on socket proxy
- npm_service: auto-detect outbound host IP via UDP socket when
HOST_IP env var is not set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix#1 - SECRET_KEY startup validation (config.py, .env):
- App refuses to start if SECRET_KEY is missing, shorter than 32 chars,
or matches a known insecure default value
- .env: replaced hardcoded test key with placeholder + generation hint
Fix#2 - Docker socket proxy (docker-compose.yml):
- Add tecnativa/docker-socket-proxy sidecar
- Only expose required Docker API endpoints (CONTAINERS, IMAGES,
NETWORKS, POST, EXEC); dangerous endpoints explicitly blocked
- Remove direct /var/run/docker.sock mount from main container
- Route Docker API via DOCKER_HOST=tcp://docker-socket-proxy:2375
Fix#3 - Azure AD group whitelist (auth.py, models.py, validators.py):
- New azure_allowed_group_id field in SystemConfig
- After token exchange, verify group membership via Graph API /me/memberOf
- Deny login with HTTP 403 if user is not in the required group
- New Azure AD users now get role 'viewer' instead of 'admin'
Fix#4 - Rate limiting on login (main.py, auth.py, requirements.txt):
- Add slowapi==0.1.9 dependency
- Initialize SlowAPI limiter in main.py with 429 exception handler
- Apply 10 requests/minute limit per IP on /login and /mfa/verify
Settings > NPM Integration now allows choosing between per-customer
Let's Encrypt certificates (default) or a shared wildcard certificate
already uploaded in NPM. Includes backend, frontend UI, and i18n support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Global MFA toggle in Security settings, QR code setup on first login,
6-digit TOTP verification on subsequent logins. Azure AD users exempt.
Admins can reset user MFA. TOTP secrets encrypted at rest with Fernet.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NPM's certificate creation endpoint rejects letsencrypt_agree and
letsencrypt_email in the meta field (schema validation error). The
LE email is configured globally in NPM settings. Empty meta works.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes:
1. When updating existing proxy host, preserve its certificate_id
and SSL settings instead of resetting to 0
2. Search NPM certificates by domain if proxy host has no cert
assigned (handles manually created certs)
3. Remove invalid 'nice_name' and 'dns_challenge' from LE cert
request payload (caused 400 error on newer NPM versions)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a proxy host already exists in NPM (domain "already in use"),
the code now finds the existing host, updates it, and requests SSL
instead of failing with an error. Also checks if the host already
has a valid certificate before requesting a new one from Let's Encrypt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deployment_logs table has a CHECK constraint allowing only
'success', 'error', 'info'. Using 'warning' caused an IntegrityError
that crashed the entire deployment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The HTTP fallback (Step 9b) would rewrite all configs to HTTP when SSL
cert creation failed, but if the user then manually set up SSL in NPM
the dashboard would fail with "Unauthenticated" due to mixed content
(HTTPS page loading HTTP OAuth endpoints). Now keeps HTTPS configs and
logs a warning instead, so manual SSL setup works correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When redeploying a customer without undeploying first, the management
server would crash with FATAL because a new DataStoreEncryptionKey was
generated but the old database (encrypted with the old key) still
existed. Now:
- Reads existing key from management.json if present
- Reuses existing UDP port from deployment record
- Stops old containers before starting new ones
- Updates existing deployment record instead of creating duplicate
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create NPM proxy host WITHOUT SSL initially (ssl_forced=False),
then request Let's Encrypt cert, then enable SSL only after cert
is assigned. Prevents broken proxy when cert fails.
- If SSL cert creation fails, automatically fall back to HTTP mode:
re-render management.json, dashboard.env, relay.env with http://
URLs and recreate containers so dashboard login works.
- Better error logging in _request_ssl with specific timeout hints.
- Use template variables for relay WebSocket protocol (rels/rel)
instead of hardcoded rels:// in management.json.j2 and relay.env.j2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Socket detection inside Docker returns the container IP (172.18.0.x),
not the host IP. Now:
- install.sh detects host IP via hostname -I and stores in .env
- docker-compose.yml passes HOST_IP to the container
- npm_service.py reads HOST_IP from environment
- Increased SSL cert timeout to 120s (LE validation is slow)
- Added better logging for SSL cert creation/assignment
- README updated with HOST_IP in .env example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- npm_service._get_forward_host() now detects the actual host IP via
UDP socket (works inside Docker containers) instead of using
172.17.0.1 Docker gateway which NPM can't reach
- install.sh uses hostname -I for NPM forward host
- Removed npm_api_url parameter from _get_forward_host()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Forward proxy to host IP + dashboard_port instead of container name
- Remove redundant advanced_config (Caddy handles internal routing)
- Add provider: letsencrypt to SSL certificate request
- Add NPM UDP stream creation/deletion for STUN/TURN relay ports
- Add npm_stream_id to Deployment model with migration
- Fix API docs URL in README (/api/docs)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Multi-language support (EN/DE) with i18n engine and language files
- Configurable branding (name, subtitle, logo) in Settings
- Global default language and per-user language preference
- User management router with CRUD endpoints
- Customer status sync on start/stop/restart
- Health check fixes: derive status from container state, remove broken wget healthcheck
- Caddy reverse proxy and dashboard env templates for customer stacks
- Updated README with real hardware specs, prerequisites, and new features
- Removed .claude settings (JWT tokens) and build artifacts from tracking
- Updated .gitignore for .claude/ and Windows artifacts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>