# NetBird MSP Appliance - Claude Code Specification

## Project Overview

Build a complete, production-ready multi-tenant NetBird management platform that runs entirely in Docker containers. This is an MSP (Managed Service Provider) tool to manage 100+ isolated NetBird instances from a single web interface.

## Technology Stack

- **Backend**: Python 3.11+ with FastAPI
- **Frontend**: HTML5 + Bootstrap 5 + Vanilla JavaScript (no frameworks)
- **Database**: SQLite
- **Containerization**: Docker + Docker Compose
- **Templating**: Jinja2 for Docker Compose generation
- **Integration**: Docker Python SDK, Nginx Proxy Manager API

## Project Structure

```
netbird-msp-appliance/
├── README.md                        # Main documentation
├── QUICKSTART.md                    # Quick start guide
├── ARCHITECTURE.md                  # Architecture documentation
├── LICENSE                          # MIT License
├── .gitignore                       # Git ignore file
├── .env.example                     # Environment variables template
├── install.sh                       # One-click installation script
├── docker-compose.yml               # Main application container
├── Dockerfile                       # Application container definition
├── requirements.txt                 # Python dependencies
│
├── app/                             # Python application
│   ├── __init__.py
│   ├── main.py                      # FastAPI entry point
│   ├── models.py                    # SQLAlchemy models
│   ├── database.py                  # Database setup
│   ├── dependencies.py              # FastAPI dependencies
│   │
│   ├── routers/                     # API endpoints
│   │   ├── __init__.py
│   │   ├── auth.py                  # Authentication endpoints
│   │   ├── customers.py             # Customer CRUD
│   │   ├── deployments.py           # Deployment management
│   │   ├── monitoring.py            # Status & health checks
│   │   └── settings.py              # System configuration
│   │
│   ├── services/                    # Business logic
│   │   ├── __init__.py
│   │   ├── docker_service.py        # Docker container management
│   │   ├── npm_service.py           # NPM API integration
│   │   ├── netbird_service.py       # NetBird deployment orchestration
│   │   └── port_manager.py          # UDP port allocation
│   │
│   └── utils/                       # Utilities
│       ├── __init__.py
│       ├── config.py                # Configuration management
│       ├── security.py              # Encryption, hashing
│       └── validators.py            # Input validation
│
├── templates/                       # Jinja2 templates
│   ├── docker-compose.yml.j2        # Per-customer Docker Compose
│   ├── management.json.j2           # NetBird management config
│   └── relay.env.j2                 # Relay environment variables
│
├── static/                          # Frontend files
│   ├── index.html                   # Main dashboard
│   ├── css/
│   │   └── styles.css               # Custom styles
│   └── js/
│       └── app.js                   # Frontend JavaScript
│
├── tests/                           # Unit & integration tests
│   ├── __init__.py
│   ├── test_customer_api.py
│   ├── test_deployment.py
│   └── test_docker_service.py
│
└── docs/                            # Additional documentation
    ├── API.md                       # API documentation
    ├── DEPLOYMENT.md                # Deployment guide
    └── TROUBLESHOOTING.md           # Common issues
```

## Key Features to Implement

### 1. Customer Management
- **Create Customer**: Web form → API → Deploy NetBird instance
- **List Customers**: Paginated table with search/filter
- **Customer Details**: Status, logs, setup URL, actions
- **Delete Customer**: Remove all containers, NPM entries, data

### 2. Automated Deployment
**Workflow when creating customer:**
1. Validate inputs (subdomain unique, email valid)
2. Allocate ports (Management internal, Relay UDP public)
3. Generate configs from Jinja2 templates
4. Create instance directory: `/opt/netbird-instances/kunde{id}/`
5. Write `docker-compose.yml`, `management.json`, `relay.env`
6. Start Docker containers via Docker SDK
7. Wait for health checks (max 60s)
8. Create NPM proxy hosts via API (with SSL)
9. Update database with deployment info
10. Return setup URL to user

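
Step 7 of the workflow above bounds the health-check wait at 60 seconds. A minimal sketch of such a bounded polling loop could look like the following. The function name `wait_until_healthy`, the 2-second polling interval, and the injectable `check`, `clock` and `sleep` callables are assumptions made for illustration; the specification only fixes the 60-second limit.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout: float = 60.0,   # "max 60s" from the workflow
    interval: float = 2.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing the clock and sleep functions in as parameters keeps the loop testable without real waiting; in the deployment service, `check` would wrap whatever HTTP or container health probe is chosen.
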
### 3. Web-Based Configuration
**All settings in database, editable via UI:**
- Base Domain
- Admin Email
- NPM API URL & Token
- NetBird Docker Images
- Port Ranges
- Data Directories

No manual config file editing required!

### 4. Nginx Proxy Manager Integration
**Per customer, create proxy host:**
- Domain: `{subdomain}.{base_domain}`
- Forward to: `netbird-kunde{id}-dashboard:80`
- SSL: Automatic Let's Encrypt
- Advanced config: Route `/api/*` to management, `/signalexchange.*` to signal, `/relay` to relay

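
The routing rules above can be modelled as a small prefix table. The sketch below is a plain-Python illustration of which container a request path should end up at; it is not Nginx Proxy Manager configuration or an NPM API payload, and the names `ROUTES`, `backend_for` and `container_for` are hypothetical.

```python
# Path prefix -> NetBird service, following the "Advanced config" bullet.
ROUTES: tuple[tuple[str, str], ...] = (
    ("/api/", "management"),
    ("/signalexchange.", "signal"),
    ("/relay", "relay"),
)


def backend_for(path: str) -> str:
    """Return the service that should receive `path`; default is the dashboard."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return "dashboard"


def container_for(customer_id: int, path: str) -> str:
    """Build the container name using the netbird-kunde{id}-{service} pattern."""
    return f"netbird-kunde{customer_id}-{backend_for(path)}"
```

For example, `container_for(7, "/api/peers")` resolves to `netbird-kunde7-management`, while any path without a matching prefix falls through to `netbird-kunde7-dashboard`.
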
### 5. Port Management
**UDP Ports for STUN/Relay (publicly accessible):**
- Customer 1: 3478
- Customer 2: 3479
- ...
- Customer 100: 3577

**Algorithm:**
- Find the next available port, starting from 3478
- Check that the port is not in use (via `netstat` or the database)
- Assign it to the customer
- Store it in the database

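
A minimal sketch of the allocation algorithm above, assuming the set of ports already stored in the database is passed in by the caller. The function name `next_free_port` and the `max_ports` limit of 100 (matching the 3478-3577 range) are illustrative assumptions; an OS-level check such as `netstat` would be an additional step not shown here.

```python
def next_free_port(
    used_ports: set[int],
    base_port: int = 3478,
    max_ports: int = 100,
) -> int:
    """Return the lowest port in [base_port, base_port + max_ports) not yet assigned."""
    for port in range(base_port, base_port + max_ports):
        if port not in used_ports:
            return port
    raise RuntimeError(
        f"No free relay port left in {base_port}-{base_port + max_ports - 1}"
    )
```

Scanning from the base port each time means ports released by deleted customers are reused before the range grows. The `UNIQUE` constraint on `relay_udp_port` in the `deployments` table remains the final guard against two concurrent requests picking the same port.
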
### 6. Monitoring & Health Checks
- Container status (running/stopped/failed)
- Health check endpoints (HTTP checks to management service)
- Resource usage (via Docker stats API)
- Relay connectivity test

## Database Schema

### Table: customers
```sql
CREATE TABLE customers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    company TEXT,
    subdomain TEXT UNIQUE NOT NULL,
    email TEXT NOT NULL,
    max_devices INTEGER DEFAULT 20,
    notes TEXT,
    status TEXT DEFAULT 'active' CHECK(status IN ('active', 'inactive', 'deploying', 'error')),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Table: deployments
```sql
CREATE TABLE deployments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL UNIQUE,
    container_prefix TEXT NOT NULL,
    relay_udp_port INTEGER UNIQUE NOT NULL,
    npm_proxy_id INTEGER,
    relay_secret TEXT NOT NULL,
    setup_url TEXT,
    deployment_status TEXT DEFAULT 'pending' CHECK(deployment_status IN ('pending', 'running', 'stopped', 'failed')),
    deployed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_health_check TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE
);
```

### Table: system_config
```sql
CREATE TABLE system_config (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    base_domain TEXT NOT NULL,
    admin_email TEXT NOT NULL,
    npm_api_url TEXT NOT NULL,
    npm_api_token_encrypted TEXT NOT NULL,
    netbird_management_image TEXT DEFAULT 'netbirdio/management:latest',
    netbird_signal_image TEXT DEFAULT 'netbirdio/signal:latest',
    netbird_relay_image TEXT DEFAULT 'netbirdio/relay:latest',
    netbird_dashboard_image TEXT DEFAULT 'netbirdio/dashboard:latest',
    data_dir TEXT DEFAULT '/opt/netbird-instances',
    docker_network TEXT DEFAULT 'npm-network',
    relay_base_port INTEGER DEFAULT 3478,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Table: deployment_logs
```sql
CREATE TABLE deployment_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('success', 'error', 'info')),
    message TEXT,
    details TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE
);
```

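
The `ON DELETE CASCADE` clauses above only take effect in SQLite when foreign-key enforcement is switched on for the connection, which SQLite leaves off by default. The sketch below uses the standard-library `sqlite3` module and a trimmed copy of the schema purely to demonstrate that behaviour; the helper name `logs_left_after_customer_delete` is hypothetical, and with SQLAlchemy the same pragma would need to be issued on every new connection.

```python
import sqlite3

# Trimmed versions of the customers and deployment_logs tables.
SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    subdomain TEXT UNIQUE NOT NULL,
    email TEXT NOT NULL
);
CREATE TABLE deployment_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('success', 'error', 'info')),
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE
);
"""


def logs_left_after_customer_delete(enforce_foreign_keys: bool) -> int:
    """Delete a customer and report how many of its log rows survive."""
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection and must be set outside a transaction.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO customers (name, subdomain, email) VALUES (?, ?, ?)",
        ("Acme", "acme", "it@acme.example"),
    )
    customer_id = cur.lastrowid
    conn.execute(
        "INSERT INTO deployment_logs (customer_id, action, status) VALUES (?, ?, ?)",
        (customer_id, "deploy", "success"),
    )
    conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
    remaining = conn.execute("SELECT COUNT(*) FROM deployment_logs").fetchone()[0]
    conn.close()
    return remaining
```

With enforcement on, the log row disappears together with the customer; with it off, the row is left orphaned, which is the situation the "Delete + cleanup" endpoint is meant to avoid.
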
### Table: users (simple auth)
```sql
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    email TEXT,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## API Endpoints to Implement

### Authentication
```
POST   /api/auth/login              # Login and get token
POST   /api/auth/logout             # Logout
GET    /api/auth/me                 # Get current user
POST   /api/auth/change-password
```

### Customers
```
POST   /api/customers               # Create + auto-deploy
GET    /api/customers               # List all (pagination, search, filter)
GET    /api/customers/{id}          # Get details
PUT    /api/customers/{id}          # Update
DELETE /api/customers/{id}          # Delete + cleanup
```

### Deployments
```
POST   /api/customers/{id}/deploy   # Manual deploy
POST   /api/customers/{id}/start    # Start containers
POST   /api/customers/{id}/stop     # Stop containers
POST   /api/customers/{id}/restart  # Restart containers
GET    /api/customers/{id}/logs     # Get container logs
GET    /api/customers/{id}/health   # Health check
```

### Monitoring
```
GET    /api/monitoring/status       # System overview
GET    /api/monitoring/customers    # All customers status
GET    /api/monitoring/resources    # Host resource usage
```

### Settings
```
GET    /api/settings/system         # Get system config
PUT    /api/settings/system         # Update system config
GET    /api/settings/test-npm       # Test NPM connectivity
```

## Docker Compose Template (Per Customer)

```yaml
version: '3.8'

networks:
  npm-network:
    external: true

services:
  netbird-management:
    image: {{ netbird_management_image }}
    container_name: netbird-kunde{{ customer_id }}-management
    restart: unless-stopped
    networks:
      - npm-network
    volumes:
      - {{ instance_dir }}/data/management:/var/lib/netbird
      - {{ instance_dir }}/management.json:/etc/netbird/management.json
    command: ["--port", "80", "--log-file", "console", "--log-level", "info",
              "--single-account-mode-domain={{ subdomain }}.{{ base_domain }}",
              "--dns-domain={{ subdomain }}.{{ base_domain }}"]

  netbird-signal:
    image: {{ netbird_signal_image }}
    container_name: netbird-kunde{{ customer_id }}-signal
    restart: unless-stopped
    networks:
      - npm-network
    volumes:
      - {{ instance_dir }}/data/signal:/var/lib/netbird

  netbird-relay:
    image: {{ netbird_relay_image }}
    container_name: netbird-kunde{{ customer_id }}-relay
    restart: unless-stopped
    networks:
      - npm-network
    ports:
      - "{{ relay_udp_port }}:3478/udp"
    env_file:
      - {{ instance_dir }}/relay.env
    environment:
      - NB_ENABLE_STUN=true
      - NB_STUN_PORTS=3478
      - NB_LISTEN_ADDRESS=:80
      - NB_EXPOSED_ADDRESS=rels://{{ subdomain }}.{{ base_domain }}:443
      - NB_AUTH_SECRET={{ relay_secret }}

  netbird-dashboard:
    image: {{ netbird_dashboard_image }}
    container_name: netbird-kunde{{ customer_id }}-dashboard
    restart: unless-stopped
    networks:
      - npm-network
    environment:
      - NETBIRD_MGMT_API_ENDPOINT=https://{{ subdomain }}.{{ base_domain }}
      - NETBIRD_MGMT_GRPC_API_ENDPOINT=https://{{ subdomain }}.{{ base_domain }}
```

## Frontend Requirements

### Main Dashboard (index.html)
**Layout:**
- Navbar: Logo, "New Customer" button, User menu (settings, logout)
- Stats Cards: Total customers, Active, Inactive, Errors
- Customer Table: Name, Subdomain, Status, Devices, Actions
- Pagination: 25 customers per page
- Search bar: Filter by name, subdomain, email
- Status filter dropdown: All, Active, Inactive, Error

**Customer Table Actions:**
- View Details (→ customer detail page)
- Start/Stop/Restart (inline buttons)
- Delete (with confirmation modal)

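
The search, status filter and 25-per-page pagination described above could be backed by server-side logic along the lines of the sketch below. It works on plain dictionaries to stay self-contained; the function names `filter_customers` and `paginate`, the `"all"` sentinel and the dictionary keys are assumptions for illustration, and a real implementation would more likely express the same conditions as a database query.

```python
from typing import Any

SEARCH_FIELDS = ("name", "subdomain", "email")


def filter_customers(
    customers: list[dict[str, Any]],
    query: str = "",
    status: str = "all",
) -> list[dict[str, Any]]:
    """Keep customers matching the status filter and the case-insensitive search text."""
    needle = query.strip().lower()
    matches = []
    for customer in customers:
        if status != "all" and customer["status"] != status:
            continue
        if needle and not any(needle in customer[f].lower() for f in SEARCH_FIELDS):
            continue
        matches.append(customer)
    return matches


def paginate(items: list[Any], page: int = 1, per_page: int = 25) -> list[Any]:
    """Return the slice of `items` for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

Filtering before paginating keeps page numbers consistent with what the user sees: page 2 of a search result is always rows 26 to 50 of the matches, not of the full table.
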
### Customer Detail Page
**Tabs:**
1. **Info**: All customer details, edit button
2. **Deployment**: Status, Setup URL (copy button), Container status
3. **Logs**: Real-time logs from all containers (auto-refresh)
4. **Health**: Health check results, relay connectivity test

### Settings Page
**Tabs:**
1. **System Configuration**: All system settings, save button
2. **NPM Integration**: API URL, Token, Test button
3. **Images**: NetBird Docker image tags
4. **Security**: Change admin password

### Modal Dialogs
- New/Edit Customer Form
- Delete Confirmation
- Deployment Progress (with spinner)
- Error Display

## Security Requirements

1. **Password Hashing**: Use bcrypt for admin password
2. **Secret Encryption**: Encrypt NPM token and relay secrets with Fernet
3. **Input Validation**: Pydantic models for all API inputs
4. **SQL Injection Prevention**: Use SQLAlchemy ORM (no raw queries)
5. **CSRF Protection**: Token-based authentication
6. **Rate Limiting**: Prevent brute force on login endpoint

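
For the rate-limiting requirement, one possible approach is a sliding-window counter keyed by username or client IP. The sketch below is an in-memory illustration only; the class name `LoginRateLimiter` and the limits of 5 attempts per 60 seconds are assumptions, not values taken from this specification.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class LoginRateLimiter:
    """Allow at most `max_attempts` login attempts per key within a sliding window."""

    def __init__(
        self,
        max_attempts: int = 5,          # assumed limit
        window_seconds: float = 60.0,   # assumed window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_attempts = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for `key` and report whether it may proceed."""
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max_attempts:
            return False
        attempts.append(now)
        return True
```

The login endpoint would call `allow()` before checking credentials and answer with HTTP 429 when it returns `False`. Because the state lives in process memory, it resets on restart and is not shared between worker processes.
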
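## Error Handling

All operations should have comprehensive error handling:

```python
import logging

from docker.errors import DockerException

logger = logging.getLogger(__name__)


class NPMException(Exception):
    """Raised when a Nginx Proxy Manager API call fails."""


def deploy_with_rollback(customer_id: int):
    # deploy_customer, stop_containers, remove_containers, rollback, set_status
    # and alert_admin are placeholders for the service-layer functions.
    try:
        return deploy_customer(customer_id)   # Deploy customer
    except DockerException:
        stop_containers(customer_id)          # Rollback: stop containers
        logger.exception("Docker error deploying customer %s", customer_id)
        set_status(customer_id, "failed")     # Update status to 'failed'
        raise                                 # Return error to user
    except NPMException:
        remove_containers(customer_id)        # Rollback: remove containers
        logger.exception("NPM error deploying customer %s", customer_id)
        set_status(customer_id, "failed")     # Update status to 'failed'
        raise
    except Exception:
        rollback(customer_id)                 # Generic rollback
        logger.exception("Unexpected error deploying customer %s", customer_id)
        alert_admin(customer_id)              # Alert admin
        raise
```
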
## Testing Requirements

1. **Unit Tests**: All services (docker_service, npm_service, etc.)
2. **Integration Tests**: Full deployment workflow
3. **API Tests**: All endpoints with different scenarios
4. **Mock External Dependencies**: Docker API, NPM API

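
As an illustration of mocking the Docker API, the sketch below tests a small service function against a `MagicMock` standing in for a Docker SDK client. The function `stop_customer_containers` is hypothetical and only assumes that the client offers `containers.list(all=True)` and that each container has a `name` attribute and a `stop()` method, as the Docker SDK for Python does.

```python
from unittest.mock import MagicMock


def stop_customer_containers(client, customer_id: int) -> int:
    """Stop every container named netbird-kunde{id}-*; return how many were stopped."""
    prefix = f"netbird-kunde{customer_id}-"
    stopped = 0
    for container in client.containers.list(all=True):
        if container.name.startswith(prefix):
            container.stop()
            stopped += 1
    return stopped


def make_container(name: str) -> MagicMock:
    """Build a fake container; `name` must be set after construction on a mock."""
    container = MagicMock()
    container.name = name
    return container


def test_stop_only_touches_own_containers() -> None:
    own_a = make_container("netbird-kunde3-management")
    own_b = make_container("netbird-kunde3-relay")
    other = make_container("netbird-kunde30-relay")
    client = MagicMock()
    client.containers.list.return_value = [own_a, own_b, other]

    assert stop_customer_containers(client, 3) == 2
    own_a.stop.assert_called_once()
    own_b.stop.assert_called_once()
    other.stop.assert_not_called()
```

Including the trailing hyphen in the prefix is what keeps customer 3 from matching customer 30, a case worth covering explicitly once container names are derived from numeric IDs.
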
## Deployment Process

1. Clone repository
2. Run `./install.sh`
3. Access `http://server-ip:8000`
4. Complete setup wizard
5. Deploy first customer

## System Requirements Documentation

**Include in README.md:**

### For 100 Customers:
- **CPU**: 16 cores (minimum 8)
- **RAM**: 64 GB (minimum) - 128 GB (recommended)
  - Formula: `(100 customers × 600 MB) + 8 GB overhead = 68 GB`
- **Disk**: 500 GB SSD (minimum) - 1 TB recommended
- **Network**: 1 Gbps dedicated connection
- **OS**: Ubuntu 22.04 LTS or 24.04 LTS

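
The RAM formula above generalises to other customer counts. A small sketch, assuming the same 600 MB per customer and 8 GB overhead, and using 1 GB = 1000 MB as the formula does; the function name `estimate_ram_gb` is illustrative.

```python
def estimate_ram_gb(
    customers: int,
    per_customer_mb: int = 600,
    overhead_gb: int = 8,
) -> float:
    """Estimate host RAM in GB: customers x per-customer usage plus fixed overhead."""
    return customers * per_customer_mb / 1000 + overhead_gb
```

With the defaults, `estimate_ram_gb(100)` gives 68.0, matching the formula. That result sits slightly above the 64 GB listed as the minimum, so the minimum is best read as a tight lower bound.
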
### Port Requirements:
- **TCP 8000**: Web UI
- **UDP 3478-3577**: Relay/STUN (100 ports for 100 customers)

## Success Criteria

- ✅ One-command installation via `install.sh`
- ✅ Web-based configuration (no manual file editing)
- ✅ Customer deployment < 2 minutes
- ✅ All settings in database
- ✅ Automatic NPM integration
- ✅ Comprehensive error handling
- ✅ Clean, professional UI
- ✅ Full API documentation (auto-generated)
- ✅ Health monitoring
- ✅ Easy to deploy on fresh Ubuntu VM

## Special Notes for Claude Code

- **Use type hints** throughout Python code
- **Document all functions** with docstrings
- **Follow PEP 8** style guidelines
- **Create modular code**: Each service should be independently testable
- **Use async/await** where appropriate (FastAPI endpoints)
- **Provide comprehensive comments** for complex logic
- **Include error messages** that help users troubleshoot

## File Priorities

Create in this order:
1. Basic structure (directories, requirements.txt, Dockerfile, docker-compose.yml)
2. Database models and setup (models.py, database.py)
3. Core services (docker_service.py, port_manager.py)
4. API routers (start with customers.py)
5. NPM integration (npm_service.py)
6. Templates (Jinja2 files)
7. Frontend (HTML, CSS, JS)
8. Installation script
9. Documentation
10. Tests

This specification provides everything needed to build a production-ready NetBird MSP Appliance!