# Managed Registry Plan

Keystone should be self-hosted first. A fresh install should include a working build and image pipeline without requiring the user to bring an external Docker registry, S3 bucket, or separate build server.

## Product Principles

- The Keystone control node is the default build node.
- Keystone provides a first-party managed Docker registry by default.
- The managed registry stores images on local disk first.
- The registry storage path must be configurable for mounted VPS volumes.
- External registries, S3-backed storage, and dedicated build nodes are optional advanced features.
- Multi-server deployments should work out of the box after Keystone is installed.
- Registry credentials must not be persisted in operation scripts, logs, or UI-visible output.
- Old build artifacts should be pruned automatically, retaining the latest 3 successful artifacts per environment by default.
- Build and deploy should be separate phases, even when started by one user action.
- Users should be able to connect an existing Ubuntu server as a Keystone node without using a cloud provider integration.

## Default Self-Hosted Shape

When Keystone is installed on a server, that server becomes the control node. The install process should prepare:

- Keystone application services.
- Docker and Docker Compose.
- A managed `registry:2` service.
- Local registry storage.
- Generated registry credentials.
- A default build capability on the control node.

This is separate from server provisioning. Keystone needs two scripts/flows:

- `install-keystone.sh` installs Keystone itself on the control node.
- The remote provisioning script prepares other servers so they can be managed by Keystone.

Remote provisioning should continue to install Docker, configure SSH access, prepare the `keystone` user, and link the server back to Keystone. It should not be responsible for installing the Keystone application itself.

Default settings:

```text
Build node: Keystone control node
Registry: registry:2 managed by Keystone
Registry storage driver: local
Registry storage path: /home/keystone/registry/data
Image retention: latest 3 successful artifacts per environment
Auth: generated htpasswd credentials managed by Keystone
```
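
As a rough illustration of how these defaults could translate into a running service, the sketch below starts `registry:2` with htpasswd auth, local storage, and deletion enabled. The container name, the auth directory, and the loopback port binding are assumptions made for the example, not decisions recorded in this plan.

```bash
# Sketch only: container name, auth directory, and port binding are assumptions.
REGISTRY_DATA_DIR="${REGISTRY_DATA_DIR:-/home/keystone/registry/data}"
REGISTRY_AUTH_DIR="${REGISTRY_AUTH_DIR:-/home/keystone/registry/auth}"

# Start the managed registry on the loopback interface so that only the
# control node's web proxy can reach it directly.
start_managed_registry() {
    docker run --detach \
        --name keystone-registry \
        --restart unless-stopped \
        --publish 127.0.0.1:5000:5000 \
        --volume "${REGISTRY_DATA_DIR}:/var/lib/registry" \
        --volume "${REGISTRY_AUTH_DIR}:/auth:ro" \
        --env REGISTRY_AUTH=htpasswd \
        --env "REGISTRY_AUTH_HTPASSWD_REALM=Keystone Registry" \
        --env REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
        --env REGISTRY_STORAGE_DELETE_ENABLED=true \
        registry:2
}
```

Binding to `127.0.0.1` keeps the plain-HTTP port off the public interface; TLS would then be terminated by the proxy in front of it.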

The install flow should allow overriding the storage path, for example:

```text
/mnt/keystone-registry
```

This lets users place registry image data on a mounted VPS volume while keeping Keystone's default behavior simple.
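
One way the install script might resolve the path is sketched below. The function name and the rule that an override must be an absolute path are assumptions for illustration; the plan only requires that the path be configurable.

```bash
# Sketch only: the function name and the absolute-path rule are assumptions.
DEFAULT_REGISTRY_STORAGE_PATH="/home/keystone/registry/data"

# Print the storage path to use: the override if one was given, else the default.
resolve_registry_storage_path() {
    override="${1:-}"
    if [ -z "$override" ]; then
        printf '%s\n' "$DEFAULT_REGISTRY_STORAGE_PATH"
        return 0
    fi
    case "$override" in
        /*) printf '%s\n' "$override" ;;
        *)  echo "registry storage path must be absolute: $override" >&2
            return 1 ;;
    esac
}
```

For example, `resolve_registry_storage_path /mnt/keystone-registry` prints the override, while calling it with no argument prints the default path.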

## Default Image Flow

```text
Git repository
-> Keystone control node builds Docker image
-> Keystone pushes image to the managed registry
-> Target servers pull image from the managed registry
-> Target servers run containers
```

The build node and registry are separate concepts:

- Build node: where `git clone`, `docker build`, and `docker push` run.
- Registry: where built images are stored and later pulled from.

The control node is the default build node, but users should later be able to add a dedicated build node from Keystone settings.

The running Keystone server is the control node. This does not necessarily need to be represented as a normal deploy target server at first. A lightweight installation/control-node setting may be enough until Keystone needs HA control-plane support.

If Keystone later supports HA control planes, the control node concept should become more explicit so the app can distinguish between:

- The current web/queue/scheduler node.
- The active registry host.
- The default build node.
- Runtime nodes used for deployed applications.

## Registry Exposure

The managed registry should be exposed over HTTPS where possible, ideally behind the control node's web proxy, for example:

```text
registry.example.com
```

Avoid defaulting to a plain `host:5000` registry if possible. Plain HTTP registries require Docker daemon insecure-registry configuration on every build and target server, which adds onboarding friction.

Target servers must be able to reach the registry URL before they can deploy images built by Keystone.
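
A reachability check could be run on each build or target server before a deploy is planned. The sketch below probes the registry's `/v2/` base endpoint, which answers `200` when no auth is required and `401` when auth is enforced; either response shows the registry is reachable over HTTPS. The function names and the 10-second timeout are assumptions.

```bash
# Sketch only: function names and the timeout value are assumptions.

# Classify the HTTP status returned by GET /v2/.
# 200 = reachable without auth, 401 = reachable and auth is enforced.
registry_status_reachable() {
    case "$1" in
        200|401) return 0 ;;
        *)       return 1 ;;
    esac
}

# Probe the registry from the server this runs on.
check_registry_reachable() {
    registry_host="$1"
    status=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "https://${registry_host}/v2/") || status="000"
    registry_status_reachable "$status"
}
```

Because the probe uses `https://`, a certificate problem fails the check as well, which is the failure a later `docker pull` would hit.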

## Authentication

Use `registry:2` htpasswd authentication for the first version.

Keystone should:

- Generate registry credentials.
- Write the registry htpasswd file during provisioning.
- Store credentials encrypted.
- Configure build and target servers for registry access.
- Use `docker login --password-stdin` when login is needed.

Do not inline registry passwords into persisted operation scripts. Operation steps are stored and may be visible in the UI or logs.

Preferred approaches:

- Configure Docker auth on each server through a separate secure action.
- Alternatively, write root-owned or user-owned credential files on the server and have deployment scripts read from those files.

Token auth can be considered later if Keystone needs per-repository or per-server scoped credentials. It should not be part of the first implementation.
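
The credential-file approach could look roughly like the sketch below: the stored script refers only to a file path, and the password reaches `docker login` over stdin. The function name and the idea of a dedicated password file are assumptions; the file would need to be written by a separate secure action and be readable only by its owner.

```bash
# Sketch only: the function name and the password file location are assumptions.

# Log in to the registry without placing the password in arguments or scripts.
registry_login() {
    registry_host="$1"
    registry_user="$2"
    password_file="$3"

    if [ ! -r "$password_file" ]; then
        echo "registry password file is not readable: $password_file" >&2
        return 1
    fi

    # The password travels over stdin only.
    docker login "$registry_host" --username "$registry_user" --password-stdin \
        < "$password_file"
}
```

A persisted operation step would then contain only something like `registry_login registry.example.com keystone <path-to-password-file>`, which is safe to show in the UI or logs.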

## Build Planning

Build planning should assume a default managed registry exists after install.

For the default path:

- Build strategy: build on control node.
- Registry: managed local registry.
- Artifact reference: full managed registry image reference.

Multi-server deploys should no longer block because the user has not configured an external registry. They should only block if the managed registry is missing, unhealthy, or unreachable.

External registries should remain available as an advanced override.

Build strategy should not be exposed to users as low-level values such as `target_server`, `dedicated_builder`, or `external_registry`. The UI should expose intent instead:

- Default build node.
- Specific build node.
- External registry override.

Internally, build planning can still map those choices to implementation strategies.

## Build Execution

The default build execution should:

1. Select the configured build node, defaulting to the control node.
2. Clone the application repository.
3. Render the Keystone Dockerfile.
4. Log in to the managed registry.
5. Build the image.
6. Tag the image using the managed registry reference.
7. Push the image.
8. Resolve and store the registry manifest digest.

Example flow:

```bash
# The caller pipes the password to stdin, so it never appears in the stored script.
docker login registry.example.com --username keystone --password-stdin
docker build --file Dockerfile.keystone --tag registry.example.com/application:aaaaaaaaaaaa .
docker push registry.example.com/application:aaaaaaaaaaaa
# Inspect the pushed manifest in the registry rather than the local image.
docker manifest inspect registry.example.com/application:aaaaaaaaaaaa
```

The stored digest must be the registry manifest digest, not a local image ID. Digest-based pulls and registry manifest deletion depend on this being correct.

Build execution should create a build operation that can succeed or fail independently from deployment. A deployment can then depend on a successful build artifact.
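
One way to resolve the registry manifest digest is to ask the registry itself: a `HEAD` request for a manifest returns the digest in the `Docker-Content-Digest` response header. The sketch below parses that header and validates its format before it is stored. The function names and the use of a netrc file for credentials are assumptions, not part of this plan.

```bash
# Sketch only: function names and the netrc credentials file are assumptions.

# Extract the manifest digest from registry response headers read on stdin.
# Header names are case-insensitive and lines end in CRLF, so normalise both.
extract_manifest_digest() {
    tr -d '\r' | awk '
        tolower($1) == "docker-content-digest:" { print $2; exit }
    '
}

# Accept only well-formed sha256 digests before storing them.
is_sha256_digest() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Ask the registry for the digest of a pushed tag.
fetch_manifest_digest() {
    registry_host="$1"; repository="$2"; tag="$3"; netrc_file="$4"
    accept='application/vnd.docker.distribution.manifest.v2+json'
    accept="$accept, application/vnd.docker.distribution.manifest.list.v2+json"
    accept="$accept, application/vnd.oci.image.manifest.v1+json"
    accept="$accept, application/vnd.oci.image.index.v1+json"
    curl --silent --fail --head --netrc-file "$netrc_file" \
        --header "Accept: $accept" \
        "https://${registry_host}/v2/${repository}/manifests/${tag}" \
        | extract_manifest_digest
}
```

Rejecting anything that fails `is_sha256_digest` guards against accidentally storing a tag or a short local image ID in place of the digest.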

## Deploy Execution

Target servers should pull immutable image references from the managed registry.

Deploy execution should:

1. Ensure the target server has registry auth configured.
2. Pull the exact image digest.
3. Render Compose with the full registry image reference.
4. Start or update containers.

Example pull reference:

```text
registry.example.com/application@sha256:...
```

Compose should use the full registry reference, not only `sha256:...`.
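
A small helper can make the "full reference" rule hard to get wrong. The sketch below builds `<registry>/<repository>@<digest>` from its parts and prints the value that would go into the Compose `image:` key. The function names are hypothetical.

```bash
# Sketch only: function names are hypothetical.

# Build an immutable pull reference: <registry>/<repository>@sha256:<hex>.
image_ref_by_digest() {
    registry_host="$1"; repository="$2"; digest="$3"
    case "$digest" in
        sha256:*) ;;
        *) echo "expected a sha256 digest, got: $digest" >&2; return 1 ;;
    esac
    printf '%s/%s@%s\n' "$registry_host" "$repository" "$digest"
}

# Pull by digest, then print the Compose image line for the service.
pull_and_print_compose_image() {
    ref=$(image_ref_by_digest "$1" "$2" "$3") || return 1
    docker pull "$ref" >&2 || return 1
    printf 'image: %s\n' "$ref"
}
```

Pulling with the same string that is rendered into Compose means the running container is guaranteed to use the image that was pulled.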

Deploy execution should be a separate operation phase from build execution. The deploy phase should consume a completed build artifact and should not be responsible for building the artifact itself.

Operations should have explicit execution targets. Inferring the SSH target only from the operation target model becomes fragile once Keystone has build nodes, registry maintenance, and runtime deployment steps.

Each operation or operation step should be able to declare where it runs:

- Control node.
- Build node.
- Runtime server.
- Specific server.

## Pruning And Retention

Default retention should keep the latest 3 successful build artifacts per environment.

Pruning should also retain:

- Any artifact currently referenced by a service's available image digest.
- Any artifact currently referenced by a service's current image digest.
- Any artifact needed for an active deployment operation.
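
The retention rules above can be sketched as a filter over a list of artifacts. The input format (one artifact per line as `<environment> <created_epoch> <status> <digest>`), the `successful` status value, and the protected-digest file are all assumptions for the example; in Keystone this selection would more likely be a database query.

```bash
# Sketch only: the input format, status value, and file layout are assumptions.

# Print digests that may be pruned.
# stdin: one artifact per line: "<environment> <created_epoch> <status> <digest>"
# $1: file listing protected digests, one per line (current, available, in-flight)
# $2: how many successful artifacts to keep per environment (default 3)
select_prunable_artifacts() {
    protected_file="$1"
    keep="${2:-3}"
    # Newest first within each environment.
    sort -k1,1 -k2,2nr | awk -v keep="$keep" -v protected_file="$protected_file" '
        BEGIN {
            while ((getline line < protected_file) > 0) protected[line] = 1
        }
        $3 != "successful"  { next }   # only successful builds pushed an image
        ++seen[$1] <= keep  { next }   # inside the retention window
        ($4 in protected)   { next }   # still referenced by a service or deploy
        { print $4 }
    '
}
```

Protected artifacts still count toward the per-environment window, so protection can only extend what is kept and never shrinks it.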

Pruning should remove old registry manifests first, then run registry garbage collection to remove unreferenced blobs from local disk.

`registry:2` requires deletion to be enabled:

```text
REGISTRY_STORAGE_DELETE_ENABLED=true
```

Garbage collection is safest when the registry is not accepting writes. The first implementation should run cleanup during a controlled maintenance window, using a lock so pruning does not race with active builds or pushes.

Suggested cleanup flow:

1. Acquire a registry maintenance lock.
2. Find prunable artifacts by environment retention rules.
3. Delete old manifests through the registry API.
4. Stop the registry or put it in a safe maintenance state.
5. Run registry garbage collection.
6. Restart the registry.
7. Mark artifacts as pruned or delete their records.
8. Release the lock.
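
The registry-facing part of this flow (steps 1, 3 to 6, and 8) could be sketched as follows. The container name, data directory, lock directory, and netrc credentials file are assumptions; the lock here is a simple `mkdir`-based lock on the control node, whereas Keystone itself would more likely hold an application-level lock that builds and pushes also respect.

```bash
# Sketch only: container name, paths, lock mechanism, and netrc file are assumptions.
REGISTRY_CONTAINER="${REGISTRY_CONTAINER:-keystone-registry}"
REGISTRY_DATA_DIR="${REGISTRY_DATA_DIR:-/home/keystone/registry/data}"
REGISTRY_LOCK_DIR="${REGISTRY_LOCK_DIR:-/tmp/keystone-registry-maintenance.lock}"

# usage: prune_registry <registry_host> <repository> <netrc_file> <digest>...
prune_registry() {
    registry_host="$1"; repository="$2"; netrc_file="$3"
    shift 3

    # Step 1: mkdir is atomic, so it doubles as a simple maintenance lock.
    if ! mkdir "$REGISTRY_LOCK_DIR" 2>/dev/null; then
        echo "registry maintenance already in progress" >&2
        return 1
    fi

    status=0
    # Step 3: delete old manifests by digest through the registry API.
    for digest in "$@"; do
        curl --silent --fail --request DELETE --netrc-file "$netrc_file" \
            "https://${registry_host}/v2/${repository}/manifests/${digest}" \
            || { status=1; break; }
    done

    if [ "$status" -eq 0 ]; then
        # Steps 4-6: stop the registry, collect garbage offline, restart it.
        docker stop "$REGISTRY_CONTAINER" \
            && docker run --rm \
                --volume "${REGISTRY_DATA_DIR}:/var/lib/registry" \
                registry:2 garbage-collect /etc/docker/registry/config.yml \
            || status=1
        docker start "$REGISTRY_CONTAINER" || status=1
    fi

    # Step 8: always release the lock, even after a failure.
    rmdir "$REGISTRY_LOCK_DIR"
    return "$status"
}
```

Steps 2 and 7 are left to the application, since they depend on Keystone's artifact records rather than on the registry.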

## Future Extensions

These should be optional settings, not onboarding requirements:

- Dedicated build nodes.
- S3-compatible registry storage.
- External registries such as GHCR, Gitea, Docker Hub, or generic registries.
- Separate push and pull credentials.
- Credential rotation.
- Per-server or per-repository scoped auth.
- Configurable retention per application or environment.

The first version should optimize for a self-hosted user installing Keystone on a VPS and being able to deploy with minimal additional setup.

## Existing Server Provisioning

Keystone should support connecting an existing Ubuntu server as a managed node. This is important for users running VPSs, Proxmox VMs, homelab hardware, or manually provisioned servers.

The flow should be:

1. User creates a server record in Keystone as an existing server.
2. Keystone shows a one-time provisioning command.
3. User runs the command on the server as root or a sudo-capable user.
4. The script installs Docker and required packages.
5. The script creates and configures the `keystone` user.
6. The script installs Keystone's management SSH key.
7. The script calls back to Keystone with a one-time token.
8. Keystone marks the server active.

This should sit alongside cloud-provider provisioning. Cloud providers can create the VM automatically, but the same remote preparation logic should be reused where possible.

Provisioning callbacks should not authenticate only by `server_id` or IP address. They should use a short-lived, single-use provisioning token tied to the server record.

Avoid passing sensitive values such as sudo passwords in URL query strings. Safer options include:

- Generate a short-lived provisioning token and pass only that in the URL.
- Store sensitive bootstrap data server-side and let the provisioning script exchange the one-time token for the data it needs.
- Prefer SSH key-based provider bootstrap where available instead of root password bootstrap.
- If a password must be used, pass it over SSH stdin or an encrypted job payload, not through a script URL.

The remote provisioning script can still be downloaded from Keystone, but the URL should not contain long-lived secrets or reusable credentials.
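
For the callback step, the script could also keep the token out of the callback URL and out of `curl`'s command-line arguments by feeding it to `curl` as a config line on stdin. The endpoint path, the Bearer header, and the token file are hypothetical and are not Keystone's actual API.

```bash
# Sketch only: the endpoint path, Bearer header, and token file are hypothetical.

# Report back to Keystone with the one-time provisioning token.
# The token reaches curl on stdin (--config -), so it does not appear in the
# URL or in curl's process arguments.
provisioning_callback() {
    keystone_url="$1"
    token_file="$2"

    token=$(cat "$token_file") || return 1
    printf 'header = "Authorization: Bearer %s"\n' "$token" \
        | curl --silent --fail --request POST --config - \
            "${keystone_url}/provisioning/callback"
    result=$?

    # The token is single-use: drop the local copy once Keystone has accepted it.
    if [ "$result" -eq 0 ]; then
        rm -f -- "$token_file"
    fi
    return "$result"
}
```

This assumes the token contains no double quotes or backslashes, which holds for typical hex or base64url tokens.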

### Sudo Password Handling

Keep the current Forge-like user model for now:

- Provisioned servers have a `keystone` user.
- SSH login is key-only.
- The generated sudo password is for the human user, who logs in over SSH with a key and uses the password to run elevated commands manually.
- Keystone automation continues to use SSH key access and Docker/sudo-capable permissions as required.

This model is acceptable, but sudo password delivery should be hardened.

Laravel protections help with some leak paths:

- `ShouldBeEncrypted` protects queued job payloads.
- Encrypted casts protect stored secrets.
- Hidden model attributes avoid accidental serialization.
- PHP `#[\SensitiveParameter]` can prevent secret values from appearing in stack traces.

These protections do not cover query strings, shell process arguments, rendered scripts left on disk, reverse-proxy logs, or third-party request logging.

Minimal hardening plan:

1. Keep generating a sudo password for the provisioned `keystone` user.
2. Keep flashing the sudo password to the user once after server creation.
3. Add `#[\SensitiveParameter]` to job constructor parameters such as `rootPassword` and `sudoPassword`.
4. Stop passing `sudo_password` in the provision script URL.
5. Use a short-lived, single-use provisioning token in the URL instead.
6. Store the sudo password encrypted server-side until the provisioning script is rendered or exchanged.
7. Ensure the remote provisioning script deletes itself at the end of provisioning.
8. Avoid writing the plaintext sudo password to logs or long-lived files.
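
On the remote side, steps 7 and 8 might be approached as in the sketch below: an exit trap removes the bootstrap script, and the sudo password is applied from stdin so it is not placed in process arguments or written to a file. The function names are hypothetical, and the trap assumes the script path contains no single quotes.

```bash
# Sketch only: function names are hypothetical.

# Remove the given bootstrap script when the calling shell exits, on success
# or failure. Intended use near the top of the provisioning script:
#   remove_script_on_exit "$0"
remove_script_on_exit() {
    script_path="$1"
    # The path is expanded now, not at exit time.
    trap "rm -f -- '$script_path'" EXIT
}

# Set the keystone user's password from a single line on stdin, so the
# plaintext never appears in process arguments. Requires root.
set_keystone_password_from_stdin() {
    IFS= read -r sudo_password || [ -n "$sudo_password" ] || return 1
    printf 'keystone:%s\n' "$sudo_password" | chpasswd
    result=$?
    unset sudo_password
    return "$result"
}
```

The password could then be delivered over SSH stdin or via the one-time token exchange described earlier, rather than being rendered into the script body.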

The goal is to preserve the simple human-admin UX while removing avoidable secret exposure from URLs and leftover bootstrap artifacts.