# Managed Registry Plan

Keystone should be self-hosted first. A fresh install should include a working build and image pipeline without requiring the user to bring an external Docker registry, S3 bucket, or separate build server.

## Product Principles

- The Keystone control node is the default build node.
- Keystone provides a first-party managed Docker registry by default.
- The managed registry stores images on local disk first.
- The registry storage path must be configurable for mounted VPS volumes.
- External registries, S3-backed storage, and dedicated build nodes are optional advanced features.
- Multi-server deployments should work out of the box after Keystone is installed.
- Registry credentials must not be persisted in operation scripts, logs, or UI-visible output.
- Old build artifacts should be pruned automatically, retaining the latest 3 successful artifacts per environment by default.
- Build and deploy should be separate phases, even when started by one user action.
- Users should be able to connect an existing Ubuntu server as a Keystone node without using a cloud provider integration.

## Default Self-Hosted Shape

When Keystone is installed on a server, that server becomes the control node. The install process should prepare:

- Keystone application services.
- Docker and Docker Compose.
- A managed `registry:2` service.
- Local registry storage.
- Generated registry credentials.
- A default build capability on the control node.

This is separate from server provisioning. Keystone needs two scripts/flows:

- `install-keystone.sh` installs Keystone itself on the control node.
- The remote provisioning script prepares other servers so they can be managed by Keystone.

Remote provisioning should continue to install Docker, configure SSH access, prepare the `keystone` user, and link the server back to Keystone. It should not be responsible for installing the Keystone application itself.

Default settings:

```text
Build node: Keystone control node
Registry: registry:2 managed by Keystone
Registry storage driver: local
Registry storage path: /home/keystone/registry/data
Image retention: latest 3 successful artifacts per environment
Auth: generated htpasswd credentials managed by Keystone
```

The install flow should allow overriding the storage path, for example:

```text
/mnt/keystone-registry
```

This lets users place registry image data on a mounted VPS volume while keeping Keystone's default behavior simple.
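As a rough sketch of how the install flow might turn these defaults into a container launch, the helper below prints a `docker run` command for a given storage path. The function name, the `keystone-registry` container name, the auth directory, and the loopback port binding are assumptions made for illustration; the `REGISTRY_*` variables are the standard `registry:2` environment overrides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not execute) the docker run command for the
# managed registry. Container name, auth directory, and port are assumptions.
registry_run_command() {
  local storage_path="${1:-/home/keystone/registry/data}"
  local auth_dir="${2:-/home/keystone/registry/auth}"
  # Published on loopback only, on the assumption that the control node's
  # web proxy terminates HTTPS in front of the registry.
  local args=(
    docker run --detach --restart unless-stopped
    --name keystone-registry
    --publish 127.0.0.1:5000:5000
    --volume "${storage_path}:/var/lib/registry"
    --volume "${auth_dir}:/auth:ro"
    --env REGISTRY_AUTH=htpasswd
    --env "REGISTRY_AUTH_HTPASSWD_REALM=Keystone Registry"
    --env REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd
    --env REGISTRY_STORAGE_DELETE_ENABLED=true
    registry:2
  )
  # %q shell-quotes each argument so the printed line can be pasted safely.
  printf '%q ' "${args[@]}"
  printf '\n'
}
```

Calling `registry_run_command /mnt/keystone-registry` prints the same command with the mounted volume as the storage path, so the override only changes the bind mount and nothing else about the registry.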

## Default Image Flow

```text
Git repository
  -> Keystone control node builds Docker image
  -> Keystone pushes image to the managed registry
  -> Target servers pull image from the managed registry
  -> Target servers run containers
```

The build node and registry are separate concepts:

- Build node: where `git clone`, `docker build`, and `docker push` run.
- Registry: where built images are stored and later pulled from.

The control node is the default build node, but users should later be able to add a dedicated build node from Keystone settings.

The running Keystone server is the control node. It does not need to be modeled as a normal deploy target server at first; a lightweight installation-level control-node setting may be enough until Keystone needs HA control-plane support.

If Keystone later supports HA control planes, the control node concept should become more explicit so the app can distinguish between:

- The current web/queue/scheduler node.
- The active registry host.
- The default build node.
- Runtime nodes used for deployed applications.

## Registry Exposure

The managed registry should be exposed over HTTPS where possible, ideally behind the control node's web proxy, for example:

```text
registry.example.com
```

Avoid defaulting to a plain `host:5000` registry if possible. A plain HTTP registry must be listed under `insecure-registries` in the Docker daemon configuration of every build and target server, which adds onboarding friction.

Target servers must be able to reach the registry URL before they can deploy images built by Keystone.
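A reachability check along these lines could run on a target server before the first deploy. This is a minimal sketch: the function names are hypothetical, and it relies on the registry v2 API answering `GET /v2/` with `200` when no credentials are needed and `401` when they are, both of which show that the registry is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; function names are illustrative.

# Interpret the HTTP status returned by GET /v2/ on a registry.
registry_status_is_reachable() {
  case "$1" in
    200|401) return 0 ;;  # 401: the registry answered and wants credentials
    *)       return 1 ;;
  esac
}

# Probe a registry host over HTTPS from a build or target server.
registry_is_reachable() {
  local host="$1" status
  status="$(curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "https://${host}/v2/")" || status="000"
  registry_status_is_reachable "$status"
}
```

For example, `registry_is_reachable registry.example.com || echo "registry unreachable"` would surface DNS, firewall, or TLS problems before a deployment tries to pull.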

## Authentication

Use `registry:2` htpasswd authentication for the first version.

Keystone should:

- Generate registry credentials.
- Write the registry htpasswd file during provisioning.
- Store credentials encrypted.
- Configure build and target servers for registry access.
- Use `docker login --password-stdin` when login is needed.

Do not inline registry passwords into persisted operation scripts. Operation steps are stored and may be visible in the UI or logs.

Preferred approaches:

- Configure Docker auth on each server through a separate secure action.
- Or write credential files owned by root or the `keystone` user on the server and have deployment scripts read from those files.

Token auth can be considered later if Keystone needs per-repository or per-server scoped credentials. It should not be part of the first implementation.
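The credential-file approach might look like the sketch below. The function names and any file paths passed to them are hypothetical. The password is only ever moved over stdin, so it does not appear in process arguments or in a stored operation script.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; paths and function names are illustrative.

# Store a password read from stdin in a file only its owner can read.
write_registry_password_file() {
  local path="$1"
  (
    umask 077          # files created in this subshell get mode 600
    rm -f -- "$path"   # replace any existing file so the new mode applies
    cat > "$path"
  )
}

# Log in without placing the password in process arguments.
registry_login_from_file() {
  local host="$1" user="$2" path="$3"
  docker login "$host" --username "$user" --password-stdin < "$path"
}
```

A secure provisioning action could then run something like `openssl rand -base64 32 | write_registry_password_file "$path"` once, and later operation steps would only reference the file path, never the secret itself.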

## Build Planning

Build planning should assume a default managed registry exists after install.

For the default path:

- Build strategy: build on control node.
- Registry: managed local registry.
- Artifact reference: full managed registry image reference.

Multi-server deploys should no longer be blocked just because the user has not configured an external registry. They should only be blocked if the managed registry is missing, unhealthy, or unreachable.

External registries should remain available as an advanced override.

Build strategy should not be exposed to users as low-level values such as `target_server`, `dedicated_builder`, or `external_registry`. The UI should expose intent instead:

- Default build node.
- Specific build node.
- External registry override.

Internally, build planning can still map those choices to implementation strategies.

## Build Execution

The default build execution should:

1. Select the configured build node, defaulting to the control node.
2. Clone the application repository.
3. Render the Keystone Dockerfile.
4. Log in to the managed registry.
5. Build the image.
6. Tag the image using the managed registry reference.
7. Push the image.
8. Resolve and store the registry manifest digest.

Example flow:

```bash
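# The password is read from stdin (here from a hypothetical owner-only file),
# so it never appears in process arguments or in the stored operation script.
docker login registry.example.com --username keystone --password-stdin < "$REGISTRY_PASSWORD_FILE"
docker build --file Dockerfile.keystone --tag registry.example.com/application:aaaaaaaaaaaa .
docker push registry.example.com/application:aaaaaaaaaaaa
# --verbose includes the manifest descriptor, whose digest is the value to store.
docker manifest inspect --verbose registry.example.com/application:aaaaaaaaaaaa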
```

The stored digest must be the registry manifest digest, not a local image ID. Digest-based pulls and registry manifest deletion depend on this being correct.

Build execution should create a build operation that can succeed or fail independently from deployment. A deployment can then depend on a successful build artifact.
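One way to capture the manifest digest is to parse it from the `docker push` output and validate its shape before storing it. The sketch below uses hypothetical function names and assumes the push summary line has the form `<tag>: digest: sha256:<hex> size: <n>`; a real implementation may prefer to query the registry directly.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; function names are illustrative.

# Succeeds only for a well-formed sha256 manifest digest.
is_manifest_digest() {
  [[ "$1" =~ ^sha256:[0-9a-f]{64}$ ]]
}

# Read `docker push` output on stdin and print the pushed manifest digest.
# Assumes a summary line like "<tag>: digest: sha256:<hex> size: <n>".
digest_from_push_output() {
  local digest
  digest="$(sed -n 's/.*digest: \(sha256:[0-9a-f]\{64\}\).*/\1/p' | tail -n 1)"
  # Fail rather than print anything if no valid digest was found.
  is_manifest_digest "$digest" && printf '%s\n' "$digest"
}
```

Because the function fails when no digest is found, the build operation can be marked failed instead of recording a local image ID or an empty value.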

## Deploy Execution

Target servers should pull immutable image references from the managed registry.

Deploy execution should:

1. Ensure the target server has registry auth configured.
2. Pull the exact image digest.
3. Render Compose with the full registry image reference.
4. Start or update containers.

Example pull reference:

```text
registry.example.com/application@sha256:...
```

Compose should use the full registry reference, not only `sha256:...`.
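To illustrate the difference, the sketch below assembles the full pull reference from its parts and renders it into a minimal Compose file. The function names and the Compose layout are assumptions for illustration, not Keystone's actual templates.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and the Compose layout are illustrative.

# Build an immutable pull reference: <registry>/<repository>@<digest>.
image_ref_by_digest() {
  local registry="$1" repository="$2" digest="$3"
  if [[ ! "$digest" =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "refusing to deploy without a sha256 manifest digest" >&2
    return 1
  fi
  printf '%s/%s@%s\n' "$registry" "$repository" "$digest"
}

# Print a minimal Compose file that pins one service to that reference.
render_compose() {
  local service="$1" image_ref="$2"
  cat <<EOF
services:
  ${service}:
    image: ${image_ref}
    restart: unless-stopped
EOF
}
```

A bare `sha256:...` value would not tell Docker which registry or repository to pull from, which is why the rendered `image:` line carries the whole reference.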

Deploy execution should be a separate operation phase from build execution. The deploy phase should consume a completed build artifact and should not be responsible for building the artifact itself.

Operations should have explicit execution targets. Inferring the SSH target only from the operation target model becomes fragile once Keystone has build nodes, registry maintenance, and runtime deployment steps.

Each operation or operation step should be able to declare where it runs:

- Control node.
- Build node.
- Runtime server.
- Specific server.

## Pruning And Retention

Default retention should keep the latest 3 successful build artifacts per environment.

Pruning should also retain:

- Any artifact currently referenced by a service's available image digest.
- Any artifact currently referenced by a service's current image digest.
- Any artifact needed for an active deployment operation.
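The retention rules above can be sketched as a small filter. The input format, the `successful` status value, and the function name are assumptions for illustration; in Keystone this selection would more likely be a database query. Records that are not successful are ignored here rather than pruned.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Reads one environment's artifacts on stdin, one per
# line, as: <created_at_epoch> <status> <digest>
# Prints the digests of successful artifacts that are safe to prune.
# Usage: prunable_artifacts <keep_count> [pinned_digest...]
prunable_artifacts() {
  local keep="$1"; shift
  local pinned=" $* "   # digests still used by services or active deployments
  sort -k1,1nr | awk -v keep="$keep" -v pinned="$pinned" '
    $2 != "successful" { next }          # ignore records that are not successful
    { rank++ }                           # newest successful artifact has rank 1
    rank <= keep { next }                # retained: newest N successful
    index(pinned, " " $3 " ") { next }   # retained: still referenced
    { print $3 }
  '
}
```

With six artifacts, a keep count of 3, and one older digest pinned by a running service, only the unpinned artifacts beyond the newest three successful ones are printed.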

Pruning should remove old registry manifests first, then run registry garbage collection to remove unreferenced blobs from local disk.

`registry:2` rejects manifest deletion unless it is explicitly enabled:

```text
REGISTRY_STORAGE_DELETE_ENABLED=true
```

Garbage collection is only safe when the registry is not accepting writes: an image pushed while it runs can lose layers that the collector considers unreferenced. The first implementation should therefore run cleanup during a controlled maintenance window, holding a lock so pruning cannot race with active builds or pushes.

Suggested cleanup flow:

1. Acquire a registry maintenance lock.
2. Find prunable artifacts by environment retention rules.
3. Delete old manifests through the registry API.
4. Stop the registry or put it in a safe maintenance state.
5. Run registry garbage collection.
6. Restart the registry.
7. Mark artifacts as pruned or delete their records.
8. Release the lock.
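The flow above could be sketched as follows. Everything named here is an assumption for illustration: the `keystone-registry` container name, the netrc credential file, the lock directory, and the function names. It also assumes the storage path is bind-mounted at `/var/lib/registry`, the default root directory in the image's bundled config. Setting `RUN=echo` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; container name, lock path, and netrc file are
# illustrative. Set RUN=echo to print commands instead of running them.
RUN="${RUN:-}"

# Run a command while holding an exclusive lock (mkdir is atomic).
with_registry_lock() {
  local lock_dir="$1"; shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "registry maintenance already in progress" >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Delete manifests by digest, then garbage-collect with the registry stopped.
# Usage: prune_registry <host> <repository> <storage_path> <digest...>
prune_registry() {
  local host="$1" repository="$2" storage_path="$3"; shift 3
  local digest status=0
  for digest in "$@"; do
    # Credentials come from a netrc file so they stay out of process arguments.
    $RUN curl --fail --silent --netrc-file /home/keystone/registry/netrc \
      --request DELETE "https://${host}/v2/${repository}/manifests/${digest}"
  done
  $RUN docker stop keystone-registry
  # One-off container runs the collector against the same storage directory.
  $RUN docker run --rm --volume "${storage_path}:/var/lib/registry" \
    registry:2 garbage-collect /etc/docker/registry/config.yml || status=$?
  $RUN docker start keystone-registry   # restart even if collection failed
  return "$status"
}
```

Marking artifacts as pruned (step 7) is left to the application. A directory lock like this only protects a single host and can go stale if the process is killed, so an application-level lock may be the better fit once builds run on other nodes.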

## Future Extensions

These should be optional settings, not onboarding requirements:

- Dedicated build nodes.
- S3-compatible registry storage.
- External registries such as GHCR, Gitea, Docker Hub, or generic registries.
- Separate push and pull credentials.
- Credential rotation.
- Per-server or per-repository scoped auth.
- Configurable retention per application or environment.

The first version should optimize for a self-hosted user installing Keystone on a VPS and being able to deploy with minimal additional setup.

## Existing Server Provisioning

Keystone should support connecting an existing Ubuntu server as a managed node. This is important for users running VPSs, Proxmox VMs, homelab hardware, or manually provisioned servers.

The flow should be:

1. User creates a server record in Keystone as an existing server.
2. Keystone shows a one-time provisioning command.
3. User runs the command on the server as root or a sudo-capable user.
4. The script installs Docker and required packages.
5. The script creates/configures the `keystone` user.
6. The script installs Keystone's management SSH key.
7. The script calls back to Keystone with a one-time token.
8. Keystone marks the server active.

This should sit alongside cloud-provider provisioning. Cloud providers can create the VM automatically, but the same remote preparation logic should be reused where possible.

Provisioning callbacks should not authenticate only by `server_id` or IP address. They should use a short-lived, single-use provisioning token tied to the server record.

Avoid passing sensitive values such as sudo passwords in URL query strings. Safer options include:

- Generate a short-lived provisioning token and pass only that in the URL.
- Store sensitive bootstrap data server-side and let the provisioning script exchange the one-time token for the data it needs.
- Prefer SSH key-based provider bootstrap where available instead of root password bootstrap.
- If a password must be used, pass it over SSH stdin or an encrypted job payload, not through a script URL.

The remote provisioning script can still be downloaded from Keystone, but the URL should not contain long-lived secrets or reusable credentials.
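On the callback side, the provisioning script could send its one-time token in a request header supplied through curl's stdin config rather than in the URL. This is a sketch under assumptions: the `/provisioning/callback` path and the function names are hypothetical, and the Bearer scheme is just one possible choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the endpoint path and function names are illustrative.

# Print a curl config that carries the one-time token as a request header.
provisioning_curl_config() {
  local token="$1"
  printf 'header = "Authorization: Bearer %s"\n' "$token"
}

# Report back to Keystone. curl reads the header from stdin (--config -),
# so the token is not part of the URL or of curl's command line.
provisioning_callback() {
  local keystone_url="$1" token="$2"
  provisioning_curl_config "$token" |
    curl --fail --silent --show-error --request POST --config - \
      "${keystone_url}/provisioning/callback"
}
```

Keeping the token out of the callback URL means it is less likely to end up in reverse-proxy access logs, which typically record the request line but not headers.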

### Sudo Password Handling

Keep the current Forge-like user model for now:

- Provisioned servers have a `keystone` user.
- SSH login is key-only.
- The generated sudo password is for the human user to SSH in and run elevated commands manually.
- Keystone automation continues to use SSH key access and Docker/sudo-capable permissions as required.

This model is acceptable, but sudo password delivery should be hardened.

Laravel protections help with some leak paths:

- `ShouldBeEncrypted` protects queued job payloads.
- Encrypted casts protect stored secrets.
- Hidden model attributes avoid accidental serialization.
- PHP's `#[\SensitiveParameter]` attribute (PHP 8.2 and later) redacts the marked argument values in stack traces.

These protections do not cover query strings, shell process arguments, rendered scripts left on disk, reverse-proxy logs, or third-party request logging.

Minimal hardening plan:

1. Keep generating a sudo password for the provisioned `keystone` user.
2. Keep flashing the sudo password to the user once after server creation.
3. Add `#[\SensitiveParameter]` to job constructor parameters such as `rootPassword` and `sudoPassword`.
4. Stop passing `sudo_password` in the provision script URL.
5. Use a short-lived, single-use provisioning token in the URL instead.
6. Store the sudo password encrypted server-side until the provisioning script is rendered or the provisioning token is exchanged.
7. Ensure the remote provisioning script deletes itself at the end of provisioning.
8. Avoid writing the plaintext sudo password to logs or long-lived files.
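For step 7, a provisioning script saved to disk could register its own removal as soon as it starts. A minimal sketch, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper; the function name is illustrative.

# Remove the given script file when the current shell exits, whether
# provisioning succeeded or failed.
self_delete_on_exit() {
  # Resolve to an absolute path now, in case the script changes directory later.
  _keystone_script_path="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  trap 'rm -f -- "$_keystone_script_path"' EXIT
}
```

The script would call `self_delete_on_exit "$0"` near the top. This only makes sense when the script was downloaded to a file; a script piped directly into the shell leaves no file to clean up.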

The goal is to preserve the simple human-admin UX while removing avoidable secret exposure from URLs and leftover bootstrap artifacts.