Implement Keystone environment deployments
220
docs/implementation-review.md
Normal file
@@ -0,0 +1,220 @@
|
||||
# Keystone Implementation Review — Gaps vs `docs/implementation-spec.md`
|
||||
|
||||
The schema/migrations/models are about 98% correct. The orchestration, drivers, UI, and tests have substantial gaps. Below are concrete, file-anchored issues to fix.
|
||||
|
||||
## Critical orchestration bugs
|
||||
|
||||
### 1. Operation parent/child hierarchy is flat — replicas are siblings of their service_deploy, not children
|
||||
**Spec §3 example** nests `service_deploy → replica_deploy`. **`app/Jobs/Environments/DeployEnvironment.php:74-82`** creates both as siblings under the same `environment_deploy` parent. The `replica_deploy` siblings rely on `RunStep::dispatchNextSiblingOperation` (`app/Jobs/Services/RunStep.php:120-140`) ordering by `id`, which is fragile. In addition, if `service_deploy` fails, the replica operations are not cancelled.
|
||||
**Fix:** Nest replica operations under their service's `service_deploy` operation (parent_id = service_deploy.id), and cascade-cancel children when a parent fails.
|
||||
|
||||
### 2. Failed operations don't cancel siblings or children
|
||||
`RunStep::failed` (`app/Jobs/Services/RunStep.php:163-178`) only cancels the failed operation's remaining steps. Sibling/child operations under the same parent continue to dispatch via `dispatchNextSiblingOperation`. A failing service deploy will still trigger gateway cutover.
|
||||
**Fix:** On step failure, mark the parent operation as `FAILED` and every unfinished sibling and descendant operation as `CANCELLED`. Re-check the parent's status before dispatching the next sibling.
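A minimal sketch of such a cascade, assuming operations are held as plain arrays keyed by id with `parent_id` and `status` fields. The function names and the lowercase status strings are illustrative, not the actual `Operation` model or its enum:

```php
<?php

declare(strict_types=1);

/**
 * Collect the ids of every descendant of $rootId (children, grandchildren, ...).
 *
 * @param array<int, array{parent_id: ?int, status: string}> $operations keyed by operation id
 * @return list<int>
 */
function descendantIds(array $operations, int $rootId): array
{
    $found = [];
    $queue = [$rootId];

    while ($queue !== []) {
        $current = array_shift($queue);
        foreach ($operations as $id => $operation) {
            if ($operation['parent_id'] === $current) {
                $found[] = $id;
                $queue[] = $id;
            }
        }
    }

    return $found;
}

/**
 * Mark the failed operation and all of its ancestors as failed, then cancel
 * every operation in the same tree that has not finished yet.
 */
function cascadeFailure(array $operations, int $failedId): array
{
    $operations[$failedId]['status'] = 'failed';

    // Walk up to the root, failing each ancestor on the way.
    $rootId = $failedId;
    while ($operations[$rootId]['parent_id'] !== null) {
        $rootId = $operations[$rootId]['parent_id'];
        $operations[$rootId]['status'] = 'failed';
    }

    foreach (descendantIds($operations, $rootId) as $id) {
        if (in_array($operations[$id]['status'], ['pending', 'running'], true)) {
            $operations[$id]['status'] = 'cancelled';
        }
    }

    return $operations;
}
```

Completed operations are left untouched, so the audit trail still shows what ran before the failure.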
|
||||
|
||||
### 3. Gateway cutover uses a hardcoded container name that doesn't exist
|
||||
`DeployEnvironment.php:202` runs `docker exec keystone-caddy caddy reload ...`, but Caddy replicas are created with name `keystone-service-{service->id}-{N}` per `DeployEnvironment.php:155`. The `keystone-caddy` container is never created — cutover will always fail.
|
||||
**Fix:** Look up the Caddy service's replica container name; or set a stable container_name in Caddy compose.
|
||||
|
||||
### 4. Gateway cutover is monolithic; no add-upstream/reload/drain sub-sequence
|
||||
**Spec §15** requires: render new replica → health check → add new upstream → reload → drain old → stop old. `DeployEnvironment.php:201-206` does only `caddy reload && sleep 10 && stop draining`. There's no add-upstream step (Caddyfile is fully overwritten at `slice_configure` time), no real health check during cutover, and `sleep 10` is an arbitrary drain.
|
||||
**Fix:** Split into separate steps with explicit ordering, and tie drain to active connections / Caddy upstream health.
|
||||
|
||||
### 5. `dispatchChildOperations` dispatches only the first child's first step
|
||||
`DeployEnvironment.php:377-387` dispatches a single step. Continuation depends on `RunStep::dispatchNextSiblingOperation` chasing siblings by id. If a child operation has zero steps (e.g. `service_deploy` for a service whose driver returned no plan), the chain dies silently.
|
||||
**Fix:** Make a single orchestrator (e.g. `DispatchOperationChain`) that knows how to walk the tree. Don't rely on implicit id-ordering between independently-created sibling operations.
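One way such an orchestrator could derive its execution order is sketched below. It assumes each operation carries an explicit `order` field (hypothetical, introduced here so ordering no longer depends on ids) and that a parent's own steps run before its children. Operations with zero steps are skipped rather than ending the chain:

```php
<?php

declare(strict_types=1);

/**
 * Return operation ids in depth-first order, skipping operations without steps
 * while still descending into their children.
 *
 * @param array<int, array{parent_id: ?int, order: int, steps: list<string>}> $operations keyed by id
 * @return list<int>
 */
function executionOrder(array $operations, ?int $parentId = null): array
{
    $children = array_filter(
        $operations,
        fn (array $operation): bool => $operation['parent_id'] === $parentId
    );
    // uasort keeps the operation ids as keys while ordering by the explicit field.
    uasort($children, fn (array $a, array $b): int => $a['order'] <=> $b['order']);

    $ordered = [];
    foreach ($children as $id => $operation) {
        if ($operation['steps'] !== []) {
            $ordered[] = $id;
        }
        $ordered = array_merge($ordered, executionOrder($operations, $id));
    }

    return $ordered;
}
```

Because the walk is computed from the tree rather than chased step by step, an empty `service_deploy` no longer leaves its siblings undispatched.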
|
||||
|
||||
## Driver contract & implementations
|
||||
|
||||
### 6. `Driver` base contract is anemic
|
||||
**Spec §9** lists 13 required driver capabilities. `app/Drivers/Driver.php` only declares `__construct` and `getOperationPlan`. Image policy, ports, volumes, env schema, health checks, resource defaults, slice types, env exports, firewall, update behavior are scattered across drivers without contractual enforcement.
|
||||
**Fix:** Define an interface with explicit methods (`type()`, `versionTrack()`, `defaultPorts()`, `firewallRules()`, `updateBehavior()`, etc.) and assert per-driver via tests.
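A rough sketch of what that contract might look like, limited to the methods named above. The interface name, return types, and the example driver's values are assumptions for illustration; the full list of 13 capabilities lives in Spec §9 and is not reproduced here:

```php
<?php

declare(strict_types=1);

interface ServiceDriver
{
    public function type(): string;

    public function versionTrack(): string;

    /** @return list<int> container ports the service listens on */
    public function defaultPorts(): array;

    /** @return list<string> driver-defined firewall rule descriptions */
    public function firewallRules(): array;

    public function updateBehavior(): string;
}

// Hypothetical implementation, used only to show the contract being enforced.
final class ExampleValkeyDriver implements ServiceDriver
{
    public function type(): string
    {
        return 'valkey';
    }

    public function versionTrack(): string
    {
        return '8';
    }

    public function defaultPorts(): array
    {
        return [6379];
    }

    public function firewallRules(): array
    {
        return []; // placeholder: no public exposure in this example
    }

    public function updateBehavior(): string
    {
        return 'stateful'; // placeholder value
    }
}
```

With an interface in place, a driver that omits a capability fails at class load time instead of at deploy time.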
|
||||
|
||||
### 7. `Caddy2Driver::buildCaddyfile()` reads an undefined field — dead code
|
||||
`app/Drivers/Caddy/Caddy2Driver.php:46` references `$this->service->credentials['backend']`. Nothing ever sets this. The actual Caddyfile is rendered inline by `DeployEnvironment::configureCaddyRouteScript` (`app/Jobs/Environments/DeployEnvironment.php:321-335`).
|
||||
**Fix:** Delete `buildCaddyfile()`, or move Caddyfile generation into the driver and remove the duplicate in `DeployEnvironment`.
|
||||
|
||||
### 8. Postgres18Driver has no slice provisioning in its operation plan
|
||||
**Spec §12:** "Creating a Postgres database/user should run as a slice operation against an existing Postgres replica, not redeploy the Postgres container." `AttachManagedService::createSliceProvisionOperation` (`app/Actions/Environments/AttachManagedService.php:108-125`) hardcodes the SQL script outside the driver. Slice provisioning logic belongs in the driver so other Postgres versions can implement it.
|
||||
**Fix:** Add `provisionSliceScript(ServiceSlice $slice): string` to the slice contract; have Postgres driver own the SQL.
|
||||
|
||||
### 9. Postgres provision script assumes a `keystone` admin user that is never created
|
||||
`AttachManagedService.php:133` uses `($service->credentials ?? [])['user'] ?? 'keystone'`. The Postgres service is never seeded with admin credentials and the compose for Postgres never sets `POSTGRES_USER=keystone`. This will fail in production.
|
||||
**Fix:** Establish admin credentials on Postgres service creation (write to `service->credentials`); pass via `POSTGRES_USER`/`POSTGRES_PASSWORD` in compose env.
|
||||
|
||||
### 10. Stateful update steps are placeholder strings that won't actually run
|
||||
`app/Actions/Services/CreateStatefulServiceUpdateOperation.php:38-42`:

- `'docker compose down'`: no `-f path`, so it runs in the SSH user's home directory.
- `'docker volume ls'`: listing isn't "preserving"; it's a no-op.
- `'docker compose up -d'`: no `-f path`, no image digest, no env update.
- `'docker compose ps'`: not a real health check.

Spec §11 specifies a real sequence: stop → preserve named volume → start new with updated digest → health check.
|
||||
**Fix:** Build steps from the driver against the service's actual compose path; verify the named volume exists before/after; replace healthcheck stub with `docker inspect --format '{{.State.Health.Status}}'` polling.
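The polling step could be generated along these lines. This is a sketch only: the function name, attempt count, and sleep interval are assumptions, and it presumes the container defines a Docker `HEALTHCHECK` (otherwise `.State.Health` is absent and the loop times out):

```php
<?php

declare(strict_types=1);

/**
 * Build a shell script that polls a container's Docker health status
 * until it reports "healthy" or the attempts run out.
 */
function healthPollScript(string $containerName, int $attempts = 30, int $sleepSeconds = 2): string
{
    $container = escapeshellarg($containerName);

    return implode("\n", [
        sprintf('for attempt in $(seq 1 %d); do', $attempts),
        sprintf(
            "  status=\$(docker inspect --format '{{.State.Health.Status}}' %s 2>/dev/null || true)",
            $container
        ),
        '  if [ "$status" = "healthy" ]; then exit 0; fi',
        sprintf('  sleep %d', $sleepSeconds),
        'done',
        'echo "container did not become healthy in time" >&2',
        'exit 1',
    ]);
}
```

The script exits non-zero on timeout, so the step fails in the same way as any other failed command.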
|
||||
|
||||
### 11. Stateful update doesn't write the updated digest into compose before restart
|
||||
The operation sets `available_image_digest` on the service (line 55) but the compose file on disk is not re-rendered. `docker compose up -d` will pull from whatever digest is currently in the compose, not the new one.
|
||||
**Fix:** Insert a "Render compose with new digest" step before the start step.
|
||||
|
||||
### 12. Valkey driver doesn't emit role-based env vars
|
||||
`app/Drivers/Valkey/Valkey8Driver.php:42-48` only emits `REDIS_HOST` and `REDIS_PORT`. **Spec §13** explicitly recommends `CACHE_STORE=redis`, `SESSION_DRIVER=redis`, `QUEUE_CONNECTION=redis` based on attachment role.
|
||||
**Fix:** Read the `EnvironmentAttachment.role` and add the appropriate Laravel env defaults (with the "Do not silently change queue behavior without confirmation" guard from §12).
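A small sketch of how the role could map to Laravel defaults. The function name and the `$queueChangeConfirmed` flag are hypothetical, and placing `SESSION_DRIVER` under the `cache` role is an assumption, since the review does not say which role owns it:

```php
<?php

declare(strict_types=1);

/**
 * Return the Laravel env defaults implied by an attachment role.
 * Queue behavior is only changed when the caller has confirmed it.
 *
 * @return array<string, string>
 */
function laravelDefaultsForRole(string $role, bool $queueChangeConfirmed = false): array
{
    return match ($role) {
        'cache' => ['CACHE_STORE' => 'redis', 'SESSION_DRIVER' => 'redis'],
        'queue' => $queueChangeConfirmed ? ['QUEUE_CONNECTION' => 'redis'] : [],
        default => [],
    };
}
```

Returning an empty array for an unconfirmed queue attachment keeps the existing `QUEUE_CONNECTION` untouched until the user opts in.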
|
||||
|
||||
### 13. Valkey has no logical-DB isolation
|
||||
`AttachManagedService.php:64-71` creates a `logical_database` slice but never assigns a Redis database index (`REDIS_DB`). All environments attached to the same Valkey service share DB 0.
|
||||
**Fix:** Assign `REDIS_DB` per slice; include in `environmentExportsForSlice`.
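Index assignment could be as simple as taking the lowest unused number. The sketch below assumes the stock limit of 16 logical databases (the default `databases` setting) and uses hypothetical function names; where the used indexes are stored is left open:

```php
<?php

declare(strict_types=1);

/**
 * Pick the lowest logical database index not already taken by another slice.
 *
 * @param list<int> $usedIndexes indexes already assigned on this Valkey service
 */
function nextFreeDatabaseIndex(array $usedIndexes, int $databaseCount = 16): int
{
    for ($index = 0; $index < $databaseCount; $index++) {
        if (!in_array($index, $usedIndexes, true)) {
            return $index;
        }
    }

    throw new RuntimeException('No free logical database index left on this Valkey service.');
}

/**
 * Env exports for one slice, including the per-slice database index.
 *
 * @return array<string, string>
 */
function valkeySliceExports(string $host, int $port, int $databaseIndex): array
{
    return [
        'REDIS_HOST' => $host,
        'REDIS_PORT' => (string) $port,
        'REDIS_DB' => (string) $databaseIndex,
    ];
}
```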
|
||||
|
||||
### 14. Caddy and Valkey slices have no `SLICE_PROVISION` operation
|
||||
`AttachManagedService::createSliceProvisionOperation` (line 110) early-returns unless `service->type === POSTGRES`. Caddy routes and Valkey logical DBs are created in the DB but never reconciled to the running service.
|
||||
**Fix:** Emit slice operations for all service types whose driver supports slices.
|
||||
|
||||
### 15. Postgres driver doesn't export DB_* at the service level
|
||||
`Postgres18Driver::environmentExports()` returns `[]`. That is acceptable on its own, but it means only slice exports produce variables: if an attachment has no slice and the Laravel app references `DB_HOST`, nothing sets it.
|
||||
**Fix:** Either guarantee a slice always exists for Postgres attachments, or emit DB_HOST at the service level via the attachment.
|
||||
|
||||
## Deployment flow
|
||||
|
||||
### 16. Migration timing is hardcoded to pre_switch
|
||||
`DeployEnvironment::serviceDeployScripts` (`app/Jobs/Environments/DeployEnvironment.php:236-238`) always emits the migration step before "Deploy replicas". **Spec §18** lists `migration_timing: pre_switch | post_switch` on service config — never read.
|
||||
**Fix:** Check `$service->config['migration_timing']` and either emit the migration step before replicas or after the gateway cutover.
|
||||
|
||||
### 17. Migration mode `manual` is ignored
|
||||
`migrationScript` (lines 292-301) only short-circuits when `migration_mode=disabled`. With `manual`, it still auto-runs `php artisan migrate --force` during environment deploy. Spec §18 says manual mode should not run automatically.
|
||||
**Fix:** Treat `manual` the same as `disabled` for environment deploys; only the dedicated `environment-migrations.store` controller should run it.
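Gaps #16 and #17 can be handled by a single decision function, sketched below. The function name is hypothetical, and the fallbacks (`auto` mode, `pre_switch` timing) are assumptions chosen to match the current behavior:

```php
<?php

declare(strict_types=1);

/**
 * Decide where the migration step belongs in an environment deploy.
 * Returns null when the deploy must not run migrations at all.
 *
 * @param array<string, mixed> $config the service's config array
 */
function migrationPlacement(array $config): ?string
{
    $mode = $config['migration_mode'] ?? 'auto';

    // Neither disabled nor manual mode may run migrations during a deploy.
    if ($mode === 'disabled' || $mode === 'manual') {
        return null;
    }

    return ($config['migration_timing'] ?? 'pre_switch') === 'post_switch'
        ? 'post_switch'
        : 'pre_switch';
}
```

The deploy job would then emit the migration step before the replica steps for `pre_switch`, after the gateway cutover for `post_switch`, and not at all for `null`.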
|
||||
|
||||
### 18. Two parallel migration code paths
|
||||
`DeployEnvironment::migrationScript` (line 292), `LaravelRuntimeDriver::getOperationPlan`, and `EnvironmentMigrationController` all emit migration scripts independently. They will drift.
|
||||
**Fix:** Centralize in one place (driver method or dedicated action) and call from all three.
|
||||
|
||||
### 19. "Update gateway routes" step is a no-op
|
||||
`DeployEnvironment.php:248-250`:
|
||||
```
'script' => 'test -f /home/keystone/gateway/Caddyfile',
```
|
||||
This just checks file presence. The actual route update is in a separate `slice_configure` operation, so this step is dead code in the service-deploy chain.
|
||||
**Fix:** Either remove the step or have it actually trigger the route update for this service.
|
||||
|
||||
### 20. Pre-switch service steps from spec §17 step 6 are missing entirely
|
||||
No "pre-switch" hooks are emitted by `DeployEnvironment` or any driver.
|
||||
**Fix:** Add a `preSwitchSteps(): array` driver capability and call it before the migration/replica steps.
|
||||
|
||||
### 21. Multi-server replica placement is not implemented
|
||||
`DeployEnvironment::ensureServiceReplicas` (line 147-171) always assigns `server_id = $service->server_id`. There is no way to place replicas across multiple servers — yet the registry-required check at line 39-41 assumes multi-server deployments exist.
|
||||
**Fix:** Either ship single-server-only v1 and remove the multi-server gate, or wire `Service.process_roles`/placement policy to multiple servers in `ensureServiceReplicas`.
|
||||
|
||||
### 22. No explicit `docker pull <ref>@<digest>` on target servers for multi-server
|
||||
`replicaDeployScripts` (line 261-290) does `docker compose up -d` only; for multi-server, target servers need to pull from the registry by digest. There is no pull step.
|
||||
**Fix:** Add `docker pull` step using `registry_ref` + digest before the `up -d` step on each target server.
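A sketch of the pull step's script, assuming `registry_ref` holds the repository reference without a tag (for example `registry.example.com/acme/app`, a made-up value) and that digests are `sha256`. The function name is hypothetical:

```php
<?php

declare(strict_types=1);

/**
 * Build the shell command that pulls an image by immutable digest.
 */
function pullByDigestScript(string $registryRef, string $digest): string
{
    // Reject anything that is not a well-formed sha256 digest before it reaches a shell.
    if (preg_match('/^sha256:[0-9a-f]{64}$/', $digest) !== 1) {
        throw new InvalidArgumentException('Expected a digest of the form sha256:<64 hex chars>.');
    }

    return 'docker pull ' . escapeshellarg($registryRef . '@' . $digest);
}
```

Pulling by `name@digest` rather than by tag means every target server runs the same image bytes, as Spec §6 requires.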
|
||||
|
||||
### 23. Build strategy `dedicated_builder` and `external_registry` are not enforced
|
||||
`BuildApplicationArtifact::execute` (`app/Actions/Environments/BuildApplicationArtifact.php:30`) picks any server via `buildServer()`. The push is only conditionally added when strategy === `EXTERNAL_REGISTRY` (line 89). There's no enforcement that `dedicated_builder` requires a builder service to exist, nor that `external_registry` skips local build entirely.
|
||||
**Fix:** Branch on strategy at the top of `execute()`; for `external_registry`, skip the build and resolve the digest from the registry (`docker manifest inspect`); for `dedicated_builder`, fail if no builder service is provisioned.
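The branching might look like the sketch below. The step labels (`build`, `push`, `resolve_digest`) and the function name are placeholders for whatever the action actually emits; only the three strategy names come from the spec:

```php
<?php

declare(strict_types=1);

/**
 * Return the high-level build steps for a strategy, failing early when
 * the strategy's preconditions are not met.
 *
 * @return list<string>
 */
function buildSteps(string $strategy, bool $builderServiceExists): array
{
    switch ($strategy) {
        case 'external_registry':
            // No local build: the image already exists, only its digest is needed.
            return ['resolve_digest'];

        case 'dedicated_builder':
            if (!$builderServiceExists) {
                throw new RuntimeException('dedicated_builder requires a provisioned builder service.');
            }

            return ['build', 'push'];

        case 'target_server':
            return ['build'];

        default:
            throw new InvalidArgumentException("Unknown build strategy: {$strategy}");
    }
}
```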
|
||||
|
||||
### 24. Scheduler placement is not enforced at runtime
|
||||
**Spec §8:** `single` mode runs `schedule:run` on exactly one replica; `every_replica` runs on all. `LaravelRuntimeDriver` sets `AUTORUN_LARAVEL_SCHEDULER=true` based on the service's `process_roles`, but nothing applies the env per-replica based on `scheduler_target_service_id`/`scheduler_mode`. All replicas of the target service end up with the env var.
|
||||
**Fix:** When generating replica config, only emit `AUTORUN_LARAVEL_SCHEDULER=true` on the elected replica when `scheduler_mode=single`. The existing `PlanEnvironmentDeployment::blockers` (`app/Actions/Environments/PlanEnvironmentDeployment.php:79-87`) is a pre-flight check, not an enforcement.
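A minimal sketch of per-replica enforcement. Electing the replica with the lowest id is an assumption made for the example; any stable election rule would do, and the function name is hypothetical:

```php
<?php

declare(strict_types=1);

/**
 * Compute the scheduler env var for each replica of the target service.
 *
 * @param list<int> $replicaIds
 * @return array<int, array<string, string>> env vars keyed by replica id
 */
function schedulerEnvByReplica(array $replicaIds, string $schedulerMode): array
{
    sort($replicaIds); // lowest id first, so position 0 is the elected replica

    $env = [];
    foreach ($replicaIds as $position => $replicaId) {
        $enabled = $schedulerMode === 'every_replica' || $position === 0;
        $env[$replicaId] = ['AUTORUN_LARAVEL_SCHEDULER' => $enabled ? 'true' : 'false'];
    }

    return $env;
}
```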
|
||||
|
||||
### 25. Compose file is generated via shell heredoc instead of a real upload
|
||||
`composeUploadScript` (line 303-319) inlines the compose body in a `cat <<'KEYSTONE_COMPOSE'` heredoc. Any quoting issue (binary, single-quote in env, large file) breaks. Spec §16 implies generated artifacts should be transferred via SSH/SCP.
|
||||
**Fix:** Use SCP (or `ssh ... 'cat > path'` with binary-safe encoding) instead of a heredoc. Also note that only the compose file is currently uploaded, so any `.env` reference in the compose file points at a file that does not exist on disk (see #26).
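If the upload stays script-based, one binary-safe option is to base64-encode the file and decode it on the server. The sketch below assumes GNU `base64 -d` is available on the target and uses a hypothetical function name. It fixes the quoting problems but not the size limit of an inline script, so SCP or SFTP remains the better choice for large artifacts:

```php
<?php

declare(strict_types=1);

/**
 * Build a shell command that writes $contents to $remotePath without
 * exposing the contents to shell quoting rules.
 */
function uploadScript(string $remotePath, string $contents): string
{
    // Base64 output only contains A-Z, a-z, 0-9, +, / and =, so it is safe inside single quotes.
    return sprintf(
        "printf '%%s' %s | base64 -d > %s",
        escapeshellarg(base64_encode($contents)),
        escapeshellarg($remotePath)
    );
}
```

The same helper would work for the `.env` file, since neither single quotes nor newlines in values can break the command.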
|
||||
|
||||
### 26. `.env` files never written to disk
|
||||
**Spec §16** layout includes `/home/keystone/services/<service-id>/.env`. `composeUploadScript` writes only `compose.yml`. If the compose `env_file: .env` directive is rendered, the deploy will fail.
|
||||
**Fix:** Render and upload `.env` alongside `compose.yml`.
|
||||
|
||||
## Models, schema, and encryption
|
||||
|
||||
### 27. `Service.credentials` is a plain `text` column while the model casts it as `encrypted:array`

Migration `database/migrations/2025_03_27_121050_create_services_table.php:35` declares `text('credentials')`, while `app/Models/Service.php:36` casts it as `encrypted:array`. The cast works: Laravel's `encrypted:*` casts encrypt on write, so the stored value is ciphertext. The problem is readability, since a `text` column with no explicit `nullable()` and no note about encryption hides that intent.

**Fix:** Add `->nullable()` and a comment in the migration noting the field is encrypted at the model layer.
|
||||
|
||||
### 28. `EnvironmentVariable.value` cast missing array-vs-string handling
|
||||
Spec doesn't require it, but worth flagging: the cast is `encrypted` (scalar), which means complex values (JSON-encoded secrets) need explicit json_encode by callers. Currently `AttachManagedService.php:97-104` always passes scalar values, so this is OK.
|
||||
|
||||
## UI and onboarding
|
||||
|
||||
### 29. Onboarding (Spec §19) is not implemented
|
||||
No onboarding controller, no routes, no Inertia pages. The spec calls for a guided flow: organisation → provider → source → deploy key → registry → server → app/env → attachments. Currently users must navigate disjoint pages manually.
|
||||
**Fix:** Add an `OnboardingController` with a state machine on `Organisation` (or session-based progress) plus an Inertia wizard.
|
||||
|
||||
### 30. Service detail/edit pages are missing
|
||||
`resources/js/pages/services/` only contains `updates/Create.vue`. There's no Index/Show/Edit. Spec §20 Phase 6 calls for "services under an environment with sensible defaults".
|
||||
**Fix:** Add service Show/Edit pages, including replica health, slices, and one-click update.
|
||||
|
||||
### 31. No "managed attachment" guided flow
|
||||
`resources/js/pages/environment-attachments/Create.vue` exposes a raw service list. Spec §12 + §20 call for managed flows for Postgres / Valkey / Caddy with auto-defaulted slices.
|
||||
**Fix:** Build a guided picker per role (database / cache / queue / storage / gateway) that filters services to compatible types and previews the generated slice + env vars.
|
||||
|
||||
### 32. Deploy policies are exposed in the UI instead of hidden by default
|
||||
Spec §20 Phase 6: "Hide deploy policies by default." Currently they're set/exposed via `Service` form requests (`StoreServiceRequest`). No UI hiding.
|
||||
**Fix:** Don't expose `deploy_policy` in service create/edit UI; rely on driver-provided defaults from spec §2.
|
||||
|
||||
### 33. `resources/js/pages/applications/Show.vue` has a stale stub comment
|
||||
Line 72: `<!-- Add instance button would go here -->` — references the removed `Instance` model.
|
||||
**Fix:** Remove the stale comment; add the actual "New environment" button.
|
||||
|
||||
### 34. `resources/js/pages/servers/Index.vue` contains a `@todo pagination` literal
|
||||
Ship-ready code shouldn't have TODOs visible to the user.
|
||||
**Fix:** Either implement pagination or remove the comment.
|
||||
|
||||
### 35. No UI surfaces variable source / overridable
|
||||
`EnvironmentVariableController::store` hardcodes `source=USER`. The UI in `environment-variables/Create.vue` provides no way to see managed vs. user vars, and the spec §13 requires the source/overridable badge.
|
||||
**Fix:** Read variables on the environment Show page grouped by source; for `managed_attachment` rows, show a "managed by Postgres slice X" badge and disable editing unless `overridable`.
|
||||
|
||||
### 36. No environment Show page; deploys/migrations etc. are triggered from `applications/Show.vue` per-environment row
|
||||
This works but spec §20 Phase 6 wants environments to be the primary surface. Currently there's no environment detail page where services, replicas, slices, attachments, env vars, and operations are visible together.
|
||||
**Fix:** Add `environments/Show.vue` and route `applications/Show.vue`'s environment row to it.
|
||||
|
||||
## Tests — coverage gaps
|
||||
|
||||
### 37. `EnvironmentDeploymentControllerTest` only asserts dispatch
|
||||
The test is short and makes no state assertions after the job runs.
|
||||
**Fix:** Replace `Bus::fake()` with running the job inline; assert that operations, steps, replica records, compose files, and env vars are all in expected state.
|
||||
|
||||
### 38. No test asserts deploy key cleanup after build
|
||||
`BuildApplicationArtifact.php:100-101` adds the cleanup trap. No test verifies the trap actually runs or that the key file is gone.
|
||||
**Fix:** In `BuildApplicationArtifactTest`, fake the remote runner and assert the build script contains `trap cleanup EXIT` AND that the operation_dir path resolves under `/home/keystone/operations/`.
|
||||
|
||||
### 39. No test asserts the named volume naming convention
|
||||
Spec §10 names volumes `keystone_service_<id>_postgres_data`. No test in `ComposeRendererTest` checks volume name format.
|
||||
**Fix:** Snapshot-test or regex-assert the volume name pattern in `ComposeRendererTest`.
|
||||
|
||||
### 40. No test for parent-child operation chain executing end-to-end
|
||||
`DeployEnvironmentJobTest` creates operations but never runs the resulting `RunStep` jobs through `Queue::handleAll()`-style flow.
|
||||
**Fix:** Add an integration test that fakes the SSH layer (return canned success) and lets the chain run, asserting each operation transitions to `COMPLETED` in the correct order.
|
||||
|
||||
### 41. No test for cancellation cascade on failure
|
||||
None of the test files exercise `RunStep::failed` with sibling/child cancellation expectations (because that behavior doesn't exist yet — see gap #2).
|
||||
**Fix:** Add a test that fails a step mid-chain and asserts all later operations are `CANCELLED`.
|
||||
|
||||
### 42. No test for stateful update flow against a Postgres service
|
||||
`StatefulServiceUpdateTest` likely asserts only operation/step rows. It also needs to assert on the rendered scripts: the compose path is correct, the named volume is preserved, and the new digest is written.
|
||||
**Fix:** Strengthen assertions to validate the full step contents.
|
||||
|
||||
### 43. No test for multi-server build/push/pull
|
||||
`BuildArtifactPlanningTest` checks `requiresRegistry=true` but no test asserts a `docker push` and per-target `docker pull` actually occur.
|
||||
**Fix:** Add a job-level test with two-server topology and assert each target's deploy script includes a `docker pull <ref>@<digest>` before `compose up`.
|
||||
|
||||
### 44. No test that `manual`/`disabled` migration modes are honored
|
||||
**Fix:** Parametrized test asserting `migrationScript` returns `'true'` for `disabled` and `manual` modes, and the real command for `auto`.
|
||||
|
||||
### 45. No test for scheduler enforcement per replica
|
||||
**Fix:** Test that for `scheduler_mode=single`, only one replica's rendered env has `AUTORUN_LARAVEL_SCHEDULER=true`; for `every_replica`, all do.
|
||||
|
||||
### 46. No test that managed attachment auto-creates slices for Valkey + Caddy
|
||||
`ManagedAttachmentTest` likely tests Postgres only.
|
||||
**Fix:** Extend dataset to cover Valkey logical_database and Caddy route slices, with their env exports.
|
||||
|
||||
---
|
||||
|
||||
## Suggested ordering
|
||||
|
||||
1. Fix the orchestration bugs (#1-#5); without these the chain doesn't reliably reach completion.
2. Fix the Caddy cutover (#3, #4, #7, #19); without these no environment can serve traffic.
3. Fix Postgres slice provision admin user (#9) and stateful update scripts (#10, #11).
4. Implement migration timing/mode (#16, #17, #18).
5. Implement scheduler enforcement (#24) and multi-server placement + pull (#21, #22, #23).
6. Then UI: onboarding (#29), environment Show page (#36), managed attachment UI (#31), variable source display (#35).
7. Strengthen tests last (#37-#46) once the orchestrator and drivers are stable.
|
||||
|
||||
Most of the schema and the high-level structure are correct — the gap is between the data model and the runtime behavior that's supposed to enforce/realize it.
|
||||
726
docs/implementation-spec.md
Normal file
@@ -0,0 +1,726 @@
|
||||
# Keystone Implementation Spec
|
||||
|
||||
## 1. Product Scope
|
||||
|
||||
Keystone is a Laravel Forge-like deployment platform that runs applications and services with Docker. The v1 product is intentionally narrow:
|
||||
|
||||
- Laravel is the only first-class application framework.
|
||||
- Application containers use a Keystone-managed Dockerfile based on `serversideup/php` with FrankenPHP.
|
||||
- Services are explicitly coded drivers, not arbitrary Docker images.
|
||||
- v1 is agentless and executes operations over SSH.
|
||||
- Docker Compose is used as the generated runtime artifact.
|
||||
- Caddy 2 is the default and only gateway for v1.
|
||||
- The Keystone database is the source of truth. Server files are generated artifacts.
|
||||
|
||||
V1 should make the simple path robust before adding generic Docker support, distributed agents, HA databases, edge routing, or additional frameworks.
|
||||
|
||||
## 2. Core Domain Model
|
||||
|
||||
### Organisation
|
||||
|
||||
Owns users, providers, registries, applications, servers, services, and environments.
|
||||
|
||||
### Application
|
||||
|
||||
A source-code project. In v1, first-class applications are Laravel repositories.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `organisation_id`
|
||||
- `name`
|
||||
- `repository_url`
|
||||
- `repository_type`
|
||||
- `default_branch`
|
||||
- `deploy_key_public`
|
||||
- `deploy_key_private` encrypted
|
||||
- `deploy_key_fingerprint`
|
||||
- `deploy_key_installed_at` nullable
|
||||
|
||||
### Environment
|
||||
|
||||
The primary application deployment unit. An application has environments such as production, staging, or dev.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `application_id`
|
||||
- `name`
|
||||
- `branch`
|
||||
- `status`
|
||||
- `scheduler_enabled`
|
||||
- `scheduler_target_service_id` nullable
|
||||
- `scheduler_mode`: `single` or `every_replica`
|
||||
- `build_config` json
|
||||
|
||||
Default for Laravel environments:
|
||||
|
||||
- Scheduler enabled.
|
||||
- Scheduler target is the primary web service.
|
||||
- Scheduler mode is `single`.
|
||||
|
||||
### Service
|
||||
|
||||
Every deployable thing is represented as a `Service`.
|
||||
|
||||
Examples:
|
||||
|
||||
- Laravel web runtime
|
||||
- Laravel worker runtime
|
||||
- Laravel websocket runtime
|
||||
- Caddy gateway
|
||||
- Postgres
|
||||
- Valkey
|
||||
- Future standalone services
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `organisation_id`
|
||||
- `environment_id` nullable
|
||||
- `server_id` nullable; a legacy convenience for single placement only. Long term, use replicas.
|
||||
- `name`
|
||||
- `category`
|
||||
- `type`
|
||||
- `version_track`
|
||||
- `driver_name`
|
||||
- `status`
|
||||
- `desired_replicas`
|
||||
- `desired_revision`
|
||||
- `deploy_policy`
|
||||
- `process_roles` json
|
||||
- `current_image_digest` nullable
|
||||
- `available_image_digest` nullable
|
||||
- `update_status`
|
||||
- `default_cpu_limit` nullable
|
||||
- `default_memory_limit_mb` nullable
|
||||
- `config` json
|
||||
|
||||
Deploy policy defaults:
|
||||
|
||||
- Laravel web: `with_environment`
- Laravel worker: `with_environment`
- Laravel websocket: `with_environment`
- Database/cache/storage: `dependency_only`
- Gateway: `manual_or_on_route_change`
- Standalone services: `manual`
|
||||
|
||||
The user should not need to configure these defaults during normal setup.
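The defaults above could be resolved in code roughly as follows. The category keys (`laravel_web`, `database`, and so on) are hypothetical identifiers invented for this sketch; only the policy values come from the list above:

```php
<?php

declare(strict_types=1);

/**
 * Resolve the default deploy policy for a service category, so the user
 * never has to pick one during normal setup.
 */
function defaultDeployPolicy(string $category): string
{
    return match ($category) {
        'laravel_web', 'laravel_worker', 'laravel_websocket' => 'with_environment',
        'database', 'cache', 'storage' => 'dependency_only',
        'gateway' => 'manual_or_on_route_change',
        default => 'manual', // standalone services
    };
}
```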
|
||||
|
||||
### ServiceReplica
|
||||
|
||||
A running instance of a service on a server. A service is logical; a replica is runtime placement.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `service_id`
|
||||
- `server_id`
|
||||
- `operation_id` nullable
|
||||
- `container_name`
|
||||
- `container_id` nullable
|
||||
- `image_digest`
|
||||
- `internal_host`
|
||||
- `internal_port`
|
||||
- `public_port` nullable
|
||||
- `status`
|
||||
- `health_status`
|
||||
- `cpu_limit` nullable
|
||||
- `memory_limit_mb` nullable
|
||||
- `config` json
|
||||
|
||||
Replica resource limits override service defaults. Null means unrestricted except host capacity.
|
||||
|
||||
### ServiceSlice
|
||||
|
||||
A logical sub-resource inside a service. Slices belong to `Service`, not `ServiceReplica`.
|
||||
|
||||
Examples:
|
||||
|
||||
- Database and user inside Postgres
|
||||
- Logical database or namespace inside Valkey
|
||||
- Route inside Caddy
|
||||
- Future bucket, topic, vhost, etc.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `service_id`
|
||||
- `environment_id` nullable
|
||||
- `name`
|
||||
- `type`
|
||||
- `status`
|
||||
- `config` json
|
||||
- `credentials` encrypted json nullable
|
||||
|
||||
Slices are not containers and should not be used for scaling. They are stable logical resources that survive service replica replacement.
|
||||
|
||||
### EnvironmentAttachment
|
||||
|
||||
Connects an environment to managed service slices.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `environment_id`
|
||||
- `service_id`
|
||||
- `service_slice_id` nullable
|
||||
- `role`: `database`, `cache`, `queue`, `storage`, `gateway`, `custom`
|
||||
- `env_prefix` nullable
|
||||
- `is_primary`
|
||||
|
||||
Attachments should point to slices whenever a slice exists. For example, a Laravel environment attaches to a Postgres database/user slice, not merely to the Postgres service.
|
||||
|
||||
### EnvironmentVariable
|
||||
|
||||
Represents user-defined and Keystone-managed runtime environment values.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `environment_id`
|
||||
- `key`
|
||||
- `value` encrypted
|
||||
- `source`: `user`, `managed_attachment`, `system`
|
||||
- `service_slice_id` nullable
|
||||
- `overridable` boolean
|
||||
|
||||
Managed values should be regenerated from attachments and slices.
|
||||
|
||||
## 3. Operations Model
|
||||
|
||||
Rename `Deployment` to `Operation`.
|
||||
|
||||
An operation is the generic audit and execution object for all state-changing work.
|
||||
|
||||
### Operation
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `id`
|
||||
- `parent_id` nullable
|
||||
- `hash`
|
||||
- `kind`
|
||||
- `target_type`
|
||||
- `target_id`
|
||||
- `status`
|
||||
- `started_at`
|
||||
- `finished_at`
|
||||
- timestamps
|
||||
|
||||
Operation kinds:
|
||||
|
||||
- `server_provision`
- `service_deploy`
- `replica_deploy`
- `slice_provision`
- `slice_configure`
- `environment_deploy`
- `gateway_cutover`
- `config_change`
- `credential_rotation`
|
||||
|
||||
### OperationStep
|
||||
|
||||
Rename `Step` to `OperationStep`.
|
||||
|
||||
Recommended fields:
|
||||
|
||||
- `operation_id`
|
||||
- `name`
|
||||
- `order`
|
||||
- `status`
|
||||
- `script`
|
||||
- `logs`
|
||||
- `error_logs`
|
||||
- `secrets` encrypted json nullable
|
||||
- `started_at`
|
||||
- `finished_at`
|
||||
- timestamps
|
||||
|
||||
### Parent-Child Operations
|
||||
|
||||
Environment deploys are parent operations that create child operations.
|
||||
|
||||
Example:
|
||||
|
||||
- `environment_deploy`
  - child `service_deploy` for web
    - child `replica_deploy` for each web replica
  - child `slice_configure` for Caddy route updates
  - child `gateway_cutover`
|
||||
|
||||
Standalone service deploys and slice operations can also run independently.
|
||||
|
||||
## 4. Server Provisioning
|
||||
|
||||
V1 remains agentless over SSH.
|
||||
|
||||
Provisioning flow:
|
||||
|
||||
1. Create server through provider API.
2. Wait for root SSH to become available.
3. Execute provisioning script over SSH.
4. Create Keystone management user.
5. Install Docker Engine, Docker Compose plugin, UFW, fail2ban, and required runtime packages.
6. Install Keystone SSH public key.
7. Disable password login.
8. Enable UFW with SSH open.
9. Callback or SSH verification marks server active.
|
||||
|
||||
Server permanent keys are for Keystone management only. Repository deploy keys must not be permanently installed on servers.
|
||||
|
||||
## 5. Source Providers And Repository Access
|
||||
|
||||
V1 source support:
|
||||
|
||||
- Self-hosted Gitea
|
||||
- GitHub
|
||||
- Generic Git over SSH
|
||||
|
||||
Repository access uses a Keystone-generated deploy key per application/repository.
|
||||
|
||||
V1 flow:
|
||||
|
||||
1. User enters repo SSH URL.
2. Keystone generates an ed25519 deploy key.
3. UI shows the public key.
4. User adds it to Gitea/GitHub as read-only.
5. Keystone verifies access with `git ls-remote`.
|
||||
|
||||
During build operations, Keystone injects the encrypted private key into a temporary operation directory and uses `GIT_SSH_COMMAND`. The key is removed after the build. Repo keys are never permanently stored on target servers or builder services.
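A sketch of a clone script that follows this rule. It assumes the private key has already been written to `deploy_key` inside the operation directory, that the directory path contains no spaces, and that the script runs under bash. The file names and the function name are hypothetical, and host key verification is left out:

```php
<?php

declare(strict_types=1);

/**
 * Build a bash script that clones a repository with a temporary deploy key
 * and removes the key when the script exits, whether it succeeded or not.
 */
function cloneScript(string $operationDir, string $repositoryUrl, string $branch): string
{
    $keyPath = $operationDir . '/deploy_key';
    $sshCommand = sprintf('ssh -i %s -o IdentitiesOnly=yes', $keyPath);

    return implode("\n", [
        'set -euo pipefail',
        sprintf('cleanup() { rm -f %s; }', escapeshellarg($keyPath)),
        'trap cleanup EXIT',
        sprintf('chmod 600 %s', escapeshellarg($keyPath)),
        sprintf(
            'GIT_SSH_COMMAND=%s git clone --depth 1 --branch %s %s %s',
            escapeshellarg($sshCommand),
            escapeshellarg($branch),
            escapeshellarg($repositoryUrl),
            escapeshellarg($operationDir . '/source')
        ),
    ]);
}
```

Registering the `trap` before the clone means the key is removed even when `git clone` fails partway through.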
|
||||
|
||||
## 6. Registry And Build Artifacts
|
||||
|
||||
An external registry is required for multi-server application deployments.
|
||||
|
||||
Single-server deployments may build and run a local image without a registry.
|
||||
|
||||
Multi-server deployments must:
|
||||
|
||||
1. Build once.
2. Push the image to the configured external registry.
3. Pull the exact same image digest on each target server.
|
||||
|
||||
Supported registry types:
|
||||
|
||||
- Generic Docker registry
|
||||
- Gitea registry
|
||||
- GHCR
|
||||
- Docker Hub
|
||||
|
||||
### Build Service
|
||||
|
||||
Building is a service capability, not a server type.
|
||||
|
||||
A dedicated builder is represented as a `Service` with category `builder`. If no builder service exists, Keystone may build on the target server for single-server deployments.
|
||||
|
||||
Build strategies:
|
||||
|
||||
- `target_server`: build on selected target server. Valid for single-server.
- `dedicated_builder`: build on builder service, then push/export artifact.
- `external_registry`: pull prebuilt image from registry.
|
||||
|
||||
For v1:
|
||||
|
||||
- Single-server default: build on target server.
|
||||
- Multi-server: require configured registry and build once.
|
||||
- Do not rebuild independently on each server.
|
||||
|
||||
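The v1 rules above can be read as a small decision function. The sketch below is illustrative only: `resolve_build_strategy` is a hypothetical name, and rejecting a multi-server deployment that has a registry but no builder is an assumption, since the spec does not say where the single build runs in that case.

```python
def resolve_build_strategy(server_count: int, has_registry: bool, has_builder: bool) -> str:
    """Pick a default build strategy for an environment deploy (sketch, not the real API)."""
    if server_count < 1:
        raise ValueError("an environment needs at least one target server")
    if server_count == 1:
        # Single-server default: build on the target server, no registry needed.
        return "target_server"
    if not has_registry:
        # Multi-server deployments must push once and pull the same digest everywhere.
        raise ValueError("multi-server deployments require an external registry")
    if has_builder:
        return "dedicated_builder"
    # Assumption: without a builder service there is no defined place to build once.
    raise ValueError("multi-server deployments need a builder service in this sketch")
```
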
### BuildArtifact

Recommended fields:

- `environment_id`
- `commit_sha`
- `image_tag`
- `image_digest`
- `registry_ref` nullable
- `built_by_operation_id`
- `built_by_service_id` nullable
- `status`
- `metadata` json

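As an illustration of how these fields support "create or reuse build artifact", here is a sketch of the record and a reuse lookup. The `"ready"` status value and the `find_reusable_artifact` helper are assumptions; the spec does not enumerate status values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildArtifact:
    environment_id: int
    commit_sha: str
    image_tag: str
    image_digest: str
    built_by_operation_id: int
    status: str
    registry_ref: Optional[str] = None
    built_by_service_id: Optional[int] = None
    metadata: dict = field(default_factory=dict)


def find_reusable_artifact(artifacts, environment_id: int, commit_sha: str) -> Optional[BuildArtifact]:
    """Return an existing artifact for the same environment and commit, if one is usable."""
    for artifact in artifacts:
        if (
            artifact.environment_id == environment_id
            and artifact.commit_sha == commit_sha
            and artifact.status == "ready"  # assumed status value
        ):
            return artifact
    return None
```
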
## 7. Managed Laravel Runtime

V1 uses Keystone-managed Dockerfile templates only. Custom Dockerfiles are deferred.

Laravel runtime defaults:

- Base: `serversideup/php` FrankenPHP image
- PHP version configurable
- Document root default: `public`
- Health path default: `/up`, fallback `/`
- Composer install with production defaults
- JS build step configurable
- Bun/Node strategy configurable

The same build artifact is used by web, worker, and websocket services. Runtime services differ by entrypoint/command.

Default topology:

- One web service.
- No worker service by default.
- Scheduler enabled on the web service by default.
- Dedicated worker service is recommended when queues are used, but created only when the user opts in.

Worker options:

- Dedicated worker service, recommended.
- Embedded worker in web service, allowed for low-throughput apps but not recommended for production.
- No workers, default.

Keystone should warn when a deployed environment uses `QUEUE_CONNECTION=sync`, but it should not automatically create worker services.

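The warning rule could be expressed as a pure check over the environment's variables. A minimal sketch follows; `queue_warnings` is a hypothetical helper, and the message wording is illustrative.

```python
def queue_warnings(env: dict[str, str]) -> list[str]:
    """Return advisory messages only; this never creates or changes services."""
    warnings = []
    if env.get("QUEUE_CONNECTION") == "sync":
        warnings.append(
            "QUEUE_CONNECTION=sync runs queued jobs inline in the web process; "
            "consider adding a dedicated worker service."
        )
    return warnings
```
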
## 8. Scheduler Model

Mirror Laravel Cloud's scheduler model.

Scheduler is not a standalone service by default. It is a role/capability attached to a selected web or worker service.

Defaults:

- `scheduler_enabled`: true for Laravel templates.
- `scheduler_target_service_id`: primary web service.
- `scheduler_mode`: `single`.

Runtime behavior:

- `single`: run `schedule:run` every minute on exactly one selected replica.
- `every_replica`: run on each replica. This is advanced and explicit.

Keystone should enforce one scheduler runner per environment by default. Users may still use Laravel's `onOneServer()` for application-level safety.

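The two scheduler modes reduce to a selection over the target service's replicas. In the sketch below, picking the lowest replica id in `single` mode is an assumption made for determinism; the spec only requires that exactly one replica is selected.

```python
def scheduler_replicas(mode: str, replica_ids: list[int]) -> list[int]:
    """Return the replicas that should run `schedule:run` every minute."""
    ordered = sorted(replica_ids)
    if mode == "single":
        # Exactly one runner; assumption: the lowest replica id is chosen.
        return ordered[:1]
    if mode == "every_replica":
        return ordered
    raise ValueError(f"unknown scheduler_mode: {mode}")
```
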
## 9. Service Drivers

V1 services are explicitly coded drivers only. No arbitrary Docker image service in the v1 happy path.

Driver contract should define:

- service type and version track
- default image policy
- ports
- volumes
- environment schema
- health checks
- resource defaults
- supported slice types
- Compose rendering
- operation steps
- env var exports
- firewall requirements
- update behavior

V1 driver list:

- Caddy 2 gateway
- Laravel managed runtime using `serversideup/php` FrankenPHP
- Postgres 18
- Valkey 8

Use latest minor versions for new service deploy/update operations by resolving image tags to digests. Store the resolved digest on the operation/service/replica for reproducible rollbacks.

Do not silently update managed service images. Show updates in the UI and require an explicit service update/redeploy operation.

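A slice of the driver contract, plus the tag-to-digest pinning, might be sketched as below. This covers only a few of the listed concerns, and every name here (`ServiceDriver`, `tracked_image`, `pinned_image`, the `valkey/valkey` repository) is an assumption for illustration, not the project's actual interface.

```python
from abc import ABC, abstractmethod


class ServiceDriver(ABC):
    """Partial driver contract: type, version track, ports, and env exports."""

    service_type: str
    image_repository: str
    version_track: str

    @abstractmethod
    def ports(self) -> list[int]: ...

    @abstractmethod
    def env_exports(self, host: str) -> dict[str, str]: ...

    def tracked_image(self) -> str:
        """Mutable tag used only to discover the latest digest for the track."""
        return f"{self.image_repository}:{self.version_track}"

    def pinned_image(self, digest: str) -> str:
        """Immutable reference stored on the operation/service/replica."""
        if not digest.startswith("sha256:"):
            raise ValueError("expected a sha256 digest")
        return f"{self.image_repository}@{digest}"


class ValkeyDriver(ServiceDriver):
    service_type = "valkey"
    image_repository = "valkey/valkey"  # assumed image repository
    version_track = "8"

    def ports(self) -> list[int]:
        return [6379]

    def env_exports(self, host: str) -> dict[str, str]:
        return {"REDIS_HOST": host, "REDIS_PORT": "6379"}
```

Deploying from `pinned_image(...)` rather than `tracked_image()` is what keeps a rollback on the image that was originally deployed.
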
## 10. Persistent Storage

Use named Docker volumes for persistent service-local data.

Examples:

- Postgres: `keystone_service_<id>_postgres_data`
- Valkey: named volume when persistence is enabled
- Caddy: named volumes for `/data` and `/config`

Avoid distributed storage in v1. Moving a stateful service to another server requires an explicit migration operation.

## 11. Stateful Service Updates

V1 accepts downtime for single-node stateful updates.

Postgres/Valkey update flow:

1. User explicitly triggers update/redeploy.
2. Keystone warns about downtime and data risk.
3. Optional backup checkbox appears only if backup capability exists.
4. Stop container.
5. Preserve named volume.
6. Start new container with updated image digest.
7. Health check.
8. Mark operation complete.

Rolling stateful updates and HA clusters are v2.

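The execution part of this flow (steps 4 to 8) is a strictly ordered sequence that must stop at the first failure. A minimal sketch, assuming each step is a callable that returns `True` on success; `run_steps` and the step names are hypothetical:

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_steps(steps: list[Step]) -> tuple[str, list[str]]:
    """Run steps in order; stop at the first failure and report what completed."""
    completed: list[str] = []
    for name, action in steps:
        if not action():
            # Later steps are never started, so the operation is not marked complete.
            return ("failed", completed)
        completed.append(name)
    return ("completed", completed)
```

No step in such a sequence removes the named volume, which is how "preserve named volume" is honored: the data outlives both the old and the new container.
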
## 12. Slices And Attachments

Attaching a managed service to an environment should create sensible default slices automatically.

Postgres attachment:

- Create database/user slice by default.
- Generate credentials.
- Wire `DB_*` environment variables.

Valkey attachment:

- Create/select logical slice if supported.
- Wire `REDIS_*`.
- Recommend `CACHE_STORE=redis`, `SESSION_DRIVER=redis`, or `QUEUE_CONNECTION=redis` depending on role.
- Do not silently change queue behavior without confirmation.

Caddy/domain attachment:

- Create route slice.
- Wire gateway route to environment web service.

Advanced users can select existing slices or create slices manually from service detail pages.

Slice operations should be independent from service container deployments. Creating a Postgres database/user should run as a slice operation against an existing Postgres replica, not redeploy the Postgres container.

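For the default Postgres slice, credential generation could be as simple as the sketch below. The `env_<id>` naming scheme and the `new_postgres_slice` helper are assumptions; only the idea of one database/user pair with generated credentials comes from the text above.

```python
import secrets


def new_postgres_slice(environment_id: int) -> dict[str, str]:
    """Describe a database/user pair for one environment, with a random password."""
    name = f"env_{environment_id}"  # assumed naming scheme
    return {
        "database": name,
        "username": name,
        "password": secrets.token_urlsafe(32),
    }
```
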
## 13. Environment Variables

Keystone manages env vars from attachments and slices.

Postgres slice should export:

- `DB_CONNECTION=pgsql`
- `DB_HOST`
- `DB_PORT=5432`
- `DB_DATABASE`
- `DB_USERNAME`
- `DB_PASSWORD`

Valkey slice/service should export:

- `REDIS_HOST`
- `REDIS_PORT=6379`
- optional `CACHE_STORE=redis`
- optional `SESSION_DRIVER=redis`
- optional `QUEUE_CONNECTION=redis`

User-defined variables remain editable. Managed variables should show their source and whether they are overridable.

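One way to combine managed and user-defined variables while tracking their source is sketched below. The precedence rule (a managed value wins unless its key is marked overridable) is an assumption, as are the helper names.

```python
def postgres_exports(host: str, database: str, username: str, password: str) -> dict[str, str]:
    """Variables exported by a Postgres slice, per the list above."""
    return {
        "DB_CONNECTION": "pgsql",
        "DB_HOST": host,
        "DB_PORT": "5432",
        "DB_DATABASE": database,
        "DB_USERNAME": username,
        "DB_PASSWORD": password,
    }


def merge_variables(managed: dict[str, str], user_defined: dict[str, str],
                    overridable: frozenset = frozenset()) -> dict[str, dict[str, str]]:
    """Merge variables, recording where each final value came from."""
    merged = {key: {"value": value, "source": "managed"} for key, value in managed.items()}
    for key, value in user_defined.items():
        if key in managed and key not in overridable:
            continue  # assumed policy: non-overridable managed values win
        merged[key] = {"value": value, "source": "user"}
    return merged
```
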
## 14. Networking And Internal Aliases

Support both same-server Docker networking and cross-server private networking.

Routing preference:

1. Same server: Docker network aliases/container DNS.
2. Same provider private network: private IP and internal port.
3. Public fallback only if explicitly allowed.

V1 should not build distributed DNS. Use deterministic internal hostnames and generated env vars. Where Keystone controls Docker networks, use network aliases. For cross-server communication, inject private IP/port endpoints.

Future agent/DNS systems should be possible, but are out of scope for v1.

Recommended endpoint model:

- `service_id`
- `service_replica_id` nullable
- `scope`: `docker_network`, `private_network`, `public`
- `hostname`
- `ip_address` nullable
- `port`
- `priority`
- `health_status`

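The routing preference can be sketched as a selection over endpoint records. The `server_id` attribute is an assumption (in practice it would presumably be derived from the replica), as is treating a lower `priority` number as more preferred; health filtering is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

SCOPE_ORDER = {"docker_network": 0, "private_network": 1, "public": 2}


@dataclass(frozen=True)
class Endpoint:
    scope: str
    hostname: str
    port: int
    server_id: int  # assumed: the server hosting the replica behind this endpoint
    priority: int = 0


def pick_endpoint(endpoints: list[Endpoint], caller_server_id: int,
                  allow_public: bool = False) -> Optional[Endpoint]:
    """Choose the most local endpoint the caller is allowed to use."""
    candidates = []
    for endpoint in endpoints:
        if endpoint.scope == "docker_network" and endpoint.server_id != caller_server_id:
            continue  # Docker aliases only resolve on the same server
        if endpoint.scope == "public" and not allow_public:
            continue  # public fallback must be explicitly allowed
        candidates.append(endpoint)
    if not candidates:
        return None
    return min(candidates, key=lambda e: (SCOPE_ORDER[e.scope], e.priority))
```
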
## 15. Gateway And Cutover

There must be exactly one gateway service per server for v1.

Caddy owns public ports `80` and `443`. Application runtime containers should bind only to internal Docker networks or assigned internal ports.

Zero-downtime deployment happens at the gateway layer:

1. Render/start new service replica with unique container/project name.
2. Health check new replica.
3. Update Caddy upstreams to include the new healthy replica.
4. Reload Caddy.
5. Drain/remove old replica from Caddy upstreams.
6. Stop old container after the drain window.

For same-server upstreams, Caddy can use Docker network names. For cross-server upstreams, Caddy uses private IP and assigned internal port.

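The cutover steps amount to rendering the route twice: once with old and new upstreams together, then with only the new ones. The sketch below shows that under simplifying assumptions: a single site block per domain, upstreams already formatted as `host:port`, and hypothetical helper names.

```python
def render_site(domain: str, upstreams: list[str]) -> str:
    """Render a minimal Caddyfile site block that proxies to the given upstreams."""
    if not upstreams:
        raise ValueError("a route needs at least one upstream")
    joined = " ".join(upstreams)
    return f"{domain} {{\n\treverse_proxy {joined}\n}}\n"


def cutover_phases(old: list[str], new: list[str]) -> list[list[str]]:
    """Upstream sets to apply in order, with a Caddy reload after each one."""
    return [
        list(old) + list(new),  # new healthy replicas join the pool
        list(new),              # old replicas are drained, then stopped
    ]
```

Same-server upstreams would be Docker network names such as `web-2:8080`, and cross-server upstreams private addresses such as `10.0.0.5:18080`; both values are illustrative.
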
Web services may span multiple servers in v1. Keystone provides load balancing through Caddy upstreams but does not optimize global latency or regional placement.

A future v2 doctor page can flag:

- cross-region upstreams
- public-network fallbacks
- missing workers for async queues
- scheduler every-replica risks
- inefficient database/cache placement

## 16. Docker Compose Runtime

Use generated Docker Compose files, not raw `docker run`, for v1 runtime management.

Suggested server layout:

- `/home/keystone/services/<service-id>/compose.yml`
- `/home/keystone/services/<service-id>/.env`
- `/home/keystone/gateway/Caddyfile`
- `/home/keystone/operations/<operation-hash>/`

Compose files are generated artifacts. The Keystone database is canonical.

Compose should be used for:

- container definitions
- env files
- named volumes
- networks
- health checks
- restart policies
- resource limits
- labels

Resource controls:

- Use plain Docker runtime constraints such as `cpus`, `mem_limit`, and `memswap_limit`.
- Avoid relying on Swarm-only `deploy.resources` semantics for v1.

Example:

```yaml
services:
  web:
    image: registry.example.com/app:abc123
    cpus: "1.0"
    mem_limit: 1024m
    memswap_limit: 1024m
```

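Since Compose files are generated from database state, the renderer can be a pure function from service settings to a document. The sketch below builds the same structure as the example; the function names are hypothetical, and it serializes with the standard `json` module only to stay dependency-free, where a real renderer would more likely emit YAML.

```python
import json


def render_compose(name: str, image: str, cpus: str, memory_mb: int) -> dict:
    """Build a Compose document for one service using plain runtime limits."""
    limit = f"{memory_mb}m"
    return {
        "services": {
            name: {
                "image": image,
                "cpus": cpus,
                "mem_limit": limit,
                # Same value as mem_limit, as in the example above.
                "memswap_limit": limit,
            }
        }
    }


def to_json(document: dict) -> str:
    """Serialize deterministically so unchanged state yields an unchanged file."""
    return json.dumps(document, indent=2, sort_keys=True)
```
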
## 17. Environment Deployment Flow

Environment deployment creates a parent `environment_deploy` operation.

High-level flow:

1. Resolve target commit.
2. Create or reuse build artifact.
3. Compute desired service changes.
4. Include only services with `deploy_policy=with_environment` and changed revision/config.
5. Check dependency-only services and attached slices.
6. Run pre-switch service steps.
7. Run application migrations according to service migration policy.
8. Deploy new web/worker/websocket replicas.
9. Health check new replicas.
10. Update gateway routes.
11. Reload Caddy.
12. Drain and stop old replicas.
13. Mark operation complete.

Database/cache services attached to the environment are checked but not redeployed unless the user explicitly deploys or updates them.

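Steps 3 and 4 can be sketched as a filter over the environment's services. The `ServiceState` shape and the `"manual"` policy value used in the usage example are assumptions; only `with_environment` is named in the spec.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    name: str
    deploy_policy: str
    desired_revision: str
    deployed_revision: str


def services_to_deploy(services: list[ServiceState]) -> list[str]:
    """Names of services an environment deploy should actually redeploy."""
    return [
        service.name
        for service in services
        if service.deploy_policy == "with_environment"
        and service.desired_revision != service.deployed_revision
    ]
```

For example, with a changed `web` service, an unchanged `worker`, and a `postgres` service on a hypothetical `manual` policy, only `web` is returned; `postgres` is checked elsewhere but never redeployed by this path.
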
## 18. Migrations

Database migrations are owned by the application runtime service deployment.

Recommended fields on service config:

- `migration_mode`: `auto`, `manual`, `disabled`
- `migration_timing`: `pre_switch`, `post_switch`
- `migration_command`: default `php artisan migrate --force`

Default for Laravel web services:

- `migration_mode=auto`
- `migration_timing=pre_switch`
- command `php artisan migrate --force`

Manual mode should allow the user to run the migration operation explicitly.

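How these three fields interact during a deploy can be sketched as a single lookup: given the current phase, return the command to run or nothing. `migration_command_for` is a hypothetical helper, and treating `manual` as "never run automatically" is the reading assumed here.

```python
from typing import Optional

MODES = {"auto", "manual", "disabled"}
TIMINGS = {"pre_switch", "post_switch"}


def migration_command_for(phase: str, mode: str = "auto", timing: str = "pre_switch",
                          command: str = "php artisan migrate --force") -> Optional[str]:
    """Return the migration command to run in this deploy phase, if any."""
    if mode not in MODES:
        raise ValueError(f"unknown migration_mode: {mode}")
    if timing not in TIMINGS or phase not in TIMINGS:
        raise ValueError("phase and migration_timing must be pre_switch or post_switch")
    if mode != "auto":
        return None  # manual runs only as an explicit operation; disabled never runs
    return command if phase == timing else None
```
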
## 19. Onboarding

Onboarding should guide users through:

1. Organisation creation.
2. Server provider setup, Hetzner first.
3. Source provider/repository setup, including Gitea/GitHub/generic Git.
4. Deploy key installation and verification.
5. Registry setup. Optional for single-server, required for multi-server.
6. Server creation/provisioning.
7. Application/environment creation.
8. Optional service attachments: Postgres, Valkey, domain/gateway.

If an environment spans more than one server and no registry exists, deployment should be blocked with a registry setup prompt.

## 20. Current Code Migration Plan

The current code already has useful pieces:

- Provider abstraction
- Hetzner server creation
- Server provisioning jobs
- Service drivers
- Polymorphic deployments
- Step execution over SSH

Refactor in phases.

### Phase 1: Schema Alignment

- Add `environments` table.
- Rename `deployments` to `operations`.
- Rename `steps` to `operation_steps`.
- Add `operations.parent_id`.
- Add `operations.kind`.
- Add `service_replicas`.
- Add `service_slices`.
- Add `environment_attachments`.
- Add `environment_variables`.
- Add registry/source/build artifact tables.

### Phase 2: Model Cleanup

- Replace `Application::instances()` as the primary deployment path with `Application::environments()`.
- Keep or migrate `Instance` into `ServiceReplica` depending on implementation cost.
- Replace `Service::slices` references with real `ServiceSlice` relationship.
- Replace `Deployment` references with `Operation`.
- Replace deployment step jobs with operation step jobs.

### Phase 3: Driver Contract

- Define formal driver interfaces for service deployment, replica rendering, slices, health checks, and env exports.
- Implement Caddy 2 driver.
- Implement Postgres 18 driver with database/user slice provisioning.
- Implement Valkey 8 driver.
- Implement Laravel runtime driver/template.

### Phase 4: Compose Renderer

- Render Compose files from DB state.
- Upload generated files over SSH.
- Run `docker compose` operations.
- Capture container IDs and health state into `ServiceReplica`.

### Phase 5: Environment Deploy

- Build application artifact.
- Deploy web replicas.
- Run migrations.
- Health check.
- Cut over Caddy.
- Stop old replicas.

### Phase 6: UI Simplification

- Present environments as the primary application surface.
- Present services under an environment with sensible defaults.
- Hide deploy policies by default.
- Provide one-click add worker.
- Provide managed attachment flows for Postgres/Valkey/Caddy.

## 21. Explicit V2 Deferrals

Out of scope for v1:

- Server agent.
- Distributed internal DNS.
- Edge routing or anycast.
- Automatic regional topology optimization.
- Custom Dockerfiles.
- Arbitrary Docker image services.
- Non-Laravel first-class app frameworks.
- Managed Docker registry.
- HA Postgres/Valkey.
- Rolling stateful updates.
- Distributed storage.
- Full backup orchestration.
- Automatic deploy key installation via Gitea/GitHub API.