* feat(edge-registry): ADR-102 — surface Cognitum cog catalog via /api/v1/edge/registry Adds a new sensing-server endpoint that fetches and caches the canonical Cognitum app registry at https://storage.googleapis.com/cognitum-apps/app-registry.json (105 cogs across 11 categories as of v2.1.0). RuView previously had no live awareness of the catalog — the README's capability table was hand- curated and went stale as Cognitum shipped new cogs (the registry was last updated 6 days ago). ADR: * docs/adr/ADR-102-edge-module-registry.md — full design, response shape, configuration flags, failure modes, and a 12-row security review covering SSRF, response inflation, ?refresh abuse, stale-serve semantics, TLS, cache poisoning, JSON-panic resistance, etc. Code: * v2/.../edge_registry.rs — EdgeRegistry struct + UreqFetcher + MockFetcher trait + 7 unit tests. RwLock<Option<CachedEntry>> with stale-on-error fallback. MAX_PAYLOAD_BYTES=8 MiB, 10s wire timeout. * v2/.../main.rs — constructs Option<Arc<EdgeRegistry>> at startup, registers GET /api/v1/edge/registry handler, wires Extension layer. Handler runs the blocking ureq fetch via tokio::task::spawn_blocking so the async runtime stays free. * v2/.../cli.rs / main.rs Args — three new flags (per user request to "allow the registry to be disabled or changed"): --edge-registry-url <URL> (env RUVIEW_EDGE_REGISTRY_URL) --edge-registry-ttl-secs <N> (env RUVIEW_EDGE_REGISTRY_TTL_SECS) --no-edge-registry (env RUVIEW_NO_EDGE_REGISTRY) When --no-edge-registry is set or the URL is empty, the endpoint returns 404. Cargo.toml: adds ureq (rustls), sha2, thiserror as direct deps. README: * New collapsed "🧩 Edge Module Catalog" section with the full 105-cog table generated from the registry, grouped by category with practical one-line descriptions (e.g. "Spots irregular heartbeats and abnormal heart rhythms", "Detects walking problems and scores fall risk"). Links to https://seed.cognitum.one/store and the local appliance /cogs page. Sits between the HF model section and How It Works. Tests (7/7 pass): first_call_hits_upstream_and_caches ttl_expiry_triggers_refetch force_refresh_bypasses_fresh_cache stale_serve_on_upstream_failure_after_cached_success no_cache_no_upstream_returns_error upstream_invalid_json_is_treated_as_error upstream_sha256_is_deterministic Security highlights (full review in ADR-102 §"Security review"): - The registry is metadata-only; per-cog binary signatures (ADR-100) remain the trust root for installs. A compromised registry can mislead a human reader but cannot ship malicious binaries. - 8 MiB cap + 10s timeout + Option<Arc<...>> via Extension layer means the endpoint can't be used to exhaust memory or pin tokio threads. - Stale-on-error responses carry an explicit `stale: true` field so upstream outages are visible to consumers rather than silently masked. - Endpoint sits behind the existing RUVIEW_API_TOKEN bearer gate when set, otherwise unauthenticated (registry contents are public anyway). * chore: refresh Cargo.lock for ureq/sha2/thiserror deps added by ADR-102
12 KiB
ADR-102: Edge Module Registry Integration
- Status: Accepted
- Date: 2026-05-19
- Deciders: ruv
Context
The Cognitum app ecosystem publishes a canonical app store catalog at:
https://storage.googleapis.com/cognitum-apps/app-registry.json
As of v2.1.0 (2026-05-13) the registry advertises 105 cogs across 11 categories (health, security, building, retail, industrial, research, ai, swarm, signal, network, developer). Each entry carries id, name, category, version, description, size_kb, difficulty, sha256, binary_size, and a config[] schema describing the runtime parameters the appliance offers when installing the cog.
RuView today has no live awareness of this catalog. The README.md capability table is hand-curated; the UI surfaces only the capabilities the dashboard's HTML knows about; nothing in wifi-densepose-sensing-server references the registry. Result: when Cognitum ships a new cog (the registry was last updated 6 days ago — a fast cadence), RuView stays unaware until someone manually edits the README. Customers running the RuView dashboard against a real appliance see a 10-capability bag in the UI while the appliance is actually capable of installing 105 cogs.
Today's cog-pose-estimation@0.0.1 release (PRs #642 / #643, ADR-100, ADR-101) is the first cog this repo ships to that registry. We need the discovery side to match.
Decision
wifi-densepose-sensing-server will fetch app-registry.json on demand, cache it in process memory with a TTL, and serve it back through a new endpoint:
GET /api/v1/edge/registry
GET /api/v1/edge/registry?refresh=1 (force-bypass cache, log if abused)
The registry is passively surfaced, not modified. RuView is a presentation layer for the canonical Cognitum catalog; it never re-signs entries or re-hosts binaries.
Module
v2/crates/wifi-densepose-sensing-server/src/edge_registry.rs — small, ~150 lines.
pub struct EdgeRegistry {
cached: RwLock<Option<CachedEntry>>,
ttl: Duration,
upstream_url: String,
}
struct CachedEntry {
payload: serde_json::Value,
fetched_at: Instant,
upstream_sha256: String,
}
Cache semantics:
- TTL 3600 s (1 hour) by default — registry updates land on a roughly-weekly cadence and a stale-by-an-hour catalog is fine.
?refresh=1bypasses the cache but writes a debug log so accidental abuse is visible.- On upstream fetch failure when the cache is non-empty, serve the stale cached copy with a
stale: truemarker in the response and a 200 status (preserve UI), not a 5xx. - On upstream fetch failure when the cache is empty, return 503 with the upstream error in the body.
Response shape
{
"fetched_at": 1779200000, // server-side fetch timestamp
"ttl_seconds": 3600,
"stale": false, // true when serving past TTL because upstream is down
"upstream_url": "https://storage.googleapis.com/cognitum-apps/app-registry.json",
"upstream_sha256": "<sha256-of-payload-bytes>",
"registry": { /* full canonical JSON as returned upstream */ }
}
The registry field is the upstream JSON inlined verbatim so consumers don't need to make a second hop. upstream_sha256 lets a paranoid consumer compare against a pinned hash.
Trust / verification
- Bucket is public-read with object versioning enabled (per ADR-100 §"GCS misconfiguration risks").
- The cog-level
binary_sha256+binary_signature(ADR-100) are the trust roots for installs. The registry itself is not signed today. - We deliberately do not add a signature requirement to the registry JSON in this ADR — that would block the integration on a parallel infrastructure project. A future ADR can layer signature checks on top once the publisher pipeline emits them.
UI surfacing
New page ui/edge-modules.html renders the registry into category sections with cog cards. Each card links out to the Cognitum V0 appliance's /cogs page (http://cognitum-v0:9000/cogs#<id>) for the install action — RuView itself never installs.
The existing dashboard's "Capabilities" section continues to show RuView-native sensing capabilities (presence, breathing, pose, etc. — the things RuView itself runs); the new edge-modules page shows the broader Cognitum cog catalog. The two are distinct surfaces and shouldn't be merged.
Failure modes
| Scenario | Behaviour |
|---|---|
| Upstream returns 200 with valid JSON | Cache it, return it. |
| Upstream returns 200 with invalid JSON | Treat as failure; serve stale if available else 503. Log the upstream sha + the parse error. |
| Upstream returns 4xx / 5xx | Same as JSON-invalid: serve stale if available else 503. |
| TLS / DNS / timeout error | Same. |
| Upstream is permanently moved | Operator updates the upstream_url config (CLI flag added). No code change required to migrate registries. |
Configuration
--edge-registry-url <URL>— override the default (default:https://storage.googleapis.com/cognitum-apps/app-registry.json)--edge-registry-ttl-secs <N>— override the cache TTL (default: 3600)--no-edge-registry— disable the endpoint entirely (returns 404). For air-gapped deployments.
Consequences
Positive
- One source of truth for the cog catalog across RuView + Cognitum dashboards.
- Zero ongoing maintenance: when Cognitum publishes registry v2.2.0, RuView sees it within an hour without a release.
- The endpoint is also useful for non-UI consumers (CI checks, fleet automation, third-party integrations).
- Lets us deprecate the hand-curated README capability table in favour of generated content (separate PR).
Negative
- Adds an outbound HTTP dependency to the sensing-server. Air-gapped deployments must use
--no-edge-registry. - Stale-but-served behaviour can mask upstream outages from operators. Mitigation: include
stale: true+fetched_atin the response so the UI can render a "registry possibly out of date" badge.
Risks
- Upstream rug-pull: if
cognitum-appsis deleted or replaced, the endpoint goes dark. The--edge-registry-urlflag lets operators repoint without a code change. Long-term, RuView could mirror the registry into its own GCS bucket if the relationship requires it. - Cache poisoning: the upstream is public-read; an attacker who breaches Cognitum's GCS write could push a bad registry. The cog-level signatures (ADR-100) limit the blast radius — bad registry entries can't install bad binaries, only show wrong metadata. Acceptable until registry-level signing lands.
Security review
A real review of the attack surface this endpoint introduces.
Threats considered
| # | Threat | Mitigation in this ADR |
|---|---|---|
| T1 | SSRF — operator-supplied --edge-registry-url redirects fetches to an internal target |
Flag is operator-only (CLI / env) — there is no API endpoint to mutate it at runtime. Operators are already trusted (they control the binary). |
| T2 | Outbound dependency reveals deployment — a passive observer of the egress sees the appliance phoning home to GCS | Documented in the docstring + the runtime startup log. Operators wanting offline deployments use --no-edge-registry. |
| T3 | Malicious upstream registry — Cognitum's GCS bucket is breached and a poisoned app-registry.json is served |
Two layers absorb this: (a) the registry's role is discovery only — installs verify the per-cog binary_sha256 + binary_signature (ADR-100); a wrong description string can mislead a human, but a wrong binary still has to pass Ed25519 against COGNITUM_OWNER_SIGNING_KEY. (b) The endpoint exposes upstream_sha256 so a paranoid operator can pin the expected registry hash externally and alert on drift. |
| T4 | Response inflation — upstream returns a multi-GB payload to exhaust memory | MAX_PAYLOAD_BYTES = 8 MiB cap (current registry is ~50–200 KB). Exceeding cap returns an error without buffering past the cap. |
| T5 | Slow upstream blocking server threads — Slowloris-style stall on the fetch | 10-second wire timeout via ureq::AgentBuilder. Per-handler fetch runs inside tokio::task::spawn_blocking so a stalled fetch never blocks the async runtime. |
| T6 | Denial via ?refresh=1 abuse — unauthenticated callers force-bypass the cache repeatedly |
Cache lives in process; ?refresh=1 triggers a single upstream fetch behind a synchronous code path. A flood of refresh requests is rate-limited by the upstream's own throttling (GCS) and locally serialised by Rust's RwLock. Refresh requests are logged at debug so abuse is visible. Follow-up: add per-IP rate-limit middleware if seen abused (separate PR; tracked in #574-style follow-up). |
| T7 | JSON deserialisation panics — malformed registry triggers a Rust panic | Payload is parsed as serde_json::Value (opaque untyped tree) — never coerced into a strongly-typed struct that could panic. Failure is propagated as FetcherError::Network which the handler maps to 503. |
| T8 | Stale-on-error masks outages from operators | Response carries stale: true + fetched_at (unix timestamp). UI rendering MUST surface this badge — encoded as an explicit field, not an implicit silence. |
| T9 | TLS downgrade / MITM on the fetch | ureq is built with the tls feature (rustls) by default. No --insecure flag exists. If the upstream uses LetsEncrypt the cert chain is system-trusted; certificate pinning is out of scope (would block the bucket from rotating certs). |
| T10 | Unauthenticated access exposes ‘what cogs exist’ | The registry is canonical-public information (already public-read on GCS via anonymous HTTP GET). Surfacing it on a local LAN HTTP API does not increase its disclosure. The endpoint stays under the project's existing RUVIEW_API_TOKEN Bearer auth — when set, the registry is gated like other /api/v1/* routes. |
| T11 | Configuration injection via env var — RUVIEW_EDGE_REGISTRY_URL set to a malicious URL by an attacker who controls the process environment |
If an attacker controls the env, they own the process; this is not a new threat surface. Documented in the CLI help. |
| T12 | Cache mutation across threads / poisoning | The cache is RwLock<Option<CachedEntry>>. Writes go through cached.write() once per fetch. Snapshot reads clone() the CachedEntry (cheap — Value is reference-counted internally for large strings) so concurrent readers don't share mutable state. Tests cover the multi-call path; no unsafe is used. |
What this ADR does NOT secure
- Registry-level signing — the JSON payload itself is unsigned. If/when Cognitum's publisher pipeline emits a registry sig (e.g. detached
.json.sig), a follow-up ADR will require it. Today the per-cog binary signature (ADR-100) is the actual trust root for installs; the registry is metadata. - Per-client rate-limiting on
?refresh=1— relies on the upstream's own throttling. If we see abuse we'll add a token-bucket middleware; not needed for v0.0.1.
Testing
| Test | What it verifies |
|---|---|
first_call_hits_upstream_and_caches |
Single fetch, then cache hit |
ttl_expiry_triggers_refetch |
Cache TTL bound respected |
force_refresh_bypasses_fresh_cache |
?refresh=1 semantics |
stale_serve_on_upstream_failure_after_cached_success |
T8 explicit (stale: true returned) |
no_cache_no_upstream_returns_error |
T3/T5 — error propagated cleanly when nothing to fall back on |
upstream_invalid_json_is_treated_as_error |
T7 — malformed payload doesn't panic |
upstream_sha256_is_deterministic |
T3 — hash field is reliable for external pinning |
All 7 tests in src/edge_registry.rs::tests pass.
Migration
- Land this ADR + the implementing PR.
- UI: ship
ui/edge-modules.htmland link fromindex.html. - After two clean releases of the endpoint, remove the hand-curated "Capabilities" table from
README.mdand replace with a small "see the appliance for the full catalog" pointer. - Future ADR: registry signing once Cognitum's publisher pipeline emits a sig.
See also
- ADR-100: Cognitum Cog Packaging Specification (binary trust model).
- ADR-101: Pose Estimation Cog (the first repo-shipped cog visible in the registry).
- v0-appliance ADR-220: Cog management surface (where this registry is the input to install actions).
docs/benchmarks/pose-estimation-cog.md: the per-cog benchmark format this ADR's response shape complements.