ADR-147: Occupancy World Model Integration (OccWorld / RoboOccWorld)
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-05-29 |
| Deciders | ruv |
| Relates to | ADR-136, ADR-139, ADR-140, ADR-141, ADR-143, ADR-145, ADR-146 |
Previously titled "NVIDIA Cosmos WFM Integration". Decision revised after hardware analysis confirmed RTX 5080 (16 GB VRAM) cannot run Cosmos-Transfer2.5-2B (requires 32.54 GB). OccWorld runs in 1.65 GB VRAM at 375 ms/inference — validated locally.
1. Context
RuView's WorldGraph (ADR-139) produces a current-state environmental digital twin; the RF encoder (ADR-146) predicts present-frame pose/presence/count at ~20 Hz. There is no future-state prediction — no trajectory priors beyond the Kalman tracker's 5–10 frame horizon, and no physics-aware validation of SemanticState updates.
Two world-model families were evaluated:
1.1 NVIDIA Cosmos (deferred)
Cosmos-Transfer2.5-2B requires 32.54 GB VRAM, but ruvultra's RTX 5080 has only 15.5 GB, so the model cannot run locally. Deferred to ADR-148, to be revisited when H100/A100 access is available, or used for offline training data generation only.
1.2 OccWorld / RoboOccWorld (this ADR)
| Model | Domain | Input | VRAM (inf) | Status |
|---|---|---|---|---|
| OccWorld (wzzheng/OccWorld, ECCV 2024) | Outdoor AV (nuScenes) | 3D semantic voxel seq | 1.65 GB validated | Code available, Apache-2.0 |
| RoboOccWorld (arXiv 2505.05512) | Indoor robotics | 3D voxel seq, camera poses | ~2–4 GB estimated | Code not yet released (~Q3 2025) |
Both operate natively in 3D occupancy space — the same representation RuView produces from WiFi CSI. No video rendering intermediate is needed (unlike Cosmos).
OccWorld architecture: VQVAE tokenizer (72.4M params) encodes 3D semantic occupancy
to discrete latent tokens → PlanUAutoRegTransformer predicts future tokens → VQVAE
decoder reconstructs future 3D occupancy. Input: (B, F, H, W, D) voxel grid with
integer class labels. Output: predicted occupancy for the next F−1 timesteps.
RoboOccWorld (once released): identical paradigm but trained on indoor scenes (60×60×36 voxels at 0.08 m/voxel, 4.8×4.8×2.88 m space, 12 indoor semantic classes) — near-perfect match for RuView's room-scale CSI occupancy.
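The physical extents quoted for both models follow from multiplying voxel counts by the voxel edge length. A minimal sketch of that arithmetic, useful when checking whether a RuView room fits a given grid; `grid_extent_m` and `approx_eq` are hypothetical helpers for illustration, not part of OccWorld or the RuView crates:

```rust
/// Physical extent (metres) of a voxel grid along each axis.
/// Hypothetical helper for illustration only.
pub fn grid_extent_m(dims: [u32; 3], voxel_m: f64) -> [f64; 3] {
    [
        f64::from(dims[0]) * voxel_m,
        f64::from(dims[1]) * voxel_m,
        f64::from(dims[2]) * voxel_m,
    ]
}

/// Floating-point comparison with a small absolute tolerance.
pub fn approx_eq(a: f64, b: f64) -> bool {
    (a - b).abs() < 1e-9
}
```

With the figures in this ADR, `grid_extent_m([60, 60, 36], 0.08)` gives approximately 4.8 × 4.8 × 2.88 m (RoboOccWorld), and `grid_extent_m([200, 200, 16], 0.4)` gives approximately 80 × 80 × 6.4 m (OccWorld on nuScenes).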
2. Decision
Phase A (now): Use OccWorld as the integration scaffold. Run inference from a Python subprocess. Adapt its dataset loader to accept RuView's custom occupancy format. Remap semantic classes from nuScenes outdoor (18 classes) to RuView indoor (floor, wall, ceiling, person, furniture, free).
Phase B (Q3–Q4 2025): Swap in RoboOccWorld when its code releases. The Rust
OccupancyWorldModel interface (§4.2) is designed for a clean backend swap.
Cosmos: Deferred. Revisit as an offline training data generator if H100 becomes available (ADR-148).
3. Validated Installation (ruvultra, 2026-05-29)
3.1 Environment
| Component | Version | Notes |
|---|---|---|
| GPU | RTX 5080, 15.5 GB VRAM | sm_120 (Blackwell) |
| PyTorch | 2.10.0+cu128 | ml-env, Python 3.12 |
| CUDA toolkit | 12.8 | /usr/local/cuda-12.8 |
| mmcv | 2.0.1 (Python-only, no CUDA ops) | Built from source with pkg_resources patch |
| mmdet | 3.0.0 | pip install |
| mmdet3d | 1.1.1 | Built from source with --no-deps |
| mmengine | 0.10.7 | pip install via mmcv |
| OccWorld | commit HEAD | ~/projects/OccWorld |
3.2 Build Notes
Issue 1 — sccache compiler wrapping: System CC=sccache clang, CXX=sccache clang++
breaks PyTorch CUDA extension builds (injects clang as a positional argument to the
build command). Fix: unset CC CXX before all pip install.
Issue 2 — pkg_resources in mmcv setup.py: setuptools ≥72 removed the legacy
pkg_resources top-level import. Fix: patch line 5 of setup.py to use
importlib.metadata and packaging.version.
Issue 3 — CUDA version mismatch: host nvcc is CUDA 13.0; PyTorch was built with
12.8. Fix: CUDA_HOME=/usr/local/cuda-12.8 for all builds.
Issue 4 — mmcv 2.0.1 CUDA ops incompatible with PyTorch 2.10 ATen headers:
c10::Type::TypePtr dereference operator changed. Fix: build MMCV_WITH_OPS=0
(Python-only build, mmcv-lite). OccWorld's inference path does not use mmcv CUDA ops.
Issue 5 — OccWorld API bug: TransVQVAE.forward_inference calls
self.transformer(..., hidden=hidden) but PlanUAutoRegTransformer.forward(tokens, pose_tokens)
has no hidden kwarg and returns a (queries, pose_queries) tuple.
Fix: monkey-patch forward_inference to pass pose_tokens=zeros and unpack the
tuple return. Applied in the Python subprocess at startup.
3.3 Validation Results
    Input:  torch.Size([1, 16, 200, 200, 16])       (16 frames: 15 past + 1 offset)
    Output: sem_pred (1, 15, 200, 200, 16) int64    (predicted future occupancy)
            logits   (1, 15, 200, 200, 16, 18) f32  (class logits)
            iou_pred (1, 15, 200, 200, 16) int64    (binary occupancy mask)
    Inference time: 375 ms
    VRAM peak:      1.65 GB
    Parameters:     72.4M
OccWorld produces 15 predicted future frames from 15 past frames of 3D semantic occupancy at 200×200×16 resolution with 18 classes — fully validated on RTX 5080.
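The tensor shapes above also determine how much data would cross the Unix socket if the raw outputs were returned to the Rust client. A small sketch of the size arithmetic, assuming dense, uncompressed tensors (8 bytes per int64 element, 4 bytes per f32 element); `numel` and `tensor_bytes` are hypothetical helpers:

```rust
/// Number of elements in a dense tensor of the given shape.
pub fn numel(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Raw size in bytes of a dense tensor, ignoring any framing or compression.
pub fn tensor_bytes(shape: &[usize], bytes_per_elem: usize) -> usize {
    numel(shape) * bytes_per_elem
}
```

Under these assumptions the input holds 10,240,000 voxels, `sem_pred` is about 76.8 MB, and `logits` is about 691 MB. That suggests returning only `sem_pred` (or a more compact encoding of it) to the client, although this ADR does not fix what the server sends back.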
4. Integration Architecture
4.1 Data Flow
    ESP32-S3 CSI (20 Hz)
        │
        ▼
    [ruvsense signal pipeline]  ── ADR-136 frame contracts
        │
        ▼
    [RfEncoder / MultiTaskOutput]  ── ADR-146 pose + presence + count
        │  (sub-Hz WorldGraph update rate)
        ▼
    [WorldGraph]  ── PersonTrack, ObjectAnchor, SemanticState ── ADR-139/140
        │
        │  On semantic event (motion, activity change, fall-risk query)
        ▼
    [BFLD Privacy Gate]  ── ADR-141: "occworld_inference" action
        │  PRIVATE/HOME    → bridge NOT called
        │  MONITORING/AWAY → local inference permitted
        ▼
    [wifi-densepose-worldmodel]  ── Rust thin client (Unix socket)
        │
        ▼
    [OccWorld Inference Server]  ── Python subprocess (~/projects/OccWorld)
        │  WorldGraph PersonTrack history → (B, F, H, W, D) occupancy tensor
        │  OccWorld forward_inference → sem_pred (15 future frames)
        │  Decode future voxels → TrajectoryPrior per PersonTrack
        │
        ▼
    [Trajectory priors injected into ruvsense/pose_tracker.rs Kalman filter]
    [WorldGraph::upsert_node(Event { predicted_movement, ... })]
        SemanticProvenance { model_version, calibration_id, privacy_decision }
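The "Decode future voxels → TrajectoryPrior" step is not specified further in this ADR. One plausible reduction is to take the centroid of the person-class voxels in each predicted frame and blend it softly into the tracker's position estimate. The sketch below assumes an x-major flat layout, a hypothetical `LabelGrid` type, and a caller-chosen blend weight; a real Kalman integration might instead treat the prior as an additional measurement.

```rust
/// Dense 3D label grid, flattened x-major: index = (x * ny + y) * nz + z.
/// Both the type and the layout are assumptions made for this sketch.
pub struct LabelGrid {
    pub dims: [usize; 3],
    pub voxel_m: f64,
    pub labels: Vec<u8>,
}

/// Centroid (metres, grid-local frame) of all voxels labelled `class_id`,
/// computed from voxel centres. Returns None when the class is absent.
pub fn class_centroid(grid: &LabelGrid, class_id: u8) -> Option<[f64; 3]> {
    let [nx, ny, nz] = grid.dims;
    let mut sum = [0.0f64; 3];
    let mut count = 0usize;
    for x in 0..nx {
        for y in 0..ny {
            for z in 0..nz {
                if grid.labels[(x * ny + y) * nz + z] == class_id {
                    sum[0] += (x as f64 + 0.5) * grid.voxel_m;
                    sum[1] += (y as f64 + 0.5) * grid.voxel_m;
                    sum[2] += (z as f64 + 0.5) * grid.voxel_m;
                    count += 1;
                }
            }
        }
    }
    if count == 0 {
        return None;
    }
    let n = count as f64;
    Some([sum[0] / n, sum[1] / n, sum[2] / n])
}

/// Convex blend of the tracker estimate and the prior.
/// `prior_weight` is clamped to [0, 1]; its value is a tuning assumption.
pub fn blend(kalman: [f64; 3], prior: [f64; 3], prior_weight: f64) -> [f64; 3] {
    let w = prior_weight.clamp(0.0, 1.0);
    [
        (1.0 - w) * kalman[0] + w * prior[0],
        (1.0 - w) * kalman[1] + w * prior[1],
        (1.0 - w) * kalman[2] + w * prior[2],
    ]
}
```

Running `class_centroid` over each of the 15 predicted frames would yield a per-person sequence of positions that could serve as the trajectory prior.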
4.2 Rust Interface (wifi-densepose-worldmodel crate — to be created)
Interface designed to be backend-agnostic (OccWorld today, RoboOccWorld when released):
    pub struct OccupancyWorldModelRequest {
        pub past_frames: Vec<OccupancyGrid3D>,   // N frames of history
        pub voxel_resolution: f32,               // metres/voxel
        pub scene_bounds: AabbEnu,               // room extent in ENU
        pub prediction_steps: u32,               // how many future steps
    }

    pub struct OccupancyWorldModelResponse {
        pub future_frames: Vec<OccupancyGrid3D>, // predicted future occupancy
        pub confidence: f32,
        pub model_id: String,                    // checkpoint hash for provenance
    }

    pub struct OccWorldBridge {
        socket_path: PathBuf,
        client: reqwest::Client,
    }

    impl OccWorldBridge {
        pub async fn predict(
            &self,
            request: OccupancyWorldModelRequest,
        ) -> Result<OccupancyWorldModelResponse, WorldModelError>;
    }
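This ADR does not specify the wire format between the Rust client and the Python server. One common choice for a stream socket is length-prefixed framing: a 4-byte big-endian length followed by the serialized payload. The sketch below is an assumption, not the crate's protocol; `write_frame` and `read_frame` are hypothetical names, and the functions are written against the synchronous `std::io::Read` and `std::io::Write` traits (which `std::os::unix::net::UnixStream` implements) rather than an async runtime.

```rust
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame, rejecting frames longer than `max_len` so a corrupt
/// header cannot trigger an unbounded allocation.
pub fn read_frame<R: Read>(r: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds max_len"));
    }
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

The payload encoding (JSON, MessagePack, raw tensors) is left open here. Whichever is chosen, `max_len` should be sized against the expected occupancy tensor payload.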
4.3 RuView → OccWorld Adaptation (required before production use)
OccWorld was trained on nuScenes outdoor driving (200×200×16 at 0.4 m/voxel, 80×80×6.4 m, 18 outdoor classes). RuView uses indoor room-scale occupancy (~10×10×3 m at finer resolution). Required adaptations:
- New dataset loader: replace nuScenesSceneDatasetLidarTraverse with a RuViewOccDataset that reads WorldGraph history snapshots and returns the (B, F, H, W, D) tensor in OccWorld's expected format.
- Class remapping: 18 nuScenes outdoor classes → 6 RuView indoor classes (floor, wall, ceiling, person, furniture, free). Remap during tensor construction.
- Ego-pose zeroing: OccWorld uses rel_poses for ego-motion (AV driving); a fixed indoor sensor has no ego-motion. Pass zero poses in forward_inference_with_plan.
- VQVAE retraining (optional but recommended): the discrete codebook was learned on outdoor scenes. Re-train the VQVAE stage on RuView synthetic occupancy data before fine-tuning the transformer.
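- Resolution rescaling: if indoor occupancy uses finer voxels (e.g. 0.08 m/voxel as in RoboOccWorld), resample the grid to 200×200 for OccWorld using nearest-neighbour interpolation (bilinear interpolation would blend integer class labels into invalid values), or retrain at native resolution.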
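The class remapping and resolution rescaling steps can be sketched as two small grid operations. This ADR places them in the Python dataset loader; the logic is shown in Rust here only to match the interface code above. The lookup table contents are left to the caller because the ADR does not fix a class-ID assignment. The sketch swaps in nearest-neighbour resampling instead of bilinear interpolation, since voxels hold integer class labels and averaging label IDs would produce classes that do not exist. `remap_labels` and `resample_nearest` are hypothetical names, and the x-major layout is an assumption.

```rust
/// Remap class IDs through a lookup table; IDs outside the table map to `fallback`.
pub fn remap_labels(labels: &[u8], table: &[u8], fallback: u8) -> Vec<u8> {
    labels
        .iter()
        .map(|&l| table.get(l as usize).copied().unwrap_or(fallback))
        .collect()
}

/// Nearest-neighbour resample of a dense label grid.
/// Layout is x-major: index = (x * ny + y) * nz + z.
/// Panics if `src` does not match `src_dims`, or if `src` is empty while
/// the destination grid is not.
pub fn resample_nearest(src: &[u8], src_dims: [usize; 3], dst_dims: [usize; 3]) -> Vec<u8> {
    assert_eq!(
        src.len(),
        src_dims[0] * src_dims[1] * src_dims[2],
        "src length must match src_dims"
    );
    // Map destination index i to the source cell that contains its centre.
    let pick = |i: usize, s: usize, d: usize| ((2 * i + 1) * s) / (2 * d);
    let [sx, sy, sz] = src_dims;
    let [dx, dy, dz] = dst_dims;
    let mut out = Vec::with_capacity(dx * dy * dz);
    for x in 0..dx {
        let xs = pick(x, sx, dx);
        for y in 0..dy {
            let ys = pick(y, sy, dy);
            for z in 0..dz {
                let zs = pick(z, sz, dz);
                out.push(src[(xs * sy + ys) * sz + zs]);
            }
        }
    }
    out
}
```

Resampling a 60×60×36 indoor grid to 200×200×16 this way changes the effective voxel size per axis, so the physical scale seen by the pre-trained model differs from nuScenes; that is part of the domain gap noted in §5.2.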
4.4 Privacy Compliance (ADR-141)
The OccWorld bridge is a new occworld_inference action in the BFLD privacy control plane:
| Action | PRIVATE | HOME | MONITORING | AWAY |
|---|---|---|---|---|
| occworld_inference (local) | ✗ | ✗ | ✓ | ✓ |
All SemanticState nodes derived from predictions carry SemanticProvenance:
privacy_decision: PrivacyDecisionRef { mode, action: "occworld_inference", timestamp }
model_version: <OccWorld checkpoint hash>
calibration_id: <active baseline from ADR-135>
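The permission table and the provenance record can be combined into a single gate that either refuses the call or returns the provenance to attach to derived nodes. A minimal sketch: the type and field names follow this section, but the field types (for example `timestamp` as a `u64`) and the `gate` function are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyMode {
    Private,
    Home,
    Monitoring,
    Away,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PrivacyDecisionRef {
    pub mode: PrivacyMode,
    pub action: &'static str,
    pub timestamp: u64, // assumed: milliseconds since the Unix epoch
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SemanticProvenance {
    pub privacy_decision: PrivacyDecisionRef,
    pub model_version: String,
    pub calibration_id: String,
}

pub const OCCWORLD_ACTION: &str = "occworld_inference";

/// Mirrors the table above: only MONITORING and AWAY permit local inference.
pub fn inference_permitted(mode: PrivacyMode) -> bool {
    matches!(mode, PrivacyMode::Monitoring | PrivacyMode::Away)
}

/// Returns None when the bridge must not be called; otherwise returns the
/// provenance record to attach to any SemanticState derived from the prediction.
pub fn gate(
    mode: PrivacyMode,
    timestamp: u64,
    model_version: &str,
    calibration_id: &str,
) -> Option<SemanticProvenance> {
    if !inference_permitted(mode) {
        return None;
    }
    Some(SemanticProvenance {
        privacy_decision: PrivacyDecisionRef { mode, action: OCCWORLD_ACTION, timestamp },
        model_version: model_version.to_owned(),
        calibration_id: calibration_id.to_owned(),
    })
}
```

Returning the provenance from the gate itself makes it hard to produce a prediction-derived node without a recorded privacy decision.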
5. Consequences
5.1 Positive
- Validated locally: 375 ms inference, 1.65 GB VRAM — fits comfortably on RTX 5080
- 15-frame prediction horizon (~7.5 s at 2 Hz, or up to ~30 s at custom frame rate)
- Native occupancy format: no video rendering intermediate unlike Cosmos
- Clean swap boundary: the backend behind OccWorldBridge can be swapped to RoboOccWorld without changing the Rust interface
- 72.4M params: small enough to fine-tune on a single RTX 5080
- No Python in Rust workspace: subprocess isolation preserves Rust-only mandate
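The horizon figures above are the frame count divided by the snapshot rate, and the same rate bounds how much time one inference may take. A short sketch of both checks, using the 15-frame horizon and 375 ms latency from §3.3; `horizon_s` and `keeps_up` are hypothetical helper names:

```rust
/// Prediction horizon in seconds for `frames` predicted frames at `frame_rate_hz`.
pub fn horizon_s(frames: u32, frame_rate_hz: f64) -> f64 {
    f64::from(frames) / frame_rate_hz
}

/// True when one inference finishes within a single frame period,
/// i.e. the server could run once per snapshot without falling behind.
pub fn keeps_up(inference_ms: f64, frame_rate_hz: f64) -> bool {
    inference_ms < 1000.0 / frame_rate_hz
}
```

At 2 Hz the horizon is 7.5 s and the 500 ms frame period exceeds the 375 ms inference time; at 0.5 Hz the horizon reaches 30 s. At the 20 Hz CSI rate the period is 50 ms, so inference per CSI frame would not keep up, which is consistent with triggering inference on semantic events rather than on every frame.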
5.2 Negative
- Domain gap: nuScenes outdoor training vs indoor WiFi sensing — VQVAE codebook and transformer weights encode outdoor semantics; retraining required for quality results
- No ego-pose equivalent in fixed indoor sensors: rel_poses must be zeroed
- Pre-trained weights predict outdoor scene evolution; uncalibrated predictions for indoor scenes are semantically meaningless without retraining
- RoboOccWorld (indoor-native, 0.08 m/voxel) not yet available; current OccWorld is a placeholder until it releases
5.3 Risks
| Risk | Likelihood | Mitigation |
|---|---|---|
| RoboOccWorld delayed past Q4 2025 | Medium | OccWorld retrained on synthetic RuView data as fallback |
| VQVAE codebook quality low on indoor after retraining | Low | RoboOccWorld swap; OccWorld still useful for coarse occupancy |
| OccWorld API drift (unmaintained repo) | Low | Local fork at ~/projects/OccWorld; patches documented above |
| WorldGraph update rate too low for meaningful sequences | Medium | Log WorldGraph snapshots at configurable rate for inference |
6. Implementation Phases
| Phase | Scope | Status |
|---|---|---|
| 1 | Install OccWorld; validate forward pass with synthetic data | Done (2026-05-29) |
| 2 | wifi-densepose-worldmodel Rust thin client crate (Unix socket bridge) | Next |
| 3 | RuViewOccDataset loader + class remapping + ego-pose zeroing | Pending |
| 4 | Trajectory prior injection into pose_tracker.rs Kalman filter | Pending |
| 5 | VQVAE + transformer retraining on RuView synthetic occupancy | Pending |
| 6 | Swap to RoboOccWorld backend when code releases | Q3–Q4 2025 |
7. Cosmos Path (Deferred — ADR-148)
NVIDIA Cosmos-Transfer2.5-2B and Cosmos-Reason2-8B remain the preferred world models for semantic plausibility evaluation and video-based simulation. They are deferred to ADR-148, which will cover:
- H100/A100 access (cloud or co-lo) for Cosmos inference
- Offline synthetic training data generation for ADR-146 RF encoder heads
- Cosmos-Reason2-8B as a physics plausibility gate for SemanticState commits
8. References
- OccWorld (ECCV 2024): https://github.com/wzzheng/OccWorld, arXiv 2311.16038
- RoboOccWorld (May 2025): arXiv 2505.05512
- PyTorch 2.7 Blackwell support: https://pytorch.org/blog/pytorch-2-7/
- NVIDIA Cosmos (deferred): https://www.nvidia.com/en-us/ai/cosmos/, arXiv 2511.00062
- Cosmos-Transfer1: arXiv 2503.14492