mirror of https://github.com/ruvnet/RuView.git synced 2026-06-02 00:58:56 +02:00

ruv 0fbdd15955 docs: results+proof links, capabilities-proof rebuttal, fix stale claims

- README: replace retracted "100% presence" claim with honest 82.3%
  held-out temporal-triplet; correct stale "pose model not in this
  release" (now live at ruvnet/wifi-densepose-mmfi-pose, 82.69%
  torso-PCK@20 SOTA); add a Results & proof table (HF models,
  AetherArena, benchmark study, deterministic verify.py proof, witness).
- user-guide: same 100%->82.3% correction in two places; add Results &
  proof pointers and the SOTA pose model + AetherArena links.
- docs/proof-of-capabilities.md (new): evidence-first rebuttal to the
  "fake / misleading" claims. Concedes what was fair (over-stated early
  metrics, AI-doc tone), refutes the category errors (simulate-mode
  mistaken for fraud; missing weights mistaken for missing pipeline),
  and gives copy-paste "prove it yourself" steps (verify.py VERDICT:
  PASS + published SHA-256, cargo test, HF model pull, ESP32 CSI).
  Emphasizes built-in-public history (git, 96 ADRs, CHANGELOG, issues
  incl. #803/#872 bug->fix arcs) as the anti-facade evidence.
- aether-arena/VERIFY.md: cross-link the whole-platform proof doc.

Verified: python archive/v1/data/proof/verify.py -> VERDICT: PASS
(hash ca58956c...9199 matches published expected_features.sha256).

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-31 10:29:28 -04:00

Name	Last commit	Date
calibration	feat(calibration): cog adapter producer — completes the cog --adapter feature	2026-05-31 05:10:07 -04:00
fixtures	feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7)	2026-05-30 16:59:11 -04:00
ledger	publish: best MM-Fi benchmark set (in-domain 83.59, x-subject 64.0, x-env 17.5 CORAL)	2026-05-30 22:22:53 -04:00
schema	feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)	2026-05-30 16:47:22 -04:00
space	publish: best MM-Fi benchmark set (in-domain 83.59, x-subject 64.0, x-env 17.5 CORAL)	2026-05-30 22:22:53 -04:00
README.md	feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)	2026-05-30 16:47:22 -04:00
STATUS.md	docs(aether-arena): v0 infrastructure complete — Space live, harness gate passing (M8)	2026-05-30 17:15:08 -04:00
VERIFY.md	docs: results+proof links, capabilities-proof rebuttal, fix stale claims	2026-05-31 10:29:28 -04:00

README.md

AetherArena ("AA") — The Official Spatial-Intelligence Benchmark

Public leaderboard. Private evaluation split. Open scorer. Signed results.

AetherArena is a standalone, project-agnostic benchmark for camera-free spatial intelligence — pose, presence, occupancy, tracking, and vitals from RF/WiFi (and, over time, mmWave / UWB / radar / lidar / multimodal). It is not a single-vendor leaderboard: any team, framework, or sensing modality can enter, and every entrant — including the RuView baseline that donated the seed scorer — is scored by the identical, open, pinned harness.

Specified in ADR-149 (Accepted).

Canonical home: ruvnet/aether-arena + a Hugging Face Space (deploy pending — see STATUS).

Why

WiFi/RF spatial sensing has no shared yardstick — papers self-report against inconsistent splits and metrics, with no accounting for latency, reproducibility, or privacy leakage. AA fixes the measurement, not just the models: a single deterministic scorer, a private held-out split nobody can train on, and a signed result ledger that can't be silently edited.

What gets measured (v0)

Category	Metric	Status
Pose	PCK@0.2 (all / torso), OKS	Ranked
Presence	accuracy, FP/FN	Ranked
Edge latency	p50 / p95 / p99 ms	Ranked
Determinism	proof-hash pass/fail	Ranked (gate)
Tracking (MOTA)	—	activates when multi-person clips land
Vitals (BPM err)	—	activates when paired vitals ground truth lands
Privacy leakage	membership-inference ∈ [0,1]	gated — not ranked until the attacker ships
Cross-room	degradation ratio	coming soon
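
As an illustration of the edge-latency row, the sketch below derives p50 / p95 / p99 from a list of per-inference timings. It is a minimal example and not the AA scorer: the function name `latency_percentiles` and the choice of the "inclusive" interpolation method are assumptions, since the README does not say how the harness computes percentiles.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50/p95/p99 for a list of latency samples in milliseconds.

    Hypothetical helper: the interpolation method is an assumption,
    not something the benchmark specifies.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    timings = [12.1, 11.8, 13.0, 12.4, 40.2, 12.0, 11.9, 12.3]
    print(latency_percentiles(timings))
```

Reporting the tail (p95 / p99) next to the median keeps a model with occasional long stalls from looking as fast as one with uniformly low latency.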

The headline rank is the category metric; an optional arena_score = quality × latency_factor × privacy_factor × determinism_gate is exposed alongside it (never instead of it) so that accuracy can't win at any cost. See ADR-149 §2.5.
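
The product form of arena_score can be sketched as follows. This is an illustration under stated assumptions, not the ADR-149 definition: it assumes each factor is normalized to [0, 1] and that determinism_gate is binary (1 on a proof-hash pass, 0 on a fail). The README gives only the product itself.

```python
def arena_score(quality, latency_factor, privacy_factor, determinism_pass):
    """Composite score as a plain product of the four terms.

    Assumptions (not from the README): every factor lies in [0, 1],
    and the determinism gate is 1.0 on pass and 0.0 on fail.
    """
    for name, value in (
        ("quality", quality),
        ("latency_factor", latency_factor),
        ("privacy_factor", privacy_factor),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    gate = 1.0 if determinism_pass else 0.0
    return quality * latency_factor * privacy_factor * gate


if __name__ == "__main__":
    # An accurate but slow model is pulled down by its latency factor.
    print(arena_score(0.8, 0.5, 1.0, True))
    # A non-deterministic entry scores zero regardless of accuracy.
    print(arena_score(0.9, 1.0, 1.0, False))
```

Because the terms multiply, a weak factor cannot be offset by a strong one, which is the property the README describes as accuracy not winning at any cost.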

How scoring works

The scorer is RuView's already-published wifi-densepose-train acceptance harness (ruview_metrics + ADR-145 ablation), run in a pinned sandbox. You submit a model, not predictions — predictions on data you hold prove nothing. Your model is scored against a private MM-Fi held-out split (CC BY-NC 4.0; Wi-Pose excluded for redistribution reasons), and one signed, append-only row is written to the results ledger with a determinism proof hash.
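
One common way to make a ledger append-only in a checkable sense is a hash chain, where each row commits to the hash of the row before it. The sketch below shows that idea only. The row layout, the field names (`prev`, `payload`, `hash`) and the all-zero genesis value are hypothetical, and the signing step is omitted; the actual AA ledger format is not described in this README.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first row


def row_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_row(ledger, payload):
    """Append a row that commits to the hash of the previous row."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    row = {"prev": prev, "payload": payload, "hash": row_hash(prev, payload)}
    ledger.append(row)
    return row


def verify_chain(ledger):
    """Return True only if every row still matches its recorded hash."""
    prev = GENESIS
    for row in ledger:
        if row["prev"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return False
        prev = row["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_row(ledger, {"entry": "example-model", "metric": "pck", "value": 0.5})
    append_row(ledger, {"entry": "other-model", "metric": "pck", "value": 0.4})
    print(verify_chain(ledger))  # chain is intact
    ledger[0]["payload"]["value"] = 0.99  # silent edit
    print(verify_chain(ledger))  # edit is detected
```

Editing any earlier row changes its hash and breaks every later link, so a silent edit is detectable by anyone who re-verifies the chain.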

Submission lifecycle: submitted → validated → quarantined → smoke_scored → full_scored → published (or rejected with a reason). The model only ever runs inside a no-network, read-only-FS sandbox.
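
The lifecycle above reads naturally as a small state machine. The following sketch encodes the listed states and their order. It assumes that a submission can be rejected from any non-terminal state, which the README does not spell out, and the names `State`, `TRANSITIONS` and `advance` are illustrative only.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    VALIDATED = "validated"
    QUARANTINED = "quarantined"
    SMOKE_SCORED = "smoke_scored"
    FULL_SCORED = "full_scored"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Forward path from the README; REJECTED from any non-terminal state
# is an assumption made for this sketch.
_ORDER = [
    State.SUBMITTED,
    State.VALIDATED,
    State.QUARANTINED,
    State.SMOKE_SCORED,
    State.FULL_SCORED,
    State.PUBLISHED,
]
TRANSITIONS = {
    current: {nxt, State.REJECTED} for current, nxt in zip(_ORDER, _ORDER[1:])
}
TRANSITIONS[State.PUBLISHED] = set()  # terminal
TRANSITIONS[State.REJECTED] = set()   # terminal


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


if __name__ == "__main__":
    state = State.SUBMITTED
    for nxt in _ORDER[1:]:
        state = advance(state, nxt)
    print(state.value)
```

Keeping the allowed moves in one table means a submission cannot skip quarantine or smoke scoring on its way to publication.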

Submit (when the Space is live)

1. Write a manifest: schema/aa-submission.toml.
2. Push your model artifact (.safetensors / .rvf / LoRA adapter) + manifest to the Space.
3. Watch it move through the lifecycle; your signed row appears on the board.

Verify it's fair (you don't have to trust us)

See VERIFY.md — run the open scorer locally on the public smoke split, reproduce the determinism hash, and confirm RuView's own entries were scored by the identical path. That five-step check is the launch gate (ADR-149 §7).
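
Reproducing a determinism hash amounts to serializing the scorer's output in a fixed byte layout, hashing it, and comparing against the published digest. The sketch below shows that pattern in general terms. It is not the procedure in VERIFY.md: the little-endian float64 encoding and the helper names `proof_hash` and `matches_published` are assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def proof_hash(values):
    """SHA-256 over values packed as little-endian float64.

    The byte layout is an assumption for this sketch; a real harness
    pins its own serialization.
    """
    digest = hashlib.sha256()
    for value in values:
        digest.update(struct.pack("<d", float(value)))
    return digest.hexdigest()


def matches_published(values, expected_hex):
    """Compare a locally computed hash with a published hex digest."""
    return hmac.compare_digest(proof_hash(values), expected_hex.strip().lower())


if __name__ == "__main__":
    run_a = [0.25, 0.5, 0.75]
    run_b = [0.25, 0.5, 0.75]
    published = proof_hash(run_a)  # stand-in for a published digest
    print(matches_published(run_b, published))      # identical runs agree
    print(matches_published([0.25, 0.5, 0.76], published))  # any drift fails
```

A fixed serialization matters here: the same numbers written with a different byte order or precision would hash differently even though the model behaved identically.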

Neutrality

AA is a neutral commons. The scorer is open and versioned; any metric change is a public harness_version bump that re-scores all entries. RuView donated the seed harness and enters as one baseline — it gets no special treatment (ADR-149 §2.8).
