wifi-ruview

Mirror of https://github.com/ruvnet/RuView.git, synced 2026-06-02 00:58:56 +02:00

Author	SHA1	Message	Date
rUv	f850d46e9a	Merge pull request #874 from ruvnet/feat/adr-149-aether-arena feat(aether-arena): ADR-149 Spatial-Intelligence Benchmark — scorer + CI harness gate v1528	2026-05-31 11:32:26 -04:00
ruv	4896d05cca	fix(proof): regenerate ADR-134 CIR witness hash after CIR fixes Rust Workspace Tests failed the CIR determinism guard: expected 120bd7b1… (from the original ADR-134, #837) vs actual 304d5469…. The later CIR fixes on this branch (windowed dominant-tap ratio, λ tuning, causal-delay-window rms — ADR-134 P2) intentionally changed the CirEstimator output but never regenerated the witness hash. The new output is bit-deterministic and cross-platform stable: the Rust cir_proof_runner produces 304d5469… on both Linux CI and local Windows. Regenerated via the sanctioned `--generate-hash` path; verify-cir-proof.sh now prints "VERDICT: PASS (CIR hash matches)". Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 11:11:38 -04:00
ruv	e84aef223c	ci(ruview-swarm): install clippy on the pinned 1.89 toolchain The clippy job failed with "cargo-clippy is not installed for the toolchain '1.89'". v2/rust-toolchain.toml pins channel "1.89" (profile "minimal", no clippy); dtolnay@stable installed clippy on the floating "stable" toolchain, but the override makes cargo use the separate "1.89" toolchain in working-directory v2. Pin the toolchain input to "1.89" so clippy lands on the toolchain cargo actually runs. (The real clippy lint it then catches — manual_is_multiple_of — was fixed in 29e698a05.) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:51:04 -04:00
ruv	810ee656de	fix(bfld): gate PrivacyAttestationProof::compute behind std CI `cargo test --no-default-features (baseline regression)` failed with `error: associated function compute is never used` under -D warnings. compute() is only reachable via PrivacyModeRegistry (#[cfg(feature = "std")]); without std there is no caller. Gate the impl to match its only callers. Verified clean under --no-default-features, default, and --features mqtt with RUSTFLAGS=-D warnings. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:45:38 -04:00
ruv	29e698a05c	fix(ruview-swarm): clippy manual_is_multiple_of in lawnmower planner CI `clippy (-D warnings, --no-deps)` failed on patterns.rs:131 — `row % 2 == 0` is flagged by clippy::manual_is_multiple_of. Use `row.is_multiple_of(2)` (identical even-row check). Both CI clippy variants (--no-default-features and --features full,train) now pass. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:41:05 -04:00
ruv	138449a378	Merge remote-tracking branch 'origin/main' into feat/adr-149-aether-arena # Conflicts: # CHANGELOG.md	2026-05-31 10:36:12 -04:00
ruv	6778c708ff	chore(gitignore): exclude MM-Fi dataset archives (assets/MM-Fi/*.zip) The MM-Fi benchmark environment archives (E01-E04.zip) are large data files fetched separately for evaluation — they must never be committed. Also keeps the existing aether-arena/staging/ private-staging exclusion. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:33:13 -04:00
ruv	0fbdd15955	docs: results+proof links, capabilities-proof rebuttal, fix stale claims - README: replace retracted "100% presence" claim with honest 82.3% held-out temporal-triplet; correct stale "pose model not in this release" (now live at ruvnet/wifi-densepose-mmfi-pose, 82.69% torso-PCK@20 SOTA); add a Results & proof table (HF models, AetherArena, benchmark study, deterministic verify.py proof, witness). - user-guide: same 100%->82.3% correction in two places; add Results & proof pointers and the SOTA pose model + AetherArena links. - docs/proof-of-capabilities.md (new): evidence-first rebuttal to the "fake / misleading" claims. Concedes what was fair (over-stated early metrics, AI-doc tone), refutes the category errors (simulate-mode mistaken for fraud; missing weights mistaken for missing pipeline), and gives copy-paste "prove it yourself" steps (verify.py VERDICT: PASS + published SHA-256, cargo test, HF model pull, ESP32 CSI). Emphasizes built-in-public history (git, 96 ADRs, CHANGELOG, issues incl. #803/#872 bug->fix arcs) as the anti-facade evidence. - aether-arena/VERIFY.md: cross-link the whole-platform proof doc. Verified: python archive/v1/data/proof/verify.py -> VERDICT: PASS (hash ca58956c...9199 matches published expected_features.sha256). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:29:28 -04:00
ruv	4007db5d13	fix(sensing-server): fix CSI per-node count clamp — #803 (part 2) The pure-CSI per-node path clamped its own occupancy estimate before the aggregator could read it. estimate_persons_from_correlation (DynamicMinCut) returns 0-3, but it was mapped to a score via `corr_persons / 3.0`, putting 2 people at 0.667 — just under the 0.70 up-threshold of score_to_person_count — so the per-node count never climbed past 1, leaving node_max stuck at 1 for CSI-only nodes even when the min-cut cleanly separated two people. Replace the lossy /3.0 mapping with a threshold-aligned corr_persons_to_score (1->0.40, 2->0.74, 3->0.96) whose steady state round-trips back to the same count through the EMA + hysteresis bands, while still gating transient noise. A convergence test replays the exact CSI-loop EMA and asserts min-cut=2 now reports 2 / 3 reports 3 / 1 reports 1, plus a regression test documenting that the old /3.0 mapping pinned two people to 1. Full suite: 586 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:09:58 -04:00
ruv	a933fc7732	fix(sensing-server): surface count-aware per-node estimates — #803 Person count was pinned to 1 because the aggregate was derived from `smoothed_person_score`, an EMA-smoothed activity score (amplitude variance / motion / spectral energy) that saturates near a single occupant and cannot discriminate count. The count-aware per-node estimates the ESP32 paths already compute (firmware n_persons, mincut corr_persons) were stored in NodeState::prev_person_count then discarded by the aggregator — the same dead-wiring class as #872. Add `aggregate_person_count(activity_count, node_states)` = max(activity, node_max) and use it at both ESP32 aggregation sites (edge-vitals + CSI loop, Some + fallback arms). It can only raise the count when a node positively reports more occupants, so the lone-occupant case is provably never inflated (regression-guarded). 5 new unit tests + full suite: 582 passed, 0 failed. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 10:00:56 -04:00
ruv	415eaea849	docs(changelog): #872 MQTT publisher wiring fix Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 09:40:11 -04:00
ruv	a3f80b0cda	fix(sensing-server): wire MQTT publisher into the binary — closes #872 #872 reported '--mqtt: unexpected argument' on the Docker image; prior attempts chased a Docker rebuild, but the real cause was disconnected code: the --mqtt* flags lived only in cli::Args (dead code — referenced nowhere), while the binary parses a separate main::Args with no mqtt fields, and main.rs never declared/started the mqtt:: publisher. So MQTT was fully unwired: flags didn't parse, and the publisher never ran. Fix: - Extract the mqtt + privacy flags into a shared (#[derive(clap::Args)]); retarget mqtt::config::{from_args,build_tls} to it. - #[command(flatten)] MqttArgs into the binary's main::Args (using the lib crate's type so it matches from_args), so --mqtt* now parse. - Spawn the publisher on --mqtt: build MqttConfig, validate, and bridge the existing JSON sensing broadcast into the typed VitalsSnapshot stream the publisher consumes (defensive serde_json::Value mapping — absent fields default, never wrong values). #[cfg(feature=mqtt)]-gated; without the feature --mqtt WARNs and no-ops (documented contract). Fix the mqtt_publisher example for the new signature. Verified end-to-end against local mosquitto: publisher connects and emits 20 HA auto-discovery entities + live state (presence ON, person_count, …). Tests: 577 pass default / 580 pass --features mqtt / 0 fail; both configs build. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 09:39:21 -04:00
ruv	edbe57378a	fix(signal/cir): un-ignore end-to-end CIR pipeline test — ADR-134 P2 fully resolved The cir_pipeline end-to-end test was gated on the same dominant_tap_ratio floor; the windowed-ratio fix resolves it. All 6 ADR-134 P2 CIR tests (cir_synthetic 5 + cir_pipeline 1) now pass. signal+cir: 472 pass / 0 fail. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 06:27:50 -04:00
ruv	821f441af0	fix(signal/cir): causal-delay-window rms spread — resolves last ADR-134 P2 cir test Found the principled fix for the rms-delay-spread inflation (superseding my prior 'needs ISTA work' note): the spurious ~15-20% tap at ~bin 150 is an ALIAS of the near-zero dominant tap — the ISTA delay grid is circular (Φ is DFT-like), so bins >= G/2 are non-causal negative delays. Computing the delay spread over only the causal half [0, G/2) drops rms from 389ns to 65ns (true value), cleanly and robustly (no fragile magnitude threshold). Un-ignores should_produce_positive_rms_delay_spread. ADR-134 P2 cir_synthetic now FULLY resolved: all 5 previously-ignored tests pass via two physics-justified fixes (windowed dominant-ratio for super- resolution leakage + causal-window rms for circular-grid aliasing). signal+cir: 471 pass / 0 fail / 0 ignored in cir_synthetic. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 06:26:48 -04:00
ruv	bce5765d89	docs(signal/cir): precise diagnosis of remaining ADR-134 P2 rms-spread failure Diagnosed the one still-ignored CIR test: ISTA emits a spurious ~15-20%-of- dominant tap at an implausible far delay (~bin 150 / ~3us) that inflates rms_delay_spread to ~390ns (vs ~53ns true). It sits too close to the real weakest tap (~30% of dominant) for a safe magnitude cutoff, so the proper fix is ISTA recovery-quality work (grid de-aliasing / far-tap suppression), not a band-aid threshold. Sharpened the #[ignore] note accordingly. signal+cir: 470 pass / 0 fail. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 06:24:30 -04:00
ruv	d55c4d4b65	fix(signal/cir): resolve ADR-134 P2 dominant-tap-ratio — un-ignore 4 CIR tests The CIR estimator's dominant_tap_ratio measured a single grid bin, but on the 3x super-resolved ISTA grid a single physical tap leaks across ~3 adjacent bins — so the ratio under-counted the dominant tap and sat far below the per-tier floors (HT20 0.158<0.30, HT40 0.133<0.35, HE20 0.102<0.40), forcing the 3-tap recovery + 40MHz-ToF tests to be #[ignore]d. Fix (data-backed via a lambda sweep): (1) compute dominant_tap_ratio over a +/-1-bin window around the peak — the physical tap's true footprint; (2) tune L1 lambda for sparse multipath (HT20 .05->.08, HT40 .03->.08, HE20 .03->.18). Result: ratios 0.367/0.406/0.474, comfortably above floors with all 3 taps preserved. Un-ignores should_recover_3tap_channel_{ht20,ht40,he20} and should_return_tof_at_40mhz. signal crate: 470 pass / 0 fail; change isolated to CIR (no external consumers). The rms-delay-spread test stays ignored with a re-scoped note (far-tap robustness is separate remaining work). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 06:20:41 -04:00
ruv	403841b19e	docs(changelog): reflect cog producer, cross-language test, Windows fixes Update the Unreleased entry: calibration service is now complete across both model paths (transformer .npz + cog safetensors via cog_calibrate.py) with cross-language Python->Rust integration test; add the Windows cross-platform build fixes (worldmodel cfg(unix), bfld CRLF) — 2682 workspace tests green/0 fail on Windows. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 05:38:38 -04:00
ruv	0fede72ec4	test(cog-pose): cross-language adapter integration (Python producer -> Rust engine) Closes the last verification gap in the calibration feature: previously the Python producer and Rust consumer were proven compatible only by format matching. Now a real ~11KB adapter fitted by cog_calibrate.py on the in-repo pose_v1.safetensors is committed as a fixture, and a Rust test loads it via the engine and asserts is_calibrated() + that it changes inference output. The full Python->Rust calibration contract is verified with a real artifact. 7/7 cog-pose tests pass. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 05:22:54 -04:00
ruv	e94f4d8f73	feat(calibration): cog adapter producer — completes the cog --adapter feature I'd shipped the Rust cog-pose --adapter consumer (+test) but there was no producer for cog-format adapters, leaving it a half-feature. cog_calibrate.py fits a rank-r LoRA on the cog conv+MLP head (pose_v1.safetensors, 56x20) from a labeled in-room capture and writes a safetensors with fc1.a/fc1.b/fc2.a/fc2.b (scale baked into b) — exactly what the Rust engine loads. Verified against the in-repo pose_v1.safetensors: correct keys/shapes, reduces fit error, active adapter, ~2.6KB. Adds test_cog_calibration.py (passes) + README documenting the two non-interchangeable producers (transformer .npz vs cog safetensors). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 05:10:07 -04:00
ruv	946acf2d10	docs(cog-pose): correct misleading adapter cross-reference The --adapter docs claimed the adapter is produced by aether-arena/calibration/calibrate.py, but that reference tool targets the MM-Fi transformer model and emits .npz with proj/head LoRA keys, while this cog runs a conv+MLP model expecting safetensors with fc1.a/fc1.b/ fc2.a/fc2.b. Same LoRA mechanism, different model -> adapters are model-specific and NOT interchangeable. Clarify the expected key layout and that the Python tool is a mechanism reference, not a drop-in producer. 6/6 tests pass. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 05:04:35 -04:00
ruv	76cc57294d	test(calibration): self-contained end-to-end regression test The committed calibration service (model.py/calibrate.py/infer.py) had no automated test — only ad-hoc verification. Adds a CPU-only, no-real-checkpoint test that exercises the CLI end-to-end on synthetic data: build base -> calibrate.py fits adapter -> infer.py runs base+adapter, asserting adapter size (<200KB), keypoint shape [N,17,2], finiteness, [0,1] range, and that the adapter actually changes the output. Passes on Windows CPU (torch 2.11). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 05:02:24 -04:00
ruv	1b48b6f5c8	fix(bfld): make README quickstart test robust to CRLF line endings readme_quickstart_uses_canonical_public_api checked a multi-line needle 'pipeline\n .process' against the include_str! README. On a CRLF checkout (Windows / core.autocrlf) the content is 'pipeline\r\n .process', so the LF needle never matched and the test failed deterministically (only surfaced once the worldmodel fix let cargo test --workspace run on Windows; the test is #[cfg(feature=std)]-gated, enabled via workspace feature unification). Normalize CRLF->LF before the check. Full workspace now green 3/3 runs on Windows. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 04:27:25 -04:00
ruv	c9539433b8	fix(worldmodel): compile on non-unix targets (Windows workspace build) bridge.rs imported tokio::net::UnixStream unconditionally, so the whole workspace failed to build on Windows (E0432) — blocking cargo test --workspace and the pre-merge gate there. The OccWorld Unix-socket bridge is a Linux-appliance feature (Python inference server on the GPU host), so gate it #[cfg(unix)] and add a #[cfg(not(unix))] send_recv that fails fast with a clear 'unsupported on this target' Protocol error. Workspace now builds on Windows; worldmodel 12 tests pass. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:55:32 -04:00
ruv	1d9c0b3d4c	docs(study): sharpest finding — the encoder barely matters for CSI pose Random frozen encoder + trained head matches a fully-trained encoder to within 2-4pts (cross-subject <2pts). WiFi-CSI sensing is largely a random-features + target-readout problem: barely a learned representation to transfer, which unifies the zero-shot collapse, no-transfer results, foundation-encoder failure, and why per-room calibration works. Practical: invest in readout + calibration, not encoder pretraining. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:43:14 -04:00
ruv	c95dd308fd	docs(study): cross-dataset confirmed on harder NTU-Fi-HumanID task Re-ran transfer on 14-class person-ID (harder than 6-activity HAR): same null-transfer result (MM-Fi pretrain 91.7% = random 92.8%). Unified root cause: CSI in-domain classification lives in the target-trained readout (random projection already separable); learned reps don't transfer across subjects/rooms/datasets. WiFi-CSI is distribution-locked. Addresses the 'HAR too easy' caveat. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:37:19 -04:00
ruv	af68bd68d8	docs(study): cross-dataset transfer tested (MM-Fi -> NTU-Fi, honest negative) Tested the cross-dataset frontier: MM-Fi-trained CSI representation does NOT transfer beneficially to NTU-Fi HAR (frozen probe 91.5% = random features 93%; full fine-tune 75% < probe). CSI reps are distribution-locked, same root cause as within-MM-Fi cross-subject/-env collapse. Caveat: NTU-Fi 6 coarse activities are an easy target (random->93%). Updates the study's cross-dataset limitation from 'untested' to this measured result. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:27:38 -04:00
ruv	695b5fb700	docs: complete MM-Fi WiFi-sensing study (pose + action, the honest picture) Consolidates the full campaign into one committed, citable artifact (the detailed log was in a gitignored staging report): pose SOTA 83.6% + 20KB int4 edge model; action recognition 88% (a WiFi task MM-Fi never benchmarked); the generalization story (zero-shot collapse, few-shot calibration rescue, task-general across pose+action); all honest negatives (CORAL/DANN/instance-norm/SupCon/distillation/subject-scaling); the 11KB calibration-adapter deployment recipe; honest limitations (cross-dataset untested, ARM latency pending). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:06:54 -04:00
ruv	dac40e5df2	docs(adr-150): calibration thesis is task-general (action recognition) Verified on a 2nd MM-Fi task: 27-class action recognition (which MM-Fi never benchmarked for WiFi; only published baseline WiDistill 34%). In-domain 88% (leaky); cross-subject zero-shot collapses to ~10%; few-shot calibration rescues 10->76% (1000 samples). Same mechanism as pose -> few-shot in-room calibration is the universal WiFi-sensing generalization answer, not a pose quirk. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 03:01:50 -04:00
ruv	17ff2433bc	docs(changelog): WiFi-CSI efficiency frontier + per-room calibration service Document the beyond-SOTA efficiency frontier (75K params beats SOTA, int4 edge model 20KB@74%), few-shot calibration resolving generalization (cross-env 10->73%), and the calibration service (Python ref + Rust cog-pose --adapter integration). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:38:07 -04:00
ruv	83299b4d04	feat(cog-pose): --adapter CLI flag for per-room calibration Completes the end-to-end product path: cog-pose-estimation run --config <cfg> --adapter <room.safetensors> loads the shared base + a per-room LoRA adapter for calibrated inference. Adds InferenceEngine::with_adapter() (default weights + adapter) and logs when a calibration adapter is active. 6/6 tests pass. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:28:16 -04:00
ruv	3760db6c9a	feat(cog-pose): per-room LoRA calibration adapter in the Rust inference path Ports the calibration mechanism (ADR-150 §3.5-3.6, reference impl in aether-arena/calibration/) into the real product pose engine. The Candle InferenceEngine now loads an optional per-room adapter safetensors and applies low-rank deltas (y + (x.A).B) on the fc1/fc2 head at inference. Architecture-agnostic LoRA; base behaviour unchanged when no adapter. New API: with_weights_and_adapter(), is_calibrated(). Tested: adapter detection + output-change integration test (6/6 pass). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:26:48 -04:00
ruv	4db727649a	feat(calibration): RuView per-room calibration service (reference impl) Operationalizes the campaign's central finding (ADR-150 §3.3-3.6): a frozen shared base + a ~11KB per-room LoRA adapter from ~100-200 labeled samples recovers SOTA-level pose in any new room/person. Verified end-to-end: source-only base zero-shot 3.09% on unseen room -> 74.29% after 200-sample calibration. Files: model.py (PoseNet+LoRA), calibrate.py, infer.py, README with measured calibration budget. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:22:10 -04:00
ruv	5533ffe43e	docs(adr-150): cross-env few-shot — no unsolved deployment case Decisive capstone: cross-environment (unseen room+people) zero-shot 10.6%, but 5 calibration samples/person -> 60%, 200 -> 73%. The hard frontier is calibration-soluble, MORE dramatically than cross-subject (+62.5 vs +12 at K=200). The unsolved-frontier framing was a zero-shot artifact. Reframes generalization: ship few-shot calibration, not zero-shot invariance. Recommend accepting ADR-150 re-scoped around the calibration mechanism. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:09:03 -04:00
ruv	ef4344f0f9	docs(adr-150): LoRA calibration data requirement — completes calibration spec 11KB adapter needs ~100-200 labeled samples/room for ~72% (knee ~50->70%); below ~20 it hurts. Evidence-complete calibration-service spec: base + ~100-200 samples -> 11KB LoRA -> ~72% cross-subject. Encoder goal now precisely posed: cut the sample requirement / lift the per-budget ceiling. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 02:04:37 -04:00
ruv	ed1294a176	docs(adr-150): deployable adapter calibration — 11KB LoRA = calibration service Compared per-room calibration methods at K=200: LoRA rank-8 recovers 63.6->72.5% (SOTA-level) with just 11K params (~11KB), 0.5% the model size. Validates the ship-base-once + tiny-per-room-adapter mechanism for the RuView calibration service. Accuracy/size knob documented. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:54:23 -04:00
ruv	898aaef053	docs(adr-150): few-shot adaptation resolves the cross-subject frontier Decisive result: 50 labeled frames/subject of in-room calibration -> 72.2% (reaches SOTA), 200 -> 76.1%, 1000 -> 78.3%. Few-shot target adaptation dominates source volume (+24 subjects bought +6pt; 200 target frames bought +12.4pt). Re-scopes the deployment story: ship a ~30s on-site calibration, not a mass corpus. Foundation encoder's role shifts to making that calibration cheaper. Supersedes the earlier data-bound pessimism. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:47:00 -04:00
ruv	70bf9e41fe	docs(adr-150): subject-scaling study — capture diversity, not volume Measured cross-subject PCK vs N training subjects: 4->8 = +21pts, but 24->32 = +0.45pt. Saturates ~64%, ~19pt below in-domain. Correction to 'more data': subject-count returns vanish past ~16-20; the residual is device/room/protocol shift. Re-scope phase-1 capture around DIVERSITY (rooms/devices/protocols) + few-shot target adaptation, not headcount. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:43:49 -04:00
ruv	96ccfa58fb	bench: ship int4 edge artifact + CPU latency Published deployable int4-QAT micro (verified 74.08%, ~20KB) at ruvnet/wifi-densepose-mmfi-pose/edge. Runs 0.135ms single-thread x86 CPU (no GPU) - real-time pose without an accelerator. ARM on-device validation pending fleet availability. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:30:29 -04:00
ruv	92d433523d	bench: deployed quantized accuracy + QAT for micro edge model int8 PTQ lossless (74.70%, 73.5KB); int4 naive PTQ drops below SOTA (70.21%) but QAT recovers to 74.46% (36.7KB) - still beats MultiFormer. A SOTA-beating WiFi-pose model genuinely runs in ~37KB int4 (QAT) / 73KB int8. Distillation negative noted. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:23:30 -04:00
ruv	d64323c2d6	bench: add quantized footprint — SOTA-beating WiFi pose in 37KB int4 micro (74.87%, beats MultiFormer 72.25%) = 36.7KB int4 / 73.5KB int8; nano (~72%) = 19.5KB int4. Distillation tested, no gain (direct training wins). A SOTA-beating pose model fits on the sensing node itself. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:16:16 -04:00
ruv	9c64d90054	bench: WiFi-CSI pose efficiency frontier — 75K-param model beats SOTA Swept model size on MM-Fi random_split: every config from micro (75,237 params, 0.22ms, 74.30%) up beats MultiFormer (72.25%); nano (40K, 0.13ms) within 0.5pt. Pareto-dominant (smaller AND more accurate than prior SOTA). Orthogonal to the data-bound accuracy frontier (ADR-150). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 01:10:33 -04:00
ruv	5d1fb48eb5	docs(adr-150): empirical cross-subject findings — pose-contrastive pretrain refuted Measured all near-term levers on the official MM-Fi cross-subject split: - mixup+TTA+ensemble = best at 64.92% (+0.9 over doc 64.04) - pose-contrastive foundation pretrain: estimated +5..+12, MEASURED -2.3 (SupCon loss pinned at ln(B) across K/BS/seeds -> same-pose CSI is not contrastively alignable across subjects) - instance-norm+SpecAugment -4.6; CORAL/DANN ~0 Conclusion: the 18-pt in-domain<->cross-subject gap is fundamental subject shift, not algorithmic. Promotes multi-subject data collection to the primary lever; recommends re-scoping ADR-150 phase 1 around capture. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-31 00:33:43 -04:00
ruv	b4cb1384de	docs(readme): honest re-benchmark of ESP32 presence model (retract single-class 100%) v1 '100% presence accuracy' was on a single-class overnight recording (6062/6063 'present'). Replaced with v2 encoder's honest label-free held-out temporal-triplet accuracy (66.4% raw -> 82.3% trained). Models published to HF; tracking ruvnet/RuView#882. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 23:52:11 -04:00
ruv	66e917ea86	bench: HOMECORE vs Home Assistant — measured perf + capability matrix Head-to-head on the wire-compatible HA API surface: - Cold start 0.55s vs 9.7s (18x), idle RSS 10.1MB vs 359MB (35x), binary 4.7MB vs 610MB image (130x), throughput 1599 vs 716 rps. - Honest caveats: latency endpoints differ (auth /api/states vs unauth /manifest.json); HA wins integration breadth + UI maturity. - Repro harnesses in aether-arena/staging/. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 23:41:15 -04:00
ruv	7738370b18	docs(readme): link SOTA MM-Fi pose model (82.69% torso-PCK@20) on HF Published ruvnet/wifi-densepose-mmfi-pose — beats MultiFormer (72.25%) and CSI2Pose (68.41%) on matched MM-Fi random_split torso-PCK@20. Tracking: ruvnet/RuView#880 Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 23:32:12 -04:00
ruv	7bad51aca6	publish: best MM-Fi benchmark set (in-domain 83.59, x-subject 64.0, x-env 17.5 CORAL) Append best witness rows to ledger (seq 2-4) + update HF Space leaderboard banner. In-domain 83.59% torso-PCK@20 (graph+ensemble+TTA) supersedes the 81.63 single-model entry, +11.34 over MultiFormer 72.25. Cross-subject 64.04% (official split). Cross-environment 17.51% (CORAL domain alignment, the cross-room DG win). Gist + issue #876 updated with frontier map. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 22:22:53 -04:00
ruv	eb3509e9ab	reframe(aether-arena): vendor-neutral industry benchmark, RuView is one entrant	2026-05-30 19:59:10 -04:00
ruv	046b2564b8	feat(aether-arena): publish RuView MM-Fi SOTA result + ADR-150 RF Foundation Encoder - Ledger witness row (seq 1, Gold): RuView CSI-Transformer 81.63% torso-PCK@20 on MM-Fi random_split, exceeding MultiFormer 72.25% (CSI2Pose 68.41%) — protocol- and metric-matched, self-corrected from inflated 91.86% bbox. Hash-chained, verifiable. - HF Space updated with the controlled SOTA claim + caveat (cross-subject is the frontier). - Proof/replay/witness gist: gist.github.com/ruvnet/af2fbc1c7674dddf09c15509b3c7f785 - Tracking issue #876 (result + Generalization Track roadmap). - ADR-150: RuView RF Foundation Encoder — pose-preserving, subject/room/device-invariant SSL embedding (masked CSI + pose-contrast-across-subjects + coherence head); the principled attack on the cross-subject frontier. DANN failed; this is the corrected design. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 19:55:58 -04:00
rUv	8d64434d21	feat(swarm): ADR-149 evaluation harness — GDOP, IQM+bootstrap CI, noise sweep (#875 ) Stage-1 kinematic evaluator per ADR-149 (peer-reviewed). Pure Rust, no new deps. evals/: - gdop.rs: 2D Geometric Dilution of Precision ((HᵀH)⁻¹ trace-sqrt); None for <2 observers or collinear/singular geometry - stats.rs: IQM (Agarwal 2021) + 95% stratified-bootstrap CI (deterministic LCG) + probability_of_improvement - metrics.rs: EpisodeMetrics + AggregateMetrics::from_strata (IQM±CI, seed-stratified) - runner.rs: seeded kinematic rollout (FlightPattern-driven), seed×episode matrix, 3σ×3κ default noise sweep (Gaussian amplitude × von Mises phase) - report.rs + eval_swarm bin: generates evals/RESULTS.md leaderboard RESULTS.md surfaces the real coverage-vs-localization-precision trade-off via GDOP: partitioned wins coverage (100%) but single-drone sightings (GDOP 0 → 7.0m); pheromone gets multistatic fusion (GDOP 1.6 → 4.1m). Wi2SAR 5m paper-baseline row included. Stage-2 (Gazebo/PX4 SITL false-alarm + collision on median seeds) is documented follow-on. Tests: 116 default / 133 full+train (+13 eval tests), 0 failed. Clippy clean (-D warnings). v1506	2026-05-30 17:38:49 -04:00
ruv	4f7ab8e4f0	docs(aether-arena): v0 infrastructure complete — Space live, harness gate passing (M8)	2026-05-30 17:15:08 -04:00
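
Commits a933fc7732 and 4007db5d13 above describe the #803 person-count fix in prose. The sketch below reproduces only the arithmetic those messages state, in Python rather than the project's Rust. The function names mirror the ones quoted in the log, but their signatures are assumptions, as is the plain list of integers standing in for the per-node counts held in `NodeState::prev_person_count`. The 0.70 up-threshold and the 1->0.40, 2->0.74, 3->0.96 mapping come from the log; the EMA smoothing and hysteresis bands are left out.

```python
from typing import Sequence

# From the log: up-threshold of score_to_person_count for reporting 2 people.
UP_THRESHOLD_TWO = 0.70

# Threshold-aligned mapping quoted in commit 4007db5d13.
CORR_PERSONS_TO_SCORE = {1: 0.40, 2: 0.74, 3: 0.96}


def old_score(corr_persons: int) -> float:
    """The lossy mapping the commit replaced: corr_persons / 3.0."""
    return corr_persons / 3.0


def corr_persons_to_score(corr_persons: int) -> float:
    """Replacement mapping. Returning 0.0 for other counts is an assumption."""
    return CORR_PERSONS_TO_SCORE.get(corr_persons, 0.0)


def aggregate_person_count(activity_count: int, node_counts: Sequence[int]) -> int:
    """max(activity, node_max): a node can only raise the count, never lower it."""
    node_max = max(node_counts, default=0)
    return max(activity_count, node_max)


# Two people under the old mapping sit just below the threshold...
print(old_score(2) < UP_THRESHOLD_TWO)               # True
# ...while the threshold-aligned score clears it.
print(corr_persons_to_score(2) >= UP_THRESHOLD_TWO)  # True
# A node that positively reports 2 lifts the aggregate; a lone occupant stays at 1.
print(aggregate_person_count(1, [1, 2]), aggregate_person_count(1, [1]))  # 2 1
```

Taking the maximum rather than a sum or average matches the log's guarantee that the lone-occupant case is never inflated: with every node reporting at most 1, the result cannot exceed the activity-based count.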

852 Commits
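
The cog-pose calibration commits in the log (3760db6c9a, e94f4d8f73) describe the per-room adapter as low-rank deltas `y + (x.A).B` applied on the fc1/fc2 head, with the scale baked into `b`. A dependency-free Python sketch of that delta on a single linear layer follows. The helper names `matvec` and `lora_forward`, the row-vector convention, and all shapes and values are assumptions for illustration; the real engine is Rust on Candle and loads its tensors from safetensors.

```python
def matvec(x, m):
    """Row vector x (length n) times matrix m (n rows, k columns)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def lora_forward(x, w, bias, a=None, b=None):
    """Base linear layer y = x.W + bias, plus the low-rank delta (x.A).B
    when an adapter is supplied. Any scale factor is assumed to be
    pre-multiplied into b, as the log describes."""
    y = [v + c for v, c in zip(matvec(x, w), bias)]
    if a is None or b is None:
        return y  # no adapter: base behaviour unchanged
    delta = matvec(matvec(x, a), b)
    return [v + d for v, d in zip(y, delta)]


# Toy 2-in / 2-out layer with a rank-1 adapter (values are made up).
x = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
a = [[1.0], [1.0]]   # 2 x 1
b = [[0.5, -0.5]]    # 1 x 2

print(lora_forward(x, w, bias))        # [1.0, 2.0]
print(lora_forward(x, w, bias, a, b))  # [2.5, 0.5]
```

The base weights are never modified, which is what lets one shared base ship with a small adapter file per room. An adapter whose `b` is all zeros leaves the output identical to the base.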