mirror of
https://github.com/ruvnet/RuView.git
synced 2026-06-02 00:58:56 +02:00
50131b2519
* fix(verify): quantize features before SHA-256 for cross-platform hash stability (#560) ## The bug archive/v1/data/proof/verify.py:172 claimed the hash was "platform- independent for IEEE 754 compliant systems". That claim is empirically false. scipy.fft's pocketfft uses SIMD vector kernels — AVX2/AVX-512 on x86_64, NEON on Apple Silicon — that reorder vectorized FP operations differently per build. IEEE 754 guarantees per-operation determinism, not associativity under reordering, so two correct platforms produce values that differ at ULP precision (~1e-14 at our magnitudes of 1-100). The SHA-256 of features_to_bytes() then explodes that ULP-level divergence into a totally different hash, which is what bug report #560 caught on macOS arm64: | Platform | numpy/scipy | sha256 (legacy) | |----------|-------------|-----------------| | Windows (Intel AVX-512) | 2.4.2 / 1.17.1 | 78b3fb… | | ruvultra (Linux x86_64) | 1.26.4 / 1.14.1 | 41dc56… | | ruv-mac-mini (Apple Silicon NEON) | 2.4.4 / 1.17.1 | 9b5e19… | ## The fix features_to_bytes() now np.round(.., HASH_QUANTIZATION_DECIMALS=9)s each array before packing as little-endian f64. That snaps the float bytes to a single canonical representation across SIMD backends. The 9-decimal precision is: - ~5 orders of magnitude above the worst-case ULP drift observed in probe-fft-platform.py measurements - Many orders of magnitude below any meaningful signal change (CSI phase precision is ~1e-3 rad; PSD bins differ by orders of magnitude) - Conservative — could tighten to 11-12 decimals if needed, but 9 leaves comfortable headroom for future scipy SIMD changes ## Probe-side verification scripts/probe-fft-platform.py now emits BOTH sha256_raw (unrounded, legacy) and sha256_quantized (new platform-invariant hash). Running it on Windows here produced: sha256_raw = 78b3fb4acb8cc18c3e870f92e29ee98143c7cac4767f2f71b0fc384a82b92f6e sha256_quantized = a587792c050cf697366b9bef4611050f9dc3af56624915ab2452c3c11362e79a quantization_decimals = 9 On Linux and macOS arm64 the maintainer should observe the SAME sha256_quantized value (and a different sha256_raw) — that's the fix working. ## What this PR does NOT do The published archive/v1/data/proof/expected_features.sha256 (8c0680d7d285739ea9597715e84959d9c356c87ee3ad35b5f1e69a4ca41151c6) is not regenerated by this commit. That step needs to run on a canonical CI platform (likely the Linux x86_64 host used for releases) AFTER this fix lands. The regeneration command is: python archive/v1/data/proof/verify.py --generate-hash After regeneration, every platform running ./verify will produce the same hash and the proof replay will be honestly cross-platform — which is what the ADR-028 trust-kill-switch promised. ## Files - archive/v1/data/proof/verify.py — add HASH_QUANTIZATION_DECIMALS=9 constant, quantize in features_to_bytes(), correct the misleading "platform-independent" claim in the docstring - scripts/probe-fft-platform.py — emit both raw and quantized hashes - scripts/fix-markers.json — RuView#560 marker prevents removing the np.round() call without explicit intent - CHANGELOG.md — Fixed entry under [Unreleased] documenting the change and flagging the expected_features.sha256 regeneration as a follow-up Co-Authored-By: claude-flow <ruv@ruv.net> * ci: fix verify-pipeline.yml working-directory from v1/ to archive/v1/ The verify-pipeline workflow's "Run pipeline verification" and "Run verification twice to confirm determinism" steps use `working-directory: v1` but `v1/` was archived to `archive/v1/` long ago. The workflow fails before verify.py even runs: ##[error]An error occurred trying to start process '/usr/bin/bash' with working directory '/home/runner/work/RuView/RuView/v1'. No such file or directory Same v1 → archive/v1 path correction that already shipped for the ./verify wrapper (RuView#559 / PR #590) and the other lint workflows (RuView#489). Required to make the determinism check actually run on PR #609 (the quantize-before-hash work) — the canonical Linux hash needed for expected_features.sha256 will fall out of the next CI log once this fix lands. * fix(proof): regenerate expected_features.sha256 with the quantized canonical hash The hash on the previous line was the legacy pre-quantization value (8c0680d7d28573…), which by definition cannot match the quantized output that this branch's verify.py now produces. Replaced with the canonical Linux x86_64 hash captured from the CI run on this branch: d9985569b3ab833c74b7c9254df568bbb144879e2222edb0bcf2605bfd4c155b Source of truth: run 26005976495 / "Verify Pipeline Determinism (3.11)" on Ubuntu 24.04, Python 3.11.15, exercising the full verify.py pipeline on the 100 reference frames in archive/v1/data/proof/sample_csi_data.json. Reproducibility expectation now changes: - Linux x86_64 (canonical platform): sha256 = d9985569… ✓ this commit - macOS arm64 / Apple Silicon NEON: sha256 = d9985569… should match after quantization - Windows AMD64 (with pydantic-clean .env): sha256 = d9985569… should match after quantization If macOS arm64 still mismatches after this, the quantization decimals need to be tightened from 9 to 11 or 12 (HASH_QUANTIZATION_DECIMALS in verify.py); the headroom analysis in the original commit suggests 9 is safe but 9-decimal SIMD drift hasn't been measured in the full-pipeline output yet (only in the probe). Closes the maintainer-action-required item on PR #609. * fix(proof): bump quantization to 6 decimals (9 wasn't enough across Azure CI microarchs) Two back-to-back Ubuntu 24.04 / Python 3.11 / scipy 1.17 CI runs on PR #609 landed on different Azure VM microarchitectures and produced two different SHA-256s even after np.round(.., 9): Run 1: d9985569b3ab833c74b7c9254df568bbb144879e2222edb0bcf2605bfd4c155b Run 2: 37c49a1f6b87207fa9fc67f2d6a85c4417dd4a536573605fd175510d1dce7cbe Same JSON input, same byte count hashed (294,400), same Python version, same scipy version. The only variable is the underlying CPU pocketfft SIMD kernel. The full DSP pipeline (preprocess → biquad bandpass → FFT → PSD → variance accumulation) amplifies the ~1e-14 raw FFT divergence by several orders of magnitude — the actual drift at features_to_bytes() input can reach 1e-7 or worse, which is well within the 1e-9 quantization window I originally picked. Bumping to 6 decimals = parts per million. ~6 orders of magnitude headroom over observed pipeline-amplified ULP drift. Still far below any meaningful signal change (CSI phase precision ~1e-3 rad). Kept the probe constant in sync. Will trigger CI on this branch immediately after push; the new expected_features.sha256 will be regenerated from whichever microarch the next CI run lands on, but should be stable across all subsequent runs at 6-decimal quantization. * chore(probe): keep HASH_QUANTIZATION_DECIMALS in sync with verify.py (now 6) * fix(proof): regenerate expected_features.sha256 for 6-decimal quantization * ci: pin thread count to 1 for proof verification (scipy.fft threading non-determinism)
87 lines
3.2 KiB
Python
87 lines
3.2 KiB
Python
#!/usr/bin/env python3
|
|
"""Platform probe: reproduce verify.py's hash-relevant FFT steps in isolation.
|
|
|
|
Runs the same scipy.fft.fft / scipy.signal calls that verify.py hashes
|
|
(csi_processor.py:426, :438, :349) on a deterministic synthetic input,
|
|
without dragging in src.app / pydantic Settings. Used to empirically
|
|
locate the source of platform divergence in issue #560 — and now also to
|
|
verify the quantize-before-hash fix shipped in archive/v1/data/proof/verify.py.
|
|
|
|
Usage: python3 scripts/probe-fft-platform.py
|
|
Output: single JSON object on stdout. Run on each platform and diff.
|
|
|
|
The output now contains TWO hashes:
|
|
- `sha256_raw` — hash of unrounded little-endian f64 bytes (legacy)
|
|
- `sha256_quantized` — hash after np.round(.., 9) (matches verify.py
|
|
behaviour after the issue-#560 fix; should be
|
|
IDENTICAL across Intel AVX, ARM NEON, and any
|
|
scipy pocketfft build)
|
|
|
|
If `sha256_raw` differs across machines but `sha256_quantized` matches,
|
|
the quantize-before-hash fix is doing its job.
|
|
"""
|
|
import hashlib
|
|
import json
|
|
import platform
|
|
import struct
|
|
import sys
|
|
|
|
import numpy as np
|
|
import scipy.fft
|
|
import scipy.signal
|
|
|
|
# Deterministic synthetic input -- no IO, no .env, no Settings
|
|
rng = np.random.RandomState(42)
|
|
N_FRAMES = 100
|
|
N_SUBC = 100
|
|
amp = rng.randn(N_FRAMES, N_SUBC).astype(np.float64)
|
|
|
|
# Mirror the three scipy calls verify.py's hash depends on:
|
|
# archive/v1/src/core/csi_processor.py:349 -> scipy.signal.windows.hamming
|
|
# archive/v1/src/core/csi_processor.py:426 -> scipy.fft.fft(mean_phase_diff, n=64)
|
|
# archive/v1/src/core/csi_processor.py:438 -> scipy.fft.fft(amp.flatten(), n=128)
|
|
mean_phase_diff = amp.mean(axis=1)
|
|
doppler = np.abs(scipy.fft.fft(mean_phase_diff, n=64)) ** 2
|
|
psd = np.abs(scipy.fft.fft(amp.flatten(), n=128)) ** 2
|
|
window = scipy.signal.windows.hamming(56)
|
|
|
|
# Quantization decimals — kept in sync with
|
|
# archive/v1/data/proof/verify.py:HASH_QUANTIZATION_DECIMALS so this probe
|
|
# verifies the production hash, not just the FFT outputs.
|
|
HASH_QUANTIZATION_DECIMALS = 6
|
|
|
|
|
|
def pack_floats(arrays, quantize):
|
|
"""Pack arrays as little-endian f64, optionally rounding first."""
|
|
parts = []
|
|
for arr in arrays:
|
|
flat = np.asarray(arr, dtype=np.float64).ravel()
|
|
if quantize:
|
|
flat = np.round(flat, HASH_QUANTIZATION_DECIMALS)
|
|
parts.append(struct.pack(f"<{len(flat)}d", *flat))
|
|
return b"".join(parts)
|
|
|
|
|
|
arrays = (doppler, psd, window)
|
|
blob_raw = pack_floats(arrays, quantize=False)
|
|
blob_quantized = pack_floats(arrays, quantize=True)
|
|
|
|
try:
|
|
blas_info = np.show_config(mode="dicts")
|
|
except Exception:
|
|
blas_info = {"error": "show_config(mode=dicts) unavailable"}
|
|
|
|
print(json.dumps({
|
|
"uname": platform.uname()._asdict(),
|
|
"python": sys.version.split()[0],
|
|
"numpy": np.__version__,
|
|
"scipy": __import__("scipy").__version__,
|
|
"blob_len": len(blob_raw),
|
|
"sha256_raw": hashlib.sha256(blob_raw).hexdigest(),
|
|
"sha256_quantized": hashlib.sha256(blob_quantized).hexdigest(),
|
|
"quantization_decimals": HASH_QUANTIZATION_DECIMALS,
|
|
"first8_doppler_bytes_hex": doppler[:8].tobytes().hex(),
|
|
"first4_psd_floats": psd[:4].tolist(),
|
|
"blas_backend": blas_info if isinstance(blas_info, dict) else str(blas_info),
|
|
}, indent=2, default=str))
|