mirror of
https://github.com/ruvnet/RuView.git
synced 2026-06-02 00:58:56 +02:00
cfda8dbd14
GitHub's /traffic/clones and /traffic/views endpoints only retain the
last 14 days server-side. Without periodic scraping, anything older
than that window is dropped and cannot be recovered. This commit:
* Adds a scheduled GitHub Action (.github/workflows/clone-tracking.yml)
  that runs on the 1st and 15th of every month (~14-day cadence) and
  appends a snapshot to data/clone-data.rvf via the GitHub API.
* Seeds the file with today's first snapshot so the historical record
  starts immediately rather than waiting for the next cron fire.
File format: ruvector JSONL RVF (schema "ruvector.rvf.jsonl/v1"). Each
line is one segment:
  {type: "metadata", ...}
      file header, written once on first run
  {type: "clone_snapshot", fetched_at,
   window_count, window_uniques,
   per_day: [{timestamp, count, uniques}, ...]}
      appended every run
  {type: "view_snapshot", fetched_at,
   window_count, window_uniques,
   per_day: [{timestamp, count, uniques}, ...]}
      appended every run
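
As a rough illustration of the segment layout above, here is a minimal
Python sketch that maps a GitHub traffic API response onto one snapshot
line. The helper name build_snapshot is hypothetical, and the mapping of
the response's top-level count/uniques onto window_count/window_uniques
is an assumption based on the field names, not taken from the workflow.

```python
import json
from datetime import datetime, timezone


def build_snapshot(kind, payload, fetched_at=None):
    """Map a traffic API response onto one RVF segment (a dict).

    kind is "clones" or "views"; GitHub nests the per-day list under
    that same key in the response body.
    """
    if kind not in ("clones", "views"):
        raise ValueError(f"unsupported kind: {kind!r}")
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "clone_snapshot" if kind == "clones" else "view_snapshot",
        "fetched_at": fetched_at,
        # Assumption: window_* mirror the response's 14-day totals.
        "window_count": payload["count"],
        "window_uniques": payload["uniques"],
        "per_day": [
            {"timestamp": d["timestamp"], "count": d["count"], "uniques": d["uniques"]}
            for d in payload[kind]
        ],
    }


def to_jsonl_line(segment):
    """Serialize one segment as a single JSONL line."""
    return json.dumps(segment, separators=(",", ":")) + "\n"
```

Appending the result of to_jsonl_line to data/clone-data.rvf would keep
the one-segment-per-line property the format relies on.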
Per-day entries are keyed by `timestamp`, so a downstream reader can
de-duplicate across overlapping snapshot windows (cron drift, manual
re-runs, etc.).
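
The de-duplication described above could look roughly like this in
Python. The function name merge_per_day is hypothetical, and letting the
later snapshot win for a repeated timestamp is an assumption (the most
recent day in a window may still be partial when it is first fetched).

```python
import json


def merge_per_day(lines, segment_type):
    """Collapse overlapping snapshots into one entry per timestamp.

    lines is any iterable of JSONL strings (for example an open file);
    segment_type is "clone_snapshot" or "view_snapshot".
    """
    days = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        segment = json.loads(line)
        if segment.get("type") != segment_type:
            continue  # skips the metadata header and the other snapshot type
        for entry in segment.get("per_day", []):
            # Assumption: segments are appended chronologically, so a
            # later snapshot overwrites an earlier one for the same day.
            days[entry["timestamp"]] = entry
    # ISO 8601 timestamps in a single format sort correctly as strings.
    return [days[ts] for ts in sorted(days)]


def total_count(entries):
    """Sum the per-day counts of already de-duplicated entries."""
    return sum(e["count"] for e in entries)
```

Summing count across merged days gives a cumulative total. Summing
uniques the same way would overstate distinct visitors, since the same
visitor can be counted as unique on more than one day.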
Today's seed:
clones (14d): 27,887 total / 6,611 uniques
views (14d): 162,314 total / 75,464 uniques
The workflow's commit message includes cumulative observed totals
("16 days observed → 30K clones, 28 days observed → 180K views"
style) so the git log itself doubles as a traffic timeline.
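
One way such a summary line might be produced, as a sketch only: the
workflow's actual formatting code is not shown in this message, so the
helper names and the rounding to whole thousands are assumptions. Input
is a list of per-day entries already de-duplicated by timestamp.

```python
def format_total(n):
    """Abbreviate a count in the style of the example (30K, 180K).

    Assumption: values of 1000 or more round to the nearest thousand.
    """
    if n >= 1000:
        return f"{round(n / 1000)}K"
    return str(n)


def summary_line(clone_days, view_days):
    """Build an 'N days observed → XK clones, ...' style summary."""
    parts = []
    for label, days in (("clones", clone_days), ("views", view_days)):
        total = sum(d["count"] for d in days)
        parts.append(f"{len(days)} days observed → {format_total(total)} {label}")
    return ", ".join(parts)
```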
This is the long-term storage layer for the "downloads" badge work:
once we have a few months of snapshots, a small script can roll the
per-day entries into a real, defensible number.