* fix: various cleanups and improvements for the reproducibility system
* fix: save only essential settings
* fix: improve model commit handling
* feat: make including system information optional
* fix: improve formatting of reproducibility README
* fix: fix remaining issues
* fix: prevent UnboundLocalError when analyzer is not initialized
Move cleanup of analyzer and residuals inside the conditional block
where they are actually defined to avoid crashing when
--print-residual-geometry or --plot-residuals are not used.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: address AI review feedback on residual cleanup
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
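The UnboundLocalError fix above can be illustrated with a minimal sketch. All names here (`Analyzer`, `run_buggy`, `run_fixed`, the `print_residual_geometry` parameter) are hypothetical stand-ins and are not taken from the codebase; the sketch only shows why cleanup belongs inside the branch that defines the variables.

```python
class Analyzer:
    """Hypothetical stand-in for the residual analyzer."""

    def compute_residuals(self) -> list[float]:
        return [0.1, 0.2, 0.3]


def run_buggy(print_residual_geometry: bool) -> str:
    summary = "skipped"
    if print_residual_geometry:
        analyzer = Analyzer()
        residuals = analyzer.compute_residuals()
        summary = f"{len(residuals)} residuals"
    # Cleanup outside the branch: the names are unbound when the flag is off,
    # so this line raises UnboundLocalError.
    del analyzer, residuals
    return summary


def run_fixed(print_residual_geometry: bool) -> str:
    summary = "skipped"
    if print_residual_geometry:
        analyzer = Analyzer()
        residuals = analyzer.compute_residuals()
        summary = f"{len(residuals)} residuals"
        # Cleanup inside the branch: the names are guaranteed to exist here.
        del analyzer, residuals
    return summary
```

With the flag off, `run_buggy(False)` raises while `run_fixed(False)` simply returns.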
* feat: implement reproducibility features with safetensors
* feat: prompt user before creating reproducibility folder
* fix: use prompt_confirm wrapper
* style comment
* style comment
* fix: ignore None values in Settings dump for TOML compatibility
* fix: imports
* feat: auto-generate seed if none provided for full reproducibility
* style: fix ruff formatting issues
* style: ruff
* style: fix ty check errors with ty:ignore
* Update src/heretic/main.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update src/heretic/utils.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* add period at end.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Improve: Add README, checkpoint.jsonl, to Reproduce
* fix: use centralized device info, remove random states file
* feat: Add CUDA driver version
* ruff
* ruff...
* ty fix
* LGTM: Rich native strip, use nvidia-smi
* ruff fix
* ruff
* revert kaggle hack
* normalize names for deduplication of packages/versions
* docstring
* rufff
* cleanup, add suffix for torch CUDA version, distinguish ROCm
* add PyTorch index URL detection
* revert index URL to be simple
* flip priority of index..
* add Important note
* add exact suffix for WHL in instruction
* add warning for heterogeneous GPU env
* extend driver version info (more accelerators)
* fix: style
* sync
* no abbreviation
* use multi-line string
* fix: prompt_confirm
* feat: CPU info
* strip 'slow' warning from environment.txt
* feat: Add virtual env info to environment.txt
* ruffff
* feat: AMD (Radeon) GPU driver version
* Refactor: system.py
* feat: LGTM capturing specific installation origin of heretic
* feat: Include chosen trial into reproduce/README
* style: run ruff format on utils.py
* feat: reproduce.json
* fix: separate values in different keys
* restore comment
* style, clean, separate commit key
* no abbreviation, cleanup
* remove labels, store only dependencies
* missed import, ruff
* sort import
* feat: More CPU Info
* only store direct dependencies of heretic
* complete comment
* refactor: use cpuinfo package instead
* ruff import sort
* distinguish cores & threads
* move function amd-driver
* rename
* moving heretic package info
* rufff
* Move: cleanup memory cache
* fix: model.py import
* no unknowns
* generalize all accelerator info stuff
* ruff f
* move package info
* type change
* feat: no reproducibility suite for local saving/model used
* import fix
* fix: type check
* style change
* style ruff
* feat: no env.txt, SHA256SUMS file, cleanup
* feat: ADD tip to readme
* remove trial index, two-keys only
* fix: No time-zone
* feat: No suite for local datasets allowed
* simplify
* feat: capture both direct and transitive dependencies
* style: sort readme of reproducibility suite
* feat: Store commit hash for datasets too
* add total refusal prompts for evaluation display
* remove try/except from cpu
* extend SHA256 support
* remove .txt
* only have safetensors for SHA256
* style comment
* use HF api to get commit hash
* fix: requirements containing irrelevant dependencies
* only store heretic-llm if from PyPI..
* add SELECTED tag to the trial that was pushed
* AttributeError fix
* simplify trial preservation
* add direction_index in trial info
* remove unwanted CPU info
* style: rename
---------
Co-authored-by: Vinayyyy7 <vinayumrethe99@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
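One of the commits above drops None values from the settings dump because TOML has no null type. A minimal sketch of that idea follows, assuming the settings have already been converted to plain dicts and lists; `drop_none` is a hypothetical helper name, not the function used in the codebase.

```python
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove None entries so the result can be written as TOML."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value if v is not None]
    return value


# Hypothetical settings dump; the keys are illustrative only.
settings = {"model": "example/model", "seed": None, "nested": {"a": 1, "b": None}}
cleaned = drop_none(settings)
```

Keys whose value is None are omitted entirely, which on reload is equivalent to leaving the option unset.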
* fix: prevent div-by-zero in evaluator when base_refusals is 0
When the base model refuses none of the prompts, base_refusals is 0.
Return refusals directly in that case so ablations that introduce new
refusals are still penalized correctly.
* fix: cast refusals to float for type consistency
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
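The division-by-zero guard described above can be sketched as follows. The function name `refusal_score` and the exact form of the ratio are assumptions made for illustration; only the `base_refusals == 0` fallback and the float cast come from the commit messages.

```python
def refusal_score(refusals: int, base_refusals: int) -> float:
    """Refusals relative to the baseline, with a guard for a zero baseline."""
    if base_refusals == 0:
        # No baseline to normalize against: return the raw count as a float,
        # so any newly introduced refusal still raises the score.
        return float(refusals)
    return refusals / base_refusals
```

For example, `refusal_score(3, 0)` yields `3.0` instead of raising ZeroDivisionError, and `refusal_score(5, 10)` yields `0.5`.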
* fix: display all abliterable components across layers
The current code only displays abliterable components from layer 0, which is misleading for hybrid architectures like Qwen3.5 that use different attention types across layers (e.g., `linear_attn.out_proj` in some layers, `self_attn.o_proj` in others).
This fix iterates through all layers to collect and display the complete set of abliterable components with accurate module counts.
Before (Qwen3.5-27B):
* attn.out_proj: 1 modules per layer
* mlp.down_proj: 1 modules per layer
After (Qwen3.5-27B):
* attn.out_proj: 48 modules total
* attn.o_proj: 16 modules total
* mlp.down_proj: 64 modules total
* Fix formatting
---------
Co-authored-by: Lawfer12 <ac728@ymail.com>
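Counting components across all layers, rather than reading layer 0 only, can be sketched as below. This is an illustration under stated assumptions: each layer is modelled as a plain dict from component name to a list of modules, and the placement of full-attention layers (every fourth layer) is hypothetical, chosen only so that the totals match the "After" figures quoted above.

```python
from collections import Counter


def count_components(layers: list[dict[str, list[str]]]) -> Counter:
    """Sum the number of modules per component name over every layer."""
    totals: Counter = Counter()
    for layer in layers:
        for component, modules in layer.items():
            totals[component] += len(modules)
    return totals


# Hypothetical 64-layer hybrid model: 48 linear-attention layers, 16 full-attention layers.
layers = []
for index in range(64):
    attention = "attn.o_proj" if (index + 1) % 4 == 0 else "attn.out_proj"
    layers.append({attention: ["attention_out"], "mlp.down_proj": ["mlp_down"]})

totals = count_components(layers)
```

Inspecting only `layers[0]` would report a single attention component and miss `attn.o_proj` entirely.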
* feat: add Qwen3.5 MoE hybrid layer support
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead
of standard self-attention, causing abliteration to fail because
self_attn.o_proj doesn't exist on those layers.
Changes:
- Wrap self_attn.o_proj in suppress(Exception) and add linear_attn.out_proj
as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in get_abliterable_components() instead of only layer 0,
since hybrid models have different components on different layers
- Derive LoRA target_modules from actual named_modules() instead of
splitting component keys, which fails when module names differ across
layers (e.g. "o_proj" vs "out_proj")
Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).
Relates to #43
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philipp Emanuel Weidmann <pew@worldwidemann.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
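The fallback between the two attention out-projections can be sketched with `contextlib.suppress`. The commit wraps the lookup in `suppress(Exception)`; this sketch narrows it to `AttributeError` as an assumption, and it uses `SimpleNamespace` objects as hypothetical stand-ins for real model layers. The function name is illustrative.

```python
from contextlib import suppress
from types import SimpleNamespace


def attention_out_projections(layer) -> dict[str, object]:
    """Collect whichever attention out-projection modules exist on a layer."""
    found: dict[str, object] = {}
    with suppress(AttributeError):
        # Standard self-attention layers.
        found["self_attn.o_proj"] = layer.self_attn.o_proj
    with suppress(AttributeError):
        # GatedDeltaNet (linear attention) layers.
        found["linear_attn.out_proj"] = layer.linear_attn.out_proj
    return found


# Hypothetical layers: one with standard attention, one with linear attention.
full_layer = SimpleNamespace(self_attn=SimpleNamespace(o_proj="W_o"))
linear_layer = SimpleNamespace(linear_attn=SimpleNamespace(out_proj="W_out"))
```

Each lookup fails independently, so a layer that lacks `self_attn` no longer aborts the scan before `linear_attn` is tried.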
memory_allocated() and memory_reserved() without a device argument only
report GPU 0. Sum across all devices for correct multi-GPU totals and
add total VRAM reporting.
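A minimal sketch of the per-device summation described above. The helper names `sum_over_devices` and `cuda_memory_totals` are hypothetical; the `torch.cuda` calls are standard PyTorch functions, imported lazily so the summation helper can be exercised without a GPU.

```python
from typing import Callable


def sum_over_devices(device_count: int, per_device: Callable[[int], int]) -> int:
    """Sum a per-device statistic over device indices 0..device_count-1."""
    return sum(per_device(index) for index in range(device_count))


def cuda_memory_totals() -> dict[str, int]:
    """Return allocated, reserved and total VRAM in bytes across all CUDA devices."""
    import torch  # lazy import: only needed when this function is called

    count = torch.cuda.device_count()
    return {
        "allocated": sum_over_devices(count, torch.cuda.memory_allocated),
        "reserved": sum_over_devices(count, torch.cuda.memory_reserved),
        "total": sum_over_devices(
            count, lambda index: torch.cuda.get_device_properties(index).total_memory
        ),
    }
```

Passing the device index explicitly is what avoids the single-device default.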