Commit Graph

131 Commits

Author SHA1 Message Date
dependabot[bot] 5f6e1e4d52 build(deps): bump requests from 2.32.5 to 2.33.0 (#272)
Bumps [requests](https://github.com/psf/requests) from 2.32.5 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:25:25 +05:30
dependabot[bot] 7ebd92dfa7 build(deps): bump pygments from 2.19.2 to 2.20.0 (#271)
Bumps [pygments](https://github.com/pygments/pygments) from 2.19.2 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.19.2...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:24:56 +05:30
dependabot[bot] 655d66ef24 build(deps): bump nltk from 3.9.3 to 3.9.4 (#270)
Bumps [nltk](https://github.com/nltk/nltk) from 3.9.3 to 3.9.4.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.9.3...3.9.4)

---
updated-dependencies:
- dependency-name: nltk
  dependency-version: 3.9.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:24:29 +05:30
dependabot[bot] 0f99c882ec build(deps): bump filelock from 3.20.0 to 3.20.3 (#269)
Bumps [filelock](https://github.com/tox-dev/py-filelock) from 3.20.0 to 3.20.3.
- [Release notes](https://github.com/tox-dev/py-filelock/releases)
- [Changelog](https://github.com/tox-dev/filelock/blob/main/docs/changelog.rst)
- [Commits](https://github.com/tox-dev/py-filelock/compare/3.20.0...3.20.3)

---
updated-dependencies:
- dependency-name: filelock
  dependency-version: 3.20.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:23:59 +05:30
dependabot[bot] 92f851b693 build(deps): bump pillow from 12.0.0 to 12.1.1 (#268)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.0.0 to 12.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/12.0.0...12.1.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:23:32 +05:30
dependabot[bot] 81e0c84ec6 build(deps): bump aiohttp from 3.13.2 to 3.13.4 (#267)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-04 08:10:51 +05:30
Philipp Emanuel Weidmann 887d43a8d9 fix: set batch size on HFLM object 2026-04-01 14:27:43 +05:30
Philipp Emanuel Weidmann 96c7a7d98a fix: replace tqdm progress bars with Rich progress bars 2026-03-28 18:30:15 +05:30
Philipp Emanuel Weidmann 1126332281 feat: add integrated benchmarking system 2026-03-24 18:25:12 +05:30
Philipp Emanuel Weidmann 19cdf7e244 fix: address ty complaint 2026-03-15 09:58:00 +05:30
Philipp Emanuel Weidmann 94775d4148 chore: update dependencies 2026-03-15 09:31:32 +05:30
cpagac 515a7b9eb5 fix: prevent div-by-zero in evaluator when base_refusals is 0 (#225)
* fix: prevent div-by-zero in evaluator when base_refusals is 0

When a model refuses none of the prompts from the start, base_refusals is 0.
Return refusals directly in that case so ablations that introduce new
refusals are still penalized correctly.

* fix: cast refusals to float for type consistency

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-13 11:21:23 +05:30
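The guard described in commit 515a7b9eb5 can be sketched roughly as follows. The function name `refusal_ratio` and its signature are hypothetical and chosen for illustration only; the actual evaluator code may be structured differently.

```python
def refusal_ratio(refusals: int, base_refusals: int) -> float:
    """Score refusals relative to the baseline (hypothetical helper).

    When the baseline count is 0, dividing would raise ZeroDivisionError,
    so the raw refusal count is returned instead (cast to float so the
    return type is the same on both paths).
    """
    if base_refusals == 0:
        return float(refusals)
    return refusals / base_refusals


print(refusal_ratio(7, 100))  # ordinary case: ratio to the baseline
print(refusal_ratio(3, 0))    # baseline of 0: new refusals still count
```

Returning the raw count keeps the score increasing with every newly introduced refusal, which is what the commit message asks for.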
erm14254 e26da5e0e6 fix: display all abliterable components across layers (#215)
* fix: display all abliterable components across layers

The current code only displays abliterable components from layer 0, which is misleading for hybrid architectures like Qwen3.5 that use different attention types across layers (e.g., `linear_attn.out_proj` in some layers, `self_attn.o_proj` in others).

This fix iterates through all layers to collect and display the complete set of abliterable components with accurate module counts.

Before (Qwen3.5-27B):
* attn.out_proj: 1 modules per layer
* mlp.down_proj: 1 modules per layer

After (Qwen3.5-27B):
* attn.out_proj: 48 modules total
* attn.o_proj: 16 modules total
* mlp.down_proj: 64 modules total

* Fix formatting

---------

Co-authored-by: Lawfer12 <ac728@ymail.com>
2026-03-11 14:10:37 +05:30
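A minimal sketch of the idea in commit e26da5e0e6: collect components from every layer rather than only layer 0. The data layout (one dict per layer, mapping a component name to its list of modules) and the name `count_components` are assumptions made for this example, not the project's actual API.

```python
from collections import Counter


def count_components(layers):
    """Count modules per component name across all layers.

    `layers` is assumed to be a list with one dict per layer, each mapping
    a component name to the list of modules found in that layer.
    """
    counts = Counter()
    for layer in layers:
        for component, modules in layer.items():
            counts[component] += len(modules)
    return dict(counts)


# Hypothetical 64-layer hybrid model mirroring the totals quoted above:
# 48 layers with attn.out_proj, 16 layers with attn.o_proj.
linear_layer = {"attn.out_proj": ["m"], "mlp.down_proj": ["m"]}
full_layer = {"attn.o_proj": ["m"], "mlp.down_proj": ["m"]}
layers = [linear_layer] * 48 + [full_layer] * 16

for name, total in count_components(layers).items():
    print(f"* {name}: {total} modules total")
```

Inspecting only `layers[0]` would miss `attn.o_proj` entirely in this layout, which is the misleading output the commit describes.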
Philipp Emanuel Weidmann ec0367226d style: fix formatting and naming 2026-03-06 13:18:08 +05:30
Matthias Stegner 5e3c04c802 feat: add Qwen3.5 MoE hybrid layer support (#187)
* feat: add Qwen3.5 MoE hybrid layer support

Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead
of standard self-attention, causing abliteration to fail because
self_attn.o_proj doesn't exist on those layers.

Changes:
- Wrap self_attn.o_proj in suppress(Exception) and add linear_attn.out_proj
  as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in get_abliterable_components() instead of only layer 0,
  since hybrid models have different components on different layers
- Derive LoRA target_modules from actual named_modules() instead of
  splitting component keys, which fails when module names differ across
  layers (e.g. "o_proj" vs "out_proj")

Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

Relates to #43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philipp Emanuel Weidmann <pew@worldwidemann.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-06 13:03:57 +05:30
Spiky Moth 303ba9d978 fix: recheck prefix after inserting predefined (#194) 2026-02-27 08:07:33 +05:30
Philipp Emanuel Weidmann cb4ef3fdfc docs: add Trendshift badge to README 2026-02-20 13:00:19 +05:30
cpagac 4c80c4beb9 fix: report VRAM usage across all GPUs instead of only the default device (#169)
memory_allocated() and memory_reserved() without a device argument only
report the current device (GPU 0 by default). Sum across all devices for
correct multi-GPU totals and add total VRAM reporting.
2026-02-17 12:53:41 +05:30
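The summation described in commit 4c80c4beb9 could look roughly like the sketch below. The helper `sum_over_devices` is hypothetical; it takes the per-device query as a callable so the example runs without a GPU. The PyTorch calls named in the comments exist, but how the project actually wires them together is an assumption.

```python
from typing import Callable


def sum_over_devices(device_count: int, per_device: Callable[[int], int]) -> int:
    """Sum a per-device statistic over device indices 0..device_count-1."""
    return sum(per_device(index) for index in range(device_count))


# With PyTorch installed, the calls might look like this (not executed here):
#   import torch
#   n = torch.cuda.device_count()
#   allocated = sum_over_devices(n, torch.cuda.memory_allocated)
#   reserved = sum_over_devices(n, torch.cuda.memory_reserved)
#   total = sum_over_devices(
#       n, lambda i: torch.cuda.get_device_properties(i).total_memory
#   )

# Stand-in values for two devices, in bytes (made up for the demo).
fake_allocated = [4_000, 6_000]
print(sum_over_devices(len(fake_allocated), lambda i: fake_allocated[i]))
```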
Spiky Moth 3a115e280c fix: produce card for local models with existing readme (#157) 2026-02-15 19:10:10 +05:30
Philipp Emanuel Weidmann 27097bfe8e build: bump version to 1.2.0 (tag: v1.2.0) 2026-02-14 18:11:42 +05:30
Philipp Emanuel Weidmann 025ab3a881 fix: disable LoRA export for now
Workaround for #152
2026-02-14 16:56:12 +05:30
Philipp Emanuel Weidmann 1179013999 docs: update README 2026-02-14 16:32:08 +05:30
Philipp Emanuel Weidmann fe7bc1bae3 docs: update README 2026-02-14 10:47:28 +05:30
Philipp Emanuel Weidmann e70a1a85e8 fix: don't load checkpoint when evaluating a second model
Fixes #144
2026-02-14 10:02:17 +05:30
Philipp Emanuel Weidmann e7f8be98b7 fix: only export tokenizer when exporting full model
Fixes #143
2026-02-14 09:18:22 +05:30
Philipp Emanuel Weidmann 6017bcd347 fix: use compatible release specifiers for non-dev dependencies
Fixes #145

Credit to MuX on Discord for recognizing that this is an issue with Transformers 5
2026-02-13 12:27:57 +05:30
Philipp Emanuel Weidmann dd0b3a2f69 docs: update README 2026-02-11 11:09:17 +05:30
Philipp Emanuel Weidmann b873598b77 docs: improve settings documentation 2026-02-11 10:19:05 +05:30
Philipp Emanuel Weidmann 10ceb3098e chore: update copyright notice 2026-02-11 09:46:36 +05:30
Salman Chishti 745b582414 ci: upgrade GitHub Actions to latest versions (#137)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:49:04 +05:30
Salman Chishti d0e9462fb8 ci: upgrade GitHub Actions for Node 24 compatibility (#136)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:48:12 +05:30
Philipp Emanuel Weidmann f68a887a7b fix: improve code quality, improve UX, fix small bugs 2026-02-08 13:32:00 +05:30
Philipp Emanuel Weidmann 2690655a83 feat: print memory usage during run 2026-02-02 21:18:01 +05:30
Spiky Moth 3525b1ac22 Implement Magnitude-Preserving Orthogonal Ablation (#52)
* feat: add support for winsorizing the residuals

Adds setting winsorization_quantile, expressed as the quantile to clamp to.
- If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized - that is, values outside the given quantile are clamped. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization.

* feat: implement magnitude-preserving orthogonal ablation

Adds boolean setting orthogonalize_direction:
- When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration.

Adds enum-valued setting row_normalization:
- 'none': No normalization.
- 'pre': Row-normalize the weight matrix before computing the LoRA adapter.
- 'full': Like 'pre', but re-normalizes to preserve original row magnitudes.

* prefer 'good' and 'bad' over 'harmless' and 'harmful'

* clarify how winsorization is applied

* store and reuse full peft_config

* remove unneeded cast

* make LoRA rank configurable for full normalization

* explain why the singular values are split across the components
2026-02-02 17:05:19 +05:30
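Two of the operations named in commit 3525b1ac22 can be illustrated in plain Python. This is a sketch under stated assumptions, not the project's implementation (which presumably operates on tensors). It assumes that winsorization clamps to the [1 - q, q] quantile range, which is one reading of "0.95 corresponds to a 90% winsorization", and that orthogonalization is a standard vector projection. The function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def winsorize(values, winsorization_quantile):
    """Clamp values outside the [1 - q, q] quantile range (assumed two-sided)."""
    if winsorization_quantile >= 1:
        return list(values)  # a quantile of 1 or more leaves values untouched
    ordered = sorted(values)
    low = quantile(ordered, 1 - winsorization_quantile)
    high = quantile(ordered, winsorization_quantile)
    return [min(max(v, low), high) for v in values]


def orthogonal_component(direction, reference):
    """Remove from `direction` its projection onto `reference`."""
    ref_norm_sq = sum(r * r for r in reference)
    if ref_norm_sq == 0:
        return list(direction)  # nothing to project onto
    scale = sum(d * r for d, r in zip(direction, reference)) / ref_norm_sq
    return [d - scale * r for d, r in zip(direction, reference)]


print(winsorize([-100, 1, 2, 3, 100], 0.75))           # outliers pulled inward
print(orthogonal_component([1.0, 1.0], [1.0, 0.0]))    # keeps only the part orthogonal to the reference
```

In the terms of the commit message, `direction` would play the role of a refusal direction and `reference` the direction derived from the good prompts.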
anrp 42f5a9b553 fix: Use file instead of symlink lock (for windows) (#116) 2026-01-25 19:34:01 +05:30
anrp 451db0b76e fix: specify study name (#119)
If we don't, optuna will generate a UUID for a name, which will never be found when loading as it is a "different" study. https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study
2026-01-25 18:48:23 +05:30
anrp ebc22c299e feat: Allow study progress to be saved & resumed (#106)
* feat: Store active study in log/study.jsonl and allow resuming

* Simplify resume logic with load_if_exists=True

* Significantly improve flexibility of study save/load

* Put constructor arguments at the highest precedence

* Review comments

---------

Co-authored-by: Spiky Moth <spikymoth@pm.me>
2026-01-23 19:49:37 +05:30
anrp d5c834c51d fix: Allow abliterating VL models (#108)
The Transformers documentation
(https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes)
states that "There is one class of AutoModel for each task." Use the
presence of "vision_config" in the config.json to determine which class
to load.
2026-01-23 19:34:31 +05:30
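The check described in commit d5c834c51d might be sketched as follows. The helper name `pick_auto_class` is hypothetical, and the two class names it returns are assumptions about which Auto classes are appropriate; the commit message only says that the presence of "vision_config" decides the choice. Names are returned as strings so the example runs without Transformers installed.

```python
import json


def pick_auto_class(config_json: str) -> str:
    """Return the name of the Auto class to use for a given config.json text."""
    config = json.loads(config_json)
    if "vision_config" in config:
        # Assumed class for vision-language models.
        return "AutoModelForImageTextToText"
    # Assumed class for text-only models.
    return "AutoModelForCausalLM"


text_only = '{"model_type": "example", "hidden_size": 1024}'
vision_language = '{"model_type": "example", "vision_config": {"depth": 2}}'

print(pick_auto_class(text_only))
print(pick_auto_class(vision_language))
```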
anrp c86f49035e feat: Refactor save machinery and always allow user to save LoRA (#110) 2026-01-20 18:53:47 +05:30
anrp 85a6ec5ecb fix: Include kernels (allows MXFP4 to be loaded in MXFP4 instead of upcasting) (#107)
Co-authored-by: Andrew Patrikalakis <anrp@tri.global>
2026-01-16 17:30:24 +05:30
Philipp Emanuel Weidmann 632b1da622 feat: add config file for slop reduction 2026-01-11 18:51:26 +05:30
Philipp Emanuel Weidmann 1cfd09d7f3 ci: add style guide for Gemini 2026-01-09 14:58:56 +05:30
Philipp Emanuel Weidmann 09be09e12e fix: restore classification of empty responses as refusals
Fixes #93
2026-01-02 16:50:02 +05:30
Philipp Emanuel Weidmann 039f6222d2 feat: allow overriding the system prompt per dataset 2025-12-31 14:26:44 +05:30
Philipp Emanuel Weidmann c4b2ea0c42 feat: allow injecting prefixes and suffixes into prompts 2025-12-31 12:00:44 +05:30
Philipp Emanuel Weidmann 02a5237a02 feat: add option to print prompt/response pairs 2025-12-27 14:48:29 +05:30
Philipp Emanuel Weidmann cf8cf6f349 fix: address remaining ty complaint 2025-12-22 11:12:45 +05:30
Philipp Emanuel Weidmann 2141e110fb ci: treat ty warnings as errors 2025-12-22 10:57:36 +05:30
Philipp Emanuel Weidmann 39101137ef ci: add type checking 2025-12-22 10:48:42 +05:30
Philipp Emanuel Weidmann 064bed9a9f fix: resolve issues raised by ty
A single issue has been deliberately left unfixed to verify that the CI check works
2025-12-22 10:24:55 +05:30