162 Commits

Author SHA1 Message Date
Philipp Emanuel Weidmann 27097bfe8e build: bump version to 1.2.0 v1.2.0 2026-02-14 18:11:42 +05:30
Philipp Emanuel Weidmann 025ab3a881 fix: disable LoRA export for now
Workaround for #152
2026-02-14 16:56:12 +05:30
Philipp Emanuel Weidmann 1179013999 docs: update README 2026-02-14 16:32:08 +05:30
Philipp Emanuel Weidmann fe7bc1bae3 docs: update README 2026-02-14 10:47:28 +05:30
Philipp Emanuel Weidmann e70a1a85e8 fix: don't load checkpoint when evaluating a second model
Fixes #144
2026-02-14 10:02:17 +05:30
Philipp Emanuel Weidmann e7f8be98b7 fix: only export tokenizer when exporting full model
Fixes #143
2026-02-14 09:18:22 +05:30
Philipp Emanuel Weidmann 6017bcd347 fix: use compatible release specifiers for non-dev dependencies
Fixes #145

Credit to MuX on Discord for recognizing that this is an issue with Transformers 5
2026-02-13 12:27:57 +05:30
Philipp Emanuel Weidmann dd0b3a2f69 docs: update README 2026-02-11 11:09:17 +05:30
Philipp Emanuel Weidmann b873598b77 docs: improve settings documentation 2026-02-11 10:19:05 +05:30
Philipp Emanuel Weidmann 10ceb3098e chore: update copyright notice 2026-02-11 09:46:36 +05:30
Salman Chishti 745b582414 ci: upgrade GitHub Actions to latest versions (#137)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:49:04 +05:30
Salman Chishti d0e9462fb8 ci: upgrade GitHub Actions for Node 24 compatibility (#136)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-08 16:48:12 +05:30
Philipp Emanuel Weidmann f68a887a7b fix: improve code quality, improve UX, fix small bugs 2026-02-08 13:32:00 +05:30
Philipp Emanuel Weidmann 2690655a83 feat: print memory usage during run 2026-02-02 21:18:01 +05:30
Spiky Moth 3525b1ac22 Implement Magnitude-Preserving Orthogonal Ablation (#52)
* feat: add support for winsorizing the residuals

Adds setting winsorization_quantile, expressed as the quantile to clamp to.
- If set to a value below 1, the residuals obtained from evaluating the first token of the good and bad prompts are winsorized - that is, values outside the given quantile are clamped. Note that winsorization_quantile = 0.95 corresponds to a 90% winsorization.

* feat: implement magnitude-preserving orthogonal ablation

Adds boolean setting orthogonalize_direction:
- When enabled, only the component of the refusal directions that is orthogonal to the harmless direction is subtracted during abliteration.

Adds enum-valued setting row_normalization:
- 'none': No normalization.
- 'pre': Row-normalize the weight matrix before computing the LoRA adapter.
- 'full': Like 'pre', but re-normalizes to preserve original row magnitudes.

* prefer 'good' and 'bad' over 'harmless' and 'harmful'

* clarify how winsorization is applied

* store and reuse full peft_config

* remove unneeded cast

* make LoRA rank configurable for full normalization

* explain why the singular values are split across the components
2026-02-02 17:05:19 +05:30
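The winsorization and direction orthogonalization described in commit 3525b1ac22 above can be illustrated in plain Python. This is a minimal sketch, not the project's implementation. It assumes two-sided clamping at the (1 - q) and q quantiles, which is consistent with the note that 0.95 corresponds to a 90% winsorization. It works on flat lists rather than tensors, and the helper names `quantile`, `winsorize`, and `orthogonalize` are hypothetical.

```python
import math


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, for 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def winsorize(values, winsorization_quantile):
    """Clamp values to the [1 - q, q] quantile range (assumed two-sided)."""
    if winsorization_quantile >= 1:
        return list(values)  # A value of 1 or more leaves the data untouched.
    low = quantile(values, 1 - winsorization_quantile)
    high = quantile(values, winsorization_quantile)
    return [min(max(v, low), high) for v in values]


def orthogonalize(refusal, good):
    """Keep only the component of `refusal` that is orthogonal to `good`."""
    # Assumes `good` is not the zero vector.
    scale = sum(r * g for r, g in zip(refusal, good)) / sum(g * g for g in good)
    return [r - scale * g for r, g in zip(refusal, good)]


# Outliers at both ends are pulled in towards the bulk of the data.
clamped = winsorize([-100.0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100.0], 0.9)

# The part of the refusal direction that lies along the good direction is removed.
print(orthogonalize([1.0, 1.0], [1.0, 0.0]))  # [0.0, 1.0]
```

Clamping rather than dropping outliers keeps the number of samples unchanged, so a later mean over the residuals still uses every prompt.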
anrp 42f5a9b553 fix: Use file instead of symlink lock (for windows) (#116) 2026-01-25 19:34:01 +05:30
anrp 451db0b76e fix: specify study name (#119)
If we don't, optuna will generate a UUID for a name, which will never be found when loading as it is a "different" study. https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study
2026-01-25 18:48:23 +05:30
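The reasoning in commit 451db0b76e above can be shown with a toy model of a study store. This sketch deliberately does not use Optuna. `ToyStorage` and the study name "heretic" are hypothetical, and only the parameter names `study_name` and `load_if_exists` mirror those of `optuna.create_study`.

```python
import uuid


class ToyStorage:
    """Hypothetical in-memory stand-in for a persistent study store."""

    def __init__(self):
        self.studies = {}

    def create_study(self, study_name=None, load_if_exists=False):
        # Without an explicit name, a fresh random name is generated on every call.
        name = study_name if study_name is not None else uuid.uuid4().hex
        if name in self.studies:
            if not load_if_exists:
                raise ValueError(f"study {name!r} already exists")
            return name, self.studies[name]
        self.studies[name] = []
        return name, self.studies[name]


storage = ToyStorage()

# Unnamed: the second call gets a new random name, so it cannot see the first study.
_, first = storage.create_study(load_if_exists=True)
first.append("trial 0")
_, second = storage.create_study(load_if_exists=True)

# Named: the second call finds the existing study and resumes it.
_, named = storage.create_study("heretic", load_if_exists=True)
named.append("trial 0")
_, resumed = storage.create_study("heretic", load_if_exists=True)
```

In this toy, `second` starts empty while `resumed` already contains the earlier trial, which is the difference the fix relies on.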
anrp ebc22c299e feat: Allow study progress to be saved & resumed (#106)
* feat: Store active study in log/study.jsonl and allow resuming

* Simplify resume logic with load_if_exists=True

* Significantly improve flexibility of study save/load

* Put constructor arguments at the highest precedence

* Review comments

---------

Co-authored-by: Spiky Moth <spikymoth@pm.me>
2026-01-23 19:49:37 +05:30
anrp d5c834c51d fix: Allow abliterating VL models (#108)
Per https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes,
it indicates that "There is one class of AutoModel for each task." Use
the presence of "vision_config" in the config.json to determine which.
2026-01-23 19:34:31 +05:30
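The check described in commit d5c834c51d above amounts to a key lookup in the model's config.json. A minimal sketch follows, assuming the two candidates are the Transformers auto classes `AutoModelForImageTextToText` and `AutoModelForCausalLM`. The commit message does not name the classes, so that pairing and the function name `pick_auto_class` are assumptions. The function returns class names as strings so that it runs without Transformers installed.

```python
import json


def pick_auto_class(config_json_text):
    """Return the name of the auto class to load, based on config.json contents."""
    config = json.loads(config_json_text)
    # Vision-language checkpoints carry a nested "vision_config" section.
    if "vision_config" in config:
        return "AutoModelForImageTextToText"  # Assumed choice for VL models.
    return "AutoModelForCausalLM"


text_only = '{"model_type": "llama"}'
vision_language = '{"model_type": "example_vl", "vision_config": {"hidden_size": 1024}}'

print(pick_auto_class(text_only))        # AutoModelForCausalLM
print(pick_auto_class(vision_language))  # AutoModelForImageTextToText
```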
anrp c86f49035e feat: Refactor save machinery and always allow user to save LoRA (#110) 2026-01-20 18:53:47 +05:30
anrp 85a6ec5ecb fix: Include kernels (allows MXFP4 to be loaded in MXFP4 instead of upcasting) (#107)
Co-authored-by: Andrew Patrikalakis <anrp@tri.global>
2026-01-16 17:30:24 +05:30
Philipp Emanuel Weidmann 632b1da622 feat: add config file for slop reduction 2026-01-11 18:51:26 +05:30
Philipp Emanuel Weidmann 1cfd09d7f3 ci: add style guide for Gemini 2026-01-09 14:58:56 +05:30
Philipp Emanuel Weidmann 09be09e12e fix: restore classification of empty responses as refusals
Fixes #93
2026-01-02 16:50:02 +05:30
Philipp Emanuel Weidmann 039f6222d2 feat: allow overriding the system prompt per dataset 2025-12-31 14:26:44 +05:30
Philipp Emanuel Weidmann c4b2ea0c42 feat: allow injecting prefixes and suffixes into prompts 2025-12-31 12:00:44 +05:30
Philipp Emanuel Weidmann 02a5237a02 feat: add option to print prompt/response pairs 2025-12-27 14:48:29 +05:30
Philipp Emanuel Weidmann cf8cf6f349 fix: address remaining ty complaint 2025-12-22 11:12:45 +05:30
Philipp Emanuel Weidmann 2141e110fb ci: treat ty warnings as errors 2025-12-22 10:57:36 +05:30
Philipp Emanuel Weidmann 39101137ef ci: add type checking 2025-12-22 10:48:42 +05:30
Philipp Emanuel Weidmann 064bed9a9f fix: resolve issues raised by ty
A single issue has been deliberately left unfixed to verify that the CI check works
2025-12-22 10:24:55 +05:30
_Vinayyyy_ 8d44b65670 feat: add continuous optimization option (latest changes updated) (#76)
* fix: a little merge bug

* refactor: simplify optimization loop based on feedback

* fix: address review comments

* fix: remove redundant check for study.best_trials

* fix: restore comments

---------

Co-authored-by: Vinay Umrethe <vinayumrethe99@gmail.com>
2025-12-20 18:57:57 +05:30
Philipp Emanuel Weidmann 5ddef6fd2f feat: add more CoT templates
Suggested by u/Chromix_ on Reddit
2025-12-20 17:12:46 +05:30
michaelh 92d0c0d551 feat: enumerate all available GPUs on startup (#86)
* feat: enumerate all available GPUs on startup

* feat: extend device enumeration to all accelerator types
2025-12-16 17:42:15 +05:30
michaelh 243f821d93 feat: Add 4-bit loading + LoRA support for low VRAM optimization (#60)
* Add files via upload

* perf: optimize abliteration matrix op (#46)

* perf: optimize abliteration matrix op

* refactor: comments and var names correspond with arditi

* refactor: fix comments and improve var notation

* fix: accidental line change and improve comments

---------

Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>

* Fix line endings to LF

* Add hybrid approach for GPT-OSS compatibility

- Check for LoRA adapters before attempting LoRA abliteration
- Fall back to direct weight modification for nn.Parameter (GPT-OSS)
- Ensures compatibility across all model architectures

* Fix projector bug, update print statement, revert README

* Revert README changes to match upstream

* Fix import sorting for ruff

* Fix reload_model for evaluate_model, add type hints and validation

* Apply ruff formatting

* Replace load_in_4bit with quantization enum

* Fix precision loss: use FP32 refusal direction directly

* Move r assignment into non-LoRA path

* Fix linting: apply ruff formatting

* Add auto-merge for LoRA adapters on save/upload

* Fix linting: apply ruff formatting

* Implement CPU-based merge for 4-bit models with OOM fallback

* Remove use_lora flag (LoRA always on), add user prompt for 4-bit export

* Fix: PEFT target_modules expects module names without path prefix

* Fix linting: apply ruff formatting

* Add LoRA fallback and fix quantization_config handling

- Add try/except around LoRA initialization with fallback to direct weight modification
- Only pass quantization_config when not None (fixes gpt-oss loading)
- Use simple forward pass instead of generate() for model test (avoids chat template issues)
- Reset non-LoRA models by reloading in reload_model()
- Check self.use_lora before accessing LoRA adapters in abliterate()

* Add 8-bit quantization support via bitsandbytes

- Add BNB_8BIT option to QuantizationMethod enum
- Add --load-in-8bit CLI support (auto via pydantic-settings)
- Update documentation in config.py and config.default.toml
- Useful for mid-range VRAM (12-16 GB) as balance between memory and numeric stability

* Improve LoRA merge warning and fix linting

* Apply final ruff formatting

* Fix CI: apply ruff import sorting

* Use tiny model for CI efficiency

* Fix import sorting in test_lora.py

* Fix formatting in test_lora.py

* feat: Show merge warning for all models (requires high RAM)

* style: Apply ruff fixes

* Fix undefined Style import in main.py

* Fix(model): Support MoE/3D tensors and enforce dtype safety in abliterate

* Fix(ci): Format model.py with ruff

* Fix(main): Remove invalid style argument from prompt_select and unused import

* Fix logic errors, memory leak, and redundant merges in main.py

* Fix linting and formatting issues (isort, ruff)

* chore: Simplify .gitattributes as requested

* refactor: Remove defensive try-except around LoRA initialization

* chore: Update uv.lock with peft and bitsandbytes

* chore: Regenerate uv.lock to include missing peft dependency

* style: Fix import sorting (isort) for CI compliance

* style: Simplify .gitattributes to single line as requested

* Address PR #60 feedback: Remove caching, fix LoRA reload, global LoRA usage, style fixes

* Address PR review comments: clarify code, fix quantization, rename method

- Add explanatory comments for warning suppression and gc behavior
- Remove redundant gc.collect() calls (empty_cache handles it)
- Fix output message order (ask merge strategy before 'Uploading...')
- Add comment explaining 8-bit quantization doesn't need compute_dtype
- Remove extra newline after dtype comment
- Add future-proofing note for hybrid layer support (#43)
- Remove leftover comment in get_merged_model
- Delete test_lora.py (debug script, not a real test)
- Add comment explaining needs_reload flag purpose
- Extract quantization config into _get_quantization_config() helper
- Rename reload_model() to reset_model_for_trial() for clarity
- Fix reload_model to respect quantization config (fixes evaluate_model bug)
- Remove unused gc import

* Restore gc.collect() before empty_cache() for large models

* refactor: Remove LoRA fallback remnants, simplify code

- Remove use_lora flag (always true since LoRA is always applied)
- Remove isinstance(PeftModel) check in get_merged_model() (always true)
- Simplify reset_model_for_trial() by removing defensive try/except
- Remove redundant gc.collect() calls (empty_cache handles GC)
- Remove unused gc import from main.py

* Address p-e-w review feedback: rename reset_model, remove loaded_model_name, fix type hints, remove GPT-OSS MoE, update assertion

* Restore skip logic for non-LoRA modules and fix 4-bit base_layer.weight access

* Remove defensive lora_A check per review - get_layer_modules already filters

* Fix try_add: nest component init inside Module check, add assert for unexpected types

* Add note about module.weight assumption for type checking

* Change 'Reloading model' to 'Resetting model' in logging

---------

Co-authored-by: accemlcc <accemlcc@users.noreply.github.com>
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
Co-authored-by: Hager <Michael.Hager@bruker.com>
2025-12-14 20:19:09 +05:30
Spiky Moth 9d1734855d feat: avoid excessive low divergence iteration (#73)
* feat: adjust scoring to avoid useless iteration

Adjusts the scoring function to avoid targeting meaninglessly low KL divergences.
Below a threshold value, the KL divergence score switches to the refusal count.
Adds config option kl_divergence_target (defaulting to 0.01).

* fix: Clean up parameter selection in objective

Create variables for num_layers and last_layer_index
* Improves readability and makes choices explicit

* feat: Print the parameters of the selected model
2025-12-14 14:26:48 +05:30
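The scoring change in commit 9d1734855d above can be sketched as a two-objective function. This is an illustration under assumptions, not the project's code. The function name `score`, the normalization by `base_refusals`, and the scaling of the below-target branch by `kl_divergence_target` are all hypothetical. Only the threshold behavior and the default of 0.01 come from the commit message.

```python
def score(kl_divergence, refusals, base_refusals, kl_divergence_target=0.01):
    """Return (kld_score, refusals_score); lower is better for both."""
    # Assumes base_refusals > 0, i.e. the unmodified model refuses at least once.
    refusals_score = refusals / base_refusals
    if kl_divergence >= kl_divergence_target:
        kld_score = kl_divergence
    else:
        # Below the target, lowering the KL divergence further earns nothing.
        # The score follows the refusal count instead (scaling is an assumption).
        kld_score = refusals_score * kl_divergence_target
    return kld_score, refusals_score


above_target = score(0.5, refusals=10, base_refusals=100)
below_a = score(0.001, refusals=10, base_refusals=100)
below_b = score(0.005, refusals=10, base_refusals=100)
```

With this shape, `below_a` and `below_b` receive the same first score even though their KL divergences differ, so the optimizer has no reason to keep pushing the divergence down once it is under the target.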
George 740aab61ba feat: add max_memory parameter to limit memory usage (#83)
* add max_memory parameter to limit memory usage

* Added to reload_model also

* forgot to add self

* Process max_memory once in __init__ and store it as an instance variable, then reuse it in both locations
2025-12-11 20:57:40 +05:30
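The last step of commit 740aab61ba above, processing `max_memory` once and reusing it, can be sketched as follows. The class `ModelSketch`, its method names, and the string-to-integer key conversion are assumptions made for illustration. The only outside fact relied on is that Transformers accepts a `max_memory` mapping from device (integer GPU index or "cpu") to a size string.

```python
class ModelSketch:
    """Hypothetical stand-in for a model wrapper that loads and reloads weights."""

    def __init__(self, max_memory=None):
        # Process the setting once. Config files typically yield string keys,
        # so numeric keys are converted to integer device indices (assumption).
        if max_memory:
            self.max_memory = {
                int(key) if key.isdigit() else key: value
                for key, value in max_memory.items()
            }
        else:
            self.max_memory = None

    def load_kwargs(self):
        # Arguments that would be passed on the initial load.
        return {"device_map": "auto", "max_memory": self.max_memory}

    def reload_kwargs(self):
        # The reload path reuses the stored value instead of re-parsing it.
        return {"device_map": "auto", "max_memory": self.max_memory}


model = ModelSketch({"0": "20GiB", "cpu": "64GiB"})
```

Storing the processed value on the instance keeps the initial load and the reload from drifting apart, which is what the "Added to reload_model also" step addresses.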
Philipp Emanuel Weidmann d9f2b0407a build: bump version to 1.1.0 v1.1.0 2025-12-10 16:54:03 +05:30
Philipp Emanuel Weidmann ca783db6c9 docs: update README 2025-12-10 16:30:35 +05:30
Philipp Emanuel Weidmann 6acccac994 feat: add progress bars for plotting operations 2025-12-10 13:07:34 +05:30
Philipp Emanuel Weidmann ac154a55a0 fix: suppress CoT output for thinking models
Ref #75
2025-12-09 11:54:08 +05:30
Philipp Emanuel Weidmann 15781a8a0c fix: skip common response prefix for thinking models
Ref #75
2025-12-09 08:25:10 +05:30
Philipp Emanuel Weidmann 24c3aeb442 feat: turn boolean settings into CLI flags 2025-12-07 11:37:07 +05:30
Philipp Emanuel Weidmann ffbde3ac2a fix: follow up after recent PRs 2025-12-07 10:26:16 +05:30
Philipp Emanuel Weidmann 932d737edf feat: add silhouette coefficient to residual geometry output 2025-12-07 08:48:38 +05:30
Philipp Emanuel Weidmann 1f5e977f4f Revert "perf: optimize abliteration matrix op (#46)" (#74)
This reverts commit 60bd531fde.
2025-12-07 06:30:37 +05:30
Philipp Emanuel Weidmann da27ba8054 fix: always left-pad inputs, and avoid optimizing for empty responses
Fixes #70

Co-authored-by: arnomatic <acc@eml.cc>
2025-12-06 06:31:09 +05:30
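The left-padding part of commit da27ba8054 above can be illustrated without a tokenizer. A decoder-only model generates from the last position of each row, so padding goes on the left to keep real tokens at the end. The function `left_pad` and the token IDs below are hypothetical, and the project itself presumably relies on the tokenizer's padding settings rather than code like this.

```python
def left_pad(sequences, pad_id):
    """Pad token ID lists on the left to a common length; return IDs and mask."""
    width = max(len(seq) for seq in sequences)
    input_ids = [[pad_id] * (width - len(seq)) + list(seq) for seq in sequences]
    # The mask is 0 over padding and 1 over real tokens.
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in sequences]
    return input_ids, attention_mask


ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

Every row now ends in a real token, so the first generated token continues the prompt rather than a run of padding.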
Philipp Emanuel Weidmann baf5b0b0d1 feat: add geometric median to residual geometry output 2025-12-05 20:15:50 +05:30
Philipp Emanuel Weidmann eeb28b28c1 feat: add option to plot residual vectors 2025-12-04 14:22:29 +05:30
red40maxxer d836fb2da9 ci: add PR title lint (#66)
* ci: add PR title lint

* style: ending newline

---------

Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
2025-12-03 09:25:48 +05:30