* perf: optimize abliteration matrix op
* refactor: align comments and variable names with Arditi et al.
* refactor: fix comments and improve var notation
* fix: accidental line change and improve comments
---------
Co-authored-by: mad-cat-lon <113548315+mad-cat-lon@users.noreply.github.com>
* Add `trust_remote_code` configuration option and apply it when loading models and tokenizers
* Default `trust_remote_code` to `None` and set it to `True` if it was previously `None`, so the user is not asked multiple times
* Consistently access `trust_remote_code` from `self.settings` instead of the global `settings` object.
* Introduce `trusted_models` dictionary to manage and confirm `trust_remote_code` settings during model loading
* Assign `trust_remote_code` to `evaluate_model` in `trusted_models` instead of `model`
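
The confirm-once behaviour described in the bullets above could look roughly like the sketch below. It is an illustration, not the code from this change: `resolve_trust_remote_code` and the `confirm` callback are hypothetical names, and only `trusted_models` and the `None` default come from the commit messages.

```python
from typing import Callable, Dict, Optional

# Cache of per-model decisions: model name -> whether remote code is trusted.
trusted_models: Dict[str, bool] = {}


def resolve_trust_remote_code(
    model_name: str,
    configured: Optional[bool],
    confirm: Callable[[str], bool],
) -> bool:
    """Return the trust_remote_code value to use for model_name.

    `configured` mirrors a setting that defaults to None (undecided).
    `confirm` is a hypothetical callback that asks the user for a decision.
    """
    if configured is not None:
        # An explicit setting always wins and the user is never prompted.
        return configured
    if model_name not in trusted_models:
        # Ask once per model and remember the answer.
        trusted_models[model_name] = confirm(model_name)
    return trusted_models[model_name]
```

Keying the cache by model name means the main model and the evaluation model can each carry their own decision, and a repeated load of the same model does not prompt again.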
When loading models with MXFP4 quantization (e.g., openai/gpt-oss-20b),
the transformers library uses Triton tensors to wrap the quantized weights.
These Triton tensors have a .data attribute containing the underlying
PyTorch tensor, but torch.is_tensor() returns False for them.
This caused a KeyError: 'mlp.down_proj' when trying to load such models,
as the try_add() function would fail the assertion check before adding
the down projection matrices.
The fix extracts the underlying PyTorch tensor via the .data attribute
when encountering Triton tensors, allowing heretic to work with MXFP4
quantized models while maintaining full compatibility with standard models.
Tested with openai/gpt-oss-20b on PyTorch 2.9.1+cu130, transformers 4.57.1,
triton 3.5.1, and kernels 0.11.0.
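
A minimal sketch of the unwrapping step described above, assuming only what this message states: the wrapper is not recognised by torch.is_tensor() and exposes the PyTorch tensor via .data. `WrappedWeight` is a hypothetical stand-in for the Triton tensor class, and `unwrap_tensor` is an illustrative name, not the function changed here.

```python
import torch


class WrappedWeight:
    """Hypothetical stand-in for a wrapper object that is not a torch.Tensor."""

    def __init__(self, data: torch.Tensor) -> None:
        self.data = data


def unwrap_tensor(component: object) -> torch.Tensor:
    """Return a PyTorch tensor for a plain tensor or a wrapper exposing .data."""
    if torch.is_tensor(component):
        # Standard models: the weight is already a tensor.
        return component
    inner = getattr(component, "data", None)
    if torch.is_tensor(inner):
        # Wrapped weights: use the underlying tensor instead.
        return inner
    raise TypeError(f"Cannot extract a tensor from {type(component).__name__}")
```

Checking torch.is_tensor() first keeps the path for standard models unchanged, and objects that are neither tensors nor wrappers still fail loudly.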
Handle optional fullname and email fields in user profile gracefully
using .get() method with fallback values to prevent KeyError when
uploading models to HuggingFace.
This fixes an issue where users without a public email or fullname
set in their HuggingFace profile would encounter an error during
the upload process.
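
As an illustration of the .get() pattern, the sketch below assumes the profile is a plain dict. The fallback strings and the `describe_user` helper are hypothetical and not taken from this change.

```python
def describe_user(user: dict) -> str:
    """Format a profile whose fullname and email fields may be missing."""
    # user["fullname"] would raise KeyError when the key is absent;
    # .get() returns the fallback value instead.
    fullname = user.get("fullname", "unknown")
    email = user.get("email", "no public email")
    return f"{fullname} ({email})"
```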
Co-authored-by: ricyoung <riyoung@gmail.com>
* Ensure projector is on the same device as the matrix for multi-GPU support
* Optimize memory management for loaded model weights
* Refactor memory management by removing unnecessary gc.collect() calls
* Optimize memory usage (#1)
* Improve memory management by explicitly deleting model layers and optimizing projector usage
* Optimize memory management by explicitly deleting the model and forcing garbage collection
* Add back deleted `empty_cache` call
* Fix broken file
* Remove unnecessary deletions
* Remove unnecessary empty_cache() calls
* Remove unused import of gc
* Add a second `gc.collect` call to `empty_cache()`
* Move additional `gc.collect` call in front of `torch.x.empty_cache`
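
The call order implied by the last two bullets might look like the sketch below. It assumes the second `gc.collect` call stays after the cache is emptied, and it covers only the CUDA backend, whereas `torch.x.empty_cache` above refers to whichever backends are in use.

```python
import gc

import torch


def empty_cache() -> None:
    """Free unreferenced Python objects, then release cached GPU memory."""
    # Collect first so tensors that are no longer referenced are freed
    # before the allocator is asked to release its cached blocks.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    # Assumption: the second collection runs after the cache is emptied.
    gc.collect()
```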