docs: update README

Philipp Emanuel Weidmann
2026-05-22 14:51:24 +05:30
parent 551db26bb7
commit 4e3a3a78a3
+3 -2
@@ -116,8 +116,9 @@ a configuration file.
 At the start of a program run, Heretic benchmarks the system to determine
 the optimal batch size to make the most of the available hardware.
-On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B-Instruct
-takes about 45 minutes. Note that Heretic supports model quantization with
+On an RTX 3090, with the default configuration, decensoring
+[Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+takes about 20-30 minutes. Note that Heretic supports model quantization with
 bitsandbytes, which can drastically reduce the amount of VRAM required to process
 models. Set the `quantization` option to `bnb_4bit` to enable quantization.