Author | Commit | Message | Date
Philipp Emanuel Weidmann | 7caf9fcdc5 | Separate training and evaluation prompts | 2025-10-09 12:51:31 +05:30
Philipp Emanuel Weidmann | 2ff8dcba6b | Add model card when uploading to Hugging Face | 2025-09-30 08:43:21 +05:30
Philipp Emanuel Weidmann | 5b01ad4344 | Add save and upload functionality | 2025-09-27 11:15:41 +05:30
Philipp Emanuel Weidmann | 7573a2eebd | Support passing model name without "--model" argument prefix | 2025-09-25 15:02:22 +05:30
Philipp Emanuel Weidmann | fd0fa52552 | Add chat functionality | 2025-09-24 18:09:23 +05:30
Philipp Emanuel Weidmann | f00d35dc46 | Improve early abort score calculation | 2025-09-23 19:02:00 +05:30
Philipp Emanuel Weidmann | 3f242369e0 | Add educated guesses for parameter values to get the optimizer started | 2025-09-23 16:00:20 +05:30
Philipp Emanuel Weidmann | c447805fc2 | Improve default dtype configuration | 2025-09-23 13:31:41 +05:30
Philipp Emanuel Weidmann | b6c715ab6f | Abort trial early if KL divergence is too high | 2025-09-23 13:20:31 +05:30
Philipp Emanuel Weidmann | 9485edc221 | Support Qwen3 MoE | 2025-09-22 15:22:48 +05:30
Philipp Emanuel Weidmann | 1b37160490 | Fix model loading issues | 2025-09-21 16:04:41 +05:30
Philipp Emanuel Weidmann | af19fbd254 | Initial commit | 2025-09-21 11:10:30 +05:30