run_causal.py: entry point for probe-faithfulness ablation and logit-level steering (Tables 2 and 4 in the paper)
run_behavioral_steering.py: entry point for behavioral generate-under-steering test ...