Preprints

Papers in submission, by category.

The Revealed Preferences of Pre-authorized Licenses and Their Ethical Implications for Generative Models
Suriyakumar, V.M., P. Menell, D. Hadfield-Menell, A. Wilson. GenLaw Workshop at ICML 2024. (by request)

Security & Safety

When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment
In Submission
Xiao, Y., S. Tonekaboni, W. Gerych, V.M. Suriyakumar, M. Ghassemi. 2025

TOGA: Trigger Optimization for Clean Data Ordering Backdoor Attack
In Submission
Jin, A., W. Gerych, A. Gourabathina, V.M. Suriyakumar, M. Ghassemi. 2025 (by request)

UCD: Unlearning in LLMs with Contrastive Decoding
In Submission
Suriyakumar, V.M., A. Sekhari*, A. Wilson*. 2025. * Denotes equal supervision

Layered Unlearning for Adversarial Relearning
In Submission
Qian, T., V.M. Suriyakumar, A. Wilson, D. Hadfield-Menell. 2025

Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
ICLR 2025 Workshop on Navigating and Addressing Data Problems for Foundation Models
Suriyakumar, V.M., R. Alur, A. Sekhari, M. Raghavan, A. Wilson. 2024.

Fairness

Iterative Nullification Transforms for Debiasing Vision-Language Models at Test-Time
In Submission
Perian, Q., V.M. Suriyakumar, M. Ghassemi. 2025.