preprints
papers in submission, by category.
Privacy, Memorization, & Copyright
The Revealed Preferences of Pre-authorized Licenses and Their Ethical Implications for Generative Models
Suriyakumar, V.M., P. Menell, D. Hadfield-Menell, A. Wilson. GenLaw Workshop at ICML 2024. (by request)
Security & Safety
When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment
In Submission
Xiao, Y., S. Tonekaboni, W. Gerych, V.M. Suriyakumar, M. Ghassemi. 2025
TOGA: Trigger Optimization for Clean Data Ordering Backdoor Attack
In Submission
Jin, A., W. Gerych, A. Gourabathina, V.M. Suriyakumar, M. Ghassemi. 2025 (by request)
UCD: Unlearning in LLMs with Contrastive Decoding
In Submission
Suriyakumar, V.M., A. Sekhari*, A. Wilson*. 2025. * Denotes equal supervision
Layered Unlearning for Adversarial Relearning
In Submission
Qian, T., V.M. Suriyakumar, A. Wilson, D. Hadfield-Menell. 2025
Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
ICLR 2025 Workshop on Navigating and Addressing Data Problems for Foundation Models
Suriyakumar, V.M., R. Alur, A. Sekhari, M. Raghavan, A. Wilson. 2024.
Fairness
Iterative Nullification Transforms for Debiasing Vision-Language Models at Test-Time
In Submission
Perian, Q., V.M. Suriyakumar, M. Ghassemi. 2025.