
  • Date: Thursday, July 4, 2024

    Vision and AI

    More information
    Time
    12:15 - 13:15
    Title
    Recovering the Pre-Fine-Tuning Weights of Generative Models
    Location
    Jacob Ziskind Building
    Room 1
    Lecturer
    Eliahu Horwitz
    HUJI
    Organizer
    Department of Computer Science and Applied Mathematics
    Seminar
    Abstract
    The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. In contrast to previous attacks that attempt to recover pre-fine-tuning capabilities, our method aims to recover the exact pre-fine-tuning weights. Our approach exploits this new vulnerability against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
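The recovery setting in the abstract can be illustrated with a small numerical sketch. This is not necessarily the algorithm presented in the talk. It assumes that each fine-tuned weight matrix equals a shared base matrix plus a rank-r (LoRA-style) update, and it alternates between estimating each low-rank update by truncated SVD and re-estimating the base as the average of what remains. The function names `low_rank_part` and `recover_base`, the synthetic data, and the alternating scheme itself are assumptions made for illustration.

```python
import numpy as np


def low_rank_part(matrix, rank):
    """Best rank-`rank` approximation of `matrix` (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def recover_base(finetuned, rank, steps=200):
    """Alternately estimate the low-rank updates and the shared base matrix.

    Illustrative assumption: finetuned[i] = base + (rank-`rank` update).
    Returns the base estimate and the objective value after each step.
    """
    base_est = np.mean(finetuned, axis=0)  # naive starting point
    losses = []
    for _ in range(steps):
        # Step 1: with the base fixed, the best low-rank update per model
        # is the truncated SVD of its difference from the base estimate.
        updates = [low_rank_part(w - base_est, rank) for w in finetuned]
        # Step 2: with the updates fixed, the best base is the mean residual.
        base_est = np.mean([w - m for w, m in zip(finetuned, updates)], axis=0)
        losses.append(sum(np.linalg.norm(w - base_est - m) ** 2
                          for w, m in zip(finetuned, updates)))
    return base_est, losses


# Synthetic example (hypothetical sizes): one 32x32 base, five rank-2 fine-tunes.
rng = np.random.default_rng(0)
base = rng.standard_normal((32, 32))
finetuned = [base + rng.standard_normal((32, 2)) @ rng.standard_normal((2, 32))
             for _ in range(5)]

recovered, losses = recover_base(finetuned, rank=2)
print("objective: first step", losses[0], "last step", losses[-1])
print("distance to true base:", np.linalg.norm(recovered - base))
```

Each step exactly minimizes the objective over one block of variables, so the objective never increases. How closely the estimate approaches the true base depends on the number of fine-tuned models, the rank, and the number of steps.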

    Bio:

    Eliahu Horwitz is a PhD candidate in Computer Science at the Hebrew University of Jerusalem, working under the supervision of Prof. Yedid Hoshen. His research area is computer vision, with a focus on representation learning and generative models. Currently, his work revolves around reversing the training trajectories of neural networks.

    A recipient of the KLA Scholarship for Outstanding Graduate Students and a CIDR (Center for Interdisciplinary Data Science Research) fellow, Eliahu’s academic achievements are complemented by his practical experience. Before transitioning to research, he honed his skills as a self-taught software developer, working with diverse technologies across the tech stack at both startups and large-scale companies. His latest research can be found on his website: pages.cs.huji.ac.il/eliahu-horwitz.
    Lecture