On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. In the future, it could power virtual avatars that render locally and don’t require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.

  • P03 Locke@lemmy.dbzer0.com
    8 months ago

    No. No, they can’t. This shit still takes lots and lots of training data.

    It’s just like any job. You can’t just fully fake something in one day. At best, you might get 60% of the way there, maybe 80% after adding on some generic experience. But you’re not going to fully mimic anything without lots of training and experience.