Deepfaking myself was scarily easy

Alessandro Marrella - Sep 5 - Dev Community

Driven by my curiosity about everything AI-related, I decided to try deepfaking myself: I trained a LoRA adapter on top of the FLUX.1 model by Black Forest Labs to generate pictures of me in moments that never happened.

Deepfake of myself eating pineapple pizza

FLUX.1 is a model by Black Forest Labs and, as of today, represents the state of the art in image generation. It currently comes in three versions: Schnell (open weights, Apache 2.0 licensed), Dev (open weights, non-commercial license) and Pro (commercial, closed source).

For this experiment, I decided to fine-tune FLUX.1 Dev using the very convenient ostris/flux-dev-lora-trainer hosted on Replicate. The code for the trainer can be found on GitHub.

I didn't play around with the parameters much (training cost me around $2.44). The only things I did were:

  1. Uploading a Zip file with 12 photos of myself (without any labels; I relied on the trainer's auto-labeling, which uses LLaVA-1.5, an image captioning model)
  2. Setting the autocaption_prefix to "A photo of TOK, ", TOK being a trigger word that should help the model identify me. I'm not sure whether this helped (I decided to limit my budget to a simple POC, so I never compared runs), but I did it anyway 🤷‍♂️.
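The two steps above can be sketched in Python. This is only an illustration of the idea, not what I actually ran (I used the Replicate web UI). The only input name taken from this post is autocaption_prefix; input_images, autocaption and trigger_word are my assumptions about the trainer's inputs, and the helper functions are hypothetical, so check the trainer's page before relying on them.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def zip_photos(photo_dir, zip_path):
    """Bundle every image in photo_dir into one flat Zip file; return the photo count."""
    photos = sorted(
        p for p in Path(photo_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for photo in photos:
            # arcname keeps only the file name, so the archive has no folders
            archive.write(photo, arcname=photo.name)
    return len(photos)


def build_training_input(images_url, trigger_word="TOK"):
    """Build the trainer inputs (hypothetical helper; names other than
    autocaption_prefix are assumptions)."""
    return {
        "input_images": images_url,      # assumed name: URL of the uploaded Zip
        "autocaption": True,             # assumed name: let the trainer caption the photos
        "autocaption_prefix": f"A photo of {trigger_word}, ",
        "trigger_word": trigger_word,    # assumed name
    }


def start_training(training_input, trainer_version, destination):
    """Kick off the job with the replicate client (needs REPLICATE_API_TOKEN).

    trainer_version is "ostris/flux-dev-lora-trainer:<version id>" and
    destination is "<your user>/<your model>"; both are placeholders here.
    """
    import replicate  # third-party: pip install replicate

    return replicate.trainings.create(
        version=trainer_version,
        input=training_input,
        destination=destination,
    )
```

Calling zip_photos("photos", "photos.zip") produces the archive to upload, and build_training_input(...) gives the dictionary you would hand to start_training once the Zip is reachable at a URL.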

I then kicked off the job, which took about 34 minutes to complete. Once the model was trained, clicking "Run trained model" in the UI brought me to a form where I could enter a prompt like "A photo of TOK, eating pineapple pizza" and generate an image that would hopefully resemble me. You can also provide an image and a "mask" to guide the model on what to generate (see the "Lord Commander of the Night's Watch" example below).
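The same generation step can also be driven from code instead of the form. Here is a minimal sketch, assuming the trained model accepts inputs named prompt, image and mask (my guess based on the form; the helper functions and the model reference are hypothetical):

```python
def build_prompt(scene, trigger_word="TOK"):
    """Prefix the scene with the trigger word, mirroring the training captions."""
    return f"A photo of {trigger_word}, {scene.strip()}"


def build_generation_input(scene, image_url=None, mask_url=None):
    """Build the inputs for one generation (input names are assumptions)."""
    if mask_url is not None and image_url is None:
        raise ValueError("a mask only makes sense together with an image")
    payload = {"prompt": build_prompt(scene)}
    if image_url is not None:
        payload["image"] = image_url    # assumed name: picture to start from
    if mask_url is not None:
        payload["mask"] = mask_url      # assumed name: region the model may repaint
    return payload


def generate(model_ref, payload):
    """Run the trained model on Replicate (needs REPLICATE_API_TOKEN).

    model_ref is "<your user>/<your model>:<version id>", a placeholder here.
    """
    import replicate  # third-party: pip install replicate

    return replicate.run(model_ref, input=payload)
```

For example, build_generation_input("eating pineapple pizza") yields a payload whose prompt is "A photo of TOK, eating pineapple pizza", the same text I typed into the form.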

And that's it, really. Generating the images took about a minute. Some of them missed the target, but the adapter learned my face reasonably well with no tweaking and produced absurd results like the ones in the examples gallery.

The next step I'm going to try is video, though I'm already pretty creeped out by the photos.

In a world where everything can be realistically faked (again, my example took no effort, and I'm sure it could be tweaked to be even more realistic), how will we be able to distinguish fiction from reality?

Note: if you ever see a picture of me eating pineapple pizza, that's a fake for sure.
