• Scrubbles@poptalk.scrubbles.tech
    13 upvotes · 10 hours ago

    Ah, another billion, that will do it!

    Seriously, Sora is impressive for what it does: 10-second text-to-video clips. You cannot make movies with text-to-video alone. You will never get a consistent face, expression, anything. By definition, these models are unpredictable.

    So you train some LoRAs to try to get a consistent face, but then the model can only make that face. So you can’t have 2 people in the video.

    So you get into advanced sectioning of frames: you generate, then you replace person A with this face and person B with that face.

    And then we haven’t even gotten into backgrounds, consistent sets, or anything.

    And they want to make a whole damn movie with it.

    At this stage, it’s literally easier to learn real film editing than it is to prompt engineer 1 minute of continuous footage.

    1 billion ain’t enough, Mouse.

    • mindbleach@sh.itjust.works
      1 upvote · 3 hours ago

      Text-to-anything is the most basic version. You can perform a line yourself, out loud, with your human mouth, and then the model will make that sound like the voice you want.

      Diffusion turns whatever you have into whatever you describe. It’s powerful enough to work on noise. If you want even that to be consistent, just use the same noise, with a fixed seed. Otherwise, put in literally any more effort than asking nicely.
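
A minimal sketch of the fixed-seed point, assuming nothing about any particular video model: it uses Python's standard `random` module as a stand-in for the noise a diffusion sampler starts from. The helper name `initial_noise` and the seed values are made up for illustration; real pipelines seed their own generators, but the idea is the same. Same seed, same starting noise.

```python
import random


def initial_noise(seed, n=8):
    """Return n Gaussian samples from a generator seeded with `seed`.

    Hypothetical stand-in for the starting noise of a diffusion sampler.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = initial_noise(seed=1234)
same_b = initial_noise(seed=1234)
other = initial_noise(seed=5678)

# Identical seeds reproduce the noise exactly; a different seed does not.
print(same_a == same_b)
print(same_a == other)
```

With the starting noise pinned down, any remaining variation comes from what you change on purpose, such as the prompt or the input footage.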

      “So you can’t have 2 people in the video.”

      … have y’all genuinely avoided seeing anything generated since 2022? Even the text-to-whatever models can handle “left guy” versus “right guy.”

      “At this stage, it’s literally easier to learn real film editing than it is to prompt engineer 1 minute of continuous footage.”

      A clip I saw ages ago showed a guy walking through his backyard holding a cardboard tube like a rifle, and then the result of presumably typing ‘monkey soldier walking through rainforest’ was Wish.com Planet Of The Apes. But: same camera motions, same body language. This tech is CGI for dummies. You don’t have to engineer shit, when you can nail down the animatic with amateurs. Or dolls.

      You could apply this shot-by-shot to Who Killed Captain Alex, and it would look better than any Steven Seagal movie.

      • Scrubbles@poptalk.scrubbles.tech
        1 upvote · 2 hours ago

        I mean 2 people where at least one looks like a specific person who was not in the original model’s training data. As soon as you add something like a LoRA into the mix, it will try to make everyone look like that person.

    • Not_mikey@lemmy.dbzer0.com
      1 upvote · 7 hours ago

      The deal isn’t about making movies, at least for now. It’s about allowing users on Disney+ to make 10-second clips of themselves swinging a lightsaber at Darth Vader or whatever.