
Making my own GenAI film taught me so much about our future


A couple of weeks ago, we ran a competition for all our team – yup, everyone – to produce an AI-generated video. We wanted the whole company to understand how this technology works. As I have told a thousand wannabe filmmakers over the years, the best way to learn is to get on and make films.

Many of the entries to the competition were excellent. Some were OK, but everyone learned something. We have shared some of them on our company's LinkedIn page, and I am sure we will continue to do so. I thought I would use this week to discuss the process I went through and, more importantly, what it taught me about where our industry is headed.

Disclaimer No.1

Before I begin, I want to acknowledge that these platforms were trained on human artists’ work, which, in many, if not most, cases, they had no permission to use in that manner. Many of those artists are now losing work to the very platforms that stole their work. That is wrong, I wish it were different, and frankly it sucks. But that is the world we live in. We have to use these tools to stay relevant in the fast-moving markets we operate in.

Disclaimer No.2

I have not made a film for over a decade, so while the will may be strong, the skills have seen better days. We also had only two weeks, and I was travelling for the entire time, which made finding the hours difficult. The finished film is not my best work, but the learning was invaluable.

Ok, get on with it…



Creating ‘Cas’

For the star of my show, I decided to create a mascot for Casual: an animal that would not be too geographically specific, and whose characteristics would reflect our company’s character and values.

After some back-and-forth briefing ChatGPT, I decided on a cool arctic fox named Cascade, or ‘Cas’ for short. She is an approachable master of adaptation, with keen perception, always sensing opportunities below the surface. We workshopped some imagery and ended up with this character.


Cascade AKA Cas

I liked her clean colour scheme, which matches our branding. The point is that I knew more or less what I wanted, but I used ChatGPT as a partner, almost as if I were working with another creative and an illustrator.

Function: Character design and image generation | Platform: ChatGPT | Time: 3 hours

Accuracy: ⭐️⭐️⭐️ | Control: ⭐️⭐️⭐️ | Ease of use: ⭐️⭐️⭐️⭐️ | Quality: ⭐️⭐️⭐️⭐️



Writing the brief

Having written blogs for some time now, I know not to use these platforms for draft work. It’s quick, but I can tell it’s AI immediately: it isn’t my voice, and it lacks flourish. I can never make the outputs feel right afterwards, no matter how much editing I do.

So, I wrote the film from my own imagination. You could argue that this is cheating in an AI Film comp, but a) I wasn’t really trying to win, and b) I genuinely didn’t think it was the best way to make a good AI film. Generic TikTok stuff, sure, but not what I wanted to create.

Scooby Cas

I wanted to produce an old-school animation to match the style of Cas. I was thinking of Scooby Doo and the other cartoons I watched as a kid. Cas would need to solve a terrible problem in an office – the team have been let down and the product launch is tomorrow!

She would enter, hanging from a helicopter high above the city. From there, she would take control of the situation, travelling from location to location, creating the film that would lead to a successful product launch in front of thousands of amazed people.

Platform: Microsoft Word | Time: 2ish hrs of drafting + a few days of background thinking



Video Generation with Veo2

From there, I chopped the script into shots and fed them into ChatGPT, asking it to make prompts for the videos. I took these prompts and gave them to Veo2, which seemed OK for animation. There are other platforms, but I decided to focus on this one.

The ChatGPT prompts didn’t work that well, so I found myself painstakingly describing the contents of every clip. This was hit or miss at best, and very frustrating. Generating consistency took a long time. That is partially due to poor workflow, partially due to the time limitations, and partially due to the way the AI works.

Some of the renders were crazy. This wasn't even the worst!

I could use cut-and-paste descriptions of elements that appeared frequently. The Ingredients to Video function allowed me to use images from other clips as part of the prompt, which really helped with consistency.

Render Bender

In total, I created around 180 renders, and they are still a long way from perfect. I would probably need another 60 at least to get it to somewhere a bit more respectable, but even then, I don’t think I would be fully happy with it. And that is before you consider lip syncing.

Rather than using the built-in Google video compiler, I downloaded the footage to edit in Adobe Premiere. It’s a platform I'm familiar with (well, I was over a decade ago!), and I was going to be on a long flight, so I wanted to work offline.

Function: Video generation | Platform: Veo2 | Time: At least 10 hours over multiple days

Accuracy: ⭐️⭐️ | Control: ⭐️⭐️ | Ease of use: ⭐️⭐️⭐️ | Quality: ⭐️⭐️⭐️



Music composition

For the music composition, I briefed the AI music creation app Suno. The prompt included the character of Cas and the story: solving problems and travelling the world. I wanted an exciting, indie-style theme tune to a cartoon animation.

It gave me two examples, and I chose the one that I thought worked best. It was decent but not perfect. While I love music, I have never been much of a composer, and what the platform allowed me to do in such a short time was impressive. Certainly not as good as working with a human composer but amazing given the time and my talent constraints.

By now, I was starting to run a bit short on time, so while I would have liked to hone the output further, I simply couldn’t. The same was true of the instrumentals: I wrote a brief for what was happening in each scene, it gave me a few options, and I added these to my timeline.

Function: Music generation | Platform: Suno | Time: 0.5 hours

Accuracy: ⭐️⭐️⭐️⭐️ | Control: ⭐️⭐️⭐️ | Ease of use: ⭐️⭐️ | Quality: ⭐️⭐️⭐️



Dialogue Generation

I used ElevenLabs with the script I had written earlier. I listened to the various voices and selected the ones that I thought worked best. They were not perfect, but see above – time!

I fed in the snippets of VO, got an output, and then, if I didn’t like the intonation, I could re-render it. I couldn’t direct the VO, which was annoying, but what are you going to do?

I added these to the timeline in the relevant parts. The lip sync is all over the place. I know there are platforms that can improve this, but that wasn’t really what this exercise was all about. I did my best with the clips I had, but it’s really not perfect.

Function: Dialogue, editing | Platform: ElevenLabs, Adobe Premiere | Time: 1 hour



Onlining/SFX/Finishing

I added sound effects from Casual’s own audio library and then re-generated some of the clips with Veo3 to enhance the flow of the film. I could have taken many times longer to improve the film at this stage, but by now, I was about to lose cell service, so I really had to hit render and upload it.

Function: SFX, finishing | Platform: Adobe Premiere, Casual’s SFX Library | Time: 1.5 hrs



So here it is:

'Putting the Cas in Casual'
Takeaways

Platforms: ChatGPT, Microsoft Word, Google Veo2, Suno, Adobe Premiere, ElevenLabs, Casual’s SFX Library | Time: 16 hours

GenAI is a member of the team

The fact that “AI is a tool” has been written a billion times. Sure. But having done this exercise, I would say it is more than that. AI is a member of the team. Not the team, a member of the team. In some ways, a great member of the team; in others, a flawed member of the team. Just like all of us.

Raising the Creative Bar

The potential for someone with the time, imagination, and filmmaking ability to produce stunning work is profound. It made me realise that working with these platforms is going to significantly raise the bar on our best work. We are going to do work for corporate clients that has hitherto been the preserve of the highest commercial budgets.

It will allow us to unleash our imaginations as filmmakers. If you can imagine a shot, then you can do it. No more line producers shaking their heads and rolling their eyes. But you do need to have that vision, and know what you want – the current set of AI platforms is not going to give you that.

A Hand to Hold through the Creative Process

Much of our job involves divining the film that our clients are actually after. They don’t necessarily know themselves. This ability stems from years of practice, thousands of conversations, and a clear process for reaching the desired output. It’s hard to see how an AI platform (in its current form) supplants that.

Striving for Professional Precision

That led me to one of my most significant findings. Throughout the process, I wanted to be able to pass what I had done to an animator, a sound engineer, or a VO artist, to just do it properly. The tools are just not controllable enough to get precisely what you want in a timely manner. Consistency is essential to our corporate clients, and that was something I really struggled with.

Constraint is the Path to Creative Excellence

I also realised the limitations that come with being able to generate content so easily. They say that every final film evolves from three separate films: the one you write, the one you shoot, and the one you edit. There is also, arguably, a fourth film that is created during the client feedback process.

These films are the result of limitations and new opportunities that occur as you go through each step. The cave you wanted to shoot in is closed, so you make it in the abandoned house instead. The sequence shot by the beach didn’t really work, but it does amazingly if you change the scene order in the edit. And so on.

My point is that using GenAI, you don’t have to wrestle with the constraints of the process, and I think the film is the poorer for it. Those flights of creative fancy are what really make the output sing, and we’re losing that because we go straight from imagination to final version.

Summary

On the one hand, it is incredible that someone with no animation and limited design experience can pull something like this together in a couple of days of intense work time. My knowledge and capability with the tools will only grow, and the tools themselves will continue to improve, becoming more controllable and usable. I wasn’t necessarily using the latest platforms either.

But, can you make a film, every step, with AI? Yeah, just about, but will it be any good? I don’t think so.
