Generative AI in all of its weird and wonderful forms is set to bring about a huge revolution in filmmaking. Whether this ultimately proves to be for better or worse, the technology is evolving continuously and now is the time to embrace its advances and play with its various forms. This is exactly what artist and filmmaker Matteo Di Gioia has done in ‘Italian Boi’, a gaudy and flamboyant mix of live action, CGI and AI animation brought to life as an incredibly watchable music video.
Matteo, founder of The Jack Stupid creative studio, takes his audience on a queer and delightful narrative journey exploring the often unfavourable, nuanced tropes associated with a typical Italian man. His music video is heavily guided by the featured lyrics and is visually enriched by a vast and impressive array of AI-generated images interspersed into the live action and CGI, creating a somewhat chaotic but wholly enjoyable watch. As an artist and freethinker, Matteo further invites his audience to explore the images and messages hidden within his fast-paced musical delight as multidimensional art. Amateur AI aficionados such as myself and anyone with an interest in incorporating AI into their projects are in for a treat.
The lure of the Italian anti-culture is deep in this video, yet the goal was to keep the tone light and bright.
I love your stereotypical representation of ‘The Italian Boi’. How was he born?
We wanted to tell a love story about a cultural clash. We have someone who falls in love with an Italian Boi who is attractive and charismatic, but they soon realize he’s a bit of an ignorant, chauvinist, lazy mama’s boy. Furious, they break him into fragments which fly off and land across the nation. We decided to have three visual worlds: a human, a 3D-modelled Italian Boi and an AI-generated stream of Italian Zeitgeists.
Prior to shooting, we designed and developed the 3D model and animated those scenes. In parallel, a geek squad of our people prompted DALL•E to obtain thousands of pictures that could be sequenced to create a visual stream of consciousness that best described the Italian Boi’s upbringing and environment, featuring the good and the bad: Italian politicians, historical facts, trashy personalities of today. These computer-generated sequences were pre-finished ahead of shooting so that once the filming was over I could use all the ingredients and immediately move on to the editing.
Your music video is an audacious and surreal dream incorporating so many different elements. How did you go about planning what to include and the rough narrative structure?
The starting point was always the lyrics, which tell the story of these two characters. Full disclosure: I’m the author of this song, both music and lyrics. For the music video, I started imagining the scenes verse after verse and it started coming together quite naturally. Given that we decided to have three visual worlds, this seemed like a natural point of division. The sections that contain vocals would be live action so that the actor could lip sync, and the instrumental parts would mainly be animated. The short DALL•E sequences would be tossed into the mix when relevant, whilst the longer DALL•E sequences would play the main role during the guitar solo and the final riffs. In our animatics, each world had a key color so that everybody could immediately spot which technique would be used in each scene. I like to start working with post-its on a table, then move to digital boards to shuffle things around, and eventually make a rough animatic as the music plays.
The tropes focussed on the typical ‘Italian Boi’ are brilliant. How did you go about planning what to include?
I wrote the character and set his environment so that we could imagine the Italian Boi’s train of thought, which took the shape of a wishlist: “Berlusconi running naked in a field”, “solarium and underwear with big logos”, “Pinocchio hitting a bong underwater”. We then divided the wishlist into colors and started mounting and dismounting the puzzle over the song structure until the balance felt right. During the production, many new ideas came to life and a few old ideas changed radically, but I hope all of them kept their conceptual coherence. Organizing chaos is super fun.
I am both fascinated and terrified by the rise in AI technology at the moment and love the inclusions in your video. What prompts were you giving DALL•E to produce your images and how did you then manipulate them to what you wanted?
We started picking phrases from our wishlist and testing them out, prompting the AI with various deliveries of the same concept, varying and shuffling words to adjust the result. This approach did the trick for the brief sequences such as “Luca Di Maio energy vampire flying in the pink sky”. Longer sequences were trickier. I don’t remember the exact wording for the sequence but, as an example, “The Pope rides a pink bull in Rome” progressively blended, after a few dozen frames, into “Pagan orgy fire party painting”. We then morphed “Mandolin shredding in flames” into “Cannon battle in a soccer field” into “Gladiators slay in the Coliseum” into “Pantani’s skeleton wins the tour de Mars”. I picked the images that suited my purpose and organized them into stop-motion-like sequences to get a feel for where the story was going and to have a clearer idea of what to ask the AI. After some hours of interaction, the AI behavior becomes less mysterious and manipulating it feels easier. Typing in the name of a color or an artistic movement may suddenly turn the search in the right direction. I’d suggest being patient, setting a line, and tweaking it progressively, a word at a time, search after search.
How many images did your geek squad eventually attain from DALL•E and how did you choose and collate those together?
At some point, seven people were prompting the server at once and each of them had tens of windows calculating at the same time! This went on for over a week. Thousands of pics were downloaded and hundreds of them were hand-picked, only for us to realise we needed many more, and more specific ones. We normally do a lot of stop motion, so we’ve developed an eye for sequencing stills and obtaining an animated scene. The process is the usual one: set a strong start and end, then set the main keyframes, and finally choose the in-between frames to fill the sequence. Watch and adjust. If some jumps are too quirky, rearrangements and horizontal flips come in very handy. It is a very intuitive process based on trial and error.
With so many different elements, how did you work on the edit to put it all to the music and garner that manic yet story-led flow?
Having the animatic done before shooting was crucial! All was time-based, all the keystone scenes were set and some ‘question mark’ blank spaces were left to be filled by the DALL•E sequences, which we knew would be erratic by nature. I guess the manic flow comes from playing music and translating musical concepts on screen. A few people working on this video are musicians, and if music is sound organised in time, a video is images organised in time!
What do you hope for your film?
I’d like this film to make the audience reflect on cultural and national stereotypes, which are often partly, and humorously, true. I also want the process to be fun. As the lore on the topic is deep, I believe irony is the key. I personally love works that can be quickly enjoyed with a light heart but that also conceal more beneath their surface. I always try to achieve some pleasurable layering. I hope you will enjoy the vision!