Beyond the Screen: Unveiling the Influence of AI on the Media Industry
Authored by Kesava Reddy, Chief Revenue Officer, E2E Networks Ltd.
In 2018, Lexus, the luxury vehicle division of Toyota Motor Corp, explored the intuitive collaboration between humans and AI with the release of its ad ‘Driven by Intuition.’ The ad was scripted entirely by artificial intelligence (AI) and shot by Kevin Macdonald, the award-winning director of ‘The Last King of Scotland’ and ‘Whitney’.
This was one of the first projects to test the boundaries of how humans and machines can work together in harmony in the field of media production. Today, Generative AI is transforming the way images, memes, videos, voiceovers, music, gaming visuals, and more are conceptualized and produced.
A McKinsey report released in February 2024 found that, across the 63 use cases analyzed, Generative AI (or Gen AI) has the potential to generate $2.6 trillion to $4.4 trillion in value across industries. Of this, between $80 billion and $130 billion is expected to come from the media and entertainment sector alone.
Generative AI will influence media in two ways in the long run: first, by transforming existing media production workflows; and second, by enabling entirely new and highly innovative applications that are yet to emerge.
Challenges in Existing Media Workflows
There are several challenges that currently plague the media industry, which Generative AI can help solve.
The first is the perennial challenge of stock footage and ‘B-roll’. B-roll is the supplemental footage that video content uses in addition to the primary footage, known as the ‘A-roll’. Historically, B-roll had to be shot separately or acquired from stock footage providers. However, shooting B-roll separately drives up production cost, whereas stock footage libraries often offer a limited selection that may not fully meet creators' diverse needs.
A second, and bigger, problem is salvaging footage that turns out to be flawed after the shoot. These flaws stem from unstable camera work, poor lighting, improper framing, or creative issues such as narrative inconsistency. Fixing them demands considerable time and resources, and editors often find such footage extremely difficult to work with. As a result, reshooting is often required, which inflates the cost.
Also, consider the cost of production itself. Producing high-quality content is costly because it requires expensive equipment, skilled labor, and logistics. These costs come from purchasing or renting cameras and editing software, paying professionals, and covering the logistics of a shoot, such as transportation, location, and crew. This makes it challenging to stay within budget, and shoots often cost more than planned.
With constant demand for fresh, engaging content, there is intense pressure on content creators to produce high-quality content that captures and retains viewer interest. Generative AI has the capability to solve this, and more.
How Generative AI Can Transform Media
Generative AI technologies have introduced innovative ways to generate and manipulate media. Some of the most popular are the variants of Stable Diffusion, powerful image generation models that can create detailed images from textual descriptions. These models are pre-trained and can be easily deployed on cloud GPU infrastructure. Once deployed, they can be used by media organizations as part of their workflow, giving artists, designers, and content creators high-quality visual content without the time and expense of traditional creation methods. They can even be fine-tuned to the exact needs of an organization.
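As a rough illustration of the deployment described above, the sketch below loads a pre-trained text-to-image model with the open-source Hugging Face diffusers library and generates a single still image. The model ID, the prompt wording, and the helper names build_prompt and generate_still are assumptions made for illustration, not a workflow this article prescribes. It also assumes a CUDA-capable GPU with the diffusers and torch packages installed.

```python
def build_prompt(subject: str,
                 style: str = "cinematic, natural light, shallow depth of field") -> str:
    """Combine a subject and a style hint into one text prompt (hypothetical helper)."""
    return f"{subject.strip()}, {style.strip()}"


def generate_still(prompt: str,
                   model_id: str = "stabilityai/stable-diffusion-xl-base-1.0",
                   out_path: str = "broll_still.png") -> str:
    """Generate one image from a text prompt and save it to out_path.

    Assumes a CUDA GPU and that diffusers and torch are installed.
    The default model_id is only an example of an openly available checkpoint.
    """
    # Heavy imports stay local so build_prompt works without a GPU stack.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    image = pipe(prompt=prompt).images[0]  # first generated PIL image
    image.save(out_path)
    return out_path


# Example call (requires a GPU):
# generate_still(build_prompt("aerial shot of a coastal city at dusk"))
```

In practice a team would likely wrap a call like this behind an internal service, so that creatives send prompts to it without touching the GPU machine directly.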
These image synthesis models aren’t just capable of generating new images; they can also modify existing ones, fix errors in them, or generate a new image inspired by an existing one. Therefore, they are highly effective in a range of use-cases, from generating stock media, to fixing errors that crept in during a shoot and, sometimes, even eliminating the need for a shoot.
Alongside image generation models, video generation models such as Stable Video Diffusion are now emerging as well. Since video stock footage is much harder to find than stock images, this is one of the areas where Gen AI will influence media production the most.
Generative AI’s media generation capability doesn’t stop at images or video; it can generate audio as well. MusicGen and AudioGen models, for instance, can generate musical tracks or background audio, and help reduce the cost and effort of audio recording. They address the demand for original audio content, which is often a challenge due to copyright issues with existing music and sounds. Creators can use them to generate unique soundtracks for videos, games, or digital experiences, enhancing the overall production value while avoiding legal complexities related to copyrighted material.
Generative AI can even assist with creating lip sync videos, which have huge potential for dubbing content into a different language. Historically, lip sync has been a manual and painstakingly time-consuming effort. However, with models like Wav2Lip, one can accurately sync the lip movements in a video to match a given audio track. This technology is particularly useful in post-production for films and broadcasting, where audio tracks might need to be replaced or edited without reshooting the visual content. It's also valuable for creating localized versions of videos for different regions by syncing lips to translated audio, thus maintaining realism and viewer engagement.
Finally, Generative AI can also help with creating entirely new formats we haven’t seen before. For instance, with fine-tuned Stable Diffusion models, it is possible to create images or video in which one inserts a new character into footage, replaces an existing character, or even modifies their clothing or background. This can help advertisers and media producers create variations of the same footage, tailored for different audiences, by changing the primary character or the setting in which they are placed. Imagination is the only limit.
Future of Generative AI Powered Media
In the future, Generative AI will also help unveil completely new formats and experiences. For instance, it can be used to create immersive experiences where a viewer can see themselves in a scene. It can also be used to storyboard new content, or imagine and realize visuals that would be otherwise extremely difficult to create.
Several businesses in the media domain have already started testing this modern workflow by deploying open-source Generative AI models on a cloud GPU, fine-tuning them for their specific use-case, and then providing them as assistants to their creatives. There are two big advantages to this approach: first, cost savings in the long run; second, control of the AI model and the ability to tailor it exactly to the use-case.
With heavy usage, proprietary Generative AI platforms can often turn out to be expensive, as the cost of API usage can pile up. Additionally, they offer very limited control over the underlying AI model, making it difficult to tailor it to a production house’s specific needs. Running the models internally also keeps data under the organization's control, which has other associated benefits, such as protecting sensitive information from potential breaches. For media companies, this means they can safeguard proprietary data, such as unpublished content and source materials, which are highly valuable resources.
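The cost argument can be made concrete with simple break-even arithmetic. The sketch below compares a per-request API price against a flat monthly cost for a self-managed cloud GPU. Every figure in it is a placeholder assumption, not a quoted price from any provider, and the function names are hypothetical. Amounts are expressed in the smallest currency unit (for example, cents) so the arithmetic stays in whole numbers.

```python
def monthly_api_cost(requests_per_month: int, price_per_request: int) -> int:
    """Total monthly API bill, in the smallest currency unit."""
    return requests_per_month * price_per_request


def break_even_requests(gpu_monthly_cost: int, price_per_request: int) -> int:
    """Smallest monthly request count at which the API bill
    reaches or exceeds the flat monthly GPU cost."""
    if price_per_request <= 0:
        raise ValueError("price_per_request must be positive")
    # Ceiling division using integers only.
    return -(-gpu_monthly_cost // price_per_request)


# Placeholder figures: GPU at 150,000 units per month, API at 4 units per request.
GPU_MONTHLY_COST = 150_000
PRICE_PER_REQUEST = 4

threshold = break_even_requests(GPU_MONTHLY_COST, PRICE_PER_REQUEST)
print(threshold)  # 37500
```

Under these assumed figures, the API route is cheaper below 37,500 requests a month and the self-managed GPU is cheaper above it. A fuller comparison would also account for engineering time, storage, and idle GPU hours, which this sketch leaves out.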
Future media organizations will increasingly incorporate Generative AI into their workflow, and create internally managed AI stacks built on open-source technology. This will enable them to reduce costs, streamline their workflows, and launch new and innovative formats that wouldn’t have been possible otherwise. As reports have rightly predicted, Generative AI will exert a significant influence on the media industry in the years to come.
