How did this elf get transported to Middle Earth?

November 8, 2024

And why a basic understanding of CGI for TV and movies can help make your next video project your most precious yet

Have you ever wondered how characters in movies and TV transport themselves to far-off worlds, fight mythical monsters and jump through explosions? The answer is almost always CGI (Computer-Generated Imagery) unless you've got a particularly brave stuntman.

CGI, when done well, can bring a new level of realism and engagement to content creation. Exceptional visual storytelling can captivate audiences in a way that nothing else can.

In this article, I'm going to give a brief, high-level explanation of the technical processes behind visual effects (VFX) on the big (or little) screen. I'll also explain why, if you deal with video in any capacity, understanding the basics could spark your next business idea.

Who am I?

My name is Michael Goldman, and I'm a Senior Software Engineer at Sigma Connectivity - a technology consultancy and creative tech house. That means Sigma offers specialist hardware and software engineers like myself to help you and your company achieve your goals across a wide array of industries. We can also build specialized high-tech products for you, turning your next dream product into reality. Have a look at some of our previous work for inspiration!

I specialize in real-time video processing. In layman's terms, I make software that takes in live video, processes each frame before the next one arrives, and shows you the result. What that processing is varies from project to project!

For example, while working at Sony, I created an app that would stream sports embedded with fun live stats and graphics to your VR headset. I also created an Android app that live-streamed football, letting you scroll the pitch and track individual players.

Most recently though, I worked at Zeiss, where I was Technical Lead for CinCraft Scenario, a real-time camera tracking system. Camera tracking is a key part of VFX - which I will explain later - and this is where my experience in the movie industry stems from.

Bear in mind, my experience only covers a small part of a very large production pipeline! However, it is enough to illuminate how CGI is inserted into film and live TV, which is what I will focus on in this article.

Why does this concern me or my business?

Visual storytelling isn't only restricted to movies and big-budget TV series. The same applies to your marketing campaign, social media posts, news segments, and pretty much anything you point a camera at and want to share.

A modern audience may find it difficult to watch content from 30 or more years ago when the visual effects are so dated; it's hard to suspend disbelief long enough to engage with the story.

Given the bar can be set so high for the video you watch, are you setting your standard high enough for the video content you produce?

Perhaps most crucially, it's important to understand that some of the methods used here to help augment video are the same ones used in VR/AR headsets and robotics. These two fields have the potential to disrupt every industry, so even if you're not producing creative content, familiarity with these topics could be the key to unlocking the value in the video you capture.

Let's have a look at a brief technological overview of how we go from images to images + graphics good enough to show to the world.

A small aside about Generative AI

In late 2024, your thoughts may turn to generative AI when it comes to CGI. Perhaps you imagine giving an AI a video and a prompt, and a few clicks later getting back content complete with graphics. While that could very well be the future of VFX, it still has some way to go to match the control and quality of current state-of-the-art techniques.

At the time of writing, AI-produced content is still very experimental and highly unlikely to be fit for professional use. Any demos you have seen claiming otherwise have either covered very constrained use cases or been used in conjunction with existing VFX tools.

This is not to detract from its incredible potential, though; it has the capacity to radically transform all video processing in the next decade! For now, however, let's stick to how things are done professionally, and when the AI tech is ready I can write an article on it then.

Technological Overview

At a high level, the process of creating CGI is:

  1. Finding the position and characteristics of the camera you are filming with
  2. Creating a virtual world with a virtual clone of your real-world camera
  3. Placing objects into the virtual world
  4. Combining the video from the virtual camera with the one from the real world

Every production company (or individual) will have their own software stack to do the above, but the general principles stay the same.

Let's look into each of these steps, while also implementing them using Blender, a free VFX tool. I'm skipping a lot of the finer details in this article for brevity, but feel free to download Blender and give it a go yourself!

1: Finding the Camera's Position and Characteristics

Understanding the precise position of the camera is the cornerstone of integrating computer graphics into real-world footage. Computer vision algorithms typically examine video frames and use things they can consistently see over time to "anchor" themselves. How these "things" (aka features) move in relation to the camera is used to determine the camera's position over time.

Feature points: tracking features is harder than it looks! The squares represent features detected in the frame, and the red and blue lines streaking from them show where each feature has moved over time. Notice that the computer vision algorithm can track features easily in some environments but not in others: (1) doesn't have enough features tracked, (2) doesn't have enough features on the floor, and (3) has a good spread of tracked features - this will work well! Without enough features in the right areas, the camera's position will be estimated poorly.

Features can include persistent, static objects (e.g. a table, chair, painting...) or special markers you place in view (e.g. a QR code). Sometimes there is specialized equipment you can attach to a camera to detect them faster and more reliably - this is what I worked on while at Zeiss! This equipment is most frequently used in CGI for live content, where you can't give the usual software the time to infer position. The topic of inferring camera movement is known as camera tracking.
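To make the idea concrete, here's a minimal sketch of feature tracking in Python with OpenCV - not the production tooling used on set, and the video filename is just a placeholder. It shows the core loop: detect distinctive points, then follow them from frame to frame.

```python
import cv2

# Open the footage we want to track (placeholder filename).
cap = cv2.VideoCapture("office_pan.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Detect up to 200 strong corner features to act as "anchors".
features = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=10)

while features is not None and len(features) > 0:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Follow each feature from the previous frame into the current one
    # using sparse optical flow (Lucas-Kanade).
    new_features, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, features, None)

    # Keep only the features that were successfully tracked; their motion
    # relative to the camera is what a solver later uses to estimate pose.
    features = new_features[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray

cap.release()
```

In a full camera-tracking pipeline, the motion of these surviving points is then fed to a solver that estimates where the camera was for every frame.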

Let's use this video, since it produced a good track of the camera. In a realistic scenario where you can't just switch what you're filming, you would add markers to make sure it tracks. I didn't want to start vandalizing my office, so I picked a suitable video instead.
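If you'd rather script the tracking than click through Blender's Movie Clip Editor, the rough sketch below shows the idea (it assumes Blender 3.2+ with a Movie Clip Editor area open, and the file path is a placeholder; in practice you'd still clean up tracks and set solve keyframes by hand before solving).

```python
import bpy

# Load the footage into Blender's Movie Clip Editor (placeholder path).
clip = bpy.data.movieclips.load("//footage/keyboard_pan.mp4")

# Tracking operators expect to run inside a Movie Clip Editor, so find one
# in the current layout and override the context to point at it.
for area in bpy.context.window.screen.areas:
    if area.type == 'CLIP_EDITOR':
        area.spaces.active.clip = clip
        region = next(r for r in area.regions if r.type == 'WINDOW')
        with bpy.context.temp_override(area=area, region=region):
            bpy.ops.clip.detect_features()    # find trackable feature points
            bpy.ops.clip.track_markers(backwards=False, sequence=True)  # follow them through the clip
            bpy.ops.clip.solve_camera()       # estimate the camera's motion from the tracks
        break
```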

Different computer vision algorithms will be used depending on various factors such as:

  • available sensors
    • e.g. an iPhone camera may have a lidar scanner (a sensor used to measure depth), while a movie camera may not
    • not all sensors have to be visual either; accelerometers can also help track movement
  • time constraints
    • live TV has to do this in real-time, but movies can take months
  • filming environment
    • poor lighting and reflective surfaces can be troublesome
  • precision needed
    • can the accuracy be off by millimeters or centimeters?

Generally, if you're providing real-time camera tracking, say for live sports, it will be inferior to the tracking accuracy in movies that can afford to take days to perform these computations. If you're eagle-eyed, on rare occasions you may even notice some graphics "sliding" around when the camera moves during a live broadcast. These downsides can be mitigated by constraining the camera movements drastically, and making the environment easier to process with clearer visual anchor points and lighting.

To be able to replicate a real-world camera in the virtual world, you need more than its position. You need to know characteristics such as its focus, aperture, zoom, and any distortion produced by the lens, to name but a few. If you have the time to spare, these can be estimated well enough with software after filming. In real-time, all of this data is captured on set via dedicated equipment: some values are streamed live (e.g. focus, aperture, and zoom) and others are measured through careful preparation before you shoot (e.g. lens distortion).
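Lens distortion in particular is usually measured before the shoot by filming a known calibration pattern, often a checkerboard. Here's a hedged sketch of that idea using OpenCV; the folder of calibration frames and the board size are assumptions, not a production workflow.

```python
import glob
import cv2
import numpy as np

# A 9x6 checkerboard with known, evenly spaced corners (placeholder size).
board = (9, 6)
world_corners = np.zeros((board[0] * board[1], 3), np.float32)
world_corners[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

object_points, image_points = [], []
image_size = None

for path in glob.glob("calibration_frames/*.png"):  # placeholder folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        object_points.append(world_corners)
        image_points.append(corners)
        image_size = gray.shape[::-1]

# Estimate the camera matrix (focal length, optical centre) and the
# lens distortion coefficients from all the detected checkerboards.
ret, camera_matrix, distortion, _, _ = cv2.calibrateCamera(
    object_points, image_points, image_size, None, None)

print("Camera matrix:\n", camera_matrix)
print("Distortion coefficients:", distortion.ravel())
```

The resulting camera matrix and distortion coefficients let you match (or undo) the real lens's distortion so the virtual render lines up with the footage.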

This plus the many other technical requirements of rigging the filming environment (a lot of which have nothing to do with CGI) can mean it takes many people at least several days to set up a shoot.

2: Crafting a Virtual Camera and Environment

Once the camera's position and characteristics are accurately tracked, the next step is to create a virtual world that we can mix into our real-world setting. The tool to create a virtual environment is called a renderer (which for us is Blender), and sometimes, it is the same renderer used to create your favorite video games.

Just like a video game, you use the renderer to create a virtual camera to look into the virtual world. This virtual camera can zoom, pan, tilt, focus and do all the things a real camera can. However, instead of controlling the camera with your game controller, you're now controlling it with a real-world camera.

Here you are seeing the virtual camera copy the movements of the real-world camera that shot the footage of the keyboard

So when the real-world camera moves left, your virtual camera moves left. When the real-world camera zooms in, the virtual camera zooms in, and so on.
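In Blender, once the clip has been tracked and solved, wiring a virtual camera to the real one only takes a few lines of Python. The sketch below assumes the solve from step 1 already exists and that the footage is the first loaded clip.

```python
import bpy

scene = bpy.context.scene

# Assumes footage has already been opened and camera-solved in the
# Movie Clip Editor; we just grab the first loaded clip.
clip = bpy.data.movieclips[0]
scene.active_clip = clip

# Create a virtual camera and make it the one the renderer looks through.
cam_data = bpy.data.cameras.new("TrackedCamera")
cam_obj = bpy.data.objects.new("TrackedCamera", cam_data)
scene.collection.objects.link(cam_obj)
scene.camera = cam_obj

# Copy the characteristics estimated during tracking (focal length,
# sensor size) so the virtual lens matches the real one.
cam_data.lens = clip.tracking.camera.focal_length
cam_data.sensor_width = clip.tracking.camera.sensor_width

# The Camera Solver constraint replays the solved real-world camera
# motion onto our virtual camera, frame by frame.
solver = cam_obj.constraints.new(type='CAMERA_SOLVER')
solver.use_active_clip = True
```

Blender's Movie Clip Editor also has a "Setup Tracking Scene" button that does this (and more) for you.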

3: Placing Objects in the Virtual World

Now comes the fun part, building the virtual world full of monsters, magic, and scenery. An artist would carefully create and animate these 3D models that you can simply drop into the world.

Here I used Sketchfab (I have no affiliation) to download a suitably Swedish model of a viking by user alexbreeze111 (thanks Alex!). I used the Sketchfab plugin for simplicity.

To make placement easier, and to keep the merging of the virtual and real worlds consistent, some alignment needs to take place. For example, the virtual floor and the real floor have to be the same distance away from the virtual camera and real camera respectively. That way, if you place a monster in the virtual world, you know where it should be in the real world, and your actor can run away from it realistically.

Floor alignment: in this particular example, we align the virtual floor with the keyboard keys. This gives the illusion of the viking standing on the keys. If they were not aligned, he would either be floating or, for lack of a better description, give you a headache to look at.
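Scripting-wise, placement boils down to setting a transform. Here's a hedged sketch; it assumes the imported model ended up as an object named "Viking" (the name depends on the file you download), and the numbers are eyeballed rather than measured.

```python
import math
import bpy

# Assumes the downloaded model was imported (e.g. via the Sketchfab plugin)
# and ended up as an object named "Viking" - the name depends on the file.
viking = bpy.data.objects["Viking"]

# Blender's "Set Floor" option in the camera solve puts the tracked floor
# at Z = 0, so resting the model's feet at Z = 0 makes it appear to stand
# on the keyboard keys.
viking.location = (0.0, 0.0, 0.0)

# Scale it to a believable size next to the real keyboard, and turn it
# to face the camera. These numbers are eyeballed, not measured.
viking.scale = (0.05, 0.05, 0.05)
viking.rotation_euler = (0.0, 0.0, math.radians(90))
```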

4: Merging Real and Virtual Worlds

The final step is to combine the views from the virtual and real-world cameras. This involves rendering the virtual scene from the perspective of the virtual camera and overlaying it onto the real-world footage. The more precise the camera tracking we did earlier, the better the illusion that both worlds coexist seamlessly.

The virtual world is commonly integrated by replacing a "green screen" in the real-world footage, but there are other techniques. Which one you use depends on the time available, the physical constraints of the shot, where you want the graphics to be, director preference, and budget.

In our keyboard-viking example, we are overlaying our graphics in the foreground with no occlusions, so we can simply display the virtual world over the real one.
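In Blender, that overlay happens in the compositor. Below is a minimal sketch that renders the virtual world with a transparent background and lays it over the real footage with an Alpha Over node.

```python
import bpy

scene = bpy.context.scene
scene.render.film_transparent = True  # render the virtual world over transparency
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

# The real-world footage, straight from the tracked movie clip.
clip_node = tree.nodes.new('CompositorNodeMovieClip')
clip_node.clip = bpy.data.movieclips[0]

# The virtual world, rendered through the tracked virtual camera.
render_node = tree.nodes.new('CompositorNodeRLayers')

# Alpha Over lays the virtual render on top of the real footage
# wherever the render isn't transparent.
alpha_over = tree.nodes.new('CompositorNodeAlphaOver')
tree.links.new(clip_node.outputs['Image'], alpha_over.inputs[1])
tree.links.new(render_node.outputs['Image'], alpha_over.inputs[2])

# Send the combined result to the final output.
composite = tree.nodes.new('CompositorNodeComposite')
tree.links.new(alpha_over.outputs['Image'], composite.inputs['Image'])
```

If the shot had used a green screen instead, you would swap the Alpha Over for a keying node, but the principle is the same.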

The viking is rendered in a basic way; more can be done to improve the look! For lack of time and compute power, I've avoided adding shadows and haven't chosen the highest quality rendering techniques. If you need higher quality then you should definitely take the time to dig into the finer details!

This combined video will then typically go through some further checks and processing before making it to the viewer. What those further steps are depends heavily on whether it is a real-time broadcast or not, but regardless, at this stage of the pipeline the CGI is complete.

How knowing this helps you

Many industries outside of media use video, and my hope is that the ideas presented here demystify CGI and computer vision a bit, and help you think of them as something you can integrate into your own work regardless of industry. Just to be clear, this doesn't mean integrating Blender into your technology stack - you can create your own tailored software to do similar things!

Some of the same steps used to produce movies are also used to advance cutting-edge technologies in augmented reality and robotics. Self-driving cars, humanoid robots, augmented reality headsets and glasses, automated farming, and delivery drones are only some of the life-changing areas we will witness in our lifetime. All of them need computer vision to navigate the world, and some form of CGI to share their view of the world with us humans.

So while not everyone is making AAA movies and streaming sports, implementing some of the techniques used by the best in the business for your own video projects could be hugely worthwhile.

Whether that's creating new video production pipelines for your next marketing campaign, or designing the next self-navigating drone - talk to us at Sigma Connectivity to see how we can help you realize your dream.