“Sony Pictures Technologies has unveiled its latest developments in real-time game engine technology with this new proof-of-concept project… Its ‘cameraless’ virtual production style… intends to allow developers to use this real-time game engine to produce a scene live on a motion capture set.”
Jason Reitman, who wrote and directed the two-minute scene in one day, says:
“I love filmmaking in real places with real actors. So for me, this is not a substitute. If I were to make Juno again today, I would make Juno exactly the same way. What I see here, what thrills me, is if I wanted to make a movie like Juno that took place in ancient Rome, I wouldn’t be able to do that because it would cost $200 million to make Juno. And that’s not cost effective. There’s no studio who would sign up for that. You can make Ghostbusters for a lot of money, but you can’t make an independent film in an unexpected location. You can’t make an independent film underwater, on the moon or, you know, a thousand years ago or into the future, and what thrills me about it is the possibility of independent filmmakers who want to tell their kind of stories, but in environments that they don’t have access to with characters that they don’t have access to, and the possibility of getting a whole new wave of stories that you could read in a book, but you would never actually get in a film.”
My take: While I agree with Jason Reitman that this technology is promising, I think their finished scene is underwhelming. It’s just not believable. For instance, the folks on the sidewalks are obviously from a video game. The traffic is not real world either. And the actor is not human; he’s a marshmallow! However, this might be where superhero comic book movies are going: totally computer-generated, with the faces of the stars composited onto the quasi-lifelike animation. (My nightmare situation: those faces and voices are AI generated from scans and recordings!)
The tool is a free AI de-noising and de-reverb plugin called GOYO by South Korea’s Supertone AI.
Michael begins:
“I’ve used many of the sort of more premium and expensive dialogue restoration and denoising software, and those are very good, but I haven’t come across a free tool that even comes close to what is offered by those. So I was really curious, downloaded it, tried it in my DAW and video editor, and was just completely shocked by the results.”
There are three dials: Ambience, Voice and Voice Reverb. You can solo, mute, decrease or increase each band. Simple and powerful!
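To make that concrete, the three dials behave like gain faders on AI-separated stems. Here’s a toy sketch of that remix stage, with synthetic stand-ins for the stems (GOYO’s actual separation model isn’t public, and these signals are invented for illustration):

```python
import numpy as np

# Hypothetical stems, as if GOYO had already separated a one-second clip.
# In the plugin these correspond to the Ambience, Voice and Voice Reverb bands.
sample_rate = 48000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
ambience = 0.05 * np.random.randn(sample_rate)           # stand-in for room noise
voice = 0.8 * np.sin(2 * np.pi * 220 * t)                # stand-in for dialogue
voice_reverb = 0.2 * np.sin(2 * np.pi * 220 * t - 0.5)   # stand-in for reverb tail

# Each dial is effectively a gain fader on its stem:
# 0.0 mutes the band, 1.0 leaves it alone, >1.0 boosts it.
dials = {"ambience": 0.0, "voice": 1.0, "voice_reverb": 0.3}

mix = (dials["ambience"] * ambience
       + dials["voice"] * voice
       + dials["voice_reverb"] * voice_reverb)

# "Soloing" the voice is just zeroing the other two dials.
```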
His expert tip:
“My favourite way to use these sorts of tools is to dial them in and then print the results. So I would control this to the amount I’d like. I would export that as a new WAV file, take a listen and then work with that so that I know that the next time I open up the session, it’s going to be exactly the same.”
The “How to Use” page from Supertone is quite sparse, so Michael’s examples are great.
My take: the perennial joke on set has always been, “We’ll fix it in post.” Well, now that’s possible for sound! I’ve used this on a short and can attest to its ease of use and incredible results. I concur with Michael that it’s best to print each voice track as a WAV file and re-synchronize it to the timeline: I found that the effect either did not persist between sessions, had its values reset to zero, or was still applied while the numbers displayed as zero. My other tip is to use only the graphical user interface (not the Inspector), as this seemed to work best. After all, this is a free beta!
Netflix researchers have described an experimental way to create more accurate green screen mattes. They propose lighting subjects with magenta light in front of a green background. Devin says:
“The technique is clever in that by making the foreground only red/blue and the background only green, it simplifies the process of separating the two. A regular camera that would normally capture those colors instead captures red, blue and alpha. This makes the resulting mattes extremely accurate, lacking the artifacts that come from having to separate a full-spectrum input from a limited-spectrum key background.”
Once the mattes are created, green information needs to be added back to the subjects. The solution? AI. It learns how to do this task more accurately than a simple green filter.
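In channel arithmetic, the idea looks something like this toy sketch: the green channel is practically a ready-made matte, and the crude green reconstruction here merely stands in for the trained network the researchers actually use. The function names are mine:

```python
import numpy as np

def magenta_key(frame: np.ndarray):
    """Pull a matte from magenta-lit footage shot against a green screen.

    frame: float32 RGB image in [0, 1]. The foreground is lit red/blue only,
    so the green channel acts as a ready-made (inverted) alpha channel.
    """
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    alpha = np.clip(1.0 - g, 0.0, 1.0)   # green screen -> transparent
    # Crude stand-in for the paper's AI recolouring step: guess the missing
    # green from red and blue. The real system trains a network for this.
    g_restored = (r + b) / 2.0
    foreground = np.stack([r, g_restored, b], axis=-1)
    return foreground, alpha

def composite(fg, alpha, bg):
    """Standard over-composite onto a new background."""
    return fg * alpha[..., None] + bg * (1.0 - alpha[..., None])
```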
Each original sound was on a roll of 35mm magnetic acetate film; these were transferred to audio tape in 1990. Craig explains what happened next:
“I got the SSE tapes from the USC Archive in 2016. It was immediately clear that these tapes had a big problem. They were recorded onto used Ampex tape from the 1980s. Tape manufacturers changed their formulations in the early ’80s, and it turned out these new tapes were very unstable. They started to display what became known as Sticky Shed Syndrome. (Google it.) When this happens, the glue that binds the magnetic oxide to the plastic base becomes sticky, and separates. This makes the tapes virtually unplayable. Fortunately, there’s a temporary fix. Tapes can be baked for several hours at a low temperature in an oven. So that’s what I did. Each tape was baked at 150ºF for four hours, then cooled for four hours. This made the tapes stable enough to transfer using my Nagra 4.2 full track recorder.”
By the way, here are the movies listed in the compilation above:
0:15, 1:03 – The Venture Bros. (2004, 2008)
0:21 – Aeon Flux (2005)
0:27 – Star Wars I: The Phantom Menace (1999)
0:33 – Team America: World Police (2004)
0:39 – Star Wars IV: A New Hope (1977)
0:46 – Spaceballs (1987)
0:54 – Lethal Weapon 4 (1998)
1:09 – Hellboy (2004)
1:18 – Star Wars VI: Return of the Jedi (1983)
1:26 – The Animatrix (2003)
1:33 – Sin City (2005)
1:39 – Batman Returns (1992)
1:46 – Lord of the Rings: The Return of the King (2003)
1:52 – Howard the Duck (1986)
1:59 – Family Guy episode “North by North Quahog” (2005)
2:06 – Raiders of the Lost Ark (1981)
2:14 – Star Wars Holiday Special (1978)
2:22 – King Kong (2005)
2:29 – Toy Story (1995)
2:37 – Indiana Jones and the Temple of Doom (1984)
2:43 – Wallace & Gromit: The Curse of the Were-Rabbit (2005)
2:51 – Angel episode “The Cautionary Tale of Numero Cinco” (2003)
2:57, 3:16 – Kill Bill, Vol. 1 (2003)
3:04 – Lord of the Rings: The Two Towers (2002)
3:11 – Angel episode “A New World” (2002)
3:23 – Drawn Together (2004)
And, finally, here’s the original in context, in Distant Drums (1951):
My take: the original meme! I’ve used it too, in a pitch video of all things!
“The new Relight FX lets you add virtual light sources into a scene to creatively adjust environmental lighting, fill dark shadows or change the mood. Light sources can be directional to cast a broad light, a point source, or a spotlight and be adjusted for surface softness and specularity control.”
My take: wow! This looks like so much fun. I can see using Relight instead of a power window to punch up illumination on the subject, drawing the eye exactly where you want it to go. This tool brings new meaning to the phrase, “We’ll fix it in Post!”
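For the curious, this kind of relighting is, at heart, classic shading math run on normals and depth that the software estimates from the image. Here’s a toy Lambertian point-light pass, assuming you already have those maps; this is my own simplification, not Resolve’s actual algorithm, and the `softness` knob is a made-up stand-in:

```python
import numpy as np

def relight_point(normals, positions, light_pos, light_color, softness=1.0):
    """Toy diffuse relight pass for a single virtual point light.

    normals:   (H, W, 3) unit surface normals estimated from the image
    positions: (H, W, 3) per-pixel 3D positions (e.g. from a depth map)
    light_pos: (3,) position of the virtual light
    Returns an (H, W, 3) light layer to add to the graded image.
    """
    to_light = light_pos - positions
    dist = np.linalg.norm(to_light, axis=-1, keepdims=True)
    to_light = to_light / np.maximum(dist, 1e-6)
    # Lambertian term: surfaces facing the light get lit, others don't.
    ndotl = np.clip(np.sum(normals * to_light, axis=-1, keepdims=True), 0, 1)
    # Distance falloff; 'softness' here just flattens the falloff curve.
    falloff = 1.0 / np.maximum(dist ** (2.0 / softness), 1e-6)
    return ndotl * falloff * light_color
```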
The fountain of youth is a spring that is said to restore the youth of anyone who drinks or bathes in its waters. This idea has been mentioned in many different cultures throughout history, often as a symbol of eternal youth and rejuvenation. In some stories, the fountain is guarded by a powerful being, such as a nymph or a fairy, and must be sought out by brave adventurers. Despite many people searching for the fountain throughout history, it has never been found and is generally considered to be a mythical concept.
“To make an age-altering AI tool that was ready for the demands of Hollywood and flexible enough to work on moving footage or shots where an actor isn’t always looking directly at the camera, Disney’s researchers, as detailed in a recently published paper, first created a database of thousands of randomly generated synthetic faces. Existing machine learning aging tools were then used to age and de-age these thousands of non-existent test subjects, and those results were then used to train a new neural network called FRAN (face re-aging network). When FRAN is fed an input headshot, instead of generating an altered headshot, it predicts what parts of the face would be altered by age, such as the addition or removal of wrinkles, and those results are then layered over the original face as an extra channel of added visual information. This approach accurately preserves the performer’s appearance and identity, even when their head is moving, when their face is looking around, or when the lighting conditions in a shot change over time. It also allows the AI generated changes to be adjusted and tweaked by an artist, which is an important part of VFX work: making the alterations perfectly blend back into a shot so the changes are invisible to an audience.”
At five seconds per frame, FRAN can age or de-age one minute of footage in two hours (at 24 fps, that’s 1,440 frames × 5 seconds = 7,200 seconds).
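The residual trick described in that quote, predicting a per-pixel aging delta and layering it over the original rather than generating a whole new face, is easy to sketch. Here the network itself is stubbed out, since Disney hasn’t released one, and `predict_delta` and `reage` are names I made up:

```python
import numpy as np

def predict_delta(face: np.ndarray, target_age: int) -> np.ndarray:
    """Stand-in for FRAN's neural network. In the real system this is a
    trained model predicting a per-pixel aging delta (wrinkles added or
    removed); here it returns zeros so the sketch runs."""
    return np.zeros_like(face)

def reage(face: np.ndarray, target_age: int, strength: float = 1.0):
    """FRAN-style compositing: layer a predicted delta over the original
    face. Because the original pixels stay underneath, the performer's
    identity is preserved, and 'strength' gives artists a knob to tweak."""
    delta = predict_delta(face, target_age)
    return np.clip(face + strength * delta, 0.0, 1.0)
```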
Shot and projected at 16 frames per second, this footage has had its original frame rate restored, then been stabilized, upscaled to 4K, interpolated to 240 fps, colourized, given AI-enhanced faces and finally output at 60 fps.
Dennis details his process in the first four and a half minutes of the film and categorically states, “This is enhanced material and is not historically accurate.”
Nevertheless, the films are a fantastic view into the past. Travel back in time to France, England and Egypt, among other countries. The motion smoothing does impart a different feeling to the footage than the jerky black and white aesthetic we normally associate with old newsreels.
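For intuition, here’s the simplest possible version of that frame-rate step: synthesizing in-between frames. A plain cross-fade like this sketch would ghost on fast motion; real restorations use AI optical-flow interpolation instead:

```python
import numpy as np

def interpolate_frames(frames, factor=2):
    """Toy frame-rate upscaler: insert blended in-between frames.

    frames: list of float image arrays. A cross-fade like this shows the
    idea, but AI tools estimate motion so objects move, not dissolve.
    """
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for i in range(1, factor):
            w = i / factor
            out.append((1 - w) * a + w * b)  # linear blend of neighbours
    out.append(frames[-1])
    return out
```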
My take: for me, the best shot, at 13:44, is “Panorama of the Golden Horn, Turkey, Istanbul” because it’s one of the few shots that is truly “cinematic”, imho. All the other shots are filmed from a tripod and are therefore static. This shot is also on a tripod, but because we’re on a boat, the effect is a dolly to the right, resulting in magical movement with very pleasing foreground, middle ground and background action.
“Why is STS (speech to speech) different from TTS (text to speech)?
The difference between the two is significant. A few important limitations of text to speech:
- In most cases, TTS provides non-natural, robotic emotions. AI doesn’t know where to take emotions from, so it tries to generate them based on the text alone.
- Very limited control over emotions. Some TTS can make the converted voice sound sad or excited using text annotation. But it is hard to manually encode the intricacies of human acting using these annotations alone.
- Words only. TTS is based on dictionaries. Unknown words and abbreviations pose a significant problem. Natural speech contains lots of non-verbal content as well. TTS struggles to render that.
- Most TTS systems face challenges with low-resource languages due to higher data requirements.
The Respeecher voice cloning system works solely in the acoustic domain. We convey all the emotions and sounds of the source speaker while converting their timbre and other subtle variations into the target speaker.”
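Reading between the lines, the pipeline difference can be sketched as dataflow. Everything below is stubbed and every name is invented; it’s only meant to show what “working solely in the acoustic domain” buys you:

```python
# Schematic speech-to-speech (STS) pipeline. Respeecher's actual models
# are proprietary; every function name below is invented for illustration.

def extract_features(source_audio: bytes):
    """Analyze the source performance in the acoustic domain: timing,
    pitch, emotion, breaths and other non-verbal sounds all survive,
    because nothing is ever flattened down to text."""
    ...

def convert_timbre(features, target_voice):
    """Swap the source speaker's timbre for the target speaker's,
    leaving the rest of the performance untouched."""
    ...

def vocode(features):
    """Render the converted features back into a waveform."""
    ...

# Contrast with TTS, which starts from text and must guess at all the
# performance details that an STS pipeline simply carries through.
```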
They even have a program for Small Creators and will accept pitches from interesting projects.
Here’s a glimpse of their online interface:
My take: well, that’s it. Along with deep fakes, now you can’t trust anything you hear either. I guess that leaves “real life” as the one thing you can trust — most of the time, that is. Maybe we are living in a simulation after all….
“Netflix has just approved the Sony FX3 to be used for its 4K Netflix Originals. This approval is a result of the latest Firmware 2.0 that constitutes a major upgrade to the FX3 capabilities regarding cinematography and workflow.”
“Not only does the camera need the ability to record in 4K, but it also has to have a bit depth of 10-bit or higher, a data rate with a minimum of 240Mbps at 24FPS, a screen-referred color space, a scene-referred transfer function, and a timecode written as metadata, and it has to be capable of jamming to an external source. And this is just the start of the list, as ergonomics, durability, and usability also come into play.”
And why you should care:
“Why? Standards. Not in the biblical sense, but in manufacturing. Most camera brands (save for maybe Sony) aren’t building their next camera with a specific exhibition in mind. Codecs are all over the place, not all sensors are the same, and sometimes you even have to worry about overheating. Those kinds of issues on a film set can break your film. So if an exhibitor sets some standards for camera manufacturers, we’re inclined to support it, whether or not we’re shooting for Netflix.”
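Treating that requirements list as a literal checklist is a fun exercise. Here’s a hypothetical spec check; the field names, the logic and the camera numbers are mine, for illustration only, since the quote above supplies just the minimums and the real approval process covers far more:

```python
# Minimums taken from the quoted Netflix capture requirements.
NETFLIX_MINIMUMS = {
    "resolution_k": 4,               # at least 4K recording
    "bit_depth": 10,                 # 10-bit or higher
    "bitrate_mbps_at_24fps": 240,    # minimum data rate at 24 fps
    "timecode_in_metadata": True,
    "jams_to_external_timecode": True,
}

candidate_camera = {
    "resolution_k": 4,
    "bit_depth": 10,
    "bitrate_mbps_at_24fps": 280,    # illustrative number only
    "timecode_in_metadata": True,
    "jams_to_external_timecode": True,
}

def meets_minimums(camera: dict, minimums: dict) -> bool:
    """True if every quoted requirement is met or exceeded."""
    for key, needed in minimums.items():
        have = camera.get(key)
        if isinstance(needed, bool):
            if have is not needed:
                return False
        elif have is None or have < needed:
            return False
    return True

print(meets_minimums(candidate_camera, NETFLIX_MINIMUMS))  # -> True
```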