Intel Labs creates photorealistic 3D VR from photos

Jacob Fox on PCGamesN suggests that new tech from Intel Labs could revolutionise VR gaming.

He describes:

“A new technique called Free View Synthesis. It allows you to take some source images from an environment (from a video recorded while walking through a forest, for example), and then reconstruct and render the environment depicted in these images in full ‘photorealistic’ 3D. You can then have a ‘target view’ (i.e. a virtual camera, or perspective like that of the player in a video game) travel through this environment freely, yielding new photorealistic views.”

David Heaney on Upload VR clarifies: “Researchers at Intel Labs have developed a system capable of digitally recreating a scene from a series of photos taken in it.

“Unlike with previous attempts, Intel’s method produces a sharp output. Even small details in the scene are legible, and there’s very little of the blur normally seen when too much of the output is crudely ‘hallucinated’ by a neural network.”
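To make the idea concrete, here is a minimal sketch of the source-images-to-target-view workflow those quotes describe. Everything below is illustrative; the function names are my own placeholders, not Intel’s code:

    # Hypothetical sketch of a free view synthesis workflow.
    # None of these names come from Intel's codebase; they only
    # illustrate "source images in, novel photorealistic views out".
    from dataclasses import dataclass

    @dataclass
    class CameraPose:
        position: tuple   # (x, y, z) in world space
        rotation: tuple   # orientation, e.g. a quaternion (w, x, y, z)

    def reconstruct_scene(source_images, source_poses):
        """Estimate geometry for the captured environment, e.g. via
        structure-from-motion plus multi-view stereo."""
        ...

    def render_target_view(scene, source_images, target_pose):
        """Warp and blend the source images into the target viewpoint,
        then refine the blended result with a neural network."""
        ...

    # Usage: fly a free "target view" (the player's camera) through an
    # environment reconstructed from a walk-through video.
    # scene = reconstruct_scene(video_frames, estimated_poses)
    # for pose in camera_path:
    #     frame = render_target_view(scene, video_frames, pose)

The reported advance, per the Upload VR quote, is in that refinement step: the output stays sharp instead of dissolving into the blur of a heavily hallucinated reconstruction.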

Read the full paper.

My take: this is fascinating! This could yield the visual version of 3D Audio.

Disney scientists perfect deep fakes

We propose an algorithm for “fully automatic neural face swapping in images and videos.”

So begins a startling revelation by Disney researchers Jacek Naruniec, Leonhard Helminger, Christopher Schroers and Romann M. Weber in a paper delivered virtually at the 31st Eurographics Symposium on Rendering in London recently.

Here’s the abstract:

“In this paper, we propose an algorithm for fully automatic neural face swapping in images and videos. To the best of our knowledge, this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution. To this end, we introduce a progressively trained multi-way (comb network) and a light- and contrast-preserving blending method. We also show that while progressive training enables generation of high-resolution images, extending the architecture and training data beyond two people allows us to achieve higher fidelity in generated expressions. When compositing the generated expression onto the target face, we show how to adapt the blending strategy to preserve contrast and low-frequency lighting. Finally, we incorporate a refinement strategy into the face landmark stabilization algorithm to achieve temporal stability, which is crucial for working with high-resolution videos. We conduct an extensive ablation study to show the influence of our design choices on the quality of the swap and compare our work with popular state-of-the-art methods.”

Got that?
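If not, here is one small piece unpacked. The “light- and contrast-preserving blending” is, roughly, about keeping the target shot’s coarse lighting while taking facial detail from the generated face. A minimal sketch of that idea (my own illustration, not Disney’s actual method):

    # Minimal sketch of a lighting-preserving blend: keep the target's
    # low-frequency lighting, take high-frequency detail from the swap.
    # This illustrates the idea only; it is not Disney's method.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def lighting_preserving_blend(generated, target, mask, sigma=15.0):
        """generated, target: float images in [0, 1], shape (H, W, 3).
        mask: soft face matte in [0, 1], shape (H, W, 1)."""
        low_target = gaussian_filter(target, sigma=(sigma, sigma, 0))        # coarse lighting of the shot
        low_generated = gaussian_filter(generated, sigma=(sigma, sigma, 0))
        detail = generated - low_generated                                   # facial detail from the swap
        relit = np.clip(low_target + detail, 0.0, 1.0)                       # re-light the swapped face
        return mask * relit + (1.0 - mask) * target                          # composite inside the matte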

My advice: just watch the video and be prepared to be wowed.

My take: Deep fakes were concerning enough. However, this technology actually has production value. I envision a (very near) future where “substitute actors” (sub-actors?) are the ones who give the performances on set and then this Disney technology replaces their faces with those of the “stars” they represent. In fact, if I were an agent, I’d be looking for those sub-actors now so I could package the pair. A star who didn’t want to mingle with potential COVID-19 carriers could send their doubles to any number of projects at the same time. All that would be left is a high-resolution 3D scan and some ADR work. Of course, Jimmy Fallon already perfected this technique five years ago:

TikTok emerges as worthy Vine replacement

Joshua Eferighe posits on OZY that The Next Big Indie Filmmaker Might Be a TikToker.

Joshua’s key points:

  • “The social media platform is shaping the future of filmmaking.
  • Novice filmmakers are using the platform’s sophisticated editing tools to learn the trade and test their work.
  • Unlike Instagram, TikTok’s algorithm allows users without many followers to go viral, adding to its popularity.”

What is TikTok? The Chinese app claims to be “the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy.”

Why is TikTok valuable to filmmakers? The hashtag #cinematics alone has 3.7 billion views.

See these risks and this safety guide.

My take: Shorter is better! Remember Vine?

Some Smart TVs to become obsolete

Catie Keck reports in Gizmodo: Here’s Why Netflix Is Leaving Some Roku and Samsung Devices.

She says,

“Select Roku devices, as well as older Samsung or Vizio TVs, will soon lose support for Netflix beginning in December…. With respect to Roku devices in particular, the issue boils down to older devices running Windows Media DRM. Since 2010, Netflix has been using Microsoft PlayReady. Starting December 2, older devices that aren’t able to upgrade to PlayReady won’t be able to use the service.”

Netflix says,

“If you see an error that says: ‘Due to technical limitations, Netflix will no longer be available on this device after December 1st, 2019. Please visit netflix.com/compatibledevices for a list of available devices.’ It means that, due to technical limitations, your device will no longer be able to stream Netflix after the specified date. To continue streaming, you’ll need to switch to a compatible device prior to that date.”

Antonio Villas-Boas writes on Business Insider:

“This has surfaced one key weakness in Smart TVs — while the picture might still be good, the built-in computers that make these TVs ‘smart’ will become old and outdated, just like a regular computer or smartphone. That was never an issue on ‘dumb’ TVs that are purely screens without built-in computers to run apps and stream content over the internet.”

He concludes, “You should buy a streaming device like a Roku, Chromecast, Amazon Fire TV, or Apple TV instead of relying on your Smart TV’s smarts.”

My take: does this happen to cars as well?

The Internet turns 50!

Last Tuesday, October 29, 2019, the Internet turned 50 years old.

We’ve grown from the 1970 topology:

to this in 2019:


Okay, here’s a real representation of the Internet.

What’s next? The Interplanetary Internet, of course.

My take: It’s important to note that the World Wide Web is not the same thing as the Internet. (The Web wouldn’t be invented for another 20 years!) The Internet is the all-important backbone for the numerous networking protocols that traverse it, http(s) being only one.
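To see that layering in action, here is a tiny example that speaks HTTP/1.1 “by hand” over a raw TCP connection, using example.com purely as an illustration. The Web is just one kind of traffic riding on the transport the Internet provides:

    # HTTP is only one protocol carried over the Internet's TCP/IP backbone.
    # Speaking HTTP/1.1 "by hand" over a raw TCP socket makes the layering
    # visible; example.com is used purely as an illustration.
    import socket

    HOST = "example.com"

    with socket.create_connection((HOST, 80)) as sock:      # the Internet part: a TCP connection
        request = (
            "GET / HTTP/1.1\r\n"
            f"Host: {HOST}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))                # the Web part: HTTP is just these bytes
        response = b""
        while chunk := sock.recv(4096):
            response += chunk

    print(response.split(b"\r\n")[0].decode())               # e.g. "HTTP/1.1 200 OK"

Swap in a different application protocol (mail, DNS, a game server) and the Internet underneath doesn’t care.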

Meet the world’s smallest stabilized camera

Insta360 has released the world’s smallest action camera, the GO. It is so small it’s potentially a choking hazard.

They call it the “20 gram steady cam.”

Here are some specs:

  • Standard, Interval Shooting, Timelapse, Hyperlapse, and Slow Motion modes
  • 8GB of on-board storage
  • iPhone or Android connections
  • IPX4 water resistance
  • Charge time: GO approx. 20 min; charger case approx. 1 hr
  • MP4 files exported via the app at 1080p@25fps; Timelapse and Hyperlapse at 1080p@30fps; Slow Motion captured at 1600×900@100fps and output at 1600×900@30fps

Some sample footage:

See some product reviews.

You can buy it now for $270 in Canada.

My take: this is too cool! My favourite features are the slow motion and the barrel roll you can add in post. This technology sparks lots of storytelling ideas!

Inside a Virtual Production

BBC Click has revealed glimpses of the virtual production techniques Jon Favreau harnessed before the “live action” Lion King was digitally animated.

The discussion of virtual production technology starts at 0:40. Details begin flowing about the Technicolor Virtual Production pipeline at 1:38.

Director Favreau explains further at 8:01 below:

My favourite line is: “We’d move the sun if we had to.”

Here’s Technicolor’s pitch for virtual production:

More here.

My take: Am I the only one who thinks it’s absurd for photo-realistic animals to talk and sing? I can buy the anthropomorphism in most animation, as the techniques they use are suitably abstracted, but this just looks too real. Maybe thought balloons?

AI Portraits can paint like Rembrandt

In the week that FaceApp went viral, Mauro Martino updated AI Portraits to convert your photos into fine art.

This web-based app uses an AI GAN trained on 54,000 fine art paintings to “paint” your portrait in a style it chooses.

Mauro explains:

“This is not a style transfer. With AI Portraits Ars anyone is able to use GAN models to generate a new painting, where facial lines are completely redesigned. The model decides for itself which style to use for the portrait. Details of the face and background contribute to direct the model towards a style. In style transfer, there is usually a strong alteration of colors, but the features of the photo remain unchanged. AI Portraits Ars creates new forms, beyond altering the style of an existing photo.”

Some samples:

He continues:

“You will not see smiles among the portraits. Portrait masters rarely paint smiling people because smiles and laughter were commonly associated with a more comic aspect of genre painting, and because the display of such an overt expression as smiling can seem to distort the face of the sitter. This inability of artificial intelligence to reproduce our smiles is teaching us something about the history of art. This and other biases that emerge in reproducing our photos with AI Portraits Ars are therefore an indirect exploration of the history of art and portraiture.”

My take: This is a lot of fun! I would love to be able to choose the “artist” myself, though, rather than letting the AI choose based on the background. One thing that does NOT work is feeding it fine art; I tried the Mona Lisa and was terribly disappointed!

1000 episodes for BBC’s Click

This week the BBC celebrated the 1000th episode of their technology magazine show Click with an interactive issue.

Access the show and get prepared to click!

One of the pieces that caught my eye was an item in the Tech News section about interactive art, called Mechanical Masterpieces by artist Neil Mendoza.

The exhibit is a mashup of digitized high art and Rube Goldberg-esque analogue controls that let the participants prod and poke the paintings. Very playful! I’ve scoured the web to find some video. This is Neil’s version of American Gothic:

Getting ready for the weekend with another piece from Neil Mendoza’s Mechanical Masterpieces, part of #ToughArt2018. pittsburghkids.org/exhibits/tough-art

Posted by Children's Museum of Pittsburgh on Friday, September 28, 2018

And here is his version of The Laughing Cavalier:

Check out Neil’s latest installation/music video.

My take: I love Click and I love interactive storytelling. But I’m not sure the BBC’s experiment was entirely successful. What I thought was missing was an Index, a way to quickly jump around their show. For instance, it was tortuous trying to find this item in the Tech News section. Of course, Click is in love with their material and expects viewers to patiently lap up every frame, even as they click to choose different paths through the material. But it’s documentary/news content, not narrative fiction, and I found myself wanting to jump ahead or abandon threads. On the other hand, my expectation of a narrative audience looking for A-B interactive entertainment is that they truly are motivated to explore various linear paths through the story. And an Index would reveal too much of what’s up ahead. But I wonder if that’s just me, as a creator, speaking. Perhaps interactive content is best relegated to the hypertext/website side of things, versus stories that swallow you up as they twist and turn on their way to revealing their narratives.

Coming soon: fix it in Post with text editing

Scientists working at Stanford University, the Max Planck Institute for Informatics, Princeton University and Adobe Research have developed a technique that synthesizes new video frames from an edited interview transcript.

In other words, soon we’ll be able to alter speech in video clips simply by typing in new words:

“Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript.”

Why do this?

“Our main application is text-based editing of talking-head video. We support moving and deleting phrases, and the more challenging task of adding new unspoken words. Our approach produces photo-realistic results with good audio to video alignment and a photo-realistic mouth interior including highly detailed teeth.”
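Stripped to its skeleton, the pipeline in the first quote reads something like the sketch below. Every helper name is an illustrative stand-in of mine, not the researchers’ actual code:

    # Hypothetical skeleton of the text-based editing pipeline quoted above;
    # all helper names are illustrative stand-ins, not the researchers' code.

    def annotate_frames(video, transcript):
        """Per-frame phonemes, visemes, 3D face pose, geometry,
        reflectance, expression and scene illumination."""
        ...

    def select_segments(annotations, edited_transcript):
        """The optimization step: choose segments of the input corpus
        whose parameters best cover the edited transcript."""
        ...

    def render_parametric_lower_face(video, segments):
        """Intermediate video in which the lower half of the face is
        rendered with a parametric face model."""
        ...

    def neural_render(intermediate_video):
        """Recurrent generation network: intermediate representation in,
        photorealistic frames matching the edited transcript out."""
        ...

    def edit_talking_head(video, transcript, edited_transcript):
        annotations = annotate_frames(video, transcript)
        segments = select_segments(annotations, edited_transcript)
        intermediate = render_parametric_lower_face(video, segments)
        return neural_render(intermediate)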

Read the full research paper.

My take: Yes, this could be handy in the editing suite. But the potential for abuse is very concerning. The ease of creating Deep Fakes by simply typing new words means that we would never be able to trust any video again. No longer will a picture be worth a thousand words; rather, one word will be worth a thousand pixels.