
MICROSOFT has unveiled a new AI that can make creepy videos of people using just one photo - but won't release the tool over impersonation fears.

The technology can create synchronised animated clips of a person talking or singing from a single snap of their face and an audio track.

The new AI tech can animate a single image into a realistic video with audio syncing. Credit: Ars Technica
Microsoft has refused to release the code over fears of impersonation. Credit: Ars Technica
A number of competitors are working on similar tech. Credit: Ars Technica
Microsoft has denied it is looking to enhance deepfake technology. Credit: Ars Technica

The computer giant's Research Asia team unveiled the VASA-1 model this week and says that in future it could even power virtual avatars that appear to say whatever the creator wants.

"It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours," says an

VASA - short for Visual Affective Skills Animator - can analyse a static image alongside audio to generate a realistic video with lip syncing, facial expressions and head movements.

It can't, however, clone or simulate voices as some other Microsoft research can.


The company - co-founded by billionaire Bill Gates - claims the model is a significant improvement on previous speech animation methods in terms of realism, expressiveness and efficiency.

In February, an AI model called EMO: Emote Portrait Alive, from Alibaba's Institute for Intelligent Computing research group, used a similar approach to VASA-1 known as Audio2Video.

Microsoft researchers trained their tech on the VoxCeleb2 dataset created in 2018 by a team from the University of Oxford.

That dataset claims to hold over a million "utterances" from 6,112 celebrities taken from videos uploaded to YouTube.

VASA-1 can reportedly generate videos at a resolution and frame rate that would not look out of place in real-time applications such as video conferencing.

A research page released as part of the launch showcases the tool in use, with people singing and speaking, as well as showing how the model can be controlled.


The Mona Lisa is even seen rapping.

The researchers are adamant that their intention with the tool is not to enhance deepfaking.

The site reads: "We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.

"This is only a research demonstration and there's no product or API release plan."

The researchers are instead touting the potential for it to be used in education and even to provide companionship.

They are, however, refusing to release the code that powers the model.

Microsoft is not the only team developing similar technology, with increased realism and availability likely only a matter of time.
