Technologies Involved:
MACHINE LEARNING
Area Of Work: Web Development
Project Description

An AI-first media startup experimenting with synthetic content creation partnered with Oodles to build a minimal video generation platform. The client aimed to generate talking-head videos using deepfake technology, driven by synthetic audio from AWS Polly. They sought a backend MVP to run experiments, without a UI, focusing on real-time rendering precision and infrastructure readiness.

Scope Of Work

The project aimed to develop an experimental video rendering pipeline using deepfake models synced with AWS Polly speech marks. The client required a backend MVP, built with Oodles' help, that processes a source video, aligns it with generated audio, and produces a lifelike talking-head output. Core areas of work included audio-visual synchronization, face animation, infrastructure setup, and system automation.

Our Solution

To help the client achieve this goal, Oodles architected a modular, cloud-ready backend focused entirely on processing speed and deepfake accuracy.

Key Features Implemented:

  • Facial Landmark Detection & Motion Transfer: Used pre-trained models (based on the First Order Motion Model) to animate facial expressions in a static source frame (see the motion-transfer sketch after this list).
  • Speechmark-Driven Syncing: Integrated AWS Polly to convert text to speech and used its speech marks to time mouth and facial movement (see the Polly sketch below).
  • Video Processing Pipeline: Built with OpenCV and FFmpeg for frame-by-frame video generation and stitching (see the pipeline sketch below).
  • Cloud-Based Execution: Deployed on AWS EC2 with input/output handling via Flask endpoints, enabling remote API access (see the endpoint sketch below).
  • No-UI Architecture: Built intentionally without front-end layers, simplifying integration for ML experiments and rapid iteration.
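
A minimal motion-transfer sketch is shown below. It assumes the `load_checkpoints` and `make_animation` helpers published in the open-source first-order-model repository; the checkpoint paths and media files are illustrative placeholders rather than the client's actual assets.

```python
import imageio
import numpy as np
from skimage.transform import resize

# Helpers shipped with the public first-order-model repo (demo.py).
from demo import load_checkpoints, make_animation

# Load the pre-trained generator and keypoint detector (placeholder paths).
generator, kp_detector = load_checkpoints(
    config_path="config/vox-256.yaml",
    checkpoint_path="checkpoints/vox-cpk.pth.tar",
)

# Static source frame to animate, plus a driving video supplying the motion.
source_image = resize(imageio.imread("source_frame.png"), (256, 256))[..., :3]
driving_video = [
    resize(frame, (256, 256))[..., :3]
    for frame in imageio.mimread("driving.mp4", memtest=False)
]

# Transfer the driving motion onto the source frame, frame by frame.
predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True)
imageio.mimsave(
    "animated.mp4",
    [(np.clip(frame, 0, 1) * 255).astype(np.uint8) for frame in predictions],
    fps=25,  # placeholder; in practice this should match the driving video's frame rate
)
```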
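The speech-mark integration can be sketched with boto3's Polly client. The voice, region, and text below are illustrative; credentials are resolved through the standard AWS credential chain.

```python
import json
import boto3

polly = boto3.client("polly", region_name="us-east-1")
text = "Hello, welcome to the demo."

# One call returns the audio, a second returns the timing metadata.
audio = polly.synthesize_speech(Text=text, VoiceId="Joanna", OutputFormat="mp3")
marks = polly.synthesize_speech(
    Text=text,
    VoiceId="Joanna",
    OutputFormat="json",
    SpeechMarkTypes=["viseme", "word"],
)

with open("speech.mp3", "wb") as f:
    f.write(audio["AudioStream"].read())

# Speech marks arrive as newline-delimited JSON objects, each carrying a
# millisecond offset that can drive mouth-shape selection per frame.
visemes = [json.loads(line) for line in marks["AudioStream"].read().splitlines() if line]
print(visemes[0])  # e.g. {"time": 6, "type": "viseme", "value": "k"}
```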
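A condensed view of the frame-by-frame pipeline follows, assuming OpenCV for frame I/O and an FFmpeg subprocess for muxing the synthetic audio back in; all file names are placeholders.

```python
import subprocess
import cv2

cap = cv2.VideoCapture("animated.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("processed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Per-frame post-processing would happen here (e.g. color correction).
    out.write(frame)

cap.release()
out.release()

# Stitch the Polly audio onto the silent render; -shortest trims to the
# shorter stream so audio and video stay aligned.
subprocess.run(
    ["ffmpeg", "-y", "-i", "processed.mp4", "-i", "speech.mp3",
     "-c:v", "copy", "-c:a", "aac", "-shortest", "final.mp4"],
    check=True,
)
```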
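Finally, the remote access layer can be sketched as a single Flask route. The `/render` path and the `render_talking_head()` helper are hypothetical stand-ins for the actual pipeline entry point, which the source does not detail.

```python
from flask import Flask, request, send_file

app = Flask(__name__)


def render_talking_head(text: str, source_path: str) -> str:
    """Hypothetical entry point into the deepfake pipeline (stubbed here).

    In the real system this would chain Polly synthesis, motion transfer,
    and FFmpeg stitching, returning the path of the finished video.
    """
    return "/tmp/final.mp4"


@app.route("/render", methods=["POST"])
def render():
    # Callers POST the script text plus a source video; no front end involved.
    text = request.form["text"]
    source = request.files["source_video"]
    source.save("/tmp/source.mp4")

    output_path = render_talking_head(text, "/tmp/source.mp4")
    return send_file(output_path, mimetype="video/mp4")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Keeping the service UI-free in this way lets each experiment be driven from scripts or notebooks against a single endpoint, which matches the rapid-iteration goal described above.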
