MediaMinds
Text-to-video pipeline that turns articles into fully rendered 90-second videos in under 3 minutes. Flan-T5 + Stable Diffusion + Google TTS.
Final-year capstone project. Feed it a blog post or news article and get a narrated video back.
The pipeline: Flan-T5 for summarization, Selenium for context scraping, Stable Diffusion 1.5 for frame-by-frame visual generation, and Google TTS plus a fine-tuned RVC voice model for narration. Mutagen handles metadata, and FFmpeg stitches everything together with frame-level precision.
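A minimal sketch of how the final assembly step could tie narration length to the visuals. This is not the project's code. It assumes that the generated frames are shown for equal durations, that the narration length in seconds is already known (Mutagen can report it via `mutagen.File(path).info.length`), and that FFmpeg's concat demuxer is used. The helper names `build_concat_list` and `ffmpeg_command` and all file names are hypothetical.

```python
def _quote(path: str) -> str:
    # Concat-demuxer quoting: inside single quotes, a literal ' is written as '\''
    return "'" + path.replace("'", "'\\''") + "'"


def build_concat_list(frames: list[str], narration_seconds: float) -> str:
    """Hypothetical helper: spread frames evenly over the narration length
    and return the text of an FFmpeg concat-demuxer list file."""
    if not frames:
        raise ValueError("need at least one frame")
    if narration_seconds <= 0:
        raise ValueError("narration length must be positive")
    per_frame = narration_seconds / len(frames)
    lines = []
    for path in frames:
        lines.append(f"file {_quote(path)}")
        lines.append(f"duration {per_frame:.3f}")
    # Known concat-demuxer quirk: repeat the last file without a duration,
    # otherwise the final duration directive is not honoured.
    lines.append(f"file {_quote(frames[-1])}")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_file: str, narration: str, output: str, fps: int = 24) -> list[str]:
    """Hypothetical helper: build the argv that muxes the slideshow with the narration."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # image sequence with per-frame durations
        "-i", narration,                                 # narration audio track
        "-r", str(fps),                                  # constant output frame rate
        "-c:v", "libx264", "-pix_fmt", "yuv420p",        # widely playable H.264 output
        "-c:a", "aac",
        "-shortest",                                     # stop at the end of the shorter stream
        output,
    ]
```

In use, the list text would be written to disk (for example `frames.txt`) and the command run with `subprocess.run(ffmpeg_command("frames.txt", "narration.mp3", "out.mp4"), check=True)`. Even spacing is the simplest choice. A real pipeline might instead weight each frame by the length of the summary sentence it illustrates.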
Built a React frontend where users could upload text, preview the generated frames, and tweak them before export. The project was designed for solo journalists, small media teams, and anyone who needs video content but doesn’t have an editing team.