Voice-over using TTS

We need to add a TTS voice-over that explains the content displayed on screen. One option is to run a video-to-text model over the rendered video and then pass its description to a TTS model to speak.

Alternatively, the pipeline could generate a narration script alongside the code generation step. The TTS model would then need to read the script fast enough for the voice track to finish by the end of the video. FFmpeg should be able to combine the narration with the video.
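As a rough sketch of the FFmpeg step, the snippet below builds a command that lays a narration track over a video and speeds the narration up when it runs longer than the video. The function names, file names, and the 2.0 cap on the speed-up are assumptions for illustration, not part of the pipeline. The durations are taken as inputs and would have to be measured elsewhere, for example with ffprobe.

```python
import subprocess
from typing import List

# Assumption: cap the speed-up at 2x, since faster speech is hard to follow.
MAX_TEMPO = 2.0


def build_mux_command(
    video_path: str,
    audio_path: str,
    out_path: str,
    audio_seconds: float,
    video_seconds: float,
) -> List[str]:
    """Build an ffmpeg command that adds the narration to the video."""
    if audio_seconds <= 0 or video_seconds <= 0:
        raise ValueError("durations must be positive")

    cmd = [
        "ffmpeg", "-y",
        "-i", video_path,   # input 0: rendered video
        "-i", audio_path,   # input 1: TTS narration
        "-map", "0:v:0",    # take the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # do not re-encode the video
    ]

    # Speed the narration up only if it would outlast the video.
    factor = audio_seconds / video_seconds
    if factor > 1.0:
        factor = min(factor, MAX_TEMPO)
        cmd += ["-filter:a", f"atempo={factor:.3f}"]

    # -shortest stops at the shorter stream, so a narration that is still
    # too long after the capped speed-up gets cut off at the video's end.
    cmd += ["-c:a", "aac", "-shortest", out_path]
    return cmd


def mux(cmd: List[str]) -> None:
    """Run the command; requires ffmpeg on PATH and existing input files."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names and durations.
    command = build_mux_command("scene.mp4", "voice.wav", "final.mp4", 12.0, 10.0)
    print(" ".join(command))
```

Building the command separately from running it keeps the timing logic easy to check without FFmpeg installed. If the narration regularly needs more than a modest speed-up, shortening the generated script is probably a better fix than a higher tempo.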

*ideation phase*