
What model does this repo use? #1

Open
wong251440 opened this issue May 2, 2024 · 3 comments
Labels
question Further information is requested

Comments


wong251440 commented May 2, 2024

What model does this repo use to condense the original video transcript and then create a short script?

Contributor

vishal-bluebash commented May 2, 2024

We are using the gpt-4-turbo model, @wong251440. Yes, it creates a new, shorter, and better version of the video. Imagine we have a two-hour-long video and we want to make a short video (the best version).
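As a rough sketch of how such a condensing step can work (the prompt wording, function names, and segment format here are illustrative assumptions, not the repo's actual code): if the LLM is asked to return the *indices* of transcript segments to keep, its reply maps directly back to the original timestamps.

```python
import json

# Assumed Whisper-style output: a list of {"start", "end", "text"} segments.
def build_condense_prompt(segments, target_minutes=2):
    """Build a prompt asking the LLM to pick segment indices to keep."""
    lines = [f"[{i}] {s['start']:.1f}-{s['end']:.1f}s: {s['text']}"
             for i, s in enumerate(segments)]
    return (
        "Select the segment indices that best condense this transcript into "
        f"roughly {target_minutes} minutes. Reply with a JSON list of indices.\n\n"
        + "\n".join(lines)
    )

def parse_selection(reply, segments):
    """Map the model's JSON index list back to the timestamped segments."""
    indices = json.loads(reply)
    return [segments[i] for i in indices if 0 <= i < len(segments)]

segments = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the talk."},
    {"start": 4.2, "end": 9.8, "text": "Today we cover three key ideas."},
    {"start": 9.8, "end": 15.0, "text": "First, transcription with Whisper."},
]
prompt = build_condense_prompt(segments)
# Suppose the model replies with "[1, 2]"; those segments keep their timestamps:
kept = parse_selection("[1, 2]", segments)
```

Because only indices come back from the model, no fuzzy text matching against the original transcript is needed.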

@vishal-bluebash vishal-bluebash added the question Further information is requested label May 2, 2024
Author

wong251440 commented May 2, 2024

> We are using the gpt-4-turbo model, @wong251440. Yes, it creates a new, shorter, and better version of the video. Imagine we have a two-hour-long video and we want to make a short video (the best version).

Okay, I'm quite curious about this.
Isn't the output of GPT an abstractive summary? Shouldn't the sentences in the short transcript it generates be different from the original transcript?

Then how do you align the generated output back to the original video timestamps and synthesize the resulting video?

@prince-bluebash
Contributor

Hi @wong251440

To extract the transcription from the video we use the Whisper AI model, one of the most advanced open-source models for this task (speech-to-text). It also provides the exact timestamps of the utterances in the video.

We use the Python libraries moviepy and OpenCV for video editing, driven by the transcript processed by gpt-4-turbo (the LLM).
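A minimal sketch of the alignment step this implies (the segment data and the 0.5 s merge gap are illustrative assumptions): since kept segments retain their original Whisper timestamps, editing reduces to cutting those (start, end) ranges out of the source video and concatenating them.

```python
# Merge adjacent kept segments into continuous cut ranges to avoid jump cuts.
def merge_ranges(segments, max_gap=0.5):
    """Collapse segments whose gap is at most max_gap seconds into one range."""
    ranges = []
    for s in sorted(segments, key=lambda s: s["start"]):
        if ranges and s["start"] - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], s["end"])  # extend previous range
        else:
            ranges.append((s["start"], s["end"]))
    return ranges

kept = [
    {"start": 4.2, "end": 9.8, "text": "Today we cover three key ideas."},
    {"start": 9.8, "end": 15.0, "text": "First, transcription with Whisper."},
    {"start": 60.0, "end": 66.5, "text": "In conclusion..."},
]
ranges = merge_ranges(kept)  # [(4.2, 15.0), (60.0, 66.5)]

# With moviepy, each range becomes a subclip and the subclips are concatenated
# (left commented out here since it requires a real video file):
# from moviepy.editor import VideoFileClip, concatenate_videoclips
# clip = VideoFileClip("input.mp4")
# short = concatenate_videoclips([clip.subclip(a, b) for a, b in ranges])
# short.write_videofile("short.mp4")
```

Merging near-adjacent segments before cutting keeps the output smooth instead of producing many tiny clips.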

