Clarification needed on discrepancy between Figure 3 of the paper and the actual dataset clip durations. #13

Open
hongluzhou opened this issue Jul 24, 2024 · 0 comments

Comments

@hongluzhou

Thank you for sharing the code and data!
If I understand Figure 3 (from Section 3.2) correctly, it shows more than 50k clips with a duration longer than 180 seconds. However, when I checked 'miradata_v1_330k.csv', I found only about 35.5k clips exceeding 180 seconds (35,548 of 330,313 rows). I'm confused by the discrepancy. Am I misunderstanding Figure 3?

import pandas as pd

df = pd.read_csv('miradata_v1_330k.csv', encoding='utf-8')
print(len(df))
# 330313 will be printed

# Keep only the clips longer than 180 seconds
filtered_df = df[df['seconds'] > 180]
print(len(filtered_df))
# 35548 will be printed
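
To compare against Figure 3 bin by bin rather than only at the 180-second cutoff, a per-range count might help. The following is a minimal sketch: the helper name `count_by_duration` and the bin edges `[0, 60, 120, 180]` are assumptions for illustration, not the bins actually used in the paper.

```python
from bisect import bisect_left


def count_by_duration(seconds, edges):
    """Count durations per bin (edges[i], edges[i+1]]; the last bin is open-ended.

    `edges` must be sorted ascending. Values at or below edges[0] are ignored.
    The bin edges are hypothetical and may not match Figure 3.
    """
    # Build one label per bin, e.g. "0-60s", ..., ">180s"
    labels = []
    for i, lo in enumerate(edges):
        if i < len(edges) - 1:
            labels.append(f"{lo}-{edges[i + 1]}s")
        else:
            labels.append(f">{lo}s")

    counts = {label: 0 for label in labels}
    for s in seconds:
        # bisect_left places a value equal to an edge in the lower bin,
        # so the last bin is strictly "> edges[-1]", matching `seconds > 180`
        i = bisect_left(edges, s) - 1
        if i < 0:
            continue  # at or below the first edge
        counts[labels[i]] += 1
    return counts
```

With the DataFrame loaded above, this could be called as count_by_duration(df['seconds'], [0, 60, 120, 180]); the ">180s" entry should then equal len(filtered_df).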