Clarification needed on discrepancy between Figure 3 of the paper and the actual dataset clip durations. #13

hongluzhou · 2024-07-24T02:00:40Z

Thank you for sharing the code and data!
If I understand Figure 3 (from Section 3.2) correctly, it shows that there are over 50k clips with a duration longer than 180 seconds. However, when I checked 'miradata_v1_330k.csv', it seems there are only 35k clips exceeding 180 seconds. I'm confused by the discrepancy. Am I misunderstanding Figure 3?

import pandas as pd

df = pd.read_csv('miradata_v1_330k.csv', encoding='utf-8')
print(len(df))
# prints 330313 (total number of clips)

filtered_df = df[df['seconds'] > 180]
print(len(filtered_df))
# prints 35548 (clips strictly longer than 180 seconds)
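To compare against the figure bucket by bucket rather than only at the 180-second threshold, the same count can be broken down into duration bins. The following is a minimal sketch, not the authors' plotting code: the helper `duration_histogram` and the bin edges in `EDGES` are hypothetical, since the edges actually used for Figure 3 are not stated here.

```python
from bisect import bisect_left

# Hypothetical bin edges in seconds (an assumption; adjust to match the figure).
EDGES = [60, 120, 180]


def duration_histogram(seconds, edges=EDGES):
    """Count clips per duration bucket.

    Buckets are right-closed: (-inf, e0], (e0, e1], ..., (e_last, +inf),
    so the last bucket matches the strict `seconds > 180` filter above.
    """
    labels = [f"<= {edges[0]}s"]
    labels += [f"({lo}, {hi}]s" for lo, hi in zip(edges, edges[1:])]
    labels.append(f"> {edges[-1]}s")

    counts = {label: 0 for label in labels}
    for s in seconds:
        # bisect_left returns the index of the first edge >= s,
        # which is exactly the index of the right-closed bucket.
        counts[labels[bisect_left(edges, s)]] += 1
    return counts


if __name__ == "__main__":
    # Small synthetic sample; with the real data, pass df['seconds'] instead.
    sample = [10, 60, 61, 180, 180.5, 400]
    for label, count in duration_histogram(sample).items():
        print(f"{label}: {count}")
```

With the DataFrame loaded above, `duration_histogram(df['seconds'])` should report the same count in its last bucket as `len(filtered_df)`, assuming the `seconds` column has no missing values.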
