Subtitle
Xinyu Zhang / April 2024
This project proposes a deep learning-based solution for precise subtitle segmentation, addressing a significant gap in libre-software environments. By combining sentence transformers, attention-based neural text segmentation, and hierarchical LSTM models, it aims to improve subtitle segmentation for academic and media use, making content more accessible to viewers in diverse contexts. The model was fine-tuned on the MuST-Cinema dataset and adapted to multilingual, multi-format subtitle generation. The segmentation approach focuses on readability, timing accuracy, and content coherence, surpassing traditional proprietary tools such as CapCut. The project also includes an automated pipeline that converts video to text, segments the subtitles based on syntactic and semantic cues, and synchronizes them with the video to produce high-quality .srt files.
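The core idea of the segmentation model can be illustrated with a small sketch. Below is a minimal, hypothetical PyTorch version of the break-prediction step: contextual token embeddings (for example, token outputs from a sentence-transformer encoder) are fed through a bidirectional LSTM, and each token is classified as continuing the line, ending a line (EOL), or ending a subtitle block (EOB). The class name `SubtitleBreakTagger` and the dimensions are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch (not the project's actual implementation): a BiLSTM
# tagger that labels each token as O (no break), EOL, or EOB.
import torch
import torch.nn as nn

class SubtitleBreakTagger(nn.Module):  # hypothetical name
    def __init__(self, emb_dim=384, hidden=256, num_labels=3):
        super().__init__()
        # Contextual token embeddings are assumed to come from an
        # upstream encoder (e.g., a sentence-transformer's token outputs).
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, emb_dim)
        hidden_states, _ = self.lstm(token_embeddings)
        return self.classifier(hidden_states)  # (batch, seq_len, 3)

# Toy usage: one sequence of 10 tokens with 384-dim embeddings.
model = SubtitleBreakTagger()
logits = model(torch.randn(1, 10, 384))
labels = logits.argmax(-1)  # 0 = no break, 1 = EOL, 2 = EOB
```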
In this project, I was responsible for video and subtitle preprocessing, which involved converting video files to text with Whisper for speech-to-text transcription. I then labeled the tokens by marking the positions of end-of-line (EOL) and end-of-block (EOB) breaks for effective subtitle segmentation. I also played a key role in evaluating the model's performance, comparing our results with those from popular tools such as CapCut, focusing on timing accuracy, readability, and content accuracy. This evaluation validated our model's superior ability to synchronize subtitles with audio while ensuring a better viewer experience.
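As a rough illustration of this preprocessing step, the sketch below uses the open-source `whisper` package to transcribe a video and then derives token-level EOL/EOB labels from the lines of a reference subtitle block. The file name, model size, and the `label_block` helper are assumptions for illustration, not the project's actual code.

```python
# Sketch of the preprocessing step, assuming the openai-whisper package.
import whisper

model = whisper.load_model("base")            # model size is an assumption
result = model.transcribe("lecture.mp4")      # hypothetical input file
for seg in result["segments"]:
    print(f'{seg["start"]:.2f} --> {seg["end"]:.2f}: {seg["text"].strip()}')

# Token labeling: given the lines of a reference subtitle block, mark the
# last token of each inner line as EOL and the last token of the block as
# EOB, in the spirit of MuST-Cinema break annotation.
def label_block(lines):
    tokens, labels = [], []
    for i, line in enumerate(lines):
        words = line.split()
        for j, word in enumerate(words):
            tokens.append(word)
            if j < len(words) - 1:
                labels.append("O")                              # no break
            else:
                labels.append("EOB" if i == len(lines) - 1 else "EOL")
    return tokens, labels

tokens, labels = label_block(["Deep learning makes", "subtitles readable."])
# labels end of line 1 with EOL and end of the block with EOB.
```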
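For the evaluation, readability and timing can be quantified with simple subtitle metrics; the sketch below computes the maximum characters per line (CPL), characters per second (CPS), and the average start-time offset against reference cues. The cue values and function names are illustrative assumptions; the CapCut comparison itself was done by running the same videos through both tools and comparing their outputs on these dimensions.

```python
# Sketch of simple evaluation metrics for generated subtitle cues.
# Each cue is (start_seconds, end_seconds, text); values are illustrative.
def readability_metrics(cue):
    start, end, text = cue
    lines = text.split("\n")
    cpl = max(len(line) for line in lines)                # characters per line
    cps = len(text.replace("\n", " ")) / max(end - start, 1e-6)
    return cpl, cps

def mean_start_offset(predicted, reference):
    # Average absolute difference of start times for aligned cue pairs.
    return sum(abs(p[0] - r[0]) for p, r in zip(predicted, reference)) / len(reference)

pred = [(0.0, 2.1, "Deep learning makes\nsubtitles readable.")]
ref  = [(0.2, 2.0, "Deep learning makes\nsubtitles readable.")]
print(readability_metrics(pred[0]))   # (19, ~18.6): line length and reading speed
print(mean_start_offset(pred, ref))   # 0.2 s average start-time offset
```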