Subtitle Edit OCR not working

9/9/2023

We extract subtitles from videos for different purposes. For example, we might extract the subtitles of a low-resolution video file and add them to a high-resolution version for a better viewing experience. Or we find good courses on YouTube and want the subtitles separately as study material.

For the OCR itself you can use either Subtitle Edit or ABBYY FineReader. Subtitle Edit is painfully slow with Tesseract, and its other OCR methods aren’t really worth using. You can get some speed boost, at the cost of accuracy, if you use Tesseract 3.02, or a newer version set to “Original Tesseract only” instead of LSTM or both. The other option is ABBYY FineReader, which is much faster with pretty decent results. It’s up to you which one you’d rather use.
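If you ever run Tesseract yourself instead of through Subtitle Edit, the engine choice above roughly corresponds to Tesseract’s `--oem` option (available from Tesseract 4 on). The following is only a minimal Python sketch: the helper name `build_tesseract_cmd` and the file names are made up for illustration, and mapping “Original Tesseract only” to `--oem 0` is my assumption about what Subtitle Edit does internally. It only builds the command line and does not run anything.

```python
import shlex

# Values of Tesseract's --oem (OCR engine mode) option.
OEM_MODES = {
    "legacy": 0,   # original Tesseract engine only
    "lstm": 1,     # neural-net LSTM engine only
    "both": 2,     # legacy + LSTM combined
    "default": 3,  # whatever the installed language data supports
}


def build_tesseract_cmd(image_path, output_base, engine="legacy", lang="eng"):
    """Return the argument list for one Tesseract run (nothing is executed)."""
    if engine not in OEM_MODES:
        raise ValueError(f"unknown engine: {engine!r}")
    return [
        "tesseract", image_path, output_base,
        "--oem", str(OEM_MODES[engine]),
        "-l", lang,
    ]


# Hypothetical frame name, just to show the resulting command line.
cmd = build_tesseract_cmd("frame_0001.png", "frame_0001")
print(shlex.join(cmd))
```

As far as I know, the legacy engine only works with language data that still contains the legacy model, so a run with `--oem 0` may fail with the smaller “fast” data files.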

Next, configure your OCR settings. Which settings to use is totally up to you; I use the ones that were the most accurate in my tests.

Depending on your content, you may want to use “Add to names/noise list”, “Add to user dictionary” or “Add pair to OCR replace list”. The more you OCR and fill all three lists, the smarter and faster the OCR gets.

“Add to names/noise list (case sensitive)”: as the name says, you can add unknown words to a name list instead of putting them in your dictionary list.

“Add to user dictionary”: this also does what the name says. The global dictionary doesn’t know everything, so you can make it smarter. As already said, the smarter you make the OCR, the faster and better it gets.

“Add pair to OCR replace list”: this list is useful for special characters that don’t exist in your language, or for text that is often detected wrongly. In anime, translators sometimes use characters like āōū. You can just put them on the replace list (keep in mind that the OCR probably detects āōū as different characters).

If everything seems okay, you can save your result. I personally recommend “srt”, because there’s no typesetting or complex styling anyway. If you want to fix the subtitles with one of my Subtitle-Pack scripts, srt input probably leads to better results.

For hardcoded subtitles I can only offer a quick guide, because so far I have always avoided them (and recommend the same if possible). VideoSubFinder lets you find the frames of a video that contain hardcoded subtitles. You probably have to tweak it a bit to get fewer false positives. When it is done, go through every captured picture and delete everything without subtitles on it.
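To make the idea of an OCR replace list more concrete, here is a small Python sketch of what such a list does to recognized text. It is not how Subtitle Edit implements it; the function name and the example pairs are hypothetical, and which wrong characters your OCR actually produces for āōū is something you have to check in your own results.

```python
import unicodedata


def apply_replace_list(text, pairs):
    """Apply each (wrong, right) pair, in order, to a line of OCR output."""
    # Normalize first so that accented letters are single code points
    # and the pairs below can match them reliably.
    text = unicodedata.normalize("NFC", text)
    for wrong, right in pairs:
        text = text.replace(wrong, right)
    return text


# Hypothetical pairs: assume the OCR reads macrons as tildes or circumflexes.
REPLACE_PAIRS = [
    ("ã", "ā"),
    ("õ", "ō"),
    ("û", "ū"),
]

print(apply_replace_list("Tõkyõ", REPLACE_PAIRS))
```

The pairs are applied in order, so a later pair sees the result of the earlier ones. Keep that in mind if two entries could overlap.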

DVD and Blu-ray subtitles (PGS/SUP/VOB) are picture based for performance reasons on DVD and Blu-ray players. SRT/ASS are render based, so they need much more CPU time for the calculations. The higher CPU usage is their disadvantage. The obvious advantage is that they can be rendered at any resolution you want. Picture-based subtitles, like hardsubs, have the reverse trade-off: they are cheap to display but tied to the resolution they were made for. It also means you can’t just convert them into another format; the text has to be recognized first.

The easiest method is with Subtitle Edit. If you deal with hardcoded subtitles, you also need VideoSubFinder and probably ABBYY FineReader. The fact that both PGS (aka SUP) and VOB are already softsubbed makes them much easier to handle than hardsubs. Just fire up Subtitle Edit and use one of its “Import” options.
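To illustrate why a text format like SRT can be rendered at any resolution, here is a minimal Python sketch that writes cues in SRT form: the file only stores a counter, two timestamps and the text, and the player does the drawing. The function names and the sample cue are made up for illustration; real OCR results would come out of Subtitle Edit.

```python
def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start_ms, end_ms, text) tuples into the content of an .srt file."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates one cue from the next.
    return "\n".join(blocks)


# Hypothetical cue, as it might look after OCR.
print(to_srt([(1_000, 3_500, "Hello there.")]))
```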

OCR stands for optical character recognition. It is useful if you have DVD/BD subtitles or hardcoded subtitles but want to “convert” them into a text-based softsub. PGS/SUP/VOB are in fact “softsubs” too, but they are picture based.
