Rip vobsubs (and convert vobsub to srt)

Post by julian67 » July 15th, 2011, 2:44 pm


Here's how to use mencoder to rip the vobsub subtitles from a DVD, disc image or video_ts folder, and then convert the vobsubs to text .srt if required.

You need the midentify script. You can get it from mplayer's source package TOOLS directory. I've also posted it at the bottom of this post. It should be in /usr/bin not /usr/local/bin and must be executable.
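
Before starting, it can save time to confirm the tools are actually installed. This is only a rough sketch: missing_tools is a helper name made up for this example, and the list of programs is simply the ones the ripping scripts in this howto call.

```shell
#!/bin/bash
# missing_tools NAME...: print each named command that is not on PATH.
# (hypothetical helper, not part of mplayer or any package)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The ripping scripts in this howto call these three programs.
missing_tools mencoder lsdvd midentify

# midentify should also be an executable file in /usr/bin.
[ -x /usr/bin/midentify ] || echo "/usr/bin/midentify is missing or not executable"
```

If nothing is printed, everything needed for the ripping step was found.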

The first part of this howto deals with English subs. These are simplest for me and for writing the howto because, at least on region 1 or 2 English language movies, the English subs are always the first and have subtitle id 0. For other languages you'll have to check each movie/disk individually for your preferred language's sid (subtitle ID), or modify the script to use midentify and grep the language and hence the sid you want. I've also posted a script which will dump the vobsubs of every language.

Some discs have more than one sub for a language, e.g. one regular subtitle and one special subtitle for deaf/hard of hearing viewers, and sometimes there are also erroneously listed duplicate subs which are empty/non-existent. Each will have an identical language description, e.g. 'en - English' or 'es - Espanol'. That's why I rely on the numerical sid values, not the language names.
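
If you need the sids for a language other than English, something along these lines should list them. It's a sketch, assuming midentify prints one line per subtitle track in the form ID_SID_<n>_LANG=<code> (that's the pattern the all-languages script further down greps for); sids_for_lang is a made-up helper name and the sample lines describe a hypothetical disc.

```shell
#!/bin/bash
# sids_for_lang LANG: read midentify output on stdin and print every
# subtitle id whose ID_SID_<n>_LANG value equals LANG, one per line.
# (hypothetical helper; assumes lines like ID_SID_0_LANG=en)
sids_for_lang() {
    sed -n "s/^ID_SID_\([0-9][0-9]*\)_LANG=$1\$/\1/p"
}

# Example with canned midentify-style lines for a hypothetical disc:
printf '%s\n' 'ID_SID_0_LANG=en' 'ID_SID_1_LANG=es' 'ID_SID_2_LANG=en' |
    sids_for_lang en

# Real use would look something like:
#   midentify -dvd-device MYDISC.iso dvd://02 | sids_for_lang es
```

When a language appears more than once you get several sids back, and you still have to decide which one is the regular subtitle.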

So to dump the eng vobsubs I use a script:

Code: Select all
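#!/bin/bash
# input must be <devicename> <dvd://#>
# lsdvd only needs the device, which is the first argument
DISKNAME="$(lsdvd "$1" | grep 'Disc Title' | awk -F ': ' '{print $2}')_VOBSUBS"
mkdir -p "$DISKNAME"
# dump subtitle id 0 (English) without re-encoding video or audio
mencoder -dvd-device "$@" -ovc copy -o /dev/null \
-nosound -sid 0 -vobsuboutindex 0 -vobsubout "$DISKNAME/0_en"
exit 0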


So for example to dump the sub for title 2 of my disc image MYDISC.iso I'd run
Code: Select all
$ dumpenglishvobsub.sh MYDISC.iso dvd://02


and the vobsubs will be dumped to a new folder, named according to the disc title used by the manufacturer. Usually that's the name of the movie but some studios are lazy and it may be a serial number or the unimaginative DVD_VIDEO.

If you want the vobsubs for every language:

Code: Select all
#!/bin/bash
# ripvobsubs.sh
# input must be <devicename> <dvd://#>
# device can be physical device, disk image, video_ts folder etc

# get info about dvd
midentify -dvd-device "$@" >dvdinfo

# write subtitle lang identifiers to file
grep 'ID_SID' dvdinfo >subs.txt

# set variable: highest subtitle id (number of subtitle tracks minus 1)
NUMSUB=$(( $(wc -l <subs.txt) - 1 ))

# set variable of dvd diskname_VOBSUBS (lsdvd only needs the device)
DISKNAME="$(lsdvd "$1" | grep 'Disc Title' | awk -F ': ' '{print $2}')_VOBSUBS"

mkdir -p "$DISKNAME"
echo "$NUMSUB"

for SUBNUM in $(seq 0 "$NUMSUB") ; do

# use mencoder to dump vobsubs. one mencoder pass per subtitle.
# runs at about 2000 fps
mencoder -dvd-device "$@" -ovc copy -o /dev/null \
-nosound -sid "$SUBNUM" -vobsuboutindex 0 -vobsubout \
"$DISKNAME/$(grep "_${SUBNUM}_" subs.txt | awk -F '=' '{print $2}')_$SUBNUM"
done


Again this will create a folder named after the disc title and dump all the vobsubs into it, naming each set according to its language and sid. This is surprisingly fast. I've seen some people recommend using '-ovc raw' but that's not right. A copy is always faster than any kind of manipulation or extraction. When I compared them myself I found that with '-ovc raw' the vobsub extraction runs about 5 or 6 times slower than with '-ovc copy'.
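
Since some discs list duplicate subs that are empty, it's worth a quick look at what was actually dumped. A minimal sketch, assuming each set is a pair of files with the same basename ending in .idx and .sub; check_vobsubs is a made-up helper name, and I'm only guessing that an empty track shows up as a missing or zero-length .sub.

```shell
#!/bin/bash
# check_vobsubs DIR: for every .idx file in DIR, report whether the
# matching .sub file exists and is non-empty.
# (hypothetical helper for checking the dump folder)
check_vobsubs() {
    local idx base
    for idx in "$1"/*.idx; do
        [ -e "$idx" ] || continue      # the glob matched nothing
        base=${idx%.idx}
        if [ -s "$base.sub" ]; then
            echo "${base##*/}: ok"
        else
            echo "${base##*/}: missing or empty .sub"
        fi
    done
}

# Usage (folder name depends on your disc title):
#   check_vobsubs DVD_VIDEO_VOBSUBS
```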

If you want to convert your vobsubs to srt it's easy and quite fast, though you might not think so if you follow the subtitleripper/transcode docs or online tutorials, which omit some details because they haven't been updated in line with the applications. You need the package subtitleripper (get it from debian-multimedia) and a spellchecker such as ispell. If you're using a graphical environment then Gaupol makes for a nice spellchecker for subtitles.

Copy the ifo file for the title from your DVD/video_ts/disc image. It will be VTS_01_0.IFO or vts_01_0.ifo. If you're working with a disc image, not a physical disc, you can use fuseiso to mount the disc without needing root privileges.
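
Roughly, fetching the ifo from a disc image could look like this. It's a sketch: find_ifo is a made-up helper, ~/dvdmount and MYDISC.iso are placeholder names, and it relies on find supporting -iname (GNU find does) to cope with upper or lower case file names.

```shell
#!/bin/bash
# find_ifo DIR: print the path of VTS_01_0.IFO below DIR, matching the
# name case-insensitively because discs differ in upper/lower case.
# (hypothetical helper; prints nothing if the file is not found)
find_ifo() {
    find "$1" -type f -iname 'vts_01_0.ifo' | head -n 1
}

# Example for a disc image (paths are placeholders):
#   mkdir -p ~/dvdmount
#   fuseiso MYDISC.iso ~/dvdmount        # mount without root
#   cp "$(find_ifo ~/dvdmount)" .
#   fusermount -u ~/dvdmount             # unmount again
```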

So if your English vobsubs are en_0 (you have a file en_0.idx and a file en_0.sub; the English-only script above names them 0_en instead) you can run

Code: Select all
vobsub2pgm -c 255,255,0,255 -i VTS_01_0.IFO -g 2 en_0 english


The '-g 2' is essential because it ensures the output is gzipped, which the next tool, pgm2txt, now requires. Earlier versions didn't expect gzipped input, and as far as I can tell this new requirement is absent from the docs and from every example I've ever seen. I cursed a lot of people.

Now run

Code: Select all
pgm2txt english


and you'll see gocr (optical character recognition) offer you unrecognised characters to identify. The characters composed of # signs are the unrecognised ones. If you can't recognise any of the characters either then go back a step: delete all the output of vobsub2pgm and try again with a slightly different -c option, so if -c 255,255,0,255 didn't produce useful results then this time try -c 255,0,255,255. Check the man page for more detail, but one or other of these two settings will probably work in most cases. Don't worry, this is all easy and quick to do.
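
If you'd rather not delete and rerun by hand, one option is to generate both attempts up front under different output names. This is only a sketch: try_palettes and the basenames english_c1 and english_c2 are made up for the example, and the vobsub2pgm call is just the one shown above with the -c value swapped.

```shell
#!/bin/bash
# try_palettes IFO VOBSUB: run vobsub2pgm once per -c setting, writing
# each attempt under its own output basename (english_c1, english_c2).
# (hypothetical helper; the basenames are arbitrary)
try_palettes() {
    local ifo="$1" vobsub="$2" n=1 colours
    for colours in 255,255,0,255 255,0,255,255; do
        vobsub2pgm -c "$colours" -i "$ifo" -g 2 "$vobsub" "english_c$n"
        n=$((n + 1))
    done
}

# Usage:
#   try_palettes VTS_01_0.IFO en_0
#   pgm2txt english_c1     # if the characters are unreadable, try english_c2
```

It costs a second vobsub2pgm pass and some disk space, so it only makes sense if you expect to need both.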

So run pgm2txt again; it should take just a couple of minutes and not too many prompts for input. Now you have a file called english.srtx and you can make an srt file:

Code: Select all
srttool -s -w < english.srtx > moviename.srt


You should now spellcheck moviename.srt because it will have some errors. On a console system ispell is decent enough, but if you have a graphical desktop then a subtitle tool with a built-in spellchecker such as Gaupol makes a lot of sense. The spellchecking will probably be the longest part of this process. When it's done you have an srt subtitle file ready to be merged into an mkv or mp4 container or used alongside a movie in an avi container. The timings, spelling and grammar should be correct.
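
A quick sanity check on the result is to count the timing lines in the srt. A sketch, assuming the usual srt timing format of hh:mm:ss,mmm --> hh:mm:ss,mmm; srt_cue_count is a made-up helper name.

```shell
#!/bin/bash
# srt_cue_count FILE: count the timing lines ("start --> end") in an .srt.
# (hypothetical helper; assumes the standard hh:mm:ss,mmm timing format)
srt_cue_count() {
    grep -c -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}' "$1"
}

# Usage:
#   srt_cue_count moviename.srt
```

A count of zero, or far fewer lines than you'd expect for the length of the movie, suggests something went wrong in an earlier step.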

Most tutorials you'll see for this use transcode's tccat and tcextract for the initial task of dumping the subs. I prefer mencoder because I'm more familiar with it, and when I've compared them the initial extraction speed is identical. Overall mencoder is definitely faster because it doesn't require a further step (the transcode method requires you to run subtitle2vobsub next).

So that's everything you need to extract and convert vobsub subtitles from DVD type sources, all on the command line with no graphical environment or even an X server required (I usually do this via ssh to a headless PC). If you like to use a graphical tool to do the conversion from vobsub to srt then avidemux is fairly good. In the Avidemux menu bar go to Tools > OCR (VobSub -> srt).


Here's the script midentify:
Code: Select all
#!/bin/sh
#
# This is a wrapper around the -identify functionality.
# It is supposed to escape the output properly, so it can be easily
# used in shellscripts by 'eval'ing the output of this script.
#
# Written by Tobias Diedrich <ranma+mplayer@tdiedrich.de>
# Licensed under GNU GPL.

if [ -z "$1" ]; then
        echo "Usage: midentify.sh <file> [<file> ...]"
        exit 1
fi

mplayer -vo null -ao null -frames 0 -identify "$@" 2>/dev/null |
        sed -ne '/^ID_/ {
                          s/[]()|&;<>`'"'"'\\!$" []/\\&/g;p
                        }'