youtube-dl/youtube_dl/extractor/subtitles.py

from .common import InfoExtractor

from ..utils import (
    compat_str,
    ExtractorError,
)


class SubtitlesInfoExtractor(InfoExtractor):
    @property
    def _have_to_download_any_subtitles(self):
        """ True if subtitles or automatic captions were requested """
        return any([self._downloader.params.get('writesubtitles', False),
                    self._downloader.params.get('writeautomaticsub', False)])

    def _list_available_subtitles(self, video_id, webpage):
        """ outputs the available subtitles for the video """
        sub_lang_list = self._get_available_subtitles(video_id, webpage)
        auto_captions_list = self._get_available_automatic_caption(video_id, webpage)
        sub_lang = ",".join(list(sub_lang_list.keys()))
        self.to_screen(u'%s: Available subtitles for video: %s' %
                       (video_id, sub_lang))
        auto_lang = ",".join(auto_captions_list.keys())
        self.to_screen(u'%s: Available automatic captions for video: %s' %
                       (video_id, auto_lang))

    def extract_subtitles(self, video_id, webpage):
        """
        returns {sub_lang: sub}, {} if no subtitles were found, or None if
        subtitles weren't requested.
        """
        if not self._have_to_download_any_subtitles:
            return None
        available_subs_list = {}
        if self._downloader.params.get('writeautomaticsub', False):
            available_subs_list.update(self._get_available_automatic_caption(video_id, webpage))
        if self._downloader.params.get('writesubtitles', False):
            available_subs_list.update(self._get_available_subtitles(video_id, webpage))

        if not available_subs_list:  # error, it didn't get the available subtitles
            return {}
        if self._downloader.params.get('allsubtitles', False):
            sub_lang_list = available_subs_list
        else:
            if self._downloader.params.get('subtitleslangs', False):
                requested_langs = self._downloader.params.get('subtitleslangs')
            elif 'en' in available_subs_list:
                requested_langs = ['en']
            else:
                requested_langs = [list(available_subs_list.keys())[0]]

            sub_lang_list = {}
            for sub_lang in requested_langs:
                if sub_lang not in available_subs_list:
                    self._downloader.report_warning(u'no closed captions found in the specified language "%s"' % sub_lang)
                    continue
                sub_lang_list[sub_lang] = available_subs_list[sub_lang]

        subtitles = {}
        for sub_lang, url in sub_lang_list.items():
            subtitle = self._request_subtitle_url(sub_lang, url)
            if subtitle:
                subtitles[sub_lang] = subtitle
        return subtitles

    def _download_subtitle_url(self, sub_lang, url):
        """ downloads the subtitle content from url """
        return self._download_webpage(url, None, note=False)

    def _request_subtitle_url(self, sub_lang, url):
        """ makes the http request for the subtitle """
        try:
            sub = self._download_subtitle_url(sub_lang, url)
        except ExtractorError as err:
            self._downloader.report_warning(u'unable to download video subtitles for %s: %s' % (sub_lang, compat_str(err)))
            return
        if not sub:
            self._downloader.report_warning(u'Did not fetch video subtitles')
            return
        return sub

    def _get_available_subtitles(self, video_id, webpage):
        """
        returns {sub_lang: url} or {} if not available
        Must be redefined by the subclasses
        """

        # By default, allow implementations to simply pass in the result
        assert isinstance(webpage, dict), \
            '_get_available_subtitles not implemented'
        return webpage

    def _get_available_automatic_caption(self, video_id, webpage):
        """
        returns {sub_lang: url} or {} if not available
        Must be redefined by the subclasses that support automatic captions,
        otherwise it will return {}
        """
        self._downloader.report_warning(u'Automatic Captions not supported by this server')
        return {}

Commit history for this file (from the blame view), oldest first:

2013-08-08 00:59:11 +08:00  [dailymotion] Added support for subtitles + new InfoExtractor for generic subtitle download.
    The idea is that all subtitle downloaders must descend from SubtitlesIE and implement only three basic methods to achieve the complete subtitle download functionality. This will allow to reduce the code in YoutubeIE once it is rewritten.
2013-08-08 14:54:10 +08:00  [internal] Improved subtitle architecture + (update in youtube/dailymotion)
    The structure of subtitles was refined, you only need to implement one method that returns a dictionnary of the available subtitles (lang, url) to support all the subtitle options in a website. I updated the subtitle downloaders for youtube/dailymotion to show how it works.
2013-08-08 17:20:56 +08:00  [subtitles] Improved docs + new class for servers who don't support auto-caption
2013-09-06 22:26:22 +08:00  [subtitles] fixed multiple subtitles language separated by comma after merge
    As mentioned in the pull request, I forgot to include this changes. https://github.com/rg3/youtube-dl/commit/aa6a10c44a8e2e86f709c5301f9ea6ac3f01f002
2013-09-11 21:51:04 +08:00  [subtitles] rename SubitlesIE to SubtitlesInfoExtractor
    Otherwise it can be automatically detected as a IE ready for use.
2013-09-11 22:05:49 +08:00  [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
    Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption. Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 22:24:47 +08:00  [subtitles] Use self._download_webpage for extracting the subtitles
    It raises ExtractorError for the same exceptions we have to catch.
2013-09-12 01:02:01 +08:00  [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages.
2013-09-12 01:17:30 +08:00  [subtitles] Also list the available automatic captions languages with '--list-sub'
2013-09-12 17:15:25 +08:00  Check for both automatic captions and subtitles with options `--write-sub` and `--write-auto-sub` (fixes #1224)
2013-09-14 17:14:40 +08:00  Now --all-sub is a modifier to --write-sub and --write-auto-sub (closes #1412)
    For keeping backwards compatibility --all-sub sets --write-sub if --write-auto-sub is not given
2013-11-03 01:01:05 +08:00  [subtitles] refactor to support websites with subtitle information the webpage.
    I added the parameter webpage, so now it's similar to the way automatic captions are handled. This is an improvement needed for websites like TED.
2014-02-03 12:18:30 +08:00  [blip.tv] Add support for subtitles (#2274)
2014-02-04 17:24:17 +08:00  [subtittles] Check that the result is not empty
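
The language choice made inside extract_subtitles can be tried out in isolation. The following is a minimal standalone sketch, not part of youtube-dl: the helper name select_subtitle_langs and its parameters are hypothetical, with all_subs and requested standing in for the 'allsubtitles' and 'subtitleslangs' downloader params. Instead of calling report_warning, it returns the requested languages that were not available.

```python
def select_subtitle_langs(available, all_subs=False, requested=None):
    """Hypothetical helper mirroring the language choice in extract_subtitles.

    available: {lang: url} dict of everything the site offers.
    Returns (selected, missing): the {lang: url} dict to download and the
    list of requested languages that are not available.
    """
    if not available:
        return {}, []
    if all_subs:
        # Assumed equivalent of the 'allsubtitles' param: take everything.
        return dict(available), []
    if requested:
        wanted = list(requested)
    elif 'en' in available:
        # No explicit request: prefer English when it exists.
        wanted = ['en']
    else:
        # Otherwise fall back to whichever language the dict yields first.
        wanted = [next(iter(available))]

    selected = {}
    missing = []
    for lang in wanted:
        if lang in available:
            selected[lang] = available[lang]
        else:
            missing.append(lang)
    return selected, missing


# Example URLs are placeholders.
available = {
    'fr': 'http://example.com/fr.srt',
    'en': 'http://example.com/en.srt',
}
print(select_subtitle_langs(available))                           # English by default
print(select_subtitle_langs(available, requested=['fr', 'de']))   # 'de' is reported as missing
print(select_subtitle_langs(available, all_subs=True))            # every available language
```

Returning the missing languages keeps the sketch free of any downloader object; the real method warns through self._downloader.report_warning and continues with the remaining languages.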