1
0
mirror of https://github.com/l1ving/youtube-dl synced 2025-03-14 10:00:31 +08:00

Merge remote-tracking branch 'refs/remotes/rg3/master'

This commit is contained in:
Martín Cerdeira 2017-02-15 17:21:55 -03:00
commit 4ede16b29c
36 changed files with 815 additions and 289 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.10*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.10**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.16*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.16**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.10
[debug] youtube-dl version 2017.02.16
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -8,7 +8,10 @@ python:
- "3.5"
- "3.6"
sudo: false
script: nosetests test --verbose
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download
script: ./devscripts/run_tests.sh
notifications:
email:
- filippo.valsorda@gmail.com

View File

@ -1,3 +1,58 @@
version 2017.02.16
Core
+ [utils] Add support for quoted string literals in --match-filter (#8050,
#12142, #12144)
Extractors
* [ceskatelevize] Lower priority for audio description sources (#12119)
* [amcnetworks] Fix extraction (#12127)
* [pinkbike] Fix uploader extraction (#12054)
+ [onetpl] Add support for businessinsider.com.pl and plejada.pl
+ [onetpl] Add support for onet.pl (#10507)
+ [onetmvp] Add shortcut extractor
+ [vodpl] Add support for vod.pl (#12122)
+ [pornhub] Extract video URL from tv platform site (#12007, #12129)
+ [ceskatelevize] Extract DASH formats (#12119, #12133)
version 2017.02.14
Core
* TypeError is fixed with Python 2.7.13 on Windows (#11540, #12085)
Extractor
* [zdf] Fix extraction (#12117)
* [xtube] Fix extraction for both kinds of video id (#12088)
* [xtube] Improve title extraction (#12088)
+ [lemonde] Fallback delegate extraction to generic extractor (#12115, #12116)
* [bellmedia] Allow video id longer than 6 characters (#12114)
+ [limelight] Add support for referer protected videos
* [disney] Improve extraction (#4975, #11000, #11882, #11936)
* [hotstar] Improve extraction (#12096)
* [einthusan] Fix extraction (#11416)
+ [aenetworks] Add support for lifetimemovieclub.com (#12097)
* [youtube] Fix parsing codecs (#12091)
version 2017.02.11
Core
+ [utils] Introduce get_elements_by_class and get_elements_by_attribute
utility functions
+ [extractor/common] Skip m3u8 manifests protected with Adobe Flash Access
Extractor
* [pluralsight:course] Fix extraction (#12075)
+ [bbc] Extract m3u8 formats with 320k audio
* [facebook] Relax video id matching (#11017, #12055, #12056)
+ [corus] Add support for Corus Entertainment sites (#12060, #9164)
+ [pluralsight] Detect blocked account error message (#12070)
+ [bloomberg] Add another video id pattern (#12062)
* [extractor/commonmistakes] Restrict URL regular expression (#12050)
+ [tvplayer] Add support for tvplayer.com
version 2017.02.10
Extractors

19
devscripts/run_tests.sh Executable file
View File

@ -0,0 +1,19 @@
#!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter"
test_set=""
case "$YTDL_TEST_SET" in
core)
test_set="-I test_($DOWNLOAD_TESTS)\.py"
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
;;
*)
break
;;
esac
nosetests test --verbose $test_set

View File

@ -169,6 +169,7 @@
- **ComedyCentralShortname**
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Corus**
- **Coub**
- **Cracked**
- **Crackle**
@ -309,7 +310,6 @@
- **HellPorno**
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **HGTV**
- **hgtv.com:show**
- **HistoricFilms**
- **history:topic**: History.com Topic
@ -546,8 +546,10 @@
- **OktoberfestTV**
- **on.aol.com**
- **OnDemandKorea**
- **onet.pl**
- **onet.tv**
- **onet.tv:channel**
- **OnetMVP**
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
@ -806,6 +808,7 @@
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
- **TVPlayer**
- **Tweakers**
- **twitch:chapter**
- **twitch:clips**
@ -899,6 +902,7 @@
- **vlive**
- **vlive:channel**
- **Vodlocker**
- **VODPl**
- **VODPlatform**
- **VoiceRepublic**
- **VoxMedia**

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
@ -540,10 +541,10 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl._format_note({}), '')
assertRegexpMatches(self, ydl._format_note({
'vbr': 10,
}), '^\s*10k$')
}), r'^\s*10k$')
assertRegexpMatches(self, ydl._format_note({
'fps': 30,
}), '^30fps$')
}), r'^30fps$')
def test_postprocessors(self):
filename = 'post-processor-testfile.mp4'
@ -606,6 +607,8 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 30,
'filesize': 10 * 1024,
'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--",
}
second = {
'id': '2',
@ -616,6 +619,7 @@ class TestYoutubeDL(unittest.TestCase):
'description': 'foo',
'filesize': 5 * 1024,
'playlist_id': '43',
'uploader': "тест 123",
}
videos = [first, second]
@ -656,6 +660,26 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader = "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func('uploader != "變態妍字幕版 太妍 тест"')
res = get_videos(f)
self.assertEqual(res, ['2'])
f = match_filter_func('creator = "тест \' 123 \' тест--"')
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func("creator = 'тест \\' 123 \\' тест--'")
res = get_videos(f)
self.assertEqual(res, ['1'])
f = match_filter_func(r"creator = 'тест \' 123 \' тест--' & duration > 30")
res = get_videos(f)
self.assertEqual(res, [])
def test_playlist_items_selection(self):
entries = [{
'id': compat_str(i),

View File

@ -34,6 +34,9 @@ from youtube_dl.utils import (
find_xpath_attr,
fix_xml_ampersands,
get_element_by_class,
get_element_by_attribute,
get_elements_by_class,
get_elements_by_attribute,
InAdvancePagedList,
intlist_to_bytes,
is_html,
@ -1124,6 +1127,32 @@ The first line
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
def test_get_element_by_attribute(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_attribute('class', 'foo bar', html), 'nice')
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
def test_get_elements_by_class(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
self.assertEqual(get_elements_by_class('foo', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_class('no-such-class', html), [])
def test_get_elements_by_attribute(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
'''
self.assertEqual(get_elements_by_attribute('class', 'foo bar', html), ['nice', 'also nice'])
self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])
if __name__ == '__main__':
unittest.main()

View File

@ -23,7 +23,7 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime|lifetimemovieclub)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?)'
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
@ -62,11 +62,15 @@ class AENetworksIE(AENetworksBaseIE):
}, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI',
}

View File

@ -53,20 +53,30 @@ class AMCNetworksIE(ThePlatformIE):
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
@ -78,9 +88,11 @@ class AMCNetworksIE(ThePlatformIE):
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:

View File

@ -225,6 +225,8 @@ class BBCCoUkIE(InfoExtractor):
}
]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
class MediaSelectionError(Exception):
def __init__(self, id):
self.id = id
@ -336,6 +338,15 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))

View File

@ -24,7 +24,7 @@ class BellMediaIE(InfoExtractor):
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -55,6 +55,9 @@ class BellMediaIE(InfoExtractor):
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',

View File

@ -33,6 +33,10 @@ class BloombergIE(InfoExtractor):
'params': {
'format': 'best[format_id^=hds]',
},
}, {
# data-bmmrid=
'url': 'https://www.bloomberg.com/politics/articles/2017-02-08/le-pen-aide-briefed-french-central-banker-on-plan-to-print-money',
'only_matching': True,
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,
@ -45,9 +49,10 @@ class BloombergIE(InfoExtractor):
name = self._match_id(url)
webpage = self._download_webpage(url, name)
video_id = self._search_regex(
(r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'videoId\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
webpage, 'id', group='url', default=None)
(r'["\']bmmrId["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
r'videoId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
r'data-bmmrid=(["\'])(?P<id>(?:(?!\1).)+)\1'),
webpage, 'id', group='id', default=None)
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)

View File

@ -13,6 +13,7 @@ from ..utils import (
float_or_none,
sanitized_Request,
urlencode_postdata,
USER_AGENTS,
)
@ -21,10 +22,10 @@ class CeskaTelevizeIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': {
'id': '61924494876951776',
'id': '61924494877246241',
'ext': 'mp4',
'title': 'Hyde Park Civilizace',
'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
'title': 'Hyde Park Civilizace: Život v Grónsku',
'description': 'md5:3fec8f6bb497be5cdb0c9e8781076626',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3350,
},
@ -114,70 +115,100 @@ class CeskaTelevizeIE(InfoExtractor):
'requestSource': 'iVysilani',
}
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
req.add_header('Referer', url)
playlistpage = self._download_json(req, playlist_id)
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist)
entries = []
for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']
title = item['title']
for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data))
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
req.add_header('X-Requested-With', 'XMLHttpRequest')
if user_agent:
req.add_header('User-Agent', user_agent)
req.add_header('Referer', url)
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
playlistpage = self._download_json(req, playlist_id, fatal=False)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
if not playlistpage:
continue
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
playlist_url = playlistpage['url']
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id, fatal=False)
if not playlist:
continue
playlist = playlist.get('playlist')
if not isinstance(playlist, list):
continue
playlist_len = len(playlist)
for num, item in enumerate(playlist):
is_live = item.get('type') == 'LIVE'
formats = []
for format_id, stream_url in item.get('streamUrls', {}).items():
if 'playerType=flash' in stream_url:
stream_formats = self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls-%s' % format_id, fatal=False)
else:
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/rg3/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10
formats.extend(stream_formats)
if user_agent and len(entries) == playlist_len:
entries[num]['formats'].extend(formats)
continue
item_id = item.get('id') or item['assetId']
title = item['title']
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
if item.get('type') == 'VOD':
subs = item.get('subtitles')
if subs:
subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({
'id': item_id,
'title': final_title,
'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
})
for e in entries:
self._sort_formats(e['formats'])
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

View File

@ -1208,6 +1208,9 @@ class InfoExtractor(object):
m3u8_doc, urlh = res
m3u8_url = urlh.geturl()
if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access
return []
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (

View File

@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformFeedIE
from ..utils import int_or_none
class CorusIE(ThePlatformFeedIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
'info_dict': {
'id': '870923331648',
'ext': 'mp4',
'title': 'Movie Night Popcorn with Bryan',
'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
'uploader': 'SHWM-NEW',
'upload_date': '20170206',
'timestamp': 1486392197,
},
}, {
'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
'only_matching': True,
}, {
'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/',
'only_matching': True,
}]
_TP_FEEDS = {
'globaltv': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'etcanada': {
'feed_id': 'ChQqrem0lNUp',
'account_id': 2269680845,
},
'hgtv': {
'feed_id': 'L0BMHXi2no43',
'account_id': 2414428465,
},
'foodnetwork': {
'feed_id': 'ukK8o58zbRmJ',
'account_id': 2414429569,
},
'slice': {
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
feed_info = self._TP_FEEDS[domain.split('.')[0]]
return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: {
'episode_number': int_or_none(e.get('pl1$episode')),
'season_number': int_or_none(e.get('pl1$season')),
'series': e.get('pl1$show'),
}, {
'HLS': {
'manifest': 'm3u',
},
'DesktopHLS Default': {
'manifest': 'm3u',
},
'MP4 MBR': {
'manifest': 'm3u',
},
}, feed_info['account_id'])

View File

@ -9,13 +9,15 @@ from ..utils import (
unified_strdate,
compat_str,
determine_ext,
ExtractorError,
)
class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
'info_dict': {
'id': '545ed1857afee5a0ec239977',
@ -28,6 +30,20 @@ class DisneyIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
# Grill.burger
'url': 'http://www.starwars.com/video/rogue-one-a-star-wars-story-intro-featurette',
'info_dict': {
'id': '5454e9f4e9804a552e3524c8',
'ext': 'mp4',
'title': '"Intro" Featurette: Rogue One: A Star Wars Story',
'upload_date': '20170104',
'description': 'Go behind-the-scenes of Rogue One: A Star Wars Story in this featurette with Director Gareth Edwards and the cast of the film.',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
'only_matching': True,
@ -43,31 +59,55 @@ class DisneyIE(InfoExtractor):
}, {
'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/embed/522900d2ced3c565e4cc0677',
'only_matching': True,
}, {
'url': 'http://spiderman.marvelkids.com/videos/contest-of-champions-part-four-clip-1',
'only_matching': True,
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,
}]
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id)
video_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video']
domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
grill = re.sub(r'"\s*\+\s*"', '', self._search_regex(
r'Grill\.burger\s*=\s*({.+})\s*:',
webpage, 'grill data'))
page_data = next(s for s in self._parse_json(grill, display_id)['stack'] if s.get('type') == 'video')
video_data = page_data['data'][0]
else:
webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id)
page_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo\s*=\s*({.+});',
webpage, 'embed data'), video_id)
video_data = page_data['video']
for external in video_data.get('externals', []):
if external.get('source') == 'vevo':
return self.url_result('vevo:' + external['data_id'], 'Vevo')
video_id = video_data['id']
title = video_data['title']
formats = []
for flavor in video_data.get('flavors', []):
flavor_format = flavor.get('format')
flavor_url = flavor.get('url')
if not flavor_url or not re.match(r'https?://', flavor_url):
if not flavor_url or not re.match(r'https?://', flavor_url) or flavor_format == 'mp4_access':
continue
tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999:
formats.extend(self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False))
flavor_url, video_id, 'mp4',
m3u8_id=flavor_format, fatal=False))
continue
format_id = []
if flavor_format:
@ -88,6 +128,10 @@ class DisneyIE(InfoExtractor):
'ext': ext,
'vcodec': 'none' if (width == 0 and height == 0) else None,
})
if not formats and video_data.get('expired'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, page_data['translations']['video_expired']),
expected=True)
self._sort_formats(formats)
subtitles = {}

View File

@ -1,67 +1,94 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import (
remove_start,
sanitized_Request,
extract_attributes,
ExtractorError,
get_elements_by_class,
urlencode_postdata,
)
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?einthusan\.com/movies/watch.php\?([^#]*?)id=(?P<id>[0-9]+)'
_TESTS = [
{
'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': {
'id': '2447',
'ext': 'mp4',
'title': 'Ek Villain',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:9d29fc91a7abadd4591fb862fa560d93',
}
},
{
'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': {
'id': '1671',
'ext': 'mp4',
'title': 'Soodhu Kavvuum',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
'info_dict': {
'id': '9097',
'ext': 'mp4',
'title': 'Ae Dil Hai Mushkil',
'description': 'md5:33ef934c82a671a94652a9b4e54d931b',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):
return self._parse_json(base64.b64decode((
encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
).encode('ascii')).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
request = sanitized_Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
webpage = self._download_webpage(request, video_id)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
webpage, 'title')
title = self._html_search_regex(r'<h3>([^<]+)</h3>', webpage, 'title')
video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
player_params = extract_attributes(self._search_regex(
r'(<section[^>]+id="UIVideoPlayer"[^>]+>)', webpage, 'player parameters'))
m3u8_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id, headers={'Referer': url})
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json(
'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
'EJOutcomes': player_params['data-ejpingables'],
'NativeHLS': False
}),
'arcVersion': 3,
'appVersion': 59,
'gorilla.csrf.Token': page_id,
}))['Data']
description = self._html_search_meta('description', webpage)
if isinstance(video_data, compat_str) and video_data.startswith('/ratelimited/'):
raise ExtractorError(
'Download rate reached. Please try again later.', expected=True)
ej_links = self._decrypt(video_data['EJLinks'], video_id)
formats = []
m3u8_url = ej_links.get('HLSLink')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native'))
mp4_url = ej_links.get('MP4Link')
if mp4_url:
formats.append({
'url': mp4_url,
})
self._sort_formats(formats)
description = get_elements_by_class('synopsis', webpage)[0]
thumbnail = self._html_search_regex(
r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''',
webpage, "thumbnail url", fatal=False)
r'''<img[^>]+src=(["'])(?P<url>(?!\1).+?/moviecovers/(?!\1).+?)\1''',
webpage, 'thumbnail url', fatal=False, group='url')
if thumbnail is not None:
thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..'))
thumbnail = compat_urlparse.urljoin(url, thumbnail)
return {
'id': video_id,

View File

@ -202,6 +202,7 @@ from .commonprotocols import (
RtmpIE,
)
from .condenast import CondeNastIE
from .corus import CorusIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
@ -381,10 +382,7 @@ from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import (
HGTVIE,
HGTVComShowIE,
)
from .hgtv import HGTVComShowIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hitrecord import HitRecordIE
@ -696,6 +694,8 @@ from .ondemandkorea import OnDemandKoreaIE
from .onet import (
OnetIE,
OnetChannelIE,
OnetMVPIE,
OnetPlIE,
)
from .onionstudios import OnionStudiosIE
from .ooyala import (
@ -1149,6 +1149,7 @@ from .vlive import (
VLiveChannelIE
)
from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@ -148,6 +149,32 @@ class FacebookIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.facebook.com/LaGuiaDelVaron/posts/1072691702860471',
'info_dict': {
'id': '1072691702860471',
'ext': 'mp4',
'title': 'md5:ae2d22a93fbb12dad20dc393a869739d',
'timestamp': 1477305000,
'upload_date': '20161024',
'uploader': 'La Guía Del Varón',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/',
'info_dict': {
'id': '1396382447100162',
'ext': 'mp4',
'title': 'md5:e2d2700afdf84e121f5d0f999bad13a3',
'timestamp': 1486035494,
'upload_date': '20170202',
'uploader': 'Elisabeth Ahtn',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True,
@ -263,7 +290,7 @@ class FacebookIE(InfoExtractor):
for item in instances:
if item[1][0] == 'VideoConfig':
video_item = item[2][0]
if video_item.get('video_id') == video_id:
if video_item.get('video_id'):
return video_item['videoData']
server_js_data = self._parse_json(self._search_regex(

View File

@ -991,19 +991,6 @@ class GenericIE(InfoExtractor):
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
},
},
# Kaltura embed protected with referrer
{
'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
'info_dict': {
'id': '1_g4fbemnq',
'ext': 'mp4',
'title': 'Violetta - Achter De Schermen - Ruggero',
'description': 'Achter de schermen met Ruggero',
'timestamp': 1435133761,
'upload_date': '20150624',
'uploader_id': 'echojecka',
},
},
# Kaltura embed with single quotes
{
'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
@ -2350,8 +2337,9 @@ class GenericIE(InfoExtractor):
'Channel': 'channel',
'ChannelList': 'channel_list',
}
return self.url_result('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2))
return self.url_result(smuggle_url('limelight:%s:%s' % (
lm[mobj.group(1)], mobj.group(2)), {'source_url': url}),
'Limelight%s' % mobj.group(1), mobj.group(2))
mobj = re.search(
r'''(?sx)
@ -2361,7 +2349,9 @@ class GenericIE(InfoExtractor):
value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32})
''', webpage)
if mobj:
return self.url_result('limelight:media:%s' % mobj.group('id'))
return self.url_result(smuggle_url(
'limelight:media:%s' % mobj.group('id'),
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
# Look for AdobeTVVideo embeds
mobj = re.search(

View File

@ -2,50 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
smuggle_url,
)
class HGTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hgtv\.ca/[^/]+/video/(?P<id>[^/]+)/video.html'
_TEST = {
'url': 'http://www.hgtv.ca/homefree/video/overnight-success/video.html?v=738081859718&p=1&s=da#video',
'md5': '',
'info_dict': {
'id': 'aFH__I_5FBOX',
'ext': 'mp4',
'title': 'Overnight Success',
'description': 'After weeks of hard work, high stakes, breakdowns and pep talks, the final 2 contestants compete to win the ultimate dream.',
'uploader': 'SHWM-NEW',
'timestamp': 1470320034,
'upload_date': '20160804',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_vars = self._parse_json(self._search_regex(
r'(?s)embed_vars\s*=\s*({.*?});',
webpage, 'embed vars'), display_id, js_to_json)
return {
'_type': 'url_transparent',
'url': smuggle_url(
'http://link.theplatform.com/s/dtjsEC/%s?mbr=true&manifest=m3u' % embed_vars['pid'], {
'force_smil_url': True
}),
'series': embed_vars.get('show'),
'season_number': int_or_none(embed_vars.get('season')),
'episode_number': int_or_none(embed_vars.get('episode')),
'ie_key': 'ThePlatform',
}
class HGTVComShowIE(InfoExtractor):

View File

@ -34,11 +34,9 @@ class HotStarIE(InfoExtractor):
'only_matching': True,
}]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s'
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s'
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True):
json_data = super(HotStarIE, self)._download_json(url_or_request, video_id, note, fatal=fatal)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True, query=None):
json_data = super(HotStarIE, self)._download_json(
url_or_request, video_id, note, fatal=fatal, query=query)
if json_data['resultCode'] != 'OK':
if fatal:
raise ExtractorError(json_data['errorDescription'])
@ -48,20 +46,37 @@ class HotStarIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
self._GET_CONTENT_TEMPLATE % video_id,
video_id)['contentInfo'][0]
'http://account.hotstar.com/AVS/besc', video_id, query={
'action': 'GetAggregatedContentDetails',
'channel': 'PCTV',
'contentId': video_id,
})['contentInfo'][0]
title = video_data['episodeTitle']
if video_data.get('encrypted') == 'Y':
raise ExtractorError('This video is DRM protected.', expected=True)
formats = []
# PCTV for extracting f4m manifest
for f in ('TABLET',):
for f in ('JIO',):
format_data = self._download_json(
self._GET_CDN_TEMPLATE % (f, video_id, 'VOD'),
video_id, 'Downloading %s JSON metadata' % f, fatal=False)
'http://getcdn.hotstar.com/AVS/besc',
video_id, 'Downloading %s JSON metadata' % f,
fatal=False, query={
'action': 'GetCDN',
'asJson': 'Y',
'channel': f,
'id': video_id,
'type': 'VOD',
})
if format_data:
format_url = format_data['src']
format_url = format_data.get('src')
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
# produce broken files
continue
@ -75,9 +90,12 @@ class HotStarIE(InfoExtractor):
return {
'id': video_id,
'title': video_data['episodeTitle'],
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate')),
'formats': formats,
'episode': title,
'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'),
}

View File

@ -173,11 +173,12 @@ class IqiyiIE(InfoExtractor):
}
}, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': '667171934041350c5de3f5015f7f1152',
'md5': 'b7dc800a4004b1b57749d9abae0472da',
'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4',
'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇',
# This can be either Simplified Chinese or Traditional Chinese
'title': r're:^(?:名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇|名偵探柯南 國語版第752集 迫近灰原秘密的黑影 下篇)$',
},
'skip': 'Geo-restricted to China',
}, {

View File

@ -7,20 +7,40 @@ class LemondeIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
'md5': '01fb3c92de4c12c573343d63e163d302',
'md5': 'da120c8722d8632eec6ced937536cc98',
'info_dict': {
'id': 'lqm3kl',
'ext': 'mp4',
'title': "Comprendre l'affaire Bygmalion en 5 minutes",
'thumbnail': r're:^https?://.*\.jpg',
'duration': 320,
'duration': 309,
'upload_date': '20160119',
'timestamp': 1453194778,
'uploader_id': '3pmkp',
},
}, {
# standard iframe embed
'url': 'http://www.lemonde.fr/les-decodeurs/article/2016/10/18/tout-comprendre-du-ceta-le-petit-cousin-du-traite-transatlantique_5015920_4355770.html',
'info_dict': {
'id': 'uzsxms',
'ext': 'mp4',
'title': "CETA : quelles suites pour l'accord commercial entre l'Europe et le Canada ?",
'thumbnail': r're:^https?://.*\.jpg',
'duration': 325,
'upload_date': '20161021',
'timestamp': 1477044540,
'uploader_id': '3pmkp',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
'only_matching': True,
}, {
# YouTube embeds
'url': 'http://www.lemonde.fr/pixels/article/2016/12/09/pourquoi-pewdiepie-superstar-de-youtube-a-menace-de-fermer-sa-chaine_5046649_4408996.html',
'only_matching': True,
}]
def _real_extract(self, url):
@ -30,5 +50,9 @@ class LemondeIE(InfoExtractor):
digiteka_url = self._proto_relative_url(self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
webpage, 'digiteka url', group='url'))
return self.url_result(digiteka_url, 'Digiteka')
webpage, 'digiteka url', group='url', default=None))
if digiteka_url:
return self.url_result(digiteka_url, 'Digiteka')
return self.url_result(url, 'Generic')

View File

@ -8,6 +8,7 @@ from ..utils import (
determine_ext,
float_or_none,
int_or_none,
unsmuggle_url,
)
@ -15,20 +16,23 @@ class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
def _call_playlist_service(self, item_id, method, fatal=True):
def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
headers = {}
if referer:
headers['Referer'] = referer
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal)
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
def _call_api(self, organization_id, item_id, method):
return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method):
pc = self._call_playlist_service(item_id, pc_method)
def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata
def _extract_info(self, streams, mobile_urls, properties):
@ -207,10 +211,13 @@ class LimelightMediaIE(LimelightBaseIE):
_API_PATH = 'media'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId', 'getMobilePlaylistByMediaId', 'properties')
video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info(
pc['playlistItems'][0].get('streams', []),
@ -247,11 +254,13 @@ class LimelightChannelIE(LimelightBaseIE):
_API_PATH = 'channels'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url)
pc, mobile, medias = self._extract(
channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'media')
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url'))
entries = [
self._extract_info(

View File

@ -23,7 +23,7 @@ class OnetBaseIE(InfoExtractor):
return self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
def _extract_from_id(self, video_id, webpage):
def _extract_from_id(self, video_id, webpage=None):
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
@ -74,8 +74,10 @@ class OnetBaseIE(InfoExtractor):
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
title = (self._og_search_title(
webpage, default=None) if webpage else None) or meta['title']
description = (self._og_search_description(
webpage, default=None) if webpage else None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
@ -89,6 +91,18 @@ class OnetBaseIE(InfoExtractor):
}
class OnetMVPIE(OnetBaseIE):
_VALID_URL = r'onetmvp:(?P<id>\d+\.\d+)'
_TEST = {
'url': 'onetmvp:381027.1509591944',
'only_matching': True,
}
def _real_extract(self, url):
return self._extract_from_id(self._match_id(url))
class OnetIE(OnetBaseIE):
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv'
@ -167,3 +181,44 @@ class OnetChannelIE(OnetBaseIE):
channel_title = strip_or_none(get_element_by_class('o_channelName', webpage))
channel_description = strip_or_none(get_element_by_class('o_channelDesc', webpage))
return self.playlist_result(entries, channel_id, channel_title, channel_description)
class OnetPlIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?(?:onet|businessinsider\.com|plejada)\.pl/(?:[^/]+/)+(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.pl'
_TESTS = [{
'url': 'http://eurosport.onet.pl/zimowe/skoki-narciarskie/ziobro-wygral-kwalifikacje-w-pjongczangu/9ckrly',
'md5': 'b94021eb56214c3969380388b6e73cb0',
'info_dict': {
'id': '1561707.1685479',
'ext': 'mp4',
'title': 'Ziobro wygrał kwalifikacje w Pjongczangu',
'description': 'md5:61fb0740084d2d702ea96512a03585b4',
'upload_date': '20170214',
'timestamp': 1487078046,
},
}, {
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
'only_matching': True,
}, {
'url': 'http://moto.onet.pl/jak-wybierane-sa-miejsca-na-fotoradary/6rs04e',
'only_matching': True,
}, {
'url': 'http://businessinsider.com.pl/wideo/scenariusz-na-koniec-swiata-wedlug-nasa/dwnqptk',
'only_matching': True,
}, {
'url': 'http://plejada.pl/weronika-rosati-o-swoim-domniemanym-slubie/n2bq89',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mvp_id = self._search_regex(
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
return self.url_result(
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)

View File

@ -64,7 +64,8 @@ class PinkbikeIE(InfoExtractor):
'video:duration', webpage, 'duration'))
uploader = self._search_regex(
r'un:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
r'<a[^>]+\brel=["\']author[^>]+>([^<]+)', webpage,
'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'class="fullTime"[^>]+title="([^"]+)"',
webpage, 'upload date', fatal=False))

View File

@ -18,6 +18,7 @@ from ..utils import (
parse_duration,
qualities,
srt_subtitles_timecode,
update_url_query,
urlencode_postdata,
)
@ -92,6 +93,10 @@ class PluralsightIE(PluralsightBaseIE):
raise ExtractorError('Unable to login: %s' % error, expected=True)
if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
BLOCKED = 'Your account has been blocked due to suspicious activity'
if BLOCKED in response:
raise ExtractorError(
'Unable to login: %s' % BLOCKED, expected=True)
raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):
@ -327,25 +332,44 @@ class PluralsightCourseIE(PluralsightBaseIE):
# TODO: PSM cookie
course = self._download_json(
'%s/data/course/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course JSON')
'%s/player/functions/rpc' % self._API_BASE, course_id,
'Downloading course JSON',
data=json.dumps({
'fn': 'bootstrapPlayer',
'payload': {
'courseId': course_id,
}
}).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8'
})['payload']['course']
title = course['title']
course_name = course['name']
course_data = course['modules']
description = course.get('description') or course.get('shortDescription')
course_data = self._download_json(
'%s/data/course/content/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course data JSON')
entries = []
for num, module in enumerate(course_data, 1):
author = module.get('author')
module_name = module.get('name')
if not author or not module_name:
continue
for clip in module.get('clips', []):
player_parameters = clip.get('playerParameters')
if not player_parameters:
clip_index = int_or_none(clip.get('index'))
if clip_index is None:
continue
clip_url = update_url_query(
'%s/player' % self._API_BASE, query={
'mode': 'live',
'course': course_name,
'author': author,
'name': module_name,
'clip': clip_index,
})
entries.append({
'_type': 'url_transparent',
'url': '%s/training/player?%s' % (self._API_BASE, player_parameters),
'url': clip_url,
'ie_key': PluralsightIE.ie_key(),
'chapter': module.get('title'),
'chapter_number': num,

View File

@ -2,27 +2,27 @@
from __future__ import unicode_literals
import itertools
import os
# import os
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
# compat_urllib_parse_unquote,
# compat_urllib_parse_unquote_plus,
# compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
orderedSet,
sanitized_Request,
# sanitized_Request,
str_to_int,
)
from ..aes import (
aes_decrypt_text
)
# from ..aes import (
# aes_decrypt_text
# )
class PornHubIE(InfoExtractor):
@ -109,10 +109,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
req = sanitized_Request(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
def dl_webpage(platform):
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
@ -123,10 +127,19 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)
# video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
title = self._html_search_meta(
title = title or self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
@ -156,6 +169,7 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
"""
video_variables = {}
for video_variablename, quote, video_variable in re.findall(
r'(player_quality_[0-9]{3,4}p\w+)\s*=\s*(["\'])(.+?)\2;', webpage):
@ -197,6 +211,7 @@ class PornHubIE(InfoExtractor):
'height': height,
})
self._sort_formats(formats)
"""
page_params = self._parse_json(self._search_regex(
r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
@ -209,6 +224,7 @@ class PornHubIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
'title': title,
'thumbnail': thumbnail,
@ -217,7 +233,7 @@ class PornHubIE(InfoExtractor):
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'formats': formats,
# 'formats': formats,
'age_limit': 18,
'tags': tags,
'categories': categories,

View File

@ -306,9 +306,10 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
},
}]
def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}):
def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}, account_id=None):
real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
entry = self._download_json(real_url, video_id)['entries'][0]
main_smil_url = 'http://link.theplatform.com/s/%s/media/guid/%d/%s' % (provider_id, account_id, entry['guid']) if account_id else None
formats = []
subtitles = {}
@ -333,7 +334,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
if asset_type in asset_types_query:
query.update(asset_types_query[asset_type])
cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
main_smil_url or smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
formats.extend(cur_formats)
subtitles = self._merge_subtitles(subtitles, cur_subtitles)

View File

@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
class VODPlIE(OnetBaseIE):
_VALID_URL = r'https?://vod\.pl/(?:[^/]+/)+(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'https://vod.pl/filmy/chlopaki-nie-placza/3ep3jns',
'md5': 'a7dc3b2f7faa2421aefb0ecaabf7ec74',
'info_dict': {
'id': '3ep3jns',
'ext': 'mp4',
'title': 'Chłopaki nie płaczą',
'description': 'md5:f5f03b84712e55f5ac9f0a3f94445224',
'timestamp': 1463415154,
'duration': 5765,
'upload_date': '20160516',
},
}, {
'url': 'https://vod.pl/seriale/belfer-na-planie-praca-kamery-online/2c10heh',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info_dict = self._extract_from_id(self._search_mvp_id(webpage), webpage)
info_dict['id'] = video_id
return info_dict

View File

@ -44,6 +44,9 @@ class XTubeIE(InfoExtractor):
}, {
'url': 'xtube:625837',
'only_matching': True,
}, {
'url': 'xtube:kVTUy_G222_',
'only_matching': True,
}]
def _real_extract(self, url):
@ -53,11 +56,16 @@ class XTubeIE(InfoExtractor):
if not display_id:
display_id = video_id
url = 'http://www.xtube.com/video-watch/-%s' % video_id
req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1')
webpage = self._download_webpage(req, display_id)
if video_id.isdigit() and len(video_id) < 11:
url_pattern = 'http://www.xtube.com/video-watch/-%s'
else:
url_pattern = 'http://www.xtube.com/watch.php?v=%s'
webpage = self._download_webpage(
url_pattern % video_id, display_id, headers={
'Cookie': 'age_verified=1; cookiesAccepted=1',
})
sources = self._parse_json(self._search_regex(
r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
@ -73,7 +81,7 @@ class XTubeIE(InfoExtractor):
self._sort_formats(formats)
title = self._search_regex(
(r'<h1>(?P<title>[^<]+)</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
(r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title')
description = self._search_regex(
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)

View File

@ -34,6 +34,7 @@ from ..utils import (
int_or_none,
mimetype2ext,
orderedSet,
parse_codecs,
parse_duration,
remove_quotes,
remove_start,
@ -1696,15 +1697,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
codecs = mobj.group('val')
break
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
acodec, vcodec = codecs[1], codecs[0]
else:
acodec, vcodec = (codecs[0], 'none') if kind == 'audio' else ('none', codecs[0])
dct.update({
'acodec': acodec,
'vcodec': vcodec,
})
dct.update(parse_codecs(codecs))
formats.append(dct)
elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0]

View File

@ -20,9 +20,9 @@ from ..utils import (
class ZDFBaseIE(InfoExtractor):
def _call_api(self, url, player, referrer, video_id):
def _call_api(self, url, player, referrer, video_id, item):
return self._download_json(
url, video_id, 'Downloading JSON content',
url, video_id, 'Downloading JSON %s' % item,
headers={
'Referer': referrer,
'Api-Auth': 'Bearer %s' % player['apiToken'],
@ -104,7 +104,7 @@ class ZDFIE(ZDFBaseIE):
})
formats.append(f)
def _extract_entry(self, url, content, video_id):
def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
@ -116,7 +116,8 @@ class ZDFIE(ZDFBaseIE):
'http://zdf.de/rels/streams/ptmd-template'].replace(
'{playerId}', 'portal')
ptmd = self._download_json(urljoin(url, ptmd_path), video_id)
ptmd = self._call_api(
urljoin(url, ptmd_path), player, url, video_id, 'metadata')
formats = []
track_uris = set()
@ -174,8 +175,9 @@ class ZDFIE(ZDFBaseIE):
}
def _extract_regular(self, url, player, video_id):
content = self._call_api(player['content'], player, url, video_id)
return self._extract_entry(player['content'], content, video_id)
content = self._call_api(
player['content'], player, url, video_id, 'content')
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id):
document = self._download_json(

View File

@ -337,17 +337,30 @@ def get_element_by_id(id, html):
def get_element_by_class(class_name, html):
return get_element_by_attribute(
"""Return the content of the first tag with the specified class in the passed HTML document"""
retval = get_elements_by_class(class_name, html)
return retval[0] if retval else None
def get_element_by_attribute(attribute, value, html, escape_value=True):
retval = get_elements_by_attribute(attribute, value, html, escape_value)
return retval[0] if retval else None
def get_elements_by_class(class_name, html):
"""Return the content of all tags with the specified class in the passed HTML document as a list"""
return get_elements_by_attribute(
'class', r'[^\'"]*\b%s\b[^\'"]*' % re.escape(class_name),
html, escape_value=False)
def get_element_by_attribute(attribute, value, html, escape_value=True):
def get_elements_by_attribute(attribute, value, html, escape_value=True):
"""Return the content of the tag with the specified attribute in the passed HTML document"""
value = re.escape(value) if escape_value else value
m = re.search(r'''(?xs)
retlist = []
for m in re.finditer(r'''(?xs)
<([a-zA-Z0-9:._-]+)
(?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'))*?
\s+%s=['"]?%s['"]?
@ -355,16 +368,15 @@ def get_element_by_attribute(attribute, value, html, escape_value=True):
\s*>
(?P<content>.*?)
</\1>
''' % (re.escape(attribute), value), html)
''' % (re.escape(attribute), value), html):
res = m.group('content')
if not m:
return None
res = m.group('content')
if res.startswith('"') or res.startswith("'"):
res = res[1:-1]
if res.startswith('"') or res.startswith("'"):
res = res[1:-1]
retlist.append(unescapeHTML(res))
return unescapeHTML(res)
return retlist
class HTMLAttributeParser(compat_HTMLParser):
@ -1672,6 +1684,11 @@ def setproctitle(title):
libc = ctypes.cdll.LoadLibrary('libc.so.6')
except OSError:
return
except TypeError:
# LoadLibrary in Windows Python 2.7.13 only expects
# a bytestring, but since unicode_literals turns
# every string into a unicode string, it fails.
return
title_bytes = title.encode('utf-8')
buf = ctypes.create_string_buffer(len(title_bytes))
buf.value = title_bytes
@ -2366,6 +2383,7 @@ def _match_one(filter_part, dct):
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*)
)
\s*$
@ -2374,7 +2392,8 @@ def _match_one(filter_part, dct):
if m:
op = COMPARISON_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key'))
if (m.group('strval') is not None or
if (m.group('quotedstrval') is not None or
m.group('strval') is not None or
# If the original field is a string and matching comparisonvalue is
# a number we should respect the origin of the original field
# and process comparison value as a string (see
@ -2384,7 +2403,10 @@ def _match_one(filter_part, dct):
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval') or m.group('intval')
comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else:
try:
comparison_value = int(m.group('intval'))

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.02.10'
__version__ = '2017.02.16'