1
0
mirror of https://github.com/l1ving/youtube-dl synced 2025-01-24 21:11:46 +08:00

Merge remote-tracking branch 'upstream/master'

This commit is contained in:
Petr Novak 2017-12-12 19:07:00 +01:00
commit a197ff1e74
30 changed files with 850 additions and 730 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.02*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.02** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.10**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.12.02 [debug] youtube-dl version 2017.12.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -1,3 +1,30 @@
version 2017.12.10
Core
+ [utils] Add sami mimetype to mimetype2ext
Extractors
* [culturebox] Improve video id extraction (#14947)
* [twitter] Improve extraction (#14197)
+ [udemy] Extract more HLS formats
* [udemy] Improve course id extraction (#14938)
+ [stretchinternet] Add support for portal.stretchinternet.com (#14576)
* [ellentube] Fix extraction (#14407, #14570)
+ [raiplay:playlist] Add support for playlists (#14563)
* [sonyliv] Bypass geo restriction
* [sonyliv] Extract higher quality formats (#14922)
* [fox] Extract subtitles
+ [fox] Add support for Adobe Pass authentication (#14205, #14489)
- [dailymotion:cloud] Remove extractor (#6794)
* [xhamster] Fix thumbnail extraction (#14780)
+ [xhamster] Add support for mobile URLs (#14780)
* [generic] Don't pass video id as mpd id while extracting DASH (#14902)
* [ard] Skip invalid stream URLs (#14906)
* [porncom] Fix metadata extraction (#14911)
* [pluralsight] Detect agreement request (#14913)
* [toutv] Fix login (#14614)
version 2017.12.02 version 2017.12.02
Core Core

View File

@ -511,6 +511,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage - `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video - `comment_count` (numeric): Number of comments on the video
- `age_limit` (numeric): Age restriction for the video (years) - `age_limit` (numeric): Age restriction for the video (years)
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format - `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format` - `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format - `format_note` (string): Additional info about the format

View File

@ -198,7 +198,6 @@
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
- **dailymotion:user** - **dailymotion:user**
- **DailymotionCloud**
- **DaisukiMotto** - **DaisukiMotto**
- **DaisukiMottoPlaylist** - **DaisukiMottoPlaylist**
- **daum.net** - **daum.net**
@ -243,8 +242,9 @@
- **eHow** - **eHow**
- **Einthusan** - **Einthusan**
- **eitb.tv** - **eitb.tv**
- **EllenTV** - **EllenTube**
- **EllenTV:clips** - **EllenTubePlaylist**
- **EllenTubeVideo**
- **ElPais**: El País - **ElPais**: El País
- **Embedly** - **Embedly**
- **EMPFlix** - **EMPFlix**
@ -662,6 +662,7 @@
- **Rai** - **Rai**
- **RaiPlay** - **RaiPlay**
- **RaiPlayLive** - **RaiPlayLive**
- **RaiPlayPlaylist**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBullTV** - **RedBullTV**
@ -781,6 +782,7 @@
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
- **StretchInternet**
- **SunPorno** - **SunPorno**
- **SVT** - **SVT**
- **SVTPlay**: SVT Play and Öppet arkiv - **SVTPlay**: SVT Play and Öppet arkiv

View File

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .generic import GenericIE from .generic import GenericIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
@ -126,6 +127,8 @@ class ARDMediathekIE(InfoExtractor):
quality = stream.get('_quality') quality = stream.get('_quality')
server = stream.get('_server') server = stream.get('_server')
for stream_url in stream_urls: for stream_url in stream_urls:
if not isinstance(stream_url, compat_str) or '//' not in stream_url:
continue
ext = determine_ext(stream_url) ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'): if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue continue
@ -146,13 +149,11 @@ class ARDMediathekIE(InfoExtractor):
'play_path': stream_url, 'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality), 'format_id': 'a%s-rtmp-%s' % (num, quality),
} }
elif stream_url.startswith('http'): else:
f = { f = {
'url': stream_url, 'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality) 'format_id': 'a%s-%s-%s' % (num, ext, quality)
} }
else:
continue
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url) m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m: if m:
f.update({ f.update({

View File

@ -386,7 +386,7 @@ class BBCCoUkIE(InfoExtractor):
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href): if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats( usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1\.ism/\1\.m3u8', href), re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native', programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False) m3u8_id=format_id, fatal=False)
for f in usp_formats: for f in usp_formats:

View File

@ -413,52 +413,3 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': full_user, 'title': full_user,
'entries': self._extract_entries(user), 'entries': self._extract_entries(user),
} }
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
_TESTS = [{
# From http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html
# Tested at FranceTvInfo_2
'url': 'http://api.dmcloud.net/embed/4e7343f894a6f677b10006b4/556e03339473995ee145930c?auth=1464865870-0-jyhsm84b-ead4c701fb750cf9367bf4447167a3db&autoplay=1',
'only_matching': True,
}, {
# http://www.francetvinfo.fr/societe/larguez-les-amarres-le-cobaturage-se-developpe_980101.html
'url': 'http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1',
'only_matching': True,
}]
@classmethod
def _extract_dmcloud_url(cls, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage_no_ff(url, video_id)
title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title')
video_info = self._parse_json(self._search_regex(
r'var\s+info\s*=\s*([^;]+);', webpage, 'video info'), video_id)
# TODO: parse ios_url, which is in fact a manifest
video_url = video_info['mp4_url']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': video_info.get('thumbnail_url'),
}

View File

@ -1,14 +1,18 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor import random
import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..utils import ( from ..utils import (
parse_duration, ExtractorError,
parse_iso8601, update_url_query,
) )
from ..compat import compat_str from ..compat import compat_HTTPError
class DiscoveryIE(InfoExtractor): class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?: _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery| discovery|
investigationdiscovery| investigationdiscovery|
@ -19,79 +23,65 @@ class DiscoveryIE(InfoExtractor):
sciencechannel| sciencechannel|
tlc| tlc|
velocity velocity
)\.com/(?:[^/]+/)*(?P<id>[^./?#]+)''' )\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm', 'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
'info_dict': { 'info_dict': {
'id': '20769', 'id': '5a2d9b4d6b66d17a5026e1fd',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Mission Impossible Outtakes', 'title': 'Dave Foley',
'description': ('Watch Jamie Hyneman and Adam Savage practice being' 'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
' each other -- to the point of confusing Jamie\'s dog -- and ' 'duration': 608,
'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
' back.'),
'duration': 156,
'timestamp': 1302032462,
'upload_date': '20110405',
'uploader_id': '103207',
}, },
'params': { 'params': {
'skip_download': True, # requires ffmpeg 'skip_download': True, # requires ffmpeg
} }
}, { }, {
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mythbusters-the-simpsons', 'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'info_dict': { 'only_matching': True,
'id': 'mythbusters-the-simpsons',
'title': 'MythBusters: The Simpsons',
},
'playlist_mincount': 10,
}, {
'url': 'http://www.animalplanet.com/longfin-eels-maneaters/',
'info_dict': {
'id': '78326',
'ext': 'mp4',
'title': 'Longfin Eels: Maneaters?',
'description': 'Jeremy Wade tests whether or not New Zealand\'s longfin eels are man-eaters by covering himself in fish guts and getting in the water with them.',
'upload_date': '20140725',
'timestamp': 1406246400,
'duration': 116,
'uploader_id': '103207',
},
'params': {
'skip_download': True, # requires ffmpeg
}
}] }]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) path, display_id = re.match(self._VALID_URL, url).groups()
info = self._download_json(url + '?flat=1', display_id) webpage = self._download_webpage(url, display_id)
video_title = info.get('playlist_title') or info.get('video_title') react_data = self._parse_json(self._search_regex(
r'window\.__reactTransmitPacket\s*=\s*({.+?});',
webpage, 'react data'), display_id)
content_blocks = react_data['layout'][path]['contentBlocks']
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
entries = [] access_token = self._download_json(
'https://www.discovery.com/anonymous', display_id, query={
'authLink': update_url_query(
'https://login.discovery.com/v1/oauth2/authorize', {
'client_id': react_data['application']['apiClientId'],
'redirect_uri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html',
'response_type': 'anonymous',
'state': 'nonce,' + ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
})
})['access_token']
for idx, video_info in enumerate(info['playlist']): try:
subtitles = {} stream = self._download_json(
caption_url = video_info.get('captionsUrl') 'https://api.discovery.com/v1/streaming/video/' + video_id,
if caption_url: display_id, headers={
subtitles = { 'Authorization': 'Bearer ' + access_token,
'en': [{ })
'url': caption_url, except ExtractorError as e:
}] if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
} e_description = self._parse_json(
e.cause.read().decode(), display_id)['description']
if 'resource not available for country' in e_description:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
if 'Authorized Networks' in e_description:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
raise ExtractorError(e_description)
raise
entries.append({ return self._extract_video_info(video, stream, display_id)
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/103207/default_default/index.html?videoId=ref:%s' % video_info['referenceId'],
'id': compat_str(video_info['id']),
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
'subtitles': subtitles,
})
return self.playlist_result(entries, display_id, video_title)

View File

@ -27,42 +27,9 @@ class DiscoveryGoBaseIE(InfoExtractor):
velocitychannel velocitychannel
)go\.com/%s(?P<id>[^/?#&]+)''' )go\.com/%s(?P<id>[^/?#&]+)'''
def _extract_video_info(self, video, stream, display_id):
class DiscoveryGoIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % r'(?:[^/]+/)+'
_GEO_COUNTRIES = ['US']
_TEST = {
'url': 'https://www.discoverygo.com/bering-sea-gold/reaper-madness/',
'info_dict': {
'id': '58c167d86b66d12f2addeb01',
'ext': 'mp4',
'title': 'Reaper Madness',
'description': 'md5:09f2c625c99afb8946ed4fb7865f6e78',
'duration': 2519,
'series': 'Bering Sea Gold',
'season_number': 8,
'episode_number': 6,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
container.get('data-video') or container.get('data-json'),
display_id)
title = video['name'] title = video['name']
stream = video.get('stream')
if not stream: if not stream:
if video.get('authenticated') is True: if video.get('authenticated') is True:
raise ExtractorError( raise ExtractorError(
@ -124,6 +91,43 @@ class DiscoveryGoIE(DiscoveryGoBaseIE):
} }
class DiscoveryGoIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % r'(?:[^/]+/)+'
_GEO_COUNTRIES = ['US']
_TEST = {
'url': 'https://www.discoverygo.com/bering-sea-gold/reaper-madness/',
'info_dict': {
'id': '58c167d86b66d12f2addeb01',
'ext': 'mp4',
'title': 'Reaper Madness',
'description': 'md5:09f2c625c99afb8946ed4fb7865f6e78',
'duration': 2519,
'series': 'Bering Sea Gold',
'season_number': 8,
'episode_number': 6,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
container.get('data-video') or container.get('data-json'),
display_id)
stream = video.get('stream')
return self._extract_video_info(video, stream, display_id)
class DiscoveryGoPlaylistIE(DiscoveryGoBaseIE): class DiscoveryGoPlaylistIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % '' _VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % ''
_TEST = { _TEST = {

View File

@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
float_or_none,
int_or_none,
try_get,
)
class EllenTubeBaseIE(InfoExtractor):
def _extract_data_config(self, webpage, video_id):
details = self._search_regex(
r'(<[^>]+\bdata-component=(["\'])[Dd]etails.+?></div>)', webpage,
'details')
return self._parse_json(
extract_attributes(details)['data-config'], video_id)
def _extract_video(self, data, video_id):
title = data['title']
formats = []
duration = None
for entry in data.get('media'):
if entry.get('id') == 'm3u8':
formats = self._extract_m3u8_formats(
entry['url'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
duration = int_or_none(entry.get('duration'))
break
self._sort_formats(formats)
def get_insight(kind):
return int_or_none(try_get(
data, lambda x: x['insight']['%ss' % kind]))
return {
'extractor_key': EllenTubeIE.ie_key(),
'id': video_id,
'title': title,
'description': data.get('description'),
'duration': duration,
'thumbnail': data.get('thumbnail'),
'timestamp': float_or_none(data.get('publishTime'), scale=1000),
'view_count': get_insight('view'),
'like_count': get_insight('like'),
'formats': formats,
}
class EllenTubeIE(EllenTubeBaseIE):
_VALID_URL = r'''(?x)
(?:
ellentube:|
https://api-prod\.ellentube\.com/ellenapi/api/item/
)
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
'''
_TESTS = [{
'url': 'https://api-prod.ellentube.com/ellenapi/api/item/0822171c-3829-43bf-b99f-d77358ae75e3',
'md5': '2fabc277131bddafdd120e0fc0f974c9',
'info_dict': {
'id': '0822171c-3829-43bf-b99f-d77358ae75e3',
'ext': 'mp4',
'title': 'Ellen Meets Las Vegas Survivors Jesus Campos and Stephen Schuck',
'description': 'md5:76e3355e2242a78ad9e3858e5616923f',
'thumbnail': r're:^https?://.+?',
'duration': 514,
'timestamp': 1508505120,
'upload_date': '20171020',
'view_count': int,
'like_count': int,
}
}, {
'url': 'ellentube:734a3353-f697-4e79-9ca9-bfc3002dc1e0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/item/%s' % video_id,
video_id)
return self._extract_video(data, video_id)
class EllenTubeVideoIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/video/(?P<id>.+?)\.html'
_TEST = {
'url': 'https://www.ellentube.com/video/ellen-meets-las-vegas-survivors-jesus-campos-and-stephen-schuck.html',
'only_matching': True,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._extract_data_config(webpage, display_id)['id']
return self.url_result(
'ellentube:%s' % video_id, ie=EllenTubeIE.ie_key(),
video_id=video_id)
class EllenTubePlaylistIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/(?:episode|studios)/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'https://www.ellentube.com/episode/dax-shepard-jordan-fisher-haim.html',
'info_dict': {
'id': 'dax-shepard-jordan-fisher-haim',
'title': "Dax Shepard, 'DWTS' Team Jordan Fisher & Lindsay Arnold, HAIM",
'description': 'md5:bfc982194dabb3f4e325e43aa6b2e21c',
},
'playlist_count': 6,
}, {
'url': 'https://www.ellentube.com/studios/macey-goes-rving0.html',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._extract_data_config(webpage, display_id)['data']
feed = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/feed/?%s'
% data['filter'], display_id)
entries = [
self._extract_video(elem, elem['id'])
for elem in feed if elem.get('type') == 'VIDEO' and elem.get('id')]
return self.playlist_result(
entries, display_id, data.get('title'),
clean_html(data.get('description')))

View File

@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import NO_DEFAULT
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
'id': '0_ipq1gsai',
'ext': 'mov',
'title': 'Fast Fingers of Fate',
'description': 'md5:3539013ddcbfa64b2a6d1b38d910868a',
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',
},
}, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
for num, url_ in enumerate(URLS, 1):
webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
default = NO_DEFAULT if num == len(URLS) else None
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor):
IE_NAME = 'EllenTV:clips'
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/episodes/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/episodes/meryl-streep-vanessa-hudgens/',
'info_dict': {
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 5,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage, playlist_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
return self._parse_json('[{' + json_string + '}]', playlist_id)
def _extract_entries(self, playlist):
return [
self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist]

View File

@ -246,7 +246,6 @@ from .dailymotion import (
DailymotionIE, DailymotionIE,
DailymotionPlaylistIE, DailymotionPlaylistIE,
DailymotionUserIE, DailymotionUserIE,
DailymotionCloudIE,
) )
from .daisuki import ( from .daisuki import (
DaisukiMottoIE, DaisukiMottoIE,
@ -312,9 +311,10 @@ from .ehow import EHowIE
from .eighttracks import EightTracksIE from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE from .einthusan import EinthusanIE
from .eitb import EitbIE from .eitb import EitbIE
from .ellentv import ( from .ellentube import (
EllenTVIE, EllenTubeIE,
EllenTVClipsIE, EllenTubeVideoIE,
EllenTubePlaylistIE,
) )
from .elpais import ElPaisIE from .elpais import ElPaisIE
from .embedly import EmbedlyIE from .embedly import EmbedlyIE
@ -689,6 +689,7 @@ from .nhl import (
) )
from .nick import ( from .nick import (
NickIE, NickIE,
NickBrIE,
NickDeIE, NickDeIE,
NickNightIE, NickNightIE,
NickRuIE, NickRuIE,
@ -721,10 +722,6 @@ from .nowness import (
NownessPlaylistIE, NownessPlaylistIE,
NownessSeriesIE, NownessSeriesIE,
) )
from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .noz import NozIE from .noz import NozIE
from .npo import ( from .npo import (
AndereTijdenIE, AndereTijdenIE,
@ -857,6 +854,7 @@ from .radiofrance import RadioFranceIE
from .rai import ( from .rai import (
RaiPlayIE, RaiPlayIE,
RaiPlayLiveIE, RaiPlayLiveIE,
RaiPlayPlaylistIE,
RaiIE, RaiIE,
) )
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
@ -1004,6 +1002,7 @@ from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .svt import ( from .svt import (
SVTIE, SVTIE,
@ -1106,6 +1105,10 @@ from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvn24 import TVN24IE from .tvn24 import TVN24IE
from .tvnoe import TVNoeIE from .tvnoe import TVNoeIE
from .tvnow import (
TVNowIE,
TVNowListIE,
)
from .tvp import ( from .tvp import (
TVPEmbedIE, TVPEmbedIE,
TVPIE, TVPIE,

View File

@ -11,6 +11,7 @@ from ..utils import (
parse_duration, parse_duration,
try_get, try_get,
unified_timestamp, unified_timestamp,
update_url_query,
) )
@ -62,7 +63,8 @@ class FOXIE(AdobePassIE):
duration = int_or_none(video.get('durationInSeconds')) or int_or_none( duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration')) video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished')) timestamp = unified_timestamp(video.get('datePublished'))
age_limit = parse_age_limit(video.get('contentRating')) rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
data = try_get( data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {} video, lambda x: x['trackingData']['properties'], dict) or {}
@ -77,8 +79,24 @@ class FOXIE(AdobePassIE):
release_year = int_or_none(video.get('releaseYear')) release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'): if data.get('authRequired'):
# TODO: AP resource = self._get_mvpd_resource(
pass 'fbc-fox', title, video.get('guid'), rating)
release_url = update_url_query(
release_url, {
'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource)
})
subtitles = {}
for doc_rel in video.get('documentReleases', []):
rel_url = doc_rel.get('url')
if not url or doc_rel.get('format') != 'SCC':
continue
subtitles['en'] = [{
'url': rel_url,
'ext': 'scc',
}]
break
info = { info = {
'id': video_id, 'id': video_id,
@ -93,6 +111,7 @@ class FOXIE(AdobePassIE):
'episode': episode, 'episode': episode,
'episode_number': episode_number, 'episode_number': episode_number,
'release_year': release_year, 'release_year': release_year,
'subtitles': subtitles,
} }
urlh = self._request_webpage(HEADRequest(release_url), video_id) urlh = self._request_webpage(HEADRequest(release_url), video_id)

View File

@ -13,10 +13,7 @@ from ..utils import (
parse_duration, parse_duration,
determine_ext, determine_ext,
) )
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
class FranceTVBaseInfoExtractor(InfoExtractor): class FranceTVBaseInfoExtractor(InfoExtractor):
@ -290,10 +287,6 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
page_title = mobj.group('title') page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title) webpage = self._download_webpage(url, page_title)
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
dailymotion_urls = DailymotionIE._extract_urls(webpage) dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls: if dailymotion_urls:
return self.playlist_result([ return self.playlist_result([
@ -363,6 +356,7 @@ class CultureboxIE(FranceTVBaseInfoExtractor):
raise ExtractorError('Video %s is not available' % name, expected=True) raise ExtractorError('Video %s is not available' % name, expected=True)
video_id, catalogue = self._search_regex( video_id, catalogue = self._search_regex(
r'"https?://videos\.francetv\.fr/video/([^@]+@[^"]+)"', webpage, 'video id').split('@') r'["\'>]https?://videos\.francetv\.fr/video/([^@]+@.+?)["\'<]',
webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue) return self._extract_video(video_id, catalogue)

View File

@ -59,10 +59,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE from .viewlift import ViewLiftEmbedIE
@ -1472,23 +1469,6 @@ class GenericIE(InfoExtractor):
'timestamp': 1432570283, 'timestamp': 1432570283,
}, },
}, },
# Dailymotion Cloud video
{
'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
'info_dict': {
'id': 'x2uy8t3',
'ext': 'mp4',
'title': 'Sauvons les abeilles ! - Le débat',
'description': 'md5:d9082128b1c5277987825d684939ca26',
'thumbnail': r're:^https?://.*\.jpe?g$',
'timestamp': 1434970506,
'upload_date': '20150622',
'uploader': 'Public Sénat',
'uploader_id': 'xa9gza',
},
'skip': 'File not found.',
},
# OnionStudios embed # OnionStudios embed
{ {
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537', 'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
@ -2195,7 +2175,7 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id) return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats( info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0], mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_url=url) mpd_url=url)
self._sort_formats(info_dict['formats']) self._sort_formats(info_dict['formats'])
@ -2704,11 +2684,6 @@ class GenericIE(InfoExtractor):
if senate_isvp_url: if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP') return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Dailymotion Cloud videos
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
# Look for OnionStudios embeds # Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage) onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url: if onionstudios_url:

View File

@ -10,7 +10,7 @@ from ..utils import update_url_query
class NickIE(MTVServicesInfoExtractor): class NickIE(MTVServicesInfoExtractor):
# None of videos on the website are still alive? # None of videos on the website are still alive?
IE_NAME = 'nick.com' IE_NAME = 'nick.com'
_VALID_URL = r'https?://(?:(?:www|beta)\.)?nick(?:jr)?\.com/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)' _VALID_URL = r'https?://(?P<domain>(?:(?:www|beta)\.)?nick(?:jr)?\.com)/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm' _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
_GEO_COUNTRIES = ['US'] _GEO_COUNTRIES = ['US']
_TESTS = [{ _TESTS = [{
@ -69,8 +69,59 @@ class NickIE(MTVServicesInfoExtractor):
'mgid': uri, 'mgid': uri,
} }
def _extract_mgid(self, webpage): def _real_extract(self, url):
return self._search_regex(r'data-contenturi="([^"]+)', webpage, 'mgid') domain, display_id = re.match(self._VALID_URL, url).groups()
video_data = self._download_json(
'http://%s/data/video.endLevel.json' % domain,
display_id, query={
'urlKey': display_id,
})
return self._get_videos_info(video_data['player'] + video_data['id'])
class NickBrIE(MTVServicesInfoExtractor):
IE_NAME = 'nickelodeon:br'
_VALID_URL = r'https?://(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?#.]+)'
_TESTS = [{
'url': 'http://www.nickjr.com.br/patrulha-canina/videos/210-labirinto-de-pipoca/',
'only_matching': True,
}, {
'url': 'http://mundonick.uol.com.br/programas/the-loud-house/videos/muitas-irmas/7ljo9j',
'only_matching': True,
}]
def _real_extract(self, url):
domain, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
uri = self._search_regex(
r'data-(?:contenturi|mgid)="([^"]+)', webpage, 'mgid')
video_id = self._id_from_uri(uri)
config = self._download_json(
'http://media.mtvnservices.com/pmt/e1/access/index.html',
video_id, query={
'uri': uri,
'configtype': 'edge',
}, headers={
'Referer': url,
})
info_url = self._remove_template_parameter(config['feedWithQueryParams'])
if info_url == 'None':
if domain.startswith('www.'):
domain = domain[4:]
content_domain = {
'mundonick.uol': 'mundonick.com.br',
'nickjr': 'br.nickelodeonjunior.tv',
}[domain]
query = {
'mgid': uri,
'imageEp': content_domain,
'arcEp': content_domain,
}
if domain == 'nickjr.com.br':
query['ep'] = 'c4b16088'
info_url = update_url_query(
'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed', query)
return self._get_videos_info_from_url(info_url, video_id)
class NickDeIE(MTVServicesInfoExtractor): class NickDeIE(MTVServicesInfoExtractor):

View File

@ -1,261 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
parse_iso8601,
parse_duration,
remove_start,
)
class NowTVBaseIE(InfoExtractor):
_VIDEO_FIELDS = (
'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
'broadcastStartDate', 'seoUrl', 'duration', 'files',
'format.defaultImage169Format', 'format.defaultImage169Logo')
def _extract_video(self, info, display_id=None):
video_id = compat_str(info['id'])
files = info['files']
if not files:
if info.get('geoblocked', False):
raise ExtractorError(
'Video %s is not available from your location due to geo restriction' % video_id,
expected=True)
if not info.get('free', True):
raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True)
formats = []
for item in files['items']:
if determine_ext(item['path']) != 'f4v':
continue
app, play_path = remove_start(item['path'], '/').split('/', 1)
formats.append({
'url': 'rtmpe://fms.rtl.de',
'app': app,
'play_path': 'mp4:%s' % play_path,
'ext': 'flv',
'page_url': 'http://rtlnow.rtl.de',
'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
'tbr': int_or_none(item.get('bitrate')),
})
self._sort_formats(formats)
title = info['title']
description = info.get('articleLong') or info.get('articleShort')
timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
duration = parse_duration(info.get('duration'))
f = info.get('format', {})
thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
return {
'id': video_id,
'display_id': display_id or info.get('seoUrl'),
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
class NowTVIE(NowTVBaseIE):
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:(?:list/[^/]+|jahr/\d{4}/\d{1,2})/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
# rtl
'url': 'http://www.nowtv.de/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/player',
'info_dict': {
'id': '203519',
'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'ext': 'flv',
'title': 'Inka Bause stellt die neuen Bauern vor',
'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1432580700,
'upload_date': '20150525',
'duration': 2786,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# rtl2
'url': 'http://www.nowtv.de/rtl2/berlin-tag-nacht/berlin-tag-nacht-folge-934/player',
'info_dict': {
'id': '203481',
'display_id': 'berlin-tag-nacht/berlin-tag-nacht-folge-934',
'ext': 'flv',
'title': 'Berlin - Tag & Nacht (Folge 934)',
'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1432666800,
'upload_date': '20150526',
'duration': 2641,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# rtlnitro
'url': 'http://www.nowtv.de/rtlnitro/alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00/player',
'info_dict': {
'id': '165780',
'display_id': 'alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00',
'ext': 'flv',
'title': 'Hals- und Beinbruch',
'description': 'md5:b50d248efffe244e6f56737f0911ca57',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1432415400,
'upload_date': '20150523',
'duration': 2742,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# superrtl
'url': 'http://www.nowtv.de/superrtl/medicopter-117/angst/player',
'info_dict': {
'id': '99205',
'display_id': 'medicopter-117/angst',
'ext': 'flv',
'title': 'Angst!',
'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1222632900,
'upload_date': '20080928',
'duration': 3025,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# ntv
'url': 'http://www.nowtv.de/ntv/ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch/player',
'info_dict': {
'id': '203521',
'display_id': 'ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch',
'ext': 'flv',
'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1432751700,
'upload_date': '20150527',
'duration': 1083,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# vox
'url': 'http://www.nowtv.de/vox/der-hundeprofi/buero-fall-chihuahua-joel/player',
'info_dict': {
'id': '128953',
'display_id': 'der-hundeprofi/buero-fall-chihuahua-joel',
'ext': 'flv',
'title': "Büro-Fall / Chihuahua 'Joel'",
'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1432408200,
'upload_date': '20150523',
'duration': 3092,
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.nowtv.de/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview',
'only_matching': True,
}, {
'url': 'http://www.nowtv.at/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview?return=/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'only_matching': True,
}, {
'url': 'http://www.nowtv.de/rtl2/echtzeit/list/aktuell/schnelles-geld-am-ende-der-welt/player',
'only_matching': True,
}, {
'url': 'http://www.nowtv.de/rtl2/zuhause-im-glueck/jahr/2015/11/eine-erschuetternde-diagnose/player',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = '%s/%s' % (mobj.group('show_id'), mobj.group('id'))
info = self._download_json(
'https://api.nowtv.de/v3/movies/%s?fields=%s'
% (display_id, ','.join(self._VIDEO_FIELDS)), display_id)
return self._extract_video(info, display_id)
class NowTVListIE(NowTVBaseIE):
_VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/list/(?P<id>[^?/#&]+)$'
_SHOW_FIELDS = ('title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_TESTS = [{
'url': 'http://www.nowtv.at/rtl/stern-tv/list/aktuell',
'info_dict': {
'id': '17006',
'title': 'stern TV - Aktuell',
},
'playlist_count': 1,
}, {
'url': 'http://www.nowtv.at/rtl/das-supertalent/list/free-staffel-8',
'info_dict': {
'id': '20716',
'title': 'Das Supertalent - FREE Staffel 8',
},
'playlist_count': 14,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
show_id = mobj.group('show_id')
season_id = mobj.group('id')
fields = []
fields.extend(self._SHOW_FIELDS)
fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
fields.extend(
'formatTabs.formatTabPages.container.movies.%s' % field
for field in self._VIDEO_FIELDS)
list_info = self._download_json(
'https://api.nowtv.de/v3/formats/seo?fields=%s&name=%s.php'
% (','.join(fields), show_id),
season_id)
season = next(
season for season in list_info['formatTabs']['items']
if season.get('seoheadline') == season_id)
title = '%s - %s' % (list_info['title'], season['headline'])
entries = []
for container in season['formatTabPages']['items']:
for info in ((container.get('container') or {}).get('movies') or {}).get('items') or []:
entries.append(self._extract_video(info))
return self.playlist_result(
entries, compat_str(season.get('id') or season_id), title)

View File

@ -131,6 +131,13 @@ class PluralsightIE(PluralsightBaseIE):
if BLOCKED in response: if BLOCKED in response:
raise ExtractorError( raise ExtractorError(
'Unable to login: %s' % BLOCKED, expected=True) 'Unable to login: %s' % BLOCKED, expected=True)
MUST_AGREE = 'To continue using Pluralsight, you must agree to'
if any(p in response for p in (MUST_AGREE, '>Disagree<', '>Agree<')):
raise ExtractorError(
'Unable to login: %s some documents. Go to pluralsight.com, '
'log in and agree with what Pluralsight requires.'
% MUST_AGREE, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
def _get_subtitles(self, author, clip_id, lang, name, duration, video_id): def _get_subtitles(self, author, clip_id, lang, name, duration, video_id):

View File

@ -77,12 +77,14 @@ class PornComIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'class=["\']views["\'][^>]*><p>([\d,.]+)', webpage, (r'Views:\s*</span>\s*<span>\s*([\d,.]+)',
r'class=["\']views["\'][^>]*><p>([\d,.]+)'), webpage,
'view count', fatal=False)) 'view count', fatal=False))
def extract_list(kind): def extract_list(kind):
s = self._search_regex( s = self._search_regex(
r'(?s)<p[^>]*>%s:(.+?)</p>' % kind.capitalize(), (r'(?s)%s:\s*</span>\s*<span>(.+?)</span>' % kind.capitalize(),
r'(?s)<p[^>]*>%s:(.+?)</p>' % kind.capitalize()),
webpage, kind, fatal=False) webpage, kind, fatal=False)
return re.findall(r'<a[^>]+>([^<]+)</a>', s or '') return re.findall(r'<a[^>]+>([^<]+)</a>', s or '')

View File

@ -17,6 +17,7 @@ from ..utils import (
parse_duration, parse_duration,
strip_or_none, strip_or_none,
try_get, try_get,
unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
@ -249,6 +250,41 @@ class RaiPlayLiveIE(RaiBaseIE):
} }
class RaiPlayPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?raiplay\.it/programmi/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.raiplay.it/programmi/nondirloalmiocapo/',
'info_dict': {
'id': 'nondirloalmiocapo',
'title': 'Non dirlo al mio capo',
'description': 'md5:9f3d603b2947c1c7abb098f3b14fac86',
},
'playlist_mincount': 12,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_meta(
('programma', 'nomeProgramma'), webpage, 'title')
description = unescapeHTML(self._html_search_meta(
('description', 'og:description'), webpage, 'description'))
print(description)
entries = []
for mobj in re.finditer(
r'<a\b[^>]+\bhref=(["\'])(?P<path>/raiplay/video/.+?)\1',
webpage):
video_url = urljoin(url, mobj.group('path'))
entries.append(self.url_result(
video_url, ie=RaiPlayIE.ie_key(),
video_id=RaiPlayIE._match_id(video_url)))
return self.playlist_result(entries, playlist_id, title, description)
class RaiIE(RaiBaseIE): class RaiIE(RaiBaseIE):
_VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE _VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE
_TESTS = [{ _TESTS = [{

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import smuggle_url
class SonyLIVIE(InfoExtractor): class SonyLIVIE(InfoExtractor):
@ -10,12 +11,12 @@ class SonyLIVIE(InfoExtractor):
'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight", 'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
'info_dict': { 'info_dict': {
'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight", 'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
'id': '5024612095001', 'id': 'ref:5024612095001',
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20160707', 'upload_date': '20170923',
'description': 'md5:7f28509a148d5be9d0782b4d5106410d', 'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
'uploader_id': '4338955589001', 'uploader_id': '5182475815001',
'timestamp': 1467870968, 'timestamp': 1506200547,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -26,9 +27,11 @@ class SonyLIVIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s' # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5182475815001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url): def _real_extract(self, url):
brightcove_id = self._match_id(url) brightcove_id = self._match_id(url)
return self.url_result( return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id) smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['IN']}),
'BrightcoveNew', brightcove_id)

View File

@ -0,0 +1,48 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class StretchInternetIE(InfoExtractor):
_VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
_TEST = {
'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
'info_dict': {
'id': '313900',
'ext': 'mp4',
'title': 'Augustana (S.D.) Baseball vs University of Mary',
'description': 'md5:7578478614aae3bdd4a90f578f787438',
'timestamp': 1490468400,
'upload_date': '20170325',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
stream = self._download_json(
'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
% video_id, video_id)
video_url = 'https://%s' % stream['source']
event = self._download_json(
'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
video_id, query={
'clientID': 99997,
'eventID': video_id,
'token': 'asdf',
})['event']
title = event.get('title') or event['mobileTitle']
description = event.get('customText')
timestamp = int_or_none(event.get('longtime'))
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'url': video_url,
}

View File

@ -4,58 +4,109 @@ from __future__ import unicode_literals
import re import re
from .turner import TurnerBaseIE from .turner import TurnerBaseIE
from ..utils import extract_attributes from ..utils import (
float_or_none,
int_or_none,
strip_or_none,
)
class TBSIE(TurnerBaseIE): class TBSIE(TurnerBaseIE):
# https://github.com/rg3/youtube-dl/issues/13658 _VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/(?:movies|shows/[^/]+/(?:clips|season-\d+/episode-\d+))/(?P<id>[^/?#]+)'
_WORKING = False
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/videos/(?:[^/]+/)+(?P<id>[^/?#]+)\.html'
_TESTS = [{ _TESTS = [{
'url': 'http://www.tbs.com/videos/people-of-earth/season-1/extras/2007318/theatrical-trailer.html', 'url': 'http://www.tntdrama.com/shows/the-alienist/clips/monster',
'md5': '9e61d680e2285066ade7199e6408b2ee',
'info_dict': { 'info_dict': {
'id': '2007318', 'id': '8d384cde33b89f3a43ce5329de42903ed5099887',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Theatrical Trailer', 'title': 'Monster',
'description': 'Catch the latest comedy from TBS, People of Earth, premiering Halloween night--Monday, October 31, at 9/8c.', 'description': 'Get a first look at the theatrical trailer for TNTs highly anticipated new psychological thriller The Alienist, which premieres January 22 on TNT.',
'timestamp': 1508175329,
'upload_date': '20171016',
}, },
'skip': 'TBS videos are deleted after a while', 'params': {
# m3u8 download
'skip_download': True,
}
}, { }, {
'url': 'http://www.tntdrama.com/videos/good-behavior/season-1/extras/1538823/you-better-run.html', 'url': 'http://www.tbs.com/shows/search-party/season-1/episode-1/explicit-the-mysterious-disappearance-of-the-girl-no-one-knew',
'md5': 'ce53c6ead5e9f3280b4ad2031a6fab56', 'only_matching': True,
'info_dict': { }, {
'id': '1538823', 'url': 'http://www.tntdrama.com/movies/star-wars-a-new-hope',
'ext': 'mp4', 'only_matching': True,
'title': 'You Better Run',
'description': 'Letty Raines must figure out what she\'s running toward while running away from her past. Good Behavior premieres November 15 at 9/8c.',
},
'skip': 'TBS videos are deleted after a while',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
domain, display_id = re.match(self._VALID_URL, url).groups() site, display_id = re.match(self._VALID_URL, url).groups()
site = domain[:3]
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_params = extract_attributes(self._search_regex(r'(<[^>]+id="page-video"[^>]*>)', webpage, 'video params')) video_data = self._parse_json(self._search_regex(
query = None r'<script[^>]+?data-drupal-selector="drupal-settings-json"[^>]*?>({.+?})</script>',
clip_id = video_params.get('clipid') webpage, 'drupal setting'), display_id)['turner_playlist'][0]
if clip_id:
query = 'id=' + clip_id media_id = video_data['mediaID']
else: title = video_data['title']
query = 'titleId=' + video_params['titleid']
return self._extract_cvp_info( streams_data = self._download_json(
'http://www.%s.com/service/cvpXml?%s' % (domain, query), display_id, { 'http://medium.ngtv.io/media/%s/tv' % media_id,
'default': { media_id)['media']['tv']
'media_src': 'http://ht.cdn.turner.com/%s/big' % site, duration = None
}, chapters = []
'secure': { formats = []
'media_src': 'http://androidhls-secure.cdn.turner.com/%s/big' % site, for supported_type in ('unprotected', 'bulkaes'):
'tokenizer_src': 'http://www.%s.com/video/processors/services/token_ipadAdobe.do' % domain, stream_data = streams_data.get(supported_type, {})
}, m3u8_url = stream_data.get('secureUrl') or stream_data.get('url')
}, { if not m3u8_url:
'url': url, continue
'site_name': site.upper(), if stream_data.get('playlistProtection') == 'spe':
'auth_required': video_params.get('isAuthRequired') != 'false', m3u8_url = self._add_akamai_spe_token(
}) 'http://www.%s.com/service/token_spe' % site,
m3u8_url, media_id, {
'url': url,
'site_name': site[:3].upper(),
'auth_required': video_data.get('authRequired') == '1',
})
formats.extend(self._extract_m3u8_formats(
m3u8_url, media_id, 'mp4', m3u8_id='hls', fatal=False))
duration = float_or_none(stream_data.get('totalRuntime') or video_data.get('duration'))
if not chapters:
for chapter in stream_data.get('contentSegments', []):
start_time = float_or_none(chapter.get('start'))
duration = float_or_none(chapter.get('duration'))
if start_time is None or duration is None:
continue
chapters.append({
'start_time': start_time,
'end_time': start_time + duration,
})
self._sort_formats(formats)
thumbnails = []
for image_id, image in video_data.get('images', {}).items():
image_url = image.get('url')
if not image_url or image.get('type') != 'video':
continue
i = {
'id': image_id,
'url': image_url,
}
mobj = re.search(r'(\d+)x(\d+)', image_url)
if mobj:
i.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
})
thumbnails.append(i)
return {
'id': media_id,
'title': title,
'description': strip_or_none(video_data.get('descriptionNoTags') or video_data.get('shortDescriptionNoTags')),
'duration': duration,
'timestamp': int_or_none(video_data.get('created')),
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'cahpters': chapters,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@ -16,7 +16,7 @@ from ..utils import (
class TouTvIE(InfoExtractor): class TouTvIE(InfoExtractor):
_NETRC_MACHINE = 'toutv' _NETRC_MACHINE = 'toutv'
IE_NAME = 'tou.tv' IE_NAME = 'tou.tv'
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/S[0-9]+E[0-9]+)?)' _VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+(?:/S[0-9]+[EC][0-9]+)?)'
_access_token = None _access_token = None
_claims = None _claims = None
@ -37,6 +37,9 @@ class TouTvIE(InfoExtractor):
}, { }, {
'url': 'http://ici.tou.tv/hackers', 'url': 'http://ici.tou.tv/hackers',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://ici.tou.tv/l-age-adulte/S01C501',
'only_matching': True,
}] }]
def _real_initialize(self): def _real_initialize(self):

View File

@ -18,9 +18,32 @@ from ..utils import (
class TurnerBaseIE(AdobePassIE): class TurnerBaseIE(AdobePassIE):
_AKAMAI_SPE_TOKEN_CACHE = {}
def _extract_timestamp(self, video_data): def _extract_timestamp(self, video_data):
return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts')) return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
def _add_akamai_spe_token(self, tokenizer_src, video_url, content_id, ap_data):
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
token = self._AKAMAI_SPE_TOKEN_CACHE.get(secure_path)
if not token:
query = {
'path': secure_path,
'videoId': content_id,
}
if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], content_id, ap_data['site_name'], ap_data['site_name'])
auth = self._download_xml(
tokenizer_src, content_id, query=query)
error_msg = xpath_text(auth, 'error/msg')
if error_msg:
raise ExtractorError(error_msg, expected=True)
token = xpath_text(auth, 'token')
if not token:
return video_url
self._AKAMAI_SPE_TOKEN_CACHE[secure_path] = token
return video_url + '?hdnea=' + token
def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}): def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}):
video_data = self._download_xml(data_src, video_id) video_data = self._download_xml(data_src, video_id)
video_id = video_data.attrib['id'] video_id = video_data.attrib['id']
@ -33,7 +56,6 @@ class TurnerBaseIE(AdobePassIE):
# rtmp_src = splited_rtmp_src[1] # rtmp_src = splited_rtmp_src[1]
# aifp = xpath_text(video_data, 'akamai/aifp', default='') # aifp = xpath_text(video_data, 'akamai/aifp', default='')
tokens = {}
urls = [] urls = []
formats = [] formats = []
rex = re.compile( rex = re.compile(
@ -67,26 +89,10 @@ class TurnerBaseIE(AdobePassIE):
secure_path_data = path_data.get('secure') secure_path_data = path_data.get('secure')
if not secure_path_data: if not secure_path_data:
continue continue
video_url = secure_path_data['media_src'] + video_url video_url = self._add_akamai_spe_token(
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*' secure_path_data['tokenizer_src'],
token = tokens.get(secure_path) secure_path_data['media_src'] + video_url,
if not token: content_id, ap_data)
query = {
'path': secure_path,
'videoId': content_id,
}
if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], video_id, ap_data['site_name'], ap_data['site_name'])
auth = self._download_xml(
secure_path_data['tokenizer_src'], video_id, query=query)
error_msg = xpath_text(auth, 'error/msg')
if error_msg:
raise ExtractorError(error_msg, expected=True)
token = xpath_text(auth, 'token')
if not token:
continue
tokens[secure_path] = token
video_url = video_url + '?hdnea=' + token
elif not re.match('https?://', video_url): elif not re.match('https?://', video_url):
base_path_data = path_data.get(ext, path_data.get('default', {})) base_path_data = path_data.get(ext, path_data.get('default', {}))
media_src = base_path_data.get('media_src') media_src = base_path_data.get('media_src')

View File

@ -0,0 +1,175 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
parse_iso8601,
parse_duration,
update_url_query,
)
class TVNowBaseIE(InfoExtractor):
_VIDEO_FIELDS = (
'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
'broadcastStartDate', 'isDrm', 'duration', 'manifest.dashclear',
'format.defaultImage169Format', 'format.defaultImage169Logo')
def _call_api(self, path, video_id, query):
return self._download_json(
'https://api.tvnow.de/v3/' + path,
video_id, query=query)
def _extract_video(self, info, display_id):
video_id = compat_str(info['id'])
title = info['title']
mpd_url = info['manifest']['dashclear']
if not mpd_url:
if info.get('isDrm'):
raise ExtractorError(
'Video %s is DRM protected' % video_id, expected=True)
if info.get('geoblocked'):
raise ExtractorError(
'Video %s is not available from your location due to geo restriction' % video_id,
expected=True)
if not info.get('free', True):
raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True)
mpd_url = update_url_query(mpd_url, {'filter': ''})
formats = self._extract_mpd_formats(mpd_url, video_id, mpd_id='dash', fatal=False)
formats.extend(self._extract_ism_formats(
mpd_url.replace('dash.', 'hss.').replace('/.mpd', '/Manifest'),
video_id, ism_id='mss', fatal=False))
formats.extend(self._extract_m3u8_formats(
mpd_url.replace('dash.', 'hls.').replace('/.mpd', '/.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
description = info.get('articleLong') or info.get('articleShort')
timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
duration = parse_duration(info.get('duration'))
f = info.get('format', {})
thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
class TVNowIE(TVNowBaseIE):
_VALID_URL = r'https?://(?:www\.)?tvnow\.(?:de|at|ch)/(?:rtl(?:2|plus)?|nitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:(?:list/[^/]+|jahr/\d{4}/\d{1,2})/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
# rtl
'url': 'https://www.tvnow.de/rtl/alarm-fuer-cobra-11/freier-fall/player?return=/rtl',
'info_dict': {
'id': '385314',
'display_id': 'alarm-fuer-cobra-11/freier-fall',
'ext': 'mp4',
'title': 'Freier Fall',
'description': 'md5:8c2d8f727261adf7e0dc18366124ca02',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1512677700,
'upload_date': '20171207',
'duration': 2862.0,
},
}, {
# rtl2
'url': 'https://www.tvnow.de/rtl2/armes-deutschland/episode-0008/player',
'only_matching': 'True',
}, {
# rtlnitro
'url': 'https://www.tvnow.de/nitro/alarm-fuer-cobra-11-die-autobahnpolizei/auf-eigene-faust-pilot/player',
'only_matching': 'True',
}, {
# superrtl
'url': 'https://www.tvnow.de/superrtl/die-lustigsten-schlamassel-der-welt/u-a-ketchup-effekt/player',
'only_matching': 'True',
}, {
# ntv
'url': 'https://www.tvnow.de/ntv/startup-news/goetter-in-weiss/player',
'only_matching': 'True',
}, {
# vox
'url': 'https://www.tvnow.de/vox/auto-mobil/neues-vom-automobilmarkt-2017-11-19-17-00-00/player',
'only_matching': 'True',
}, {
# rtlplus
'url': 'https://www.tvnow.de/rtlplus/op-ruft-dr-bruckner/die-vernaehte-frau/player',
'only_matching': 'True',
}]
def _real_extract(self, url):
display_id = '%s/%s' % re.match(self._VALID_URL, url).groups()
info = self._call_api(
'movies/' + display_id, display_id, query={
'fields': ','.join(self._VIDEO_FIELDS),
})
return self._extract_video(info, display_id)
class TVNowListIE(TVNowBaseIE):
_VALID_URL = r'(?P<base_url>https?://(?:www\.)?tvnow\.(?:de|at|ch)/(?:rtl(?:2|plus)?|nitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/)list/(?P<id>[^?/#&]+)$'
_SHOW_FIELDS = ('title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_VIDEO_FIELDS = ('id', 'headline', 'seoUrl', )
_TESTS = [{
'url': 'https://www.tvnow.de/rtl/30-minuten-deutschland/list/aktuell',
'info_dict': {
'id': '28296',
'title': '30 Minuten Deutschland - Aktuell',
},
'playlist_mincount': 1,
}]
def _real_extract(self, url):
base_url, show_id, season_id = re.match(self._VALID_URL, url).groups()
fields = []
fields.extend(self._SHOW_FIELDS)
fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
fields.extend(
'formatTabs.formatTabPages.container.movies.%s' % field
for field in self._VIDEO_FIELDS)
list_info = self._call_api(
'formats/seo', season_id, query={
'fields': ','.join(fields),
'name': show_id + '.php'
})
season = next(
season for season in list_info['formatTabs']['items']
if season.get('seoheadline') == season_id)
title = '%s - %s' % (list_info['title'], season['headline'])
entries = []
for container in season['formatTabPages']['items']:
for info in ((container.get('container') or {}).get('movies') or {}).get('items') or []:
seo_url = info.get('seoUrl')
if not seo_url:
continue
entries.append(self.url_result(
base_url + seo_url + '/player', 'TVNow', info.get('id')))
return self.playlist_result(
entries, compat_str(season.get('id') or season_id), title)

View File

@ -43,7 +43,7 @@ class TwitterBaseIE(InfoExtractor):
class TwitterCardIE(TwitterBaseIE): class TwitterCardIE(TwitterBaseIE):
IE_NAME = 'twitter:card' IE_NAME = 'twitter:card'
_VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?P<path>cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889', 'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@ -51,11 +51,10 @@ class TwitterCardIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '560070183650213889', 'id': '560070183650213889',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Twitter Card', 'title': 'Twitter web player',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 30.033, 'duration': 30.033,
}, },
'skip': 'Video gone',
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768', 'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
@ -63,11 +62,9 @@ class TwitterCardIE(TwitterBaseIE):
'info_dict': { 'info_dict': {
'id': '623160978427936768', 'id': '623160978427936768',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Twitter Card', 'title': 'Twitter web player',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*(?:\bformat=|\.)jpg',
'duration': 80.155,
}, },
'skip': 'Video gone',
}, },
{ {
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977', 'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
@ -120,15 +117,15 @@ class TwitterCardIE(TwitterBaseIE):
elif media_url.endswith('.mpd'): elif media_url.endswith('.mpd'):
formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash')) formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
else: else:
vbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000) tbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000)
a_format = { a_format = {
'url': media_url, 'url': media_url,
'format_id': 'http-%d' % vbr if vbr else 'http', 'format_id': 'http-%d' % tbr if tbr else 'http',
'vbr': vbr, 'tbr': tbr,
} }
# Reported bitRate may be zero # Reported bitRate may be zero
if not a_format['vbr']: if not a_format['tbr']:
del a_format['vbr'] del a_format['tbr']
self._search_dimensions_in_video_url(a_format, media_url) self._search_dimensions_in_video_url(a_format, media_url)
@ -150,79 +147,83 @@ class TwitterCardIE(TwitterBaseIE):
bearer_token = self._search_regex( bearer_token = self._search_regex(
r'BEARER_TOKEN\s*:\s*"([^"]+)"', r'BEARER_TOKEN\s*:\s*"([^"]+)"',
main_script, 'bearer token') main_script, 'bearer token')
guest_token = self._search_regex( # https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id
r'document\.cookie\s*=\s*decodeURIComponent\("gt=(\d+)',
webpage, 'guest token')
api_data = self._download_json( api_data = self._download_json(
'https://api.twitter.com/2/timeline/conversation/%s.json' % video_id, 'https://api.twitter.com/1.1/statuses/show/%s.json' % video_id,
video_id, 'Downloading mobile API data', video_id, 'Downloading API data',
headers={ headers={
'Authorization': 'Bearer ' + bearer_token, 'Authorization': 'Bearer ' + bearer_token,
'x-guest-token': guest_token,
}) })
media_info = try_get(api_data, lambda o: o['globalObjects']['tweets'][video_id] media_info = try_get(api_data, lambda o: o['extended_entities']['media'][0]['video_info']) or {}
['extended_entities']['media'][0]['video_info']) or {}
return self._parse_media_info(media_info, video_id) return self._parse_media_info(media_info, video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) path, video_id = re.search(self._VALID_URL, url).groups()
config = None config = None
formats = [] formats = []
duration = None duration = None
webpage = self._download_webpage(url, video_id) urls = [url]
if path.startswith('cards/'):
urls.append('https://twitter.com/i/videos/' + video_id)
iframe_url = self._html_search_regex( for u in urls:
r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"', webpage = self._download_webpage(u, video_id)
webpage, 'video iframe', default=None)
if iframe_url:
return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex( iframe_url = self._html_search_regex(
r'data-(?:player-)?config="([^"]+)"', webpage, r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
'data player config', default='{}'), webpage, 'video iframe', default=None)
video_id) if iframe_url:
return self.url_result(iframe_url)
if config.get('source_type') == 'vine': config = self._parse_json(self._html_search_regex(
return self.url_result(config['player_url'], 'Vine') r'data-(?:player-)?config="([^"]+)"', webpage,
'data player config', default='{}'),
video_id)
periscope_url = PeriscopeIE._extract_url(webpage) if config.get('source_type') == 'vine':
if periscope_url: return self.url_result(config['player_url'], 'Vine')
return self.url_result(periscope_url, PeriscopeIE.ie_key())
video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source') periscope_url = PeriscopeIE._extract_url(webpage)
if periscope_url:
return self.url_result(periscope_url, PeriscopeIE.ie_key())
if video_url: video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
else:
f = {
'url': video_url,
}
self._search_dimensions_in_video_url(f, video_url) if video_url:
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
else:
f = {
'url': video_url,
}
formats.append(f) self._search_dimensions_in_video_url(f, video_url)
vmap_url = config.get('vmapUrl') or config.get('vmap_url') formats.append(f)
if vmap_url:
formats.extend(
self._extract_formats_from_vmap_url(vmap_url, video_id))
media_info = None vmap_url = config.get('vmapUrl') or config.get('vmap_url')
if vmap_url:
formats.extend(
self._extract_formats_from_vmap_url(vmap_url, video_id))
for entity in config.get('status', {}).get('entities', []): media_info = None
if 'mediaInfo' in entity:
media_info = entity['mediaInfo']
if media_info: for entity in config.get('status', {}).get('entities', []):
formats.extend(self._parse_media_info(media_info, video_id)) if 'mediaInfo' in entity:
duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9) media_info = entity['mediaInfo']
username = config.get('user', {}).get('screen_name') if media_info:
if username: formats.extend(self._parse_media_info(media_info, video_id))
formats.extend(self._extract_mobile_formats(username, video_id)) duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)
username = config.get('user', {}).get('screen_name')
if username:
formats.extend(self._extract_mobile_formats(username, video_id))
if formats:
break
self._remove_duplicate_formats(formats) self._remove_duplicate_formats(formats)
self._sort_formats(formats) self._sort_formats(formats)
@ -258,9 +259,6 @@ class TwitterIE(InfoExtractor):
'uploader_id': 'freethenipple', 'uploader_id': 'freethenipple',
'duration': 12.922, 'duration': 12.922,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1', 'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
'md5': 'f36dcd5fb92bf7057f155e7d927eeb42', 'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
@ -277,7 +275,6 @@ class TwitterIE(InfoExtractor):
'skip': 'Account suspended', 'skip': 'Account suspended',
}, { }, {
'url': 'https://twitter.com/starwars/status/665052190608723968', 'url': 'https://twitter.com/starwars/status/665052190608723968',
'md5': '39b7199856dee6cd4432e72c74bc69d4',
'info_dict': { 'info_dict': {
'id': '665052190608723968', 'id': '665052190608723968',
'ext': 'mp4', 'ext': 'mp4',
@ -303,20 +300,16 @@ class TwitterIE(InfoExtractor):
}, },
}, { }, {
'url': 'https://twitter.com/jaydingeer/status/700207533655363584', 'url': 'https://twitter.com/jaydingeer/status/700207533655363584',
'md5': '',
'info_dict': { 'info_dict': {
'id': '700207533655363584', 'id': '700207533655363584',
'ext': 'mp4', 'ext': 'mp4',
'title': 'あかさ - BEAT PROD: @suhmeduh #Damndaniel', 'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'あかさ on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"', 'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'uploader': 'あかさ', 'uploader': 'JG',
'uploader_id': 'jaydingeer', 'uploader_id': 'jaydingeer',
'duration': 30.0, 'duration': 30.0,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/Filmdrunk/status/713801302971588609', 'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
'md5': '89a15ed345d13b86e9a5a5e051fa308a', 'md5': '89a15ed345d13b86e9a5a5e051fa308a',
@ -342,9 +335,6 @@ class TwitterIE(InfoExtractor):
'uploader': 'Captain America', 'uploader': 'Captain America',
'duration': 3.17, 'duration': 3.17,
}, },
'params': {
'skip_download': True, # requires ffmpeg
},
}, { }, {
'url': 'https://twitter.com/OPP_HSD/status/779210622571536384', 'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
'info_dict': { 'info_dict': {
@ -370,9 +360,6 @@ class TwitterIE(InfoExtractor):
'uploader_id': 'news_al3alm', 'uploader_id': 'news_al3alm',
'duration': 277.4, 'duration': 277.4,
}, },
'params': {
'format': 'best[format_id^=http-]',
},
}, { }, {
'url': 'https://twitter.com/i/web/status/910031516746514432', 'url': 'https://twitter.com/i/web/status/910031516746514432',
'info_dict': { 'info_dict': {

View File

@ -62,11 +62,11 @@ class UdemyIE(InfoExtractor):
def _extract_course_info(self, webpage, video_id): def _extract_course_info(self, webpage, video_id):
course = self._parse_json( course = self._parse_json(
unescapeHTML(self._search_regex( unescapeHTML(self._search_regex(
r'ng-init=["\'].*\bcourse=({.+?});', webpage, 'course', default='{}')), r'ng-init=["\'].*\bcourse=({.+?})[;"\']',
webpage, 'course', default='{}')),
video_id, fatal=False) or {} video_id, fatal=False) or {}
course_id = course.get('id') or self._search_regex( course_id = course.get('id') or self._search_regex(
(r'&quot;id&quot;\s*:\s*(\d+)', r'data-course-id=["\'](\d+)'), r'data-course-id=["\'](\d+)', webpage, 'course id')
webpage, 'course id')
return course_id, course.get('title') return course_id, course.get('title')
def _enroll_course(self, base_url, webpage, course_id): def _enroll_course(self, base_url, webpage, course_id):
@ -257,6 +257,11 @@ class UdemyIE(InfoExtractor):
video_url = source.get('file') or source.get('src') video_url = source.get('file') or source.get('src')
if not video_url or not isinstance(video_url, compat_str): if not video_url or not isinstance(video_url, compat_str):
continue continue
if source.get('type') == 'application/x-mpegURL' or determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
format_id = source.get('label') format_id = source.get('label')
f = { f = {
'url': video_url, 'url': video_url,

View File

@ -75,6 +75,10 @@ class XHamsterIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# mobile site
'url': 'https://m.xhamster.com/videos/cute-teen-jacqueline-solo-masturbation-8559111',
'only_matching': True,
}, { }, {
'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html', 'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
'only_matching': True, 'only_matching': True,
@ -93,7 +97,8 @@ class XHamsterIE(InfoExtractor):
video_id = mobj.group('id') or mobj.group('id_2') video_id = mobj.group('id') or mobj.group('id_2')
display_id = mobj.group('display_id') or mobj.group('display_id_2') display_id = mobj.group('display_id') or mobj.group('display_id_2')
webpage = self._download_webpage(url, video_id) desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
webpage = self._download_webpage(desktop_url, video_id)
error = self._html_search_regex( error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>', r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@ -229,8 +234,8 @@ class XHamsterIE(InfoExtractor):
webpage, 'uploader', default='anonymous') webpage, 'uploader', default='anonymous')
thumbnail = self._search_regex( thumbnail = self._search_regex(
[r'''thumb\s*:\s*(?P<q>["'])(?P<thumbnail>.+?)(?P=q)''', [r'''["']thumbUrl["']\s*:\s*(?P<q>["'])(?P<thumbnail>.+?)(?P=q)''',
r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''], r'''<video[^>]+"poster"=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
webpage, 'thumbnail', fatal=False, group='thumbnail') webpage, 'thumbnail', fatal=False, group='thumbnail')
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
@ -274,15 +279,16 @@ class XHamsterIE(InfoExtractor):
class XHamsterEmbedIE(InfoExtractor): class XHamsterEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)' _VALID_URL = r'https?://(?:.+?\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://xhamster.com/xembed.php?video=3328539', 'url': 'http://xhamster.com/xembed.php?video=3328539',
'info_dict': { 'info_dict': {
'id': '3328539', 'id': '3328539',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Pen Masturbation', 'title': 'Pen Masturbation',
'timestamp': 1406581861,
'upload_date': '20140728', 'upload_date': '20140728',
'uploader_id': 'anonymous', 'uploader': 'ManyakisArt',
'duration': 5, 'duration': 5,
'age_limit': 18, 'age_limit': 18,
} }

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.12.02' __version__ = '2017.12.10'