1
0
mirror of https://github.com/l1ving/youtube-dl synced 2025-03-10 22:17:20 +08:00

Merge branch 'master' into fix.25.12.2018

# Conflicts:
#	youtube_dl/version.py
This commit is contained in:
Avi Peretz 2019-06-09 21:22:28 +03:00
commit c8e18d7718
40 changed files with 660 additions and 701 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.06.08**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.05.11
[debug] youtube-dl version 2019.06.08
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.06.08**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.06.08**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.06.08**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.05.11
[debug] youtube-dl version 2019.06.08
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.05.11. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2019.05.11**
- [ ] I've verified that I'm running youtube-dl version **2019.06.08**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@ -9,6 +9,7 @@ python:
- "3.6"
- "pypy"
- "pypy3"
dist: trusty
env:
- YTDL_TEST_SET=core
- YTDL_TEST_SET=download

View File

@ -1,3 +1,60 @@
version 2019.06.08
Core
* [downloader/common] Improve rate limit (#21301)
* [utils] Improve strip_or_none
* [extractor/common] Strip src attribute for HTML5 entries code (#18485,
#21169)
Extractors
* [ted] Fix playlist extraction (#20844, #21032)
* [vlive:playlist] Fix video extraction when no playlist is found (#20590)
+ [vlive] Add CH+ support (#16887, #21209)
+ [openload] Add support for oload.website (#21329)
+ [tvnow] Extract HD formats (#21201)
+ [redbulltv] Add support for rrn:content URLs (#21297)
* [youtube] Fix average rating extraction (#21304)
+ [bitchute] Extract HTML5 formats (#21306)
* [cbsnews] Fix extraction (#9659, #15397)
* [vvvvid] Relax URL regular expression (#21299)
+ [prosiebensat1] Add support for new API (#21272)
+ [vrv] Extract adaptive_hls formats (#21243)
* [viki] Switch to HTTPS (#21001)
* [LiveLeak] Check if the original videos exist (#21206, #21208)
* [rtp] Fix extraction (#15099)
* [youtube] Improve DRM protected videos detection (#1774)
+ [srgssrplay] Add support for popupvideoplayer URLs (#21155)
+ [24video] Add support for porno.24video.net (#21194)
+ [24video] Add support for 24video.site (#21193)
- [pornflip] Remove extractor
- [criterion] Remove extractor (#21195)
* [pornhub] Use HTTPS (#21061)
* [bitchute] Fix uploader extraction (#21076)
* [streamcloud] Reduce waiting time to 6 seconds (#21092)
- [novamov] Remove extractors (#21077)
+ [openload] Add support for oload.press (#21135)
* [vivo] Fix extraction (#18906, #19217)
version 2019.05.20
Core
+ [extractor/common] Move workaround for applying first Set-Cookie header
into a separate _apply_first_set_cookie_header method
Extractors
* [safari] Fix authentication (#21090)
* [vk] Use _apply_first_set_cookie_header
* [vrt] Fix extraction (#20527)
+ [canvas] Add support for vrtnieuws and sporza site ids and extract
AES HLS formats
+ [vrv] Extract captions (#19238)
* [tele5] Improve video id extraction
* [tele5] Relax URL regular expression (#21020, #21063)
* [svtplay] Update API URL (#21075)
+ [yahoo:gyao] Add X-User-Agent header to dam proxy requests (#21071)
version 2019.05.11
Core

View File

@ -78,7 +78,6 @@
- **AudioBoom**
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
@ -150,6 +149,7 @@
- **CBSInteractive**
- **CBSLocal**
- **cbsnews**: CBS News
- **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos
- **CBSSports**
- **CCMA**
@ -174,7 +174,6 @@
- **Clipsyndicate**
- **CloserToTruth**
- **CloudflareStream**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
- **Clyp**
@ -194,7 +193,6 @@
- **Coub**
- **Cracked**
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **crunchyroll**
- **crunchyroll:playlist**
@ -609,7 +607,6 @@
- **nowness**
- **nowness:playlist**
- **nowness:series**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
@ -693,7 +690,6 @@
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **PornFlip**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
@ -734,6 +730,7 @@
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
- **RedBullTVRrnContent**
- **Reddit**
- **RedditR**
- **RedTube**
@ -1024,7 +1021,6 @@
- **videomore:video**
- **VideoPremium**
- **VideoPress**
- **videoweed**: VideoWeed
- **Vidio**
- **VidLii**
- **vidme**
@ -1071,7 +1067,7 @@
- **VoxMediaVolume**
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **Vrak**
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
- **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
- **VrtNU**: VrtNU.be
- **vrv**
- **vrv:series**
@ -1101,7 +1097,6 @@
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl

View File

@ -73,6 +73,7 @@ from youtube_dl.utils import (
smuggle_url,
str_to_int,
strip_jsonp,
strip_or_none,
timeconvert,
unescapeHTML,
unified_strdate,
@ -752,6 +753,18 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_strip_or_none(self):
self.assertEqual(strip_or_none(' abc'), 'abc')
self.assertEqual(strip_or_none('abc '), 'abc')
self.assertEqual(strip_or_none(' abc '), 'abc')
self.assertEqual(strip_or_none('\tabc\t'), 'abc')
self.assertEqual(strip_or_none('\n\tabc\n\t'), 'abc')
self.assertEqual(strip_or_none('abc'), 'abc')
self.assertEqual(strip_or_none(''), '')
self.assertEqual(strip_or_none(None), None)
self.assertEqual(strip_or_none(42), None)
self.assertEqual(strip_or_none([]), None)
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')

View File

@ -176,7 +176,9 @@ class FileDownloader(object):
return
speed = float(byte_counter) / elapsed
if speed > rate_limit:
time.sleep(max((byte_counter // rate_limit) - elapsed, 0))
sleep_time = float(byte_counter) / rate_limit - elapsed
if sleep_time > 0:
time.sleep(sleep_time)
def temp_name(self, filename):
"""Returns a temporary filename for the given filename."""

View File

@ -55,6 +55,11 @@ class BitChuteIE(InfoExtractor):
formats = [
{'url': format_url}
for format_url in orderedSet(format_urls)]
if not formats:
formats = self._parse_html5_media_entries(
url, webpage, video_id)[0]['formats']
self._check_formats(formats, video_id)
self._sort_formats(formats)
@ -65,8 +70,9 @@ class BitChuteIE(InfoExtractor):
webpage, default=None) or self._html_search_meta(
'twitter:image:src', webpage, 'thumbnail')
uploader = self._html_search_regex(
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>', webpage,
'uploader', fatal=False)
(r'(?s)<div class=["\']channel-banner.*?<p\b[^>]+\bclass=["\']name[^>]+>(.+?)</p>',
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
webpage, 'uploader', fatal=False)
return {
'id': video_id,

View File

@ -69,7 +69,7 @@ class CBSIE(CBSBaseIE):
last_e = None
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types or asset_type in ('HLS_FPS', 'DASH_CENC'):
if not asset_type or asset_type in asset_types or 'HLS_FPS' in asset_type or 'DASH_CENC' in asset_type:
continue
asset_types.append(asset_type)
query = {

View File

@ -1,40 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import zlib
from .common import InfoExtractor
from .cbs import CBSIE
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
)
from ..utils import (
parse_duration,
)
class CBSNewsEmbedIE(CBSIE):
IE_NAME = 'cbsnews:embed'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/embed/video[^#]*#(?P<id>.+)'
_TESTS = [{
'url': 'https://www.cbsnews.com/embed/video/?v=1.c9b5b61492913d6660db0b2f03579ef25e86307a#1Vb7b9s2EP5XBAHbT6Gt98PAMKTJ0se6LVjWYWtdGBR1stlIpEBSTtwi%2F%2FvuJNkNhmHdGxgM2NL57vjd6zt%2B8PngdN%2Fyg79qeGvhzN%2FLGrS%2F%2BuBLB531V28%2B%2BO7Qg7%2Fy97r2z3xZ42NW8yLhDbA0S0KWlHnIijwKWJBHZZnHBa8Cgbpdf%2F89NM9Hi9fXifhpr8sr%2FlP848tn%2BTdXycX25zh4cdX%2FvHl6PmmPqnWQv9w8Ed%2B9GjYRim07bFEqdG%2BZVHuwTm65A7bVRrYtR5lAyMox7pigF6W4k%2By91mjspGsJ%2BwVae4%2BsvdnaO1p73HkXs%2FVisUDTGm7R8IcdnOROeq%2B19qT1amhA1VJtPenoTUgrtfKc9m7Rq8dP7nnjwOB7wg7ADdNt7VX64DWAWlKhPtmDEq22g4GF99x6Dk9E8OSsankHXqPNKDxC%2FdK7MLKTircTDgsI3mmj4OBdSq64dy7fd1x577RU1rt4cvMtOaulFYOd%2FLewRWvDO9lIgXFpZSnkZmjbv5SxKTPoQXClFbpsf%2Fhbbpzs0IB3vb8KkyzJQ%2BywOAgCrMpgRrz%2BKk4fvb7kFbR4XJCu0gAdtNO7woCwZTu%2BBUs9bam%2Fds71drVerpeisgrubLjAB4nnOSkWQnfr5W6o1ku5Xpr1MgrCbL0M0vUyDtfLLK15WiYp47xKWSLyjFVpwVmVJSLIoCjSOFkv3W7oKsVliwZJcB9nwXpZ5GEQQwY8jNKqKCBrgjTLeFxgdCIpazojDgnRtn43J6kG7nZ6cAbxh0EeFFk4%2B1u867cY5u4344n%2FxXjCqAjucdTHgLKojNKmSfO8KRsOFY%2FzKEYCKEJBzv90QA9nfm9gL%2BHulaFqUkz9ULUYxl62B3U%2FRVNLA8IhggaPycOoBuwOCESciDQVSSUgiOMsROB%2FhKfwCKOzEk%2B4k6rWd4uuT%2FwTDz7K7t3d3WLO8ISD95jSPQbayBacthbz86XVgxHwhex5zawzgDOmtp%2F3GPcXn0VXHdSS029%2Fj99UC%2FwJUvyKQ%2FzKyixIEVlYJOn4RxxuaH43Ty9fbJ5OObykHH435XAzJTHeOF4hhEUXD8URe%2FQ%2FBT%2BMpf8d5GN02Ox%2FfiGsl7TA7POu1xZ5%2BbTzcAVKMe48mqcC21hkacVEVScM26liVVBnrKkC4CLKyzAvHu0lhEaTKMFwI3a4SN9MsrfYzdBLq2vkwRD1gVviLT8kY9h2CHH6Y%2Bix6609weFtey4ESp60WtyeWMy%2BsmBuhsoKIyuoT%2Bq2R%2FrW5qi3g%2FvzS2j40DoixDP8%2BKP0yUdpXJ4l6Vla%2Bg9vce%2BC4yM5YlUcbA%2F0jLKdpmTwvsdN5z88nAIe08%2F0HgxeG1iv%2B6Hlhjh7uiW0SDzYNI92L401uha3JKYk268UVRzdOzNQvAaJqoXzAc80dAV440NZ1WVVAAMRYQ2KrGJFmDUsq8saWSnjvIj8t78y%2FRa3JRnbHVfyFpfwoDiGpPgjzekyUiKNlU3OMlwuLMmzgvEojllYVE2Z1HhImvsnk%2BuhusTEoB21PAtSFodeFK3iYhXEH9WOG2%2FkOE833sfeG%2Ff5cfHtEFNXgYes0%2FXj7aGivUgJ9XpusCtoNcNYVVnJVrrDo0OmJAutHCpuZul4W9lLcfy7BnuLPT02%2ByXsCTk%2B9zhzswIN04YueNSK%2BPtM0jS88QdLqSLJDTLsuGZJNolm2yO0PXh3UPnz9Ix5bfIAqxPjvETQsDCEiPG4QbqNyhBZISxybLnZYCrW5H3Axp690%2F0BJdXtDZ5ITuM4xj3f4oUHGzc5JeJmZKpp%2FjwKh4wMV%2FV1yx3emLoR0MwbG4K%2F%2BZgVep3PnzXGDHZ6a3i%2Fk%2BJrONDN13%2Bnq6tBTYk4o7cLGhBtqCC4KwacGHpEVuoH5JNro%2FE6JfE6d5RydbiR76k%2BW5wioDHBIjw1euhHjUGRB0y5A97KoaPx6MlL%2BwgboUVtUFRI%2FLemgTpdtF59ii7pab08kuPcfWzs0l%2FRI5takWnFpka0zOgWRtYcuf9aIxZMxlwr6IiGpsb6j2DQUXPl%2FimXI599Ev7fWjoPD78A',
'only_matching': True,
}]
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|video)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
# 60 minutes
'url': 'http://www.cbsnews.com/news/artificial-intelligence-positioned-to-be-a-game-changer/',
'info_dict': {
'id': '_B6Ga3VJrI4iQNKsir_cdFo9Re_YJHE_',
'ext': 'mp4',
'title': 'Artificial Intelligence',
'description': 'md5:8818145f9974431e0fb58a1b8d69613c',
'id': 'Y_nf_aEg6WwO9OLAq0MpKaPgfnBUxfW4',
'ext': 'flv',
'title': 'Artificial Intelligence, real-life applications',
'description': 'md5:a7aaf27f1b4777244de8b0b442289304',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1606,
'duration': 317,
'uploader': 'CBSI-NEW',
'timestamp': 1498431900,
'upload_date': '20170625',
'timestamp': 1476046464,
'upload_date': '20161009',
},
'params': {
# m3u8 download
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'url': 'https://www.cbsnews.com/video/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
'ext': 'mp4',
@ -60,37 +82,29 @@ class CBSNewsIE(CBSIE):
# 48 hours
'url': 'http://www.cbsnews.com/news/maria-ridulph-murder-will-the-nations-oldest-cold-case-to-go-to-trial-ever-get-solved/',
'info_dict': {
'id': 'QpM5BJjBVEAUFi7ydR9LusS69DPLqPJ1',
'ext': 'mp4',
'title': 'Cold as Ice',
'description': 'Can a childhood memory of a friend\'s murder solve a 1957 cold case? "48 Hours" correspondent Erin Moriarty has the latest.',
'upload_date': '20170604',
'timestamp': 1496538000,
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': True,
'description': 'Can a childhood memory solve the 1957 murder of 7-year-old Maria Ridulph?',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"[^>]*><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
if video_info:
item = video_info['item'] if 'item' in video_info else video_info
else:
state = self._parse_json(self._search_regex(
r'data-cbsvideoui-options=(["\'])(?P<json>{.+?})\1', webpage,
'playlist JSON info', group='json'), video_id)['state']
item = state['playlist'][state['pid']]
entries = []
for embed_url in re.findall(r'<iframe[^>]+data-src="(https?://(?:www\.)?cbsnews\.com/embed/video/[^#]*#[^"]+)"', webpage):
entries.append(self.url_result(embed_url, CBSNewsEmbedIE.ie_key()))
if entries:
return self.playlist_result(
entries, playlist_title=self._html_search_meta(['og:title', 'twitter:title'], webpage),
playlist_description=self._html_search_meta(['og:description', 'twitter:description', 'description'], webpage))
item = self._parse_json(self._html_search_regex(
r'CBSNEWS\.defaultPayload\s*=\s*({.+})',
webpage, 'video JSON info'), display_id)['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')

View File

@ -67,6 +67,7 @@ from ..utils import (
sanitized_Request,
sanitize_filename,
str_or_none,
strip_or_none,
unescapeHTML,
unified_strdate,
unified_timestamp,
@ -2480,7 +2481,7 @@ class InfoExtractor(object):
'subtitles': {},
}
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
src = strip_or_none(media_attributes.get('src'))
if src:
_, formats = _media_formats(src, media_type)
media_info['formats'].extend(formats)
@ -2490,7 +2491,7 @@ class InfoExtractor(object):
s_attr = extract_attributes(source_tag)
# data-video-src and data-src are non standard but seen
# several times in the wild
src = dict_get(s_attr, ('src', 'data-video-src', 'data-src'))
src = strip_or_none(dict_get(s_attr, ('src', 'data-video-src', 'data-src')))
if not src:
continue
f = parse_content_type(s_attr.get('type'))
@ -2533,7 +2534,7 @@ class InfoExtractor(object):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
src = strip_or_none(track_attributes.get('src'))
if not src:
continue
lang = track_attributes.get('srclang') or track_attributes.get('lang') or track_attributes.get('label')
@ -2817,6 +2818,33 @@ class InfoExtractor(object):
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
Apply first Set-Cookie header instead of the last. Experimental.
Some sites (e.g. [1-3]) may serve two cookies under the same name
in Set-Cookie header and expect the first (old) one to be set rather
than second (new). However, as of RFC6265 the newer one cookie
should be set into cookie store what actually happens.
We will workaround this issue by resetting the cookie to
the first one manually.
1. https://new.vk.com/
2. https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201
3. https://learning.oreilly.com/
"""
for header, cookies in url_handle.headers.items():
if header.lower() != 'set-cookie':
continue
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
cookie_value = re.search(
r'%s=(.+?);.*?\b[Dd]omain=(.+?)(?:[,;]|$)' % cookie, cookies)
if cookie_value:
value, domain = cookie_value.groups()
self._set_cookie(domain, cookie, value)
break
def get_testcases(self, include_onlymatching=False):
t = getattr(self, '_TEST', None)
if t:

View File

@ -1,39 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CriterionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
_TEST = {
'url': 'http://www.criterion.com/films/184-le-samourai',
'md5': 'bc51beba55685509883a9a7830919ec3',
'info_dict': {
'id': '184',
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(
r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex(
r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {
'id': video_id,
'url': final_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
}

View File

@ -173,6 +173,7 @@ from .cbs import CBSIE
from .cbslocal import CBSLocalIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsEmbedIE,
CBSNewsIE,
CBSNewsLiveVideoIE,
)
@ -240,7 +241,6 @@ from .condenast import CondeNastIE
from .corus import CorusIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
CrunchyrollIE,
@ -772,13 +772,6 @@ from .nova import (
NovaEmbedIE,
NovaIE,
)
from .novamov import (
AuroraVidIE,
CloudTimeIE,
NowVideoIE,
VideoWeedIE,
WholeCloudIE,
)
from .nowness import (
NownessIE,
NownessPlaylistIE,
@ -896,7 +889,6 @@ from .polskieradio import (
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornflip import PornFlipIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
@ -946,7 +938,10 @@ from .raywenderlich import (
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
from .redbulltv import (
RedBullTVIE,
RedBullTVRrnContentIE,
)
from .reddit import (
RedditIE,
RedditRIE,

View File

@ -2583,19 +2583,6 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora')
# Look for embedded NovaMov-based player
mobj = re.search(
r'''(?x)<(?:pagespeed_)?iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
videoweed\.(?:es|com)|
movshare\.(?:net|sx|ag)|
divxstage\.(?:eu|net|ch|co|at|ag))
/embed\.php.+?)\1''', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
facebook_urls = FacebookIE._extract_urls(webpage)
if facebook_urls:

View File

@ -82,6 +82,10 @@ class LiveLeakIE(InfoExtractor):
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
@ -134,11 +138,13 @@ class LiveLeakIE(InfoExtractor):
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
formats.append({
'format_id': 'original' + ('-' + format_id if format_id else ''),
'url': orig_url,
'preference': 1,
})
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats

View File

@ -1,212 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
NO_DEFAULT,
sanitized_Request,
urlencode_postdata,
)
class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL_TEMPLATE = r'''(?x)
http://
(?:
(?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
(?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
)
(?P<id>[a-z\d]{13})
'''
_VALID_URL = _VALID_URL_TEMPLATE % {'host': r'novamov\.com'}
_HOST = 'www.novamov.com'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!</h2>'
_FILEKEY_REGEX = r'flashvars\.filekey=(?P<filekey>"?[^"]+"?);'
_TITLE_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>([^<]+)</h3>'
_DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
_URL_TEMPLATE = 'http://%s/video/%s'
_TEST = None
def _check_existence(self, webpage, video_id):
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)
url = self._URL_TEMPLATE % (self._HOST, video_id)
webpage = self._download_webpage(
url, video_id, 'Downloading video page')
self._check_existence(webpage, video_id)
def extract_filekey(default=NO_DEFAULT):
filekey = self._search_regex(
self._FILEKEY_REGEX, webpage, 'filekey', default=default)
if filekey is not default and (filekey[0] != '"' or filekey[-1] != '"'):
return self._search_regex(
r'var\s+%s\s*=\s*"([^"]+)"' % re.escape(filekey), webpage, 'filekey', default=default)
else:
return filekey
filekey = extract_filekey(default=None)
if not filekey:
fields = self._hidden_inputs(webpage)
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage,
'post url', default=url, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(url, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(fields))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('Referer', post_url)
webpage = self._download_webpage(
request, video_id, 'Downloading continue to the video page')
self._check_existence(webpage, video_id)
filekey = extract_filekey()
title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title')
description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
api_response = self._download_webpage(
'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
'Downloading video api response')
response = compat_urlparse.parse_qs(api_response)
if 'error_msg' in response:
raise ExtractorError('%s returned error: %s' % (self.IE_NAME, response['error_msg'][0]), expected=True)
video_url = response['url'][0]
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description
}
class WholeCloudIE(NovaMovIE):
IE_NAME = 'wholecloud'
IE_DESC = 'WholeCloud'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
_HOST = 'www.wholecloud.net'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
_DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
_TEST = {
'url': 'http://www.wholecloud.net/video/559e28be54d96',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': '559e28be54d96',
'ext': 'flv',
'title': 'dissapeared image',
'description': 'optical illusion dissapeared image magic illusion',
}
}
class NowVideoIE(NovaMovIE):
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
_HOST = 'www.nowvideo.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h4>([^<]+)</h4>'
_DESCRIPTION_REGEX = r'</h4>\s*<p>([^<]+)</p>'
_TEST = {
'url': 'http://www.nowvideo.sx/video/f1d6fce9a968b',
'md5': '12c82cad4f2084881d8bc60ee29df092',
'info_dict': {
'id': 'f1d6fce9a968b',
'ext': 'flv',
'title': 'youtubedl test video BaWjenozKc',
'description': 'Description',
},
}
class VideoWeedIE(NovaMovIE):
IE_NAME = 'videoweed'
IE_DESC = 'VideoWeed'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'videoweed\.(?:es|com)'}
_HOST = 'www.videoweed.es'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
_URL_TEMPLATE = 'http://%s/file/%s'
_TEST = {
'url': 'http://www.videoweed.es/file/b42178afbea14',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': 'b42178afbea14',
'ext': 'flv',
'title': 'optical illusion dissapeared image magic illusion',
'description': ''
},
}
class CloudTimeIE(NovaMovIE):
IE_NAME = 'cloudtime'
IE_DESC = 'CloudTime'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'cloudtime\.to'}
_HOST = 'www.cloudtime.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<div[^>]+class=["\']video_det["\'][^>]*>\s*<strong>([^<]+)</strong>'
_TEST = None
class AuroraVidIE(NovaMovIE):
IE_NAME = 'auroravid'
IE_DESC = 'AuroraVid'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}
_HOST = 'www.auroravid.to'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
_TESTS = [{
'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': {
'id': '4rurhn9x446jj',
'ext': 'flv',
'title': 'search engine optimization',
'description': 'search engine optimization is used to rank the web page in the google search engine'
},
'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
}, {
'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
'only_matching': True,
}]

View File

@ -244,7 +244,7 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor):
_DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|pw|live|space|services)|oladblock\.(?:services|xyz|me)|openloed\.co)'
_DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|press|pw|live|space|services|website)|oladblock\.(?:services|xyz|me)|openloed\.co)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
@ -357,6 +357,12 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oload.services/embed/bs1NWj1dCag/',
'only_matching': True,
}, {
'url': 'https://oload.press/embed/drTBl1aOTvk/',
'only_matching': True,
}, {
'url': 'https://oload.website/embed/drTBl1aOTvk/',
'only_matching': True,
}, {
'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
'only_matching': True,

View File

@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
int_or_none,
try_get,
unified_timestamp,
)
class PornFlipIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
'md5': '98c46639849145ae1fd77af532a9278c',
'info_dict': {
'id': 'wz7DfNhMmep',
'ext': 'mp4',
'title': '2 Amateurs swallow make his dream cumshots true',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 112,
'timestamp': 1481655502,
'upload_date': '20161213',
'uploader_id': '106786',
'uploader': 'figifoto',
'view_count': int,
'age_limit': 18,
}
}, {
'url': 'https://www.pornflip.com/embed/wz7DfNhMmep',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/v/EkRD6-vS2-s',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/embed/EkRD6-vS2-s',
'only_matching': True,
}, {
'url': 'https://www.pornflip.com/v/NG9q6Pb_iK8',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.pornflip.com/v/%s' % video_id, video_id)
flashvars = compat_parse_qs(self._search_regex(
r'<embed[^>]+flashvars=(["\'])(?P<flashvars>(?:(?!\1).)+)\1',
webpage, 'flashvars', group='flashvars'))
title = flashvars['video_vars[title]'][0]
def flashvar(kind):
return try_get(
flashvars, lambda x: x['video_vars[%s]' % kind][0], compat_str)
formats = []
for key, value in flashvars.items():
if not (value and isinstance(value, list)):
continue
format_url = value[0]
if key == 'video_vars[hds_manifest]':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
continue
height = self._search_regex(
r'video_vars\[video_urls\]\[(\d+)', key, 'height', default=None)
if not height:
continue
formats.append({
'url': format_url,
'format_id': 'http-%s' % height,
'height': int_or_none(height),
})
self._sort_formats(formats)
uploader = self._html_search_regex(
(r'<span[^>]+class="name"[^>]*>\s*<a[^>]+>\s*<strong>(?P<uploader>[^<]+)',
r'<meta[^>]+content=(["\'])[^>]*\buploaded by (?P<uploader>.+?)\1'),
webpage, 'uploader', fatal=False, group='uploader')
return {
'id': video_id,
'formats': formats,
'title': title,
'thumbnail': flashvar('big_thumb'),
'duration': int_or_none(flashvar('duration')),
'timestamp': unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp')),
'uploader_id': flashvar('author_id'),
'uploader': uploader,
'view_count': int_or_none(flashvar('views')),
'age_limit': 18,
}

View File

@ -170,7 +170,7 @@ class PornHubIE(PornHubBaseIE):
def dl_webpage(platform):
self._set_cookie(host, 'platform', platform)
return self._download_webpage(
'http://www.%s/view_video.php?viewkey=%s' % (host, video_id),
'https://www.%s/view_video.php?viewkey=%s' % (host, video_id),
video_id, 'Downloading %s webpage' % platform)
webpage = dl_webpage('pc')

View File

@ -16,6 +16,11 @@ from ..utils import (
class ProSiebenSat1BaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
_ACCESS_ID = None
_SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
_V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
def _extract_video_info(self, url, clip_id):
client_location = url
@ -31,93 +36,128 @@ class ProSiebenSat1BaseIE(InfoExtractor):
if video.get('is_protected') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
})
server_id = sources['server_id']
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
formats = []
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
if self._ACCESS_ID:
raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
server_token = (self._download_json(
self._V4_BASE_URL + 'protocols', clip_id,
'Downloading protocols JSON',
headers=self.geo_verification_headers(), query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct).encode()).hexdigest(),
'video_id': clip_id,
}, fatal=False) or {}).get('server_token')
if server_token:
urls = (self._download_json(
self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct + server_token + self._SUPPORTED_PROTOCOLS).encode()).hexdigest(),
'protocols': self._SUPPORTED_PROTOCOLS,
'server_token': server_token,
'video_id': clip_id,
}, fatal=False) or {}).get('urls') or {}
for protocol, variant in urls.items():
source_url = variant.get('clear', {}).get('url')
if not source_url:
continue
if protocol == 'dash':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id=protocol, fatal=False))
elif protocol == 'hls':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id=protocol, fatal=False))
else:
formats.append({
'url': source_url,
'format_id': protocol,
})
if not formats:
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
server_id = sources['server_id']
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mimetype == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id='dash', fatal=False))
else:
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mimetype == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
source_url, clip_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
self._sort_formats(formats)
return {
'duration': duration,
'duration': float_or_none(video.get('duration')),
'formats': formats,
}
@ -344,6 +384,11 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
_TOKEN = 'prosieben'
_SALT = '01!8d8F_)r9]4s[qeuXfP%'
_CLIENT_NAME = 'kolibri-2.0.19-splec4'
_ACCESS_ID = 'x_prosiebenmaxx-de'
_ENCRYPTION_KEY = 'Eeyeey9oquahthainoofashoyoikosag'
_IV = 'Aeluchoc6aevechuipiexeeboowedaok'
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',

View File

@ -104,3 +104,25 @@ class RedBullTVIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class RedBullTVRrnContentIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)/(?:video|live)/rrn:content:[^:]+:(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:live-videos:e3e6feb4-e95f-50b7-962a-c70f8fd13c73/mens-dh-finals-fort-william',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:videos:a36a0f36-ff1b-5db8-a69d-ee11a14bf48b/tn-ts-style?playlist=rrn:content:event-profiles:83f05926-5de8-5389-b5e4-9bb312d715e8:extras',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_url = self._og_search_url(webpage)
return self.url_result(
video_url, ie=RedBullTVIE.ie_key(),
video_id=RedBullTVIE._match_id(video_url))

View File

@ -1,9 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
)
class RTPIE(InfoExtractor):
@ -18,10 +20,6 @@ class RTPIE(InfoExtractor):
'description': 'As paixões musicais de António Cartaxo e António Macedo',
'thumbnail': r're:^https?://.*\.jpg',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
'only_matching': True,
@ -33,57 +31,36 @@ class RTPIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
description = self._html_search_meta('description', webpage)
thumbnail = self._og_search_thumbnail(webpage)
player_config = self._search_regex(
r'(?s)RTPPLAY\.player\.newPlayer\(\s*(\{.*?\})\s*\)', webpage, 'player config')
config = self._parse_json(player_config, video_id)
path, ext = config.get('file').rsplit('.', 1)
formats = [{
'format_id': 'rtmp',
'ext': ext,
'vcodec': config.get('type') == 'audio' and 'none' or None,
'preference': -2,
'url': 'rtmp://{streamer:s}/{application:s}'.format(**config),
'app': config.get('application'),
'play_path': '{ext:s}:{path:s}'.format(ext=ext, path=path),
'page_url': url,
'rtmp_live': config.get('live', False),
'player_url': 'http://programas.rtp.pt/play/player.swf?v3',
'rtmp_real_time': True,
}]
# Construct regular HTTP download URLs
replacements = {
'audio': {
'format_id': 'mp3',
'pattern': r'^nas2\.share/wavrss/',
'repl': 'http://rsspod.rtp.pt/podcasts/',
'vcodec': 'none',
},
'video': {
'format_id': 'mp4_h264',
'pattern': r'^nas2\.share/h264/',
'repl': 'http://rsspod.rtp.pt/videocasts/',
'vcodec': 'h264',
},
}
r = replacements[config['type']]
if re.match(r['pattern'], config['file']) is not None:
formats.append({
'format_id': r['format_id'],
'url': re.sub(r['pattern'], r['repl'], config['file']),
'vcodec': r['vcodec'],
})
self._sort_formats(formats)
config = self._parse_json(self._search_regex(
r'(?s)RTPPlayer\(({.+?})\);', webpage,
'player config'), video_id, js_to_json)
file_url = config['file']
ext = determine_ext(file_url)
if ext == 'm3u8':
file_key = config.get('fileKey')
formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=file_key)
if file_key:
formats.append({
'url': 'https://cdn-ondemand.rtp.pt' + file_key,
'preference': 1,
})
self._sort_formats(formats)
else:
formats = [{
'url': file_url,
'ext': ext,
}]
if config.get('mediaType') == 'audio':
for f in formats:
f['vcodec'] = 'none'
return {
'id': video_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
}

View File

@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
sanitized_Request,
std_headers,
urlencode_postdata,
update_url_query,
)
@ -31,44 +34,52 @@ class SafariBaseIE(InfoExtractor):
if username is None:
return
headers = std_headers.copy()
if 'Referer' not in headers:
headers['Referer'] = self._LOGIN_URL
_, urlh = self._download_webpage_handle(
'https://learning.oreilly.com/accounts/login-check/', None,
'Downloading login page')
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login form', headers=headers)
def is_logged(urlh):
return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
def is_logged(webpage):
return any(re.search(p, webpage) for p in (
r'href=["\']/accounts/logout/', r'>Sign Out<'))
if is_logged(login_page):
if is_logged(urlh):
self.LOGGED_IN = True
return
csrf = self._html_search_regex(
r"name='csrfmiddlewaretoken'\s+value='([^']+)'",
login_page, 'csrf token')
redirect_url = compat_str(urlh.geturl())
parsed_url = compat_urlparse.urlparse(redirect_url)
qs = compat_parse_qs(parsed_url.query)
next_uri = compat_urlparse.urljoin(
'https://api.oreilly.com', qs['next'][0])
login_form = {
'csrfmiddlewaretoken': csrf,
'email': username,
'password1': password,
'login': 'Sign In',
'next': '',
}
auth, urlh = self._download_json_handle(
'https://www.oreilly.com/member/auth/login/', None, 'Logging in',
data=json.dumps({
'email': username,
'password': password,
'redirect_uri': next_uri,
}).encode(), headers={
'Content-Type': 'application/json',
'Referer': redirect_url,
}, expected_status=400)
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form), headers=headers)
login_page = self._download_webpage(
request, None, 'Logging in')
if not is_logged(login_page):
credentials = auth.get('credentials')
if (not auth.get('logged_in') and not auth.get('redirect_uri')
and credentials):
raise ExtractorError(
'Login failed; make sure your credentials are correct and try again.',
expected=True)
'Unable to login: %s' % credentials, expected=True)
self.LOGGED_IN = True
# oreilly serves two same groot_sessionid cookies in Set-Cookie header
# and expects first one to be actually set
self._apply_first_set_cookie_header(urlh, 'groot_sessionid')
_, urlh = self._download_webpage_handle(
auth.get('redirect_uri') or next_uri, None, 'Completing login',)
if is_logged(urlh):
self.LOGGED_IN = True
return
raise ExtractorError('Unable to log in')
class SafariIE(SafariBaseIE):
@ -76,7 +87,7 @@ class SafariIE(SafariBaseIE):
IE_DESC = 'safaribooksonline.com online video'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
(?:
library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?\#&]+)\.html|
videos/[^/]+/[^/]+/(?P<reference_id>[^-]+-[^/?\#&]+)
@ -107,6 +118,9 @@ class SafariIE(SafariBaseIE):
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838/9780133392838-00_SeriesIntro',
'only_matching': True,
}, {
'url': 'https://www.oreilly.com/library/view/hadoop-fundamentals-livelessons/9780133392838/00_SeriesIntro.html',
'only_matching': True,
}]
_PARTNER_ID = '1926081'
@ -163,7 +177,7 @@ class SafariIE(SafariBaseIE):
class SafariApiIE(SafariBaseIE):
IE_NAME = 'safari:api'
_VALID_URL = r'https?://(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_VALID_URL = r'https?://(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>[^/?#&]+)\.html'
_TESTS = [{
'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
@ -188,7 +202,7 @@ class SafariCourseIE(SafariBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?(?:safaribooksonline|learning\.oreilly)\.com/
(?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
(?:
library/view/[^/]+|
api/v1/book|
@ -219,6 +233,9 @@ class SafariCourseIE(SafariBaseIE):
}, {
'url': 'https://learning.oreilly.com/videos/hadoop-fundamentals-livelessons/9780133392838',
'only_matching': True,
}, {
'url': 'https://www.oreilly.com/library/view/hadoop-fundamentals-livelessons/9780133392838/',
'only_matching': True,
}]
@classmethod

View File

@ -3,8 +3,11 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
KNOWN_EXTENSIONS,
parse_filesize,
url_or_none,
urlencode_postdata,
)
@ -22,10 +25,8 @@ class SharedBaseIE(InfoExtractor):
video_url = self._extract_video_url(webpage, video_id, url)
title = compat_b64decode(self._html_search_meta(
'full:title', webpage, 'title')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
title = self._extract_title(webpage)
filesize = int_or_none(self._extract_filesize(webpage))
return {
'id': video_id,
@ -35,6 +36,14 @@ class SharedBaseIE(InfoExtractor):
'title': title,
}
def _extract_title(self, webpage):
return compat_b64decode(self._html_search_meta(
'full:title', webpage, 'title')).decode('utf-8')
def _extract_filesize(self, webpage):
return self._html_search_meta(
'full:size', webpage, 'file size', fatal=False)
class SharedIE(SharedBaseIE):
IE_DESC = 'shared.sx'
@ -82,11 +91,27 @@ class VivoIE(SharedBaseIE):
'id': 'd7ddda0e78',
'ext': 'mp4',
'title': 'Chicken',
'filesize': 528031,
'filesize': 515659,
},
}
def _extract_video_url(self, webpage, video_id, *args):
def _extract_title(self, webpage):
title = self._html_search_regex(
r'data-name\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1', webpage,
'title', default=None, group='title')
if title:
ext = determine_ext(title)
if ext.lower() in KNOWN_EXTENSIONS:
title = title.rpartition('.' + ext)[0]
return title
return self._og_search_title(webpage)
def _extract_filesize(self, webpage):
return parse_filesize(self._search_regex(
r'data-type=["\']video["\'][^>]*>Watch.*?<strong>\s*\((.+?)\)',
webpage, 'filesize', fatal=False))
def _extract_video_url(self, webpage, video_id, url):
def decode_url(encoded_url):
return compat_b64decode(encoded_url).decode('utf-8')

View File

@ -106,7 +106,16 @@ class SRGSSRIE(InfoExtractor):
class SRGSSRPlayIE(InfoExtractor):
IE_DESC = 'srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites'
_VALID_URL = r'https?://(?:(?:www|play)\.)?(?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/play/(?:tv|radio)/[^/]+/(?P<type>video|audio)/[^?]+\?id=(?P<id>[0-9a-f\-]{36}|\d+)'
_VALID_URL = r'''(?x)
https?://
(?:(?:www|play)\.)?
(?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/play/(?:tv|radio)/
(?:
[^/]+/(?P<type>video|audio)/[^?]+|
popup(?P<type_2>video|audio)player
)
\?id=(?P<id>[0-9a-f\-]{36}|\d+)
'''
_TESTS = [{
'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
@ -163,9 +172,15 @@ class SRGSSRPlayIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
'only_matching': True,
}]
def _real_extract(self, url):
bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
mobj = re.match(self._VALID_URL, url)
bu = mobj.group('bu')
media_type = mobj.group('type') or mobj.group('type_2')
media_id = mobj.group('id')
# other info can be extracted from url + '&layout=json'
return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')

View File

@ -45,7 +45,7 @@ class StreamcloudIE(InfoExtractor):
value="([^"]*)"
''', orig_webpage)
self._sleep(12, video_id)
self._sleep(6, video_id)
webpage = self._download_webpage(
url, video_id, data=urlencode_postdata(fields), headers={

View File

@ -5,8 +5,12 @@ import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse
)
from ..utils import (
extract_attributes,
float_or_none,
int_or_none,
try_get,
@ -20,7 +24,7 @@ class TEDIE(InfoExtractor):
(?P<proto>https?://)
(?P<type>www|embed(?:-ssl)?)(?P<urlmain>\.ted\.com/
(
(?P<type_playlist>playlists(?:/\d+)?) # We have a playlist
(?P<type_playlist>playlists(?:/(?P<playlist_id>\d+))?) # We have a playlist
|
((?P<type_talk>talks)) # We have a simple talk
|
@ -84,6 +88,7 @@ class TEDIE(InfoExtractor):
'info_dict': {
'id': '10',
'title': 'Who are the hackers?',
'description': 'md5:49a0dbe8fb76d81a0e64b4a80af7f15a'
},
'playlist_mincount': 6,
}, {
@ -150,22 +155,22 @@ class TEDIE(InfoExtractor):
webpage = self._download_webpage(url, name,
'Downloading playlist webpage')
info = self._extract_info(webpage)
playlist_info = try_get(
info, lambda x: x['__INITIAL_DATA__']['playlist'],
dict) or info['playlist']
playlist_entries = []
for entry in re.findall(r'(?s)<[^>]+data-ga-context=["\']playlist["\'][^>]*>', webpage):
attrs = extract_attributes(entry)
entry_url = compat_urlparse.urljoin(url, attrs['href'])
playlist_entries.append(self.url_result(entry_url, self.ie_key()))
final_url = self._og_search_url(webpage, fatal=False)
playlist_id = (
re.match(self._VALID_URL, final_url).group('playlist_id')
if final_url else None)
playlist_entries = [
self.url_result('http://www.ted.com/talks/' + talk['slug'], self.ie_key())
for talk in try_get(
info, lambda x: x['__INITIAL_DATA__']['talks'],
dict) or info['talks']
]
return self.playlist_result(
playlist_entries,
playlist_id=compat_str(playlist_info['id']),
playlist_title=playlist_info['title'])
playlist_entries, playlist_id=playlist_id,
playlist_title=self._og_search_title(webpage, fatal=False),
playlist_description=self._og_search_description(webpage))
def _talk_info(self, url, video_name):
webpage = self._download_webpage(url, video_name)

View File

@ -47,15 +47,23 @@ class TVNowBaseIE(InfoExtractor):
r'\.ism/(?:[^.]*\.(?:m3u8|mpd)|[Mm]anifest)',
'.ism/' + suffix, manifest_url))
formats = self._extract_mpd_formats(
url_repl('dash', '.mpd'), video_id,
mpd_id='dash', fatal=False)
formats.extend(self._extract_ism_formats(
url_repl('hss', 'Manifest'),
video_id, ism_id='mss', fatal=False))
formats.extend(self._extract_m3u8_formats(
url_repl('hls', '.m3u8'), video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
def make_urls(proto, suffix):
urls = [url_repl(proto, suffix)]
hd_url = urls[0].replace('/manifest/', '/ngvod/')
if hd_url != urls[0]:
urls.append(hd_url)
return urls
for man_url in make_urls('dash', '.mpd'):
formats = self._extract_mpd_formats(
man_url, video_id, mpd_id='dash', fatal=False)
for man_url in make_urls('hss', 'Manifest'):
formats.extend(self._extract_ism_formats(
man_url, video_id, ism_id='mss', fatal=False))
for man_url in make_urls('hls', '.m3u8'):
formats.extend(self._extract_m3u8_formats(
man_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls',
fatal=False))
if formats:
break
else:

View File

@ -14,7 +14,18 @@ from ..utils import (
class TwentyFourVideoIE(InfoExtractor):
IE_NAME = '24video'
_VALID_URL = r'https?://(?P<host>(?:www\.)?24video\.(?:net|me|xxx|sexy?|tube|adult))/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
(?:(?:www|porno)\.)?24video\.
(?:net|me|xxx|sexy?|tube|adult|site)
)/
(?:
video/(?:(?:view|xml)/)?|
player/new24_play\.swf\?id=
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.24video.net/video/view/1044982',
@ -42,6 +53,12 @@ class TwentyFourVideoIE(InfoExtractor):
}, {
'url': 'http://www.24video.tube/video/view/2363750',
'only_matching': True,
}, {
'url': 'https://www.24video.site/video/view/2640421',
'only_matching': True,
}, {
'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -21,7 +21,7 @@ from ..utils import (
class VikiBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?viki\.(?:com|net|mx|jp|fr)/'
_API_QUERY_TEMPLATE = '/v4/%sapp=%s&t=%s&site=www.viki.com'
_API_URL_TEMPLATE = 'http://api.viki.io%s&sig=%s'
_API_URL_TEMPLATE = 'https://api.viki.io%s&sig=%s'
_APP = '100005a'
_APP_VERSION = '2.2.5.1428709186'
@ -377,7 +377,7 @@ class VikiChannelIE(VikiBaseIE):
for video in page['response']:
video_id = video['id']
entries.append(self.url_result(
'http://www.viki.com/videos/%s' % video_id, 'Viki'))
'https://www.viki.com/videos/%s' % video_id, 'Viki'))
if not page['pagination']['next']:
break

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
import collections
import re
import sys
from .common import InfoExtractor
from ..compat import compat_urlparse
@ -45,24 +44,9 @@ class VKBaseIE(InfoExtractor):
'pass': password.encode('cp1251'),
})
# https://new.vk.com/ serves two same remixlhk cookies in Set-Cookie header
# and expects the first one to be set rather than second (see
# https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201).
# As of RFC6265 the newer one cookie should be set into cookie store
# what actually happens.
# We will workaround this VK issue by resetting the remixlhk cookie to
# the first one manually.
for header, cookies in url_handle.headers.items():
if header.lower() != 'set-cookie':
continue
if sys.version_info[0] >= 3:
cookies = cookies.encode('iso-8859-1')
cookies = cookies.decode('utf-8')
remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
if remixlhk:
value, domain = remixlhk.groups()
self._set_cookie(domain, 'remixlhk', value)
break
# vk serves two same remixlhk cookies in Set-Cookie header and expects
# first one to be actually set
self._apply_first_set_cookie_header(url_handle, 'remixlhk')
login_page = self._download_webpage(
'https://login.vk.com/?act=login', None,

View File

@ -24,6 +24,7 @@ from ..utils import (
class VLiveIE(InfoExtractor):
IE_NAME = 'vlive'
_VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
_NETRC_MACHINE = 'vlive'
_TESTS = [{
'url': 'http://www.vlive.tv/video/1326',
'md5': 'cc7314812855ce56de70a06a27314983',
@ -47,12 +48,55 @@ class VLiveIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.vlive.tv/video/129100',
'md5': 'ca2569453b79d66e5b919e5d308bff6b',
'info_dict': {
'id': '129100',
'ext': 'mp4',
'title': '[V LIVE] [BTS+] Run BTS! 2019 - EP.71 :: Behind the scene',
'creator': 'BTS+',
'view_count': int,
'subtitles': 'mincount:10',
},
'skip': 'This video is only available for CH+ subscribers',
}]
@classmethod
def suitable(cls, url):
return False if VLivePlaylistIE.suitable(url) else super(VLiveIE, cls).suitable(url)
def _real_initialize(self):
self._login()
def _login(self):
email, password = self._get_login_info()
if None in (email, password):
return
def is_logged_in():
login_info = self._download_json(
'https://www.vlive.tv/auth/loginInfo', None,
note='Downloading login info',
headers={'Referer': 'https://www.vlive.tv/home'})
return try_get(
login_info, lambda x: x['message']['login'], bool) or False
LOGIN_URL = 'https://www.vlive.tv/auth/email/login'
self._request_webpage(
LOGIN_URL, None, note='Downloading login cookies')
self._download_webpage(
LOGIN_URL, None, note='Logging in',
data=urlencode_postdata({'email': email, 'pwd': password}),
headers={
'Referer': LOGIN_URL,
'Content-Type': 'application/x-www-form-urlencoded'
})
if not is_logged_in():
raise ExtractorError('Unable to log in', expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)
@ -77,10 +121,7 @@ class VLiveIE(InfoExtractor):
if status in ('LIVE_ON_AIR', 'BIG_EVENT_ON_AIR'):
return self._live(video_id, webpage)
elif status in ('VOD_ON_AIR', 'BIG_EVENT_INTRO'):
if long_video_id and key:
return self._replay(video_id, webpage, long_video_id, key)
else:
status = 'COMING_SOON'
return self._replay(video_id, webpage, long_video_id, key)
if status == 'LIVE_END':
raise ExtractorError('Uploading for replay. Please wait...',
@ -91,13 +132,15 @@ class VLiveIE(InfoExtractor):
raise ExtractorError('We are sorry, '
'but the live broadcast has been canceled.',
expected=True)
elif status == 'ONLY_APP':
raise ExtractorError('Unsupported video type', expected=True)
else:
raise ExtractorError('Unknown status %s' % status)
def _get_common_fields(self, webpage):
title = self._og_search_title(webpage)
creator = self._html_search_regex(
r'<div[^>]+class="info_area"[^>]*>\s*<a\s+[^>]*>([^<]+)',
r'<div[^>]+class="info_area"[^>]*>\s*(?:<em[^>]*>.*?</em\s*>\s*)?<a\s+[^>]*>([^<]+)',
webpage, 'creator', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
return {
@ -107,14 +150,7 @@ class VLiveIE(InfoExtractor):
}
def _live(self, video_id, webpage):
init_page = self._download_webpage(
'https://www.vlive.tv/video/init/view',
video_id, note='Downloading live webpage',
data=urlencode_postdata({'videoSeq': video_id}),
headers={
'Referer': 'https://www.vlive.tv/video/%s' % video_id,
'Content-Type': 'application/x-www-form-urlencoded'
})
init_page = self._download_init_page(video_id)
live_params = self._search_regex(
r'"liveStreamInfo"\s*:\s*(".*"),',
@ -140,6 +176,17 @@ class VLiveIE(InfoExtractor):
return info
def _replay(self, video_id, webpage, long_video_id, key):
if '' in (long_video_id, key):
init_page = self._download_init_page(video_id)
video_info = self._parse_json(self._search_regex(
(r'(?s)oVideoStatus\s*=\s*({.+?})\s*</script',
r'(?s)oVideoStatus\s*=\s*({.+})'), init_page, 'video info'),
video_id)
if video_info.get('status') == 'NEED_CHANNEL_PLUS':
self.raise_login_required(
'This video is only available for CH+ subscribers')
long_video_id, key = video_info['vid'], video_info['inkey']
playinfo = self._download_json(
'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
% compat_urllib_parse_urlencode({
@ -180,6 +227,16 @@ class VLiveIE(InfoExtractor):
})
return info
def _download_init_page(self, video_id):
return self._download_webpage(
'https://www.vlive.tv/video/init/view',
video_id, note='Downloading live webpage',
data=urlencode_postdata({'videoSeq': video_id}),
headers={
'Referer': 'https://www.vlive.tv/video/%s' % video_id,
'Content-Type': 'application/x-www-form-urlencoded'
})
class VLiveChannelIE(InfoExtractor):
IE_NAME = 'vlive:channel'
@ -275,26 +332,45 @@ class VLiveChannelIE(InfoExtractor):
class VLivePlaylistIE(InfoExtractor):
IE_NAME = 'vlive:playlist'
_VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<video_id>[0-9]+)/playlist/(?P<id>[0-9]+)'
_TEST = {
_VIDEO_URL_TEMPLATE = 'http://www.vlive.tv/video/%s'
_TESTS = [{
# regular working playlist
'url': 'https://www.vlive.tv/video/117956/playlist/117963',
'info_dict': {
'id': '117963',
'title': '아이돌룸(IDOL ROOM) 41회 - (여자)아이들'
},
'playlist_mincount': 10
}, {
# playlist with no playlistVideoSeqs
'url': 'http://www.vlive.tv/video/22867/playlist/22912',
'info_dict': {
'id': '22912',
'title': 'Valentine Day Message from TWICE'
'id': '22867',
'ext': 'mp4',
'title': '[V LIVE] Valentine Day Message from MINA',
'creator': 'TWICE',
'view_count': int
},
'playlist_mincount': 9
}
'params': {
'skip_download': True,
}
}]
def _build_video_result(self, video_id, message):
self.to_screen(message)
return self.url_result(
self._VIDEO_URL_TEMPLATE % video_id,
ie=VLiveIE.ie_key(), video_id=video_id)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, playlist_id = mobj.group('video_id', 'id')
VIDEO_URL_TEMPLATE = 'http://www.vlive.tv/video/%s'
if self._downloader.params.get('noplaylist'):
self.to_screen(
'Downloading just video %s because of --no-playlist' % video_id)
return self.url_result(
VIDEO_URL_TEMPLATE % video_id,
ie=VLiveIE.ie_key(), video_id=video_id)
return self._build_video_result(
video_id,
'Downloading just video %s because of --no-playlist'
% video_id)
self.to_screen(
'Downloading playlist %s - add --no-playlist to just download video'
@ -304,15 +380,21 @@ class VLivePlaylistIE(InfoExtractor):
'http://www.vlive.tv/video/%s/playlist/%s'
% (video_id, playlist_id), playlist_id)
item_ids = self._parse_json(
self._search_regex(
r'playlistVideoSeqs\s*=\s*(\[[^]]+\])', webpage,
'playlist video seqs'),
playlist_id)
raw_item_ids = self._search_regex(
r'playlistVideoSeqs\s*=\s*(\[[^]]+\])', webpage,
'playlist video seqs', default=None, fatal=False)
if not raw_item_ids:
return self._build_video_result(
video_id,
'Downloading just video %s because no playlist was found'
% video_id)
item_ids = self._parse_json(raw_item_ids, playlist_id)
entries = [
self.url_result(
VIDEO_URL_TEMPLATE % item_id, ie=VLiveIE.ie_key(),
self._VIDEO_URL_TEMPLATE % item_id, ie=VLiveIE.ie_key(),
video_id=compat_str(item_id))
for item_id in item_ids]

View File

@ -130,7 +130,7 @@ class VRVIE(VRVBaseIE):
self._TOKEN_SECRET = token_credentials['oauth_token_secret']
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
if not url or stream_format not in ('hls', 'dash', 'adaptive_hls'):
return []
stream_id_list = []
if audio_lang:
@ -140,7 +140,7 @@ class VRVIE(VRVBaseIE):
format_id = stream_format
if stream_id_list:
format_id += '-' + '-'.join(stream_id_list)
if stream_format == 'hls':
if 'hls' in stream_format:
adaptive_formats = self._extract_m3u8_formats(
url, video_id, 'mp4', m3u8_id=format_id,
note='Downloading %s information' % format_id,

View File

@ -12,7 +12,7 @@ from ..utils import (
class VVVVIDIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vvvvid\.it/#!(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?vvvvid\.it/(?:#!)?(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
_TESTS = [{
# video_type == 'video/vvvvid'
'url': 'https://www.vvvvid.it/#!show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048/ping-pong',

View File

@ -1789,9 +1789,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
raise ExtractorError(
'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
if video_info.get('license_info'):
raise ExtractorError('This video is DRM protected.', expected=True)
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
@ -1927,7 +1924,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
formats = []
for url_data_str in encoded_url_map.split(','):
url_data = compat_parse_qs(url_data_str)
if 'itag' not in url_data or 'url' not in url_data:
if 'itag' not in url_data or 'url' not in url_data or url_data.get('drm_families'):
continue
stream_type = int_or_none(try_get(url_data, lambda x: x['stream_type'][0]))
# Unsupported FORMAT_STREAM_TYPE_OTF
@ -2227,6 +2224,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'<[^>]+class=["\']watch-view-count[^>]+>\s*([\d,\s]+)', video_webpage,
'view count', default=None))
average_rating = (
float_or_none(video_details.get('averageRating'))
or try_get(video_info, lambda x: float_or_none(x['avg_rating'][0])))
# subtitles
video_subtitles = self.extract_subtitles(video_id, video_webpage)
automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
@ -2323,6 +2324,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'"token" parameter not in video info for unknown reason',
video_id=video_id)
if not formats and (video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos'])):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
self.mark_watched(video_id, video_info, player_response)
@ -2353,7 +2357,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'average_rating': float_or_none(video_info.get('avg_rating', [None])[0]),
'average_rating': average_rating,
'formats': formats,
'is_live': is_live,
'start_time': start_time,

View File

@ -1951,8 +1951,8 @@ def bool_or_none(v, default=None):
return v if isinstance(v, bool) else default
def strip_or_none(v):
return None if v is None else v.strip()
def strip_or_none(v, default=None):
return v.strip() if isinstance(v, compat_str) else default
def url_or_none(url):

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = 'vc.2019.05.16'
__version__ = 'vc.2019.06.08'