mirror of https://github.com/l1ving/youtube-dl synced 2025-03-10 16:47:21 +08:00

Merge remote-tracking branch 'upstream/master' into dailymotion-age-limit

Antoine Guillemin 2019-01-18 15:31:46 +01:00
commit b0971b17f1
59 changed files with 1522 additions and 655 deletions

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.12.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.12.17**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.17**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.12.17
[debug] youtube-dl version 2019.01.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

@ -153,15 +153,19 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
@ -257,11 +261,33 @@ title = meta.get('title') or self._og_search_title(webpage)
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
### Regular expressions
When using regular expressions try to write them fuzzy and flexible.
#### Don't capture groups you don't use
A capturing group must be an indication that it's used somewhere in the code. Any group that is not used must be non-capturing.
##### Example
Don't capture the id attribute name here, since you can't use it for anything anyway.
Correct:
```python
r'(?:id|ID)=(?P<id>\d+)'
```
Incorrect:
```python
r'(id|ID)=(?P<id>\d+)'
```
#### Make regular expressions relaxed and flexible
When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
#### Example
##### Example
Say you need to extract `title` from the following HTML code:
@ -294,6 +320,25 @@ title = self._search_regex(
webpage, 'title', group='title')
```
### Long lines policy
There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse.
For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
Correct:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
Incorrect:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
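The two regular-expression conventions added in this hunk (capture only what the code uses; tolerate either quote style and extra attributes) can be checked with a small standalone sketch. The helper names `extract_id`/`extract_title` and the HTML snippets are illustrative assumptions, not code from youtube-dl.

```python
import re

# Only the value is captured; the attribute name sits in a non-capturing
# group (?:...), so a match exposes exactly one group, named "id".
ID_RE = r'(?:id|ID)=(?P<id>\d+)'

# Relaxed pattern: accepts single or double quotes around the class value
# and tolerates other attributes before or after it.
TITLE_RE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'


def extract_id(text):
    """Return the numeric id as a string, or None if absent."""
    mobj = re.search(ID_RE, text)
    return mobj.group('id') if mobj else None


def extract_title(html):
    """Return the stripped title text, or None if absent."""
    mobj = re.search(TITLE_RE, html)
    return mobj.group('title').strip() if mobj else None
```

For example, `re.search(ID_RE, 'ID=42').groups()` yields `('42',)`: a single group, because the `id|ID` alternation does not capture.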

@ -1,3 +1,89 @@
version 2019.01.17
Extractors
* [youtube] Extend JS player signature function name regular expressions
(#18890, #18891, #18893)
version 2019.01.16
Core
+ [test/helper] Add support for maxcount and count collection len checkers
* [downloader/hls] Fix uplynk ad skipping (#18824)
* [postprocessor/ffmpeg] Improve ffmpeg version parsing (#18813)
Extractors
* [youtube] Skip unsupported adaptive stream type (#18804)
+ [youtube] Extract DASH formats from player response (#18804)
* [funimation] Fix extraction (#14089)
* [skylinewebcams] Fix extraction (#18853)
+ [curiositystream] Add support for non app URLs
+ [bitchute] Check formats (#18833)
* [wistia] Extend URL regular expression (#18823)
+ [playplustv] Add support for playplus.com (#18789)
version 2019.01.10
Core
* [extractor/common] Use episode name as title in _json_ld
+ [extractor/common] Add support for movies in _json_ld
* [postprocessor/ffmpeg] Embed subtitles with non-standard language codes
(#18765)
+ [utils] Add language codes replaced in 1989 revision of ISO 639
to ISO639Utils (#18765)
Extractors
* [youtube] Extract live HLS URL from player response (#18799)
+ [outsidetv] Add support for outsidetv.com (#18774)
* [jwplatform] Use JW Platform Delivery API V2 and add support for more URLs
+ [fox] Add support National Geographic (#17985, #15333, #14698)
+ [playplustv] Add support for playplus.tv (#18789)
* [globo] Set GLBID cookie manually (#17346)
+ [gaia] Add support for gaia.com (#14605)
* [youporn] Fix title and description extraction (#18748)
+ [hungama] Add support for hungama.com (#17402, #18771)
* [dtube] Fix extraction (#18741)
* [tvnow] Fix and rework extractors and prepare for a switch to the new API
(#17245, #18499)
* [carambatv:page] Fix extraction (#18739)
version 2019.01.02
Extractors
* [discovery] Use geo verification headers (#17838)
+ [packtpub] Add support for subscription.packtpub.com (#18718)
* [yourporn] Fix extraction (#18583)
+ [acast:channel] Add support for play.acast.com (#18587)
+ [extractors] Add missing age limits (#18621)
+ [rmcdecouverte] Add support for live stream
* [rmcdecouverte] Bypass geo restriction
* [rmcdecouverte] Update URL regular expression (#18595, 18697)
* [manyvids] Fix extraction (#18604, #18614)
* [bitchute] Fix extraction (#18567)
version 2018.12.31
Extractors
+ [bbc] Add support for another embed pattern (#18643)
+ [npo:live] Add support for npostart.nl (#18644)
* [beeg] Fix extraction (#18610, #18626)
* [youtube] Unescape HTML for series (#18641)
+ [youtube] Extract more format metadata
* [youtube] Detect DRM protected videos (#1774)
* [youtube] Relax HTML5 player regular expressions (#18465, #18466)
* [youtube] Extend HTML5 player regular expression (#17516)
+ [liveleak] Add support for another embed type and restore original
format extraction
+ [crackle] Extract ISM and HTTP formats
+ [twitter] Pass Referer with card request (#18579)
* [mediasite] Extend URL regular expression (#18558)
+ [lecturio] Add support for lecturio.de (#18562)
+ [discovery] Add support for Scripps Networks watch domains (#17947)
version 2018.12.17
Extractors

@ -496,7 +496,7 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are:
- `id` (string): Video identifier
- `title` (string): Video title
@ -1025,15 +1025,19 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
@ -1129,11 +1133,33 @@ title = meta.get('title') or self._og_search_title(webpage)
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
### Regular expressions
When using regular expressions try to write them fuzzy and flexible.
#### Don't capture groups you don't use
A capturing group must be an indication that it's used somewhere in the code. Any group that is not used must be non-capturing.
##### Example
Don't capture the id attribute name here, since you can't use it for anything anyway.
Correct:
```python
r'(?:id|ID)=(?P<id>\d+)'
```
Incorrect:
```python
r'(id|ID)=(?P<id>\d+)'
```
#### Make regular expressions relaxed and flexible
When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
#### Example
##### Example
Say you need to extract `title` from the following HTML code:
@ -1166,6 +1192,25 @@ title = self._search_regex(
webpage, 'title', group='title')
```
### Long lines policy
There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse.
For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
Correct:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
Incorrect:
```python
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
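The "safe conversion functions" rule is easiest to see with a stand-in. The sketch below defines hypothetical, simplified `to_int_or_none`/`to_float_or_none` helpers that only approximate the behaviour of `int_or_none`/`float_or_none` in `youtube_dl/utils.py` (the real helpers take extra parameters); the `meta` dict is made-up sample data.

```python
def to_int_or_none(value, scale=1, default=None):
    """Hypothetical, simplified stand-in for youtube_dl.utils.int_or_none.

    Returns int(value) // scale, or `default` when value is None or cannot
    be parsed as an integer.
    """
    if value is None:
        return default
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return default


def to_float_or_none(value, scale=1, default=None):
    """Hypothetical, simplified stand-in for youtube_dl.utils.float_or_none."""
    if value is None:
        return default
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return default


# Made-up extracted metadata: values may be strings, garbage, or missing.
meta = {'duration': '1274', 'view_count': 'n/a', 'width': None}
duration = to_int_or_none(meta.get('duration'))
view_count = to_int_or_none(meta.get('view_count'))  # None, not a ValueError
```

The point of the convention is that a malformed field degrades to `None` instead of aborting the whole extraction.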

@ -320,6 +320,7 @@
- **Fusion**
- **Fux**
- **FXNetworks**
- **Gaia**
- **GameInformer**
- **GameOne**
- **gameone:playlist**
@ -370,6 +371,8 @@
- **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post
- **Hungama**
- **HungamaSong**
- **Hypem**
- **Iconosquare**
- **ign.com**
@ -438,6 +441,7 @@
- **Lecture2Go**
- **Lecturio**
- **LecturioCourse**
- **LecturioDeCourse**
- **LEGO**
- **Lemonde**
- **Lenta**
@ -539,8 +543,6 @@
- **MyviEmbed**
- **MyVisionTV**
- **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
- **natgeo:video**
- **Naver**
- **NBA**
@ -641,6 +643,7 @@
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
- **OsnatelTV**
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
@ -665,6 +668,7 @@
- **Pinkbike**
- **Pladform**
- **play.fm**
- **PlayPlusTV**
- **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
@ -933,7 +937,9 @@
- **TVNet**
- **TVNoe**
- **TVNow**
- **TVNowList**
- **TVNowAnnual**
- **TVNowNew**
- **TVNowSeason**
- **TVNowShow**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska

@ -153,15 +153,27 @@ def expect_value(self, got, expected, field):
isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
elif isinstance(expected, compat_str) and re.match(r'^(?:min|max)?count:\d+', expected):
self.assertTrue(
isinstance(got, (list, dict)),
'Expected field %s to be a list or a dict, but it is of type %s' % (
field, type(got).__name__))
expected_num = int(expected.partition(':')[2])
assertGreaterEqual(
op, _, expected_num = expected.partition(':')
expected_num = int(expected_num)
if op == 'mincount':
assert_func = assertGreaterEqual
msg_tmpl = 'Expected %d items in field %s, but only got %d'
elif op == 'maxcount':
assert_func = assertLessEqual
msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
elif op == 'count':
assert_func = assertEqual
msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
else:
assert False
assert_func(
self, len(got), expected_num,
'Expected %d items in field %s, but only got %d' % (expected_num, field, len(got)))
msg_tmpl % (expected_num, field, len(got)))
return
self.assertEqual(
expected, got,
@ -237,6 +249,20 @@ def assertGreaterEqual(self, got, expected, msg=None):
self.assertTrue(got >= expected, msg)
def assertLessEqual(self, got, expected, msg=None):
if not (got <= expected):
if msg is None:
msg = '%r not less than or equal to %r' % (got, expected)
self.assertTrue(got <= expected, msg)
def assertEqual(self, got, expected, msg=None):
if not (got == expected):
if msg is None:
msg = '%r not equal to %r' % (got, expected)
self.assertTrue(got == expected, msg)
def expect_warnings(ydl, warnings_re):
real_warning = ydl.report_warning
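The new `mincount`/`maxcount`/`count` checkers follow one pattern: parse `op:N`, pick a comparison and a message, compare against `len(got)`. A minimal standalone sketch of that dispatch, using a lookup table instead of the `if`/`elif` chain in the hunk and returning an error string rather than asserting on a `TestCase`; `check_collection_len` is a hypothetical name.

```python
import operator
import re

# (comparison, message template) per checker prefix, mirroring the three
# branches added to expect_value.
_COUNT_CHECKERS = {
    'mincount': (operator.ge, 'Expected %d items in field %s, but only got %d'),
    'maxcount': (operator.le, 'Expected maximum %d items in field %s, but got %d'),
    'count': (operator.eq, 'Expected exactly %d items in field %s, but got %d'),
}


def check_collection_len(field, expected, got):
    """Return None if `got` satisfies the count spec, else an error message."""
    if not re.match(r'^(?:min|max)?count:\d+', expected):
        raise ValueError('not a count spec: %r' % expected)
    if not isinstance(got, (list, dict)):
        return 'Expected field %s to be a list or a dict, but it is of type %s' % (
            field, type(got).__name__)
    op, _, expected_num = expected.partition(':')
    expected_num = int(expected_num)
    compare, msg_tmpl = _COUNT_CHECKERS[op]
    if compare(len(got), expected_num):
        return None
    return msg_tmpl % (expected_num, field, len(got))
```

A table keeps each operator next to its message, so adding another checker is a one-line change.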

@ -75,10 +75,14 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
def is_ad_fragment(s):
def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
def is_ad_fragment_end(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
media_frags = 0
ad_frags = 0
ad_frag_next = False
@ -87,12 +91,13 @@ class HlsFD(FragmentFD):
if not line:
continue
if line.startswith('#'):
if is_ad_fragment(line):
ad_frags += 1
if is_ad_fragment_start(line):
ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
continue
if ad_frag_next:
ad_frag_next = False
ad_frags += 1
continue
media_frags += 1
@ -123,7 +128,6 @@ class HlsFD(FragmentFD):
if line:
if not line.startswith('#'):
if ad_frag_next:
ad_frag_next = False
continue
frag_index += 1
if frag_index <= ctx['fragment_index']:
@ -196,8 +200,10 @@ class HlsFD(FragmentFD):
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
elif is_ad_fragment(line):
elif is_ad_fragment_start(line):
ad_frag_next = True
elif is_ad_fragment_end(line):
ad_frag_next = False
self._finish_frag_download(ctx)
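The HLS change replaces "the fragment after an ad marker is an ad" with a start/end state machine: every media line between an ad-start tag and the next ad-end tag is treated as an ad. A standalone sketch of the counting pass, reusing the two predicates from the hunk; the sample manifest (including its `#UPLYNK-SEGMENT` lines) is a simplified, made-up example.

```python
def is_ad_fragment_start(s):
    return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
            s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))


def is_ad_fragment_end(s):
    return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s or
            s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))


def count_fragments(manifest):
    """Return (media_frags, ad_frags) for an m3u8 playlist body."""
    media_frags = ad_frags = 0
    ad_frag_next = False  # True while inside an ad break
    for line in manifest.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            if is_ad_fragment_start(line):
                ad_frag_next = True
            elif is_ad_fragment_end(line):
                ad_frag_next = False
            continue
        # A non-tag line is a fragment URL.
        if ad_frag_next:
            ad_frags += 1
            continue
        media_frags += 1
    return media_frags, ad_frags


# Simplified, made-up playlist: one ad break containing two fragments.
SAMPLE = '''#EXTM3U
#UPLYNK-SEGMENT:abc,00000000,segment
#EXTINF:4,
seg0.ts
#UPLYNK-SEGMENT:def,00000001,ad
#EXTINF:4,
ad0.ts
#EXTINF:4,
ad1.ts
#UPLYNK-SEGMENT:abc,00000002,segment
#EXTINF:4,
seg1.ts
'''
```

Here `count_fragments(SAMPLE)` gives `(2, 2)`. Under the previous single-fragment logic, `ad1.ts` would have been downloaded as media.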

@ -79,17 +79,27 @@ class ACastIE(InfoExtractor):
class ACastChannelIE(InfoExtractor):
IE_NAME = 'acast:channel'
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)'
_TEST = {
'url': 'https://www.acast.com/condenasttraveler',
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?acast\.com/|
play\.acast\.com/s/
)
(?P<id>[^/#?]+)
'''
_TESTS = [{
'url': 'https://www.acast.com/todayinfocus',
'info_dict': {
'id': '50544219-29bb-499e-a083-6087f4cb7797',
'title': 'Condé Nast Traveler Podcast',
'description': 'md5:98646dee22a5b386626ae31866638fbd',
'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus',
'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
},
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
'playlist_mincount': 35,
}, {
'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True,
}]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
@ -102,7 +112,7 @@ class ACastChannelIE(InfoExtractor):
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
def _real_extract(self, url):
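The new `ACastChannelIE._VALID_URL` is a verbose-mode pattern (inline `(?x)`), so whitespace and line breaks inside it are ignored. A small sketch that exercises it against both URL forms from the updated tests. `channel_slug` and `episode_url` are hypothetical helper names; the latter only reproduces the URL format string used in the updated paging code.

```python
import re

# Copy of the new _VALID_URL; (?x) enables verbose mode. '#' inside the
# character class is literal, not a comment, even in verbose mode.
VALID_URL = r'''(?x)
                    https?://
                        (?:
                            (?:www\.)?acast\.com/|
                            play\.acast\.com/s/
                        )
                        (?P<id>[^/#?]+)
                    '''


def channel_slug(url):
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


def episode_url(slug, cast_url):
    # Same format string the updated code passes to url_result.
    return 'https://play.acast.com/s/%s/%s' % (slug, cast_url)
```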

@ -62,7 +62,7 @@ class AudiomackIE(InfoExtractor):
# Audiomack wraps a lot of soundcloud tracks in their branded wrapper
# if so, pass the work off to the soundcloud extractor
if SoundcloudIE.suitable(api_response['url']):
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
return self.url_result(api_response['url'], SoundcloudIE.ie_key())
return {
'id': compat_str(api_response.get('id', album_url_tag)),

@ -795,6 +795,15 @@ class BBCIE(BBCCoUkIE):
'uploader': 'Radio 3',
'uploader_id': 'bbc_radio_three',
},
}, {
'url': 'http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227',
'info_dict': {
'id': 'p06w9tws',
'ext': 'mp4',
'title': 'md5:2fabf12a726603193a2879a055f72514',
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}]
@classmethod
@ -945,6 +954,15 @@ class BBCIE(BBCCoUkIE):
if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
# http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227
group_id = self._search_regex(
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
webpage, 'group id', default=None)
if playlist_id:
return self.url_result(
'https://www.bbc.co.uk/programmes/%s' % group_id,
ie=BBCCoUkIE.ie_key())
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
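The new BBC fallback looks for a `<div class="video" data-pid="...">` and redirects to the `/programmes/` page. In the hunk, the guard tests `playlist_id` while the URL is built from `group_id`, so a missing `group_id` could yield `.../programmes/None`. The sketch below guards on the extracted id itself, which appears to be the intent. `ID_REGEX` here is an assumed simplification of `BBCCoUkIE._ID_REGEX`, just enough to match ids like `p06w9tws`.

```python
import re

# Assumption: simplified stand-in for BBCCoUkIE._ID_REGEX.
ID_REGEX = r'[pbw][\da-z]{7}'

GROUP_ID_RE = (
    r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % ID_REGEX)


def programme_url(webpage):
    """Return the /programmes/ URL for the embedded video, or None."""
    mobj = re.search(GROUP_ID_RE, webpage)
    group_id = mobj.group(1) if mobj else None
    # Guard on the value that is actually interpolated into the URL.
    if not group_id:
        return None
    return 'https://www.bbc.co.uk/programmes/%s' % group_id
```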

@ -1,15 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
)
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_iso8601,
urljoin,
unified_timestamp,
)
@ -36,29 +31,9 @@ class BeegIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:/static|(?:https?:)?//static\.beeg\.com)/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url')
cpl_url = urljoin(url, cpl_url)
beeg_version, beeg_salt = [None] * 2
if cpl_url:
cpl = self._download_webpage(
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = int_or_none(self._search_regex(
r'beeg_version\s*=\s*([^\b]+)', cpl,
'beeg version', default=None)) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '2185'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
beeg_version = self._search_regex(
r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
default='1546225636701')
for api_path in ('', 'api.'):
video = self._download_json(
@ -68,37 +43,6 @@ class BeegIE(InfoExtractor):
if video:
break
def split(o, e):
def cut(s, x):
n.append(s[:x])
return s[x:]
n = []
r = len(o) % e
if r > 0:
o = cut(o, r)
while len(o) > e:
o = cut(o, e)
n.append(o)
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = beeg_salt
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
for n in range(len(e))])
return ''.join(split(o, 3)[::-1])
def decrypt_url(encrypted_url):
encrypted_url = self._proto_relative_url(
encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
key = self._search_regex(
r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
if not key:
return encrypted_url
return encrypted_url.replace(key, decrypt_key(key))
formats = []
for format_id, video_url in video.items():
if not video_url:
@ -108,18 +52,20 @@ class BeegIE(InfoExtractor):
if not height:
continue
formats.append({
'url': decrypt_url(video_url),
'url': self._proto_relative_url(
video_url.replace('{DATA_MARKERS}', 'data=pc_XX__%s_0' % beeg_version), 'https:'),
'format_id': format_id,
'height': int(height),
})
self._sort_formats(formats)
title = video['title']
video_id = video.get('id') or video_id
video_id = compat_str(video.get('id') or video_id)
display_id = video.get('code')
description = video.get('desc')
series = video.get('ps_name')
timestamp = parse_iso8601(video.get('date'), ' ')
timestamp = unified_timestamp(video.get('date'))
duration = int_or_none(video.get('duration'))
tags = [tag.strip() for tag in video['tags'].split(',')] if video.get('tags') else None
@ -129,6 +75,7 @@ class BeegIE(InfoExtractor):
'display_id': display_id,
'title': title,
'description': description,
'series': series,
'timestamp': timestamp,
'duration': duration,
'tags': tags,
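After this change, Beeg format URLs are no longer decrypted. The `{DATA_MARKERS}` placeholder is filled with a version-dependent string and protocol-relative URLs are made absolute. A standalone sketch of those two steps, assuming only what the hunk shows; the example URL shape in the usage line is made up.

```python
import re

# Fallback copied from the default in the hunk.
DEFAULT_BEEG_VERSION = '1546225636701'


def find_beeg_version(webpage):
    mobj = re.search(r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage)
    return mobj.group(1) if mobj else DEFAULT_BEEG_VERSION


def build_format_url(video_url, beeg_version):
    """Fill the {DATA_MARKERS} placeholder and make the URL absolute."""
    url = video_url.replace(
        '{DATA_MARKERS}', 'data=pc_XX__%s_0' % beeg_version)
    # Same effect as _proto_relative_url(url, 'https:'): only
    # protocol-relative URLs ("//host/...") get a scheme prepended.
    if url.startswith('//'):
        url = 'https:' + url
    return url
```

For instance, `build_format_url('//video.example.com/{DATA_MARKERS}/480.mp4', '123')` produces `'https://video.example.com/data=pc_XX__123_0/480.mp4'`.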

@ -5,7 +5,10 @@ import itertools
import re
from .common import InfoExtractor
from ..utils import urlencode_postdata
from ..utils import (
orderedSet,
urlencode_postdata,
)
class BitChuteIE(InfoExtractor):
@ -43,10 +46,16 @@ class BitChuteIE(InfoExtractor):
'description', webpage, 'title',
default=None) or self._og_search_description(webpage)
format_urls = []
for mobj in re.finditer(
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
format_urls.append(mobj.group('url'))
format_urls.extend(re.findall(r'as=(https?://[^&"\']+)', webpage))
formats = [
{'url': mobj.group('url')}
for mobj in re.finditer(
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)]
{'url': format_url}
for format_url in orderedSet(format_urls)]
self._check_formats(formats, video_id)
self._sort_formats(formats)
description = self._html_search_regex(
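BitChute now gathers candidate URLs from two sources, `addWebSeed(...)` calls and `as=` parameters (presumably the "acceptable source" field of magnet links), and de-duplicates them with `orderedSet` before `_check_formats` probes them. A standalone sketch; `ordered_set` is a local re-implementation that assumes `orderedSet` keeps first-seen order, and the sample page is made up.

```python
import re


def ordered_set(iterable):
    """Drop duplicates while keeping first-seen order."""
    res = []
    for item in iterable:
        if item not in res:
            res.append(item)
    return res


def collect_formats(webpage):
    format_urls = [
        mobj.group('url') for mobj in re.finditer(
            r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)]
    # Additional candidates from as= query parameters.
    format_urls.extend(re.findall(r'as=(https?://[^&"\']+)', webpage))
    return [{'url': format_url} for format_url in ordered_set(format_urls)]


# Made-up page: the first URL appears twice, once per source.
PAGE = ('addWebSeed("https://a.example/v.mp4");\n'
        '<a href="magnet:?xt=urn&as=https://a.example/v.mp4&tr=x">m</a>\n'
        '<a href="magnet:?as=https://b.example/v.mp4">n</a>')
```

De-duplicating before `_check_formats` avoids probing the same URL twice.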

@ -14,6 +14,7 @@ class CamModelsIE(InfoExtractor):
_TESTS = [{
'url': 'https://www.cammodels.com/cam/AutumnKnight/',
'only_matching': True,
'age_limit': 18
}]
def _real_extract(self, url):
@ -93,4 +94,5 @@ class CamModelsIE(InfoExtractor):
'title': self._live_title(user_id),
'is_live': True,
'formats': formats,
'age_limit': 18
}

@ -20,6 +20,7 @@ class CamTubeIE(InfoExtractor):
'duration': 1274,
'timestamp': 1528018608,
'upload_date': '20180603',
'age_limit': 18
},
'params': {
'skip_download': True,
@ -66,4 +67,5 @@ class CamTubeIE(InfoExtractor):
'like_count': like_count,
'creator': creator,
'formats': formats,
'age_limit': 18
}

@ -25,6 +25,7 @@ class CamWithHerIE(InfoExtractor):
'comment_count': int,
'uploader': 'MileenaK',
'upload_date': '20160322',
'age_limit': 18,
},
'params': {
'skip_download': True,
@ -84,4 +85,5 @@ class CamWithHerIE(InfoExtractor):
'comment_count': comment_count,
'uploader': uploader,
'upload_date': upload_date,
'age_limit': 18
}

@ -82,6 +82,12 @@ class CarambaTVPageIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage)
if not videomore_url:
videomore_id = self._search_regex(
r'getVMCode\s*\(\s*["\']?(\d+)', webpage, 'videomore id',
default=None)
if videomore_id:
videomore_url = 'videomore:%s' % videomore_id
if videomore_url:
title = self._og_search_title(webpage)
return {

@ -1,20 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
from ..utils import int_or_none
class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html',
'url': 'https://www.cartoonnetwork.com/video/ben-10/how-to-draw-upgrade-episode.html',
'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c',
'id': '6e3375097f63874ebccec7ef677c1c3845fa850e',
'ext': 'mp4',
'title': 'Starfire the Cat Lady',
'description': 'Robin decides to become a cat so that Starfire will finally love him.',
'title': 'How to Draw Upgrade',
'description': 'md5:2061d83776db7e8be4879684eefe8c0f',
},
'params': {
# m3u8 download
@ -25,18 +24,39 @@ class CartoonNetworkIE(TurnerBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'https://token.vgtf.net/token/token_mobile',
},
}, {
def find_field(global_re, name, content_re=None, value_re='[^"]+', fatal=False):
metadata_re = ''
if content_re:
metadata_re = r'|video_metadata\.content_' + content_re
return self._search_regex(
r'(?:_cnglobal\.currentVideo\.%s%s)\s*=\s*"(%s)";' % (global_re, metadata_re, value_re),
webpage, name, fatal=fatal)
media_id = find_field('mediaId', 'media id', 'id', '[0-9a-f]{40}', True)
title = find_field('episodeTitle', 'title', '(?:episodeName|name)', fatal=True)
info = self._extract_ngtv_info(
media_id, {'networkId': 'cartoonnetwork'}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
'auth_required': find_field('authType', 'auth type') != 'unauth',
})
series = find_field(
'propertyName', 'series', 'showName') or self._html_search_meta('partOfSeries', webpage)
info.update({
'id': media_id,
'display_id': display_id,
'title': title,
'description': self._html_search_meta('description', webpage),
'series': series,
'episode': title,
})
for field in ('season', 'episode'):
field_name = field + 'Number'
info[field + '_number'] = int_or_none(find_field(
field_name, field + ' number', value_re=r'\d+') or self._html_search_meta(field_name, webpage))
return info
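The new CartoonNetwork `find_field` helper builds one regex that matches either `_cnglobal.currentVideo.<global_re>` or, optionally, `video_metadata.content_<content_re>`. A standalone sketch of the same construction, returning `None` instead of raising when `fatal` would apply; the sample page is made-up JavaScript.

```python
import re


def find_field(webpage, global_re, content_re=None, value_re='[^"]+'):
    """Return the quoted value assigned to either variable form, or None."""
    metadata_re = ''
    if content_re:
        # Alternation inside the outer (?:...) group: either variable name.
        metadata_re = r'|video_metadata\.content_' + content_re
    mobj = re.search(
        r'(?:_cnglobal\.currentVideo\.%s%s)\s*=\s*"(%s)";' % (
            global_re, metadata_re, value_re),
        webpage)
    return mobj.group(1) if mobj else None


# Made-up page fragment.
PAGE = ('_cnglobal.currentVideo.episodeTitle = "How to Draw Upgrade";\n'
        'video_metadata.content_showName = "Ben 10";\n'
        '_cnglobal.currentVideo.episodeNumber = "7";')
```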

@ -119,11 +119,7 @@ class CNNBlogsIE(InfoExtractor):
def _real_extract(self, url):
webpage = self._download_webpage(url, url_basename(url))
cnn_url = self._html_search_regex(r'data-url="(.+?)"', webpage, 'cnn url')
return {
'_type': 'url',
'url': cnn_url,
'ie_key': CNNIE.ie_key(),
}
return self.url_result(cnn_url, CNNIE.ie_key())
class CNNArticleIE(InfoExtractor):
@ -145,8 +141,4 @@ class CNNArticleIE(InfoExtractor):
def _real_extract(self, url):
webpage = self._download_webpage(url, url_basename(url))
cnn_url = self._html_search_regex(r"video:\s*'([^']+)'", webpage, 'cnn url')
return {
'_type': 'url',
'url': 'http://cnn.com/video/?/video/' + cnn_url,
'ie_key': CNNIE.ie_key(),
}
return self.url_result('http://cnn.com/video/?/video/' + cnn_url, CNNIE.ie_key())
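The CNN (and Audiomack) hunks swap hand-built `{'_type': 'url', ...}` dicts for `self.url_result(...)`. The sketch below is a standalone reconstruction of the dict that `InfoExtractor.url_result` builds, based on my reading of `common.py` from around this release (verify against the source). It shows that the refactor should be behaviour-preserving.

```python
def url_result(url, ie=None, video_id=None, video_title=None):
    """Standalone sketch of the dict InfoExtractor.url_result returns."""
    video_info = {'_type': 'url', 'url': url, 'ie_key': ie}
    if video_id is not None:
        video_info['id'] = video_id
    if video_title is not None:
        video_info['title'] = video_title
    return video_info
```

With no id or title, the result equals the old literal dict exactly.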

@ -1239,17 +1239,27 @@ class InfoExtractor(object):
if expected_type is not None and expected_type != item_type:
return info
if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({
'episode': unescapeHTML(e.get('name')),
'episode': episode_name,
'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')),
})
if not info.get('title') and episode_name:
info['title'] = episode_name
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Movie':
info.update({
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('dateCreated')),
})
elif item_type in ('Article', 'NewsArticle'):
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),

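The JSON-LD hunk above teaches the `TVEpisode`/`Episode` branch to fall back to the episode name as the title and to read `partOfSeason` and `partOfSeries`/`partOfTVSeries`. Here is a simplified standalone sketch of that branch. `html.unescape` stands in for `unescapeHTML` and `to_int` is a reduced `int_or_none`; both are assumptions, as is starting from an empty `info` (the real method may already hold a title). The sample dictionary is made up.

```python
import html


def to_int(v):
    """Simplified int_or_none: None or unparsable values become None."""
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def episode_info(e):
    """Sketch of the TVEpisode branch of the JSON-LD parsing above."""
    info = {}
    if e.get('@type') not in ('TVEpisode', 'Episode'):
        return info
    name = e.get('name')
    episode_name = html.unescape(name) if name else None
    info.update({
        'episode': episode_name,
        'episode_number': to_int(e.get('episodeNumber')),
    })
    # Fall back to the episode name when no title was found elsewhere
    if not info.get('title') and episode_name:
        info['title'] = episode_name
    season = e.get('partOfSeason')
    if isinstance(season, dict) and season.get('@type') in (
            'TVSeason', 'Season', 'CreativeWorkSeason'):
        info['season_number'] = to_int(season.get('seasonNumber'))
    series = e.get('partOfSeries') or e.get('partOfTVSeries')
    if isinstance(series, dict) and series.get('@type') in (
            'TVSeries', 'Series', 'CreativeWorkSeries'):
        series_name = series.get('name')
        info['series'] = html.unescape(series_name) if series_name else None
    return info


sample = {
    '@type': 'TVEpisode',
    'name': 'Pilot &amp; Intro',
    'episodeNumber': '1',
    'partOfSeason': {'@type': 'TVSeason', 'seasonNumber': 2},
    'partOfTVSeries': {'@type': 'TVSeries', 'name': 'Example Show'},
}
print(episode_info(sample))
```

The `isinstance(..., dict)` guards matter because JSON-LD producers sometimes emit these properties as plain strings or lists.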

@ -48,6 +48,21 @@ class CrackleIE(InfoExtractor):
'only_matching': True,
}]
_MEDIA_FILE_SLOTS = {
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 768,
'height': 432,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 480,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
@ -95,6 +110,20 @@ class CrackleIE(InfoExtractor):
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
elif format_url.endswith('.ism/Manifest'):
formats.extend(self._extract_ism_formats(
format_url, video_id, ism_id='mss', fatal=False))
else:
mfs_path = e.get('Type')
mfs_info = self._MEDIA_FILE_SLOTS.get(mfs_path)
if not mfs_info:
continue
formats.append({
'url': format_url,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
self._sort_formats(formats)
description = media.get('Description')

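In the Crackle change, the new `else` branch maps a media file's `Type` (for example `480p_1mbps.mp4`) to fixed dimensions through `_MEDIA_FILE_SLOTS` and skips unknown slots. The sketch below reproduces that mapping in isolation. It assumes the input is a list of `(slot name, url)` pairs, which is a simplification of whatever shape the Crackle API actually returns; the URLs are hypothetical.

```python
MEDIA_FILE_SLOTS = {
    '360p.mp4': {'width': 640, 'height': 360},
    '480p.mp4': {'width': 768, 'height': 432},
    '480p_1mbps.mp4': {'width': 852, 'height': 480},
}


def progressive_formats(media_files):
    """Turn (slot name, url) pairs into format dicts, skipping unknown slots."""
    formats = []
    for slot, url in media_files:
        slot_info = MEDIA_FILE_SLOTS.get(slot)
        if not slot_info:
            continue
        formats.append({
            'url': url,
            # '480p_1mbps.mp4' -> 'http-480p_1mbps'
            'format_id': 'http-' + slot.split('.')[0],
            'width': slot_info['width'],
            'height': slot_info['height'],
        })
    return formats


files = [
    ('480p_1mbps.mp4', 'https://example.com/v_480p_1mbps.mp4'),
    ('thumbnail.jpg', 'https://example.com/thumb.jpg'),
]
print(progressive_formats(files))
```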

@ -46,8 +46,24 @@ class CuriosityStreamBaseIE(InfoExtractor):
self._handle_errors(result)
self._auth_token = result['message']['auth_token']
def _extract_media_info(self, media):
video_id = compat_str(media['id'])
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title']
formats = []
@ -114,38 +130,21 @@ class CuriosityStreamBaseIE(InfoExtractor):
}
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
return self._extract_media_info(media)
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
'id': '2',
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 12,
}
'playlist_mincount': 17,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}]
def _real_extract(self, url):
collection_id = self._match_id(url)
@ -153,7 +152,10 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'collections/' + collection_id, collection_id)
entries = []
for media in collection.get('media', []):
entries.append(self._extract_media_info(media))
media_id = compat_str(media.get('id'))
entries.append(self.url_result(
'https://curiositystream.com/video/' + media_id,
CuriosityStreamIE.ie_key(), media_id))
return self.playlist_result(
entries, collection_id,
collection.get('title'), collection.get('description'))

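The CuriosityStream change widens the collection pattern to `series/` URLs with an optional `app.` subdomain. It also makes each playlist entry a URL result for the single-video extractor instead of extracting media inline. The sketch checks the new pattern and builds entries the same way. The string `'CuriosityStream'` is a stand-in for `CuriosityStreamIE.ie_key()`, and the entry dictionaries follow the shape `url_result` is assumed to produce.

```python
import re

VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'


def collection_entries(media_list):
    """Each entry defers to the single-video extractor by URL."""
    entries = []
    for media in media_list:
        media_id = str(media.get('id'))
        entries.append({
            '_type': 'url',
            'url': 'https://curiositystream.com/video/' + media_id,
            'ie_key': 'CuriosityStream',  # stand-in for CuriosityStreamIE.ie_key()
            'id': media_id,
        })
    return entries


for url in ('https://app.curiositystream.com/collection/2',
            'https://curiositystream.com/series/2',
            'https://curiositystream.com/video/2'):
    m = re.match(VALID_URL, url)
    print(url, '->', m.group('id') if m else None)
```

Deferring to the video extractor means each entry is only resolved when it is actually processed, rather than during playlist extraction.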

@ -17,16 +17,29 @@ from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocity
_VALID_URL = r'''(?x)https?://
(?P<site>
(?:www\.)?
(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocity
)|
watch\.
(?:
hgtv|
foodnetwork|
travelchannel|
diynetwork|
cookingchanneltv|
motortrend
)
)\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
_TESTS = [{
'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
@ -71,7 +84,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
if not access_token:
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'https://%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
@ -81,11 +94,12 @@ class DiscoveryIE(DiscoveryGoBaseIE):
})['access_token']
try:
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
stream = self._download_json(
'https://api.discovery.com/v1/streaming/video/' + video_id,
display_id, headers={
'Authorization': 'Bearer ' + access_token,
})
display_id, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(

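In the Discovery change, the `site` group now captures the optional `www.` prefix and the new `watch.<network>` hosts. That is why the anonymous-token URL changes from `'https://www.%s.com/anonymous'` to `'https://%s.com/anonymous'`. The sketch reuses the new pattern to show which host the token request would target. The `watch.hgtv.com` episode URL in the usage lines is hypothetical.

```python
import re

VALID_URL = r'''(?x)https?://
    (?P<site>
        (?:www\.)?
            (?:
                discovery|
                investigationdiscovery|
                discoverylife|
                animalplanet|
                ahctv|
                destinationamerica|
                sciencechannel|
                tlc|
                velocity
            )|
        watch\.
            (?:
                hgtv|
                foodnetwork|
                travelchannel|
                diynetwork|
                cookingchanneltv|
                motortrend
            )
    )\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''


def anonymous_token_url(url):
    """Build the anonymous-token endpoint from the captured site host."""
    site = re.match(VALID_URL, url).group('site')
    return 'https://%s.com/anonymous' % site


print(anonymous_token_url('https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley'))
print(anonymous_token_url('https://watch.hgtv.com/tv-shows/some-show/full-episodes/some-episode'))
```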

@ -15,16 +15,16 @@ from ..utils import (
class DTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
_TEST = {
'url': 'https://d.tube/#!/v/benswann/zqd630em',
'md5': 'a03eaa186618ffa7a3145945543a251e',
'url': 'https://d.tube/#!/v/broncnutz/x380jtr1',
'md5': '9f29088fa08d699a7565ee983f56a06e',
'info_dict': {
'id': 'zqd630em',
'id': 'x380jtr1',
'ext': 'mp4',
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom',
'description': 'md5:700d164e066b87f9eac057949e4227c2',
'uploader_id': 'benswann',
'upload_date': '20180222',
'timestamp': 1519328958,
'title': 'Lefty 3-Rings is Back Baby!! NCAA Picks',
'description': 'md5:60be222088183be3a42f196f34235776',
'uploader_id': 'broncnutz',
'upload_date': '20190107',
'timestamp': 1546854054,
},
'params': {
'format': '480p',
@ -48,7 +48,7 @@ class DTubeIE(InfoExtractor):
def canonical_url(h):
if not h:
return None
return 'https://ipfs.io/ipfs/' + h
return 'https://video.dtube.top/ipfs/' + h
formats = []
for q in ('240', '480', '720', '1080', ''):


@ -411,6 +411,7 @@ from .funk import (
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
from .gameinformer import GameInformerIE
from .gameone import (
GameOneIE,
@ -469,6 +470,10 @@ from .hrti import (
)
from .huajiao import HuajiaoIE
from .huffpost import HuffPostIE
from .hungama import (
HungamaIE,
HungamaSongIE,
)
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import (
@ -557,6 +562,7 @@ from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioIE,
LecturioCourseIE,
LecturioDeCourseIE,
)
from .leeco import (
LeIE,
@ -681,11 +687,7 @@ from .myvi import (
MyviEmbedIE,
)
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
NationalGeographicIE,
NationalGeographicEpisodeGuideIE,
)
from .nationalgeographic import NationalGeographicVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import (
@ -827,6 +829,7 @@ from .orf import (
ORFOE1IE,
ORFIPTVIE,
)
from .outsidetv import OutsideTVIE
from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
@ -855,6 +858,7 @@ from .piksel import PikselIE
from .pinkbike import PinkbikeIE
from .pladform import PladformIE
from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
@ -1192,7 +1196,9 @@ from .tvnet import TVNetIE
from .tvnoe import TVNoeIE
from .tvnow import (
TVNowIE,
TVNowListIE,
TVNowNewIE,
TVNowSeasonIE,
TVNowAnnualIE,
TVNowShowIE,
)
from .tvp import (


@ -1,11 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
# import json
# import uuid
from .adobepass import AdobePassIE
from .uplynk import UplynkPreplayIE
from ..compat import compat_str
from ..utils import (
HEADRequest,
int_or_none,
parse_age_limit,
parse_duration,
@ -16,7 +16,7 @@ from ..utils import (
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[\da-fA-F]+)'
_VALID_URL = r'https?://(?:www\.)?(?:fox\.com|nationalgeographic\.com/tv)/watch/(?P<id>[\da-fA-F]+)'
_TESTS = [{
# clip
'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/',
@ -43,41 +43,47 @@ class FOXIE(AdobePassIE):
# episode, geo-restricted, tv provided required
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
'only_matching': True,
}, {
'url': 'https://www.nationalgeographic.com/tv/watch/f690e05ebbe23ab79747becd0cc223d1/',
'only_matching': True,
}]
# _access_token = None
# def _call_api(self, path, video_id, data=None):
# headers = {
# 'X-Api-Key': '238bb0a0c2aba67922c48709ce0c06fd',
# }
# if self._access_token:
# headers['Authorization'] = 'Bearer ' + self._access_token
# return self._download_json(
# 'https://api2.fox.com/v2.0/' + path, video_id, data=data, headers=headers)
# def _real_initialize(self):
# self._access_token = self._call_api(
# 'login', None, json.dumps({
# 'deviceId': compat_str(uuid.uuid4()),
# }).encode())['accessToken']
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id,
'https://api.fox.com/fbc-content/v1_5/video/%s' % video_id,
video_id, headers={
'apikey': 'abdcbed02c124d393b39e818a4312055',
'Content-Type': 'application/json',
'Referer': url,
})
# video = self._call_api('vodplayer/' + video_id, video_id)
title = video['name']
release_url = video['videoRelease']['url']
description = video.get('description')
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
# release_url = video['url']
data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {}
creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
season_number = int_or_none(video.get('seasonNumber'))
episode = video.get('name')
episode_number = int_or_none(video.get('episodeNumber'))
release_year = int_or_none(video.get('releaseYear'))
rating = video.get('contentRating')
if data.get('authRequired'):
resource = self._get_mvpd_resource(
'fbc-fox', title, video.get('guid'), rating)
@ -86,6 +92,18 @@ class FOXIE(AdobePassIE):
'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource)
})
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
subtitles = {}
for doc_rel in video.get('documentReleases', []):
@ -98,36 +116,19 @@ class FOXIE(AdobePassIE):
}]
break
info = {
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
'description': video.get('description'),
'duration': duration,
'timestamp': timestamp,
'age_limit': age_limit,
'age_limit': parse_age_limit(rating),
'creator': creator,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'release_year': release_year,
'season_number': int_or_none(video.get('seasonNumber')),
'episode': video.get('name'),
'episode_number': int_or_none(video.get('episodeNumber')),
'release_year': int_or_none(video.get('releaseYear')),
'subtitles': subtitles,
}
urlh = self._request_webpage(HEADRequest(release_url), video_id)
video_url = compat_str(urlh.geturl())
if UplynkPreplayIE.suitable(video_url):
info.update({
'_type': 'url_transparent',
'url': video_url,
'ie_key': UplynkPreplayIE.ie_key(),
})
else:
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
info['formats'] = formats
return info


@ -1,6 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
class FreespeechIE(InfoExtractor):
@ -27,8 +28,4 @@ class FreespeechIE(InfoExtractor):
r'data-video-url="([^"]+)"',
webpage, 'youtube url')
return {
'_type': 'url',
'url': youtube_url,
'ie_key': 'Youtube',
}
return self.url_result(youtube_url, YoutubeIE.ie_key())


@ -1,6 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import random
import string
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
@ -87,7 +90,7 @@ class FunimationIE(InfoExtractor):
video_id = title_data.get('id') or self._search_regex([
r"KANE_customdimensions.videoID\s*=\s*'(\d+)';",
r'<iframe[^>]+src="/player/(\d+)"',
r'<iframe[^>]+src="/player/(\d+)',
], webpage, 'video_id', default=None)
if not video_id:
player_url = self._html_search_meta([
@ -108,8 +111,10 @@ class FunimationIE(InfoExtractor):
if self._TOKEN:
headers['Authorization'] = 'Token %s' % self._TOKEN
sources = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/source/catalog/video/%s/signed/' % video_id,
video_id, headers=headers)['items']
'https://www.funimation.com/api/showexperience/%s/' % video_id,
video_id, headers=headers, query={
'pinst_id': ''.join([random.choice(string.digits + string.ascii_letters) for _ in range(8)]),
})['items']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read(), video_id)['errors'][0]

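The Funimation change switches to the `showexperience` endpoint and sends a throwaway `pinst_id` query value made of eight random alphanumeric characters. The diff does not say what the server does with it, so the sketch below only reproduces how the value is generated. `make_pinst_id` is a hypothetical helper name. `random` is not a cryptographic source, which seems acceptable for what looks like a per-request instance tag.

```python
import random
import string


def make_pinst_id(length=8):
    """Random alphanumeric string, built like the 'pinst_id' query value above."""
    alphabet = string.digits + string.ascii_letters
    return ''.join(random.choice(alphabet) for _ in range(length))


query = {'pinst_id': make_pinst_id()}
print(query)
```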

@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
strip_or_none,
try_get,
)
class GaiaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gaia\.com/video/(?P<id>[^/?]+).*?\bfullplayer=(?P<type>feature|preview)'
_TESTS = [{
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=feature',
'info_dict': {
'id': '89356',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 936,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=preview',
'info_dict': {
'id': '89351',
'ext': 'mp4',
'title': 'Connecting with Universal Consciousness',
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
'upload_date': '20151116',
'timestamp': 1447707266,
'duration': 53,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id, vtype = re.search(self._VALID_URL, url).groups()
node_id = self._download_json(
'https://brooklyn.gaia.com/pathinfo', display_id, query={
'path': 'video/' + display_id,
})['id']
node = self._download_json(
'https://brooklyn.gaia.com/node/%d' % node_id, node_id)
vdata = node[vtype]
media_id = compat_str(vdata['nid'])
title = node['title']
media = self._download_json(
'https://brooklyn.gaia.com/media/' + media_id, media_id)
formats = self._extract_m3u8_formats(
media['mediaUrls']['bcHLS'], media_id, 'mp4')
self._sort_formats(formats)
subtitles = {}
text_tracks = media.get('textTracks', {})
for key in ('captions', 'subtitles'):
for lang, sub_url in text_tracks.get(key, {}).items():
subtitles.setdefault(lang, []).append({
'url': sub_url,
})
fivestar = node.get('fivestar', {})
fields = node.get('fields', {})
def get_field_value(key, value_key='value'):
return try_get(fields, lambda x: x[key][0][value_key])
return {
'id': media_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': strip_or_none(get_field_value('body') or get_field_value('teaser')),
'timestamp': int_or_none(node.get('created')),
'subtitles': subtitles,
'duration': int_or_none(vdata.get('duration')),
'like_count': int_or_none(try_get(fivestar, lambda x: x['up_count']['value'])),
'dislike_count': int_or_none(try_get(fivestar, lambda x: x['down_count']['value'])),
'comment_count': int_or_none(node.get('comment_count')),
'series': try_get(node, lambda x: x['series']['title'], compat_str),
'season_number': int_or_none(get_field_value('season')),
'season_id': str_or_none(get_field_value('series_nid', 'nid')),
'episode_number': int_or_none(get_field_value('episode')),
}

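The new GaiaIE reads both the display id and the `fullplayer` type (`feature` or `preview`) from the URL. It then uses the type as a key into the node JSON to choose which media to fetch. The sketch shows that two-step selection. The `node` dictionary is a hand-made stand-in shaped after the `nid` and `duration` fields the extractor reads, with values taken from the extractor's own test cases.

```python
import re

VALID_URL = r'https?://(?:www\.)?gaia\.com/video/(?P<id>[^/?]+).*?\bfullplayer=(?P<type>feature|preview)'


def parse_gaia_url(url):
    """Return (display_id, vtype) the way GaiaIE._real_extract unpacks them."""
    return re.search(VALID_URL, url).groups()


def pick_media(node, vtype):
    """Select the feature or preview media id and duration from a node."""
    vdata = node[vtype]
    return str(vdata['nid']), vdata.get('duration')


node = {  # hand-made stand-in for the brooklyn.gaia.com node JSON
    'title': 'Connecting with Universal Consciousness',
    'feature': {'nid': 89356, 'duration': 936},
    'preview': {'nid': 89351, 'duration': 53},
}
display_id, vtype = parse_gaia_url(
    'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=preview')
print(display_id, pick_media(node, vtype))
```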

@ -2197,10 +2197,7 @@ class GenericIE(InfoExtractor):
def _real_extract(self, url):
if url.startswith('//'):
return {
'_type': 'url',
'url': self.http_scheme() + url,
}
return self.url_result(self.http_scheme() + url)
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:


@ -72,7 +72,7 @@ class GloboIE(InfoExtractor):
return
try:
self._download_json(
glb_id = (self._download_json(
'https://login.globo.com/api/authentication', None, data=json.dumps({
'payload': {
'email': email,
@ -81,7 +81,9 @@ class GloboIE(InfoExtractor):
},
}).encode(), headers={
'Content-Type': 'application/json; charset=utf-8',
})
}) or {}).get('glbId')
if glb_id:
self._set_cookie('.globo.com', 'GLBID', glb_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(e.cause.read(), None)


@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
urlencode_postdata,
)
class HungamaIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?hungama\.com/
(?:
(?:video|movie)/[^/]+/|
tv-show/(?:[^/]+/){2}\d+/episode/[^/]+/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hungama.com/video/krishna-chants/39349649/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}, {
'url': 'https://www.hungama.com/movie/kahaani-2/44129919/',
'only_matching': True,
}, {
'url': 'https://www.hungama.com/tv-show/padded-ki-pushup/season-1/44139461/episode/ep-02-training-sasu-pathlaag-karing/44139503/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
info = self._search_json_ld(webpage, video_id)
m3u8_url = self._download_json(
'https://www.hungama.com/index.php', video_id,
data=urlencode_postdata({'content_id': video_id}), headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
}, query={
'c': 'common',
'm': 'get_video_mdn_url',
})['stream_url']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
})
return info
class HungamaSongIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hungama\.com/song/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://www.hungama.com/song/kitni-haseen-zindagi/2931166/',
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
'info_dict': {
'id': '2931166',
'ext': 'mp4',
'title': 'Lucky Ali - Kitni Haseen Zindagi',
'track': 'Kitni Haseen Zindagi',
'artist': 'Lucky Ali',
'album': 'Aks',
'release_year': 2000,
}
}
def _real_extract(self, url):
audio_id = self._match_id(url)
data = self._download_json(
'https://www.hungama.com/audio-player-data/track/%s' % audio_id,
audio_id, query={'_country': 'IN'})[0]
track = data['song_name']
artist = data.get('singer_name')
m3u8_url = self._download_json(
data.get('file') or data['preview_link'],
audio_id)['response']['media_url']
formats = self._extract_m3u8_formats(
m3u8_url, audio_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
title = '%s - %s' % (artist, track) if artist else track
thumbnail = data.get('img_src') or data.get('album_image')
return {
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'track': track,
'artist': artist,
'album': data.get('album_name'),
'release_year': int_or_none(data.get('date')),
'formats': formats,
}


@ -7,8 +7,8 @@ from .common import InfoExtractor
class JWPlatformIE(InfoExtractor):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = {
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
'info_dict': {
@ -19,7 +19,10 @@ class JWPlatformIE(InfoExtractor):
'upload_date': '20081127',
'timestamp': 1227796140,
}
}
}, {
'url': 'https://cdn.jwplayer.com/players/nPripu9l-ALJ3XQCI.js',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
@ -34,5 +37,5 @@ class JWPlatformIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id)
json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
return self._parse_jwplayer_data(json_data, video_id)

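The JWPlatform change accepts `cdn.jwplayer.com` URLs and several more path prefixes. It now fetches metadata from `https://cdn.jwplayer.com/v2/media/<id>` instead of the old `feeds/<id>.json`. The sketch checks that the legacy, CDN and `jwplatform:` forms all reduce to the same 8-character media id and API URL. `media_api_url` is a hypothetical helper.

```python
import re

VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'


def media_api_url(url):
    """Map any accepted URL form to the v2 media endpoint used above."""
    video_id = re.match(VALID_URL, url).group('id')
    return 'https://cdn.jwplayer.com/v2/media/' + video_id


for url in ('http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
            'https://cdn.jwplayer.com/players/nPripu9l-ALJ3XQCI.js',
            'jwplatform:nPripu9l'):
    print(media_api_url(url))
```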

@ -64,8 +64,14 @@ class LecturioBaseIE(InfoExtractor):
class LecturioIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture'
_TEST = {
_VALID_URL = r'''(?x)
https://
(?:
app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
(?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
)
'''
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
'info_dict': {
@ -74,7 +80,10 @@ class LecturioIE(LecturioBaseIE):
'title': 'Important Concepts and Terms Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}
}, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True,
}]
_CC_LANGS = {
'German': 'de',
@ -86,7 +95,8 @@ class LecturioIE(LecturioBaseIE):
}
def _real_extract(self, url):
display_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('id_de')
webpage = self._download_webpage(
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
@ -190,3 +200,30 @@ class LecturioCourseIE(LecturioBaseIE):
'title', default=None)
return self.playlist_result(entries, display_id, title)
class LecturioDeCourseIE(LecturioBaseIE):
_VALID_URL = r'https://(?:www\.)?lecturio\.de/[^/]+/(?P<id>[^/?#&]+)\.kurs'
_TEST = {
'url': 'https://www.lecturio.de/jura/grundrechte.kurs',
'only_matching': True,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<td[^>]+\bdata-lecture-id=["\'](?P<id>\d+).+?\bhref=(["\'])(?P<url>(?:(?!\2).)+\.vortrag)\b[^>]+>',
webpage):
lecture_url = urljoin(url, mobj.group('url'))
lecture_id = mobj.group('id')
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<h1[^>]*>([^<]+)', webpage, 'title', default=None)
return self.playlist_result(entries, display_id, title)

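LecturioIE's pattern now has two alternatives with different named groups: `id` for `app.lecturio.com/...lecture` and `id_de` for `lecturio.de/...vortrag`. Only one of them participates in a given match. That is why `_real_extract` drops `self._match_id(url)` in favour of `mobj.group('id') or mobj.group('id_de')`. The sketch demonstrates this; `lecture_display_id` is a hypothetical helper name.

```python
import re

VALID_URL = r'''(?x)
    https://
    (?:
        app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
        (?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
    )
'''


def lecture_display_id(url):
    mobj = re.match(VALID_URL, url)
    # The group from the non-matching alternative is None
    return mobj.group('id') or mobj.group('id_de')


print(lecture_display_id('https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag'))
```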

@ -87,7 +87,7 @@ class LiveLeakIE(InfoExtractor):
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[if]=[\w_]+[^"]+)"',
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage)
def _real_extract(self, url):
@ -120,13 +120,27 @@ class LiveLeakIE(InfoExtractor):
}
for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
formats.append(a_format)
self._sort_formats(info_dict['formats'])
# Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/rg3/youtube-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
formats.append({
'format_id': 'original' + ('-' + format_id if format_id else ''),
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
@ -146,7 +160,7 @@ class LiveLeakIE(InfoExtractor):
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[if])=(?P<id>[\w_]+)'
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
@ -158,15 +172,14 @@ class LiveLeakEmbedIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
kind, video_id = mobj.group('kind', 'id')
kind, video_id = re.match(self._VALID_URL, url).groups()
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'logourl\s*:\s*(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
elif kind == 'i':
liveleak_url = 'http://www.liveleak.com/view?i=%s' % video_id
else:
liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

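The LiveLeak format loop now does two things. It fills in a missing height from a `<N>p.mp4` label in the URL. It also appends an extra `original` format whose URL has the `.mp4.<suffix>` part removed, which, per the comment in the hunk, points at the raw upload without the logo. The sketch reproduces both steps on a plain list of format dicts. The CDN URL is hypothetical.

```python
import re


def add_original_formats(formats):
    """Sketch of the LiveLeak format loop: fill heights, add logo-free originals."""
    result = []
    for f in formats:
        if not f.get('height'):
            m = re.search(r'([0-9]+)p\.mp4', f['url'])
            if m:
                f['height'] = int(m.group(1))
        result.append(f)
        # Stripping '.mp4.<suffix>' yields the raw upload URL
        orig_url = re.sub(r'\.mp4\.[^.]+', '', f['url'])
        if orig_url != f['url']:
            format_id = f.get('format_id')
            result.append({
                'format_id': 'original' + ('-' + format_id if format_id else ''),
                'url': orig_url,
                'preference': 1,
            })
    return result


fmts = add_original_formats([
    {'format_id': 'hd', 'url': 'https://cdn.example.com/clip.mp4.h264_720p.mp4'},
])
print(fmts)
```

Giving the original a `preference` of 1 makes the sorter favour it over the logo-stamped renditions.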

@ -363,7 +363,4 @@ class LivestreamShortenerIE(InfoExtractor):
id = mobj.group('id')
webpage = self._download_webpage(url, id)
return {
'_type': 'url',
'url': self._og_search_url(webpage),
}
return self.url_result(self._og_search_url(webpage))


@ -2,12 +2,18 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
determine_ext,
int_or_none,
str_to_int,
urlencode_postdata,
)
class ManyVidsIE(InfoExtractor):
_VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)'
_TEST = {
_TESTS = [{
# preview video
'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/',
'md5': '03f11bb21c52dd12a05be21a5c7dcc97',
'info_dict': {
@ -17,7 +23,18 @@ class ManyVidsIE(InfoExtractor):
'view_count': int,
'like_count': int,
},
}
}, {
# full video
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
'md5': 'f3e8f7086409e9b470e2643edb96bdcc',
'info_dict': {
'id': '935718',
'ext': 'mp4',
'title': 'MY FACE REVEAL',
'view_count': int,
'like_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -28,12 +45,41 @@ class ManyVidsIE(InfoExtractor):
r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url')
title = '%s (Preview)' % self._html_search_regex(
r'<h2[^>]+class="m-a-0"[^>]*>([^<]+)', webpage, 'title')
title = self._html_search_regex(
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', fatal=True)
if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
title += ' (Preview)'
mv_token = self._search_regex(
r'data-mvtoken=(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'mv token', default=None, group='value')
if mv_token:
# Sets some cookies
self._download_webpage(
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
video_id, fatal=False, data=urlencode_postdata({
'mvtoken': mv_token,
'vid': video_id,
}), headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest'
})
if determine_ext(video_url) == 'm3u8':
formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
else:
formats = [{'url': video_url}]
like_count = int_or_none(self._search_regex(
r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
view_count = int_or_none(self._html_search_regex(
view_count = str_to_int(self._html_search_regex(
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
'view count', default=None))
@ -42,7 +88,5 @@ class ManyVidsIE(InfoExtractor):
'title': title,
'view_count': view_count,
'like_count': like_count,
'formats': [{
'url': video_url,
}],
'formats': formats,
}

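The ManyVids change appends " (Preview)" to the title only when the page references preview media. It also chooses between HLS extraction and a single progressive format based on the URL's extension. The sketch isolates those two decisions. `url_ext` is a deliberately rough stand-in for youtube-dl's `determine_ext`, and `plan_formats` returns a flag instead of running the real m3u8 extraction. Both helpers are hypothetical, and so are the sample inputs.

```python
from urllib.parse import urlparse


def url_ext(url):
    """Rough stand-in for determine_ext: extension of the last path segment."""
    last = urlparse(url).path.rsplit('/', 1)[-1]
    return last.rsplit('.', 1)[-1].lower() if '.' in last else None


def plan_formats(video_url, webpage, title):
    """Return (title, is_hls) following the preview and m3u8 checks above."""
    if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
        title += ' (Preview)'
    return title, url_ext(video_url) == 'm3u8'


print(plan_formats('https://example.com/master.m3u8?token=x',
                   '<video src="a_preview.mp4">', 'MY FACE REVEAL'))
```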

@ -21,7 +21,7 @@ from ..utils import (
class MediasiteIE(InfoExtractor):
_VALID_URL = r'(?xi)https?://[^/]+/Mediasite/Play/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)'
_VALID_URL = r'(?xi)https?://[^/]+/Mediasite/(?:Play|Showcase/(?:default|livebroadcast)/Presentation)/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)'
_TESTS = [
{
'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d',
@ -84,7 +84,15 @@ class MediasiteIE(InfoExtractor):
'timestamp': 1333983600,
'duration': 7794,
}
}
},
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Showcase/livebroadcast/Presentation/ada7020854f743c49fbb45c9ec7dbb351d',
'only_matching': True,
},
{
'url': 'https://mediasite.ntnu.no/Mediasite/Showcase/default/Presentation/7d8b913259334b688986e970fae6fcb31d',
'only_matching': True,
},
]
# look in Mediasite.Core.js (Mediasite.ContentStreamType[*])

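The Mediasite pattern now also accepts `Showcase/default/Presentation/` and `Showcase/livebroadcast/Presentation/` paths next to `Play/`. The `(?i)` flag keeps lowercase `/mediasite/` hosts matching. The sketch runs the new pattern against the URLs from the extractor's test list; `presentation_id` is a hypothetical helper.

```python
import re

VALID_URL = r'(?xi)https?://[^/]+/Mediasite/(?:Play|Showcase/(?:default|livebroadcast)/Presentation)/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)'


def presentation_id(url):
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


for url in ('https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d',
            'https://collegerama.tudelft.nl/Mediasite/Showcase/livebroadcast/Presentation/ada7020854f743c49fbb45c9ec7dbb351d',
            'https://mediasite.ntnu.no/Mediasite/Showcase/default/Presentation/7d8b913259334b688986e970fae6fcb31d'):
    print(presentation_id(url))
```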

@ -1,15 +1,9 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .adobepass import AdobePassIE
from .theplatform import ThePlatformIE
from ..utils import (
smuggle_url,
url_basename,
update_url_query,
get_element_by_class,
)
@ -64,132 +58,3 @@ class NationalGeographicVideoIE(InfoExtractor):
{'force_smil_url': True}),
'id': guid,
}
class NationalGeographicIE(ThePlatformIE, AdobePassIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/u/kdi9Ld0PN2molUUIMSBGxoeDhD729KRjQcnxtetilWPMevo8ZwUBIDuPR0Q3D2LVaTsk0MPRkRWDB8ZhqWVeyoxfsZZm36yRp1j-zPfsHEyI_EgAeFY/',
'md5': '518c9aa655686cf81493af5cc21e2a04',
'info_dict': {
'id': 'vKInpacll2pC',
'ext': 'mp4',
'title': 'Uncovering a Universal Knowledge',
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
'timestamp': 1458680907,
'upload_date': '20160322',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/u/kdvOstqYaBY-vSBPyYgAZRUL4sWUJ5XUUPEhc7ISyBHqoIO4_dzfY3K6EjHIC0hmFXoQ7Cpzm6RkET7S3oMlm6CFnrQwSUwo/',
'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': {
'id': 'Pok5lWCkiEFA',
'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
'timestamp': 1459362152,
'upload_date': '20160330',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'only_matching': True,
}
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
release_url = self._search_regex(
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
webpage, 'release url')
theplatform_path = self._search_regex(r'https?://link\.theplatform\.com/s/([^?]+)', release_url, 'theplatform path')
video_id = theplatform_path.split('/')[-1]
query = {
'mbr': 'true',
}
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
if is_auth == 'auth':
auth_resource_id = self._search_regex(
r"video_auth_resourceId\s*=\s*'([^']+)'",
webpage, 'auth resource id')
query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
formats = []
subtitles = {}
for key, value in (('switch', 'http'), ('manifest', 'm3u')):
tp_query = query.copy()
tp_query.update({
key: value,
})
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self._extract_theplatform_metadata(theplatform_path, display_id)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'display_id': display_id,
})
return info
class NationalGeographicEpisodeGuideIE(InfoExtractor):
IE_NAME = 'natgeo:episodeguide'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
'info_dict': {
'id': 'the-story-of-god-with-morgan-freeman-season-1',
'title': 'The Story of God with Morgan Freeman - Season 1',
},
'playlist_mincount': 6,
},
{
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
'info_dict': {
'id': 'underworld-inc-season-2',
'title': 'Underworld, Inc. - Season 2',
},
'playlist_mincount': 7,
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
show = get_element_by_class('show', webpage)
selected_season = self._search_regex(
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
webpage, 'selected season')
entries = [
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
return self.playlist_result(
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
'%s - %s' % (show, selected_season))

@ -363,7 +363,7 @@ class NPOIE(NPOBaseIE):
class NPOLiveIE(NPOBaseIE):
IE_NAME = 'npo.nl:live'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/live(?:/(?P<id>[^/?#&]+))?'
_VALID_URL = r'https?://(?:www\.)?npo(?:start)?\.nl/live(?:/(?P<id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.npo.nl/live/npo-1',
@ -380,6 +380,9 @@ class NPOLiveIE(NPOBaseIE):
}, {
'url': 'http://www.npo.nl/live',
'only_matching': True,
}, {
'url': 'https://www.npostart.nl/live/npo-1',
'only_matching': True,
}]
def _real_extract(self, url):
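The only functional change to `NPOLiveIE` is the optional `start` in the host name. Below is a quick standalone comparison of both patterns, copied from the hunk, on the three test URLs. `live_channel` is a hypothetical helper.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?npo\.nl/live(?:/(?P<id>[^/?#&]+))?'
NEW_VALID_URL = r'https?://(?:www\.)?npo(?:start)?\.nl/live(?:/(?P<id>[^/?#&]+))?'


def live_channel(pattern, url):
    """Return (matched, channel id); the id is None for a bare /live URL."""
    mobj = re.match(pattern, url)
    return (True, mobj.group('id')) if mobj else (False, None)


for url in ('http://www.npo.nl/live/npo-1',
            'https://www.npostart.nl/live/npo-1',
            'http://www.npo.nl/live'):
    print(url, live_channel(OLD_VALID_URL, url), live_channel(NEW_VALID_URL, url))
```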

@ -0,0 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class OutsideTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?outsidetv\.com/(?:[^/]+/)*?play/[a-zA-Z0-9]{8}/\d+/\d+/(?P<id>[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://www.outsidetv.com/category/snow/play/ZjQYboH6/1/10/Hdg0jukV/4',
'md5': '192d968fedc10b2f70ec31865ffba0da',
'info_dict': {
'id': 'Hdg0jukV',
'ext': 'mp4',
'title': 'Home - Jackson Ep 1 | Arbor Snowboards',
'description': 'md5:41a12e94f3db3ca253b04bb1e8d8f4cd',
'upload_date': '20181225',
'timestamp': 1545742800,
}
}, {
'url': 'http://www.outsidetv.com/home/play/ZjQYboH6/1/10/Hdg0jukV/4',
'only_matching': True,
}]
def _real_extract(self, url):
jw_media_id = self._match_id(url)
return self.url_result(
'jwplatform:' + jw_media_id, 'JWPlatform', jw_media_id)
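`OutsideTVIE` does no extraction of its own. It captures the second eight-character path segment and delegates to the JW Platform extractor through a `jwplatform:` URL. A minimal sketch of that handoff follows, using the pattern from the hunk. `to_jwplatform_url` is a hypothetical name.

```python
import re

VALID_URL = r'https?://(?:www\.)?outsidetv\.com/(?:[^/]+/)*?play/[a-zA-Z0-9]{8}/\d+/\d+/(?P<id>[a-zA-Z0-9]{8})'


def to_jwplatform_url(url):
    """Build the internal 'jwplatform:<media id>' URL the extractor delegates to."""
    jw_media_id = re.match(VALID_URL, url).group('id')
    return 'jwplatform:' + jw_media_id


print(to_jwplatform_url('http://www.outsidetv.com/category/snow/play/ZjQYboH6/1/10/Hdg0jukV/4'))
# jwplatform:Hdg0jukV
```

The lazy `(?:[^/]+/)*?` lets any number of leading path segments (`category/snow/`, `home/`) precede `play/`. The trailing `/4` is simply ignored because `re.match` does not anchor at the end.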

@ -24,9 +24,9 @@ class PacktPubBaseIE(InfoExtractor):
class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
'md5': '1e74bd6cfd45d7d07666f4684ef58f70',
'info_dict': {
@ -37,7 +37,10 @@ class PacktPubIE(PacktPubBaseIE):
'timestamp': 1490918400,
'upload_date': '20170331',
},
}
}, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True,
}]
_NETRC_MACHINE = 'packtpub'
_TOKEN = None
@ -110,15 +113,18 @@ class PacktPubIE(PacktPubBaseIE):
class PacktPubCourseIE(PacktPubBaseIE):
_VALID_URL = r'(?P<url>https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<id>\d+))'
_TEST = {
_VALID_URL = r'(?P<url>https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<id>\d+))'
_TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215',
'info_dict': {
'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]',
},
'playlist_count': 90,
}
}, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
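Both PacktPub patterns now accept the `subscription.packtpub.com` host. The standalone check below uses the patterns copied from this hunk. It also shows that the course pattern, having no end anchor, matches video URLs as well.

```python
import re

VIDEO_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
COURSE_URL = r'(?P<url>https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<id>\d+))'

video = 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro'
course = 'https://subscription.packtpub.com/video/web_development/9781787122215'

print(re.match(VIDEO_URL, video).group('course_id', 'chapter_id', 'id'))
# ('9781787122215', '20528', '20530')
print(re.match(COURSE_URL, course).group('id'))  # 9781787122215
# The course pattern is a prefix of the video pattern, so it matches video URLs too.
print(re.match(COURSE_URL, video) is not None)   # True
```

That overlap is presumably what the `suitable()` override on `PacktPubCourseIE` resolves. Its body is elided in this hunk.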

@ -0,0 +1,109 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
PUTRequest,
)
class PlayPlusTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?playplus\.(?:com|tv)/VOD/(?P<project_id>[0-9]+)/(?P<id>[0-9a-f]{32})'
_TEST = {
'url': 'https://www.playplus.tv/VOD/7572/db8d274a5163424e967f35a30ddafb8e',
'md5': 'd078cb89d7ab6b9df37ce23c647aef72',
'info_dict': {
'id': 'db8d274a5163424e967f35a30ddafb8e',
'ext': 'mp4',
'title': 'Capítulo 179 - Final',
'description': 'md5:01085d62d8033a1e34121d3c3cabc838',
'timestamp': 1529992740,
'upload_date': '20180626',
},
'skip': 'Requires account credential',
}
_NETRC_MACHINE = 'playplustv'
_GEO_COUNTRIES = ['BR']
_token = None
_profile_id = None
def _call_api(self, resource, video_id=None, query=None):
return self._download_json('https://api.playplus.tv/api/media/v2/get' + resource, video_id, headers={
'Authorization': 'Bearer ' + self._token,
}, query=query)
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
self.raise_login_required()
req = PUTRequest(
'https://api.playplus.tv/api/web/login', json.dumps({
'email': email,
'password': password,
}).encode(), {
'Content-Type': 'application/json; charset=utf-8',
})
try:
self._token = self._download_json(req, None)['token']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError(self._parse_json(
e.cause.read(), None)['errorMessage'], expected=True)
raise
self._profile = self._call_api('Profiles')['list'][0]['_id']
def _real_extract(self, url):
project_id, media_id = re.match(self._VALID_URL, url).groups()
media = self._call_api(
'Media', media_id, {
'profileId': self._profile,
'projectId': project_id,
'mediaId': media_id,
})['obj']
title = media['title']
formats = []
for f in media.get('files', []):
f_url = f.get('url')
if not f_url:
continue
file_info = f.get('fileInfo') or {}
formats.append({
'url': f_url,
'width': int_or_none(file_info.get('width')),
'height': int_or_none(file_info.get('height')),
})
self._sort_formats(formats)
thumbnails = []
for thumb in media.get('thumbs', []):
thumb_url = thumb.get('url')
if not thumb_url:
continue
thumbnails.append({
'url': thumb_url,
'width': int_or_none(thumb.get('width')),
'height': int_or_none(thumb.get('height')),
})
return {
'id': media_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': clean_html(media.get('description')) or media.get('shortDescription'),
'timestamp': int_or_none(media.get('publishDate'), 1000),
'view_count': int_or_none(media.get('numberOfViews')),
'comment_count': int_or_none(media.get('numberOfComments')),
'tags': media.get('tags'),
}
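`PlayPlusTVIE._real_initialize` logs in with an HTTP PUT that carries a JSON body and keeps the returned `token`. `_call_api` then sends that token as a bearer token. Below is a minimal sketch of those two request pieces, using only the standard library:

- `urllib.request.Request(..., method='PUT')` stands in for youtube-dl's `PUTRequest`.
- The helper names are hypothetical and the credentials are placeholders.
- Nothing is sent over the network.

```python
import json
import urllib.request

LOGIN_URL = 'https://api.playplus.tv/api/web/login'


def build_login_request(email, password):
    """Prepare (but do not send) the PUT login request with a JSON body."""
    body = json.dumps({'email': email, 'password': password}).encode()
    return urllib.request.Request(
        LOGIN_URL, data=body,
        headers={'Content-Type': 'application/json; charset=utf-8'},
        method='PUT')


def bearer_header(token):
    """Authorization header sent with every API call once a token is known."""
    return {'Authorization': 'Bearer ' + token}


req = build_login_request('user@example.com', 'hunter2')
print(req.get_method())  # PUT
```

On a 401, the extractor parses the response body and surfaces its `errorMessage` as an expected error rather than a crash.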

@ -1,38 +1,46 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import smuggle_url
class RMCDecouverteIE(InfoExtractor):
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/mediaplayer-replay.*?\bid=(?P<id>\d+)'
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/(?:(?:[^/]+/)*program_(?P<id>\d+)|(?P<live_id>mediaplayer-direct))'
_TEST = {
'url': 'http://rmcdecouverte.bfmtv.com/mediaplayer-replay/?id=13502&title=AQUAMEN:LES%20ROIS%20DES%20AQUARIUMS%20:UN%20DELICIEUX%20PROJET',
_TESTS = [{
'url': 'https://rmcdecouverte.bfmtv.com/wheeler-dealers-occasions-a-saisir/program_2566/',
'info_dict': {
'id': '5419055995001',
'id': '5983675500001',
'ext': 'mp4',
'title': 'UN DELICIEUX PROJET',
'description': 'md5:63610df7c8b1fc1698acd4d0d90ba8b5',
'title': 'CORVETTE',
'description': 'md5:c1e8295521e45ffebf635d6a7658f506',
'uploader_id': '1969646226001',
'upload_date': '20170502',
'timestamp': 1493745308,
'upload_date': '20181226',
'timestamp': 1545861635,
},
'params': {
'skip_download': True,
},
'skip': 'only available for a week',
}
}, {
# live, geo restricted, bypassable
'url': 'https://rmcdecouverte.bfmtv.com/mediaplayer-direct/',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('live_id')
webpage = self._download_webpage(url, display_id)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if brightcove_legacy_url:
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
@ -41,5 +49,7 @@ class RMCDecouverteIE(InfoExtractor):
brightcove_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
brightcove_id)
smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': ['FR']}),
'BrightcoveNew', brightcove_id)
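The new `RMCDecouverteIE._VALID_URL` has two alternatives: a `program_<id>` page or the live `mediaplayer-direct` page. `_real_extract` takes whichever group matched as the display id. Below is a standalone sketch of that selection with the pattern copied from the hunk. `display_id` is a hypothetical helper.

```python
import re

VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/(?:(?:[^/]+/)*program_(?P<id>\d+)|(?P<live_id>mediaplayer-direct))'


def display_id(url):
    """Program id if present, else the live-page slug; None if the URL is not handled."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return mobj.group('id') or mobj.group('live_id')


print(display_id('https://rmcdecouverte.bfmtv.com/wheeler-dealers-occasions-a-saisir/program_2566/'))  # 2566
print(display_id('https://rmcdecouverte.bfmtv.com/mediaplayer-direct/'))  # mediaplayer-direct
```

The old `mediaplayer-replay/?id=...` URLs no longer match. The Brightcove URL is then smuggled with `{'geo_countries': ['FR']}`, which, per the test comment, is what makes the live stream's geo restriction bypassable.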

@ -30,8 +30,5 @@ class SaveFromIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = os.path.splitext(url.split('/')[-1])[0]
return {
'_type': 'url',
'id': video_id,
'url': mobj.group('url'),
}
return self.url_result(mobj.group('url'), video_id=video_id)
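This hunk and the TED, TestURL and Wimp hunks replace hand-built `{'_type': 'url', ...}` dicts with `self.url_result(...)`. The sketch below is a simplified re-implementation of that helper, for illustration only. Its body is an assumption about the upstream code, not copied from this diff, and the `example.com` URL is a placeholder. It shows that the refactor keeps the same keys apart from an explicit `'ie_key': None`.

```python
def url_result(url, ie=None, video_id=None, video_title=None):
    """Simplified re-implementation of InfoExtractor.url_result (assumed behaviour)."""
    video_info = {'_type': 'url', 'url': url, 'ie_key': ie}
    if video_id is not None:
        video_info['id'] = video_id
    if video_title is not None:
        video_info['title'] = video_title
    return video_info


old_style = {'_type': 'url', 'id': 'abc', 'url': 'https://example.com/v/abc'}
new_style = url_result('https://example.com/v/abc', video_id='abc')
print(new_style)
```

If the downloader reads the key with `.get('ie_key')`, as is assumed here, a missing key and an explicit `None` behave the same.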

@ -19,7 +19,7 @@ class ScrippsNetworksWatchIE(AWSIE):
_VALID_URL = r'''(?x)
https?://
watch\.
(?P<site>hgtv|foodnetwork|travelchannel|diynetwork|cookingchanneltv|geniuskitchen)\.com/
(?P<site>geniuskitchen)\.com/
(?:
player\.[A-Z0-9]+\.html\#|
show/(?:[^/]+/){2}|
@ -28,38 +28,23 @@ class ScrippsNetworksWatchIE(AWSIE):
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://watch.hgtv.com/show/HGTVE/Best-Ever-Treehouses/2241515/Best-Ever-Treehouses/',
'md5': '26545fd676d939954c6808274bdb905a',
'url': 'http://watch.geniuskitchen.com/player/3787617/Ample-Hills-Ice-Cream-Bike/',
'info_dict': {
'id': '4173834',
'id': '4194875',
'ext': 'mp4',
'title': 'Best Ever Treehouses',
'description': "We're searching for the most over the top treehouses.",
'title': 'Ample Hills Ice Cream Bike',
'description': 'Courtney Rada churns up a signature GK Now ice cream with The Scoopmaster.',
'uploader': 'ANV',
'upload_date': '20170922',
'timestamp': 1506056400,
'upload_date': '20171011',
'timestamp': 1507698000,
},
'params': {
'skip_download': True,
},
'add_ie': [AnvatoIE.ie_key()],
}, {
'url': 'http://watch.diynetwork.com/show/DSAL/Salvage-Dawgs/2656646/Covington-Church/',
'only_matching': True,
}, {
'url': 'http://watch.diynetwork.com/player.HNT.html#2656646',
'only_matching': True,
}, {
'url': 'http://watch.geniuskitchen.com/player/3787617/Ample-Hills-Ice-Cream-Bike/',
'only_matching': True,
}]
_SNI_TABLE = {
'hgtv': 'hgtv',
'diynetwork': 'diy',
'foodnetwork': 'food',
'cookingchanneltv': 'cook',
'travelchannel': 'trav',
'geniuskitchen': 'genius',
}

@ -26,7 +26,7 @@ class SkylineWebcamsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
stream_url = self._search_regex(
r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage,
r'(?:url|source)\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1', webpage,
'stream url', group='url')
title = self._og_search_title(webpage)
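The SkylineWebcams fix lets the stream regex accept a `source:` key as well as `url:`. The page markup is not part of this diff, so the snippet below is a hypothetical player config with an `example.com` host. It only illustrates why the old pattern misses such markup and the new one does not.

```python
import re

OLD_RE = r'url\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1'
NEW_RE = r'(?:url|source)\s*:\s*(["\'])(?P<url>(?:https?:)?//.+?\.m3u8.*?)\1'

# Hypothetical player config; the real page markup is not shown in this diff.
webpage = "player.setup({source:'https://cdn.example.com/live/stream.m3u8?token=abc'});"


def stream_url(pattern, page):
    """Return the quoted .m3u8 URL, or None if the pattern does not match."""
    mobj = re.search(pattern, page)
    return mobj.group('url') if mobj else None


print(stream_url(OLD_RE, webpage))  # None
print(stream_url(NEW_RE, webpage))  # https://cdn.example.com/live/stream.m3u8?token=abc
```

The back-reference `\1` makes the match end at the same kind of quote that opened it, so query strings after `.m3u8` are kept.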

@ -203,10 +203,8 @@ class TEDIE(InfoExtractor):
ext_url = None
if service.lower() == 'youtube':
ext_url = external.get('code')
return {
'_type': 'url',
'url': ext_url or external['uri'],
}
return self.url_result(ext_url or external['uri'])
resources_ = player_talk.get('resources') or talk_info.get('resources')

@ -61,8 +61,4 @@ class TestURLIE(InfoExtractor):
self.to_screen('Test URL: %s' % tc['url'])
return {
'_type': 'url',
'url': tc['url'],
'id': video_id,
}
return self.url_result(tc['url'], video_id=video_id)

@ -10,8 +10,9 @@ from ..utils import (
int_or_none,
parse_iso8601,
parse_duration,
try_get,
str_or_none,
update_url_query,
urljoin,
)
@ -24,8 +25,7 @@ class TVNowBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query):
return self._download_json(
'https://api.tvnow.de/v3/' + path,
video_id, query=query)
'https://api.tvnow.de/v3/' + path, video_id, query=query)
def _extract_video(self, info, display_id):
video_id = compat_str(info['id'])
@ -108,6 +108,11 @@ class TVNowIE(TVNowBaseIE):
(?!(?:list|jahr)(?:/|$))(?P<id>[^/?\#&]+)
'''
@classmethod
def suitable(cls, url):
return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url) or TVNowShowIE.suitable(url)
else super(TVNowIE, cls).suitable(url))
_TESTS = [{
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/der-neue-porsche-911-gt-3/player',
'info_dict': {
@ -116,7 +121,6 @@ class TVNowIE(TVNowBaseIE):
'ext': 'mp4',
'title': 'Der neue Porsche 911 GT 3',
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1495994400,
'upload_date': '20170528',
'duration': 5283,
@ -161,136 +165,314 @@ class TVNowIE(TVNowBaseIE):
info = self._call_api(
'movies/' + display_id, display_id, query={
'fields': ','.join(self._VIDEO_FIELDS),
'station': mobj.group(1),
})
return self._extract_video(info, display_id)
class TVNowListBaseIE(TVNowBaseIE):
_SHOW_VALID_URL = r'''(?x)
(?P<base_url>
https?://
(?:www\.)?tvnow\.(?:de|at|ch)/[^/]+/
(?P<show_id>[^/]+)
)
class TVNowNewIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?P<base_url>https?://
(?:www\.)?tvnow\.(?:de|at|ch)/
(?:shows|serien))/
(?P<show>[^/]+)-\d+/
[^/]+/
episode-\d+-(?P<episode>[^/?$&]+)-(?P<id>\d+)
'''
def _extract_list_info(self, display_id, show_id):
fields = list(self._SHOW_FIELDS)
fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
fields.extend(
'formatTabs.formatTabPages.container.movies.%s' % field
for field in self._VIDEO_FIELDS)
return self._call_api(
'formats/seo', display_id, query={
'fields': ','.join(fields),
'name': show_id + '.php'
})
class TVNowListIE(TVNowListBaseIE):
_VALID_URL = r'%s/(?:list|jahr)/(?P<id>[^?\#&]+)' % TVNowListBaseIE._SHOW_VALID_URL
_SHOW_FIELDS = ('title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_VIDEO_FIELDS = ('id', 'headline', 'seoUrl', )
_TESTS = [{
'url': 'https://www.tvnow.de/rtl/30-minuten-deutschland/list/aktuell',
'info_dict': {
'id': '28296',
'title': '30 Minuten Deutschland - Aktuell',
},
'playlist_mincount': 1,
}, {
'url': 'https://www.tvnow.de/vox/ab-ins-beet/list/staffel-14',
'only_matching': True,
}, {
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/2018/3',
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return (False if TVNowIE.suitable(url)
else super(TVNowListIE, cls).suitable(url))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
base_url = re.sub(r'(?:shows|serien)', '_', mobj.group('base_url'))
show, episode = mobj.group('show', 'episode')
return self.url_result(
# Rewrite new URLs to the old format and use extraction via old API
# at api.tvnow.de as a loophole for bypassing premium content checks
'%s/%s/%s' % (base_url, show, episode),
ie=TVNowIE.ie_key(), video_id=mobj.group('id'))
class TVNowNewBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query={}):
result = self._download_json(
'https://apigw.tvnow.de/module/' + path, video_id, query=query)
error = result.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
return result
"""
TODO: new apigw.tvnow.de based version of TVNowIE. Replace old TVNowIE with it
when api.tvnow.de is shut down. This version can't bypass premium checks though.
class TVNowIE(TVNowNewBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?tvnow\.(?:de|at|ch)/
(?:shows|serien)/[^/]+/
(?:[^/]+/)+
(?P<display_id>[^/?$&]+)-(?P<id>\d+)
'''
_TESTS = [{
# episode with annual navigation
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'info_dict': {
'id': '331082',
'display_id': 'grip-das-motormagazin/der-neue-porsche-911-gt-3',
'ext': 'mp4',
'title': 'Der neue Porsche 911 GT 3',
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1495994400,
'upload_date': '20170528',
'duration': 5283,
'series': 'GRIP - Das Motormagazin',
'season_number': 14,
'episode_number': 405,
'episode': 'Der neue Porsche 911 GT 3',
},
}, {
# rtl2, episode with season navigation
'url': 'https://www.tvnow.de/shows/armes-deutschland-11471/staffel-3/episode-14-bernd-steht-seit-der-trennung-von-seiner-frau-allein-da-526124',
'only_matching': True,
}, {
# rtlnitro
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13/episode-5-auf-eigene-faust-pilot-366822',
'only_matching': True,
}, {
# superrtl
'url': 'https://www.tvnow.de/shows/die-lustigsten-schlamassel-der-welt-1221/staffel-2/episode-14-u-a-ketchup-effekt-364120',
'only_matching': True,
}, {
# ntv
'url': 'https://www.tvnow.de/shows/startup-news-10674/staffel-2/episode-39-goetter-in-weiss-387630',
'only_matching': True,
}, {
# vox
'url': 'https://www.tvnow.de/shows/auto-mobil-174/2017-11/episode-46-neues-vom-automobilmarkt-2017-11-19-17-00-00-380072',
'only_matching': True,
}, {
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
'only_matching': True,
}]
def _extract_video(self, info, url, display_id):
config = info['config']
source = config['source']
video_id = compat_str(info.get('id') or source['videoId'])
title = source['title'].strip()
paths = []
for manifest_url in (info.get('manifest') or {}).values():
if not manifest_url:
continue
manifest_url = update_url_query(manifest_url, {'filter': ''})
path = self._search_regex(r'https?://[^/]+/(.+?)\.ism/', manifest_url, 'path')
if path in paths:
continue
paths.append(path)
def url_repl(proto, suffix):
return re.sub(
r'(?:hls|dash|hss)([.-])', proto + r'\1', re.sub(
r'\.ism/(?:[^.]*\.(?:m3u8|mpd)|[Mm]anifest)',
'.ism/' + suffix, manifest_url))
formats = self._extract_mpd_formats(
url_repl('dash', '.mpd'), video_id,
mpd_id='dash', fatal=False)
formats.extend(self._extract_ism_formats(
url_repl('hss', 'Manifest'),
video_id, ism_id='mss', fatal=False))
formats.extend(self._extract_m3u8_formats(
url_repl('hls', '.m3u8'), video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
if formats:
break
else:
if try_get(info, lambda x: x['rights']['isDrm']):
raise ExtractorError(
'Video %s is DRM protected' % video_id, expected=True)
if try_get(config, lambda x: x['boards']['geoBlocking']['block']):
raise self.raise_geo_restricted()
if not info.get('free', True):
raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True)
self._sort_formats(formats)
description = source.get('description')
thumbnail = url_or_none(source.get('poster'))
timestamp = unified_timestamp(source.get('previewStart'))
duration = parse_duration(source.get('length'))
series = source.get('format')
season_number = int_or_none(self._search_regex(
r'staffel-(\d+)', url, 'season number', default=None))
episode_number = int_or_none(self._search_regex(
r'episode-(\d+)', url, 'episode number', default=None))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'episode': title,
'formats': formats,
}
def _real_extract(self, url):
base_url, show_id, season_id = re.match(self._VALID_URL, url).groups()
display_id, video_id = re.match(self._VALID_URL, url).groups()
info = self._call_api('player/' + video_id, video_id)
return self._extract_video(info, video_id, display_id)
"""
list_info = self._extract_list_info(season_id, show_id)
season = next(
season for season in list_info['formatTabs']['items']
if season.get('seoheadline') == season_id)
class TVNowListBaseIE(TVNowNewBaseIE):
_SHOW_VALID_URL = r'''(?x)
(?P<base_url>
https?://
(?:www\.)?tvnow\.(?:de|at|ch)/(?:shows|serien)/
[^/?#&]+-(?P<show_id>\d+)
)
'''
title = list_info.get('title')
headline = season.get('headline')
if title and headline:
title = '%s - %s' % (title, headline)
else:
title = headline or title
@classmethod
def suitable(cls, url):
return (False if TVNowNewIE.suitable(url)
else super(TVNowListBaseIE, cls).suitable(url))
def _extract_items(self, url, show_id, list_id, query):
items = self._call_api(
'teaserrow/format/episode/' + show_id, list_id,
query=query)['items']
entries = []
for container in season['formatTabPages']['items']:
items = try_get(
container, lambda x: x['container']['movies']['items'],
list) or []
for info in items:
seo_url = info.get('seoUrl')
if not seo_url:
continue
video_id = info.get('id')
entries.append(self.url_result(
'%s/%s/player' % (base_url, seo_url), TVNowIE.ie_key(),
compat_str(video_id) if video_id else None))
for item in items:
if not isinstance(item, dict):
continue
item_url = urljoin(url, item.get('url'))
if not item_url:
continue
video_id = str_or_none(item.get('id') or item.get('videoId'))
item_title = item.get('subheadline') or item.get('text')
entries.append(self.url_result(
item_url, ie=TVNowNewIE.ie_key(), video_id=video_id,
video_title=item_title))
return self.playlist_result(
entries, compat_str(season.get('id') or season_id), title)
return self.playlist_result(entries, '%s/%s' % (show_id, list_id))
class TVNowSeasonIE(TVNowListBaseIE):
_VALID_URL = r'%s/staffel-(?P<id>\d+)' % TVNowListBaseIE._SHOW_VALID_URL
_TESTS = [{
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13',
'info_dict': {
'id': '1815/13',
},
'playlist_mincount': 22,
}]
def _real_extract(self, url):
_, show_id, season_id = re.match(self._VALID_URL, url).groups()
return self._extract_items(
url, show_id, season_id, {'season': season_id})
class TVNowAnnualIE(TVNowListBaseIE):
_VALID_URL = r'%s/(?P<year>\d{4})-(?P<month>\d{2})' % TVNowListBaseIE._SHOW_VALID_URL
_TESTS = [{
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05',
'info_dict': {
'id': '1669/2017-05',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
_, show_id, year, month = re.match(self._VALID_URL, url).groups()
return self._extract_items(
url, show_id, '%s-%s' % (year, month), {
'year': int(year),
'month': int(month),
})
class TVNowShowIE(TVNowListBaseIE):
_VALID_URL = TVNowListBaseIE._SHOW_VALID_URL
_SHOW_FIELDS = ('id', 'title', )
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
_VIDEO_FIELDS = ()
_TESTS = [{
'url': 'https://www.tvnow.at/vox/ab-ins-beet',
# annual navigationType
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669',
'info_dict': {
'id': 'ab-ins-beet',
'title': 'Ab ins Beet!',
'id': '1669',
},
'playlist_mincount': 7,
'playlist_mincount': 73,
}, {
'url': 'https://www.tvnow.at/vox/ab-ins-beet/list',
'only_matching': True,
}, {
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/',
'only_matching': True,
# season navigationType
'url': 'https://www.tvnow.de/shows/armes-deutschland-11471',
'info_dict': {
'id': '11471',
},
'playlist_mincount': 3,
}]
@classmethod
def suitable(cls, url):
return (False if TVNowIE.suitable(url) or TVNowListIE.suitable(url)
return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url)
else super(TVNowShowIE, cls).suitable(url))
def _real_extract(self, url):
base_url, show_id = re.match(self._VALID_URL, url).groups()
list_info = self._extract_list_info(show_id, show_id)
result = self._call_api(
'teaserrow/format/navigation/' + show_id, show_id)
items = result['items']
entries = []
for season_info in list_info['formatTabs']['items']:
season_url = season_info.get('seoheadline')
if not season_url:
continue
season_id = season_info.get('id')
entries.append(self.url_result(
'%s/list/%s' % (base_url, season_url), TVNowListIE.ie_key(),
compat_str(season_id) if season_id else None,
season_info.get('headline')))
navigation = result.get('navigationType')
if navigation == 'annual':
for item in items:
if not isinstance(item, dict):
continue
year = int_or_none(item.get('year'))
if year is None:
continue
months = item.get('months')
if not isinstance(months, list):
continue
for month_dict in months:
if not isinstance(month_dict, dict) or not month_dict:
continue
month_number = int_or_none(list(month_dict.keys())[0])
if month_number is None:
continue
entries.append(self.url_result(
'%s/%04d-%02d' % (base_url, year, month_number),
ie=TVNowAnnualIE.ie_key()))
elif navigation == 'season':
for item in items:
if not isinstance(item, dict):
continue
season_number = int_or_none(item.get('season'))
if season_number is None:
continue
entries.append(self.url_result(
'%s/staffel-%d' % (base_url, season_number),
ie=TVNowSeasonIE.ie_key()))
else:
raise ExtractorError('Unknown navigationType')
return self.playlist_result(entries, show_id, list_info.get('title'))
return self.playlist_result(entries, show_id)
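`TVNowNewIE` does not extract anything itself. Per the comment in its `_real_extract`, it rewrites a new-style `/shows/` or `/serien/` URL into the old layout and hands it to `TVNowIE`, which still uses `api.tvnow.de`. The sketch below reproduces only that rewrite, with the pattern copied from the hunk. `rewrite_to_old_url` is a hypothetical name. Only part of the old `TVNowIE._VALID_URL` is visible here, so the sketch assumes that pattern accepts `_` as the station segment.

```python
import re

NEW_URL = r'''(?x)
    (?P<base_url>https?://
        (?:www\.)?tvnow\.(?:de|at|ch)/
        (?:shows|serien))/
    (?P<show>[^/]+)-\d+/
    [^/]+/
    episode-\d+-(?P<episode>[^/?$&]+)-(?P<id>\d+)
'''


def rewrite_to_old_url(url):
    """Return (old-style URL, numeric video id) for a new-style episode URL."""
    mobj = re.match(NEW_URL, url)
    # Replace the 'shows'/'serien' segment with a '_' placeholder station.
    base_url = re.sub(r'(?:shows|serien)', '_', mobj.group('base_url'))
    show, episode = mobj.group('show', 'episode')
    return '%s/%s/%s' % (base_url, show, episode), mobj.group('id')


print(rewrite_to_old_url(
    'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/'
    'episode-405-der-neue-porsche-911-gt-3-331082'))
# ('https://www.tvnow.de/_/grip-das-motormagazin/der-neue-porsche-911-gt-3', '331082')
```

Greedy backtracking strips the numeric show id (`-1669`) from `show` and the trailing video id from `episode`.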

@ -171,7 +171,8 @@ class TwitterCardIE(TwitterBaseIE):
urls.append('https://twitter.com/i/videos/' + video_id)
for u in urls:
webpage = self._download_webpage(u, video_id)
webpage = self._download_webpage(
u, video_id, headers={'Referer': 'https://twitter.com/'})
iframe_url = self._html_search_regex(
r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
import re
import itertools
@ -392,6 +393,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True,
},
},
{
'url': 'http://player.vimeo.com/video/68375962',
'md5': 'aaf896bdb7ddd6476df50007a0ac0ae7',
'info_dict': {
'id': '68375962',
'ext': 'mp4',
'title': 'youtube-dl password protected test video',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user18948128',
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
},
'params': {
'videopassword': 'youtube-dl',
},
},
{
'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
'only_matching': True,
@ -452,7 +469,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option')
data = urlencode_postdata({'password': password})
data = urlencode_postdata({
'password': base64.b64encode(password.encode()),
})
pass_url = url + '/check-password'
password_request = sanitized_Request(pass_url, data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
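The Vimeo change Base64-encodes the video password before form-encoding it for `/check-password`. The sketch below reproduces just the POST body. It assumes that youtube-dl's `urlencode_postdata` amounts to `urlencode(...).encode('ascii')`, and `password_post_data` is a hypothetical name. The password is the one from the new test case.

```python
import base64
from urllib.parse import parse_qs, urlencode


def password_post_data(password):
    """Form-encoded body with the Base64-encoded password, as bytes."""
    return urlencode({'password': base64.b64encode(password.encode())}).encode('ascii')


data = password_post_data('youtube-dl')
print(data)  # b'password=eW91dHViZS1kbA%3D%3D'
# Round trip: form-decode, then Base64-decode.
print(base64.b64decode(parse_qs(data.decode())['password'][0]))  # b'youtube-dl'
```

The Base64 padding `==` is percent-encoded as `%3D%3D` by the form encoding.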

@ -40,11 +40,7 @@ class WimpIE(InfoExtractor):
r'data-id=["\']([0-9A-Za-z_-]{11})'),
webpage, 'video URL', default=None)
if youtube_id:
return {
'_type': 'url',
'url': youtube_id,
'ie_key': YoutubeIE.ie_key(),
}
return self.url_result(youtube_id, YoutubeIE.ie_key())
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)

@ -12,7 +12,7 @@ from ..utils import (
class WistiaIE(InfoExtractor):
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/iframe/)(?P<id>[a-z0-9]+)'
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]+)'
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
@ -38,6 +38,9 @@ class WistiaIE(InfoExtractor):
}, {
'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
'only_matching': True,
}, {
'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
'only_matching': True,
}]
@staticmethod
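`WistiaIE` now also accepts `/embed/medias/` URLs. The standalone check below uses both patterns from the hunk. It shows that the id character class `[a-z0-9]+` stops before the `.json` suffix, so the JSON URL yields the bare media id. `wistia_id` is a hypothetical helper.

```python
import re

OLD_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/iframe/)(?P<id>[a-z0-9]+)'
NEW_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]+)'


def wistia_id(pattern, url):
    """Return the captured media id, or None if the pattern does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


medias_url = 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json'
print(wistia_id(OLD_VALID_URL, medias_url))  # None
print(wistia_id(NEW_VALID_URL, medias_url))  # sh7fpupwlt
```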

@ -68,11 +68,9 @@ class YouPornIE(InfoExtractor):
request.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(request, display_id)
title = self._search_regex(
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'],
webpage, 'title', group='title',
default=None) or self._og_search_title(
title = self._html_search_regex(
r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True)
@ -134,7 +132,11 @@ class YouPornIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
description = self._og_search_description(webpage, default=None)
description = self._html_search_regex(
r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description',
default=None) or self._og_search_description(
webpage, default=None)
thumbnail = self._search_regex(
r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
webpage, 'thumbnail', fatal=False, group='thumbnail')
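YouPorn's title and description are now read from dedicated `<div>` elements first, falling back to OpenGraph and meta tags. The markup below is hypothetical and only shows what the two patterns from the hunk capture. `first_group` is a hypothetical helper. The real `_html_search_regex` additionally cleans HTML from the match (per my understanding of it), which this sketch approximates with `.strip()`.

```python
import re

TITLE_RE = r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>'
DESCRIPTION_RE = r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>'


def first_group(pattern, html):
    """Return the stripped first group, or None so a fallback can be tried."""
    mobj = re.search(pattern, html)
    return mobj.group(1).strip() if mobj else None


# Hypothetical markup, for illustration; the real page may differ.
html = '''<div class="watchVideoTitle">
  Example title
</div>
<div id="description" class="desc">Line one
line two</div>'''

print(first_group(TITLE_RE, html))  # Example title
description = first_group(DESCRIPTION_RE, html)
```

`(?s)` lets `.+?` span line breaks inside the description. Returning `None` mirrors `default=None`, which hands over to the `_og_search_*` fallbacks.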

@ -14,6 +14,7 @@ class YourPornIE(InfoExtractor):
'ext': 'mp4',
'title': 'md5:c9f43630bd968267672651ba905a7d35',
'thumbnail': r're:^https?://.*\.jpg$',
'age_limit': 18
},
}
@ -26,7 +27,7 @@ class YourPornIE(InfoExtractor):
self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'),
video_id)[video_id]).replace('/cdn/', '/cdn2/')
video_id)[video_id]).replace('/cdn/', '/cdn3/')
title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@ -38,4 +39,5 @@ class YourPornIE(InfoExtractor):
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'age_limit': 18
}

@ -498,7 +498,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
'upload_date': '20121002',
'license': 'Standard YouTube License',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
@ -527,7 +526,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Icona Pop',
'uploader_id': 'IconaPop',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop',
'license': 'Standard YouTube License',
'creator': 'Icona Pop',
'track': 'I Love It (feat. Charli XCX)',
'artist': 'Icona Pop',
@ -540,14 +538,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'id': '07FYdnEawAQ',
'ext': 'mp4',
'upload_date': '20130703',
'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
'title': 'Justin Timberlake - Tunnel Vision (Official Music Video) (Explicit)',
'alt_title': 'Tunnel Vision',
'description': 'md5:64249768eec3bc4276236606ea996373',
'description': 'md5:07dab3356cde4199048e4c7cd93471e1',
'duration': 419,
'uploader': 'justintimberlakeVEVO',
'uploader_id': 'justintimberlakeVEVO',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
'license': 'Standard YouTube License',
'creator': 'Justin Timberlake',
'track': 'Tunnel Vision',
'artist': 'Justin Timberlake',
@ -566,7 +563,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'SET India',
'uploader_id': 'setindia',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia',
'license': 'Standard YouTube License',
'age_limit': 18,
}
},
@ -581,7 +577,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
'upload_date': '20121002',
'license': 'Standard YouTube License',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
@ -605,7 +600,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
'description': '',
'uploader': '8KVIDEO',
'license': 'Standard YouTube License',
'title': 'UHDTV TEST 8K VIDEO.mp4'
},
'params': {
@ -620,13 +614,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'info_dict': {
'id': 'IB3lcPjvWLA',
'ext': 'm4a',
'title': 'Afrojack, Spree Wilson - The Spark ft. Spree Wilson',
'description': 'md5:1900ed86ee514927b9e00fbead6969a5',
'title': 'Afrojack, Spree Wilson - The Spark (Official Music Video) ft. Spree Wilson',
'description': 'md5:8f5e2b82460520b619ccac1f509d43bf',
'duration': 244,
'uploader': 'AfrojackVEVO',
'uploader_id': 'AfrojackVEVO',
'upload_date': '20131011',
'license': 'Standard YouTube License',
},
'params': {
'youtube_include_dash_manifest': True,
@ -640,13 +633,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'id': 'nfWlot6h_JM',
'ext': 'm4a',
'title': 'Taylor Swift - Shake It Off',
'alt_title': 'Shake It Off',
'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
'description': 'md5:bec2185232c05479482cb5a9b82719bf',
'duration': 242,
'uploader': 'TaylorSwiftVEVO',
'uploader_id': 'TaylorSwiftVEVO',
'upload_date': '20140818',
'license': 'Standard YouTube License',
'creator': 'Taylor Swift',
},
'params': {
@ -662,10 +653,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'ext': 'mp4',
'duration': 219,
'upload_date': '20100909',
'uploader': 'TJ Kirk',
'uploader': 'Amazing Atheist',
'uploader_id': 'TheAmazingAtheist',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
'license': 'Standard YouTube License',
'title': 'Burning Everyone\'s Koran',
'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
}
@ -683,7 +673,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'WitcherGame',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
'upload_date': '20140605',
'license': 'Standard YouTube License',
'age_limit': 18,
},
},
@ -692,7 +681,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
'info_dict': {
'id': '6kLq3WMV1nU',
'ext': 'webm',
'ext': 'mp4',
'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
'duration': 246,
@ -700,7 +689,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'LloydVEVO',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
'upload_date': '20110629',
'license': 'Standard YouTube License',
'age_limit': 18,
},
},
@ -718,7 +706,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'creator': 'deadmau5',
'description': 'md5:12c56784b8032162bb936a5f76d55360',
'uploader': 'deadmau5',
'license': 'Standard YouTube License',
'title': 'Deadmau5 - Some Chords (HD)',
'alt_title': 'Some Chords',
},
@ -736,7 +723,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'upload_date': '20150827',
'uploader_id': 'olympic',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
'license': 'Standard YouTube License',
'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
'uploader': 'Olympic',
'title': 'Hockey - Women - GER-AUS - London 2012 Olympic Games',
@ -758,7 +744,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
'uploader': '孫ᄋᄅ',
'license': 'Standard YouTube License',
'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
},
},
@ -792,7 +777,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'dorappi2000',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
'uploader': 'dorappi2000',
'license': 'Standard YouTube License',
'formats': 'mincount:31',
},
'skip': 'not actual anymore',
@ -808,7 +792,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'Airtek',
'description': 'Retransmisión en directo de la XVIII media maratón de Zaragoza.',
'uploader_id': 'UCzTzUmjXxxacNnL8I3m4LnQ',
'license': 'Standard YouTube License',
'title': 'Retransmisión XVIII Media maratón Zaragoza 2015',
},
'params': {
@ -881,6 +864,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'This video is not available.',
},
{
# Multifeed video with comma in title (see https://github.com/rg3/youtube-dl/issues/8536)
@ -917,7 +901,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': 'IronSoulElf',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
'uploader': 'IronSoulElf',
'license': 'Standard YouTube License',
'creator': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan',
'track': 'Dark Walk - Position Music',
'artist': 'Todd Haberman, Daniel Law Heath and Aaron Kaplan',
@ -1021,13 +1004,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'id': 'iqKdEhx-dD4',
'ext': 'mp4',
'title': 'Isolation - Mind Field (Ep 1)',
'description': 'md5:25b78d2f64ae81719f5c96319889b736',
'description': 'md5:46a29be4ceffa65b92d277b93f463c0f',
'duration': 2085,
'upload_date': '20170118',
'uploader': 'Vsauce',
'uploader_id': 'Vsauce',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
'license': 'Standard YouTube License',
'series': 'Mind Field',
'season_number': 1,
'episode_number': 1,
@ -1053,7 +1035,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader': 'New Century Foundation',
'uploader_id': 'UCEJYpZGqgUob0zVVEaLhvVg',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCEJYpZGqgUob0zVVEaLhvVg',
'license': 'Standard YouTube License',
},
'params': {
'skip_download': True,
@ -1077,6 +1058,31 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://invidio.us/watch?v=BaW_jenozKc',
'only_matching': True,
},
{
# DRM protected
'url': 'https://www.youtube.com/watch?v=s7_qI6_mIXc',
'only_matching': True,
},
{
# Video with unsupported adaptive stream type formats
'url': 'https://www.youtube.com/watch?v=Z4Vy8R84T1U',
'info_dict': {
'id': 'Z4Vy8R84T1U',
'ext': 'mp4',
'title': 'saman SMAN 53 Jakarta(Sancety) opening COFFEE4th at SMAN 53 Jakarta',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 433,
'upload_date': '20130923',
'uploader': 'Amelia Putri Harwita',
'uploader_id': 'UCpOxM49HJxmC1qCalXyB3_Q',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCpOxM49HJxmC1qCalXyB3_Q',
'formats': 'maxcount:10',
},
'params': {
'skip_download': True,
'youtube_include_dash_manifest': False,
},
}
]
def __init__(self, *args, **kwargs):
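Two of the new test entries use compact expectation strings:

- `md5:d41d8cd98f00b204e9800998ecf8427e` is the MD5 digest of the empty string, so the `Z4Vy8R84T1U` entry effectively expects an empty description.
- `maxcount:10` appears to put an upper bound on the number of formats.

The checker below is a hypothetical sketch of those conventions. It is modeled on how the entries read, not copied from youtube-dl's test helpers.

```python
import hashlib


def matches_expected(expected, got):
    # Hypothetical checker for the expectation strings used in the test dicts.
    if isinstance(expected, str) and expected.startswith('md5:'):
        # Compare against the MD5 hex digest of the UTF-8 encoded value.
        return hashlib.md5(got.encode('utf-8')).hexdigest() == expected[len('md5:'):]
    if isinstance(expected, str) and expected.startswith('maxcount:'):
        # Assumed meaning: upper bound on the number of items (e.g. formats).
        return len(got) <= int(expected[len('maxcount:'):])
    # Anything else must match exactly.
    return expected == got


print(matches_expected('md5:d41d8cd98f00b204e9800998ecf8427e', ''))
```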
@ -1105,7 +1111,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _extract_signature_function(self, video_id, player_url, example_sig):
id_m = re.match(
r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
player_url)
if not id_m:
raise ExtractorError('Cannot identify player %r' % player_url)
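The only change to `_extract_signature_function` widens the locale segment from `[a-z]{2}` to `[a-z]{2,3}`. Player URLs whose language part has three letters are therefore still recognised. Here is a small check with hypothetical player URLs; the `fil_PH` locale and the `vflAbCdE` id are made up for illustration.

```python
import re

# Old and new patterns, copied from the hunk above.
OLD_PLAYER_RE = r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$'
NEW_PLAYER_RE = r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$'


def identify_player(pattern, player_url):
    # (player id, extension), or None where the extractor would raise
    # "Cannot identify player".
    m = re.match(pattern, player_url)
    return (m.group('id'), m.group('ext')) if m else None


# Hypothetical URLs: a two-letter and a three-letter language code.
two = 'https://www.youtube.com/yts/jsbin/player_ias-vflAbCdE/en_US/base.js'
three = 'https://www.youtube.com/yts/jsbin/player_ias-vflAbCdE/fil_PH/base.js'

print(identify_player(OLD_PLAYER_RE, three), identify_player(NEW_PLAYER_RE, three))
```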
@ -1192,8 +1198,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex(
(r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
jscode, 'Initial JS player signature function name', group='sig')
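The new signature-name patterns allow an optional `encodeURIComponent(` wrapper before the function name. Without it, the old `\bc\s*&&\s*d\.set(...)` pattern captures `encodeURIComponent` itself as the signature function. The JavaScript fragments below are hypothetical minimal strings shaped to fit the patterns, not excerpts from a real player.

```python
import re

# Old and new variants of one of the candidate patterns in the hunk above.
OLD_SIG_RE = r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\('
NEW_SIG_RE = r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\('


def signature_name(pattern, jscode):
    m = re.search(pattern, jscode)
    return m.group('sig') if m else None


# Hypothetical fragments: a bare call and one wrapped in encodeURIComponent.
plain = 'c&&d.set(b,XY(c))'
wrapped = 'c&&d.set(b,encodeURIComponent(XY(c)))'

for js in (plain, wrapped):
    print(signature_name(OLD_SIG_RE, js), signature_name(NEW_SIG_RE, js))
# prints:
# XY XY
# encodeURIComponent XY
```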
@ -1540,6 +1546,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if dash_mpd and dash_mpd[0] not in dash_mpds:
dash_mpds.append(dash_mpd[0])
def add_dash_mpd_pr(pl_response):
dash_mpd = url_or_none(try_get(
pl_response, lambda x: x['streamingData']['dashManifestUrl'],
compat_str))
if dash_mpd and dash_mpd not in dash_mpds:
dash_mpds.append(dash_mpd)
is_live = None
view_count = None
@ -1597,6 +1610,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if isinstance(pl_response, dict):
player_response = pl_response
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
add_dash_mpd_pr(player_response)
# We also try looking in get_video_info since it may contain different dashmpd
# URL that points to a DASH manifest with possibly different itag set (some itags
# are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
@ -1628,6 +1642,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
pl_response = get_video_info.get('player_response', [None])[0]
if isinstance(pl_response, dict):
player_response = pl_response
add_dash_mpd_pr(player_response)
add_dash_mpd(get_video_info)
if view_count is None:
view_count = extract_view_count(get_video_info)
@ -1673,6 +1688,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'"token" parameter not in video info for unknown reason',
video_id=video_id)
if video_info.get('license_info'):
raise ExtractorError('This video is DRM protected.', expected=True)
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
@ -1786,11 +1804,34 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'height': int_or_none(width_height[1]),
}
q = qualities(['small', 'medium', 'hd720'])
streaming_formats = try_get(player_response, lambda x: x['streamingData']['formats'], list)
if streaming_formats:
for fmt in streaming_formats:
itag = str_or_none(fmt.get('itag'))
if not itag:
continue
quality = fmt.get('quality')
quality_label = fmt.get('qualityLabel') or quality
formats_spec[itag] = {
'asr': int_or_none(fmt.get('audioSampleRate')),
'filesize': int_or_none(fmt.get('contentLength')),
'format_note': quality_label,
'fps': int_or_none(fmt.get('fps')),
'height': int_or_none(fmt.get('height')),
'quality': q(quality),
# bitrate for itag 43 is always 2147483647
'tbr': float_or_none(fmt.get('averageBitrate') or fmt.get('bitrate'), 1000) if itag != '43' else None,
'width': int_or_none(fmt.get('width')),
}
formats = []
for url_data_str in encoded_url_map.split(','):
url_data = compat_parse_qs(url_data_str)
if 'itag' not in url_data or 'url' not in url_data:
continue
stream_type = int_or_none(try_get(url_data, lambda x: x['stream_type'][0]))
# Unsupported FORMAT_STREAM_TYPE_OTF
if stream_type == 3:
continue
format_id = url_data['itag'][0]
url = url_data['url'][0]
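This hunk does two things:

- It reads per-format metadata from `player_response['streamingData']['formats']`.
- It skips `url_encoded_fmt_stream_map` entries whose `stream_type` is 3 (FORMAT_STREAM_TYPE_OTF).

The sketch below reproduces both steps on plain data. `int_or_none`, `float_or_none`, `str_or_none` and `qualities` are simplified stand-ins for the youtube-dl utilities of the same names, and the sample formats are invented.

```python
from urllib.parse import parse_qs


def int_or_none(v):
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def float_or_none(v, scale=1):
    # Simplified: divide by `scale` (e.g. bit/s -> kbit/s with scale=1000).
    try:
        return float(v) / scale
    except (TypeError, ValueError):
        return None


def str_or_none(v):
    return None if v is None else str(v)


def qualities(order):
    # Rank a quality name by its position in `order`; unknown names rank -1.
    return lambda name: order.index(name) if name in order else -1


def build_formats_spec(streaming_formats):
    q = qualities(['small', 'medium', 'hd720'])
    spec = {}
    for fmt in streaming_formats:
        itag = str_or_none(fmt.get('itag'))
        if not itag:
            continue  # entries without an itag cannot be matched to a URL
        quality = fmt.get('quality')
        spec[itag] = {
            'format_note': fmt.get('qualityLabel') or quality,
            'height': int_or_none(fmt.get('height')),
            'quality': q(quality),
            # itag 43 always reports 2147483647, so its bitrate is ignored
            'tbr': (float_or_none(fmt.get('averageBitrate') or fmt.get('bitrate'), 1000)
                    if itag != '43' else None),
        }
    return spec


def supported_itags(encoded_url_map):
    itags = []
    for url_data_str in encoded_url_map.split(','):
        url_data = parse_qs(url_data_str)
        if 'itag' not in url_data or 'url' not in url_data:
            continue
        # stream_type 3 is FORMAT_STREAM_TYPE_OTF, which is not supported
        if int_or_none(url_data.get('stream_type', [None])[0]) == 3:
            continue
        itags.append(url_data['itag'][0])
    return itags


print(supported_itags('itag=22&url=http%3A%2F%2Fa&stream_type=3,itag=18&url=http%3A%2F%2Fb'))
```

In the extractor, `formats_spec` is later merged into each format built from the URL map; the sketch stops at the per-itag dictionaries.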
@ -1834,7 +1875,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else:
player_version = self._search_regex(
[r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
r'(?:www|player)-([^/]+)(?:/[a-z]{2}_[A-Z]{2})?/base\.js'],
r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
player_url,
'html5 player', fatal=False)
player_desc = 'html5 player %s' % player_version
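The player-version pattern now also accepts a `player_ias-` prefix and three-letter language codes. Below is a quick comparison of the old and new pattern lists on a hypothetical URL. `player_version` mimics a non-fatal search that tries each pattern in turn.

```python
import re

# Pattern lists copied from the hunk above.
OLD_VERSION_RES = [
    r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
    r'(?:www|player)-([^/]+)(?:/[a-z]{2}_[A-Z]{2})?/base\.js',
]
NEW_VERSION_RES = [
    r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
    r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js',
]


def player_version(patterns, player_url):
    # Try each pattern in order; None plays the role of a non-fatal miss.
    for pattern in patterns:
        m = re.search(pattern, player_url)
        if m:
            return m.group(1)
    return None


# Hypothetical player URL using the player_ias prefix.
url = 'https://www.youtube.com/yts/jsbin/player_ias-vflAbCdE/en_US/base.js'
print(player_version(OLD_VERSION_RES, url), player_version(NEW_VERSION_RES, url))
```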
@ -1868,7 +1909,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
filesize = int_or_none(url_data.get(
'clen', [None])[0]) or _extract_filesize(url)
quality = url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0]
quality = url_data.get('quality', [None])[0]
more_fields = {
'filesize': filesize,
@ -1876,7 +1917,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'width': width,
'height': height,
'fps': int_or_none(url_data.get('fps', [None])[0]),
'format_note': quality,
'format_note': url_data.get('quality_label', [None])[0] or quality,
'quality': q(quality),
}
for key, value in more_fields.items():
@ -1904,31 +1945,38 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'http_chunk_size': 10485760,
}
formats.append(dct)
elif video_info.get('hlsvp'):
manifest_url = video_info['hlsvp'][0]
formats = []
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', fatal=False)
for a_format in m3u8_formats:
itag = self._search_regex(
r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
if itag:
a_format['format_id'] = itag
if itag in self._formats:
dct = self._formats[itag].copy()
dct.update(a_format)
a_format = dct
a_format['player_url'] = player_url
# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else:
error_message = clean_html(video_info.get('reason', [None])[0])
if not error_message:
error_message = extract_unavailable_message()
if error_message:
raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
manifest_url = (
url_or_none(try_get(
player_response,
lambda x: x['streamingData']['hlsManifestUrl'],
compat_str)) or
url_or_none(try_get(
video_info, lambda x: x['hlsvp'][0], compat_str)))
if manifest_url:
formats = []
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', fatal=False)
for a_format in m3u8_formats:
itag = self._search_regex(
r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
if itag:
a_format['format_id'] = itag
if itag in self._formats:
dct = self._formats[itag].copy()
dct.update(a_format)
a_format = dct
a_format['player_url'] = player_url
# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else:
error_message = clean_html(video_info.get('reason', [None])[0])
if not error_message:
error_message = extract_unavailable_message()
if error_message:
raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
# uploader
video_uploader = try_get(
@ -2016,7 +2064,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>',
video_webpage)
if m_episode:
series = m_episode.group('series')
series = unescapeHTML(m_episode.group('series'))
season_number = int(m_episode.group('season'))
episode_number = int(m_episode.group('episode'))
else:
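Both new `streamingData` lookups in this file follow the same shape. They read an optional URL out of nested JSON and ignore anything that is not a URL string. Then they either deduplicate (`dashManifestUrl`) or fall back to the legacy field (`hlsManifestUrl` before `hlsvp`). The sketch below uses simplified stand-ins for `try_get` and `url_or_none`; the real `url_or_none` is more permissive than the plain http/https check used here.

```python
def try_get(src, getter, expected_type=None):
    # Simplified stand-in: getter(src), or None on a failed lookup or wrong type.
    try:
        value = getter(src)
    except (AttributeError, IndexError, KeyError, TypeError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def url_or_none(url):
    # Simplified stand-in: only absolute http(s) URLs count.
    if isinstance(url, str) and url.startswith(('http://', 'https://')):
        return url
    return None


def collect_dash_manifests(player_responses):
    # Deduplicate dashManifestUrl values across several player responses.
    dash_mpds = []
    for pl_response in player_responses:
        dash_mpd = url_or_none(try_get(
            pl_response, lambda x: x['streamingData']['dashManifestUrl'], str))
        if dash_mpd and dash_mpd not in dash_mpds:
            dash_mpds.append(dash_mpd)
    return dash_mpds


def pick_hls_manifest(player_response, video_info):
    # Prefer streamingData.hlsManifestUrl, fall back to the legacy hlsvp field.
    return (
        url_or_none(try_get(
            player_response, lambda x: x['streamingData']['hlsManifestUrl'], str))
        or url_or_none(try_get(video_info, lambda x: x['hlsvp'][0], str)))


print(pick_hls_manifest({}, {'hlsvp': ['https://example.com/vi.m3u8']}))
```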

@ -79,6 +79,20 @@ class FFmpegPostProcessor(PostProcessor):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
prefer_ffmpeg = True
def get_ffmpeg_version(path):
ver = get_exe_version(path, args=['-version'])
if ver:
regexs = [
r'(?:\d+:)?([0-9.]+)-[0-9]+ubuntu[0-9.]+$', # Ubuntu, see [1]
r'n([0-9.]+)$', # Arch Linux
# 1. http://www.ducea.com/2006/06/17/ubuntu-package-version-naming-explanation/
]
for regex in regexs:
mobj = re.match(regex, ver)
if mobj:
ver = mobj.group(1)
return ver
self.basename = None
self.probe_basename = None
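`get_ffmpeg_version` post-processes the `-version` output to drop distribution-specific decorations. An Ubuntu package string such as `3.4.4-0ubuntu0.18.04.1` becomes `3.4.4`, and an optional `N:` epoch prefix is tolerated. An Arch build string such as `n4.1` becomes `4.1`. Anything else, for example a git build like `N-75573-g1d0487f`, is left untouched.

The function below applies the same regexes to an already-extracted version string instead of running the binary. The sample strings are illustrative.

```python
import re


def normalize_ffmpeg_version(ver):
    # Same regexes as get_ffmpeg_version, applied to a version string
    # instead of the output of `<binary> -version`.
    if ver:
        regexs = [
            r'(?:\d+:)?([0-9.]+)-[0-9]+ubuntu[0-9.]+$',  # Ubuntu package version
            r'n([0-9.]+)$',  # Arch Linux build
        ]
        for regex in regexs:
            mobj = re.match(regex, ver)
            if mobj:
                ver = mobj.group(1)
    return ver


for raw in ('3.4.4-0ubuntu0.18.04.1', 'n4.1', 'N-75573-g1d0487f'):
    print(raw, '->', normalize_ffmpeg_version(raw))
```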
@ -110,11 +124,10 @@ class FFmpegPostProcessor(PostProcessor):
self._paths = dict(
(p, os.path.join(location, p)) for p in programs)
self._versions = dict(
(p, get_exe_version(self._paths[p], args=['-version']))
for p in programs)
(p, get_ffmpeg_version(self._paths[p])) for p in programs)
if self._versions is None:
self._versions = dict(
(p, get_exe_version(p, args=['-version'])) for p in programs)
(p, get_ffmpeg_version(p)) for p in programs)
self._paths = dict((p, p) for p in programs)
if prefer_ffmpeg is False:
@ -384,9 +397,8 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
opts += ['-c:s', 'mov_text']
for (i, lang) in enumerate(sub_langs):
opts.extend(['-map', '%d:0' % (i + 1)])
lang_code = ISO639Utils.short2long(lang)
if lang_code is not None:
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
lang_code = ISO639Utils.short2long(lang) or lang
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
temp_filename = prepend_extension(filename, 'temp')
self._downloader.to_screen('[ffmpeg] Embedding subtitles in \'%s\'' % filename)
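Previously, a subtitle language that `ISO639Utils.short2long` could not map got no `language=` metadata at all. Now the original code is passed through instead. The sketch below builds the same option list, with a hypothetical three-entry table standing in for `ISO639Utils`. It assumes, as the `%d:0` offset suggests, that input 0 is the video and the subtitle files follow.

```python
# Hypothetical three-entry excerpt standing in for ISO639Utils' table.
LANG_MAP = {'en': 'eng', 'he': 'heb', 'iw': 'heb'}


def short2long(code):
    # Returns None for codes missing from the table.
    return LANG_MAP.get(code)


def subtitle_metadata_opts(sub_langs):
    opts = []
    for i, lang in enumerate(sub_langs):
        # Subtitle files are assumed to be inputs 1, 2, ... after the video.
        opts.extend(['-map', '%d:0' % (i + 1)])
        # Fall back to the original code instead of omitting the tag.
        lang_code = short2long(lang) or lang
        opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
    return opts


print(subtitle_metadata_opts(['en', 'pt-BR']))
```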

@ -2968,6 +2968,7 @@ class ISO639Utils(object):
'gv': 'glv',
'ha': 'hau',
'he': 'heb',
'iw': 'heb', # Replaced by he in 1989 revision
'hi': 'hin',
'ho': 'hmo',
'hr': 'hrv',
@ -2977,6 +2978,7 @@ class ISO639Utils(object):
'hz': 'her',
'ia': 'ina',
'id': 'ind',
'in': 'ind', # Replaced by id in 1989 revision
'ie': 'ile',
'ig': 'ibo',
'ii': 'iii',
@ -3091,6 +3093,7 @@ class ISO639Utils(object):
'wo': 'wol',
'xh': 'xho',
'yi': 'yid',
'ji': 'yid', # Replaced by yi in 1989 revision
'yo': 'yor',
'za': 'zha',
'zh': 'zho',
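The three additions map pre-1989 ISO 639-1 codes (`iw`, `in`, `ji`) to the same ISO 639-2 codes as their replacements (`he`, `id`, `yi`). Forward lookups thus become many-to-one, and any reverse lookup has to pick one short code.

This sketch assumes a first-match-wins reverse search over an insertion-ordered dict (Python 3.7+). That search returns the modern code, because each alias is listed after its replacement, as in the hunk above. The excerpt table is hypothetical.

```python
# Hypothetical excerpt: modern codes listed before their deprecated aliases.
LANG_MAP = {
    'he': 'heb',
    'iw': 'heb',  # replaced by he in the 1989 revision
    'id': 'ind',
    'in': 'ind',  # replaced by id in the 1989 revision
    'yi': 'yid',
    'ji': 'yid',  # replaced by yi in the 1989 revision
}


def short2long(code):
    return LANG_MAP.get(code)


def long2short(code):
    # First match wins; with insertion-ordered dicts that is the modern code.
    for short_name, long_name in LANG_MAP.items():
        if long_name == code:
            return short_name
    return None


print(short2long('iw'), long2short('heb'))
```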

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2018.12.17'
__version__ = '2019.01.17'