Mirror of https://github.com/l1ving/youtube-dl (synced 2025-03-11 05:47:15 +08:00)
Merge branch 'fix-facebook-date' into fix.25.12.2018
Commit 17c9c5b80f
6
.github/ISSUE_TEMPLATE.md
vendored
@ -6,8 +6,8 @@
|
||||
|
||||
---
|
||||
|
||||
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.12.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
|
||||
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.12.17**
|
||||
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
|
||||
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.10**
|
||||
|
||||
### Before submitting an *issue* make sure you have:
|
||||
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
|
||||
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
|
||||
[debug] User config: []
|
||||
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
|
||||
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
|
||||
[debug] youtube-dl version 2018.12.17
|
||||
[debug] youtube-dl version 2019.01.10
|
||||
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
|
||||
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
|
||||
[debug] Proxy map: {}
|
||||
|
@ -153,15 +153,19 @@ After you have ensured this site is distributing its content legally, you can fo
|
||||
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
|
||||
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with an `only_matching` key in the test's dict are not counted.
|
||||
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
|
||||
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions that youtube-dl claims to support, namely 2.6, 2.7, and 3.2+.
|
||||
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
|
||||
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
|
||||
|
||||
$ flake8 youtube_dl/extractor/yourextractor.py
|
||||
|
||||
9. Make sure your code works under all [Python](https://www.python.org/) versions that youtube-dl claims to support, namely 2.6, 2.7, and 3.2+.
|
||||
10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
|
||||
|
||||
$ git add youtube_dl/extractor/extractors.py
|
||||
$ git add youtube_dl/extractor/yourextractor.py
|
||||
$ git commit -m '[yourextractor] Add new extractor'
|
||||
$ git push origin yourextractor
|
||||
|
||||
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
|
||||
In any case, thank you very much for your contributions!
|
||||
|
||||
@ -257,11 +261,33 @@ title = meta.get('title') or self._og_search_title(webpage)
|
||||
|
||||
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
|
||||
|
||||
### Make regular expressions flexible
|
||||
### Regular expressions
|
||||
|
||||
When using regular expressions try to write them fuzzy and flexible.
|
||||
#### Don't capture groups you don't use
|
||||
|
||||
A capturing group must be an indication that it is used somewhere in the code. Any group that is not used must be non-capturing.
|
||||
|
||||
##### Example
|
||||
|
||||
Don't capture the `id` attribute name here, since you can't use it for anything anyway.
|
||||
|
||||
Correct:
|
||||
|
||||
```python
|
||||
r'(?:id|ID)=(?P<id>\d+)'
|
||||
```
|
||||
|
||||
Incorrect:
|
||||
```python
|
||||
r'(id|ID)=(?P<id>\d+)'
|
||||
```
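As a minimal sketch of the difference, the snippet below runs both patterns against a made-up input string (`ID=12345` is an assumption for illustration, not taken from a real page). Both find the same `id`, but the capturing variant also produces a positional group that nothing uses:

```python
import re

SAMPLE = 'ID=12345'  # hypothetical input, for illustration only

# Non-capturing prefix: only the named group `id` is captured.
good = re.search(r'(?:id|ID)=(?P<id>\d+)', SAMPLE)

# Capturing prefix: group 1 holds the attribute name, which nothing uses.
bad = re.search(r'(id|ID)=(?P<id>\d+)', SAMPLE)

print(good.groups())  # ('12345',)
print(bad.groups())   # ('ID', '12345')
print(good.group('id') == bad.group('id'))  # True
```

The extra group in the second pattern shifts positional group numbers, which is easy to trip over when the expression is extended later.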
|
||||
|
||||
|
||||
#### Make regular expressions relaxed and flexible
|
||||
|
||||
When using regular expressions, try to write them fuzzy, relaxed, and flexible: skip insignificant parts that are more likely to change, allow both single and double quotes for quoted values, and so on.
|
||||
|
||||
#### Example
|
||||
##### Example
|
||||
|
||||
Say you need to extract `title` from the following HTML code:
|
||||
|
||||
@ -294,6 +320,25 @@ title = self._search_regex(
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
### Long lines policy
|
||||
|
||||
There is a soft limit of 80 characters per line of code. Respect it where possible, as long as doing so does not hurt readability or code maintenance.
|
||||
|
||||
For example, you should **never** split long string literals such as URLs or other often-copied entities over multiple lines to fit this limit:
|
||||
|
||||
Correct:
|
||||
|
||||
```python
|
||||
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
|
||||
```
|
||||
|
||||
Incorrect:
|
||||
|
||||
```python
|
||||
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
|
||||
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
|
||||
```
|
||||
|
||||
### Use safe conversion functions
|
||||
|
||||
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string-to-number conversions as well.
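To illustrate the idea without depending on youtube-dl itself, here is a rough sketch with simplified stand-ins for the two helpers. These are assumptions written for this example: the real functions in `youtube_dl/utils.py` accept more parameters. The `meta` dict and its values are also made up.

```python
def int_or_none(value, default=None):
    """Simplified stand-in (assumption), not the real youtube_dl.utils helper."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


def float_or_none(value, default=None):
    """Simplified stand-in (assumption), not the real youtube_dl.utils helper."""
    if value is None or value == '':
        return default
    try:
        return float(value)
    except (ValueError, TypeError):
        return default


# Hypothetical metadata as a site might return it: strings, gaps, and junk.
meta = {'duration': '254', 'view_count': None, 'rating': 'n/a'}

info = {
    'duration': int_or_none(meta.get('duration')),
    'view_count': int_or_none(meta.get('view_count')),
    'average_rating': float_or_none(meta.get('rating')),
}
print(info)  # {'duration': 254, 'view_count': None, 'average_rating': None}
```

A bare `int(meta['rating'])` would raise here and abort the whole extraction over an optional field, which is what the safe wrappers are meant to avoid.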
|
||||
|
61
ChangeLog
@ -1,3 +1,64 @@
|
||||
version 2019.01.10
|
||||
|
||||
Core
|
||||
* [extractor/common] Use episode name as title in _json_ld
|
||||
+ [extractor/common] Add support for movies in _json_ld
|
||||
* [postprocessor/ffmpeg] Embed subtitles with non-standard language codes
|
||||
(#18765)
|
||||
+ [utils] Add language codes replaced in 1989 revision of ISO 639
|
||||
to ISO639Utils (#18765)
|
||||
|
||||
Extractors
|
||||
* [youtube] Extract live HLS URL from player response (#18799)
|
||||
+ [outsidetv] Add support for outsidetv.com (#18774)
|
||||
* [jwplatform] Use JW Platform Delivery API V2 and add support for more URLs
|
||||
+ [fox] Add support for National Geographic (#17985, #15333, #14698)
|
||||
+ [playplustv] Add support for playplus.tv (#18789)
|
||||
* [globo] Set GLBID cookie manually (#17346)
|
||||
+ [gaia] Add support for gaia.com (#14605)
|
||||
* [youporn] Fix title and description extraction (#18748)
|
||||
+ [hungama] Add support for hungama.com (#17402, #18771)
|
||||
* [dtube] Fix extraction (#18741)
|
||||
* [tvnow] Fix and rework extractors and prepare for a switch to the new API
|
||||
(#17245, #18499)
|
||||
* [carambatv:page] Fix extraction (#18739)
|
||||
|
||||
|
||||
version 2019.01.02
|
||||
|
||||
Extractors
|
||||
* [discovery] Use geo verification headers (#17838)
|
||||
+ [packtpub] Add support for subscription.packtpub.com (#18718)
|
||||
* [yourporn] Fix extraction (#18583)
|
||||
+ [acast:channel] Add support for play.acast.com (#18587)
|
||||
+ [extractors] Add missing age limits (#18621)
|
||||
+ [rmcdecouverte] Add support for live stream
|
||||
* [rmcdecouverte] Bypass geo restriction
|
||||
* [rmcdecouverte] Update URL regular expression (#18595, #18697)
|
||||
* [manyvids] Fix extraction (#18604, #18614)
|
||||
* [bitchute] Fix extraction (#18567)
|
||||
|
||||
|
||||
version 2018.12.31
|
||||
|
||||
Extractors
|
||||
+ [bbc] Add support for another embed pattern (#18643)
|
||||
+ [npo:live] Add support for npostart.nl (#18644)
|
||||
* [beeg] Fix extraction (#18610, #18626)
|
||||
* [youtube] Unescape HTML for series (#18641)
|
||||
+ [youtube] Extract more format metadata
|
||||
* [youtube] Detect DRM protected videos (#1774)
|
||||
* [youtube] Relax HTML5 player regular expressions (#18465, #18466)
|
||||
* [youtube] Extend HTML5 player regular expression (#17516)
|
||||
+ [liveleak] Add support for another embed type and restore original
|
||||
format extraction
|
||||
+ [crackle] Extract ISM and HTTP formats
|
||||
+ [twitter] Pass Referer with card request (#18579)
|
||||
* [mediasite] Extend URL regular expression (#18558)
|
||||
+ [lecturio] Add support for lecturio.de (#18562)
|
||||
+ [discovery] Add support for Scripps Networks watch domains (#17947)
|
||||
|
||||
|
||||
version 2018.12.17
|
||||
|
||||
Extractors
|
||||
|
49
README.md
@ -496,7 +496,7 @@ The `-o` option allows users to indicate a template for the output file names.
|
||||
|
||||
**tl;dr:** [navigate me to examples](#output-template-examples).
|
||||
|
||||
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
|
||||
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are:
|
||||
|
||||
- `id` (string): Video identifier
|
||||
- `title` (string): Video title
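The special sequences follow Python's printf-style formatting with a mapping, so their effect can be previewed in plain Python. In this sketch the field values are invented for illustration; `playlist_index` is used only as an example of a numeric field that takes a `05d` format.

```python
# Hypothetical field values, for illustration only.
info = {
    'id': 'BaW_jenozKc',
    'title': 'test video',
    'playlist_index': 3,
    'ext': 'mp4',
}

# %(NAME)s inserts a string; %(NAME)05d zero-pads a number to five digits.
template = '%(playlist_index)05d-%(title)s-%(id)s.%(ext)s'

filename = template % info
print(filename)  # 00003-test video-BaW_jenozKc.mp4
```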
|
||||
@ -1133,11 +1133,33 @@ title = meta.get('title') or self._og_search_title(webpage)
|
||||
|
||||
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
|
||||
|
||||
### Make regular expressions flexible
|
||||
### Regular expressions
|
||||
|
||||
When using regular expressions try to write them fuzzy and flexible.
|
||||
#### Don't capture groups you don't use
|
||||
|
||||
A capturing group must be an indication that it is used somewhere in the code. Any group that is not used must be non-capturing.
|
||||
|
||||
##### Example
|
||||
|
||||
Don't capture the `id` attribute name here, since you can't use it for anything anyway.
|
||||
|
||||
Correct:
|
||||
|
||||
```python
|
||||
r'(?:id|ID)=(?P<id>\d+)'
|
||||
```
|
||||
|
||||
Incorrect:
|
||||
```python
|
||||
r'(id|ID)=(?P<id>\d+)'
|
||||
```
|
||||
|
||||
|
||||
#### Make regular expressions relaxed and flexible
|
||||
|
||||
When using regular expressions, try to write them fuzzy, relaxed, and flexible: skip insignificant parts that are more likely to change, allow both single and double quotes for quoted values, and so on.
|
||||
|
||||
#### Example
|
||||
##### Example
|
||||
|
||||
Say you need to extract `title` from the following HTML code:
|
||||
|
||||
@ -1170,6 +1192,25 @@ title = self._search_regex(
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
### Long lines policy
|
||||
|
||||
There is a soft limit of 80 characters per line of code. Respect it where possible, as long as doing so does not hurt readability or code maintenance.
|
||||
|
||||
For example, you should **never** split long string literals such as URLs or other often-copied entities over multiple lines to fit this limit:
|
||||
|
||||
Correct:
|
||||
|
||||
```python
|
||||
'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
|
||||
```
|
||||
|
||||
Incorrect:
|
||||
|
||||
```python
|
||||
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
|
||||
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
|
||||
```
|
||||
|
||||
### Use safe conversion functions
|
||||
|
||||
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string-to-number conversions as well.
|
||||
|
@ -320,6 +320,7 @@
|
||||
- **Fusion**
|
||||
- **Fux**
|
||||
- **FXNetworks**
|
||||
- **Gaia**
|
||||
- **GameInformer**
|
||||
- **GameOne**
|
||||
- **gameone:playlist**
|
||||
@ -370,6 +371,8 @@
|
||||
- **HRTiPlaylist**
|
||||
- **Huajiao**: 花椒直播
|
||||
- **HuffPost**: Huffington Post
|
||||
- **Hungama**
|
||||
- **HungamaSong**
|
||||
- **Hypem**
|
||||
- **Iconosquare**
|
||||
- **ign.com**
|
||||
@ -438,6 +441,7 @@
|
||||
- **Lecture2Go**
|
||||
- **Lecturio**
|
||||
- **LecturioCourse**
|
||||
- **LecturioDeCourse**
|
||||
- **LEGO**
|
||||
- **Lemonde**
|
||||
- **Lenta**
|
||||
@ -539,8 +543,6 @@
|
||||
- **MyviEmbed**
|
||||
- **MyVisionTV**
|
||||
- **n-tv.de**
|
||||
- **natgeo**
|
||||
- **natgeo:episodeguide**
|
||||
- **natgeo:video**
|
||||
- **Naver**
|
||||
- **NBA**
|
||||
@ -641,6 +643,7 @@
|
||||
- **orf:oe1**: Radio Österreich 1
|
||||
- **orf:tvthek**: ORF TVthek
|
||||
- **OsnatelTV**
|
||||
- **OutsideTV**
|
||||
- **PacktPub**
|
||||
- **PacktPubCourse**
|
||||
- **PandaTV**: 熊猫TV
|
||||
@ -665,6 +668,7 @@
|
||||
- **Pinkbike**
|
||||
- **Pladform**
|
||||
- **play.fm**
|
||||
- **PlayPlusTV**
|
||||
- **PlaysTV**
|
||||
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
|
||||
- **Playvid**
|
||||
@ -933,7 +937,9 @@
|
||||
- **TVNet**
|
||||
- **TVNoe**
|
||||
- **TVNow**
|
||||
- **TVNowList**
|
||||
- **TVNowAnnual**
|
||||
- **TVNowNew**
|
||||
- **TVNowSeason**
|
||||
- **TVNowShow**
|
||||
- **tvp**: Telewizja Polska
|
||||
- **tvp:embed**: Telewizja Polska
|
||||
|
@ -75,10 +75,14 @@ class HlsFD(FragmentFD):
|
||||
fd.add_progress_hook(ph)
|
||||
return fd.real_download(filename, info_dict)
|
||||
|
||||
def is_ad_fragment(s):
|
||||
def is_ad_fragment_start(s):
|
||||
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
|
||||
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
|
||||
|
||||
def is_ad_fragment_end(s):
|
||||
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s or
|
||||
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
|
||||
|
||||
media_frags = 0
|
||||
ad_frags = 0
|
||||
ad_frag_next = False
|
||||
@ -87,12 +91,13 @@ class HlsFD(FragmentFD):
|
||||
if not line:
|
||||
continue
|
||||
if line.startswith('#'):
|
||||
if is_ad_fragment(line):
|
||||
ad_frags += 1
|
||||
if is_ad_fragment_start(line):
|
||||
ad_frag_next = True
|
||||
elif is_ad_fragment_end(line):
|
||||
ad_frag_next = False
|
||||
continue
|
||||
if ad_frag_next:
|
||||
ad_frag_next = False
|
||||
ad_frags += 1
|
||||
continue
|
||||
media_frags += 1
|
||||
|
||||
@ -123,7 +128,6 @@ class HlsFD(FragmentFD):
|
||||
if line:
|
||||
if not line.startswith('#'):
|
||||
if ad_frag_next:
|
||||
ad_frag_next = False
|
||||
continue
|
||||
frag_index += 1
|
||||
if frag_index <= ctx['fragment_index']:
|
||||
@ -196,8 +200,10 @@ class HlsFD(FragmentFD):
|
||||
'start': sub_range_start,
|
||||
'end': sub_range_start + int(splitted_byte_range[0]),
|
||||
}
|
||||
elif is_ad_fragment(line):
|
||||
elif is_ad_fragment_start(line):
|
||||
ad_frag_next = True
|
||||
elif is_ad_fragment_end(line):
|
||||
ad_frag_next = False
|
||||
|
||||
self._finish_frag_download(ctx)
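The effect of splitting `is_ad_fragment` into start and end markers can be sketched outside the downloader. In the standalone version below, the two predicates are copied from the hunk above, while `count_fragments` and the sample manifest are assumptions written for illustration. The tag payloads in particular are invented, not real Uplynk output.

```python
def is_ad_fragment_start(s):
    return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
            or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))


def is_ad_fragment_end(s):
    return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s
            or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))


def count_fragments(manifest):
    """Hypothetical helper mirroring the counting loop in the hunk above."""
    media_frags = 0
    ad_frags = 0
    ad_frag_next = False
    for line in manifest.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            if is_ad_fragment_start(line):
                ad_frag_next = True
            elif is_ad_fragment_end(line):
                ad_frag_next = False
            continue
        # The flag stays set until an end marker, so every fragment
        # inside the ad block is counted as an ad.
        if ad_frag_next:
            ad_frags += 1
            continue
        media_frags += 1
    return media_frags, ad_frags


# Invented manifest: two ad fragments followed by one media fragment.
MANIFEST = """#EXTM3U
#UPLYNK-SEGMENT: abc,ad
#EXTINF:4.0,
ad0.ts
#EXTINF:4.0,
ad1.ts
#UPLYNK-SEGMENT: abc,segment
#EXTINF:4.0,
media0.ts
"""

print(count_fragments(MANIFEST))  # (1, 2)
```

With the previous logic the flag was reset after the first fragment, so `ad1.ts` in this sample would have been counted as media.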
|
||||
|
||||
|
@ -79,17 +79,27 @@ class ACastIE(InfoExtractor):
|
||||
|
||||
class ACastChannelIE(InfoExtractor):
|
||||
IE_NAME = 'acast:channel'
|
||||
_VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)'
|
||||
_TEST = {
|
||||
'url': 'https://www.acast.com/condenasttraveler',
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:
|
||||
(?:www\.)?acast\.com/|
|
||||
play\.acast\.com/s/
|
||||
)
|
||||
(?P<id>[^/#?]+)
|
||||
'''
|
||||
_TESTS = [{
|
||||
'url': 'https://www.acast.com/todayinfocus',
|
||||
'info_dict': {
|
||||
'id': '50544219-29bb-499e-a083-6087f4cb7797',
|
||||
'title': 'Condé Nast Traveler Podcast',
|
||||
'description': 'md5:98646dee22a5b386626ae31866638fbd',
|
||||
'id': '4efc5294-5385-4847-98bd-519799ce5786',
|
||||
'title': 'Today in Focus',
|
||||
'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
|
||||
},
|
||||
'playlist_mincount': 20,
|
||||
}
|
||||
_API_BASE_URL = 'https://www.acast.com/api/'
|
||||
'playlist_mincount': 35,
|
||||
}, {
|
||||
'url': 'http://play.acast.com/s/ft-banking-weekly',
|
||||
'only_matching': True,
|
||||
}]
|
||||
_API_BASE_URL = 'https://play.acast.com/api/'
|
||||
_PAGE_SIZE = 10
|
||||
|
||||
@classmethod
|
||||
@ -102,7 +112,7 @@ class ACastChannelIE(InfoExtractor):
|
||||
channel_slug, note='Download page %d of channel data' % page)
|
||||
for cast in casts:
|
||||
yield self.url_result(
|
||||
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
|
||||
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
|
||||
'ACast', cast['id'])
|
||||
|
||||
def _real_extract(self, url):
|
||||
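As a quick check of the new verbose `_VALID_URL`, the sketch below applies the pattern from the hunk above to the two test URLs using plain `re.match`. The constant name `VALID_URL` is chosen for this example, and the extractor's own matching goes through additional machinery not shown here.

```python
import re

# Pattern copied from the hunk above; (?x) lets it span several lines.
VALID_URL = r'''(?x)
                https?://
                    (?:
                        (?:www\.)?acast\.com/|
                        play\.acast\.com/s/
                    )
                    (?P<id>[^/#?]+)
                '''

urls = (
    'https://www.acast.com/todayinfocus',
    'http://play.acast.com/s/ft-banking-weekly',
)
ids = [re.match(VALID_URL, url).group('id') for url in urls]
print(ids)  # ['todayinfocus', 'ft-banking-weekly']
```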
|
@ -62,7 +62,7 @@ class AudiomackIE(InfoExtractor):
|
||||
# Audiomack wraps a lot of soundcloud tracks in their branded wrapper
|
||||
# if so, pass the work off to the soundcloud extractor
|
||||
if SoundcloudIE.suitable(api_response['url']):
|
||||
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
|
||||
return self.url_result(api_response['url'], SoundcloudIE.ie_key())
|
||||
|
||||
return {
|
||||
'id': compat_str(api_response.get('id', album_url_tag)),
|
||||
|
@ -795,6 +795,15 @@ class BBCIE(BBCCoUkIE):
|
||||
'uploader': 'Radio 3',
|
||||
'uploader_id': 'bbc_radio_three',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227',
|
||||
'info_dict': {
|
||||
'id': 'p06w9tws',
|
||||
'ext': 'mp4',
|
||||
'title': 'md5:2fabf12a726603193a2879a055f72514',
|
||||
'description': 'Learn English words and phrases from this story',
|
||||
},
|
||||
'add_ie': [BBCCoUkIE.ie_key()],
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
@ -945,6 +954,15 @@ class BBCIE(BBCCoUkIE):
|
||||
if entries:
|
||||
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
|
||||
|
||||
# http://www.bbc.co.uk/learningenglish/chinese/features/lingohack/ep-181227
|
||||
group_id = self._search_regex(
|
||||
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
|
||||
webpage, 'group id', default=None)
|
||||
if group_id:
|
||||
return self.url_result(
|
||||
'https://www.bbc.co.uk/programmes/%s' % group_id,
|
||||
ie=BBCCoUkIE.ie_key())
|
||||
|
||||
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
|
||||
programme_id = self._search_regex(
|
||||
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
|
||||
|
@ -1,15 +1,10 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_chr,
|
||||
compat_ord,
|
||||
compat_urllib_parse_unquote,
|
||||
)
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
urljoin,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
|
||||
@ -36,29 +31,9 @@ class BeegIE(InfoExtractor):
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
cpl_url = self._search_regex(
|
||||
r'<script[^>]+src=(["\'])(?P<url>(?:/static|(?:https?:)?//static\.beeg\.com)/cpl/\d+\.js.*?)\1',
|
||||
webpage, 'cpl', default=None, group='url')
|
||||
|
||||
cpl_url = urljoin(url, cpl_url)
|
||||
|
||||
beeg_version, beeg_salt = [None] * 2
|
||||
|
||||
if cpl_url:
|
||||
cpl = self._download_webpage(
|
||||
self._proto_relative_url(cpl_url), video_id,
|
||||
'Downloading cpl JS', fatal=False)
|
||||
if cpl:
|
||||
beeg_version = int_or_none(self._search_regex(
|
||||
r'beeg_version\s*=\s*([^\b]+)', cpl,
|
||||
'beeg version', default=None)) or self._search_regex(
|
||||
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
|
||||
beeg_salt = self._search_regex(
|
||||
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
|
||||
default=None, group='beeg_salt')
|
||||
|
||||
beeg_version = beeg_version or '2185'
|
||||
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
|
||||
beeg_version = self._search_regex(
|
||||
r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
|
||||
default='1546225636701')
|
||||
|
||||
for api_path in ('', 'api.'):
|
||||
video = self._download_json(
|
||||
@ -68,37 +43,6 @@ class BeegIE(InfoExtractor):
|
||||
if video:
|
||||
break
|
||||
|
||||
def split(o, e):
|
||||
def cut(s, x):
|
||||
n.append(s[:x])
|
||||
return s[x:]
|
||||
n = []
|
||||
r = len(o) % e
|
||||
if r > 0:
|
||||
o = cut(o, r)
|
||||
while len(o) > e:
|
||||
o = cut(o, e)
|
||||
n.append(o)
|
||||
return n
|
||||
|
||||
def decrypt_key(key):
|
||||
# Reverse engineered from http://static.beeg.com/cpl/1738.js
|
||||
a = beeg_salt
|
||||
e = compat_urllib_parse_unquote(key)
|
||||
o = ''.join([
|
||||
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
|
||||
for n in range(len(e))])
|
||||
return ''.join(split(o, 3)[::-1])
|
||||
|
||||
def decrypt_url(encrypted_url):
|
||||
encrypted_url = self._proto_relative_url(
|
||||
encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
|
||||
key = self._search_regex(
|
||||
r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
|
||||
if not key:
|
||||
return encrypted_url
|
||||
return encrypted_url.replace(key, decrypt_key(key))
|
||||
|
||||
formats = []
|
||||
for format_id, video_url in video.items():
|
||||
if not video_url:
|
||||
@ -108,18 +52,20 @@ class BeegIE(InfoExtractor):
|
||||
if not height:
|
||||
continue
|
||||
formats.append({
|
||||
'url': decrypt_url(video_url),
|
||||
'url': self._proto_relative_url(
|
||||
video_url.replace('{DATA_MARKERS}', 'data=pc_XX__%s_0' % beeg_version), 'https:'),
|
||||
'format_id': format_id,
|
||||
'height': int(height),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = video['title']
|
||||
video_id = video.get('id') or video_id
|
||||
video_id = compat_str(video.get('id') or video_id)
|
||||
display_id = video.get('code')
|
||||
description = video.get('desc')
|
||||
series = video.get('ps_name')
|
||||
|
||||
timestamp = parse_iso8601(video.get('date'), ' ')
|
||||
timestamp = unified_timestamp(video.get('date'))
|
||||
duration = int_or_none(video.get('duration'))
|
||||
|
||||
tags = [tag.strip() for tag in video['tags'].split(',')] if video.get('tags') else None
|
||||
@ -129,6 +75,7 @@ class BeegIE(InfoExtractor):
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'series': series,
|
||||
'timestamp': timestamp,
|
||||
'duration': duration,
|
||||
'tags': tags,
|
||||
|
@ -5,7 +5,10 @@ import itertools
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import urlencode_postdata
|
||||
from ..utils import (
|
||||
orderedSet,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
|
||||
class BitChuteIE(InfoExtractor):
|
||||
@ -43,10 +46,16 @@ class BitChuteIE(InfoExtractor):
|
||||
'description', webpage, 'title',
|
||||
default=None) or self._og_search_description(webpage)
|
||||
|
||||
format_urls = []
|
||||
for mobj in re.finditer(
|
||||
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
|
||||
format_urls.append(mobj.group('url'))
|
||||
format_urls.extend(re.findall(r'as=(https?://[^&"\']+)', webpage))
|
||||
|
||||
formats = [
|
||||
{'url': mobj.group('url')}
|
||||
for mobj in re.finditer(
|
||||
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)]
|
||||
{'url': format_url}
|
||||
for format_url in orderedSet(format_urls)]
|
||||
self._check_formats(formats, video_id)
|
||||
self._sort_formats(formats)
|
||||
|
||||
description = self._html_search_regex(
|
||||
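The new BitChute format collection can be tried in isolation. In this sketch the two regular expressions are taken from the hunk above, while `ordered_set` is a simplified stand-in for `orderedSet` written for this example (an assumption about its behavior: drop duplicates, keep first-seen order). The page snippet and its URLs are invented.

```python
import re


def ordered_set(iterable):
    """Hypothetical stand-in for orderedSet: dedupe, keep first-seen order."""
    result = []
    for item in iterable:
        if item not in result:
            result.append(item)
    return result


# Invented page fragment: one web seed and two magnet links with `as=` URLs.
webpage = '''
torrent.addWebSeed('https://seed.example.com/video.mp4');
<a href="magnet:?xt=urn:btih:0&as=https://seed.example.com/video.mp4&xs=x">
<a href="magnet:?as=https://mirror.example.com/video.mp4">
'''

format_urls = []
for mobj in re.finditer(
        r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
    format_urls.append(mobj.group('url'))
format_urls.extend(re.findall(r'as=(https?://[^&"\']+)', webpage))

# The seed URL appears twice in format_urls but only once in formats.
formats = [{'url': format_url} for format_url in ordered_set(format_urls)]
print(formats)
```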
|
@ -14,6 +14,7 @@ class CamModelsIE(InfoExtractor):
|
||||
_TESTS = [{
|
||||
'url': 'https://www.cammodels.com/cam/AutumnKnight/',
|
||||
'only_matching': True,
|
||||
'age_limit': 18
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -93,4 +94,5 @@ class CamModelsIE(InfoExtractor):
|
||||
'title': self._live_title(user_id),
|
||||
'is_live': True,
|
||||
'formats': formats,
|
||||
'age_limit': 18
|
||||
}
|
||||
|
@ -20,6 +20,7 @@ class CamTubeIE(InfoExtractor):
|
||||
'duration': 1274,
|
||||
'timestamp': 1528018608,
|
||||
'upload_date': '20180603',
|
||||
'age_limit': 18
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
@ -66,4 +67,5 @@ class CamTubeIE(InfoExtractor):
|
||||
'like_count': like_count,
|
||||
'creator': creator,
|
||||
'formats': formats,
|
||||
'age_limit': 18
|
||||
}
|
||||
|
@ -25,6 +25,7 @@ class CamWithHerIE(InfoExtractor):
|
||||
'comment_count': int,
|
||||
'uploader': 'MileenaK',
|
||||
'upload_date': '20160322',
|
||||
'age_limit': 18,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
@ -84,4 +85,5 @@ class CamWithHerIE(InfoExtractor):
|
||||
'comment_count': comment_count,
|
||||
'uploader': uploader,
|
||||
'upload_date': upload_date,
|
||||
'age_limit': 18
|
||||
}
|
||||
|
@ -82,6 +82,12 @@ class CarambaTVPageIE(InfoExtractor):
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
videomore_url = VideomoreIE._extract_url(webpage)
|
||||
if not videomore_url:
|
||||
videomore_id = self._search_regex(
|
||||
r'getVMCode\s*\(\s*["\']?(\d+)', webpage, 'videomore id',
|
||||
default=None)
|
||||
if videomore_id:
|
||||
videomore_url = 'videomore:%s' % videomore_id
|
||||
if videomore_url:
|
||||
title = self._og_search_title(webpage)
|
||||
return {
|
||||
|
@ -119,11 +119,7 @@ class CNNBlogsIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
webpage = self._download_webpage(url, url_basename(url))
|
||||
cnn_url = self._html_search_regex(r'data-url="(.+?)"', webpage, 'cnn url')
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': cnn_url,
|
||||
'ie_key': CNNIE.ie_key(),
|
||||
}
|
||||
return self.url_result(cnn_url, CNNIE.ie_key())
|
||||
|
||||
|
||||
class CNNArticleIE(InfoExtractor):
|
||||
@ -145,8 +141,4 @@ class CNNArticleIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
webpage = self._download_webpage(url, url_basename(url))
|
||||
cnn_url = self._html_search_regex(r"video:\s*'([^']+)'", webpage, 'cnn url')
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': 'http://cnn.com/video/?/video/' + cnn_url,
|
||||
'ie_key': CNNIE.ie_key(),
|
||||
}
|
||||
return self.url_result('http://cnn.com/video/?/video/' + cnn_url, CNNIE.ie_key())
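The refactoring above swaps a hand-built result dict for a call to `url_result`. The sketch below assumes `url_result` builds a dict of the same shape, which the replaced literal suggests. The function here is a simplified stand-in written for this example, not the method from `common.py`, and the URL is invented.

```python
def url_result(url, ie=None, video_id=None, video_title=None):
    """Simplified stand-in for InfoExtractor.url_result (an assumption)."""
    video_info = {'_type': 'url', 'url': url, 'ie_key': ie}
    if video_id is not None:
        video_info['id'] = video_id
    if video_title is not None:
        video_info['title'] = video_title
    return video_info


cnn_url = 'http://cnn.com/video/?/video/example'  # hypothetical URL

# 'CNN' is what CNNIE.ie_key() would yield: the class name minus 'IE'.
old_style = {'_type': 'url', 'url': cnn_url, 'ie_key': 'CNN'}
new_style = url_result(cnn_url, 'CNN')

print(old_style == new_style)  # True
```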
|
||||
|
@ -1239,17 +1239,27 @@ class InfoExtractor(object):
|
||||
if expected_type is not None and expected_type != item_type:
|
||||
return info
|
||||
if item_type in ('TVEpisode', 'Episode'):
|
||||
episode_name = unescapeHTML(e.get('name'))
|
||||
info.update({
|
||||
'episode': unescapeHTML(e.get('name')),
|
||||
'episode': episode_name,
|
||||
'episode_number': int_or_none(e.get('episodeNumber')),
|
||||
'description': unescapeHTML(e.get('description')),
|
||||
})
|
||||
if not info.get('title') and episode_name:
|
||||
info['title'] = episode_name
|
||||
part_of_season = e.get('partOfSeason')
|
||||
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
|
||||
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
|
||||
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
|
||||
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
|
||||
info['series'] = unescapeHTML(part_of_series.get('name'))
|
||||
elif item_type == 'Movie':
|
||||
info.update({
|
||||
'title': unescapeHTML(e.get('name')),
|
||||
'description': unescapeHTML(e.get('description')),
|
||||
'duration': parse_duration(e.get('duration')),
|
||||
'timestamp': unified_timestamp(e.get('dateCreated')),
|
||||
})
|
||||
elif item_type in ('Article', 'NewsArticle'):
|
||||
info.update({
|
||||
'timestamp': parse_iso8601(e.get('datePublished')),
|
||||
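The title fallback added to `_json_ld` can be sketched in reduced form. The function below is a hypothetical simplification: it keeps only the episode and movie branches and omits `unescapeHTML`, `int_or_none`, and the other helpers the real method uses. The sample JSON-LD dicts are invented.

```python
def extract_json_ld_info(e, info=None):
    """Reduced, hypothetical version of the branches shown in the hunk above."""
    info = dict(info or {})
    item_type = e.get('@type')
    if item_type in ('TVEpisode', 'Episode'):
        episode_name = e.get('name')
        info['episode'] = episode_name
        # New behavior: use the episode name when no title was found earlier.
        if not info.get('title') and episode_name:
            info['title'] = episode_name
    elif item_type == 'Movie':
        info['title'] = e.get('name')
        info['description'] = e.get('description')
    return info


episode = {'@type': 'TVEpisode', 'name': 'Pilot'}  # invented sample data
movie = {'@type': 'Movie', 'name': 'Some Film', 'description': 'A film.'}

print(extract_json_ld_info(episode))
print(extract_json_ld_info(episode, {'title': 'Show S01E01'}))
print(extract_json_ld_info(movie))
```

An existing title is left untouched, so the episode name only fills a gap.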
|
@ -46,8 +46,24 @@ class CuriosityStreamBaseIE(InfoExtractor):
|
||||
self._handle_errors(result)
|
||||
self._auth_token = result['message']['auth_token']
|
||||
|
||||
def _extract_media_info(self, media):
|
||||
video_id = compat_str(media['id'])
|
||||
|
||||
class CuriosityStreamIE(CuriosityStreamBaseIE):
|
||||
IE_NAME = 'curiositystream'
|
||||
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
'url': 'https://app.curiositystream.com/video/2',
|
||||
'md5': '262bb2f257ff301115f1973540de8983',
|
||||
'info_dict': {
|
||||
'id': '2',
|
||||
'ext': 'mp4',
|
||||
'title': 'How Did You Develop The Internet?',
|
||||
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
|
||||
}
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
media = self._call_api('media/' + video_id, video_id)
|
||||
title = media['title']
|
||||
|
||||
formats = []
|
||||
@ -114,38 +130,21 @@ class CuriosityStreamBaseIE(InfoExtractor):
|
||||
}
|
||||
|
||||
|
||||
class CuriosityStreamIE(CuriosityStreamBaseIE):
|
||||
IE_NAME = 'curiositystream'
|
||||
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
'url': 'https://app.curiositystream.com/video/2',
|
||||
'md5': '262bb2f257ff301115f1973540de8983',
|
||||
'info_dict': {
|
||||
'id': '2',
|
||||
'ext': 'mp4',
|
||||
'title': 'How Did You Develop The Internet?',
|
||||
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
|
||||
}
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
media = self._call_api('media/' + video_id, video_id)
|
||||
return self._extract_media_info(media)
|
||||
|
||||
|
||||
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
|
||||
IE_NAME = 'curiositystream:collection'
|
||||
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'https://app.curiositystream.com/collection/2',
|
||||
'info_dict': {
|
||||
'id': '2',
|
||||
'title': 'Curious Minds: The Internet',
|
||||
'description': 'How is the internet shaping our lives in the 21st Century?',
|
||||
},
|
||||
'playlist_mincount': 12,
|
||||
}
|
||||
'playlist_mincount': 17,
|
||||
}, {
|
||||
'url': 'https://curiositystream.com/series/2',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
collection_id = self._match_id(url)
|
||||
@ -153,7 +152,10 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
|
||||
'collections/' + collection_id, collection_id)
|
||||
entries = []
|
||||
for media in collection.get('media', []):
|
||||
entries.append(self._extract_media_info(media))
|
||||
media_id = compat_str(media.get('id'))
|
||||
entries.append(self.url_result(
|
||||
'https://curiositystream.com/video/' + media_id,
|
||||
CuriosityStreamIE.ie_key(), media_id))
|
||||
return self.playlist_result(
|
||||
entries, collection_id,
|
||||
collection.get('title'), collection.get('description'))
|
||||
|
@ -94,11 +94,12 @@ class DiscoveryIE(DiscoveryGoBaseIE):
|
||||
})['access_token']
|
||||
|
||||
try:
|
||||
headers = self.geo_verification_headers()
|
||||
headers['Authorization'] = 'Bearer ' + access_token
|
||||
|
||||
stream = self._download_json(
|
||||
'https://api.discovery.com/v1/streaming/video/' + video_id,
|
||||
display_id, headers={
|
||||
'Authorization': 'Bearer ' + access_token,
|
||||
})
|
||||
display_id, headers=headers)
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
|
||||
e_description = self._parse_json(
|
||||
|
@ -15,16 +15,16 @@ from ..utils import (
|
||||
class DTubeIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
|
||||
_TEST = {
|
||||
'url': 'https://d.tube/#!/v/benswann/zqd630em',
|
||||
'md5': 'a03eaa186618ffa7a3145945543a251e',
|
||||
'url': 'https://d.tube/#!/v/broncnutz/x380jtr1',
|
||||
'md5': '9f29088fa08d699a7565ee983f56a06e',
|
||||
'info_dict': {
|
||||
'id': 'zqd630em',
|
||||
'id': 'x380jtr1',
|
||||
'ext': 'mp4',
|
||||
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom',
|
||||
'description': 'md5:700d164e066b87f9eac057949e4227c2',
|
||||
'uploader_id': 'benswann',
|
||||
'upload_date': '20180222',
|
||||
'timestamp': 1519328958,
|
||||
'title': 'Lefty 3-Rings is Back Baby!! NCAA Picks',
|
||||
'description': 'md5:60be222088183be3a42f196f34235776',
|
||||
'uploader_id': 'broncnutz',
|
||||
'upload_date': '20190107',
|
||||
'timestamp': 1546854054,
|
||||
},
|
||||
'params': {
|
||||
'format': '480p',
|
||||
@ -48,7 +48,7 @@ class DTubeIE(InfoExtractor):
|
||||
def canonical_url(h):
|
||||
if not h:
|
||||
return None
|
||||
return 'https://ipfs.io/ipfs/' + h
|
||||
return 'https://video.dtube.top/ipfs/' + h
|
||||
|
||||
formats = []
|
||||
for q in ('240', '480', '720', '1080', ''):
|
||||
|
@ -411,6 +411,7 @@ from .funk import (
|
||||
from .funnyordie import FunnyOrDieIE
|
||||
from .fusion import FusionIE
|
||||
from .fxnetworks import FXNetworksIE
|
||||
from .gaia import GaiaIE
|
||||
from .gameinformer import GameInformerIE
|
||||
from .gameone import (
|
||||
GameOneIE,
|
||||
@ -469,6 +470,10 @@ from .hrti import (
|
||||
)
|
||||
from .huajiao import HuajiaoIE
|
||||
from .huffpost import HuffPostIE
|
||||
from .hungama import (
|
||||
HungamaIE,
|
||||
HungamaSongIE,
|
||||
)
|
||||
from .hypem import HypemIE
|
||||
from .iconosquare import IconosquareIE
|
||||
from .ign import (
|
||||
@ -682,11 +687,7 @@ from .myvi import (
|
||||
MyviEmbedIE,
|
||||
)
|
||||
from .myvidster import MyVidsterIE
|
||||
from .nationalgeographic import (
|
||||
NationalGeographicVideoIE,
|
||||
NationalGeographicIE,
|
||||
NationalGeographicEpisodeGuideIE,
|
||||
)
|
||||
from .nationalgeographic import NationalGeographicVideoIE
|
||||
from .naver import NaverIE
|
||||
from .nba import NBAIE
|
||||
from .nbc import (
|
||||
@ -828,6 +829,7 @@ from .orf import (
|
||||
ORFOE1IE,
|
||||
ORFIPTVIE,
|
||||
)
|
||||
from .outsidetv import OutsideTVIE
|
||||
from .packtpub import (
|
||||
PacktPubIE,
|
||||
PacktPubCourseIE,
|
||||
@ -856,6 +858,7 @@ from .piksel import PikselIE
|
||||
from .pinkbike import PinkbikeIE
|
||||
from .pladform import PladformIE
|
||||
from .playfm import PlayFMIE
|
||||
from .playplustv import PlayPlusTVIE
|
||||
from .plays import PlaysTVIE
|
||||
from .playtvak import PlaytvakIE
|
||||
from .playvid import PlayvidIE
|
||||
@ -1193,7 +1196,9 @@ from .tvnet import TVNetIE
|
||||
from .tvnoe import TVNoeIE
|
||||
from .tvnow import (
|
||||
TVNowIE,
|
||||
TVNowListIE,
|
||||
TVNowNewIE,
|
||||
TVNowSeasonIE,
|
||||
TVNowAnnualIE,
|
||||
TVNowShowIE,
|
||||
)
|
||||
from .tvp import (
|
||||
|
@ -57,7 +57,8 @@ class FacebookIE(InfoExtractor):
|
||||
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
|
||||
|
||||
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
|
||||
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
|
||||
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=%s'
|
||||
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
|
||||
@ -222,6 +223,10 @@ class FacebookIE(InfoExtractor):
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# no timestamp
|
||||
'url': 'https://www.facebook.com/ChickenShow1996/videos/2289288568020072/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
@ -339,6 +344,7 @@ class FacebookIE(InfoExtractor):
|
||||
video_id, transform_source=js_to_json, fatal=False)
|
||||
video_data = extract_from_jsmods_instances(server_js_data)
|
||||
|
||||
tahoe_secondary_data = ''
|
||||
if not video_data:
|
||||
if not fatal_if_no_video:
|
||||
return webpage, False
|
||||
@ -352,9 +358,7 @@ class FacebookIE(InfoExtractor):
|
||||
|
||||
# Video info not in first request, do a secondary request using
|
||||
# tahoe player specific URL
|
||||
tahoe_data = self._download_webpage(
|
||||
self._VIDEO_PAGE_TAHOE_TEMPLATE % video_id, video_id,
|
||||
data=urlencode_postdata({
|
||||
tahoe_request_data = urlencode_postdata({
|
||||
'__a': 1,
|
||||
'__pc': self._search_regex(
|
||||
r'pkg_cohort["\']\s*:\s*["\'](.+?)["\']', webpage,
|
||||
@ -365,15 +369,29 @@ class FacebookIE(InfoExtractor):
|
||||
'fb_dtsg': self._search_regex(
|
||||
r'"DTSGInitialData"\s*,\s*\[\]\s*,\s*{\s*"token"\s*:\s*"([^"]+)"',
|
||||
webpage, 'dtsg token', default=''),
|
||||
}),
|
||||
headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
})
|
||||
tahoe_request_headers = {
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
}
|
||||
|
||||
tahoe_primary_data = self._download_webpage(
|
||||
self._VIDEO_PAGE_TAHOE_TEMPLATE % (video_id, 'primary'), video_id,
|
||||
data=tahoe_request_data,
|
||||
headers=tahoe_request_headers
|
||||
)
|
||||
|
||||
tahoe_secondary_data = self._download_webpage(
|
||||
self._VIDEO_PAGE_TAHOE_TEMPLATE % (video_id, 'secondary'), video_id,
|
||||
data=tahoe_request_data,
|
||||
headers=tahoe_request_headers
|
||||
)
|
||||
|
||||
tahoe_js_data = self._parse_json(
|
||||
self._search_regex(
|
||||
r'for\s+\(\s*;\s*;\s*\)\s*;(.+)', tahoe_data,
|
||||
r'for\s+\(\s*;\s*;\s*\)\s*;(.+)', tahoe_primary_data,
|
||||
'tahoe js data', default='{}'),
|
||||
video_id, fatal=False)
|
||||
|
||||
video_data = extract_from_jsmods_instances(tahoe_js_data)
|
||||
|
||||
if not video_data:
|
||||
@ -427,7 +445,10 @@ class FacebookIE(InfoExtractor):
|
||||
fatal=False) or self._og_search_title(webpage, fatal=False)
|
||||
timestamp = int_or_none(self._search_regex(
|
||||
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
|
||||
'timestamp', default=None) or self._search_regex(
|
||||
r'data-utime=\\\"(\d+)\\\"', tahoe_secondary_data,
|
||||
'timestamp', default=None))
|
||||
|
||||
thumbnail = self._og_search_thumbnail(webpage)
|
||||
|
||||
view_count = parse_count(self._search_regex(
|
||||
|
@ -1,11 +1,11 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
# import json
|
||||
# import uuid
|
||||
|
||||
from .adobepass import AdobePassIE
|
||||
from .uplynk import UplynkPreplayIE
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
HEADRequest,
|
||||
int_or_none,
|
||||
parse_age_limit,
|
||||
parse_duration,
|
||||
@ -16,7 +16,7 @@ from ..utils import (
|
||||
|
||||
|
||||
class FOXIE(AdobePassIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[\da-fA-F]+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?(?:fox\.com|nationalgeographic\.com/tv)/watch/(?P<id>[\da-fA-F]+)'
|
||||
_TESTS = [{
|
||||
# clip
|
||||
'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/',
|
||||
@ -43,41 +43,47 @@ class FOXIE(AdobePassIE):
|
||||
# episode, geo-restricted, tv provided required
|
||||
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.nationalgeographic.com/tv/watch/f690e05ebbe23ab79747becd0cc223d1/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
# _access_token = None
|
||||
|
||||
# def _call_api(self, path, video_id, data=None):
|
||||
# headers = {
|
||||
# 'X-Api-Key': '238bb0a0c2aba67922c48709ce0c06fd',
|
||||
# }
|
||||
# if self._access_token:
|
||||
# headers['Authorization'] = 'Bearer ' + self._access_token
|
||||
# return self._download_json(
|
||||
# 'https://api2.fox.com/v2.0/' + path, video_id, data=data, headers=headers)
|
||||
|
||||
# def _real_initialize(self):
|
||||
# self._access_token = self._call_api(
|
||||
# 'login', None, json.dumps({
|
||||
# 'deviceId': compat_str(uuid.uuid4()),
|
||||
# }).encode())['accessToken']
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
video = self._download_json(
|
||||
'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id,
|
||||
'https://api.fox.com/fbc-content/v1_5/video/%s' % video_id,
|
||||
video_id, headers={
|
||||
'apikey': 'abdcbed02c124d393b39e818a4312055',
|
||||
'Content-Type': 'application/json',
|
||||
'Referer': url,
|
||||
})
|
||||
# video = self._call_api('vodplayer/' + video_id, video_id)
|
||||
|
||||
title = video['name']
|
||||
release_url = video['videoRelease']['url']
|
||||
|
||||
description = video.get('description')
|
||||
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
|
||||
video.get('duration')) or parse_duration(video.get('duration'))
|
||||
timestamp = unified_timestamp(video.get('datePublished'))
|
||||
rating = video.get('contentRating')
|
||||
age_limit = parse_age_limit(rating)
|
||||
# release_url = video['url']
|
||||
|
||||
data = try_get(
|
||||
video, lambda x: x['trackingData']['properties'], dict) or {}
|
||||
|
||||
creator = data.get('brand') or data.get('network') or video.get('network')
|
||||
|
||||
series = video.get('seriesName') or data.get(
|
||||
'seriesName') or data.get('show')
|
||||
season_number = int_or_none(video.get('seasonNumber'))
|
||||
episode = video.get('name')
|
||||
episode_number = int_or_none(video.get('episodeNumber'))
|
||||
release_year = int_or_none(video.get('releaseYear'))
|
||||
|
||||
rating = video.get('contentRating')
|
||||
if data.get('authRequired'):
|
||||
resource = self._get_mvpd_resource(
|
||||
'fbc-fox', title, video.get('guid'), rating)
|
||||
@ -86,6 +92,18 @@ class FOXIE(AdobePassIE):
|
||||
'auth': self._extract_mvpd_auth(
|
||||
url, video_id, 'fbc-fox', resource)
|
||||
})
|
||||
m3u8_url = self._download_json(release_url, video_id)['playURL']
|
||||
formats = self._extract_m3u8_formats(
|
||||
m3u8_url, video_id, 'mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
|
||||
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
|
||||
video.get('duration')) or parse_duration(video.get('duration'))
|
||||
timestamp = unified_timestamp(video.get('datePublished'))
|
||||
creator = data.get('brand') or data.get('network') or video.get('network')
|
||||
series = video.get('seriesName') or data.get(
|
||||
'seriesName') or data.get('show')
|
||||
|
||||
subtitles = {}
|
||||
for doc_rel in video.get('documentReleases', []):
|
||||
@ -98,36 +116,19 @@ class FOXIE(AdobePassIE):
|
||||
}]
|
||||
break
|
||||
|
||||
info = {
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'formats': formats,
|
||||
'description': video.get('description'),
|
||||
'duration': duration,
|
||||
'timestamp': timestamp,
|
||||
'age_limit': age_limit,
|
||||
'age_limit': parse_age_limit(rating),
|
||||
'creator': creator,
|
||||
'series': series,
|
||||
'season_number': season_number,
|
||||
'episode': episode,
|
||||
'episode_number': episode_number,
|
||||
'release_year': release_year,
|
||||
'season_number': int_or_none(video.get('seasonNumber')),
|
||||
'episode': video.get('name'),
|
||||
'episode_number': int_or_none(video.get('episodeNumber')),
|
||||
'release_year': int_or_none(video.get('releaseYear')),
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
urlh = self._request_webpage(HEADRequest(release_url), video_id)
|
||||
video_url = compat_str(urlh.geturl())
|
||||
|
||||
if UplynkPreplayIE.suitable(video_url):
|
||||
info.update({
|
||||
'_type': 'url_transparent',
|
||||
'url': video_url,
|
||||
'ie_key': UplynkPreplayIE.ie_key(),
|
||||
})
|
||||
else:
|
||||
m3u8_url = self._download_json(release_url, video_id)['playURL']
|
||||
formats = self._extract_m3u8_formats(
|
||||
m3u8_url, video_id, 'mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
info['formats'] = formats
|
||||
return info
|
||||
|
@ -1,6 +1,7 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .youtube import YoutubeIE
|
||||
|
||||
|
||||
class FreespeechIE(InfoExtractor):
|
||||
@ -27,8 +28,4 @@ class FreespeechIE(InfoExtractor):
|
||||
r'data-video-url="([^"]+)"',
|
||||
webpage, 'youtube url')
|
||||
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': youtube_url,
|
||||
'ie_key': 'Youtube',
|
||||
}
|
||||
return self.url_result(youtube_url, YoutubeIE.ie_key())
|
||||
|
98
youtube_dl/extractor/gaia.py
Normal file
@ -0,0 +1,98 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
str_or_none,
|
||||
strip_or_none,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
class GaiaIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?gaia\.com/video/(?P<id>[^/?]+).*?\bfullplayer=(?P<type>feature|preview)'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=feature',
|
||||
'info_dict': {
|
||||
'id': '89356',
|
||||
'ext': 'mp4',
|
||||
'title': 'Connecting with Universal Consciousness',
|
||||
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
|
||||
'upload_date': '20151116',
|
||||
'timestamp': 1447707266,
|
||||
'duration': 936,
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://www.gaia.com/video/connecting-universal-consciousness?fullplayer=preview',
|
||||
'info_dict': {
|
||||
'id': '89351',
|
||||
'ext': 'mp4',
|
||||
'title': 'Connecting with Universal Consciousness',
|
||||
'description': 'md5:844e209ad31b7d31345f5ed689e3df6f',
|
||||
'upload_date': '20151116',
|
||||
'timestamp': 1447707266,
|
||||
'duration': 53,
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id, vtype = re.search(self._VALID_URL, url).groups()
|
||||
node_id = self._download_json(
|
||||
'https://brooklyn.gaia.com/pathinfo', display_id, query={
|
||||
'path': 'video/' + display_id,
|
||||
})['id']
|
||||
node = self._download_json(
|
||||
'https://brooklyn.gaia.com/node/%d' % node_id, node_id)
|
||||
vdata = node[vtype]
|
||||
media_id = compat_str(vdata['nid'])
|
||||
title = node['title']
|
||||
|
||||
media = self._download_json(
|
||||
'https://brooklyn.gaia.com/media/' + media_id, media_id)
|
||||
formats = self._extract_m3u8_formats(
|
||||
media['mediaUrls']['bcHLS'], media_id, 'mp4')
|
||||
self._sort_formats(formats)
|
||||
|
||||
subtitles = {}
|
||||
text_tracks = media.get('textTracks', {})
|
||||
for key in ('captions', 'subtitles'):
|
||||
for lang, sub_url in text_tracks.get(key, {}).items():
|
||||
subtitles.setdefault(lang, []).append({
|
||||
'url': sub_url,
|
||||
})
|
||||
|
||||
fivestar = node.get('fivestar', {})
|
||||
fields = node.get('fields', {})
|
||||
|
||||
def get_field_value(key, value_key='value'):
|
||||
return try_get(fields, lambda x: x[key][0][value_key])
|
||||
|
||||
return {
|
||||
'id': media_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
'description': strip_or_none(get_field_value('body') or get_field_value('teaser')),
|
||||
'timestamp': int_or_none(node.get('created')),
|
||||
'subtitles': subtitles,
|
||||
'duration': int_or_none(vdata.get('duration')),
|
||||
'like_count': int_or_none(try_get(fivestar, lambda x: x['up_count']['value'])),
|
||||
'dislike_count': int_or_none(try_get(fivestar, lambda x: x['down_count']['value'])),
|
||||
'comment_count': int_or_none(node.get('comment_count')),
|
||||
'series': try_get(node, lambda x: x['series']['title'], compat_str),
|
||||
'season_number': int_or_none(get_field_value('season')),
|
||||
'season_id': str_or_none(get_field_value('series_nid', 'nid')),
|
||||
'episode_number': int_or_none(get_field_value('episode')),
|
||||
}
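The subtitle handling in the new Gaia extractor above merges the `captions` and `subtitles` maps into one dictionary keyed by language. A minimal standalone sketch of that grouping follows. The helper name `collect_subtitles` and the sample URLs are hypothetical and are not part of the commit; the sketch also guards against a key being present but `None`, which the extractor code above does not do.

```python
def collect_subtitles(text_tracks):
    """Group subtitle URLs by language across the 'captions' and 'subtitles' keys."""
    subtitles = {}
    for key in ('captions', 'subtitles'):
        # `or {}` is an added guard for a key that is present but None
        for lang, sub_url in (text_tracks.get(key) or {}).items():
            # setdefault creates the per-language list on first use
            subtitles.setdefault(lang, []).append({'url': sub_url})
    return subtitles


# Hypothetical sample data shaped like the `textTracks` object read above
sample = {
    'captions': {'en': 'https://example.com/en-cc.vtt'},
    'subtitles': {
        'en': 'https://example.com/en.vtt',
        'es': 'https://example.com/es.vtt',
    },
}
result = collect_subtitles(sample)
```

Using `setdefault` keeps both English tracks under the same `en` entry instead of letting the second overwrite the first.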
|
@ -2197,10 +2197,7 @@ class GenericIE(InfoExtractor):
|
||||
|
||||
def _real_extract(self, url):
|
||||
if url.startswith('//'):
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': self.http_scheme() + url,
|
||||
}
|
||||
return self.url_result(self.http_scheme() + url)
|
||||
|
||||
parsed_url = compat_urlparse.urlparse(url)
|
||||
if not parsed_url.scheme:
|
||||
|
@ -72,7 +72,7 @@ class GloboIE(InfoExtractor):
|
||||
return
|
||||
|
||||
try:
|
||||
self._download_json(
|
||||
glb_id = (self._download_json(
|
||||
'https://login.globo.com/api/authentication', None, data=json.dumps({
|
||||
'payload': {
|
||||
'email': email,
|
||||
@ -81,7 +81,9 @@ class GloboIE(InfoExtractor):
|
||||
},
|
||||
}).encode(), headers={
|
||||
'Content-Type': 'application/json; charset=utf-8',
|
||||
})
|
||||
}) or {}).get('glbId')
|
||||
if glb_id:
|
||||
self._set_cookie('.globo.com', 'GLBID', glb_id)
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
|
||||
resp = self._parse_json(e.cause.read(), None)
|
||||
|
117
youtube_dl/extractor/hungama.py
Normal file
@ -0,0 +1,117 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
|
||||
class HungamaIE(InfoExtractor):
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:www\.)?hungama\.com/
|
||||
(?:
|
||||
(?:video|movie)/[^/]+/|
|
||||
tv-show/(?:[^/]+/){2}\d+/episode/[^/]+/
|
||||
)
|
||||
(?P<id>\d+)
|
||||
'''
|
||||
_TESTS = [{
|
||||
'url': 'http://www.hungama.com/video/krishna-chants/39349649/',
|
||||
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
|
||||
'info_dict': {
|
||||
'id': '2931166',
|
||||
'ext': 'mp4',
|
||||
'title': 'Lucky Ali - Kitni Haseen Zindagi',
|
||||
'track': 'Kitni Haseen Zindagi',
|
||||
'artist': 'Lucky Ali',
|
||||
'album': 'Aks',
|
||||
'release_year': 2000,
|
||||
}
|
||||
}, {
|
||||
'url': 'https://www.hungama.com/movie/kahaani-2/44129919/',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.hungama.com/tv-show/padded-ki-pushup/season-1/44139461/episode/ep-02-training-sasu-pathlaag-karing/44139503/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
info = self._search_json_ld(webpage, video_id)
|
||||
|
||||
m3u8_url = self._download_json(
|
||||
'https://www.hungama.com/index.php', video_id,
|
||||
data=urlencode_postdata({'content_id': video_id}), headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
|
||||
'X-Requested-With': 'XMLHttpRequest',
|
||||
}, query={
|
||||
'c': 'common',
|
||||
'm': 'get_video_mdn_url',
|
||||
})['stream_url']
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
|
||||
info.update({
|
||||
'id': video_id,
|
||||
'formats': formats,
|
||||
})
|
||||
return info
|
||||
|
||||
|
||||
class HungamaSongIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?hungama\.com/song/[^/]+/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
'url': 'https://www.hungama.com/song/kitni-haseen-zindagi/2931166/',
|
||||
'md5': 'a845a6d1ebd08d80c1035126d49bd6a0',
|
||||
'info_dict': {
|
||||
'id': '2931166',
|
||||
'ext': 'mp4',
|
||||
'title': 'Lucky Ali - Kitni Haseen Zindagi',
|
||||
'track': 'Kitni Haseen Zindagi',
|
||||
'artist': 'Lucky Ali',
|
||||
'album': 'Aks',
|
||||
'release_year': 2000,
|
||||
}
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
audio_id = self._match_id(url)
|
||||
|
||||
data = self._download_json(
|
||||
'https://www.hungama.com/audio-player-data/track/%s' % audio_id,
|
||||
audio_id, query={'_country': 'IN'})[0]
|
||||
|
||||
track = data['song_name']
|
||||
artist = data.get('singer_name')
|
||||
|
||||
m3u8_url = self._download_json(
|
||||
data.get('file') or data['preview_link'],
|
||||
audio_id)['response']['media_url']
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
m3u8_url, audio_id, ext='mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = '%s - %s' % (artist, track) if artist else track
|
||||
thumbnail = data.get('img_src') or data.get('album_image')
|
||||
|
||||
return {
|
||||
'id': audio_id,
|
||||
'title': title,
|
||||
'thumbnail': thumbnail,
|
||||
'track': track,
|
||||
'artist': artist,
|
||||
'album': data.get('album_name'),
|
||||
'release_year': int_or_none(data.get('date')),
|
||||
'formats': formats,
|
||||
}
|
@ -7,8 +7,8 @@ from .common import InfoExtractor
|
||||
|
||||
|
||||
class JWPlatformIE(InfoExtractor):
|
||||
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
|
||||
_TEST = {
|
||||
_VALID_URL = r'(?:https?://(?:content\.jwplatform|cdn\.jwplayer)\.com/(?:(?:feed|player|thumb|preview|video|manifest)s|jw6|v2/media)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
|
||||
_TESTS = [{
|
||||
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
|
||||
'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
|
||||
'info_dict': {
|
||||
@ -19,7 +19,10 @@ class JWPlatformIE(InfoExtractor):
|
||||
'upload_date': '20081127',
|
||||
'timestamp': 1227796140,
|
||||
}
|
||||
}
|
||||
}, {
|
||||
'url': 'https://cdn.jwplayer.com/players/nPripu9l-ALJ3XQCI.js',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_url(webpage):
|
||||
@ -34,5 +37,5 @@ class JWPlatformIE(InfoExtractor):
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id)
|
||||
json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
|
||||
return self._parse_jwplayer_data(json_data, video_id)
|
||||
|
@ -363,7 +363,4 @@ class LivestreamShortenerIE(InfoExtractor):
|
||||
id = mobj.group('id')
|
||||
webpage = self._download_webpage(url, id)
|
||||
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': self._og_search_url(webpage),
|
||||
}
|
||||
return self.url_result(self._og_search_url(webpage))
|
||||
|
@ -2,12 +2,18 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import int_or_none
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
int_or_none,
|
||||
str_to_int,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
|
||||
class ManyVidsIE(InfoExtractor):
|
||||
_VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
_TESTS = [{
|
||||
# preview video
|
||||
'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/',
|
||||
'md5': '03f11bb21c52dd12a05be21a5c7dcc97',
|
||||
'info_dict': {
|
||||
@ -17,7 +23,18 @@ class ManyVidsIE(InfoExtractor):
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
},
|
||||
}
|
||||
}, {
|
||||
# full video
|
||||
'url': 'https://www.manyvids.com/Video/935718/MY-FACE-REVEAL/',
|
||||
'md5': 'f3e8f7086409e9b470e2643edb96bdcc',
|
||||
'info_dict': {
|
||||
'id': '935718',
|
||||
'ext': 'mp4',
|
||||
'title': 'MY FACE REVEAL',
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
@ -28,12 +45,41 @@ class ManyVidsIE(InfoExtractor):
|
||||
r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1',
|
||||
webpage, 'video URL', group='url')
|
||||
|
||||
title = '%s (Preview)' % self._html_search_regex(
|
||||
r'<h2[^>]+class="m-a-0"[^>]*>([^<]+)', webpage, 'title')
|
||||
title = self._html_search_regex(
|
||||
(r'<span[^>]+class=["\']item-title[^>]+>([^<]+)',
|
||||
r'<h2[^>]+class=["\']h2 m-0["\'][^>]*>([^<]+)'),
|
||||
webpage, 'title', default=None) or self._html_search_meta(
|
||||
'twitter:title', webpage, 'title', fatal=True)
|
||||
|
||||
if any(p in webpage for p in ('preview_videos', '_preview.mp4')):
|
||||
title += ' (Preview)'
|
||||
|
||||
mv_token = self._search_regex(
|
||||
r'data-mvtoken=(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
|
||||
'mv token', default=None, group='value')
|
||||
|
||||
if mv_token:
|
||||
# Sets some cookies
|
||||
self._download_webpage(
|
||||
'https://www.manyvids.com/includes/ajax_repository/you_had_me_at_hello.php',
|
||||
video_id, fatal=False, data=urlencode_postdata({
|
||||
'mvtoken': mv_token,
|
||||
'vid': video_id,
|
||||
}), headers={
|
||||
'Referer': url,
|
||||
'X-Requested-With': 'XMLHttpRequest'
|
||||
})
|
||||
|
||||
if determine_ext(video_url) == 'm3u8':
|
||||
formats = self._extract_m3u8_formats(
|
||||
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls')
|
||||
else:
|
||||
formats = [{'url': video_url}]
|
||||
|
||||
like_count = int_or_none(self._search_regex(
|
||||
r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
|
||||
view_count = int_or_none(self._html_search_regex(
|
||||
view_count = str_to_int(self._html_search_regex(
|
||||
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
|
||||
'view count', default=None))
|
||||
|
||||
@ -42,7 +88,5 @@ class ManyVidsIE(InfoExtractor):
|
||||
'title': title,
|
||||
'view_count': view_count,
|
||||
'like_count': like_count,
|
||||
'formats': [{
|
||||
'url': video_url,
|
||||
}],
|
||||
'formats': formats,
|
||||
}
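The ManyVids change above branches on the extension of the video URL: an `m3u8` URL is expanded into HLS formats, anything else is used as a single direct format. The sketch below illustrates that branch outside youtube-dl. `guess_ext` is a simplified, hypothetical stand-in for the `determine_ext` utility (whose real behaviour may differ), `pick_strategy` is likewise illustrative, and the imports assume Python 3.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url):
    """Return the lowercase extension of the URL path, ignoring query and fragment."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lstrip('.').lower()


def pick_strategy(video_url):
    """Return 'hls' for playlist URLs and 'direct' for everything else."""
    if guess_ext(video_url) == 'm3u8':
        return 'hls'
    return 'direct'


hls = pick_strategy('https://example.com/a/master.m3u8?token=1')
direct = pick_strategy('https://example.com/v.mp4')
```

Parsing the URL before taking the extension means a query string such as `?token=1` does not hide the `.m3u8` suffix.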
|
||||
|
@ -1,15 +1,9 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .adobepass import AdobePassIE
|
||||
from .theplatform import ThePlatformIE
|
||||
from ..utils import (
|
||||
smuggle_url,
|
||||
url_basename,
|
||||
update_url_query,
|
||||
get_element_by_class,
|
||||
)
|
||||
|
||||
|
||||
@ -64,132 +58,3 @@ class NationalGeographicVideoIE(InfoExtractor):
|
||||
{'force_smil_url': True}),
|
||||
'id': guid,
|
||||
}
|
||||
|
||||
|
||||
class NationalGeographicIE(ThePlatformIE, AdobePassIE):
|
||||
IE_NAME = 'natgeo'
|
||||
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)'
|
||||
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/u/kdi9Ld0PN2molUUIMSBGxoeDhD729KRjQcnxtetilWPMevo8ZwUBIDuPR0Q3D2LVaTsk0MPRkRWDB8ZhqWVeyoxfsZZm36yRp1j-zPfsHEyI_EgAeFY/',
|
||||
'md5': '518c9aa655686cf81493af5cc21e2a04',
|
||||
'info_dict': {
|
||||
'id': 'vKInpacll2pC',
|
||||
'ext': 'mp4',
|
||||
'title': 'Uncovering a Universal Knowledge',
|
||||
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
|
||||
'timestamp': 1458680907,
|
||||
'upload_date': '20160322',
|
||||
'uploader': 'NEWA-FNG-NGTV',
|
||||
},
|
||||
'add_ie': ['ThePlatform'],
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/u/kdvOstqYaBY-vSBPyYgAZRUL4sWUJ5XUUPEhc7ISyBHqoIO4_dzfY3K6EjHIC0hmFXoQ7Cpzm6RkET7S3oMlm6CFnrQwSUwo/',
|
||||
'md5': 'c4912f656b4cbe58f3e000c489360989',
|
||||
'info_dict': {
|
||||
'id': 'Pok5lWCkiEFA',
|
||||
'ext': 'mp4',
|
||||
'title': 'The Stunning Red Bird of Paradise',
|
||||
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
|
||||
'timestamp': 1459362152,
|
||||
'upload_date': '20160330',
|
||||
'uploader': 'NEWA-FNG-NGTV',
|
||||
},
|
||||
'add_ie': ['ThePlatform'],
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
|
||||
'only_matching': True,
|
||||
}
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
release_url = self._search_regex(
|
||||
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
|
||||
webpage, 'release url')
|
||||
theplatform_path = self._search_regex(r'https?://link\.theplatform\.com/s/([^?]+)', release_url, 'theplatform path')
|
||||
video_id = theplatform_path.split('/')[-1]
|
||||
query = {
|
||||
'mbr': 'true',
|
||||
}
|
||||
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
|
||||
if is_auth == 'auth':
|
||||
auth_resource_id = self._search_regex(
|
||||
r"video_auth_resourceId\s*=\s*'([^']+)'",
|
||||
webpage, 'auth resource id')
|
||||
query['auth'] = self._extract_mvpd_auth(url, video_id, 'natgeo', auth_resource_id)
|
||||
|
||||
formats = []
|
||||
subtitles = {}
|
||||
for key, value in (('switch', 'http'), ('manifest', 'm3u')):
|
||||
tp_query = query.copy()
|
||||
tp_query.update({
|
||||
key: value,
|
||||
})
|
||||
tp_formats, tp_subtitles = self._extract_theplatform_smil(
|
||||
update_url_query(release_url, tp_query), video_id, 'Downloading %s SMIL data' % value)
|
||||
formats.extend(tp_formats)
|
||||
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
|
||||
self._sort_formats(formats)
|
||||
|
||||
info = self._extract_theplatform_metadata(theplatform_path, display_id)
|
||||
info.update({
|
||||
'id': video_id,
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
'display_id': display_id,
|
||||
})
|
||||
return info
|
||||
|
||||
|
||||
class NationalGeographicEpisodeGuideIE(InfoExtractor):
|
||||
IE_NAME = 'natgeo:episodeguide'
|
||||
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?(?P<id>[^/]+)/episode-guide'
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episode-guide/',
|
||||
'info_dict': {
|
||||
'id': 'the-story-of-god-with-morgan-freeman-season-1',
|
||||
'title': 'The Story of God with Morgan Freeman - Season 1',
|
||||
},
|
||||
'playlist_mincount': 6,
|
||||
},
|
||||
{
|
||||
'url': 'http://channel.nationalgeographic.com/underworld-inc/episode-guide/?s=2',
|
||||
'info_dict': {
|
||||
'id': 'underworld-inc-season-2',
|
||||
'title': 'Underworld, Inc. - Season 2',
|
||||
},
|
||||
'playlist_mincount': 7,
|
||||
},
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
show = get_element_by_class('show', webpage)
|
||||
selected_season = self._search_regex(
|
||||
r'<div[^>]+class="select-seasons[^"]*".*?<a[^>]*>(.*?)</a>',
|
||||
webpage, 'selected season')
|
||||
entries = [
|
||||
self.url_result(self._proto_relative_url(entry_url), 'NationalGeographic')
|
||||
for entry_url in re.findall('(?s)<div[^>]+class="col-inner"[^>]*?>.*?<a[^>]+href="([^"]+)"', webpage)]
|
||||
return self.playlist_result(
|
||||
entries, '%s-%s' % (display_id, selected_season.lower().replace(' ', '-')),
|
||||
'%s - %s' % (show, selected_season))
|
||||
|
@ -363,7 +363,7 @@ class NPOIE(NPOBaseIE):
|
||||
|
||||
class NPOLiveIE(NPOBaseIE):
|
||||
IE_NAME = 'npo.nl:live'
|
||||
_VALID_URL = r'https?://(?:www\.)?npo\.nl/live(?:/(?P<id>[^/?#&]+))?'
|
||||
_VALID_URL = r'https?://(?:www\.)?npo(?:start)?\.nl/live(?:/(?P<id>[^/?#&]+))?'
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://www.npo.nl/live/npo-1',
|
||||
@ -380,6 +380,9 @@ class NPOLiveIE(NPOBaseIE):
|
||||
}, {
|
||||
'url': 'http://www.npo.nl/live',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.npostart.nl/live/npo-1',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
|
28
youtube_dl/extractor/outsidetv.py
Normal file
@ -0,0 +1,28 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
|
||||
|
||||
class OutsideTVIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?outsidetv\.com/(?:[^/]+/)*?play/[a-zA-Z0-9]{8}/\d+/\d+/(?P<id>[a-zA-Z0-9]{8})'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.outsidetv.com/category/snow/play/ZjQYboH6/1/10/Hdg0jukV/4',
|
||||
'md5': '192d968fedc10b2f70ec31865ffba0da',
|
||||
'info_dict': {
|
||||
'id': 'Hdg0jukV',
|
||||
'ext': 'mp4',
|
||||
'title': 'Home - Jackson Ep 1 | Arbor Snowboards',
|
||||
'description': 'md5:41a12e94f3db3ca253b04bb1e8d8f4cd',
|
||||
'upload_date': '20181225',
|
||||
'timestamp': 1545742800,
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.outsidetv.com/home/play/ZjQYboH6/1/10/Hdg0jukV/4',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
jw_media_id = self._match_id(url)
|
||||
return self.url_result(
|
||||
'jwplatform:' + jw_media_id, 'JWPlatform', jw_media_id)
|
@ -24,9 +24,9 @@ class PacktPubBaseIE(InfoExtractor):
|
||||
|
||||
|
||||
class PacktPubIE(PacktPubBaseIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
|
||||
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
|
||||
|
||||
_TEST = {
|
||||
_TESTS = [{
|
||||
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
|
||||
'md5': '1e74bd6cfd45d7d07666f4684ef58f70',
|
||||
'info_dict': {
|
||||
@ -37,7 +37,10 @@ class PacktPubIE(PacktPubBaseIE):
|
||||
'timestamp': 1490918400,
|
||||
'upload_date': '20170331',
|
||||
},
|
||||
}
|
||||
}, {
|
||||
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
|
||||
'only_matching': True,
|
||||
}]
|
||||
_NETRC_MACHINE = 'packtpub'
|
||||
_TOKEN = None
|
||||
|
||||
@ -110,15 +113,18 @@ class PacktPubIE(PacktPubBaseIE):
|
||||
|
||||
|
||||
class PacktPubCourseIE(PacktPubBaseIE):
|
||||
_VALID_URL = r'(?P<url>https?://(?:www\.)?packtpub\.com/mapt/video/[^/]+/(?P<id>\d+))'
|
||||
_TEST = {
|
||||
_VALID_URL = r'(?P<url>https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<id>\d+))'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215',
|
||||
'info_dict': {
|
||||
'id': '9781787122215',
|
||||
'title': 'Learn Nodejs by building 12 projects [Video]',
|
||||
},
|
||||
'playlist_count': 90,
|
||||
}
|
||||
}, {
|
||||
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
|
109
youtube_dl/extractor/playplustv.py
Normal file
@ -0,0 +1,109 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import json
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_HTTPError
|
||||
from ..utils import (
|
||||
clean_html,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
PUTRequest,
|
||||
)
|
||||
|
||||
|
||||
class PlayPlusTVIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?playplus\.(?:com|tv)/VOD/(?P<project_id>[0-9]+)/(?P<id>[0-9a-f]{32})'
|
||||
_TEST = {
|
||||
'url': 'https://www.playplus.tv/VOD/7572/db8d274a5163424e967f35a30ddafb8e',
|
||||
'md5': 'd078cb89d7ab6b9df37ce23c647aef72',
|
||||
'info_dict': {
|
||||
'id': 'db8d274a5163424e967f35a30ddafb8e',
|
||||
'ext': 'mp4',
|
||||
'title': 'Capítulo 179 - Final',
|
||||
'description': 'md5:01085d62d8033a1e34121d3c3cabc838',
|
||||
'timestamp': 1529992740,
|
||||
'upload_date': '20180626',
|
||||
},
|
||||
'skip': 'Requires account credential',
|
||||
}
|
||||
_NETRC_MACHINE = 'playplustv'
|
||||
_GEO_COUNTRIES = ['BR']
|
||||
_token = None
|
||||
_profile = None
|
||||
|
||||
def _call_api(self, resource, video_id=None, query=None):
|
||||
return self._download_json('https://api.playplus.tv/api/media/v2/get' + resource, video_id, headers={
|
||||
'Authorization': 'Bearer ' + self._token,
|
||||
}, query=query)
|
||||
|
||||
def _real_initialize(self):
|
||||
email, password = self._get_login_info()
|
||||
if email is None:
|
||||
self.raise_login_required()
|
||||
|
||||
req = PUTRequest(
|
||||
'https://api.playplus.tv/api/web/login', json.dumps({
|
||||
'email': email,
|
||||
'password': password,
|
||||
}).encode(), {
|
||||
'Content-Type': 'application/json; charset=utf-8',
|
||||
})
|
||||
|
||||
try:
|
||||
self._token = self._download_json(req, None)['token']
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
|
||||
raise ExtractorError(self._parse_json(
|
||||
e.cause.read(), None)['errorMessage'], expected=True)
|
||||
raise
|
||||
|
||||
self._profile = self._call_api('Profiles')['list'][0]['_id']
|
||||
|
||||
def _real_extract(self, url):
|
||||
project_id, media_id = re.match(self._VALID_URL, url).groups()
|
||||
media = self._call_api(
|
||||
'Media', media_id, {
|
||||
'profileId': self._profile,
|
||||
'projectId': project_id,
|
||||
'mediaId': media_id,
|
||||
})['obj']
|
||||
title = media['title']
|
||||
|
||||
formats = []
|
||||
for f in media.get('files', []):
|
||||
f_url = f.get('url')
|
||||
if not f_url:
|
||||
continue
|
||||
file_info = f.get('fileInfo') or {}
|
||||
formats.append({
|
||||
'url': f_url,
|
||||
'width': int_or_none(file_info.get('width')),
|
||||
'height': int_or_none(file_info.get('height')),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
thumbnails = []
|
||||
for thumb in media.get('thumbs', []):
|
||||
thumb_url = thumb.get('url')
|
||||
if not thumb_url:
|
||||
continue
|
||||
thumbnails.append({
|
||||
'url': thumb_url,
|
||||
'width': int_or_none(thumb.get('width')),
|
||||
'height': int_or_none(thumb.get('height')),
|
||||
})
|
||||
|
||||
return {
|
||||
'id': media_id,
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
'thumbnails': thumbnails,
|
||||
'description': clean_html(media.get('description')) or media.get('shortDescription'),
|
||||
'timestamp': int_or_none(media.get('publishDate'), 1000),
|
||||
'view_count': int_or_none(media.get('numberOfViews')),
|
||||
'comment_count': int_or_none(media.get('numberOfComments')),
|
||||
'tags': media.get('tags'),
|
||||
}
|
@ -1,38 +1,46 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .brightcove import BrightcoveLegacyIE
|
||||
from ..compat import (
|
||||
compat_parse_qs,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import smuggle_url
|
||||
|
||||
|
||||
class RMCDecouverteIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/mediaplayer-replay.*?\bid=(?P<id>\d+)'
|
||||
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/(?:(?:[^/]+/)*program_(?P<id>\d+)|(?P<live_id>mediaplayer-direct))'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://rmcdecouverte.bfmtv.com/mediaplayer-replay/?id=13502&title=AQUAMEN:LES%20ROIS%20DES%20AQUARIUMS%20:UN%20DELICIEUX%20PROJET',
|
||||
_TESTS = [{
|
||||
'url': 'https://rmcdecouverte.bfmtv.com/wheeler-dealers-occasions-a-saisir/program_2566/',
|
||||
'info_dict': {
|
||||
'id': '5419055995001',
|
||||
'id': '5983675500001',
|
||||
'ext': 'mp4',
|
||||
'title': 'UN DELICIEUX PROJET',
|
||||
'description': 'md5:63610df7c8b1fc1698acd4d0d90ba8b5',
|
||||
'title': 'CORVETTE',
|
||||
'description': 'md5:c1e8295521e45ffebf635d6a7658f506',
|
||||
'uploader_id': '1969646226001',
|
||||
'upload_date': '20170502',
|
||||
'timestamp': 1493745308,
|
||||
'upload_date': '20181226',
|
||||
'timestamp': 1545861635,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'only available for a week',
|
||||
}
|
||||
}, {
|
||||
# live, geo restricted, bypassable
|
||||
'url': 'https://rmcdecouverte.bfmtv.com/mediaplayer-direct/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s'
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
display_id = mobj.group('id') or mobj.group('live_id')
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
|
||||
if brightcove_legacy_url:
|
||||
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
|
||||
@ -41,5 +49,7 @@ class RMCDecouverteIE(InfoExtractor):
|
||||
brightcove_id = self._search_regex(
|
||||
r'data-video-id=["\'](\d+)', webpage, 'brightcove id')
|
||||
return self.url_result(
|
||||
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
|
||||
brightcove_id)
|
||||
smuggle_url(
|
||||
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
|
||||
{'geo_countries': ['FR']}),
|
||||
'BrightcoveNew', brightcove_id)
|
||||
|
@ -30,8 +30,5 @@ class SaveFromIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = os.path.splitext(url.split('/')[-1])[0]
|
||||
return {
|
||||
'_type': 'url',
|
||||
'id': video_id,
|
||||
'url': mobj.group('url'),
|
||||
}
|
||||
|
||||
return self.url_result(mobj.group('url'), video_id=video_id)
|
||||
|
@ -203,10 +203,8 @@ class TEDIE(InfoExtractor):
|
||||
ext_url = None
|
||||
if service.lower() == 'youtube':
|
||||
ext_url = external.get('code')
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': ext_url or external['uri'],
|
||||
}
|
||||
|
||||
return self.url_result(ext_url or external['uri'])
|
||||
|
||||
resources_ = player_talk.get('resources') or talk_info.get('resources')
|
||||
|
||||
|
@ -61,8 +61,4 @@ class TestURLIE(InfoExtractor):
|
||||
|
||||
self.to_screen('Test URL: %s' % tc['url'])
|
||||
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': tc['url'],
|
||||
'id': video_id,
|
||||
}
|
||||
return self.url_result(tc['url'], video_id=video_id)
|
||||
|
@ -10,8 +10,9 @@ from ..utils import (
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
parse_duration,
|
||||
try_get,
|
||||
str_or_none,
|
||||
update_url_query,
|
||||
urljoin,
|
||||
)
|
||||
|
||||
|
||||
@ -24,8 +25,7 @@ class TVNowBaseIE(InfoExtractor):
|
||||
|
||||
def _call_api(self, path, video_id, query):
|
||||
return self._download_json(
|
||||
'https://api.tvnow.de/v3/' + path,
|
||||
video_id, query=query)
|
||||
'https://api.tvnow.de/v3/' + path, video_id, query=query)
|
||||
|
||||
def _extract_video(self, info, display_id):
|
||||
video_id = compat_str(info['id'])
|
||||
@ -108,6 +108,11 @@ class TVNowIE(TVNowBaseIE):
|
||||
(?!(?:list|jahr)(?:/|$))(?P<id>[^/?\#&]+)
|
||||
'''
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url) or TVNowShowIE.suitable(url)
|
||||
else super(TVNowIE, cls).suitable(url))
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/der-neue-porsche-911-gt-3/player',
|
||||
'info_dict': {
|
||||
@ -116,7 +121,6 @@ class TVNowIE(TVNowBaseIE):
|
||||
'ext': 'mp4',
|
||||
'title': 'Der neue Porsche 911 GT 3',
|
||||
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'timestamp': 1495994400,
|
||||
'upload_date': '20170528',
|
||||
'duration': 5283,
|
||||
@ -161,136 +165,314 @@ class TVNowIE(TVNowBaseIE):
|
||||
info = self._call_api(
|
||||
'movies/' + display_id, display_id, query={
|
||||
'fields': ','.join(self._VIDEO_FIELDS),
|
||||
'station': mobj.group(1),
|
||||
})
|
||||
|
||||
return self._extract_video(info, display_id)
|
||||
|
||||
|
||||
class TVNowListBaseIE(TVNowBaseIE):
|
||||
_SHOW_VALID_URL = r'''(?x)
|
||||
(?P<base_url>
|
||||
https?://
|
||||
(?:www\.)?tvnow\.(?:de|at|ch)/[^/]+/
|
||||
(?P<show_id>[^/]+)
|
||||
)
|
||||
class TVNowNewIE(InfoExtractor):
|
||||
_VALID_URL = r'''(?x)
|
||||
(?P<base_url>https?://
|
||||
(?:www\.)?tvnow\.(?:de|at|ch)/
|
||||
(?:shows|serien))/
|
||||
(?P<show>[^/]+)-\d+/
|
||||
[^/]+/
|
||||
episode-\d+-(?P<episode>[^/?$&]+)-(?P<id>\d+)
|
||||
'''
|
||||
|
||||
def _extract_list_info(self, display_id, show_id):
|
||||
fields = list(self._SHOW_FIELDS)
|
||||
fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
|
||||
fields.extend(
|
||||
'formatTabs.formatTabPages.container.movies.%s' % field
|
||||
for field in self._VIDEO_FIELDS)
|
||||
return self._call_api(
|
||||
'formats/seo', display_id, query={
|
||||
'fields': ','.join(fields),
|
||||
'name': show_id + '.php'
|
||||
})
|
||||
|
||||
|
||||
class TVNowListIE(TVNowListBaseIE):
|
||||
_VALID_URL = r'%s/(?:list|jahr)/(?P<id>[^?\#&]+)' % TVNowListBaseIE._SHOW_VALID_URL
|
||||
|
||||
_SHOW_FIELDS = ('title', )
|
||||
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
|
||||
_VIDEO_FIELDS = ('id', 'headline', 'seoUrl', )
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'https://www.tvnow.de/rtl/30-minuten-deutschland/list/aktuell',
|
||||
'info_dict': {
|
||||
'id': '28296',
|
||||
'title': '30 Minuten Deutschland - Aktuell',
|
||||
},
|
||||
'playlist_mincount': 1,
|
||||
}, {
|
||||
'url': 'https://www.tvnow.de/vox/ab-ins-beet/list/staffel-14',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/2018/3',
|
||||
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return (False if TVNowIE.suitable(url)
|
||||
else super(TVNowListIE, cls).suitable(url))
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
base_url = re.sub(r'(?:shows|serien)', '_', mobj.group('base_url'))
|
||||
show, episode = mobj.group('show', 'episode')
|
||||
return self.url_result(
|
||||
# Rewrite new URLs to the old format and use extraction via old API
|
||||
# at api.tvnow.de as a loophole for bypassing premium content checks
|
||||
'%s/%s/%s' % (base_url, show, episode),
|
||||
ie=TVNowIE.ie_key(), video_id=mobj.group('id'))
|
||||
|
||||
|
||||
class TVNowNewBaseIE(InfoExtractor):
|
||||
def _call_api(self, path, video_id, query={}):
|
||||
result = self._download_json(
|
||||
'https://apigw.tvnow.de/module/' + path, video_id, query=query)
|
||||
error = result.get('error')
|
||||
if error:
|
||||
raise ExtractorError(
|
||||
'%s said: %s' % (self.IE_NAME, error), expected=True)
|
||||
return result
|
||||
|
||||
|
||||
"""
|
||||
TODO: new apigw.tvnow.de based version of TVNowIE. Replace old TVNowIE with it
|
||||
when api.tvnow.de is shut down. This version can't bypass premium checks though.
|
||||
class TVNowIE(TVNowNewBaseIE):
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:www\.)?tvnow\.(?:de|at|ch)/
|
||||
(?:shows|serien)/[^/]+/
|
||||
(?:[^/]+/)+
|
||||
(?P<display_id>[^/?$&]+)-(?P<id>\d+)
|
||||
'''
|
||||
|
||||
_TESTS = [{
|
||||
# episode with annual navigation
|
||||
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
|
||||
'info_dict': {
|
||||
'id': '331082',
|
||||
'display_id': 'grip-das-motormagazin/der-neue-porsche-911-gt-3',
|
||||
'ext': 'mp4',
|
||||
'title': 'Der neue Porsche 911 GT 3',
|
||||
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'timestamp': 1495994400,
|
||||
'upload_date': '20170528',
|
||||
'duration': 5283,
|
||||
'series': 'GRIP - Das Motormagazin',
|
||||
'season_number': 14,
|
||||
'episode_number': 405,
|
||||
'episode': 'Der neue Porsche 911 GT 3',
|
||||
},
|
||||
}, {
|
||||
# rtl2, episode with season navigation
|
||||
'url': 'https://www.tvnow.de/shows/armes-deutschland-11471/staffel-3/episode-14-bernd-steht-seit-der-trennung-von-seiner-frau-allein-da-526124',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# rtlnitro
|
||||
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13/episode-5-auf-eigene-faust-pilot-366822',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# superrtl
|
||||
'url': 'https://www.tvnow.de/shows/die-lustigsten-schlamassel-der-welt-1221/staffel-2/episode-14-u-a-ketchup-effekt-364120',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# ntv
|
||||
'url': 'https://www.tvnow.de/shows/startup-news-10674/staffel-2/episode-39-goetter-in-weiss-387630',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# vox
|
||||
'url': 'https://www.tvnow.de/shows/auto-mobil-174/2017-11/episode-46-neues-vom-automobilmarkt-2017-11-19-17-00-00-380072',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05/episode-405-der-neue-porsche-911-gt-3-331082',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _extract_video(self, info, url, display_id):
|
||||
config = info['config']
|
||||
source = config['source']
|
||||
|
||||
video_id = compat_str(info.get('id') or source['videoId'])
|
||||
title = source['title'].strip()
|
||||
|
||||
paths = []
|
||||
for manifest_url in (info.get('manifest') or {}).values():
|
||||
if not manifest_url:
|
||||
continue
|
||||
manifest_url = update_url_query(manifest_url, {'filter': ''})
|
||||
path = self._search_regex(r'https?://[^/]+/(.+?)\.ism/', manifest_url, 'path')
|
||||
if path in paths:
|
||||
continue
|
||||
paths.append(path)
|
||||
|
||||
def url_repl(proto, suffix):
|
||||
return re.sub(
|
||||
r'(?:hls|dash|hss)([.-])', proto + r'\1', re.sub(
|
||||
r'\.ism/(?:[^.]*\.(?:m3u8|mpd)|[Mm]anifest)',
|
||||
'.ism/' + suffix, manifest_url))
|
||||
|
||||
formats = self._extract_mpd_formats(
|
||||
url_repl('dash', '.mpd'), video_id,
|
||||
mpd_id='dash', fatal=False)
|
||||
formats.extend(self._extract_ism_formats(
|
||||
url_repl('hss', 'Manifest'),
|
||||
video_id, ism_id='mss', fatal=False))
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
url_repl('hls', '.m3u8'), video_id, 'mp4',
|
||||
'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
if formats:
|
||||
break
|
||||
else:
|
||||
if try_get(info, lambda x: x['rights']['isDrm']):
|
||||
raise ExtractorError(
|
||||
'Video %s is DRM protected' % video_id, expected=True)
|
||||
if try_get(config, lambda x: x['boards']['geoBlocking']['block']):
|
||||
self.raise_geo_restricted()
|
||||
if not info.get('free', True):
|
||||
raise ExtractorError(
|
||||
'Video %s is not available for free' % video_id, expected=True)
|
||||
self._sort_formats(formats)
|
||||
|
||||
description = source.get('description')
|
||||
thumbnail = url_or_none(source.get('poster'))
|
||||
timestamp = unified_timestamp(source.get('previewStart'))
|
||||
duration = parse_duration(source.get('length'))
|
||||
|
||||
series = source.get('format')
|
||||
season_number = int_or_none(self._search_regex(
|
||||
r'staffel-(\d+)', url, 'season number', default=None))
|
||||
episode_number = int_or_none(self._search_regex(
|
||||
r'episode-(\d+)', url, 'episode number', default=None))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'timestamp': timestamp,
|
||||
'duration': duration,
|
||||
'series': series,
|
||||
'season_number': season_number,
|
||||
'episode_number': episode_number,
|
||||
'episode': title,
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
base_url, show_id, season_id = re.match(self._VALID_URL, url).groups()
|
||||
display_id, video_id = re.match(self._VALID_URL, url).groups()
|
||||
info = self._call_api('player/' + video_id, video_id)
|
||||
return self._extract_video(info, url, display_id)
|
||||
"""
|
||||
|
||||
list_info = self._extract_list_info(season_id, show_id)
|
||||
|
||||
season = next(
|
||||
season for season in list_info['formatTabs']['items']
|
||||
if season.get('seoheadline') == season_id)
|
||||
class TVNowListBaseIE(TVNowNewBaseIE):
|
||||
_SHOW_VALID_URL = r'''(?x)
|
||||
(?P<base_url>
|
||||
https?://
|
||||
(?:www\.)?tvnow\.(?:de|at|ch)/(?:shows|serien)/
|
||||
[^/?#&]+-(?P<show_id>\d+)
|
||||
)
|
||||
'''
|
||||
|
||||
title = list_info.get('title')
|
||||
headline = season.get('headline')
|
||||
if title and headline:
|
||||
title = '%s - %s' % (title, headline)
|
||||
else:
|
||||
title = headline or title
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return (False if TVNowNewIE.suitable(url)
|
||||
else super(TVNowListBaseIE, cls).suitable(url))
|
||||
|
||||
def _extract_items(self, url, show_id, list_id, query):
|
||||
items = self._call_api(
|
||||
'teaserrow/format/episode/' + show_id, list_id,
|
||||
query=query)['items']
|
||||
|
||||
entries = []
|
||||
for container in season['formatTabPages']['items']:
|
||||
items = try_get(
|
||||
container, lambda x: x['container']['movies']['items'],
|
||||
list) or []
|
||||
for info in items:
|
||||
seo_url = info.get('seoUrl')
|
||||
if not seo_url:
|
||||
continue
|
||||
video_id = info.get('id')
|
||||
entries.append(self.url_result(
|
||||
'%s/%s/player' % (base_url, seo_url), TVNowIE.ie_key(),
|
||||
compat_str(video_id) if video_id else None))
|
||||
for item in items:
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
item_url = urljoin(url, item.get('url'))
|
||||
if not item_url:
|
||||
continue
|
||||
video_id = str_or_none(item.get('id') or item.get('videoId'))
|
||||
item_title = item.get('subheadline') or item.get('text')
|
||||
entries.append(self.url_result(
|
||||
item_url, ie=TVNowNewIE.ie_key(), video_id=video_id,
|
||||
video_title=item_title))
|
||||
|
||||
return self.playlist_result(
|
||||
entries, compat_str(season.get('id') or season_id), title)
|
||||
return self.playlist_result(entries, '%s/%s' % (show_id, list_id))
|
||||
|
||||
|
||||
class TVNowSeasonIE(TVNowListBaseIE):
|
||||
_VALID_URL = r'%s/staffel-(?P<id>\d+)' % TVNowListBaseIE._SHOW_VALID_URL
|
||||
_TESTS = [{
|
||||
'url': 'https://www.tvnow.de/serien/alarm-fuer-cobra-11-die-autobahnpolizei-1815/staffel-13',
|
||||
'info_dict': {
|
||||
'id': '1815/13',
|
||||
},
|
||||
'playlist_mincount': 22,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
_, show_id, season_id = re.match(self._VALID_URL, url).groups()
|
||||
return self._extract_items(
|
||||
url, show_id, season_id, {'season': season_id})
|
||||
|
||||
|
||||
class TVNowAnnualIE(TVNowListBaseIE):
|
||||
_VALID_URL = r'%s/(?P<year>\d{4})-(?P<month>\d{2})' % TVNowListBaseIE._SHOW_VALID_URL
|
||||
_TESTS = [{
|
||||
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669/2017-05',
|
||||
'info_dict': {
|
||||
'id': '1669/2017-05',
|
||||
},
|
||||
'playlist_mincount': 2,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
_, show_id, year, month = re.match(self._VALID_URL, url).groups()
|
||||
return self._extract_items(
|
||||
url, show_id, '%s-%s' % (year, month), {
|
||||
'year': int(year),
|
||||
'month': int(month),
|
||||
})
|
||||
|
||||
|
||||
class TVNowShowIE(TVNowListBaseIE):
|
||||
_VALID_URL = TVNowListBaseIE._SHOW_VALID_URL
|
||||
|
||||
_SHOW_FIELDS = ('id', 'title', )
|
||||
_SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
|
||||
_VIDEO_FIELDS = ()
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'https://www.tvnow.at/vox/ab-ins-beet',
|
||||
# annual navigationType
|
||||
'url': 'https://www.tvnow.de/shows/grip-das-motormagazin-1669',
|
||||
'info_dict': {
|
||||
'id': 'ab-ins-beet',
|
||||
'title': 'Ab ins Beet!',
|
||||
'id': '1669',
|
||||
},
|
||||
'playlist_mincount': 7,
|
||||
'playlist_mincount': 73,
|
||||
}, {
|
||||
'url': 'https://www.tvnow.at/vox/ab-ins-beet/list',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/jahr/',
|
||||
'only_matching': True,
|
||||
# season navigationType
|
||||
'url': 'https://www.tvnow.de/shows/armes-deutschland-11471',
|
||||
'info_dict': {
|
||||
'id': '11471',
|
||||
},
|
||||
'playlist_mincount': 3,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
def suitable(cls, url):
|
||||
return (False if TVNowIE.suitable(url) or TVNowListIE.suitable(url)
|
||||
return (False if TVNowNewIE.suitable(url) or TVNowSeasonIE.suitable(url) or TVNowAnnualIE.suitable(url)
|
||||
else super(TVNowShowIE, cls).suitable(url))
|
||||
|
||||
def _real_extract(self, url):
|
||||
base_url, show_id = re.match(self._VALID_URL, url).groups()
|
||||
|
||||
list_info = self._extract_list_info(show_id, show_id)
|
||||
result = self._call_api(
|
||||
'teaserrow/format/navigation/' + show_id, show_id)
|
||||
|
||||
items = result['items']
|
||||
|
||||
entries = []
|
||||
for season_info in list_info['formatTabs']['items']:
|
||||
season_url = season_info.get('seoheadline')
|
||||
if not season_url:
|
||||
continue
|
||||
season_id = season_info.get('id')
|
||||
entries.append(self.url_result(
|
||||
'%s/list/%s' % (base_url, season_url), TVNowListIE.ie_key(),
|
||||
compat_str(season_id) if season_id else None,
|
||||
season_info.get('headline')))
|
||||
navigation = result.get('navigationType')
|
||||
if navigation == 'annual':
|
||||
for item in items:
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
year = int_or_none(item.get('year'))
|
||||
if year is None:
|
||||
continue
|
||||
months = item.get('months')
|
||||
if not isinstance(months, list):
|
||||
continue
|
||||
for month_dict in months:
|
||||
if not isinstance(month_dict, dict) or not month_dict:
|
||||
continue
|
||||
month_number = int_or_none(list(month_dict.keys())[0])
|
||||
if month_number is None:
|
||||
continue
|
||||
entries.append(self.url_result(
|
||||
'%s/%04d-%02d' % (base_url, year, month_number),
|
||||
ie=TVNowAnnualIE.ie_key()))
|
||||
elif navigation == 'season':
|
||||
for item in items:
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
season_number = int_or_none(item.get('season'))
|
||||
if season_number is None:
|
||||
continue
|
||||
entries.append(self.url_result(
|
||||
'%s/staffel-%d' % (base_url, season_number),
|
||||
ie=TVNowSeasonIE.ie_key()))
|
||||
else:
|
||||
raise ExtractorError('Unknown navigationType')
|
||||
|
||||
return self.playlist_result(entries, show_id, list_info.get('title'))
|
||||
return self.playlist_result(entries, show_id)
|
||||
|
@ -40,11 +40,7 @@ class WimpIE(InfoExtractor):
|
||||
r'data-id=["\']([0-9A-Za-z_-]{11})'),
|
||||
webpage, 'video URL', default=None)
|
||||
if youtube_id:
|
||||
return {
|
||||
'_type': 'url',
|
||||
'url': youtube_id,
|
||||
'ie_key': YoutubeIE.ie_key(),
|
||||
}
|
||||
return self.url_result(youtube_id, YoutubeIE.ie_key())
|
||||
|
||||
info_dict = self._extract_jwplayer_data(
|
||||
webpage, video_id, require_title=False)
|
||||
|
@ -12,7 +12,7 @@ from ..utils import (
|
||||
|
||||
|
||||
class WistiaIE(InfoExtractor):
|
||||
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/iframe/)(?P<id>[a-z0-9]+)'
|
||||
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]+)'
|
||||
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
|
||||
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
|
||||
|
||||
@ -38,6 +38,9 @@ class WistiaIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
|
@ -68,11 +68,9 @@ class YouPornIE(InfoExtractor):
|
||||
request.add_header('Cookie', 'age_verified=1')
|
||||
webpage = self._download_webpage(request, display_id)
|
||||
|
||||
title = self._search_regex(
|
||||
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
|
||||
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'],
|
||||
webpage, 'title', group='title',
|
||||
default=None) or self._og_search_title(
|
||||
title = self._html_search_regex(
|
||||
r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
|
||||
webpage, 'title', default=None) or self._og_search_title(
|
||||
webpage, default=None) or self._html_search_meta(
|
||||
'title', webpage, fatal=True)
|
||||
|
||||
@ -134,7 +132,11 @@ class YouPornIE(InfoExtractor):
|
||||
formats.append(f)
|
||||
self._sort_formats(formats)
|
||||
|
||||
description = self._og_search_description(webpage, default=None)
|
||||
description = self._html_search_regex(
|
||||
r'(?s)<div[^>]+\bid=["\']description["\'][^>]*>(.+?)</div>',
|
||||
webpage, 'description',
|
||||
default=None) or self._og_search_description(
|
||||
webpage, default=None)
|
||||
thumbnail = self._search_regex(
|
||||
r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
|
||||
webpage, 'thumbnail', fatal=False, group='thumbnail')
|
||||
|
@ -14,6 +14,7 @@ class YourPornIE(InfoExtractor):
|
||||
'ext': 'mp4',
|
||||
'title': 'md5:c9f43630bd968267672651ba905a7d35',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'age_limit': 18
|
||||
},
|
||||
}
|
||||
|
||||
@@ -26,7 +27,7 @@ class YourPornIE(InfoExtractor):
 self._search_regex(
 r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
 group='data'),
-video_id)[video_id]).replace('/cdn/', '/cdn2/')
+video_id)[video_id]).replace('/cdn/', '/cdn3/')
 
 title = (self._search_regex(
 r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@@ -38,4 +39,5 @@ class YourPornIE(InfoExtractor):
 'url': video_url,
 'title': title,
 'thumbnail': thumbnail,
+'age_limit': 18
 }
@@ -1077,6 +1077,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 'url': 'https://invidio.us/watch?v=BaW_jenozKc',
 'only_matching': True,
 },
+{
+# DRM protected
+'url': 'https://www.youtube.com/watch?v=s7_qI6_mIXc',
+'only_matching': True,
+}
 ]
 
 def __init__(self, *args, **kwargs):
@@ -1673,6 +1678,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 '"token" parameter not in video info for unknown reason',
 video_id=video_id)
 
+if video_info.get('license_info'):
+raise ExtractorError('This video is DRM protected.', expected=True)
+
 video_details = try_get(
 player_response, lambda x: x['videoDetails'], dict) or {}
 
@@ -1786,6 +1794,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 'height': int_or_none(width_height[1]),
 }
 q = qualities(['small', 'medium', 'hd720'])
+streaming_formats = try_get(player_response, lambda x: x['streamingData']['formats'], list)
+if streaming_formats:
+for fmt in streaming_formats:
+itag = str_or_none(fmt.get('itag'))
+if not itag:
+continue
+quality = fmt.get('quality')
+quality_label = fmt.get('qualityLabel') or quality
+formats_spec[itag] = {
+'asr': int_or_none(fmt.get('audioSampleRate')),
+'filesize': int_or_none(fmt.get('contentLength')),
+'format_note': quality_label,
+'fps': int_or_none(fmt.get('fps')),
+'height': int_or_none(fmt.get('height')),
+'quality': q(quality),
+# bitrate for itag 43 is always 2147483647
+'tbr': float_or_none(fmt.get('averageBitrate') or fmt.get('bitrate'), 1000) if itag != '43' else None,
+'width': int_or_none(fmt.get('width')),
+}
 formats = []
 for url_data_str in encoded_url_map.split(','):
 url_data = compat_parse_qs(url_data_str)
@@ -1868,7 +1895,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 filesize = int_or_none(url_data.get(
 'clen', [None])[0]) or _extract_filesize(url)
 
-quality = url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0]
+quality = url_data.get('quality', [None])[0]
 
 more_fields = {
 'filesize': filesize,
@@ -1876,7 +1903,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 'width': width,
 'height': height,
 'fps': int_or_none(url_data.get('fps', [None])[0]),
-'format_note': quality,
+'format_note': url_data.get('quality_label', [None])[0] or quality,
 'quality': q(quality),
 }
 for key, value in more_fields.items():
@@ -1904,31 +1931,38 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 'http_chunk_size': 10485760,
 }
 formats.append(dct)
-elif video_info.get('hlsvp'):
-manifest_url = video_info['hlsvp'][0]
-formats = []
-m3u8_formats = self._extract_m3u8_formats(
-manifest_url, video_id, 'mp4', fatal=False)
-for a_format in m3u8_formats:
-itag = self._search_regex(
-r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
-if itag:
-a_format['format_id'] = itag
-if itag in self._formats:
-dct = self._formats[itag].copy()
-dct.update(a_format)
-a_format = dct
-a_format['player_url'] = player_url
-# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
-a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
-formats.append(a_format)
 else:
-error_message = clean_html(video_info.get('reason', [None])[0])
-if not error_message:
-error_message = extract_unavailable_message()
-if error_message:
-raise ExtractorError(error_message, expected=True)
-raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
+manifest_url = (
+url_or_none(try_get(
+player_response,
+lambda x: x['streamingData']['hlsManifestUrl'],
+compat_str)) or
+url_or_none(try_get(
+video_info, lambda x: x['hlsvp'][0], compat_str)))
+if manifest_url:
+formats = []
+m3u8_formats = self._extract_m3u8_formats(
+manifest_url, video_id, 'mp4', fatal=False)
+for a_format in m3u8_formats:
+itag = self._search_regex(
+r'/itag/(\d+)/', a_format['url'], 'itag', default=None)
+if itag:
+a_format['format_id'] = itag
+if itag in self._formats:
+dct = self._formats[itag].copy()
+dct.update(a_format)
+a_format = dct
+a_format['player_url'] = player_url
+# Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
+a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
+formats.append(a_format)
+else:
+error_message = clean_html(video_info.get('reason', [None])[0])
+if not error_message:
+error_message = extract_unavailable_message()
+if error_message:
+raise ExtractorError(error_message, expected=True)
+raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
 
 # uploader
 video_uploader = try_get(
@@ -2016,7 +2050,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>',
 video_webpage)
 if m_episode:
-series = m_episode.group('series')
+series = unescapeHTML(m_episode.group('series'))
 season_number = int(m_episode.group('season'))
 episode_number = int(m_episode.group('episode'))
 else:
@@ -79,6 +79,20 @@ class FFmpegPostProcessor(PostProcessor):
 programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
 prefer_ffmpeg = True
 
+def get_ffmpeg_version(path):
+ver = get_exe_version(path, args=['-version'])
+if ver:
+regexs = [
+r'(?:\d+:)?([0-9.]+)-[0-9]+ubuntu[0-9.]+$', # Ubuntu, see [1]
+r'n([0-9.]+)$', # Arch Linux
+# 1. http://www.ducea.com/2006/06/17/ubuntu-package-version-naming-explanation/
+]
+for regex in regexs:
+mobj = re.match(regex, ver)
+if mobj:
+ver = mobj.group(1)
+return ver
+
 self.basename = None
 self.probe_basename = None
 
@@ -110,11 +124,10 @@ class FFmpegPostProcessor(PostProcessor):
 self._paths = dict(
 (p, os.path.join(location, p)) for p in programs)
 self._versions = dict(
-(p, get_exe_version(self._paths[p], args=['-version']))
-for p in programs)
+(p, get_ffmpeg_version(self._paths[p])) for p in programs)
 if self._versions is None:
 self._versions = dict(
-(p, get_exe_version(p, args=['-version'])) for p in programs)
+(p, get_ffmpeg_version(p)) for p in programs)
 self._paths = dict((p, p) for p in programs)
 
 if prefer_ffmpeg is False:
@@ -384,9 +397,8 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
 opts += ['-c:s', 'mov_text']
 for (i, lang) in enumerate(sub_langs):
 opts.extend(['-map', '%d:0' % (i + 1)])
-lang_code = ISO639Utils.short2long(lang)
-if lang_code is not None:
-opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
+lang_code = ISO639Utils.short2long(lang) or lang
+opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
 
 temp_filename = prepend_extension(filename, 'temp')
 self._downloader.to_screen('[ffmpeg] Embedding subtitles in \'%s\'' % filename)
@@ -2968,6 +2968,7 @@ class ISO639Utils(object):
 'gv': 'glv',
 'ha': 'hau',
 'he': 'heb',
+'iw': 'heb', # Replaced by he in 1989 revision
 'hi': 'hin',
 'ho': 'hmo',
 'hr': 'hrv',
@@ -2977,6 +2978,7 @@ class ISO639Utils(object):
 'hz': 'her',
 'ia': 'ina',
 'id': 'ind',
+'in': 'ind', # Replaced by id in 1989 revision
 'ie': 'ile',
 'ig': 'ibo',
 'ii': 'iii',
@@ -3091,6 +3093,7 @@ class ISO639Utils(object):
 'wo': 'wol',
 'xh': 'xho',
 'yi': 'yid',
+'ji': 'yid', # Replaced by yi in 1989 revision
 'yo': 'yor',
 'za': 'zha',
 'zh': 'zho',
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2018.12.17'
+__version__ = '2019.01.10'