diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
index ed3e0a157..b84559932 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.09.26*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.09.26**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.12.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.12.17**
### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.09.26
+[debug] youtube-dl version 2018.12.17
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
diff --git a/.travis.yml b/.travis.yml
index 92f326860..79287ccf6 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -15,6 +15,18 @@ env:
   - YTDL_TEST_SET=download
 matrix:
   include:
+    - python: 3.7
+      dist: xenial
+      env: YTDL_TEST_SET=core
+    - python: 3.7
+      dist: xenial
+      env: YTDL_TEST_SET=download
+    - python: 3.8-dev
+      dist: xenial
+      env: YTDL_TEST_SET=core
+    - python: 3.8-dev
+      dist: xenial
+      env: YTDL_TEST_SET=download
     - env: JYTHON=true; YTDL_TEST_SET=core
     - env: JYTHON=true; YTDL_TEST_SET=download
   fast_finish: true
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 333acee80..cba87190d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -152,7 +152,7 @@ After you have ensured this site is distributing its content legally, you can fo
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
@@ -173,7 +173,7 @@ Extractors are very fragile by nature since they depend on the layout of the sou
### Mandatory and optional metafields
-For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
+For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
 - `id` (media identifier)
 - `title` (media title)
@@ -181,7 +181,7 @@ For extraction to work youtube-dl relies on metadata your extractor extracts and
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
-[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
+[Any field](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L188-L303) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
@@ -296,5 +296,26 @@ title = self._search_regex(
### Use safe conversion functions
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+
+Use `url_or_none` for safe URL processing.
+
+Use `try_get` for safe metadata extraction from parsed JSON.
+
+Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
+
+#### More examples
+
+##### Safely extract optional description from parsed JSON
+```python
+description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
+```
+
+##### Safely extract more optional metadata
+```python
+video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
+description = video.get('summary')
+duration = float_or_none(video.get('durationMs'), scale=1000)
+view_count = int_or_none(video.get('views'))
+```
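The `url_or_none`, `try_get`, `int_or_none` and `float_or_none` helpers referenced in the documentation hunk above can be illustrated with simplified stand-ins. These are illustrative reimplementations for the sketch below, not the actual code in `youtube_dl/utils.py`, and the `response` data is made up:

```python
import re

def int_or_none(v, default=None):
    # Convert to int, returning default instead of raising on bad input
    try:
        return int(v) if v is not None else default
    except (ValueError, TypeError):
        return default

def url_or_none(url):
    # Keep only strings that look like absolute or protocol-relative URLs
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if re.match(r'^(?:[a-zA-Z][\da-zA-Z.+-]*:)?//', url) else None

def try_get(src, getter, expected_type=None):
    # Walk into nested parsed JSON, swallowing lookup errors along the way
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    return v if expected_type is None or isinstance(v, expected_type) else None

response = {'result': {'video': [{'summary': 'A talk', 'views': '1024'}]}}
summary = try_get(response, lambda x: x['result']['video'][0]['summary'], str)
views = int_or_none(try_get(response, lambda x: x['result']['video'][0]['views']))
```

The point of each helper is the same: a missing key, wrong type, or malformed value yields `None` rather than an exception, which keeps optional-field extraction from breaking the mandatory fields.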
diff --git a/ChangeLog b/ChangeLog
index 241712037..510013e89 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,173 @@
+version 2018.12.17
+
+Extractors
+* [ard:beta] Improve geo restricted videos extraction
+* [ard:beta] Fix subtitles extraction
+* [ard:beta] Improve extraction robustness
+* [ard:beta] Relax URL regular expression (#18441)
+* [acast] Add support for embed.acast.com and play.acast.com (#18483)
+* [iprima] Relax URL regular expression (#18515, #18540)
+* [vrv] Fix initial state extraction (#18553)
+* [youtube] Fix mark watched (#18546)
++ [safari] Add support for learning.oreilly.com (#18510)
+* [youtube] Fix multifeed extraction (#18531)
+* [lecturio] Improve subtitles extraction (#18488)
+* [uol] Fix format URL extraction (#18480)
++ [ard:mediathek] Add support for classic.ardmediathek.de (#18473)
+
+
+version 2018.12.09
+
+Core
+* [YoutubeDL] Keep session cookies in cookie file between runs
+* [YoutubeDL] Recognize session cookies with expired set to 0 (#12929)
+
+Extractors
++ [teachable] Add support for teachable platform sites (#5451, #18150, #18272)
++ [aenetworks] Add support for historyvault.com (#18460)
+* [imgur] Improve gallery and album detection and extraction (#9133, #16577,
+ #17223, #18404)
+* [iprima] Relax URL regular expression (#18453)
+* [hotstar] Fix video data extraction (#18386)
+* [ard:mediathek] Fix title and description extraction (#18349, #18371)
+* [xvideos] Switch to HTTPS (#18422, #18427)
++ [lecturio] Add support for lecturio.com (#18405)
++ [nrktv:series] Add support for extra materials
+* [nrktv:season,series] Fix extraction (#17159, #17258)
+* [nrktv] Relax URL regular expression (#18304, #18387)
+* [yourporn] Fix extraction (#18424, #18425)
+* [tbs] Fix info extraction (#18403)
++ [gamespot] Add support for review URLs
+
+
+version 2018.12.03
+
+Core
+* [utils] Fix random_birthday to generate existing dates only (#18284)
+
+Extractors
++ [tiktok] Add support for tiktok.com (#18108, #18135)
+* [pornhub] Use actual URL host for requests (#18359)
+* [lynda] Fix authentication (#18158, #18217)
+* [gfycat] Update API endpoint (#18333, #18343)
++ [hotstar] Add support for alternative app state layout (#18320)
+* [azmedien] Fix extraction (#18334, #18336)
++ [vimeo] Add support for VHX (Vimeo OTT) (#14835)
+* [joj] Fix extraction (#18280, #18281)
++ [wistia] Add support for fast.wistia.com (#18287)
+
+
+version 2018.11.23
+
+Core
++ [setup.py] Add more relevant classifiers
+
+Extractors
+* [mixcloud] Fallback to hardcoded decryption key (#18016)
+* [nbc:news] Fix article extraction (#16194)
+* [foxsports] Fix extraction (#17543)
+* [loc] Relax regular expression and improve formats extraction
++ [ciscolive] Add support for ciscolive.cisco.com (#17984)
+* [nzz] Relax kaltura regex (#18228)
+* [sixplay] Fix formats extraction
+* [bitchute] Improve title extraction
+* [kaltura] Limit requested MediaEntry fields
++ [americastestkitchen] Add support for zype embeds (#18225)
++ [pornhub] Add pornhub.net alias
+* [nova:embed] Fix extraction (#18222)
+
+
+version 2018.11.18
+
+Extractors
++ [wwe] Extract subtitles
++ [wwe] Add support for playlists (#14781)
++ [wwe] Add support for wwe.com (#14781, #17450)
+* [vk] Detect geo restriction (#17767)
+* [openload] Use original host during extraction (#18211)
+* [atvat] Fix extraction (#18041)
++ [rte] Add support for new API endpoint (#18206)
+* [tnaflixnetwork:embed] Fix extraction (#18205)
+* [picarto] Use API and add token support (#16518)
++ [zype] Add support for player.zype.com (#18143)
+* [vivo] Fix extraction (#18139)
+* [ruutu] Update API endpoint (#18138)
+
+
+version 2018.11.07
+
+Extractors
++ [youtube] Add another JS signature function name regex (#18091, #18093,
+ #18094)
+* [facebook] Fix tahoe request (#17171)
+* [cliphunter] Fix extraction (#18083)
++ [youtube:playlist] Add support for invidio.us (#18077)
+* [zattoo] Arrange API hosts for derived extractors (#18035)
++ [youtube] Add fallback metadata extraction from videoDetails (#18052)
+
+
+version 2018.11.03
+
+Core
+* [extractor/common] Ensure response handle is not prematurely closed before
+ it can be read if it matches expected_status (#17195, #17846, #17447)
+
+Extractors
+* [laola1tv:embed] Set correct stream access URL scheme (#16341)
++ [ehftv] Add support for ehftv.com (#15408)
+* [azmedien] Adapt to major site redesign (#17745, #17746)
++ [twitcasting] Add support for twitcasting.tv (#17981)
+* [orf:tvthek] Fix extraction (#17737, #17956, #18024)
++ [openload] Add support for oload.fun (#18045)
+* [njpwworld] Fix authentication (#17427)
++ [linkedin:learning] Add support for linkedin.com/learning (#13545)
+* [theplatform] Improve error detection (#13222)
+* [cnbc] Simplify extraction (#14280, #17110)
++ [cnbc] Add support for new URL schema (#14193)
+* [aparat] Improve extraction and extract more metadata (#17445, #18008)
+* [aparat] Fix extraction
+
+
+version 2018.10.29
+
+Core
++ [extractor/common] Add validation for JSON-LD URLs
+
+Extractors
++ [sportbox] Add support for matchtv.ru
+* [sportbox] Fix extraction (#17978)
+* [screencast] Fix extraction (#14590, #14617, #17990)
++ [openload] Add support for oload.icu
++ [ivi] Add support for ivi.tv
+* [crunchyroll] Improve extraction failsafeness (#17991)
+* [dailymail] Fix formats extraction (#17976)
+* [viewster] Reduce format requests
+* [cwtv] Handle API errors (#17905)
++ [rutube] Use geo verification headers (#17897)
++ [brightcove:legacy] Add fallbacks to brightcove:new (#13912)
+- [tv3] Remove extractor (#10461, #15339)
+* [ted] Fix extraction for HTTP and RTMP formats (#5941, #17572, #17894)
++ [openload] Add support for oload.cc (#17823)
++ [patreon] Extract post_file URL (#17792)
+* [patreon] Fix extraction (#14502, #10471)
+
+
+version 2018.10.05
+
+Extractors
+* [pluralsight] Improve authentication (#17762)
+* [dailymotion] Fix extraction (#17699)
+* [crunchyroll] Switch to HTTPS for RpcApi (#17749)
++ [philharmoniedeparis] Add support for pad.philharmoniedeparis.fr (#17705)
+* [philharmoniedeparis] Fix extraction (#17705)
++ [jamendo] Add support for licensing.jamendo.com (#17724)
++ [openload] Add support for oload.cloud (#17710)
+* [pluralsight] Fix subtitles extraction (#17726, #17728)
++ [vimeo] Add another config regular expression (#17690)
+* [spike] Fix Paramount Network extraction (#17677)
+* [hotstar] Fix extraction (#14694, #14931, #17637)
+
+
version 2018.09.26
Extractors
diff --git a/README.md b/README.md
index fdd115c9b..b3c39bf66 100644
--- a/README.md
+++ b/README.md
@@ -1024,16 +1024,20 @@ After you have ensured this site is distributing its content legally, you can fo
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
-9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+
+        $ flake8 youtube_dl/extractor/yourextractor.py
+
+9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
-10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
@@ -1045,7 +1049,7 @@ Extractors are very fragile by nature since they depend on the layout of the sou
### Mandatory and optional metafields
-For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
+For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
 - `id` (media identifier)
 - `title` (media title)
@@ -1053,7 +1057,7 @@ For extraction to work youtube-dl relies on metadata your extractor extracts and
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
-[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
+[Any field](https://github.com/rg3/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L188-L303) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
@@ -1168,7 +1172,28 @@ title = self._search_regex(
### Use safe conversion functions
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+
+Use `url_or_none` for safe URL processing.
+
+Use `try_get` for safe metadata extraction from parsed JSON.
+
+Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
+
+#### More examples
+
+##### Safely extract optional description from parsed JSON
+```python
+description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
+```
+
+##### Safely extract more optional metadata
+```python
+video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
+description = video.get('summary')
+duration = float_or_none(video.get('durationMs'), scale=1000)
+view_count = int_or_none(video.get('views'))
+```
# EMBEDDING YOUTUBE-DL
diff --git a/docs/supportedsites.md b/docs/supportedsites.md
index 736ab6da7..31d20f255 100644
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -33,7 +33,7 @@
- **AdobeTVShow**
- **AdobeTVVideo**
- **AdultSwim**
- - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
+ - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **afreecatv**: afreecatv.com
- **AirMozilla**
- **AliExpressLive**
@@ -84,8 +84,6 @@
- **awaan:season**
- **awaan:video**
- **AZMedien**: AZ Medien videos
- - **AZMedienPlaylist**: AZ Medien playlists
- - **AZMedienShowPlaylist**: AZ Medien show playlists
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
@@ -165,6 +163,8 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
+ - **CiscoLiveSearch**
+ - **CiscoLiveSession**
- **CJSW**
- **cliphunter**
- **Clippit**
@@ -178,6 +178,7 @@
- **Clyp**
- **cmt.com**
- **CNBC**
+ - **CNBCVideo**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -251,6 +252,7 @@
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
+ - **ehftv**
- **eHow**
- **EinsUndEinsTV**
- **Einthusan**
@@ -360,7 +362,7 @@
- **HitRecord**
- **HornBunny**
- **HotNewHipHop**
- - **HotStar**
+ - **hotstar**
- **hotstar:playlist**
- **Howcast**
- **HowStuffWorks**
@@ -374,7 +376,8 @@
- **imdb**: Internet Movie Database trailers
- **imdb:list**: Internet Movie Database lists
- **Imgur**
- - **ImgurAlbum**
+ - **imgur:album**
+ - **imgur:gallery**
- **Ina**
- **Inc**
- **IndavideoEmbed**
@@ -433,6 +436,8 @@
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
+ - **Lecturio**
+ - **LecturioCourse**
- **LEGO**
- **Lemonde**
- **Lenta**
@@ -445,6 +450,8 @@
- **limelight:channel**
- **limelight:channel_list**
- **LineTV**
+ - **linkedin:learning**
+ - **linkedin:learning:course**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
@@ -818,7 +825,7 @@
- **Spiegeltv**
- **sport.francetvinfo.fr**
- **Sport5**
- - **SportBoxEmbed**
+ - **SportBox**
- **SportDeutschland**
- **SpringboardPlatform**
- **Sprout**
@@ -849,6 +856,8 @@
- **TastyTrade**
- **TBS**
- **TDSLifeway**
+ - **Teachable**
+ - **TeachableCourse**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -881,6 +890,8 @@
- **ThisAmericanLife**
- **ThisAV**
- **ThisOldHouse**
+ - **TikTok**
+ - **TikTokUser**
- **tinypic**: tinypic.com videos
- **TMZ**
- **TMZArticle**
@@ -909,7 +920,6 @@
- **TV2**
- **tv2.hu**
- **TV2Article**
- - **TV3**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
@@ -931,6 +941,7 @@
- **TVPlayer**
- **TVPlayHome**
- **Tweakers**
+ - **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
@@ -955,8 +966,6 @@
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- - **Upskill**
- - **UpskillCourse**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USANetwork**
@@ -975,6 +984,7 @@
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
+ - **vhx:embed**
- **Viafree**
- **vice**
- **vice:article**
@@ -1078,6 +1088,7 @@
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal
- **WSJArticle**
+ - **WWE**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
@@ -1137,3 +1148,4 @@
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn
+ - **Zype**
diff --git a/setup.py b/setup.py
index 7dbb5805f..dfb669ad2 100644
--- a/setup.py
+++ b/setup.py
@@ -124,6 +124,8 @@ setup(
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'License :: Public Domain',
+ 'Programming Language :: Python',
+ 'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
@@ -132,6 +134,13 @@ setup(
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
+ 'Programming Language :: Python :: 3.7',
+ 'Programming Language :: Python :: 3.8',
+ 'Programming Language :: Python :: Implementation',
+ 'Programming Language :: Python :: Implementation :: CPython',
+ 'Programming Language :: Python :: Implementation :: IronPython',
+ 'Programming Language :: Python :: Implementation :: Jython',
+ 'Programming Language :: Python :: Implementation :: PyPy',
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},
diff --git a/test/helper.py b/test/helper.py
index dfee217a9..aa9a1c9b2 100644
--- a/test/helper.py
+++ b/test/helper.py
@@ -7,6 +7,7 @@ import json
import os.path
import re
import types
+import ssl
import sys
import youtube_dl.extractor
@@ -244,3 +245,12 @@ def expect_warnings(ydl, warnings_re):
real_warning(w)
ydl.report_warning = _report_warning
+
+
+def http_server_port(httpd):
+    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
+        # In Jython SSLSocket is not a subclass of socket.socket
+        sock = httpd.socket.sock
+    else:
+        sock = httpd.socket
+    return sock.getsockname()[1]
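The helper exists because the tests bind their throwaway servers to port 0, letting the OS pick any free ephemeral port, and then need to read that port back. A stdlib-only sketch of the non-Jython path (the Jython branch only matters when `httpd.socket` is an `ssl.SSLSocket` wrapper):

```python
import os
import ssl
from http.server import HTTPServer, BaseHTTPRequestHandler

def http_server_port(httpd):
    # Jython's SSLSocket is not a socket.socket subclass, so unwrap it there
    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
        sock = httpd.socket.sock
    else:
        sock = httpd.socket
    return sock.getsockname()[1]

# Binding to port 0 asks the OS for any free port
httpd = HTTPServer(('127.0.0.1', 0), BaseHTTPRequestHandler)
port = http_server_port(httpd)
httpd.server_close()
```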
diff --git a/test/test_InfoExtractor.py b/test/test_InfoExtractor.py
index 4833396a5..06be72616 100644
--- a/test/test_InfoExtractor.py
+++ b/test/test_InfoExtractor.py
@@ -9,11 +9,30 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from test.helper import FakeYDL, expect_dict, expect_value
-from youtube_dl.compat import compat_etree_fromstring
+from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
+from youtube_dl.compat import compat_etree_fromstring, compat_http_server
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
+import threading
+
+
+TEAPOT_RESPONSE_STATUS = 418
+TEAPOT_RESPONSE_BODY = "<h1>418 I'm a teapot</h1>"
+
+
+class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
+    def log_message(self, format, *args):
+        pass
+
+    def do_GET(self):
+        if self.path == '/teapot':
+            self.send_response(TEAPOT_RESPONSE_STATUS)
+            self.send_header('Content-Type', 'text/html; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
+        else:
+            assert False
class TestIE(InfoExtractor):
@@ -743,6 +762,25 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
for i in range(len(entries)):
expect_dict(self, entries[i], expected_entries[i])
+    def test_response_with_expected_status_returns_content(self):
+        # Checks for mitigations against the effects of
+        # <https://bugs.python.org/issue27879> that affect Python 3.4.1+, which
+        # manifest as `_download_webpage`, `_download_xml`, `_download_json`,
+        # or the underlying `_download_webpage_handle` returning no content
+        # when a response matches `expected_status`.
+
+        httpd = compat_http_server.HTTPServer(
+            ('127.0.0.1', 0), InfoExtractorTestRequestHandler)
+        port = http_server_port(httpd)
+        server_thread = threading.Thread(target=httpd.serve_forever)
+        server_thread.daemon = True
+        server_thread.start()
+
+        (content, urlh) = self.ie._download_webpage_handle(
+            'http://127.0.0.1:%d/teapot' % port, None,
+            expected_status=TEAPOT_RESPONSE_STATUS)
+        self.assertEqual(content, TEAPOT_RESPONSE_BODY)
+
if __name__ == '__main__':
unittest.main()
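The teapot fixture above works because HTTP 418 is a non-2xx status: urllib raises `HTTPError`, but that exception object doubles as a response handle whose body is still readable, which is what `expected_status` support in the extractor relies on. A stdlib-only sketch of the mechanism, independent of youtube-dl (server, handler, and body text are all made up for the example):

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen
from urllib.error import HTTPError

STATUS = 418
BODY = "I'm a teapot"

class TeapotHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence per-request logging in the sketch

    def do_GET(self):
        self.send_response(STATUS)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.end_headers()
        self.wfile.write(BODY.encode())

# Bind to port 0 so the OS assigns a free port
httpd = HTTPServer(('127.0.0.1', 0), TeapotHandler)
port = httpd.socket.getsockname()[1]
thread = threading.Thread(target=httpd.serve_forever)
thread.daemon = True
thread.start()

try:
    urlopen('http://127.0.0.1:%d/' % port)
    raise AssertionError('expected HTTPError for a 418 status')
except HTTPError as err:
    # The HTTPError is itself a file-like response: the body survives
    code, body = err.code, err.read().decode()
httpd.shutdown()
```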
diff --git a/test/test_YoutubeDLCookieJar.py b/test/test_YoutubeDLCookieJar.py
new file mode 100644
index 000000000..6a8243590
--- /dev/null
+++ b/test/test_YoutubeDLCookieJar.py
@@ -0,0 +1,34 @@
+#!/usr/bin/env python
+# coding: utf-8
+
+from __future__ import unicode_literals
+
+import os
+import re
+import sys
+import tempfile
+import unittest
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from youtube_dl.utils import YoutubeDLCookieJar
+
+
+class TestYoutubeDLCookieJar(unittest.TestCase):
+    def test_keep_session_cookies(self):
+        cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/session_cookies.txt')
+        cookiejar.load(ignore_discard=True, ignore_expires=True)
+        tf = tempfile.NamedTemporaryFile(delete=False)
+        try:
+            cookiejar.save(filename=tf.name, ignore_discard=True, ignore_expires=True)
+            temp = tf.read().decode('utf-8')
+            self.assertTrue(re.search(
+                r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpiresEmpty\s+YoutubeDLExpiresEmptyValue', temp))
+            self.assertTrue(re.search(
+                r'www\.foobar\.foobar\s+FALSE\s+/\s+TRUE\s+0\s+YoutubeDLExpires0\s+YoutubeDLExpires0Value', temp))
+        finally:
+            tf.close()
+            os.remove(tf.name)
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/test/test_compat.py b/test/test_compat.py
index d6c54e135..51fe6aa0b 100644
--- a/test/test_compat.py
+++ b/test/test_compat.py
@@ -39,7 +39,7 @@ class TestCompat(unittest.TestCase):
def test_compat_expanduser(self):
old_home = os.environ.get('HOME')
-        test_str = 'C:\Documents and Settings\тест\Application Data'
+        test_str = r'C:\Documents and Settings\тест\Application Data'
compat_setenv('HOME', test_str)
self.assertEqual(compat_expanduser('~'), test_str)
compat_setenv('HOME', old_home or '')
diff --git a/test/test_downloader_http.py b/test/test_downloader_http.py
index 5cf2bf1a5..750472281 100644
--- a/test/test_downloader_http.py
+++ b/test/test_downloader_http.py
@@ -9,26 +9,16 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from test.helper import try_rm
+from test.helper import http_server_port, try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
-import ssl
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
-def http_server_port(httpd):
-    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
-        # In Jython SSLSocket is not a subclass of socket.socket
-        sock = httpd.socket.sock
-    else:
-        sock = httpd.socket
-    return sock.getsockname()[1]
-
-
TEST_SIZE = 10 * 1024
diff --git a/test/test_http.py b/test/test_http.py
index 409fec9c8..3ee0a5dda 100644
--- a/test/test_http.py
+++ b/test/test_http.py
@@ -8,6 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from test.helper import http_server_port
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server, compat_urllib_request
import ssl
@@ -16,15 +17,6 @@ import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
-def http_server_port(httpd):
-    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
-        # In Jython SSLSocket is not a subclass of socket.socket
-        sock = httpd.socket.sock
-    else:
-        sock = httpd.socket
-    return sock.getsockname()[1]
-
-
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
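The `http_server_port` helper deleted from both test files above is deduplicated into `test/helper.py`, which both files now import it from. A standalone sketch reproducing the removed helper (using the Python 3 stdlib server here for self-containment; the real tests go through `compat_http_server`):

```python
import os
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer


def http_server_port(httpd):
    """Return the port a test HTTP(S) server is actually bound to."""
    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
        # In Jython SSLSocket is not a subclass of socket.socket
        sock = httpd.socket.sock
    else:
        sock = httpd.socket
    return sock.getsockname()[1]


# Binding to port 0 asks the OS for any free port; the helper recovers it.
httpd = HTTPServer(('127.0.0.1', 0), BaseHTTPRequestHandler)
port = http_server_port(httpd)
httpd.server_close()
```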
diff --git a/test/test_postprocessors.py b/test/test_postprocessors.py
index addb69d6f..4209d1d9a 100644
--- a/test/test_postprocessors.py
+++ b/test/test_postprocessors.py
@@ -14,4 +14,4 @@ from youtube_dl.postprocessor import MetadataFromTitlePP
class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
- self.assertEqual(pp._titleregex, '(?P<title>.+)\ \-\ (?P<artist>.+)')
+ self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')
diff --git a/test/testdata/cookies/session_cookies.txt b/test/testdata/cookies/session_cookies.txt
new file mode 100644
index 000000000..f6996f031
--- /dev/null
+++ b/test/testdata/cookies/session_cookies.txt
@@ -0,0 +1,6 @@
+# Netscape HTTP Cookie File
+# http://curl.haxx.se/rfc/cookie_spec.html
+# This is a generated file! Do not edit.
+
+www.foobar.foobar FALSE / TRUE YoutubeDLExpiresEmpty YoutubeDLExpiresEmptyValue
+www.foobar.foobar FALSE / TRUE 0 YoutubeDLExpires0 YoutubeDLExpires0Value
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 38ba43a97..4493fd0e1 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -88,6 +88,7 @@ from .utils import (
version_tuple,
write_json_file,
write_string,
+ YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
)
@@ -558,7 +559,7 @@ class YoutubeDL(object):
self.restore_console_title()
if self.params.get('cookiefile') is not None:
- self.cookiejar.save()
+ self.cookiejar.save(ignore_discard=True, ignore_expires=True)
def trouble(self, message=None, tb=None):
"""Determine action to take when a download problem appears.
@@ -2297,10 +2298,9 @@ class YoutubeDL(object):
self.cookiejar = compat_cookiejar.CookieJar()
else:
opts_cookiefile = expand_path(opts_cookiefile)
- self.cookiejar = compat_cookiejar.MozillaCookieJar(
- opts_cookiefile)
+ self.cookiejar = YoutubeDLCookieJar(opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK):
- self.cookiejar.load()
+ self.cookiejar.load(ignore_discard=True, ignore_expires=True)
cookie_processor = YoutubeDLCookieProcessor(self.cookiejar)
if opts_proxy is not None:
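Context for the `ignore_discard`/`ignore_expires` change: by default `MozillaCookieJar.save()` silently skips session cookies (no expiry, `discard=True`), so cookies like those in the new `session_cookies.txt` test file would be lost between runs. A minimal Python 3 stdlib sketch of that default behavior (youtube-dl itself goes through its compat layer and the new `YoutubeDLCookieJar`; the cookie values here are invented for illustration):

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value, domain):
    # expires=None + discard=True marks this as a session cookie
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={})


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

jar = MozillaCookieJar(path)
jar.set_cookie(make_session_cookie('SID', 'abc123', 'www.foobar.foobar'))

jar.save()  # default: the session cookie is silently dropped
with open(path) as f:
    dropped = f.read()

jar.save(ignore_discard=True, ignore_expires=True)  # what youtube-dl now passes
with open(path) as f:
    kept = f.read()
```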
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py
index 6d846ea7a..b32f74a37 100644
--- a/youtube_dl/extractor/acast.py
+++ b/youtube_dl/extractor/acast.py
@@ -17,25 +17,15 @@ from ..utils import (
class ACastIE(InfoExtractor):
IE_NAME = 'acast'
- _VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:
+ (?:(?:embed|www)\.)?acast\.com/|
+ play\.acast\.com/s/
+ )
+ (?P<channel>[^/]+)/(?P<id>[^/#?]+)
+ '''
_TESTS = [{
- # test with one bling
- 'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
- 'md5': 'ada3de5a1e3a2a381327d749854788bb',
- 'info_dict': {
- 'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
- 'ext': 'mp3',
- 'title': '"Where Are You?": Taipei 101, Taiwan',
- 'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
- 'timestamp': 1196172000,
- 'upload_date': '20071127',
- 'duration': 211,
- 'creator': 'Concierge',
- 'series': 'Condé Nast Traveler Podcast',
- 'episode': '"Where Are You?": Taipei 101, Taiwan',
- }
- }, {
- # test with multiple blings
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
'info_dict': {
@@ -50,6 +40,12 @@ class ACastIE(InfoExtractor):
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
}
+ }, {
+ 'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
+ 'only_matching': True,
}]
def _real_extract(self, url):
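The broadened `_VALID_URL` can be sanity-checked against the three URL shapes in the tests above (the `channel`/`id` group names are taken from the original single-line pattern):

```python
import re

ACAST_VALID_URL = r'''(?x)
    https?://
    (?:
        (?:(?:embed|www)\.)?acast\.com/|
        play\.acast\.com/s/
    )
    (?P<channel>[^/]+)/(?P<id>[^/#?]+)
'''

urls = [
    'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
    'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
    'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
]
matches = [re.match(ACAST_VALID_URL, url) for url in urls]
```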
diff --git a/youtube_dl/extractor/aenetworks.py b/youtube_dl/extractor/aenetworks.py
index 398e56ea3..85ec6392d 100644
--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py
@@ -22,18 +22,19 @@ class AENetworksBaseIE(ThePlatformIE):
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
- IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
+ IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
- (?:history|aetv|mylifetime|lifetimemovieclub)\.com|
+ (?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
- specials/(?P<special_display_id>[^/]+)/full-special
+ specials/(?P<special_display_id>[^/]+)/full-special|
+ collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_TESTS = [{
@@ -80,6 +81,9 @@ class AENetworksIE(AENetworksBaseIE):
}, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True
+ }, {
+ 'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
+ 'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
@@ -90,9 +94,9 @@ class AENetworksIE(AENetworksBaseIE):
}
def _real_extract(self, url):
- domain, show_path, movie_display_id, special_display_id = re.match(self._VALID_URL, url).groups()
- display_id = show_path or movie_display_id or special_display_id
- webpage = self._download_webpage(url, display_id)
+ domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups()
+ display_id = show_path or movie_display_id or special_display_id or collection_display_id
+ webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
diff --git a/youtube_dl/extractor/americastestkitchen.py b/youtube_dl/extractor/americastestkitchen.py
index 01736872d..8b32aa886 100644
--- a/youtube_dl/extractor/americastestkitchen.py
+++ b/youtube_dl/extractor/americastestkitchen.py
@@ -43,10 +43,6 @@ class AmericasTestKitchenIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
- partner_id = self._search_regex(
- r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
- webpage, 'kaltura partner id')
-
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*',
@@ -58,7 +54,18 @@ class AmericasTestKitchenIE(InfoExtractor):
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
- external_id = ep_data.get('external_id') or ep_meta['external_id']
+
+ zype_id = ep_meta.get('zype_id')
+ if zype_id:
+ embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
+ ie_key = 'Zype'
+ else:
+ partner_id = self._search_regex(
+ r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
+ webpage, 'kaltura partner id')
+ external_id = ep_data.get('external_id') or ep_meta['external_id']
+ embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
+ ie_key = 'Kaltura'
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -72,8 +79,8 @@ class AmericasTestKitchenIE(InfoExtractor):
return {
'_type': 'url_transparent',
- 'url': 'kaltura:%s:%s' % (partner_id, external_id),
- 'ie_key': 'Kaltura',
+ 'url': embed_url,
+ 'ie_key': ie_key,
'title': title,
'description': description,
'thumbnail': thumbnail,
diff --git a/youtube_dl/extractor/aparat.py b/youtube_dl/extractor/aparat.py
index 6eb8bbb6e..883dcee7a 100644
--- a/youtube_dl/extractor/aparat.py
+++ b/youtube_dl/extractor/aparat.py
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
+ merge_dicts,
mimetype2ext,
url_or_none,
)
@@ -12,59 +13,83 @@ from ..utils import (
class AparatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.aparat.com/v/wP8On',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت',
- 'age_limit': 0,
+ 'description': 'md5:096bdabcdcc4569f2b8a5e903a3b3028',
+ 'duration': 231,
+ 'timestamp': 1387394859,
+ 'upload_date': '20131218',
+ 'view_count': int,
},
- # 'skip': 'Extremely unreliable',
- }
+ }, {
+ # multiple formats
+ 'url': 'https://www.aparat.com/v/8dflw/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
- # Note: There is an easier-to-parse configuration at
- # http://www.aparat.com/video/video/config/videohash/%video_id
- # but the URL in there does not work
- webpage = self._download_webpage(
- 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
- video_id)
+ # Provides more metadata
+ webpage = self._download_webpage(url, video_id, fatal=False)
- title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
+ if not webpage:
+ # Note: There is an easier-to-parse configuration at
+ # http://www.aparat.com/video/video/config/videohash/%video_id
+ # but the URL in there does not work
+ webpage = self._download_webpage(
+ 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
+ video_id)
- file_list = self._parse_json(
+ options = self._parse_json(
self._search_regex(
- r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
- 'file list'),
+ r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
+ webpage, 'options', group='value'),
video_id)
+ player = options['plugins']['sabaPlayerPlugin']
+
formats = []
- for item in file_list[0]:
- file_url = url_or_none(item.get('file'))
- if not file_url:
- continue
- ext = mimetype2ext(item.get('type'))
- label = item.get('label')
- formats.append({
- 'url': file_url,
- 'ext': ext,
- 'format_id': label or ext,
- 'height': int_or_none(self._search_regex(
- r'(\d+)[pP]', label or '', 'height', default=None)),
- })
- self._sort_formats(formats)
+ for sources in player['multiSRC']:
+ for item in sources:
+ if not isinstance(item, dict):
+ continue
+ file_url = url_or_none(item.get('src'))
+ if not file_url:
+ continue
+ item_type = item.get('type')
+ if item_type == 'application/vnd.apple.mpegurl':
+ formats.extend(self._extract_m3u8_formats(
+ file_url, video_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ else:
+ ext = mimetype2ext(item.get('type'))
+ label = item.get('label')
+ formats.append({
+ 'url': file_url,
+ 'ext': ext,
+ 'format_id': 'http-%s' % (label or ext),
+ 'height': int_or_none(self._search_regex(
+ r'(\d+)[pP]', label or '', 'height',
+ default=None)),
+ })
+ self._sort_formats(
+ formats, field_preference=('height', 'width', 'tbr', 'format_id'))
- thumbnail = self._search_regex(
- r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
+ info = self._search_json_ld(webpage, video_id, default={})
- return {
+ if not info.get('title'):
+ info['title'] = player['title']
+
+ return merge_dicts(info, {
'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'age_limit': self._family_friendly_search(webpage),
+ 'thumbnail': url_or_none(options.get('poster')),
+ 'duration': int_or_none(player.get('duration')),
'formats': formats,
- }
+ })
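The new `options` regex above tolerates either quote style around the `JSON.parse` argument and feeds the `sabaPlayerPlugin` lookup. A standalone check against a hypothetical page snippet (the embedded JSON is invented for illustration; `value` is the group name used with `group='value'` in the diff):

```python
import json
import re

OPTIONS_RE = r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)'

# Hypothetical snippet in the shape the extractor expects
webpage = ('var options = JSON.parse(\'{"plugins": {"sabaPlayerPlugin": '
           '{"title": "demo", "duration": 231, "multiSRC": []}}}\');')

options = json.loads(re.search(OPTIONS_RE, webpage).group('value'))
player = options['plugins']['sabaPlayerPlugin']
```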
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index b3e604587..4ccae6430 100644
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -8,21 +8,24 @@ from .generic import GenericIE
from ..utils import (
determine_ext,
ExtractorError,
- qualities,
int_or_none,
parse_duration,
+ qualities,
+ str_or_none,
+ try_get,
unified_strdate,
- xpath_attr,
- xpath_text,
+ unified_timestamp,
update_url_query,
url_or_none,
+ xpath_attr,
+ xpath_text,
)
from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
IE_NAME = 'ARD:mediathek'
- _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+ _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
# available till 26.07.2022
@@ -52,8 +55,15 @@ class ARDMediathekIE(InfoExtractor):
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'only_matching': True,
+ }, {
+ 'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
+ 'only_matching': True,
}]
+ @classmethod
+ def suitable(cls, url):
+ return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
@@ -174,13 +184,18 @@ class ARDMediathekIE(InfoExtractor):
title = self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>',
- r'<h4 class="headline">(.*?)</h4>'],
+ r'<h4 class="headline">(.*?)</h4>',
+ r'<title[^>]*>(.*?)</title>'],
webpage, 'title')
description = self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None)
if description is None:
description = self._html_search_meta(
- 'description', webpage, 'meta description')
+ 'description', webpage, 'meta description', default=None)
+ if description is None:
+ description = self._html_search_regex(
+ r'<p\s+class="teasertext">(.+?)</p>',
+ webpage, 'teaser text', default=None)
# Thumbnail is sometimes not present.
# It is in the mobile version, but that seems to use a different URL
@@ -296,7 +311,7 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(InfoExtractor):
- _VALID_URL = r'https://beta\.ardmediathek\.de/[a-z]+/player/(?P<video_id>[a-zA-Z0-9]+)/(?P<display_id>[^/?#]+)'
+ _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
@@ -310,12 +325,18 @@ class ARDBetaMediathekIE(InfoExtractor):
'upload_date': '20180826',
'ext': 'mp4',
},
+ }, {
+ 'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
+ 'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
- display_id = mobj.group('display_id')
+ display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
@@ -326,43 +347,62 @@ class ARDBetaMediathekIE(InfoExtractor):
'display_id': display_id,
}
formats = []
+ subtitles = {}
+ geoblocked = False
for widget in data.values():
- if widget.get('_geoblocked'):
- raise ExtractorError('This video is not available due to geoblocking', expected=True)
-
+ if widget.get('_geoblocked') is True:
+ geoblocked = True
if '_duration' in widget:
- res['duration'] = widget['_duration']
+ res['duration'] = int_or_none(widget['_duration'])
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
- res['upload_date'] = unified_strdate(widget['broadcastedOn'])
+ res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
- if '_subtitleUrl' in widget:
- res['subtitles'] = {'de': [{
+ subtitle_url = url_or_none(widget.get('_subtitleUrl'))
+ if subtitle_url:
+ subtitles.setdefault('de', []).append({
'ext': 'ttml',
- 'url': widget['_subtitleUrl'],
- }]}
+ 'url': subtitle_url,
+ })
if '_quality' in widget:
- format_url = widget['_stream']['json'][0]
-
- if format_url.endswith('.f4m'):
+ format_url = url_or_none(try_get(
+ widget, lambda x: x['_stream']['json'][0]))
+ if not format_url:
+ continue
+ ext = determine_ext(format_url)
+ if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
- elif format_url.endswith('m3u8'):
+ elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- format_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+ format_url, video_id, 'mp4', m3u8_id='hls',
+ fatal=False))
else:
+ # HTTP formats are not available when geoblocked is True,
+ # other formats are fine though
+ if geoblocked:
+ continue
+ quality = str_or_none(widget.get('_quality'))
formats.append({
- 'format_id': 'http-' + widget['_quality'],
+ 'format_id': ('http-' + quality) if quality else 'http',
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
+ if not formats and geoblocked:
+ self.raise_geo_restricted(
+ msg='This video is not available due to geoblocking',
+ countries=['DE'])
+
self._sort_formats(formats)
- res['formats'] = formats
+ res.update({
+ 'subtitles': subtitles,
+ 'formats': formats,
+ })
return res
diff --git a/youtube_dl/extractor/atvat.py b/youtube_dl/extractor/atvat.py
index 1584d53fc..95e572d70 100644
--- a/youtube_dl/extractor/atvat.py
+++ b/youtube_dl/extractor/atvat.py
@@ -28,8 +28,10 @@ class ATVAtIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_data = self._parse_json(unescapeHTML(self._search_regex(
- r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="([^"]+)"',
- webpage, 'player data')), display_id)['config']['initial_video']
+ [r'flashPlayerOptions\s*=\s*(["\'])(?P<json>(?:(?!\1).)+)\1',
+ r'class="[^"]*jsb_video/FlashPlayer[^"]*"[^>]+data-jsb="(?P<json>[^"]+)"'],
+ webpage, 'player data', group='json')),
+ display_id)['config']['initial_video']
video_id = video_data['id']
video_title = video_data['title']
diff --git a/youtube_dl/extractor/azmedien.py b/youtube_dl/extractor/azmedien.py
index 68f26e2ca..fcbdc71b9 100644
--- a/youtube_dl/extractor/azmedien.py
+++ b/youtube_dl/extractor/azmedien.py
@@ -1,213 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
+import json
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
-from ..utils import (
- get_element_by_class,
- get_element_by_id,
- strip_or_none,
- urljoin,
-)
-class AZMedienBaseIE(InfoExtractor):
- def _kaltura_video(self, partner_id, entry_id):
- return self.url_result(
- 'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
- video_id=entry_id)
-
-
-class AZMedienIE(AZMedienBaseIE):
+class AZMedienIE(InfoExtractor):
IE_DESC = 'AZ Medien videos'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
- (?:
+ (?P<host>
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
- [0-9]+-show-[^/\#]+
- (?:
- /[0-9]+-episode-[^/\#]+
- (?:
- /[0-9]+-segment-(?:[^/\#]+\#)?|
- \#
- )|
- \#
+ [^/]+/
+ (?P<id>
[^/]+-(?P<article_id>\d+)
)
- (?P<id>[^\#]+)
+ (?:
+ \#video=
+ (?P<kaltura_id>
+ [_0-9a-z]+
+ )
+ )?
'''
_TESTS = [{
- # URL with 'segment'
- 'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
+ 'url': 'https://www.telezueri.ch/sonntalk/bundesrats-vakanzen-eu-rahmenabkommen-133214569',
'info_dict': {
- 'id': '1_2444peh4',
+ 'id': '1_anruz3wy',
'ext': 'mp4',
- 'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
- 'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
- 'uploader_id': 'TeleZ?ri',
- 'upload_date': '20161218',
- 'timestamp': 1482084490,
+ 'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen',
+ 'uploader_id': 'TVOnline',
+ 'upload_date': '20180930',
+ 'timestamp': 1538328802,
},
'params': {
'skip_download': True,
},
}, {
- # URL with 'segment' and fragment:
- 'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
- 'only_matching': True
- }, {
- # URL with 'episode' and fragment:
- 'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
- 'only_matching': True
- }, {
- # URL with 'show' and fragment:
- 'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
+ 'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- partner_id = self._search_regex(
- r'',
webpage, 'app state'), video_id)
- video_data = list(app_state.values())[0]['initialState']['contentData']['content']
+ video_data = {}
+ getters = list(
+ lambda x, k=k: x['initialState']['content%s' % k]['content']
+ for k in ('Data', 'Detail')
+ )
+ for v in app_state.values():
+ content = try_get(v, getters, dict)
+ if content and content.get('contentId') == video_id:
+ video_data = content
+ break
title = video_data['title']
diff --git a/youtube_dl/extractor/imgur.py b/youtube_dl/extractor/imgur.py
index ecc958a17..0eb54db3f 100644
--- a/youtube_dl/extractor/imgur.py
+++ b/youtube_dl/extractor/imgur.py
@@ -12,7 +12,7 @@ from ..utils import (
class ImgurIE(InfoExtractor):
- _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z0-9]+)?$'
+ _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -20,28 +20,9 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
- 'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/A61SaA1',
- 'info_dict': {
- 'id': 'A61SaA1',
- 'ext': 'mp4',
- 'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
- 'description': 'Imgur: The magic of the Internet',
- },
- }, {
- 'url': 'https://imgur.com/gallery/YcAQlkx',
- 'info_dict': {
- 'id': 'YcAQlkx',
- 'ext': 'mp4',
- 'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
- }
- }, {
- 'url': 'http://imgur.com/topic/Funny/N8rOudd',
- 'only_matching': True,
- }, {
- 'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
@@ -50,8 +31,8 @@ class ImgurIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
- gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id)
- webpage = self._download_webpage(gifv_url, video_id)
+ webpage = self._download_webpage(
+ 'https://i.imgur.com/{id}.gifv'.format(id=video_id), video_id)
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
@@ -72,7 +53,6 @@ class ImgurIE(InfoExtractor):
'format_id': m.group('type').partition('/')[2],
'url': self._proto_relative_url(m.group('src')),
'ext': mimetype2ext(m.group('type')),
- 'acodec': 'none',
'width': width,
'height': height,
'http_headers': {
@@ -107,44 +87,64 @@ class ImgurIE(InfoExtractor):
return {
'id': video_id,
'formats': formats,
- 'description': self._og_search_description(webpage, default=None),
'title': self._og_search_title(webpage),
}
-class ImgurAlbumIE(InfoExtractor):
- _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:a|gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{5})(?:[/?#&]+)?$'
+class ImgurGalleryIE(InfoExtractor):
+ IE_NAME = 'imgur:gallery'
+ _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://imgur.com/gallery/Q95ko',
'info_dict': {
'id': 'Q95ko',
+ 'title': 'Adding faces make every GIF better',
},
'playlist_count': 25,
}, {
- 'url': 'http://imgur.com/a/j6Orj',
+ 'url': 'http://imgur.com/topic/Aww/ll5Vk',
'only_matching': True,
}, {
- 'url': 'http://imgur.com/topic/Aww/ll5Vk',
+ 'url': 'https://imgur.com/gallery/YcAQlkx',
+ 'info_dict': {
+ 'id': 'YcAQlkx',
+ 'ext': 'mp4',
+ 'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
+ }
+ }, {
+ 'url': 'http://imgur.com/topic/Funny/N8rOudd',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True,
}]
def _real_extract(self, url):
- album_id = self._match_id(url)
+ gallery_id = self._match_id(url)
- album_images = self._download_json(
- 'http://imgur.com/gallery/%s/album_images/hit.json?all=true' % album_id,
- album_id, fatal=False)
+ data = self._download_json(
+ 'https://imgur.com/gallery/%s.json' % gallery_id,
+ gallery_id)['data']['image']
- if album_images:
- data = album_images.get('data')
- if data and isinstance(data, dict):
- images = data.get('images')
- if images and isinstance(images, list):
- entries = [
- self.url_result('http://imgur.com/%s' % image['hash'])
- for image in images if image.get('hash')]
- return self.playlist_result(entries, album_id)
+ if data.get('is_album'):
+ entries = [
+ self.url_result('http://imgur.com/%s' % image['hash'], ImgurIE.ie_key(), image['hash'])
+ for image in data['album_images']['images'] if image.get('hash')]
+ return self.playlist_result(entries, gallery_id, data.get('title'), data.get('description'))
- # Fallback to single video
- return self.url_result('http://imgur.com/%s' % album_id, ImgurIE.ie_key())
+ return self.url_result('http://imgur.com/%s' % gallery_id, ImgurIE.ie_key(), gallery_id)
+
+
+class ImgurAlbumIE(ImgurGalleryIE):
+ IE_NAME = 'imgur:album'
+ _VALID_URL = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'
+
+ _TESTS = [{
+ 'url': 'http://imgur.com/a/j6Orj',
+ 'info_dict': {
+ 'id': 'j6Orj',
+ 'title': 'A Literary Analysis of "Star Wars: The Force Awakens"',
+ },
+ 'playlist_count': 12,
+ }]
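A quick consistency check of the three patterns after the split: the new negative lookahead keeps single-image URLs in `ImgurIE` while gallery/topic/r URLs fall through to `ImgurGalleryIE` and `/a/` URLs to `ImgurAlbumIE` (patterns copied from the diff, assuming the usual `id` group name):

```python
import re

IMGUR = r'https?://(?:i\.)?imgur\.com/(?!(?:a|gallery|(?:t(?:opic)?|r)/[^/]+)/)(?P<id>[a-zA-Z0-9]+)'
GALLERY = r'https?://(?:i\.)?imgur\.com/(?:gallery|(?:t(?:opic)?|r)/[^/]+)/(?P<id>[a-zA-Z0-9]+)'
ALBUM = r'https?://(?:i\.)?imgur\.com/a/(?P<id>[a-zA-Z0-9]+)'

# URLs taken from the test cases in the diff
single = 'https://i.imgur.com/A61SaA1.gifv'
gallery = 'http://imgur.com/gallery/Q95ko'
topic = 'http://imgur.com/topic/Funny/N8rOudd'
album = 'http://imgur.com/a/j6Orj'
```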
diff --git a/youtube_dl/extractor/iprima.py b/youtube_dl/extractor/iprima.py
index 1d58d6e85..11bbeb592 100644
--- a/youtube_dl/extractor/iprima.py
+++ b/youtube_dl/extractor/iprima.py
@@ -12,7 +12,7 @@ from ..utils import (
class IPrimaIE(InfoExtractor):
- _VALID_URL = r'https?://(?:play|prima)\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
+ _VALID_URL = r'https?://(?:[^/]+)\.iprima\.cz/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_GEO_BYPASS = False
_TESTS = [{
@@ -41,6 +41,24 @@ class IPrimaIE(InfoExtractor):
# iframe prima.iprima.cz
'url': 'https://prima.iprima.cz/porady/jak-se-stavi-sen/rodina-rathousova-praha',
'only_matching': True,
+ }, {
+ 'url': 'http://www.iprima.cz/filmy/desne-rande',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://zoom.iprima.cz/10-nejvetsich-tajemstvi-zahad/posvatna-mista-a-stavby',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://krimi.iprima.cz/mraz-0/sebevrazdy',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://cool.iprima.cz/derava-silnice-nevadi',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://love.iprima.cz/laska-az-za-hrob/slib-dany-bratrovi',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://autosalon.iprima.cz/motorsport/7-epizoda-1',
+ 'only_matching': True,
}]
def _real_extract(self, url):
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py
index cb51cef2d..86c014b07 100644
--- a/youtube_dl/extractor/ivi.py
+++ b/youtube_dl/extractor/ivi.py
@@ -15,7 +15,7 @@ from ..utils import (
class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
- _VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<videoid>\d+)'
+ _VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<videoid>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
@@ -65,7 +65,11 @@ class IviIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
- }
+ },
+ {
+ 'url': 'https://www.ivi.tv/watch/33560/',
+ 'only_matching': True,
+ },
]
# Sorted by quality
diff --git a/youtube_dl/extractor/joj.py b/youtube_dl/extractor/joj.py
index d9f8dbfd2..62b28e980 100644
--- a/youtube_dl/extractor/joj.py
+++ b/youtube_dl/extractor/joj.py
@@ -61,7 +61,7 @@ class JojIE(InfoExtractor):
bitrates = self._parse_json(
self._search_regex(
- r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
+ r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py
index 04f68fce4..fdf7f5bbc 100644
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -192,6 +192,8 @@ class KalturaIE(InfoExtractor):
'entryId': video_id,
'service': 'baseentry',
'ks': '{1:result:ks}',
+ 'responseProfile:fields': 'createdAt,dataUrl,duration,name,plays,thumbnailUrl,userId',
+ 'responseProfile:type': 1,
},
{
'action': 'getbyentryid',
diff --git a/youtube_dl/extractor/laola1tv.py b/youtube_dl/extractor/laola1tv.py
index c7f813370..fa217365a 100644
--- a/youtube_dl/extractor/laola1tv.py
+++ b/youtube_dl/extractor/laola1tv.py
@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
+import re
from .common import InfoExtractor
from ..utils import (
@@ -32,7 +33,8 @@ class Laola1TvEmbedIE(InfoExtractor):
def _extract_token_url(self, stream_access_url, video_id, data):
return self._download_json(
- stream_access_url, video_id, headers={
+ self._proto_relative_url(stream_access_url, 'https:'), video_id,
+ headers={
'Content-Type': 'application/json',
}, data=json.dumps(data).encode())['data']['stream-access'][0]
@@ -119,9 +121,59 @@ class Laola1TvEmbedIE(InfoExtractor):
}
-class Laola1TvIE(Laola1TvEmbedIE):
+class Laola1TvBaseIE(Laola1TvEmbedIE):
+ def _extract_video(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+
+ if 'Dieser Livestream ist bereits beendet.' in webpage:
+ raise ExtractorError('This live stream has already finished.', expected=True)
+
+ conf = self._parse_json(self._search_regex(
+ r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
+ display_id,
+ transform_source=lambda s: js_to_json(re.sub(r'shareurl:.+,', '', s)))
+ video_id = conf['videoid']
+
+ config = self._download_json(conf['configUrl'], video_id, query={
+ 'videoid': video_id,
+ 'partnerid': conf['partnerid'],
+ 'language': conf.get('language', ''),
+ 'portal': conf.get('portalid', ''),
+ })
+ error = config.get('error')
+ if error:
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+ video_data = config['video']
+ title = video_data['title']
+ is_live = video_data.get('isLivestream') and video_data.get('isLive')
+ meta = video_data.get('metaInformation')
+ sports = meta.get('sports')
+ categories = sports.split(',') if sports else []
+
+ token_url = self._extract_token_url(
+ video_data['streamAccess'], video_id,
+ video_data['abo']['required'])
+
+ formats = self._extract_formats(token_url, video_id)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': self._live_title(title) if is_live else title,
+ 'description': video_data.get('description'),
+ 'thumbnail': video_data.get('image'),
+ 'categories': categories,
+ 'formats': formats,
+ 'is_live': is_live,
+ }
+
+
+class Laola1TvIE(Laola1TvBaseIE):
IE_NAME = 'laola1tv'
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
+
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
@@ -169,52 +221,30 @@ class Laola1TvIE(Laola1TvEmbedIE):
}]
def _real_extract(self, url):
- display_id = self._match_id(url)
+ return self._extract_video(url)
- webpage = self._download_webpage(url, display_id)
- if 'Dieser Livestream ist bereits beendet.' in webpage:
- raise ExtractorError('This live stream has already finished.', expected=True)
+class EHFTVIE(Laola1TvBaseIE):
+ IE_NAME = 'ehftv'
+ _VALID_URL = r'https?://(?:www\.)?ehftv\.com/[a-z]+(?:-[a-z]+)?/[^/]+/(?P<id>[^/?#&]+)'
- conf = self._parse_json(self._search_regex(
- r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
- display_id, js_to_json)
+ _TESTS = [{
+ 'url': 'https://www.ehftv.com/int/video/paris-saint-germain-handball-pge-vive-kielce/1166761',
+ 'info_dict': {
+ 'id': '1166761',
+ 'display_id': 'paris-saint-germain-handball-pge-vive-kielce',
+ 'ext': 'mp4',
+ 'title': 'Paris Saint-Germain Handball - PGE Vive Kielce',
+ 'is_live': False,
+ 'categories': ['Handball'],
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }]
- video_id = conf['videoid']
-
- config = self._download_json(conf['configUrl'], video_id, query={
- 'videoid': video_id,
- 'partnerid': conf['partnerid'],
- 'language': conf.get('language', ''),
- 'portal': conf.get('portalid', ''),
- })
- error = config.get('error')
- if error:
- raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
-
- video_data = config['video']
- title = video_data['title']
- is_live = video_data.get('isLivestream') and video_data.get('isLive')
- meta = video_data.get('metaInformation')
- sports = meta.get('sports')
- categories = sports.split(',') if sports else []
-
- token_url = self._extract_token_url(
- video_data['streamAccess'], video_id,
- video_data['abo']['required'])
-
- formats = self._extract_formats(token_url, video_id)
-
- return {
- 'id': video_id,
- 'display_id': display_id,
- 'title': self._live_title(title) if is_live else title,
- 'description': video_data.get('description'),
- 'thumbnail': video_data.get('image'),
- 'categories': categories,
- 'formats': formats,
- 'is_live': is_live,
- }
+ def _real_extract(self, url):
+ return self._extract_video(url)
class ITTFIE(InfoExtractor):
diff --git a/youtube_dl/extractor/lecturio.py b/youtube_dl/extractor/lecturio.py
new file mode 100644
index 000000000..24f78d928
--- /dev/null
+++ b/youtube_dl/extractor/lecturio.py
@@ -0,0 +1,229 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+ determine_ext,
+ extract_attributes,
+ ExtractorError,
+ float_or_none,
+ int_or_none,
+ str_or_none,
+ url_or_none,
+ urlencode_postdata,
+ urljoin,
+)
+
+
+class LecturioBaseIE(InfoExtractor):
+ _LOGIN_URL = 'https://app.lecturio.com/en/login'
+ _NETRC_MACHINE = 'lecturio'
+
+ def _real_initialize(self):
+ self._login()
+
+ def _login(self):
+ username, password = self._get_login_info()
+ if username is None:
+ return
+
+ # Sets some cookies
+ _, urlh = self._download_webpage_handle(
+ self._LOGIN_URL, None, 'Downloading login popup')
+
+ def is_logged(url_handle):
+ return self._LOGIN_URL not in compat_str(url_handle.geturl())
+
+ # Already logged in
+ if is_logged(urlh):
+ return
+
+ login_form = {
+ 'signin[email]': username,
+ 'signin[password]': password,
+ 'signin[remember]': 'on',
+ }
+
+ response, urlh = self._download_webpage_handle(
+ self._LOGIN_URL, None, 'Logging in',
+ data=urlencode_postdata(login_form))
+
+ # Logged in successfully
+ if is_logged(urlh):
+ return
+
+ errors = self._html_search_regex(
+ r'(?s)<ul[^>]+class=["\']error_list[^>]+>(.+?)</ul>', response,
+ 'errors', default=None)
+ if errors:
+ raise ExtractorError('Unable to login: %s' % errors, expected=True)
+ raise ExtractorError('Unable to log in')
+
+
+class LecturioIE(LecturioBaseIE):
+ _VALID_URL = r'''(?x)
+ https://
+ (?:
+ app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
+ (?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
+ )
+ '''
+ _TESTS = [{
+ 'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
+ 'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
+ 'info_dict': {
+ 'id': '39634',
+ 'ext': 'mp4',
+ 'title': 'Important Concepts and Terms – Introduction to Microbiology',
+ },
+ 'skip': 'Requires lecturio account credentials',
+ }, {
+ 'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
+ 'only_matching': True,
+ }]
+
+ _CC_LANGS = {
+ 'German': 'de',
+ 'English': 'en',
+ 'Spanish': 'es',
+ 'French': 'fr',
+ 'Polish': 'pl',
+ 'Russian': 'ru',
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ display_id = mobj.group('id') or mobj.group('id_de')
+
+ webpage = self._download_webpage(
+ 'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
+ display_id)
+
+ lecture_id = self._search_regex(
+ r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
+
+ api_url = self._search_regex(
+ r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+ 'api url', group='url')
+
+ video = self._download_json(api_url, display_id)
+
+ title = video['title'].strip()
+
+ formats = []
+ for format_ in video['content']['media']:
+ if not isinstance(format_, dict):
+ continue
+ file_ = format_.get('file')
+ if not file_:
+ continue
+ ext = determine_ext(file_)
+ if ext == 'smil':
+ # smil contains only broken RTMP formats anyway
+ continue
+ file_url = url_or_none(file_)
+ if not file_url:
+ continue
+ label = str_or_none(format_.get('label'))
+ filesize = int_or_none(format_.get('fileSize'))
+ formats.append({
+ 'url': file_url,
+ 'format_id': label,
+ 'filesize': float_or_none(filesize, invscale=1000)
+ })
+ self._sort_formats(formats)
+
+ subtitles = {}
+ automatic_captions = {}
+ cc = self._parse_json(
+ self._search_regex(
+ r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
+ default='{}'), display_id, fatal=False)
+ for cc_label, cc_url in (cc or {}).items():
+ cc_url = url_or_none(cc_url)
+ if not cc_url:
+ continue
+ lang = self._search_regex(
+ r'/([a-z]{2})_', cc_url, 'lang',
+ default=cc_label.split()[0] if cc_label else 'en')
+ original_lang = self._search_regex(
+ r'/[a-z]{2}_([a-z]{2})_', cc_url, 'original lang',
+ default=None)
+ sub_dict = (automatic_captions
+ if 'auto-translated' in cc_label or original_lang
+ else subtitles)
+ sub_dict.setdefault(self._CC_LANGS.get(lang, lang), []).append({
+ 'url': cc_url,
+ })
+
+ return {
+ 'id': lecture_id,
+ 'title': title,
+ 'formats': formats,
+ 'subtitles': subtitles,
+ 'automatic_captions': automatic_captions,
+ }
+
+
+class LecturioCourseIE(LecturioBaseIE):
+ _VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
+ _TEST = {
+ 'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
+ 'info_dict': {
+ 'id': 'microbiology-introduction',
+ 'title': 'Microbiology: Introduction',
+ },
+ 'playlist_count': 45,
+ 'skip': 'Requires lecturio account credentials',
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ entries = []
+ for mobj in re.finditer(
+ r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
+ webpage):
+ params = extract_attributes(mobj.group(0))
+ lecture_url = urljoin(url, params.get('data-url'))
+ lecture_id = params.get('data-id')
+ entries.append(self.url_result(
+ lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
+
+ title = self._search_regex(
+ r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
+ 'title', default=None)
+
+ return self.playlist_result(entries, display_id, title)
+
+
+class LecturioDeCourseIE(LecturioBaseIE):
+ _VALID_URL = r'https://(?:www\.)?lecturio\.de/[^/]+/(?P<id>[^/?#&]+)\.kurs'
+ _TEST = {
+ 'url': 'https://www.lecturio.de/jura/grundrechte.kurs',
+ 'only_matching': True,
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ entries = []
+ for mobj in re.finditer(
+ r'(?s)<a[^>]+\bdata-lecture-id=["\'](?P<id>\d+).+?\bhref=(["\'])(?P<url>(?:(?!\2).)+\.vortrag)\b[^>]+>',
+ webpage):
+ lecture_url = urljoin(url, mobj.group('url'))
+ lecture_id = mobj.group('id')
+ entries.append(self.url_result(
+ lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
+
+ title = self._search_regex(
+ r' |