1
0
mirror of https://github.com/l1ving/youtube-dl synced 2025-02-03 06:42:52 +08:00

Merge pull request #10 from rg3/master

op
This commit is contained in:
世外桃源 KPT-D 2017-12-11 23:48:00 +04:00 committed by GitHub
commit 748f4bd7d5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
203 changed files with 6084 additions and 2503 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.27.1*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.27.1** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.10**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.08.27.1 [debug] youtube-dl version 2017.12.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -9,6 +9,7 @@
### Before submitting a *pull request* make sure you have: ### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections - [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests - [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
- [ ] Checked the code with [flake8](https://pypi.python.org/pypi/flake8)
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options: ### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/) - [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)

1
.gitignore vendored
View File

@ -22,6 +22,7 @@ cover/
updates_key.pem updates_key.pem
*.egg-info *.egg-info
*.srt *.srt
*.ttml
*.sbv *.sbv
*.vtt *.vtt
*.flv *.flv

View File

@ -11,12 +11,12 @@ sudo: false
env: env:
- YTDL_TEST_SET=core - YTDL_TEST_SET=core
- YTDL_TEST_SET=download - YTDL_TEST_SET=download
matrix:
fast_finish: true
allow_failures:
- env: YTDL_TEST_SET=download
script: ./devscripts/run_tests.sh script: ./devscripts/run_tests.sh
notifications: notifications:
email: email:
- filippo.valsorda@gmail.com - filippo.valsorda@gmail.com
- yasoob.khld@gmail.com - yasoob.khld@gmail.com
# irc:
# channels:
# - "irc.freenode.org#youtube-dl"
# skip_join: true

View File

@ -224,3 +224,10 @@ Giuseppe Fabiano
Örn Guðjónsson Örn Guðjónsson
Parmjit Virk Parmjit Virk
Genki Sky Genki Sky
Ľuboš Katrinec
Corey Nicholson
Ashutosh Chaudhary
John Dong
Tatsuyuki Ishi
Daniel Weber
Kay Bouché

View File

@ -82,6 +82,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py python test/test_download.py
nosetests nosetests
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
@ -149,7 +151,7 @@ After you have ensured this site is distributing its content legally, you can fo
} }
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this: 9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:

360
ChangeLog
View File

@ -1,3 +1,361 @@
version 2017.12.10
Core
+ [utils] Add sami mimetype to mimetype2ext
Extractors
* [culturebox] Improve video id extraction (#14947)
* [twitter] Improve extraction (#14197)
+ [udemy] Extract more HLS formats
* [udemy] Improve course id extraction (#14938)
+ [stretchinternet] Add support for portal.stretchinternet.com (#14576)
* [ellentube] Fix extraction (#14407, #14570)
+ [raiplay:playlist] Add support for playlists (#14563)
* [sonyliv] Bypass geo restriction
* [sonyliv] Extract higher quality formats (#14922)
* [fox] Extract subtitles
+ [fox] Add support for Adobe Pass authentication (#14205, #14489)
- [dailymotion:cloud] Remove extractor (#6794)
* [xhamster] Fix thumbnail extraction (#14780)
+ [xhamster] Add support for mobile URLs (#14780)
* [generic] Don't pass video id as mpd id while extracting DASH (#14902)
* [ard] Skip invalid stream URLs (#14906)
* [porncom] Fix metadata extraction (#14911)
* [pluralsight] Detect agreement request (#14913)
* [toutv] Fix login (#14614)
version 2017.12.02
Core
+ [downloader/fragment] Commit part file after each fragment
+ [extractor/common] Add durations for DASH fragments with bare SegmentURLs
+ [extractor/common] Add support for DASH manifests with SegmentLists with
bare SegmentURLs (#14844)
+ [utils] Add hvc1 codec code to parse_codecs
Extractors
* [xhamster] Fix extraction (#14884)
* [youku] Update ccode (#14872)
* [mnet] Fix format extraction (#14883)
+ [xiami] Add Referer header to API request
* [mtv] Correct scc extention in extracted subtitles (#13730)
* [vvvvid] Fix extraction for kenc videos (#13406)
+ [br] Add support for BR Mediathek videos (#14560, #14788)
+ [daisuki] Add support for motto.daisuki.com (#14681)
* [odnoklassniki] Fix API metadata request (#14862)
* [itv] Fix HLS formats extraction
+ [pbs] Add another media id regular expression
version 2017.11.26
Core
* [extractor/common] Use final URL when dumping request (#14769)
Extractors
* [fczenit] Fix extraction
- [firstpost] Remove extractor
* [freespeech] Fix extraction
* [nexx] Extract more formats
+ [openload] Add support for openload.link (#14763)
* [empflix] Relax URL regular expression
* [empflix] Fix extractrion
* [tnaflix] Don't modify download URLs (#14811)
- [gamersyde] Remove extractor
* [francetv:generationwhat] Fix extraction
+ [massengeschmacktv] Add support for Massengeschmack TV
* [fox9] Fix extraction
* [faz] Fix extraction and add support for Perform Group embeds (#14714)
+ [performgroup] Add support for performgroup.com
+ [jwplatform] Add support for iframes (#14828)
* [culturebox] Fix extraction (#14827)
* [youku] Fix extraction; update ccode (#14815)
* [livestream] Make SMIL extraction non fatal (#14792)
+ [drtuber] Add support for mobile URLs (#14772)
+ [spankbang] Add support for mobile URLs (#14771)
* [instagram] Fix description, timestamp and counters extraction (#14755)
version 2017.11.15
Core
* [common] Skip Apple FairPlay m3u8 manifests (#14741)
* [YoutubeDL] Fix playlist range optimization for --playlist-items (#14740)
Extractors
* [vshare] Capture and output error message
* [vshare] Fix extraction (#14473)
* [crunchyroll] Extract old RTMP formats
* [tva] Fix extraction (#14736)
* [gamespot] Lower preference of HTTP formats (#14652)
* [instagram:user] Fix extraction (#14699)
* [ccma] Fix typo (#14730)
- Remove sensitive data from logging in messages
* [instagram:user] Fix extraction (#14699)
+ [gamespot] Add support for article URLs (#14652)
* [gamespot] Skip Brightcove Once HTTP formats (#14652)
* [cartoonnetwork] Update tokenizer_src (#14666)
+ [wsj] Recognize another URL pattern (#14704)
* [pandatv] Update API URL and sign format URLs (#14693)
* [crunchyroll] Use old login method (#11572)
version 2017.11.06
Core
+ [extractor/common] Add protocol for f4m formats
* [f4m] Prefer baseURL for relative URLs (#14660)
* [extractor/common] Respect URL query in _extract_wowza_formats (14645)
Extractors
+ [hotstar:playlist] Add support for playlists (#12465)
* [hotstar] Bypass geo restriction (#14672)
- [22tracks] Remove extractor (#11024, #14628)
+ [skysport] Sdd support ooyala videos protected with embed_token (#14641)
* [gamespot] Extract formats referenced with new data fields (#14652)
* [spankbang] Detect unavailable videos (#14644)
version 2017.10.29
Core
* [extractor/common] Prefix format id for audio only HLS formats
+ [utils] Add support for zero years and months in parse_duration
Extractors
* [egghead] Fix extraction (#14388)
+ [fxnetworks] Extract series metadata (#14603)
+ [younow] Add support for younow.com (#9255, #9432, #12436)
* [dctptv] Fix extraction (#14599)
* [youtube] Restrict embed regex (#14600)
* [vimeo] Restrict iframe embed regex (#14600)
* [soundgasm] Improve extraction (#14588)
- [myvideo] Remove extractor (#8557)
+ [nbc] Add support for classic-tv videos (#14575)
+ [vrtnu] Add support for cookies authentication and simplify (#11873)
+ [canvas] Add support for vrt.be/vrtnu (#11873)
* [twitch:clips] Fix title extraction (#14566)
+ [ndtv] Add support for sub-sites (#14534)
* [dramafever] Fix login error message extraction
+ [nick] Add support for more nickelodeon sites (no, dk, se, ch, fr, es, pt,
ro, hu) (#14553)
version 2017.10.20
Core
* [downloader/fragment] Report warning instead of error on inconsistent
download state
* [downloader/hls] Fix total fragments count when ad fragments exist
Extractors
* [parliamentliveuk] Fix extraction (#14524)
* [soundcloud] Update client id (#14546)
+ [servus] Add support for servus.com (#14362)
+ [unity] Add support for unity3d.com (#14528)
* [youtube] Replace youtube redirect URLs in description (#14517)
* [pbs] Restrict direct video URL regular expression (#14519)
* [drtv] Respect preference for direct HTTP formats (#14509)
+ [eporner] Add support for embed URLs (#14507)
* [arte] Capture and output error message
* [niconico] Improve uploader metadata extraction robustness (#14135)
version 2017.10.15.1
Core
* [downloader/hls] Ignore anvato ad fragments (#14496)
* [downloader/fragment] Output ad fragment count
Extractors
* [scrippsnetworks:watch] Bypass geo restriction
+ [anvato] Add ability to bypass geo restriction
* [redditr] Fix extraction for URLs with query (#14495)
version 2017.10.15
Core
+ [common] Add support for jwplayer youtube embeds
Extractors
* [scrippsnetworks:watch] Fix extraction (#14389)
* [anvato] Process master m3u8 manifests
* [youtube] Fix relative URLs in description
* [spike] Bypass geo restriction
+ [howstuffworks] Add support for more domains
* [infoq] Fix http format downloading
+ [rtlnl] Add support for another type of embeds
+ [onionstudios] Add support for bulbs-video embeds
* [udn] Fix extraction
* [shahid] Fix extraction (#14448)
* [kaltura] Ignore Widevine encrypted video (.wvm) (#14471)
* [vh1] Fix extraction (#9613)
version 2017.10.12
Core
* [YoutubeDL] Improve _default_format_spec (#14461)
Extractors
* [steam] Fix extraction (#14067)
+ [funk] Add support for funk.net (#14464)
+ [nexx] Add support for shortcuts and relax domain id extraction
+ [voxmedia] Add support for recode.net (#14173)
+ [once] Add support for vmap URLs
+ [generic] Add support for channel9 embeds (#14469)
* [tva] Fix extraction (#14328)
+ [tubitv] Add support for new URL format (#14460)
- [afreecatv:global] Remove extractor
- [youtube:shared] Removed extractor (#14420)
+ [slideslive] Add support for slideslive.com (#2680)
+ [facebook] Support thumbnails (#14416)
* [vvvvid] Fix episode number extraction (#14456)
* [hrti:playlist] Relax URL regular expression
* [wdr] Relax media link regular expression (#14447)
* [hrti] Relax URL regular expression (#14443)
* [fox] Delegate extraction to uplynk:preplay (#14147)
+ [youtube] Add support for hooktube.com (#14437)
version 2017.10.07
Core
* [YoutubeDL] Ignore duplicates in --playlist-items
* [YoutubeDL] Fix out of range --playlist-items for iterable playlists and
reduce code duplication (#14425)
+ [utils] Use cache in OnDemandPagedList by default
* [postprocessor/ffmpeg] Convert to opus using libopus (#14381)
Extractors
* [reddit] Sort formats (#14430)
* [lnkgo] Relax URL regular expression (#14423)
* [pornflip] Extend URL regular expression (#14405, #14406)
+ [xtube] Add support for embed URLs (#14417)
+ [xvideos] Add support for embed URLs and improve extraction (#14409)
* [beeg] Fix extraction (#14403)
* [tvn24] Relax URL regular expression (#14395)
* [nbc] Fix extraction (#13651, #13715, #14137, #14198, #14312, #14314, #14378,
#14392, #14414, #14419, #14431)
+ [ketnet] Add support for videos without direct sources (#14377)
* [canvas] Generalize mediazone.vrt.be extractor and rework canvas and een
+ [afreecatv] Add support for adult videos (#14376)
version 2017.10.01
Core
* [YoutubeDL] Document youtube_include_dash_manifest
Extractors
+ [tvp] Add support for new URL schema (#14368)
+ [generic] Add support for single format Video.js embeds (#14371)
* [yahoo] Bypass geo restriction for brightcove (#14210)
* [yahoo] Use extracted brightcove account id (#14210)
* [rtve:alacarta] Fix extraction (#14290)
+ [yahoo] Add support for custom brigthcove embeds (#14210)
+ [generic] Add support for Video.js embeds
+ [gfycat] Add support for /gifs/detail URLs (#14322)
* [generic] Fix infinite recursion for twitter:player URLs (#14339)
* [xhamsterembed] Fix extraction (#14308)
version 2017.09.24
Core
+ [options] Accept lrc as a subtitle conversion target format (#14292)
* [utils] Fix handling raw TTML subtitles (#14191)
Extractors
* [24video] Fix timestamp extraction and make non fatal (#14295)
+ [24video] Add support for 24video.adult (#14295)
+ [kakao] Add support for tv.kakao.com (#12298, #14007)
+ [twitter] Add support for URLs without user id (#14270)
+ [americastestkitchen] Add support for americastestkitchen.com (#10764,
#13996)
* [generic] Fix support for multiple HTML5 videos on one page (#14080)
* [mixcloud] Fix extraction (#14088, #14132)
+ [lynda] Add support for educourse.ga (#14286)
* [beeg] Fix extraction (#14275)
* [nbcsports:vplayer] Correct theplatform URL (#13873)
* [twitter] Fix duration extraction (#14141)
* [tvplay] Bypass geo restriction
+ [heise] Add support for YouTube embeds (#14109)
+ [popcorntv] Add support for popcorntv.it (#5914, #14211)
* [viki] Update app data (#14181)
* [morningstar] Relax URL regular expression (#14222)
* [openload] Fix extraction (#14225, #14257)
* [noovo] Fix extraction (#14214)
* [dailymotion:playlist] Relax URL regular expression (#14219)
+ [twitch] Add support for go.twitch.tv URLs (#14215)
* [vgtv] Relax URL regular expression (#14223)
version 2017.09.15
Core
* [downloader/fragment] Restart inconsistent incomplete fragment downloads
(#13731)
* [YoutubeDL] Download raw subtitles files (#12909, #14191)
Extractors
* [condenast] Fix extraction (#14196, #14207)
+ [orf] Add support for f4m stories
* [tv4] Relax URL regular expression (#14206)
* [animeondemand] Bypass geo restriction
+ [animeondemand] Add support for flash videos (#9944)
version 2017.09.11
Extractors
* [rutube:playlist] Fix suitable (#14166)
version 2017.09.10
Core
+ [utils] Introduce bool_or_none
* [YoutubeDL] Ensure dir existence for each requested format (#14116)
Extractors
* [fox] Fix extraction (#14147)
* [rutube] Use bool_or_none
* [rutube] Rework and generalize playlist extractors (#13565)
+ [rutube:playlist] Add support for playlists (#13534, #13565)
+ [radiocanada] Add fallback for title extraction (#14145)
* [vk] Use dedicated YouTube embeds extraction routine
* [vice] Use dedicated YouTube embeds extraction routine
* [cracked] Use dedicated YouTube embeds extraction routine
* [chilloutzone] Use dedicated YouTube embeds extraction routine
* [abcnews] Use dedicated YouTube embeds extraction routine
* [youtube] Separate methods for embeds extraction
* [redtube] Fix formats extraction (#14122)
* [arte] Relax unavailability check (#14112)
+ [manyvids] Add support for preview videos from manyvids.com (#14053, #14059)
* [vidme:user] Relax URL regular expression (#14054)
* [bpb] Fix extraction (#14043, #14086)
* [soundcloud] Fix download URL with private tracks (#14093)
* [aliexpress:live] Add support for live.aliexpress.com (#13698, #13707)
* [viidea] Capture and output lecture error message (#14099)
* [radiocanada] Skip unsupported platforms (#14100)
version 2017.09.02
Extractors
* [youtube] Force old layout for each webpage (#14068, #14072, #14074, #14076,
#14077, #14079, #14082, #14083, #14094, #14095, #14096)
* [youtube] Fix upload date extraction (#14065)
+ [charlierose] Add support for episodes (#14062)
+ [bbccouk] Add support for w-prefixed ids (#14056)
* [googledrive] Extend URL regular expression (#9785)
+ [googledrive] Add support for source format (#14046)
* [pornhd] Fix extraction (#14005)
version 2017.08.27.1 version 2017.08.27.1
Extractors Extractors
@ -640,7 +998,7 @@ version 2017.04.14
Core Core
+ [downloader/hls] Add basic support for EXT-X-BYTERANGE tag (#10955) + [downloader/hls] Add basic support for EXT-X-BYTERANGE tag (#10955)
+ [adobepass] Improve Comcast and Verison login code (#10803) + [adobepass] Improve Comcast and Verizon login code (#10803)
+ [adobepass] Add support for Verizon (#10803) + [adobepass] Add support for Verizon (#10803)
Extractors Extractors

View File

@ -36,8 +36,17 @@ test:
ot: offlinetest ot: offlinetest
# Keep this list in sync with devscripts/run_tests.sh
offlinetest: codetest offlinetest: codetest
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py $(PYTHON) -m nose --verbose test \
--exclude test_age_restriction.py \
--exclude test_download.py \
--exclude test_iqiyi_sdk_interpreter.py \
--exclude test_socks.py \
--exclude test_subtitles.py \
--exclude test_write_annotations.py \
--exclude test_youtube_lists.py \
--exclude test_youtube_signature.py
tar: youtube-dl.tar.gz tar: youtube-dl.tar.gz
@ -110,11 +119,10 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*~' \ --exclude '*~' \
--exclude '__pycache__' \ --exclude '__pycache__' \
--exclude '.git' \ --exclude '.git' \
--exclude 'testdata' \
--exclude 'docs/_build' \ --exclude 'docs/_build' \
-- \ -- \
bin devscripts test youtube_dl docs \ bin devscripts test youtube_dl docs \
ChangeLog LICENSE README.md README.txt \ ChangeLog LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \ Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \ youtube-dl.zsh youtube-dl.fish setup.py setup.cfg \
youtube-dl youtube-dl

View File

@ -1,3 +1,5 @@
[![Build Status](https://travis-ci.org/rg3/youtube-dl.svg?branch=master)](https://travis-ci.org/rg3/youtube-dl)
youtube-dl - download videos from youtube.com or other video platforms youtube-dl - download videos from youtube.com or other video platforms
- [INSTALLATION](#installation) - [INSTALLATION](#installation)
@ -427,7 +429,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
syntax. Example: --exec 'adb push {} syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}' /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format --convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt) (currently supported: srt|ass|vtt|lrc)
# CONFIGURATION # CONFIGURATION
@ -509,6 +511,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage - `average_rating` (numeric): Average rating give by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video - `comment_count` (numeric): Number of comments on the video
- `age_limit` (numeric): Age restriction for the video (years) - `age_limit` (numeric): Age restriction for the video (years)
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format - `format` (string): A human-readable description of the format
- `format_id` (string): Format code specified by `--format` - `format_id` (string): Format code specified by `--format`
- `format_note` (string): Additional info about the format - `format_note` (string): Additional info about the format
@ -936,6 +941,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py python test/test_download.py
nosetests nosetests
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
@ -1003,7 +1010,7 @@ After you have ensured this site is distributing its content legally, you can fo
} }
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this: 9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
@ -1165,7 +1172,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc']) ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
``` ```
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object. Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/3e4cedf9e8cd3157df2457df7274d0c842421945/youtube_dl/YoutubeDL.py#L137-L312). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file: Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:

View File

@ -14,7 +14,7 @@ import os
import sys import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases from test.helper import gettestcases
from youtube_dl.utils import compat_urllib_parse_urlparse from youtube_dl.utils import compat_urllib_parse_urlparse
from youtube_dl.utils import compat_urllib_request from youtube_dl.utils import compat_urllib_request
@ -24,7 +24,7 @@ if len(sys.argv) > 1:
else: else:
METHOD = 'EURISTIC' METHOD = 'EURISTIC'
for test in get_testcases(): for test in gettestcases():
if METHOD == 'EURISTIC': if METHOD == 'EURISTIC':
try: try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read() webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()

View File

@ -1,6 +1,7 @@
#!/bin/bash #!/bin/bash
DOWNLOAD_TESTS="age_restriction|download|subtitles|write_annotations|iqiyi_sdk_interpreter|youtube_lists" # Keep this list in sync with the `offlinetest` target in Makefile
DOWNLOAD_TESTS="age_restriction|download|iqiyi_sdk_interpreter|socks|subtitles|write_annotations|youtube_lists|youtube_signature"
test_set="" test_set=""
multiprocess_args="" multiprocess_args=""

View File

@ -3,8 +3,6 @@
- **1up.com** - **1up.com**
- **20min** - **20min**
- **220.ro** - **220.ro**
- **22tracks:genre**
- **22tracks:track**
- **24video** - **24video**
- **3qsdn**: 3Q SDN - **3qsdn**: 3Q SDN
- **3sat** - **3sat**
@ -36,12 +34,13 @@
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **afreecatv**: afreecatv.com - **afreecatv**: afreecatv.com
- **afreecatv:global**: afreecatv.com
- **AirMozilla** - **AirMozilla**
- **AliExpressLive**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno** - **AlphaPorno**
- **AMCNetworks** - **AMCNetworks**
- **AmericasTestKitchen**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **anitube.se** - **anitube.se**
@ -113,11 +112,12 @@
- **BokeCC** - **BokeCC**
- **BostonGlobe** - **BostonGlobe**
- **Bpb**: Bundeszentrale für politische Bildung - **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek - **BR**: Bayerischer Rundfunk
- **BravoTV** - **BravoTV**
- **Break** - **Break**
- **brightcove:legacy** - **brightcove:legacy**
- **brightcove:new** - **brightcove:new**
- **BRMediathek**: Bayerischer Rundfunk Mediathek
- **bt:article**: Bergens Tidende Articles - **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed** - **BuzzFeed**
@ -128,7 +128,8 @@
- **CamWithHer** - **CamWithHer**
- **canalc2.tv** - **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**: canvas.be and een.be - **Canvas**
- **CanvasEen**: canvas.be and een.be
- **CarambaTV** - **CarambaTV**
- **CarambaTVPage** - **CarambaTVPage**
- **CartoonNetwork** - **CartoonNetwork**
@ -197,9 +198,8 @@
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
- **dailymotion:user** - **dailymotion:user**
- **DailymotionCloud** - **DaisukiMotto**
- **Daisuki** - **DaisukiMottoPlaylist**
- **DaisukiPlaylist**
- **daum.net** - **daum.net**
- **daum.net:clip** - **daum.net:clip**
- **daum.net:playlist** - **daum.net:playlist**
@ -242,8 +242,9 @@
- **eHow** - **eHow**
- **Einthusan** - **Einthusan**
- **eitb.tv** - **eitb.tv**
- **EllenTV** - **EllenTube**
- **EllenTV:clips** - **EllenTubePlaylist**
- **EllenTubeVideo**
- **ElPais**: El País - **ElPais**: El País
- **Embedly** - **Embedly**
- **EMPFlix** - **EMPFlix**
@ -266,10 +267,8 @@
- **fc2** - **fc2**
- **fc2:embed** - **fc2:embed**
- **Fczenit** - **Fczenit**
- **fernsehkritik.tv**
- **filmon** - **filmon**
- **filmon:channel** - **filmon:channel**
- **Firstpost**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
- **Flipagram** - **Flipagram**
@ -283,7 +282,7 @@
- **foxnews:article** - **foxnews:article**
- **foxnews:insider** - **foxnews:insider**
- **FoxSports** - **FoxSports**
- **france2.fr:generation-quoi** - **france2.fr:generation-what**
- **FranceCulture** - **FranceCulture**
- **FranceInter** - **FranceInter**
- **FranceTV** - **FranceTV**
@ -293,6 +292,7 @@
- **freespeech.org** - **freespeech.org**
- **FreshLive** - **FreshLive**
- **Funimation** - **Funimation**
- **Funk**
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion** - **Fusion**
- **Fux** - **Fux**
@ -300,7 +300,6 @@
- **GameInformer** - **GameInformer**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
- **Gamersyde**
- **GameSpot** - **GameSpot**
- **GameStar** - **GameStar**
- **Gaskrank** - **Gaskrank**
@ -339,6 +338,7 @@
- **HornBunny** - **HornBunny**
- **HotNewHipHop** - **HotNewHipHop**
- **HotStar** - **HotStar**
- **hotstar:playlist**
- **Howcast** - **Howcast**
- **HowStuffWorks** - **HowStuffWorks**
- **HRTi** - **HRTi**
@ -377,6 +377,7 @@
- **Jove** - **Jove**
- **jpopsuki.tv** - **jpopsuki.tv**
- **JWPlatform** - **JWPlatform**
- **Kakao**
- **Kaltura** - **Kaltura**
- **Kamcord** - **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play - **KanalPlay**: Kanal 5/9/11 Play
@ -437,6 +438,8 @@
- **MakerTV** - **MakerTV**
- **mangomolo:live** - **mangomolo:live**
- **mangomolo:video** - **mangomolo:video**
- **ManyVids**
- **massengeschmack.tv**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **media.ccc.de** - **media.ccc.de**
@ -493,7 +496,6 @@
- **MySpace:album** - **MySpace:album**
- **MySpass** - **MySpass**
- **Myvi** - **Myvi**
- **myvideo** (Currently broken)
- **MyVidster** - **MyVidster**
- **n-tv.de** - **n-tv.de**
- **natgeo** - **natgeo**
@ -591,6 +593,7 @@
- **Openload** - **Openload**
- **OraTV** - **OraTV**
- **orf:fm4**: radio FM4 - **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
@ -604,6 +607,7 @@
- **pcmag** - **pcmag**
- **PearVideo** - **PearVideo**
- **People** - **People**
- **PerformGroup**
- **periscope**: Periscope - **periscope**: Periscope
- **periscope:user**: Periscope user videos - **periscope:user**: Periscope user videos
- **PhilharmonieDeParis**: Philharmonie de Paris - **PhilharmonieDeParis**: Philharmonie de Paris
@ -624,6 +628,7 @@
- **Pokemon** - **Pokemon**
- **PolskieRadio** - **PolskieRadio**
- **PolskieRadioCategory** - **PolskieRadioCategory**
- **PopcornTV**
- **PornCom** - **PornCom**
- **PornerBros** - **PornerBros**
- **PornFlip** - **PornFlip**
@ -657,6 +662,7 @@
- **Rai** - **Rai**
- **RaiPlay** - **RaiPlay**
- **RaiPlayLive** - **RaiPlayLive**
- **RaiPlayPlaylist**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBullTV** - **RedBullTV**
@ -701,6 +707,7 @@
- **rutube:embed**: Rutube embedded videos - **rutube:embed**: Rutube embedded videos
- **rutube:movie**: Rutube movies - **rutube:movie**: Rutube movies
- **rutube:person**: Rutube person videos - **rutube:person**: Rutube person videos
- **rutube:playlist**: Rutube playlists
- **RUTV**: RUTV.RU - **RUTV**: RUTV.RU
- **Ruutu** - **Ruutu**
- **Ruv** - **Ruv**
@ -720,6 +727,7 @@
- **SenateISVP** - **SenateISVP**
- **SendtoNews** - **SendtoNews**
- **ServingSys** - **ServingSys**
- **Servus**
- **Sexu** - **Sexu**
- **Shahid** - **Shahid**
- **Shared**: shared.sx - **Shared**: shared.sx
@ -730,6 +738,7 @@
- **skynewsarabia:video** - **skynewsarabia:video**
- **SkySports** - **SkySports**
- **Slideshare** - **Slideshare**
- **SlidesLive**
- **Slutload** - **Slutload**
- **smotri**: Smotri.com - **smotri**: Smotri.com
- **smotri:broadcast**: Smotri.com broadcasts - **smotri:broadcast**: Smotri.com broadcasts
@ -773,6 +782,7 @@
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
- **StretchInternet**
- **SunPorno** - **SunPorno**
- **SVT** - **SVT**
- **SVTPlay**: SVT Play and Öppet arkiv - **SVTPlay**: SVT Play and Öppet arkiv
@ -878,6 +888,7 @@
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **UKTVPlay** - **UKTVPlay**
- **Unistra** - **Unistra**
- **Unity**
- **uol.com.br** - **uol.com.br**
- **uplynk** - **uplynk**
- **uplynk:preplay** - **uplynk:preplay**
@ -961,10 +972,12 @@
- **VoiceRepublic** - **VoiceRepublic**
- **Voot** - **Voot**
- **VoxMedia** - **VoxMedia**
- **VoxMediaVolume**
- **Vporn** - **Vporn**
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **Vrak** - **Vrak**
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be - **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
- **VrtNU**: VrtNU.be
- **vrv** - **vrv**
- **vrv:series** - **vrv:series**
- **VShare** - **VShare**
@ -1023,6 +1036,9 @@
- **YouJizz** - **YouJizz**
- **youku**: 优酷 - **youku**: 优酷
- **youku:show** - **youku:show**
- **YouNowChannel**
- **YouNowLive**
- **YouNowMoment**
- **YouPorn** - **YouPorn**
- **YourUpload** - **YourUpload**
- **youtube**: YouTube.com - **youtube**: YouTube.com
@ -1036,7 +1052,6 @@
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)

View File

@ -562,7 +562,89 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'width': 1920, 'width': 1920,
'height': 1080, 'height': 1080,
}] }]
), ), (
# https://github.com/rg3/youtube-dl/pull/14844
'urls_only',
'http://unknown/manifest.mpd',
[{
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_144p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 200,
'width': 256,
'height': 144,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_240p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 400,
'width': 424,
'height': 240,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_360p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 800,
'width': 640,
'height': 360,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_480p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 1200,
'width': 856,
'height': 480,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_576p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 1600,
'width': 1024,
'height': 576,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_720p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 2400,
'width': 1280,
'height': 720,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': 'h264_aac_1080p_m4s',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'avc3.42c01e',
'tbr': 4400,
'width': 1920,
'height': 1080,
}]
)
] ]
for mpd_file, mpd_url, expected_formats in _TEST_CASES: for mpd_file, mpd_url, expected_formats in _TEST_CASES:
@ -574,6 +656,33 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
self.ie._sort_formats(formats) self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None) expect_value(self, formats, expected_formats, None)
def test_parse_f4m_formats(self):
_TEST_CASES = [
(
# https://github.com/rg3/youtube-dl/issues/14660
'custom_base_url',
'http://api.new.livestream.com/accounts/6115179/events/6764928/videos/144884262.f4m',
[{
'manifest_url': 'http://api.new.livestream.com/accounts/6115179/events/6764928/videos/144884262.f4m',
'ext': 'flv',
'format_id': '2148',
'protocol': 'f4m',
'tbr': 2148,
'width': 1280,
'height': 720,
}]
),
]
for f4m_file, f4m_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/f4m/%s.f4m' % f4m_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_f4m_formats(
compat_etree_fromstring(f.read().encode('utf-8')),
f4m_url, None)
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -466,12 +466,18 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL({'simulate': True}) ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best') self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'bestvideo+bestaudio/best')
ydl = YDL({'outtmpl': '-'}) ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best') self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({}) ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best') self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best') self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
class TestYoutubeDL(unittest.TestCase): class TestYoutubeDL(unittest.TestCase):
@ -770,6 +776,12 @@ class TestYoutubeDL(unittest.TestCase):
result = get_ids({'playlist_items': '10'}) result = get_ids({'playlist_items': '10'})
self.assertEqual(result, []) self.assertEqual(result, [])
result = get_ids({'playlist_items': '3-10'})
self.assertEqual(result, [3, 4])
result = get_ids({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result, [2, 3, 4])
def test_urlopen_no_file_protocol(self): def test_urlopen_no_file_protocol(self):
# see https://github.com/rg3/youtube-dl/issues/8227 # see https://github.com/rg3/youtube-dl/issues/8227
ydl = YDL() ydl = YDL()

View File

@ -540,6 +540,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('87 Min.'), 5220) self.assertEqual(parse_duration('87 Min.'), 5220)
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04) self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
self.assertEqual(parse_duration('PT00H03M30SZ'), 210) self.assertEqual(parse_duration('PT00H03M30SZ'), 210)
self.assertEqual(parse_duration('P0Y0M0DT0H4M20.880S'), 260.88)
def test_fix_xml_ampersands(self): def test_fix_xml_ampersands(self):
self.assertEqual( self.assertEqual(
@ -1064,7 +1065,7 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
<p begin="3" dur="-1">Ignored, three</p> <p begin="3" dur="-1">Ignored, three</p>
</div> </div>
</body> </body>
</tt>''' </tt>'''.encode('utf-8')
srt_data = '''1 srt_data = '''1
00:00:00,000 --> 00:00:01,000 00:00:00,000 --> 00:00:01,000
The following line contains Chinese characters and special symbols The following line contains Chinese characters and special symbols
@ -1089,7 +1090,7 @@ Line
<p begin="0" end="1">The first line</p> <p begin="0" end="1">The first line</p>
</div> </div>
</body> </body>
</tt>''' </tt>'''.encode('utf-8')
srt_data = '''1 srt_data = '''1
00:00:00,000 --> 00:00:01,000 00:00:00,000 --> 00:00:01,000
The first line The first line
@ -1115,7 +1116,7 @@ The first line
<p style="s1" tts:textDecoration="underline" begin="00:00:09.56" id="p2" end="00:00:12.36"><span style="s2" tts:color="lime">inner<br /> </span>style</p> <p style="s1" tts:textDecoration="underline" begin="00:00:09.56" id="p2" end="00:00:12.36"><span style="s2" tts:color="lime">inner<br /> </span>style</p>
</div> </div>
</body> </body>
</tt>''' </tt>'''.encode('utf-8')
srt_data = '''1 srt_data = '''1
00:00:02,080 --> 00:00:05,839 00:00:02,080 --> 00:00:05,839
<font color="white" face="sansSerif" size="16">default style<font color="red">custom style</font></font> <font color="white" face="sansSerif" size="16">default style<font color="red">custom style</font></font>
@ -1138,6 +1139,26 @@ part 3</font></u>
''' '''
self.assertEqual(dfxp2srt(dfxp_data_with_style), srt_data) self.assertEqual(dfxp2srt(dfxp_data_with_style), srt_data)
dfxp_data_non_utf8 = '''<?xml version="1.0" encoding="UTF-16"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:tts="http://www.w3.org/ns/ttml#parameter">
<body>
<div xml:lang="en">
<p begin="0" end="1">Line 1</p>
<p begin="1" end="2">第二行</p>
</div>
</body>
</tt>'''.encode('utf-16')
srt_data = '''1
00:00:00,000 --> 00:00:01,000
Line 1
2
00:00:01,000 --> 00:00:02,000
第二行
'''
self.assertEqual(dfxp2srt(dfxp_data_non_utf8), srt_data)
def test_cli_option(self): def test_cli_option(self):
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128']) self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])

10
test/testdata/f4m/custom_base_url.f4m vendored Normal file
View File

@ -0,0 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest xmlns="http://ns.adobe.com/f4m/1.0">
<streamType>recorded</streamType>
<baseURL>http://vod.livestream.com/events/0000000000673980/</baseURL>
<duration>269.293</duration>
<bootstrapInfo profile="named" id="bootstrap_1">AAAAm2Fic3QAAAAAAAAAAQAAAAPoAAAAAAAEG+0AAAAAAAAAAAAAAAAAAQAAABlhc3J0AAAAAAAAAAABAAAAAQAAAC4BAAAAVmFmcnQAAAAAAAAD6AAAAAAEAAAAAQAAAAAAAAAAAAAXcAAAAC0AAAAAAAQHQAAAE5UAAAAuAAAAAAAEGtUAAAEYAAAAAAAAAAAAAAAAAAAAAAA=</bootstrapInfo>
<media url="b90f532f-b0f6-4f4e-8289-706d490b2fd8_2292" bootstrapInfoId="bootstrap_1" bitrate="2148" width="1280" height="720" videoCodec="avc1.4d401f" audioCodec="mp4a.40.2">
<metadata>AgAKb25NZXRhRGF0YQgAAAAIAAhkdXJhdGlvbgBAcNSwIMSbpgAFd2lkdGgAQJQAAAAAAAAABmhlaWdodABAhoAAAAAAAAAJZnJhbWVyYXRlAEA4/7DoLwW3AA12aWRlb2RhdGFyYXRlAECe1DLgjcobAAx2aWRlb2NvZGVjaWQAQBwAAAAAAAAADWF1ZGlvZGF0YXJhdGUAQGSimlvaPKQADGF1ZGlvY29kZWNpZABAJAAAAAAAAAAACQ==</metadata>
</media>
</manifest>

218
test/testdata/mpd/urls_only.mpd vendored Normal file
View File

@ -0,0 +1,218 @@
<?xml version="1.0" ?>
<MPD maxSegmentDuration="PT0H0M10.000S" mediaPresentationDuration="PT0H4M1.728S" minBufferTime="PT1.500S" profiles="urn:mpeg:dash:profile:isoff-main:2011" type="static" xmlns="urn:mpeg:dash:schema:mpd:2011">
<Period duration="PT0H4M1.728S">
<AdaptationSet bitstreamSwitching="true" lang="und" maxHeight="1080" maxWidth="1920" par="16:9" segmentAlignment="true">
<ContentComponent contentType="video" id="1"/>
<Representation audioSamplingRate="44100" bandwidth="200000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="144" id="h264_aac_144p_m4s" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="256">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_144p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="400000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="240" id="h264_aac_240p_m4s" mimeType="video/mp4" sar="160:159" startWithSAP="1" width="424">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_240p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="800000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="360" id="h264_aac_360p_m4s" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="640">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_360p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="1200000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="480" id="h264_aac_480p_m4s" mimeType="video/mp4" sar="320:321" startWithSAP="1" width="856">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_480p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="1600000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="576" id="h264_aac_576p_m4s" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="1024">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_576p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="2400000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="720" id="h264_aac_720p_m4s" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="1280">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_720p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
<Representation audioSamplingRate="44100" bandwidth="4400000" codecs="avc3.42c01e,mp4a.40.2" frameRate="25" height="1080" id="h264_aac_1080p_m4s" mimeType="video/mp4" sar="1:1" startWithSAP="1" width="1920">
<SegmentList duration="10000" timescale="1000">
<Initialization sourceURL="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/init/432f65a0.mp4"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/0/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/1/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/2/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/3/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/4/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/5/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/6/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/7/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/8/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/9/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/10/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/11/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/12/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/13/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/14/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/15/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/16/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/17/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/18/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/19/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/20/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/21/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/22/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/23/432f65a0.m4s"/>
<SegmentURL media="../vd_5999c902ea707c67d8e267a9_1503250723/h264_aac_1080p_m4s/24/432f65a0.m4s"/>
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
</MPD>

View File

@ -65,6 +65,7 @@ from .utils import (
locked_file, locked_file,
make_HTTPS_handler, make_HTTPS_handler,
MaxDownloadsReached, MaxDownloadsReached,
orderedSet,
PagedList, PagedList,
parse_filesize, parse_filesize,
PerRequestProxyHandler, PerRequestProxyHandler,
@ -92,6 +93,7 @@ from .utils import (
) )
from .cache import Cache from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .extractor.openload import PhantomJSwrapper
from .downloader import get_suitable_downloader from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version from .downloader.rtmp import rtmpdump_version
from .postprocessor import ( from .postprocessor import (
@ -303,6 +305,12 @@ class YoutubeDL(object):
otherwise prefer avconv. otherwise prefer avconv.
postprocessor_args: A list of additional command-line arguments for the postprocessor_args: A list of additional command-line arguments for the
postprocessor. postprocessor.
The following options are used by the Youtube extractor:
youtube_include_dash_manifest: If True (default), DASH manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
care about DASH.
""" """
_NUMERIC_FIELDS = set(( _NUMERIC_FIELDS = set((
@ -901,15 +909,25 @@ class YoutubeDL(object):
yield int(item) yield int(item)
else: else:
yield int(string_segment) yield int(string_segment)
playlistitems = iter_playlistitems(playlistitems_str) playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries'] ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list): if isinstance(ie_entries, list):
n_all_entries = len(ie_entries) n_all_entries = len(ie_entries)
if playlistitems: if playlistitems:
entries = [ entries = make_playlistitems_entries(ie_entries)
ie_entries[i - 1] for i in playlistitems
if -n_all_entries <= i - 1 < n_all_entries]
else: else:
entries = ie_entries[playliststart:playlistend] entries = ie_entries[playliststart:playlistend]
n_entries = len(entries) n_entries = len(entries)
@ -927,20 +945,16 @@ class YoutubeDL(object):
entries = ie_entries.getslice( entries = ie_entries.getslice(
playliststart, playlistend) playliststart, playlistend)
n_entries = len(entries) n_entries = len(entries)
self.to_screen( report_download(n_entries)
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
else: # iterable else: # iterable
if playlistitems: if playlistitems:
entry_list = list(ie_entries) entries = make_playlistitems_entries(list(itertools.islice(
entries = [entry_list[i - 1] for i in playlistitems] ie_entries, 0, max(playlistitems))))
else: else:
entries = list(itertools.islice( entries = list(itertools.islice(
ie_entries, playliststart, playlistend)) ie_entries, playliststart, playlistend))
n_entries = len(entries) n_entries = len(entries)
self.to_screen( report_download(n_entries)
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
if self.params.get('playlistreverse', False): if self.params.get('playlistreverse', False):
entries = entries[::-1] entries = entries[::-1]
@ -1065,22 +1079,27 @@ class YoutubeDL(object):
return _filter return _filter
def _default_format_spec(self, info_dict, download=True): def _default_format_spec(self, info_dict, download=True):
req_format_list = []
def can_have_partial_formats(): def can_merge():
if self.params.get('simulate', False):
return True
if not download:
return True
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
return False
if info_dict.get('is_live'):
return False
merger = FFmpegMergerPP(self) merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge() return merger.available and merger.can_merge()
if can_have_partial_formats():
req_format_list.append('bestvideo+bestaudio') def prefer_best():
req_format_list.append('best') if self.params.get('simulate', False):
return False
if not download:
return False
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
return True
if info_dict.get('is_live'):
return True
if not can_merge():
return True
return False
req_format_list = ['bestvideo+bestaudio', 'best']
if prefer_best():
req_format_list.reverse()
return '/'.join(req_format_list) return '/'.join(req_format_list)
def build_format_selector(self, format_spec): def build_format_selector(self, format_spec):
@ -1710,12 +1729,17 @@ class YoutubeDL(object):
if filename is None: if filename is None:
return return
try: def ensure_dir_exists(path):
dn = os.path.dirname(sanitize_path(encodeFilename(filename))) try:
if dn and not os.path.exists(dn): dn = os.path.dirname(path)
os.makedirs(dn) if dn and not os.path.exists(dn):
except (OSError, IOError) as err: os.makedirs(dn)
self.report_error('unable to create directory ' + error_to_compat_str(err)) return True
except (OSError, IOError) as err:
self.report_error('unable to create directory ' + error_to_compat_str(err))
return False
if not ensure_dir_exists(sanitize_path(encodeFilename(filename))):
return return
if self.params.get('writedescription', False): if self.params.get('writedescription', False):
@ -1758,29 +1782,30 @@ class YoutubeDL(object):
ie = self.get_info_extractor(info_dict['extractor_key']) ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items(): for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext'] sub_format = sub_info['ext']
if sub_info.get('data') is not None: sub_filename = subtitles_filename(filename, sub_lang, sub_format)
sub_data = sub_info['data'] if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
else: else:
try: self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
sub_data = ie._download_webpage( if sub_info.get('data') is not None:
sub_info['url'], info_dict['id'], note=False) try:
except ExtractorError as err: # Use newline='' to prevent conversion of newline characters
self.report_warning('Unable to download subtitle for "%s": %s' % # See https://github.com/rg3/youtube-dl/issues/10268
(sub_lang, error_to_compat_str(err.cause))) with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
continue subfile.write(sub_info['data'])
try: except (OSError, IOError):
sub_filename = subtitles_filename(filename, sub_lang, sub_format) self.report_error('Cannot write subtitles file ' + sub_filename)
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)): return
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
else: else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename) try:
# Use newline='' to prevent conversion of newline characters sub_data = ie._request_webpage(
# See https://github.com/rg3/youtube-dl/issues/10268 sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile: with io.open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data) subfile.write(sub_data)
except (OSError, IOError): except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_error('Cannot write subtitles file ' + sub_filename) self.report_warning('Unable to download subtitle for "%s": %s' %
return (sub_lang, error_to_compat_str(err)))
continue
if self.params.get('writeinfojson', False): if self.params.get('writeinfojson', False):
infofn = replace_extension(filename, 'info.json', info_dict.get('ext')) infofn = replace_extension(filename, 'info.json', info_dict.get('ext'))
@ -1853,8 +1878,11 @@ class YoutubeDL(object):
for f in requested_formats: for f in requested_formats:
new_info = dict(info_dict) new_info = dict(info_dict)
new_info.update(f) new_info.update(f)
fname = self.prepare_filename(new_info) fname = prepend_extension(
fname = prepend_extension(fname, 'f%s' % f['format_id'], new_info['ext']) self.prepare_filename(new_info),
'f%s' % f['format_id'], new_info['ext'])
if not ensure_dir_exists(fname):
return
downloaded.append(fname) downloaded.append(fname)
partial_success = dl(fname, new_info) partial_success = dl(fname, new_info)
success = success and partial_success success = success and partial_success
@ -2208,6 +2236,7 @@ class YoutubeDL(object):
exe_versions = FFmpegPostProcessor.get_versions(self) exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version() exe_versions['rtmpdump'] = rtmpdump_version()
exe_versions['phantomjs'] = PhantomJSwrapper._version()
exe_str = ', '.join( exe_str = ', '.join(
'%s %s' % (exe, v) '%s %s' % (exe, v)
for exe, v in sorted(exe_versions.items()) for exe, v in sorted(exe_versions.items())

View File

@ -206,7 +206,7 @@ def _real_main(argv=None):
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv', 'avi']: if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv', 'avi']:
parser.error('invalid video recode format specified') parser.error('invalid video recode format specified')
if opts.convertsubtitles is not None: if opts.convertsubtitles is not None:
if opts.convertsubtitles not in ['srt', 'vtt', 'ass']: if opts.convertsubtitles not in ['srt', 'vtt', 'ass', 'lrc']:
parser.error('invalid subtitle format specified') parser.error('invalid subtitle format specified')
if opts.date is not None: if opts.date is not None:

View File

@ -6,6 +6,7 @@ import collections
import email import email
import getpass import getpass
import io import io
import itertools
import optparse import optparse
import os import os
import re import re
@ -15,7 +16,6 @@ import socket
import struct import struct
import subprocess import subprocess
import sys import sys
import itertools
import xml.etree.ElementTree import xml.etree.ElementTree
@ -2898,6 +2898,13 @@ else:
compat_struct_pack = struct.pack compat_struct_pack = struct.pack
compat_struct_unpack = struct.unpack compat_struct_unpack = struct.unpack
try:
from future_builtins import zip as compat_zip
except ImportError: # not 2.6+ or is 3.x
try:
from itertools import izip as compat_zip # < 2.5 or 3.x
except ImportError:
compat_zip = zip
__all__ = [ __all__ = [
'compat_HTMLParseError', 'compat_HTMLParseError',
@ -2948,5 +2955,6 @@ __all__ = [
'compat_urlretrieve', 'compat_urlretrieve',
'compat_xml_parse_error', 'compat_xml_parse_error',
'compat_xpath', 'compat_xpath',
'compat_zip',
'workaround_optparse_bug9161', 'workaround_optparse_bug9161',
] ]

View File

@ -243,8 +243,17 @@ def remove_encrypted_media(media):
media)) media))
def _add_ns(prop): def _add_ns(prop, ver=1):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop return '{http://ns.adobe.com/f4m/%d.0}%s' % (ver, prop)
def get_base_url(manifest):
base_url = xpath_text(
manifest, [_add_ns('baseURL'), _add_ns('baseURL', 2)],
'base URL', default=None)
if base_url:
base_url = base_url.strip()
return base_url
class F4mFD(FragmentFD): class F4mFD(FragmentFD):
@ -330,13 +339,13 @@ class F4mFD(FragmentFD):
rate, media = list(filter( rate, media = list(filter(
lambda f: int(f[0]) == requested_bitrate, formats))[0] lambda f: int(f[0]) == requested_bitrate, formats))[0]
base_url = compat_urlparse.urljoin(man_url, media.attrib['url']) # Prefer baseURL for relative URLs as per 11.2 of F4M 3.0 spec.
man_base_url = get_base_url(doc) or man_url
base_url = compat_urlparse.urljoin(man_base_url, media.attrib['url'])
bootstrap_node = doc.find(_add_ns('bootstrapInfo')) bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
# From Adobe F4M 3.0 spec: boot_info, bootstrap_url = self._parse_bootstrap_node(
# The <baseURL> element SHALL be the base URL for all relative bootstrap_node, man_base_url)
# (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
# URLs should be relative to the location of the containing document.
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
live = boot_info['live'] live = boot_info['live']
metadata_node = media.find(_add_ns('metadata')) metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None: if metadata_node is not None:

View File

@ -107,6 +107,7 @@ class FragmentFD(FileDownloader):
def _append_fragment(self, ctx, frag_content): def _append_fragment(self, ctx, frag_content):
try: try:
ctx['dest_stream'].write(frag_content) ctx['dest_stream'].write(frag_content)
ctx['dest_stream'].flush()
finally: finally:
if self.__do_ytdl_file(ctx): if self.__do_ytdl_file(ctx):
self._write_ytdl_file(ctx) self._write_ytdl_file(ctx)
@ -117,9 +118,15 @@ class FragmentFD(FileDownloader):
def _prepare_frag_download(self, ctx): def _prepare_frag_download(self, ctx):
if 'live' not in ctx: if 'live' not in ctx:
ctx['live'] = False ctx['live'] = False
if not ctx['live']:
total_frags_str = '%d' % ctx['total_frags']
ad_frags = ctx.get('ad_frags', 0)
if ad_frags:
total_frags_str += ' (not including %d ad)' % ad_frags
else:
total_frags_str = 'unknown (live)'
self.to_screen( self.to_screen(
'[%s] Total fragments: %s' '[%s] Total fragments: %s' % (self.FD_NAME, total_frags_str))
% (self.FD_NAME, ctx['total_frags'] if not ctx['live'] else 'unknown (live)'))
self.report_destination(ctx['filename']) self.report_destination(ctx['filename'])
dl = HttpQuietDownloader( dl = HttpQuietDownloader(
self.ydl, self.ydl,
@ -151,10 +158,15 @@ class FragmentFD(FileDownloader):
if self.__do_ytdl_file(ctx): if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))): if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
self._read_ytdl_file(ctx) self._read_ytdl_file(ctx)
if ctx['fragment_index'] > 0 and resume_len == 0:
self.report_warning(
'Inconsistent state of incomplete fragment download. '
'Restarting from the beginning...')
ctx['fragment_index'] = resume_len = 0
self._write_ytdl_file(ctx)
else: else:
self._write_ytdl_file(ctx) self._write_ytdl_file(ctx)
if ctx['fragment_index'] > 0: assert ctx['fragment_index'] == 0
assert resume_len > 0
dest_stream, tmpfilename = sanitize_open(tmpfilename, open_mode) dest_stream, tmpfilename = sanitize_open(tmpfilename, open_mode)

View File

@ -75,15 +75,30 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
total_frags = 0 def anvato_ad(s):
return s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
media_frags = 0
ad_frags = 0
ad_frag_next = False
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line and not line.startswith('#'): if not line:
total_frags += 1 continue
if line.startswith('#'):
if anvato_ad(line):
ad_frags += 1
ad_frag_next = True
continue
if ad_frag_next:
ad_frag_next = False
continue
media_frags += 1
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': total_frags, 'total_frags': media_frags,
'ad_frags': ad_frags,
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
@ -101,10 +116,14 @@ class HlsFD(FragmentFD):
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
byte_range = {} byte_range = {}
frag_index = 0 frag_index = 0
ad_frag_next = False
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
if ad_frag_next:
ad_frag_next = False
continue
frag_index += 1 frag_index += 1
if frag_index <= ctx['fragment_index']: if frag_index <= ctx['fragment_index']:
continue continue
@ -175,6 +194,8 @@ class HlsFD(FragmentFD):
'start': sub_range_start, 'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]), 'end': sub_range_start + int(splitted_byte_range[0]),
} }
elif anvato_ad(line):
ad_frag_next = True
self._finish_frag_download(ctx) self._finish_frag_download(ctx)

View File

@ -7,6 +7,7 @@ import time
from .amp import AMPIE from .amp import AMPIE
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_urlparse from ..compat import compat_urlparse
@ -108,9 +109,7 @@ class AbcNewsIE(InfoExtractor):
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL') r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url) full_video_url = compat_urlparse.urljoin(url, video_url)
youtube_url = self._html_search_regex( youtube_url = YoutubeIE._extract_url(webpage)
r'<iframe[^>]+src="(https://www\.youtube\.com/embed/[^"]+)"',
webpage, 'YouTube URL', default=None)
timestamp = None timestamp = None
date_str = self._html_search_regex( date_str = self._html_search_regex(
@ -140,7 +139,7 @@ class AbcNewsIE(InfoExtractor):
} }
if youtube_url: if youtube_url:
entries = [entry, self.url_result(youtube_url, 'Youtube')] entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries) return self.playlist_result(entries)
return entry return entry

View File

@ -131,7 +131,7 @@ class AENetworksIE(AENetworksBaseIE):
r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'], r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'],
webpage, 'video url', group='url') webpage, 'video url', group='url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id) r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'): if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain] requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]

View File

@ -138,6 +138,23 @@ class AfreecaTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# adult video
'url': 'http://vod.afreecatv.com/PLAYER/STATION/26542731',
'info_dict': {
'id': '20171001_F1AE1711_196617479_1',
'ext': 'mp4',
'title': '[생]서아 초심 찾기 방송 (part 1)',
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'BJ서아',
'uploader_id': 'bjdyrksu',
'upload_date': '20171001',
'duration': 3600,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652', 'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True, 'only_matching': True,
@ -160,7 +177,15 @@ class AfreecaTVIE(InfoExtractor):
video_xml = self._download_xml( video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php', 'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, query={'nTitleNo': video_id}) video_id, query={
'nTitleNo': video_id,
'partialView': 'SKIP_ADULT',
})
flag = xpath_text(video_xml, './track/flag', 'flag', default=None)
if flag and flag != 'SUCCEED':
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, flag), expected=True)
video_element = video_xml.findall(compat_xpath('./track/video'))[1] video_element = video_xml.findall(compat_xpath('./track/video'))[1]
if video_element is None or video_element.text is None: if video_element is None or video_element.text is None:
@ -246,107 +271,3 @@ class AfreecaTVIE(InfoExtractor):
}) })
return info return info
class AfreecaTVGlobalIE(AfreecaTVIE):
IE_NAME = 'afreecatv:global'
_VALID_URL = r'https?://(?:www\.)?afreeca\.tv/(?P<channel_id>\d+)(?:/v/(?P<video_id>\d+))?'
_TESTS = [{
'url': 'http://afreeca.tv/36853014/v/58301',
'info_dict': {
'id': '58301',
'title': 'tryhard top100',
'uploader_id': '36853014',
'uploader': 'makgi Hearthstone Live!',
},
'playlist_count': 3,
}]
def _real_extract(self, url):
channel_id, video_id = re.match(self._VALID_URL, url).groups()
video_type = 'video' if video_id else 'live'
query = {
'pt': 'view',
'bid': channel_id,
}
if video_id:
query['vno'] = video_id
video_data = self._download_json(
'http://api.afreeca.tv/%s/view_%s.php' % (video_type, video_type),
video_id or channel_id, query=query)['channel']
if video_data.get('result') != 1:
raise ExtractorError('%s said: %s' % (self.IE_NAME, video_data['remsg']))
title = video_data['title']
info = {
'thumbnail': video_data.get('thumb'),
'view_count': int_or_none(video_data.get('vcnt')),
'age_limit': int_or_none(video_data.get('grade')),
'uploader_id': channel_id,
'uploader': video_data.get('cname'),
}
if video_id:
entries = []
for i, f in enumerate(video_data.get('flist', [])):
video_key = self.parse_video_key(f.get('key', ''))
f_url = f.get('file')
if not video_key or not f_url:
continue
entries.append({
'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
'title': title,
'upload_date': video_key.get('upload_date'),
'duration': int_or_none(f.get('length')),
'url': f_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
info.update({
'id': video_id,
'title': title,
'duration': int_or_none(video_data.get('length')),
})
if len(entries) > 1:
info['_type'] = 'multi_video'
info['entries'] = entries
elif len(entries) == 1:
i = entries[0].copy()
i.update(info)
info = i
else:
formats = []
for s in video_data.get('strm', []):
s_url = s.get('purl')
if not s_url:
continue
stype = s.get('stype')
if stype == 'HLS':
formats.extend(self._extract_m3u8_formats(
s_url, channel_id, 'mp4', m3u8_id=stype, fatal=False))
elif stype == 'RTMP':
format_id = [stype]
label = s.get('label')
if label:
format_id.append(label)
formats.append({
'format_id': '-'.join(format_id),
'url': s_url,
'tbr': int_or_none(s.get('bps')),
'height': int_or_none(s.get('brt')),
'ext': 'flv',
'rtmp_live': True,
})
self._sort_formats(formats)
info.update({
'id': channel_id,
'title': self._live_title(title),
'is_live': True,
'formats': formats,
})
return info

View File

@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
try_get,
)
class AliExpressLiveIE(InfoExtractor):
_VALID_URL = r'https?://live\.aliexpress\.com/live/(?P<id>\d+)'
_TEST = {
'url': 'https://live.aliexpress.com/live/2800002704436634',
'md5': 'e729e25d47c5e557f2630eaf99b740a5',
'info_dict': {
'id': '2800002704436634',
'ext': 'mp4',
'title': 'CASIMA7.22',
'thumbnail': r're:http://.*\.jpg',
'uploader': 'CASIMA Official Store',
'timestamp': 1500717600,
'upload_date': '20170722',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._parse_json(
self._search_regex(
r'(?s)runParams\s*=\s*({.+?})\s*;?\s*var',
webpage, 'runParams'),
video_id)
title = data['title']
formats = self._extract_m3u8_formats(
data['replyStreamUrl'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
return {
'id': video_id,
'title': title,
'thumbnail': data.get('coverUrl'),
'uploader': try_get(
data, lambda x: x['followBar']['name'], compat_str),
'timestamp': float_or_none(data.get('startTimeLong'), scale=1000),
'formats': formats,
}

View File

@ -0,0 +1,85 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
try_get,
unified_strdate,
)
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '1_5g5zua6e',
'title': 'Summer Dinner Party',
'ext': 'mp4',
'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1497285541,
'upload_date': '20170612',
'uploader_id': 'roger.metcalf@americastestkitchen.com',
'release_date': '20170617',
'series': "America's Test Kitchen",
'season_number': 17,
'episode': 'Summer Dinner Party',
'episode_number': 24,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
external_id = ep_data.get('external_id') or ep_meta['external_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, external_id),
'ie_key': 'Kaltura',
'title': title,
'description': description,
'thumbnail': thumbnail,
'release_date': release_date,
'series': "America's Test Kitchen",
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
}

View File

@ -3,16 +3,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
extract_attributes, extract_attributes,
ExtractorError, ExtractorError,
sanitized_Request,
urlencode_postdata, urlencode_postdata,
urljoin,
) )
@ -21,6 +18,8 @@ class AnimeOnDemandIE(InfoExtractor):
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in' _LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply' _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand' _NETRC_MACHINE = 'animeondemand'
# German-speaking countries of Europe
_GEO_COUNTRIES = ['AT', 'CH', 'DE', 'LI', 'LU']
_TESTS = [{ _TESTS = [{
# jap, OmU # jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161', 'url': 'https://www.anime-on-demand.de/anime/161',
@ -46,6 +45,10 @@ class AnimeOnDemandIE(InfoExtractor):
# Full length film, non-series, ger/jap, Dub/OmU, account required # Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185', 'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True, 'only_matching': True,
}, {
# Flash videos
'url': 'https://www.anime-on-demand.de/anime/12',
'only_matching': True,
}] }]
def _login(self): def _login(self):
@ -72,14 +75,13 @@ class AnimeOnDemandIE(InfoExtractor):
'post url', default=self._LOGIN_URL, group='url') 'post url', default=self._LOGIN_URL, group='url')
if not post_url.startswith('http'): if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url) post_url = urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) post_url, None, 'Logging in',
data=urlencode_postdata(login_form), headers={
'Referer': self._LOGIN_URL,
})
if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')): if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
error = self._search_regex( error = self._search_regex(
@ -120,10 +122,11 @@ class AnimeOnDemandIE(InfoExtractor):
formats = [] formats = []
for input_ in re.findall( for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html): r'<input[^>]+class=["\'].*?streamstarter[^>]+>', html):
attributes = extract_attributes(input_) attributes = extract_attributes(input_)
title = attributes.get('data-dialog-header')
playlist_urls = [] playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'): for playlist_key in ('data-playlist', 'data-otherplaylist', 'data-stream'):
playlist_url = attributes.get(playlist_key) playlist_url = attributes.get(playlist_key)
if isinstance(playlist_url, compat_str) and re.match( if isinstance(playlist_url, compat_str) and re.match(
r'/?[\da-zA-Z]+', playlist_url): r'/?[\da-zA-Z]+', playlist_url):
@ -147,19 +150,38 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(compat_str(num)) format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list) format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note))) format_note = ', '.join(filter(None, (kind, lang_note)))
request = sanitized_Request( item_id_list = []
compat_urlparse.urljoin(url, playlist_url), if format_id:
item_id_list.append(format_id)
item_id_list.append('videomaterial')
playlist = self._download_json(
urljoin(url, playlist_url), video_id,
'Downloading %s JSON' % ' '.join(item_id_list),
headers={ headers={
'X-Requested-With': 'XMLHttpRequest', 'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token, 'X-CSRF-Token': csrf_token,
'Referer': url, 'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01',
}) }, fatal=False)
playlist = self._download_json(
request, video_id, 'Downloading %s playlist JSON' % format_id,
fatal=False)
if not playlist: if not playlist:
continue continue
stream_url = playlist.get('streamurl')
if stream_url:
rtmp = re.search(
r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+/))(?P<playpath>mp[34]:.+)',
stream_url)
if rtmp:
formats.append({
'url': rtmp.group('url'),
'app': rtmp.group('app'),
'play_path': rtmp.group('playpath'),
'page_url': url,
'player_url': 'https://www.anime-on-demand.de/assets/jwplayer.flash-55abfb34080700304d49125ce9ffb4a6.swf',
'rtmp_real_time': True,
'format_id': 'rtmp',
'ext': 'flv',
})
continue
start_video = playlist.get('startvideo', 0) start_video = playlist.get('startvideo', 0)
playlist = playlist.get('playlist') playlist = playlist.get('playlist')
if not playlist or not isinstance(playlist, list): if not playlist or not isinstance(playlist, list):
@ -222,7 +244,7 @@ class AnimeOnDemandIE(InfoExtractor):
f.update({ f.update({
'id': '%s-%s' % (f['id'], m.group('kind').lower()), 'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'), 'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')), 'url': urljoin(url, m.group('href')),
}) })
entries.append(f) entries.append(f)

View File

@ -18,6 +18,7 @@ from ..utils import (
int_or_none, int_or_none,
strip_jsonp, strip_jsonp,
unescapeHTML, unescapeHTML,
unsmuggle_url,
) )
@ -197,12 +198,16 @@ class AnvatoIE(InfoExtractor):
'tbr': tbr if tbr != 0 else None, 'tbr': tbr if tbr != 0 else None,
} }
if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'): if media_format == 'm3u8' and tbr is not None:
if tbr is not None: a_format.update({
a_format.update({ 'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])), 'ext': 'mp4',
'ext': 'mp4', })
}) elif media_format == 'm3u8-variant' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
elif ext == 'mp3' or media_format == 'mp3': elif ext == 'mp3' or media_format == 'mp3':
a_format['vcodec'] = 'none' a_format['vcodec'] = 'none'
else: else:
@ -271,6 +276,9 @@ class AnvatoIE(InfoExtractor):
anvplayer_data['accessKey'], anvplayer_data['video']) anvplayer_data['accessKey'], anvplayer_data['video'])
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id') access_key, video_id = mobj.group('access_key_or_mcp', 'id')
if access_key not in self._ANVACK_TABLE: if access_key not in self._ANVACK_TABLE:

View File

@ -117,7 +117,7 @@ class AppleTrailersIE(InfoExtractor):
continue continue
formats.append({ formats.append({
'format_id': '%s-%s' % (version, size), 'format_id': '%s-%s' % (version, size),
'url': re.sub(r'_(\d+p.mov)', r'_h\1', src), 'url': re.sub(r'_(\d+p\.mov)', r'_h\1', src),
'width': int_or_none(size_data.get('width')), 'width': int_or_none(size_data.get('width')),
'height': int_or_none(size_data.get('height')), 'height': int_or_none(size_data.get('height')),
'language': version[:2], 'language': version[:2],
@ -179,7 +179,7 @@ class AppleTrailersIE(InfoExtractor):
formats = [] formats = []
for format in settings['metadata']['sizes']: for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file # The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src']) format_url = re.sub(r'_(\d*p\.mov)', r'_h\1', format['src'])
formats.append({ formats.append({
'url': format_url, 'url': format_url,
'format': format['type'], 'format': format['type'],

View File

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .generic import GenericIE from .generic import GenericIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
@ -126,6 +127,8 @@ class ARDMediathekIE(InfoExtractor):
quality = stream.get('_quality') quality = stream.get('_quality')
server = stream.get('_server') server = stream.get('_server')
for stream_url in stream_urls: for stream_url in stream_urls:
if not isinstance(stream_url, compat_str) or '//' not in stream_url:
continue
ext = determine_ext(stream_url) ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'): if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue continue
@ -146,13 +149,11 @@ class ARDMediathekIE(InfoExtractor):
'play_path': stream_url, 'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality), 'format_id': 'a%s-rtmp-%s' % (num, quality),
} }
elif stream_url.startswith('http'): else:
f = { f = {
'url': stream_url, 'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality) 'format_id': 'a%s-%s-%s' % (num, ext, quality)
} }
else:
continue
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url) m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m: if m:
f.update({ f.update({
@ -195,7 +196,7 @@ class ARDMediathekIE(InfoExtractor):
title = self._html_search_regex( title = self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>', [r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms.title" content="(.*?)"/>', r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>'], r'<h4 class="headline">(.*?)</h4>'],
webpage, 'title') webpage, 'title')
description = self._html_search_meta( description = self._html_search_meta(

View File

@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
@ -15,6 +16,7 @@ from ..utils import (
int_or_none, int_or_none,
NO_DEFAULT, NO_DEFAULT,
qualities, qualities,
try_get,
unified_strdate, unified_strdate,
) )
@ -80,12 +82,15 @@ class ArteTVBaseIE(InfoExtractor):
info = self._download_json(json_url, video_id) info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer'] player_info = info['videoJsonPlayer']
vsr = player_info['VSR'] vsr = try_get(player_info, lambda x: x['VSR'], dict)
if not vsr:
if not vsr and not player_info.get('VRU'): error = None
raise ExtractorError( if try_get(player_info, lambda x: x['custom_msg']['type']) == 'error':
'Video %s is not available' % player_info.get('VID') or video_id, error = try_get(
expected=True) player_info, lambda x: x['custom_msg']['msg'], compat_str)
if not error:
error = 'Video %s is not available' % player_info.get('VID') or video_id
raise ExtractorError(error, expected=True)
upload_date_str = player_info.get('shootingDate') upload_date_str = player_info.get('shootingDate')
if not upload_date_str: if not upload_date_str:

View File

@ -87,7 +87,7 @@ class AtresPlayerIE(InfoExtractor):
self._LOGIN_URL, urlencode_postdata(login_form)) self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded') request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) request, None, 'Logging in')
error = self._html_search_regex( error = self._html_search_regex(
r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>', r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>',

View File

@ -47,7 +47,7 @@ class AZMedienIE(AZMedienBaseIE):
'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom', 'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
'info_dict': { 'info_dict': {
'id': '1_2444peh4', 'id': '1_2444peh4',
'ext': 'mov', 'ext': 'mp4',
'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom', 'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8', 'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
'uploader_id': 'TeleZ?ri', 'uploader_id': 'TeleZ?ri',

View File

@ -59,7 +59,7 @@ class BambuserIE(InfoExtractor):
self._LOGIN_URL, urlencode_postdata(login_form)) self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL) request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) request, None, 'Logging in')
login_error = self._html_search_regex( login_error = self._html_search_regex(
r'(?s)<div class="messages error">(.+?)</div>', r'(?s)<div class="messages error">(.+?)</div>',

View File

@ -386,7 +386,7 @@ class BBCCoUkIE(InfoExtractor):
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href): if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats( usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href), re.sub(self._USP_RE, r'/\1\.ism/\1\.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native', programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False) m3u8_id=format_id, fatal=False)
for f in usp_formats: for f in usp_formats:

View File

@ -9,6 +9,7 @@ from ..compat import (
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
urljoin,
) )
@ -36,9 +37,11 @@ class BeegIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex( cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1', r'<script[^>]+src=(["\'])(?P<url>(?:/static|(?:https?:)?//static\.beeg\.com)/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url') webpage, 'cpl', default=None, group='url')
cpl_url = urljoin(url, cpl_url)
beeg_version, beeg_salt = [None] * 2 beeg_version, beeg_salt = [None] * 2
if cpl_url: if cpl_url:
@ -54,12 +57,16 @@ class BeegIE(InfoExtractor):
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt', r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt') default=None, group='beeg_salt')
beeg_version = beeg_version or '2000' beeg_version = beeg_version or '2185'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H' beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
video = self._download_json( for api_path in ('', 'api.'):
'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id), video = self._download_json(
video_id) 'https://%sbeeg.com/api/v6/%s/video/%s'
% (api_path, beeg_version, video_id), video_id,
fatal=api_path == 'api.')
if video:
break
def split(o, e): def split(o, e):
def cut(s, x): def cut(s, x):

View File

@ -33,13 +33,18 @@ class BpbIE(InfoExtractor):
title = self._html_search_regex( title = self._html_search_regex(
r'<h2 class="white">(.*?)</h2>', webpage, 'title') r'<h2 class="white">(.*?)</h2>', webpage, 'title')
video_info_dicts = re.findall( video_info_dicts = re.findall(
r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", webpage) r"({\s*src\s*:\s*'https?://film\.bpb\.de/[^}]+})", webpage)
formats = [] formats = []
for video_info in video_info_dicts: for video_info in video_info_dicts:
video_info = self._parse_json(video_info, video_id, transform_source=js_to_json) video_info = self._parse_json(
quality = video_info['quality'] video_info, video_id, transform_source=js_to_json, fatal=False)
video_url = video_info['src'] if not video_info:
continue
video_url = video_info.get('src')
if not video_url:
continue
quality = 'high' if '_high' in video_url else 'low'
formats.append({ formats.append({
'url': video_url, 'url': video_url,
'preference': 10 if quality == 'high' else 0, 'preference': 10 if quality == 'high' else 0,

View File

@ -1,20 +1,23 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601,
xpath_element, xpath_element,
xpath_text, xpath_text,
) )
class BRIE(InfoExtractor): class BRIE(InfoExtractor):
IE_DESC = 'Bayerischer Rundfunk Mediathek' IE_DESC = 'Bayerischer Rundfunk'
_VALID_URL = r'(?P<base_url>https?://(?:www\.)?br(?:-klassik)?\.de)/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html' _VALID_URL = r'(?P<base_url>https?://(?:www\.)?br(?:-klassik)?\.de)/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
_TESTS = [ _TESTS = [
@ -123,10 +126,10 @@ class BRIE(InfoExtractor):
for asset in assets.findall('asset'): for asset in assets.findall('asset'):
format_url = xpath_text(asset, ['downloadUrl', 'url']) format_url = xpath_text(asset, ['downloadUrl', 'url'])
asset_type = asset.get('type') asset_type = asset.get('type')
if asset_type == 'HDS': if asset_type.startswith('HDS'):
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.2.0', media_id, f4m_id='hds', fatal=False)) format_url + '?hdcore=3.2.0', media_id, f4m_id='hds', fatal=False))
elif asset_type == 'HLS': elif asset_type.startswith('HLS'):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, media_id, 'mp4', 'm3u8_native', m3u8_id='hds', fatal=False)) format_url, media_id, 'mp4', 'm3u8_native', m3u8_id='hds', fatal=False))
else: else:
@ -169,3 +172,140 @@ class BRIE(InfoExtractor):
} for variant in variants.findall('variant') if xpath_text(variant, 'url')] } for variant in variants.findall('variant') if xpath_text(variant, 'url')]
thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True) thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True)
return thumbnails return thumbnails
class BRMediathekIE(InfoExtractor):
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/mediathek/video/[^/?&#]*?-(?P<id>av:[0-9a-f]{24})'
_TESTS = [{
'url': 'https://www.br.de/mediathek/video/gesundheit-die-sendung-vom-28112017-av:5a1e6a6e8fce6d001871cc8e',
'md5': 'fdc3d485835966d1622587d08ba632ec',
'info_dict': {
'id': 'av:5a1e6a6e8fce6d001871cc8e',
'ext': 'mp4',
'title': 'Die Sendung vom 28.11.2017',
'description': 'md5:6000cdca5912ab2277e5b7339f201ccc',
'timestamp': 1511942766,
'upload_date': '20171129',
}
}]
def _real_extract(self, url):
clip_id = self._match_id(url)
clip = self._download_json(
'https://proxy-base.master.mango.express/graphql',
clip_id, data=json.dumps({
"query": """{
viewer {
clip(id: "%s") {
title
description
duration
createdAt
ageRestriction
videoFiles {
edges {
node {
publicLocation
fileSize
videoProfile {
width
height
bitrate
encoding
}
}
}
}
captionFiles {
edges {
node {
publicLocation
}
}
}
teaserImages {
edges {
node {
imageFiles {
edges {
node {
publicLocation
width
height
}
}
}
}
}
}
}
}
}""" % clip_id}).encode(), headers={
'Content-Type': 'application/json',
})['data']['viewer']['clip']
title = clip['title']
formats = []
for edge in clip.get('videoFiles', {}).get('edges', []):
node = edge.get('node', {})
n_url = node.get('publicLocation')
if not n_url:
continue
ext = determine_ext(n_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
n_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
video_profile = node.get('videoProfile', {})
tbr = int_or_none(video_profile.get('bitrate'))
format_id = 'http'
if tbr:
format_id += '-%d' % tbr
formats.append({
'format_id': format_id,
'url': n_url,
'width': int_or_none(video_profile.get('width')),
'height': int_or_none(video_profile.get('height')),
'tbr': tbr,
'filesize': int_or_none(node.get('fileSize')),
})
self._sort_formats(formats)
subtitles = {}
for edge in clip.get('captionFiles', {}).get('edges', []):
node = edge.get('node', {})
n_url = node.get('publicLocation')
if not n_url:
continue
subtitles.setdefault('de', []).append({
'url': n_url,
})
thumbnails = []
for edge in clip.get('teaserImages', {}).get('edges', []):
for image_edge in edge.get('node', {}).get('imageFiles', {}).get('edges', []):
node = image_edge.get('node', {})
n_url = node.get('publicLocation')
if not n_url:
continue
thumbnails.append({
'url': n_url,
'width': int_or_none(node.get('width')),
'height': int_or_none(node.get('height')),
})
return {
'id': clip_id,
'title': title,
'description': clip.get('description'),
'duration': int_or_none(clip.get('duration')),
'timestamp': parse_iso8601(clip.get('createdAt')),
'age_limit': int_or_none(clip.get('ageRestriction')),
'formats': formats,
'subtitles': subtitles,
'thumbnails': thumbnails,
}

View File

@ -1,26 +1,112 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import float_or_none from .gigya import GigyaBaseIE
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
strip_or_none,
float_or_none,
int_or_none,
parse_iso8601,
)
class CanvasIE(InfoExtractor): class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrtvideo)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv',
'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, {
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id)
title = data['title']
description = data.get('description')
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_type, fatal=False))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url, video_id, f4m_id=format_type, fatal=False))
elif format_type == 'MPEG_DASH':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id=format_type, fatal=False))
elif format_type == 'HSS':
formats.extend(self._extract_ism_formats(
format_url, video_id, ism_id='mss', fatal=False))
else:
formats.append({
'format_id': format_type,
'url': format_url,
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': video_id,
'title': title,
'description': description,
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles,
}
class CanvasEenIE(InfoExtractor):
IE_DESC = 'canvas.be and een.be' IE_DESC = 'canvas.be and een.be'
_VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week', 'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c', 'md5': 'ed66976748d12350b118455979cca293',
'info_dict': { 'info_dict': {
'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e', 'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'display_id': 'de-afspraak-veilt-voor-de-warmste-week', 'display_id': 'de-afspraak-veilt-voor-de-warmste-week',
'ext': 'mp4', 'ext': 'flv',
'title': 'De afspraak veilt voor de Warmste Week', 'title': 'De afspraak veilt voor de Warmste Week',
'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6', 'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 49.02, 'duration': 49.02,
} },
'expected_warnings': ['is not a supported codec'],
}, { }, {
# with subtitles # with subtitles
'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167', 'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167',
@ -40,7 +126,8 @@ class CanvasIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Pagina niet gevonden',
}, { }, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles', 'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': { 'info_dict': {
@ -54,7 +141,8 @@ class CanvasIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Episode no longer available',
}, { }, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend', 'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True, 'only_matching': True,
@ -66,55 +154,157 @@ class CanvasIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = (self._search_regex( title = strip_or_none(self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>', r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title( webpage, 'title', default=None) or self._og_search_title(
webpage)).strip() webpage, default=None))
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id') r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id',
group='id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), display_id)
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, entry_protocol='m3u8_native',
ext='mp4', preference=0, fatal=False, m3u8_id=format_type))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url, display_id, f4m_id=format_type, fatal=False))
elif format_type == 'MPEG_DASH':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id=format_type, fatal=False))
else:
formats.append({
'format_id': format_type,
'url': format_url,
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
return { return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/%s/assets/%s' % (site_id, video_id),
'ie_key': CanvasIE.ie_key(),
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'formats': formats, }
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles, class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'flv',
'title': 'De zwarte weduwe',
'description': 'md5:d90c21dced7db869a85db89a623998d4',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
'season': '1',
'season_number': 1,
'episode_number': 1,
},
'skip': 'This video is only available for registered users'
}]
_NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
_CONTEXT_ID = 'R3595707040'
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
if username is None:
return
auth_data = {
'APIKey': self._APIKEY,
'targetEnv': 'jssdk',
'loginID': username,
'password': password,
'authMode': 'cookie',
}
auth_info = self._gigya_login(auth_data)
# Sometimes authentication fails for no good reason, retry
login_attempt = 1
while login_attempt <= 3:
try:
# When requesting a token, no actual token is returned, but the
# necessary cookies are set.
self._request_webpage(
'https://token.vrt.be',
None, note='Requesting a token', errnote='Could not get a token',
headers={
'Content-Type': 'application/json',
'Referer': 'https://www.vrt.be/vrtnu/',
},
data=json.dumps({
'uid': auth_info['UID'],
'uidsig': auth_info['UIDSignature'],
'ts': auth_info['signatureTimestamp'],
'email': auth_info['profile']['email'],
}).encode('utf-8'))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
login_attempt += 1
self.report_warning('Authentication failed')
self._sleep(1, None, msg_template='Waiting for %(timeout)s seconds before trying again')
else:
raise e
else:
break
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title').strip()
description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>',
webpage, 'description', default=None)
season = self._html_search_regex(
[r'''(?xms)<div\ class="tabs__tab\ tabs__tab--active">\s*
<span>seizoen\ (.+?)</span>\s*
</div>''',
r'<option value="seizoen (\d{1,3})" data-href="[^"]+?" selected>'],
webpage, 'season', default=None)
season_number = int_or_none(season)
episode_number = int_or_none(self._html_search_regex(
r'''(?xms)<div\ class="content__episode">\s*
<abbr\ title="aflevering">afl</abbr>\s*<span>(\d+)</span>
</div>''',
webpage, 'episode_number', default=None))
release_date = parse_iso8601(self._html_search_regex(
r'(?ms)<div class="content__broadcastdate">\s*<time\ datetime="(.+?)"',
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = url.split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:
video = self._download_json(securevideo_url, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self.raise_login_required()
raise
# We are dealing with a '../<show>.relevant' URL
redirect_url = video.get('url')
if redirect_url:
return self.url_result(self._proto_relative_url(redirect_url, 'https:'))
# There is only one entry, but with an unknown key, so just get
# the first one
video_id = list(video.values())[0].get('videoid')
return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'season': season,
'season_number': season_number,
'episode_number': episode_number,
'release_date': release_date,
} }

View File

@ -31,7 +31,7 @@ class CartoonNetworkIE(TurnerBaseIE):
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, { 'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': { 'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big', 'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do', 'tokenizer_src': 'https://token.vgtf.net/token/token_mobile',
}, },
}, { }, {
'url': url, 'url': url,

View File

@ -93,7 +93,7 @@ class CCMAIE(InfoExtractor):
'description': clean_html(informacio.get('descripcio')), 'description': clean_html(informacio.get('descripcio')),
'duration': duration, 'duration': duration,
'timestamp': timestamp, 'timestamp': timestamp,
'thumnails': thumbnails, 'thumbnails': thumbnails,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
} }

View File

@ -81,6 +81,12 @@ class Channel9IE(InfoExtractor):
_RSS_URL = 'http://channel9.msdn.com/%s/RSS' _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src=["\'](https?://channel9\.msdn\.com/(?:[^/]+/)+)player\b',
webpage)
def _extract_list(self, video_id, rss_url=None): def _extract_list(self, video_id, rss_url=None):
if not rss_url: if not rss_url:
rss_url = self._RSS_URL % video_id rss_url = self._RSS_URL % video_id

View File

@ -5,6 +5,7 @@ import base64
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import ( from ..utils import (
clean_html, clean_html,
ExtractorError ExtractorError
@ -70,11 +71,9 @@ class ChilloutzoneIE(InfoExtractor):
# If nativePlatform is None a fallback mechanism is used (i.e. youtube embed) # If nativePlatform is None a fallback mechanism is used (i.e. youtube embed)
if native_platform is None: if native_platform is None:
youtube_url = self._html_search_regex( youtube_url = YoutubeIE._extract_url(webpage)
r'<iframe.* src="((?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"', if youtube_url:
webpage, 'fallback video URL', default=None) return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
if youtube_url is not None:
return self.url_result(youtube_url, ie='Youtube')
# Non Fallback: Decide to use native source (e.g. youtube or vimeo) or # Non Fallback: Decide to use native source (e.g. youtube or vimeo) or
# the own CDN # the own CDN

View File

@ -120,13 +120,16 @@ class ComedyCentralTVIE(MTVServicesInfoExtractor):
class ComedyCentralShortnameIE(InfoExtractor): class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow)$' _VALID_URL = r'^:(?P<id>tds|thedailyshow|theopposition)$'
_TESTS = [{ _TESTS = [{
'url': ':tds', 'url': ':tds',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': ':thedailyshow', 'url': ':thedailyshow',
'only_matching': True, 'only_matching': True,
}, {
'url': ':theopposition',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -134,5 +137,6 @@ class ComedyCentralShortnameIE(InfoExtractor):
shortcut_map = { shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes', 'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes', 'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'theopposition': 'http://www.cc.com/shows/the-opposition-with-jordan-klepper/full-episodes',
} }
return self.url_result(shortcut_map[video_id]) return self.url_result(shortcut_map[video_id])

View File

@ -29,7 +29,10 @@ from ..compat import (
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
) )
from ..downloader.f4m import remove_encrypted_media from ..downloader.f4m import (
get_base_url,
remove_encrypted_media,
)
from ..utils import ( from ..utils import (
NO_DEFAULT, NO_DEFAULT,
age_restricted, age_restricted,
@ -589,19 +592,11 @@ class InfoExtractor(object):
if not encoding: if not encoding:
encoding = self._guess_encoding_from_content(content_type, webpage_bytes) encoding = self._guess_encoding_from_content(content_type, webpage_bytes)
if self._downloader.params.get('dump_intermediate_pages', False): if self._downloader.params.get('dump_intermediate_pages', False):
try: self.to_screen('Dumping request to ' + urlh.geturl())
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
self.to_screen('Dumping request to ' + url)
dump = base64.b64encode(webpage_bytes).decode('ascii') dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump) self._downloader.to_screen(dump)
if self._downloader.params.get('write_pages', False): if self._downloader.params.get('write_pages', False):
try: basen = '%s_%s' % (video_id, urlh.geturl())
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
basen = '%s_%s' % (video_id, url)
if len(basen) > 240: if len(basen) > 240:
h = '___' + hashlib.md5(basen.encode('utf-8')).hexdigest() h = '___' + hashlib.md5(basen.encode('utf-8')).hexdigest()
basen = basen[:240 - len(h)] + h basen = basen[:240 - len(h)] + h
@ -1239,11 +1234,8 @@ class InfoExtractor(object):
media_nodes = remove_encrypted_media(media_nodes) media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes: if not media_nodes:
return formats return formats
base_url = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'], manifest_base_url = get_base_url(manifest)
'base URL', default=None)
if base_url:
base_url = base_url.strip()
bootstrap_info = xpath_element( bootstrap_info = xpath_element(
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'], manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
@ -1275,7 +1267,7 @@ class InfoExtractor(object):
continue continue
manifest_url = ( manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://') media_url if media_url.startswith('http://') or media_url.startswith('https://')
else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url)) else ((manifest_base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction # If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest # since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested # may differ leading to inability to resolve the format by requested
@ -1310,6 +1302,7 @@ class InfoExtractor(object):
'url': manifest_url, 'url': manifest_url,
'manifest_url': manifest_url, 'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None, 'ext': 'flv' if bootstrap_info is not None else None,
'protocol': 'f4m',
'tbr': tbr, 'tbr': tbr,
'width': width, 'width': width,
'height': height, 'height': height,
@ -1355,6 +1348,9 @@ class InfoExtractor(object):
if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access
return [] return []
if re.search(r'#EXT-X-SESSION-KEY:.*?URI="skd://', m3u8_doc): # Apple FairPlay
return []
formats = [] formats = []
format_url = lambda u: ( format_url = lambda u: (
@ -1401,7 +1397,7 @@ class InfoExtractor(object):
media_url = media.get('URI') media_url = media.get('URI')
if media_url: if media_url:
format_id = [] format_id = []
for v in (group_id, name): for v in (m3u8_id, group_id, name):
if v: if v:
format_id.append(v) format_id.append(v)
f = { f = {
@ -1920,7 +1916,7 @@ class InfoExtractor(object):
# can't be used at the same time # can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info: if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration': if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale']) segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration)) representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{ representation_ms_info['fragments'] = [{
@ -1979,6 +1975,22 @@ class InfoExtractor(object):
}) })
segment_index += 1 segment_index += 1
representation_ms_info['fragments'] = fragments representation_ms_info['fragments'] = fragments
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/rg3/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
representation_ms_info['timescale']) if 'segment_duration' in representation_ms_info else None
for segment_url in representation_ms_info['segment_urls']:
fragment = {
location_key(segment_url): segment_url,
}
if segment_duration:
fragment['duration'] = segment_duration
fragments.append(fragment)
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media. # NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case. # No fragments key is present in this case.
if 'fragments' in representation_ms_info: if 'fragments' in representation_ms_info:
@ -2233,27 +2245,35 @@ class InfoExtractor(object):
return formats return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]): def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
query = compat_urlparse.urlparse(url).query
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url) url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex( url_base = self._search_regex(
r'(?:(?:https?|rtmp|rtsp):)?(//[^?]+)', url, 'format url') r'(?:(?:https?|rtmp|rtsp):)?(//[^?]+)', url, 'format url')
http_base_url = '%s:%s' % ('http', url_base) http_base_url = '%s:%s' % ('http', url_base)
formats = [] formats = []
def manifest_url(manifest):
m_url = '%s/%s' % (http_base_url, manifest)
if query:
m_url += '?%s' % query
return m_url
if 'm3u8' not in skip_protocols: if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4', manifest_url('playlist.m3u8'), video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False)) m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols: if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m', manifest_url('manifest.f4m'),
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
if 'dash' not in skip_protocols: if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd', manifest_url('manifest.mpd'),
video_id, mpd_id='dash', fatal=False)) video_id, mpd_id='dash', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base): if re.search(r'(?:/smil:|\.smil)', url_base):
if 'smil' not in skip_protocols: if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats( rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil', manifest_url('jwplayer.smil'),
video_id, fatal=False) video_id, fatal=False)
for rtmp_format in rtmp_formats: for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy() rtsp_format = rtmp_format.copy()
@ -2322,7 +2342,6 @@ class InfoExtractor(object):
formats = self._parse_jwplayer_formats( formats = self._parse_jwplayer_formats(
video_data['sources'], video_id=this_video_id, m3u8_id=m3u8_id, video_data['sources'], video_id=this_video_id, m3u8_id=m3u8_id,
mpd_id=mpd_id, rtmp_params=rtmp_params, base_url=base_url) mpd_id=mpd_id, rtmp_params=rtmp_params, base_url=base_url)
self._sort_formats(formats)
subtitles = {} subtitles = {}
tracks = video_data.get('tracks') tracks = video_data.get('tracks')
@ -2339,16 +2358,25 @@ class InfoExtractor(object):
'url': self._proto_relative_url(track_url) 'url': self._proto_relative_url(track_url)
}) })
entries.append({ entry = {
'id': this_video_id, 'id': this_video_id,
'title': video_data['title'] if require_title else video_data.get('title'), 'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')), 'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')), 'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')), 'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, }
}) # https://github.com/jwplayer/jwplayer/blob/master/src/js/utils/validator.js#L32
if len(formats) == 1 and re.search(r'^(?:http|//).*(?:youtube\.com|youtu\.be)/.+', formats[0]['url']):
entry.update({
'_type': 'url_transparent',
'url': formats[0]['url'],
})
else:
self._sort_formats(formats)
entry['formats'] = formats
entries.append(entry)
if len(entries) == 1: if len(entries) == 1:
return entries[0] return entries[0]
else: else:
@ -2449,10 +2477,12 @@ class InfoExtractor(object):
self._downloader.report_warning(msg) self._downloader.report_warning(msg)
return res return res
def _set_cookie(self, domain, name, value, expire_time=None): def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar.Cookie( cookie = compat_cookiejar.Cookie(
0, name, value, None, None, domain, None, 0, name, value, port, port is not None, domain, True,
None, '/', True, False, expire_time, '', None, None, None) domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest)
self._downloader.cookiejar.set_cookie(cookie) self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url): def _get_cookies(self, url):

View File

@ -116,16 +116,16 @@ class CondeNastIE(InfoExtractor):
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths] entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title) return self.playlist_result(entries, playlist_title=title)
def _extract_video_params(self, webpage): def _extract_video_params(self, webpage, display_id):
query = {} query = self._parse_json(
params = self._search_regex( self._search_regex(
r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None) r'(?s)var\s+params\s*=\s*({.+?})[;,]', webpage, 'player params',
if params: default='{}'),
query.update({ display_id, transform_source=js_to_json, fatal=False)
'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'), if query:
'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'), query['videoId'] = self._search_regex(
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'), r'(?:data-video-id=|currentVideoId\s*=\s*)["\']([\da-f]+)',
}) webpage, 'video id', default=None)
else: else:
params = extract_attributes(self._search_regex( params = extract_attributes(self._search_regex(
r'(<[^>]+data-js="video-player"[^>]+>)', r'(<[^>]+data-js="video-player"[^>]+>)',
@ -141,17 +141,27 @@ class CondeNastIE(InfoExtractor):
video_id = params['videoId'] video_id = params['videoId']
video_info = None video_info = None
if params.get('playerId'):
info_page = self._download_json( # New API path
'http://player.cnevids.com/player/video.js', query = params.copy()
video_id, 'Downloading video info', fatal=False, query=params) query['embedType'] = 'inline'
if info_page: info_page = self._download_json(
video_info = info_page.get('video') 'http://player.cnevids.com/embed-api.json', video_id,
if not video_info: 'Downloading embed info', fatal=False, query=query)
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js', # Old fallbacks
video_id, 'Downloading loader info', query=params) if not info_page:
else: if params.get('playerId'):
info_page = self._download_json(
'http://player.cnevids.com/player/video.js', video_id,
'Downloading video info', fatal=False, query=params)
if info_page:
video_info = info_page.get('video')
if not video_info:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=params)
if not video_info:
info_page = self._download_webpage( info_page = self._download_webpage(
'https://player.cnevids.com/inline/video/%s.js' % video_id, 'https://player.cnevids.com/inline/video/%s.js' % video_id,
video_id, 'Downloading inline info', query={ video_id, 'Downloading inline info', query={
@ -215,7 +225,7 @@ class CondeNastIE(InfoExtractor):
if url_type == 'series': if url_type == 'series':
return self._extract_series(url, webpage) return self._extract_series(url, webpage)
else: else:
params = self._extract_video_params(webpage) params = self._extract_video_params(webpage, display_id)
info = self._search_json_ld( info = self._search_json_ld(
webpage, display_id, fatal=False) webpage, display_id, fatal=False)
info.update(self._extract_video(params)) info.update(self._extract_video(params))

View File

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import ( from ..utils import (
parse_iso8601, parse_iso8601,
str_to_int, str_to_int,
@ -41,11 +42,9 @@ class CrackedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
youtube_url = self._search_regex( youtube_url = YoutubeIE._extract_url(webpage)
r'<iframe[^>]+src="((?:https?:)?//www\.youtube\.com/embed/[^"]+)"',
webpage, 'youtube url', default=None)
if youtube_url: if youtube_url:
return self.url_result(youtube_url, 'Youtube') return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
video_url = self._html_search_regex( video_url = self._html_search_regex(
[r'var\s+CK_vidSrc\s*=\s*"([^"]+)"', r'<video\s+src="([^"]+)"'], [r'var\s+CK_vidSrc\s*=\s*"([^"]+)"', r'<video\s+src="([^"]+)"'],

View File

@ -38,11 +38,32 @@ class CrunchyrollBaseIE(InfoExtractor):
_LOGIN_FORM = 'login_form' _LOGIN_FORM = 'login_form'
_NETRC_MACHINE = 'crunchyroll' _NETRC_MACHINE = 'crunchyroll'
def _call_rpc_api(self, method, video_id, note=None, data=None):
data = data or {}
data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml(
'http://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
def _login(self): def _login(self):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
if username is None: if username is None:
return return
self._download_webpage(
'https://www.crunchyroll.com/?a=formhandler',
None, 'Logging in', 'Wrong login info',
data=urlencode_postdata({
'formname': 'RpcApiUser_Login',
'next_url': 'https://www.crunchyroll.com/acct/membership',
'name': username,
'password': password,
}))
'''
login_page = self._download_webpage( login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page') self._LOGIN_URL, None, 'Downloading login page')
@ -86,6 +107,7 @@ class CrunchyrollBaseIE(InfoExtractor):
raise ExtractorError('Unable to login: %s' % error, expected=True) raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')
'''
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -365,15 +387,19 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
def _get_subtitles(self, video_id, webpage): def _get_subtitles(self, video_id, webpage):
subtitles = {} subtitles = {}
for sub_id, sub_name in re.findall(r'\bssid=([0-9]+)"[^>]+?\btitle="([^"]+)', webpage): for sub_id, sub_name in re.findall(r'\bssid=([0-9]+)"[^>]+?\btitle="([^"]+)', webpage):
sub_page = self._download_webpage( sub_doc = self._call_rpc_api(
'http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id=' + sub_id, 'Subtitle_GetXml', video_id,
video_id, note='Downloading subtitles for ' + sub_name) 'Downloading subtitles for ' + sub_name, data={
id = self._search_regex(r'id=\'([0-9]+)', sub_page, 'subtitle_id', fatal=False) 'subtitle_script_id': sub_id,
iv = self._search_regex(r'<iv>([^<]+)', sub_page, 'subtitle_iv', fatal=False) })
data = self._search_regex(r'<data>([^<]+)', sub_page, 'subtitle_data', fatal=False) if not sub_doc:
if not id or not iv or not data:
continue continue
subtitle = self._decrypt_subtitles(data, iv, id).decode('utf-8') sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
data = xpath_text(sub_doc, 'data', 'subtitle data')
if not sid or not iv or not data:
continue
subtitle = self._decrypt_subtitles(data, iv, sid).decode('utf-8')
lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False) lang_code = self._search_regex(r'lang_code=["\']([^"\']+)', subtitle, 'subtitle_lang_code', fatal=False)
if not lang_code: if not lang_code:
continue continue
@ -444,65 +470,79 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
for fmt in available_fmts: for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt] stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p' video_format = fmt + 'p'
streamdata_req = sanitized_Request( stream_infos = []
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s' streamdata = self._call_rpc_api(
% (video_id, stream_format, stream_quality), 'VideoPlayer_GetStandardConfig', video_id,
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8')) 'Downloading media info for %s' % video_format, data={
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') 'media_id': video_id,
streamdata = self._download_xml( 'video_format': stream_format,
streamdata_req, video_id, 'video_quality': stream_quality,
note='Downloading media info for %s' % video_format) 'current_page': url,
stream_info = streamdata.find('./{default}preload/stream_info') })
video_encode_id = xpath_text(stream_info, './video_encode_id') if streamdata:
if video_encode_id in video_encode_ids: stream_info = streamdata.find('./{default}preload/stream_info')
continue if stream_info:
video_encode_ids.append(video_encode_id) stream_infos.append(stream_info)
stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id,
'Downloading stream info for %s' % video_format, data={
'media_id': video_id,
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
if stream_info:
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_file = xpath_text(stream_info, './file') video_file = xpath_text(stream_info, './file')
if not video_file: if not video_file:
continue continue
if video_file.startswith('http'): if video_file.startswith('http'):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_file, video_id, 'mp4', entry_protocol='m3u8_native', video_file, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False))
continue
video_url = xpath_text(stream_info, './host')
if not video_url:
continue
metadata = stream_info.find('./metadata')
format_info = {
'format': video_format,
'format_id': video_format,
'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
if '.fplive.net/' in video_url:
video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
parsed_video_url = compat_urlparse.urlparse(video_url)
direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'url': direct_video_url,
})
formats.append(format_info)
continue continue
format_info.update({ video_url = xpath_text(stream_info, './host')
'url': video_url, if not video_url:
'play_path': video_file, continue
'ext': 'flv', metadata = stream_info.find('./metadata')
}) format_info = {
formats.append(format_info) 'format': video_format,
self._sort_formats(formats) 'height': int_or_none(xpath_text(metadata, './height')),
'width': int_or_none(xpath_text(metadata, './width')),
}
metadata = self._download_xml( if '.fplive.net/' in video_url:
'http://www.crunchyroll.com/xml', video_id, video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
note='Downloading media info', query={ parsed_video_url = compat_urlparse.urlparse(video_url)
'req': 'RpcApiVideoPlayer_GetMediaMetadata', direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
netloc='v.lvlt.crcdn.net',
path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_file.split(':')[-1])))
if self._is_valid_url(direct_video_url, video_id, video_format):
format_info.update({
'format_id': 'http-' + video_format,
'url': direct_video_url,
})
formats.append(format_info)
continue
format_info.update({
'format_id': 'rtmp-' + video_format,
'url': video_url,
'play_path': video_file,
'ext': 'flv',
})
formats.append(format_info)
self._sort_formats(formats, ('height', 'width', 'tbr', 'fps'))
metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id,
note='Downloading media info', data={
'media_id': video_id, 'media_id': video_id,
}) })

View File

@ -235,7 +235,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
# vevo embed # vevo embed
vevo_id = self._search_regex( vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?video=(?P<id>[\w]*)', r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None) webpage, 'vevo embed', default=None)
if vevo_id: if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo') return self.url_result('vevo:%s' % vevo_id, 'Vevo')
@ -325,7 +325,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor): class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist' IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/' _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>[^/?#&]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"' _MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s' _PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
_TESTS = [{ _TESTS = [{
@ -413,52 +413,3 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': full_user, 'title': full_user,
'entries': self._extract_entries(user), 'entries': self._extract_entries(user),
} }
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
_TESTS = [{
# From http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html
# Tested at FranceTvInfo_2
'url': 'http://api.dmcloud.net/embed/4e7343f894a6f677b10006b4/556e03339473995ee145930c?auth=1464865870-0-jyhsm84b-ead4c701fb750cf9367bf4447167a3db&autoplay=1',
'only_matching': True,
}, {
# http://www.francetvinfo.fr/societe/larguez-les-amarres-le-cobaturage-se-developpe_980101.html
'url': 'http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1',
'only_matching': True,
}]
@classmethod
def _extract_dmcloud_url(cls, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage_no_ff(url, video_id)
title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title')
video_info = self._parse_json(self._search_regex(
r'var\s+info\s*=\s*([^;]+);', webpage, 'video info'), video_id)
# TODO: parse ios_url, which is in fact a manifest
video_url = video_info['mp4_url']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': video_info.get('thumbnail_url'),
}

View File

@ -13,33 +13,30 @@ from ..aes import (
from ..utils import ( from ..utils import (
bytes_to_intlist, bytes_to_intlist,
bytes_to_long, bytes_to_long,
clean_html, extract_attributes,
ExtractorError, ExtractorError,
intlist_to_bytes, intlist_to_bytes,
get_element_by_id,
js_to_json, js_to_json,
int_or_none, int_or_none,
long_to_bytes, long_to_bytes,
pkcs1pad, pkcs1pad,
remove_end,
) )
class DaisukiIE(InfoExtractor): class DaisukiMottoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?daisuki\.net/[^/]+/[^/]+/[^/]+/watch\.[^.]+\.(?P<id>\d+)\.html' _VALID_URL = r'https?://motto\.daisuki\.net/framewatch/embed/[^/]+/(?P<id>[0-9a-zA-Z]{3})'
_TEST = { _TEST = {
'url': 'http://www.daisuki.net/tw/en/anime/watch.TheIdolMasterCG.11213.html', 'url': 'http://motto.daisuki.net/framewatch/embed/embedDRAGONBALLSUPERUniverseSurvivalsaga/V2e/760/428',
'info_dict': { 'info_dict': {
'id': '11213', 'id': 'V2e',
'ext': 'mp4', 'ext': 'mp4',
'title': '#01 Who is in the pumpkin carriage? - THE IDOLM@STER CINDERELLA GIRLS', 'title': '#117 SHOWDOWN OF LOVE! ANDROIDS VS UNIVERSE 2!!',
'subtitles': { 'subtitles': {
'mul': [{ 'mul': [{
'ext': 'ttml', 'ext': 'ttml',
}], }],
}, },
'creator': 'BANDAI NAMCO Entertainment',
}, },
'params': { 'params': {
'skip_download': True, # AES-encrypted HLS stream 'skip_download': True, # AES-encrypted HLS stream
@ -73,15 +70,17 @@ class DaisukiIE(InfoExtractor):
n, e = self._RSA_KEY n, e = self._RSA_KEY
encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n)) encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n))
init_data = self._download_json('http://www.daisuki.net/bin/bgn/init', video_id, query={ init_data = self._download_json(
's': flashvars.get('s', ''), 'http://motto.daisuki.net/fastAPI/bgn/init/',
'c': flashvars.get('ss3_prm', ''), video_id, query={
'e': url, 's': flashvars.get('s', ''),
'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt( 'c': flashvars.get('ss3_prm', ''),
bytes_to_intlist(json.dumps(data)), 'e': url,
aes_key, iv))).decode('ascii'), 'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt(
'a': base64.b64encode(encrypted_aeskey).decode('ascii'), bytes_to_intlist(json.dumps(data)),
}, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else '')) aes_key, iv))).decode('ascii'),
'a': base64.b64encode(encrypted_aeskey).decode('ascii'),
}, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else ''))
if 'rtn' in init_data: if 'rtn' in init_data:
encrypted_rtn = init_data['rtn'] encrypted_rtn = init_data['rtn']
@ -98,14 +97,11 @@ class DaisukiIE(InfoExtractor):
aes_key, iv)).decode('utf-8').rstrip('\0'), aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id) video_id)
title = rtn['title_str']
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native') rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native')
title = remove_end(self._og_search_title(webpage), ' - DAISUKI')
creator = self._html_search_regex(
r'Creator\s*:\s*([^<]+)', webpage, 'creator', fatal=False)
subtitles = {} subtitles = {}
caption_url = rtn.get('caption_url') caption_url = rtn.get('caption_url')
if caption_url: if caption_url:
@ -120,21 +116,18 @@ class DaisukiIE(InfoExtractor):
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'creator': creator,
} }
class DaisukiPlaylistIE(InfoExtractor): class DaisukiMottoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)daisuki\.net/[^/]+/[^/]+/[^/]+/detail\.(?P<id>[a-zA-Z0-9]+)\.html' _VALID_URL = r'https?://motto\.daisuki\.net/(?P<id>information)/'
_TEST = { _TEST = {
'url': 'http://www.daisuki.net/tw/en/anime/detail.TheIdolMasterCG.html', 'url': 'http://motto.daisuki.net/information/',
'info_dict': { 'info_dict': {
'id': 'TheIdolMasterCG', 'title': 'DRAGON BALL SUPER',
'title': 'THE IDOLM@STER CINDERELLA GIRLS',
'description': 'md5:0f2c028a9339f7a2c7fbf839edc5c5d8',
}, },
'playlist_count': 26, 'playlist_mincount': 117,
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -142,18 +135,19 @@ class DaisukiPlaylistIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
episode_pattern = r'''(?sx) entries = []
<img[^>]+delay="[^"]+/(\d+)/movie\.jpg".+? for li in re.findall(r'(<li[^>]+?data-product_id="[a-zA-Z0-9]{3}"[^>]+>)', webpage):
<p[^>]+class=".*?\bepisodeNumber\b.*?">(?:<a[^>]+>)?([^<]+)''' attr = extract_attributes(li)
entries = [{ ad_id = attr.get('data-ad_id')
'_type': 'url_transparent', product_id = attr.get('data-product_id')
'url': url.replace('detail', 'watch').replace('.html', '.' + movie_id + '.html'), if ad_id and product_id:
'episode_id': episode_id, episode_id = attr.get('data-chapter')
'episode_number': int_or_none(episode_id), entries.append({
} for movie_id, episode_id in re.findall(episode_pattern, webpage)] '_type': 'url_transparent',
'url': 'http://motto.daisuki.net/framewatch/embed/%s/%s/760/428' % (ad_id, product_id),
'episode_id': episode_id,
'episode_number': int_or_none(episode_id),
'ie_key': 'DaisukiMotto',
})
playlist_title = remove_end( return self.playlist_result(entries, playlist_title='DRAGON BALL SUPER')
self._og_search_title(webpage, fatal=False), ' - Anime - DAISUKI')
playlist_description = clean_html(get_element_by_id('synopsisTxt', webpage))
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)

View File

@ -2,53 +2,85 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate from ..compat import compat_str
from ..utils import (
float_or_none,
unified_strdate,
)
class DctpTvIE(InfoExtractor): class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$' _VALID_URL = r'https?://(?:www\.)?dctp\.tv/(?:#/)?filme/(?P<id>[^/?#&]+)'
_TEST = { _TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/', 'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '174dd4a8a6225cf5655952f969cfbe24',
'info_dict': { 'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c', 'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade', 'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'mp4', 'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade', 'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm', 'description': 'Kurzfilm',
'upload_date': '20110407', 'upload_date': '20110407',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 71.24,
},
'params': {
# rtmp download
'skip_download': True,
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
object_id = self._html_search_meta('DC.identifier', webpage) webpage = self._download_webpage(url, display_id)
servers_json = self._download_json( video_id = self._html_search_meta(
'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/', 'DC.identifier', webpage, 'video id',
video_id, note='Downloading server list') default=None) or self._search_regex(
server = servers_json[0]['server'] r'id=["\']uuid[^>]+>([^<]+)<', webpage, 'video id')
m3u8_path = self._search_regex(
r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
formats = self._extract_m3u8_formats(
'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
entry_protocol='m3u8_native')
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
servers = self._download_json(
'http://www.dctp.tv/streaming_servers/', display_id,
note='Downloading server list', fatal=False)
if servers:
endpoint = next(
server['endpoint']
for server in servers
if isinstance(server.get('endpoint'), compat_str) and
'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
app = self._search_regex(
r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
formats = [{
'url': endpoint,
'app': app,
'play_path': 'mp4:%s_dctp_0500_4x3.m4v' % video_id,
'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-109.swf',
'ext': 'flv',
}]
description = self._html_search_meta('DC.description', webpage) description = self._html_search_meta('DC.description', webpage)
upload_date = unified_strdate( upload_date = unified_strdate(
self._html_search_meta('DC.date.created', webpage)) self._html_search_meta('DC.date.created', webpage))
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
duration = float_or_none(self._search_regex(
r'id=["\']duration_in_ms[^+]>(\d+)', webpage, 'duration',
default=None), scale=1000)
return { return {
'id': object_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'display_id': video_id, 'display_id': display_id,
'description': description, 'description': description,
'upload_date': upload_date, 'upload_date': upload_date,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration,
} }

View File

@ -19,7 +19,7 @@ class DeezerPlaylistIE(InfoExtractor):
'id': '176747451', 'id': '176747451',
'title': 'Best!', 'title': 'Best!',
'uploader': 'Anonymous', 'uploader': 'Anonymous',
'thumbnail': r're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$', 'thumbnail': r're:^https?://cdn-images\.deezer\.com/images/cover/.*\.jpg$',
}, },
'playlist_count': 30, 'playlist_count': 30,
'skip': 'Only available in .de', 'skip': 'Only available in .de',

View File

@ -54,12 +54,12 @@ class DramaFeverBaseIE(AMPIE):
request = sanitized_Request( request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form)) self._LOGIN_URL, urlencode_postdata(login_form))
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) request, None, 'Logging in')
if all(logout_pattern not in response if all(logout_pattern not in response
for logout_pattern in ['href="/accounts/logout/"', '>Log out<']): for logout_pattern in ['href="/accounts/logout/"', '>Log out<']):
error = self._html_search_regex( error = self._html_search_regex(
r'(?s)class="hidden-xs prompt"[^>]*>(.+?)<', r'(?s)<h\d[^>]+\bclass="hidden-xs prompt"[^>]*>(.+?)</h\d',
response, 'error message', default=None) response, 'error message', default=None)
if error: if error:
raise ExtractorError('Unable to login: %s' % error, expected=True) raise ExtractorError('Unable to login: %s' % error, expected=True)

View File

@ -10,7 +10,7 @@ from ..utils import (
class DrTuberIE(InfoExtractor): class DrTuberIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?drtuber\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[\w-]+))?' _VALID_URL = r'https?://(?:(?:www|m)\.)?drtuber\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.drtuber.com/video/1740434/hot-perky-blonde-naked-golf', 'url': 'http://www.drtuber.com/video/1740434/hot-perky-blonde-naked-golf',
'md5': '93e680cf2536ad0dfb7e74d94a89facd', 'md5': '93e680cf2536ad0dfb7e74d94a89facd',
@ -28,6 +28,9 @@ class DrTuberIE(InfoExtractor):
}, { }, {
'url': 'http://www.drtuber.com/embed/489939', 'url': 'http://www.drtuber.com/embed/489939',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://m.drtuber.com/video/3893529/lingerie-blowjob-from-beautiful-teen',
'only_matching': True,
}] }]
@staticmethod @staticmethod

View File

@ -138,6 +138,7 @@ class DRTVIE(InfoExtractor):
'tbr': int_or_none(bitrate), 'tbr': int_or_none(bitrate),
'ext': link.get('FileFormat'), 'ext': link.get('FileFormat'),
'vcodec': 'none' if kind == 'AudioResource' else None, 'vcodec': 'none' if kind == 'AudioResource' else None,
'preference': preference,
}) })
subtitles_list = asset.get('SubtitlesList') subtitles_list = asset.get('SubtitlesList')
if isinstance(subtitles_list, list): if isinstance(subtitles_list, list):

View File

@ -2,7 +2,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext,
int_or_none, int_or_none,
try_get, try_get,
unified_timestamp, unified_timestamp,
@ -17,7 +19,7 @@ class EggheadCourseIE(InfoExtractor):
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript', 'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29, 'playlist_count': 29,
'info_dict': { 'info_dict': {
'id': 'professor-frisby-introduces-composable-functional-javascript', 'id': '72',
'title': 'Professor Frisby Introduces Composable Functional JavaScript', 'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$', 'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
}, },
@ -26,14 +28,28 @@ class EggheadCourseIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
course = self._download_json( lessons = self._download_json(
'https://egghead.io/api/v1/series/%s' % playlist_id, playlist_id) 'https://egghead.io/api/v1/series/%s/lessons' % playlist_id,
playlist_id, 'Downloading course lessons JSON')
entries = [ entries = []
self.url_result( for lesson in lessons:
'wistia:%s' % lesson['wistia_id'], ie='Wistia', lesson_url = lesson.get('http_url')
video_id=lesson['wistia_id'], video_title=lesson.get('title')) if not lesson_url or not isinstance(lesson_url, compat_str):
for lesson in course['lessons'] if lesson.get('wistia_id')] continue
lesson_id = lesson.get('id')
if lesson_id:
lesson_id = compat_str(lesson_id)
entries.append(self.url_result(
lesson_url, ie=EggheadLessonIE.ie_key(), video_id=lesson_id))
course = self._download_json(
'https://egghead.io/api/v1/series/%s' % playlist_id,
playlist_id, 'Downloading course JSON', fatal=False) or {}
playlist_id = course.get('id')
if playlist_id:
playlist_id = compat_str(playlist_id)
return self.playlist_result( return self.playlist_result(
entries, playlist_id, course.get('title'), entries, playlist_id, course.get('title'),
@ -43,11 +59,12 @@ class EggheadCourseIE(InfoExtractor):
class EggheadLessonIE(InfoExtractor): class EggheadLessonIE(InfoExtractor):
IE_DESC = 'egghead.io lesson' IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson' IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/lessons/(?P<id>[^/?#&]+)' _VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box', 'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': { 'info_dict': {
'id': 'fv5yotjxcg', 'id': '1196',
'display_id': 'javascript-linear-data-flow-with-container-style-types-box',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Create linear data flow with container style types (Box)', 'title': 'Create linear data flow with container style types (Box)',
'description': 'md5:9aa2cdb6f9878ed4c39ec09e85a8150e', 'description': 'md5:9aa2cdb6f9878ed4c39ec09e85a8150e',
@ -60,25 +77,51 @@ class EggheadLessonIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
'format': 'bestvideo',
}, },
} }, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
lesson_id = self._match_id(url) display_id = self._match_id(url)
lesson = self._download_json( lesson = self._download_json(
'https://egghead.io/api/v1/lessons/%s' % lesson_id, lesson_id) 'https://egghead.io/api/v1/lessons/%s' % display_id, display_id)
lesson_id = compat_str(lesson['id'])
title = lesson['title']
formats = []
for _, format_url in lesson['media_urls'].items():
if not format_url or not isinstance(format_url, compat_str):
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, lesson_id, 'mp4', entry_protocol='m3u8',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, lesson_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
})
self._sort_formats(formats)
return { return {
'_type': 'url_transparent', 'id': lesson_id,
'ie_key': 'Wistia', 'display_id': display_id,
'url': 'wistia:%s' % lesson['wistia_id'], 'title': title,
'id': lesson['wistia_id'],
'title': lesson.get('title'),
'description': lesson.get('summary'), 'description': lesson.get('summary'),
'thumbnail': lesson.get('thumb_nail'), 'thumbnail': lesson.get('thumb_nail'),
'timestamp': unified_timestamp(lesson.get('published_at')), 'timestamp': unified_timestamp(lesson.get('published_at')),
'duration': int_or_none(lesson.get('duration')), 'duration': int_or_none(lesson.get('duration')),
'view_count': int_or_none(lesson.get('plays_count')), 'view_count': int_or_none(lesson.get('plays_count')),
'tags': try_get(lesson, lambda x: x['tag_list'], list), 'tags': try_get(lesson, lambda x: x['tag_list'], list),
'series': try_get(
lesson, lambda x: x['series']['title'], compat_str),
'formats': formats,
} }

View File

@ -0,0 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_attributes,
float_or_none,
int_or_none,
try_get,
)
class EllenTubeBaseIE(InfoExtractor):
def _extract_data_config(self, webpage, video_id):
details = self._search_regex(
r'(<[^>]+\bdata-component=(["\'])[Dd]etails.+?></div>)', webpage,
'details')
return self._parse_json(
extract_attributes(details)['data-config'], video_id)
def _extract_video(self, data, video_id):
title = data['title']
formats = []
duration = None
for entry in data.get('media'):
if entry.get('id') == 'm3u8':
formats = self._extract_m3u8_formats(
entry['url'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
duration = int_or_none(entry.get('duration'))
break
self._sort_formats(formats)
def get_insight(kind):
return int_or_none(try_get(
data, lambda x: x['insight']['%ss' % kind]))
return {
'extractor_key': EllenTubeIE.ie_key(),
'id': video_id,
'title': title,
'description': data.get('description'),
'duration': duration,
'thumbnail': data.get('thumbnail'),
'timestamp': float_or_none(data.get('publishTime'), scale=1000),
'view_count': get_insight('view'),
'like_count': get_insight('like'),
'formats': formats,
}
class EllenTubeIE(EllenTubeBaseIE):
_VALID_URL = r'''(?x)
(?:
ellentube:|
https://api-prod\.ellentube\.com/ellenapi/api/item/
)
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
'''
_TESTS = [{
'url': 'https://api-prod.ellentube.com/ellenapi/api/item/0822171c-3829-43bf-b99f-d77358ae75e3',
'md5': '2fabc277131bddafdd120e0fc0f974c9',
'info_dict': {
'id': '0822171c-3829-43bf-b99f-d77358ae75e3',
'ext': 'mp4',
'title': 'Ellen Meets Las Vegas Survivors Jesus Campos and Stephen Schuck',
'description': 'md5:76e3355e2242a78ad9e3858e5616923f',
'thumbnail': r're:^https?://.+?',
'duration': 514,
'timestamp': 1508505120,
'upload_date': '20171020',
'view_count': int,
'like_count': int,
}
}, {
'url': 'ellentube:734a3353-f697-4e79-9ca9-bfc3002dc1e0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/item/%s' % video_id,
video_id)
return self._extract_video(data, video_id)
class EllenTubeVideoIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/video/(?P<id>.+?)\.html'
_TEST = {
'url': 'https://www.ellentube.com/video/ellen-meets-las-vegas-survivors-jesus-campos-and-stephen-schuck.html',
'only_matching': True,
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._extract_data_config(webpage, display_id)['id']
return self.url_result(
'ellentube:%s' % video_id, ie=EllenTubeIE.ie_key(),
video_id=video_id)
class EllenTubePlaylistIE(EllenTubeBaseIE):
_VALID_URL = r'https?://(?:www\.)?ellentube\.com/(?:episode|studios)/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'https://www.ellentube.com/episode/dax-shepard-jordan-fisher-haim.html',
'info_dict': {
'id': 'dax-shepard-jordan-fisher-haim',
'title': "Dax Shepard, 'DWTS' Team Jordan Fisher & Lindsay Arnold, HAIM",
'description': 'md5:bfc982194dabb3f4e325e43aa6b2e21c',
},
'playlist_count': 6,
}, {
'url': 'https://www.ellentube.com/studios/macey-goes-rving0.html',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._extract_data_config(webpage, display_id)['data']
feed = self._download_json(
'https://api-prod.ellentube.com/ellenapi/api/feed/?%s'
% data['filter'], display_id)
entries = [
self._extract_video(elem, elem['id'])
for elem in feed if elem.get('type') == 'VIDEO' and elem.get('id')]
return self.playlist_result(
entries, display_id, data.get('title'),
clean_html(data.get('description')))

View File

@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import NO_DEFAULT
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
'md5': '4294cf98bc165f218aaa0b89e0fd8042',
'info_dict': {
'id': '0_ipq1gsai',
'ext': 'mov',
'title': 'Fast Fingers of Fate',
'description': 'md5:3539013ddcbfa64b2a6d1b38d910868a',
'timestamp': 1428035648,
'upload_date': '20150403',
'uploader_id': 'batchUser',
},
}, {
# not available via http://widgets.ellentube.com/
'url': 'http://www.ellentv.com/videos/1-szkgu2m2/',
'info_dict': {
'id': '1_szkgu2m2',
'ext': 'flv',
'title': "Ellen's Amazingly Talented Audience",
'description': 'md5:86ff1e376ff0d717d7171590e273f0a5',
'timestamp': 1255140900,
'upload_date': '20091010',
'uploader_id': 'ellenkaltura@gmail.com',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
URLS = ('http://widgets.ellentube.com/videos/%s' % video_id, url)
for num, url_ in enumerate(URLS, 1):
webpage = self._download_webpage(
url_, video_id, fatal=num == len(URLS))
default = NO_DEFAULT if num == len(URLS) else None
partner_id = self._search_regex(
r"var\s+partnerId\s*=\s*'([^']+)", webpage, 'partner id',
default=default)
kaltura_id = self._search_regex(
[r'id="kaltura_player_([^"]+)"',
r"_wb_entry_id\s*:\s*'([^']+)",
r'data-kaltura-entry-id="([^"]+)'],
webpage, 'kaltura id', default=default)
if partner_id and kaltura_id:
break
return self.url_result('kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
class EllenTVClipsIE(InfoExtractor):
IE_NAME = 'EllenTV:clips'
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/episodes/(?P<id>[a-z0-9_-]+)'
_TEST = {
'url': 'http://www.ellentv.com/episodes/meryl-streep-vanessa-hudgens/',
'info_dict': {
'id': 'meryl-streep-vanessa-hudgens',
'title': 'Meryl Streep, Vanessa Hudgens',
},
'playlist_mincount': 5,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage, playlist_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist)
}
def _extract_playlist(self, webpage, playlist_id):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
return self._parse_json('[{' + json_string + '}]', playlist_id)
def _extract_entries(self, playlist):
return [
self.url_result(
'kaltura:%s:%s' % (item['kaltura_partner_id'], item['kaltura_entry_id']),
KalturaIE.ie_key(), video_id=item['kaltura_entry_id'])
for item in playlist]

View File

@ -15,7 +15,7 @@ from ..utils import (
class EpornerIE(InfoExtractor): class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?' _VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:hd-porn|embed)/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/', 'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8', 'md5': '39d486f046212d8e1b911c52ab4691f8',
@ -35,6 +35,9 @@ class EpornerIE(InfoExtractor):
}, { }, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0', 'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -31,20 +31,19 @@ from .aenetworks import (
AENetworksIE, AENetworksIE,
HistoryTopicIE, HistoryTopicIE,
) )
from .afreecatv import ( from .afreecatv import AfreecaTVIE
AfreecaTVIE,
AfreecaTVGlobalIE,
)
from .airmozilla import AirMozillaIE from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE from .amcnetworks import AMCNetworksIE
from .americastestkitchen import AmericasTestKitchenIE
from .animeondemand import AnimeOnDemandIE from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE from .anitube import AnitubeIE
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .anysex import AnySexIE from .anysex import AnySexIE
from .aol import AolIE from .aol import AolIE
from .allocine import AllocineIE from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .aparat import AparatIE from .aparat import AparatIE
from .appleconnect import AppleConnectIE from .appleconnect import AppleConnectIE
from .appletrailers import ( from .appletrailers import (
@ -128,7 +127,10 @@ from .bloomberg import BloombergIE
from .bokecc import BokeCCIE from .bokecc import BokeCCIE
from .bostonglobe import BostonGlobeIE from .bostonglobe import BostonGlobeIE
from .bpb import BpbIE from .bpb import BpbIE
from .br import BRIE from .br import (
BRIE,
BRMediathekIE,
)
from .bravotv import BravoTVIE from .bravotv import BravoTVIE
from .breakcom import BreakIE from .breakcom import BreakIE
from .brightcove import ( from .brightcove import (
@ -148,7 +150,11 @@ from .camdemy import (
from .camwithher import CamWithHerIE from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE from .canalc2 import Canalc2IE
from .canvas import CanvasIE from .canvas import (
CanvasIE,
CanvasEenIE,
VrtNUIE,
)
from .carambatv import ( from .carambatv import (
CarambaTVIE, CarambaTVIE,
CarambaTVPageIE, CarambaTVPageIE,
@ -240,11 +246,10 @@ from .dailymotion import (
DailymotionIE, DailymotionIE,
DailymotionPlaylistIE, DailymotionPlaylistIE,
DailymotionUserIE, DailymotionUserIE,
DailymotionCloudIE,
) )
from .daisuki import ( from .daisuki import (
DaisukiIE, DaisukiMottoIE,
DaisukiPlaylistIE, DaisukiMottoPlaylistIE,
) )
from .daum import ( from .daum import (
DaumIE, DaumIE,
@ -306,9 +311,10 @@ from .ehow import EHowIE
from .eighttracks import EightTracksIE from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE from .einthusan import EinthusanIE
from .eitb import EitbIE from .eitb import EitbIE
from .ellentv import ( from .ellentube import (
EllenTVIE, EllenTubeIE,
EllenTVClipsIE, EllenTubeVideoIE,
EllenTubePlaylistIE,
) )
from .elpais import ElPaisIE from .elpais import ElPaisIE
from .embedly import EmbedlyIE from .embedly import EmbedlyIE
@ -341,11 +347,9 @@ from .filmon import (
FilmOnIE, FilmOnIE,
FilmOnChannelIE, FilmOnChannelIE,
) )
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE from .firsttv import FirstTVIE
from .fivemin import FiveMinIE from .fivemin import FiveMinIE
from .fivetv import FiveTVIE from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE from .flickr import FlickrIE
from .flipagram import FlipagramIE from .flipagram import FlipagramIE
from .folketinget import FolketingetIE from .folketinget import FolketingetIE
@ -372,13 +376,14 @@ from .francetv import (
FranceTVIE, FranceTVIE,
FranceTVEmbedIE, FranceTVEmbedIE,
FranceTVInfoIE, FranceTVInfoIE,
GenerationQuoiIE, GenerationWhatIE,
CultureboxIE, CultureboxIE,
) )
from .freesound import FreesoundIE from .freesound import FreesoundIE
from .freespeech import FreespeechIE from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE from .freshlive import FreshLiveIE
from .funimation import FunimationIE from .funimation import FunimationIE
from .funk import FunkIE
from .funnyordie import FunnyOrDieIE from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE from .fxnetworks import FXNetworksIE
@ -387,7 +392,6 @@ from .gameone import (
GameOneIE, GameOneIE,
GameOnePlaylistIE, GameOnePlaylistIE,
) )
from .gamersyde import GamersydeIE
from .gamespot import GameSpotIE from .gamespot import GameSpotIE
from .gamestar import GameStarIE from .gamestar import GameStarIE
from .gaskrank import GaskrankIE from .gaskrank import GaskrankIE
@ -428,7 +432,10 @@ from .hitbox import HitboxIE, HitboxLiveIE
from .hitrecord import HitRecordIE from .hitrecord import HitRecordIE
from .hornbunny import HornBunnyIE from .hornbunny import HornBunnyIE
from .hotnewhiphop import HotNewHipHopIE from .hotnewhiphop import HotNewHipHopIE
from .hotstar import HotStarIE from .hotstar import (
HotStarIE,
HotStarPlaylistIE,
)
from .howcast import HowcastIE from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE from .howstuffworks import HowStuffWorksIE
from .hrti import ( from .hrti import (
@ -481,6 +488,7 @@ from .jove import JoveIE
from .joj import JojIE from .joj import JojIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .kamcord import KamcordIE from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE from .kanalplay import KanalPlayIE
@ -563,6 +571,8 @@ from .mangomolo import (
MangomoloVideoIE, MangomoloVideoIE,
MangomoloLiveIE, MangomoloLiveIE,
) )
from .manyvids import ManyVidsIE
from .massengeschmacktv import MassengeschmackTVIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .mediaset import MediasetIE from .mediaset import MediasetIE
@ -618,7 +628,6 @@ from .mwave import MwaveIE, MwaveMeetGreetIE
from .myspace import MySpaceIE, MySpaceAlbumIE from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvi import MyviIE from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE from .myvidster import MyVidsterIE
from .nationalgeographic import ( from .nationalgeographic import (
NationalGeographicVideoIE, NationalGeographicVideoIE,
@ -680,6 +689,7 @@ from .nhl import (
) )
from .nick import ( from .nick import (
NickIE, NickIE,
NickBrIE,
NickDeIE, NickDeIE,
NickNightIE, NickNightIE,
NickRuIE, NickRuIE,
@ -766,6 +776,7 @@ from .ora import OraTVIE
from .orf import ( from .orf import (
ORFTVthekIE, ORFTVthekIE,
ORFFM4IE, ORFFM4IE,
ORFFM4StoryIE,
ORFOE1IE, ORFOE1IE,
ORFIPTVIE, ORFIPTVIE,
) )
@ -780,6 +791,7 @@ from .patreon import PatreonIE
from .pbs import PBSIE from .pbs import PBSIE
from .pearvideo import PearVideoIE from .pearvideo import PearVideoIE
from .people import PeopleIE from .people import PeopleIE
from .performgroup import PerformGroupIE
from .periscope import ( from .periscope import (
PeriscopeIE, PeriscopeIE,
PeriscopeUserIE, PeriscopeUserIE,
@ -805,6 +817,7 @@ from .polskieradio import (
PolskieRadioIE, PolskieRadioIE,
PolskieRadioCategoryIE, PolskieRadioCategoryIE,
) )
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE from .porn91 import Porn91IE
from .porncom import PornComIE from .porncom import PornComIE
from .pornflip import PornFlipIE from .pornflip import PornFlipIE
@ -845,6 +858,7 @@ from .radiofrance import RadioFranceIE
from .rai import ( from .rai import (
RaiPlayIE, RaiPlayIE,
RaiPlayLiveIE, RaiPlayLiveIE,
RaiPlayPlaylistIE,
RaiIE, RaiIE,
) )
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
@ -897,6 +911,7 @@ from .rutube import (
RutubeEmbedIE, RutubeEmbedIE,
RutubeMovieIE, RutubeMovieIE,
RutubePersonIE, RutubePersonIE,
RutubePlaylistIE,
) )
from .rutv import RUTVIE from .rutv import RUTVIE
from .ruutu import RuutuIE from .ruutu import RuutuIE
@ -917,6 +932,7 @@ from .seeker import SeekerIE
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE from .servingsys import ServingSysIE
from .servus import ServusIE
from .sexu import SexuIE from .sexu import SexuIE
from .shahid import ShahidIE from .shahid import ShahidIE
from .shared import ( from .shared import (
@ -933,6 +949,7 @@ from .skynewsarabia import (
) )
from .skysports import SkySportsIE from .skysports import SkySportsIE
from .slideshare import SlideshareIE from .slideshare import SlideshareIE
from .slideslive import SlidesLiveIE
from .slutload import SlutloadIE from .slutload import SlutloadIE
from .smotri import ( from .smotri import (
SmotriIE, SmotriIE,
@ -985,6 +1002,7 @@ from .streamango import StreamangoIE
from .streamcloud import StreamcloudIE from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .svt import ( from .svt import (
SVTIE, SVTIE,
@ -1100,10 +1118,6 @@ from .tvplayer import TVPlayerIE
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .twentytwotracks import (
TwentyTwoTracksIE,
TwentyTwoTracksGenreIE
)
from .twitch import ( from .twitch import (
TwitchVideoIE, TwitchVideoIE,
TwitchChapterIE, TwitchChapterIE,
@ -1129,6 +1143,7 @@ from .udn import UDNEmbedIE
from .uktvplay import UKTVPlayIE from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .unity import UnityIE
from .uol import UOLIE from .uol import UOLIE
from .uplynk import ( from .uplynk import (
UplynkIE, UplynkIE,
@ -1236,7 +1251,10 @@ from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE from .voicerepublic import VoiceRepublicIE
from .voot import VootIE from .voot import VootIE
from .voxmedia import VoxMediaIE from .voxmedia import (
VoxMediaVolumeIE,
VoxMediaIE,
)
from .vporn import VpornIE from .vporn import VpornIE
from .vrt import VRTIE from .vrt import VRTIE
from .vrak import VrakIE from .vrak import VrakIE
@ -1321,6 +1339,11 @@ from .youku import (
YoukuIE, YoukuIE,
YoukuShowIE, YoukuShowIE,
) )
from .younow import (
YouNowLiveIE,
YouNowChannelIE,
YouNowMomentIE,
)
from .youporn import YouPornIE from .youporn import YouPornIE
from .yourupload import YourUploadIE from .yourupload import YourUploadIE
from .youtube import ( from .youtube import (
@ -1335,7 +1358,6 @@ from .youtube import (
YoutubeSearchDateIE, YoutubeSearchDateIE,
YoutubeSearchIE, YoutubeSearchIE,
YoutubeSearchURLIE, YoutubeSearchURLIE,
YoutubeSharedVideoIE,
YoutubeShowIE, YoutubeShowIE,
YoutubeSubscriptionsIE, YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE, YoutubeTruncatedIDIE,

View File

@ -67,9 +67,9 @@ class FacebookIE(InfoExtractor):
'uploader': 'Tennis on Facebook', 'uploader': 'Tennis on Facebook',
'upload_date': '20140908', 'upload_date': '20140908',
'timestamp': 1410199200, 'timestamp': 1410199200,
} },
'skip': 'Requires logging in',
}, { }, {
'note': 'Video without discernible title',
'url': 'https://www.facebook.com/video.php?v=274175099429670', 'url': 'https://www.facebook.com/video.php?v=274175099429670',
'info_dict': { 'info_dict': {
'id': '274175099429670', 'id': '274175099429670',
@ -78,6 +78,7 @@ class FacebookIE(InfoExtractor):
'uploader': 'Asif Nawab Butt', 'uploader': 'Asif Nawab Butt',
'upload_date': '20140506', 'upload_date': '20140506',
'timestamp': 1399398998, 'timestamp': 1399398998,
'thumbnail': r're:^https?://.*',
}, },
'expected_warnings': [ 'expected_warnings': [
'title' 'title'
@ -94,6 +95,7 @@ class FacebookIE(InfoExtractor):
'upload_date': '20160110', 'upload_date': '20160110',
'timestamp': 1452431627, 'timestamp': 1452431627,
}, },
'skip': 'Requires logging in',
}, { }, {
'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570', 'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6', 'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
@ -121,7 +123,11 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '10153664894881749', 'id': '10153664894881749',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Facebook video #10153664894881749', 'title': 'Average time to confirm recent Supreme Court nominees: 67 days Longest it\'s t...',
'thumbnail': r're:^https?://.*',
'timestamp': 1456259628,
'upload_date': '20160223',
'uploader': 'Barack Obama',
}, },
}, { }, {
# have 1080P, but only up to 720p in swf params # have 1080P, but only up to 720p in swf params
@ -130,10 +136,11 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '10155529876156509', 'id': '10155529876156509',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Holocaust survivor becomes US citizen', 'title': 'She survived the holocaust — and years later, shes getting her citizenship s...',
'timestamp': 1477818095, 'timestamp': 1477818095,
'upload_date': '20161030', 'upload_date': '20161030',
'uploader': 'CNN', 'uploader': 'CNN',
'thumbnail': r're:^https?://.*',
}, },
}, { }, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall # bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
@ -158,6 +165,7 @@ class FacebookIE(InfoExtractor):
'timestamp': 1477305000, 'timestamp': 1477305000,
'upload_date': '20161024', 'upload_date': '20161024',
'uploader': 'La Guía Del Varón', 'uploader': 'La Guía Del Varón',
'thumbnail': r're:^https?://.*',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -376,6 +384,7 @@ class FacebookIE(InfoExtractor):
timestamp = int_or_none(self._search_regex( timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage, r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None)) 'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage)
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
@ -383,6 +392,7 @@ class FacebookIE(InfoExtractor):
'formats': formats, 'formats': formats,
'uploader': uploader, 'uploader': uploader,
'timestamp': timestamp, 'timestamp': timestamp,
'thumbnail': thumbnail,
} }
return webpage, info_dict return webpage, info_dict

View File

@ -1,7 +1,10 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_etree_fromstring
from ..utils import ( from ..utils import (
xpath_element, xpath_element,
xpath_text, xpath_text,
@ -43,10 +46,15 @@ class FazIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
description = self._og_search_description(webpage) description = self._og_search_description(webpage)
config_xml_url = self._search_regex( media = self._html_search_regex(
r'videoXMLURL\s*=\s*"([^"]+)', webpage, 'config xml url') r"data-videojs-media='([^']+)",
config = self._download_xml( webpage, 'media')
config_xml_url, video_id, 'Downloading config xml') if media == 'extern':
perform_url = self._search_regex(
r"<iframe[^>]+?src='((?:http:)?//player\.performgroup\.com/eplayer/eplayer\.html#/?[0-9a-f]{26}\.[0-9a-z]{26})",
webpage, 'perform url')
return self.url_result(perform_url)
config = compat_etree_fromstring(media)
encodings = xpath_element(config, 'ENCODINGS', 'encodings', True) encodings = xpath_element(config, 'ENCODINGS', 'encodings', True)
formats = [] formats = []
@ -55,12 +63,24 @@ class FazIE(InfoExtractor):
if encoding is not None: if encoding is not None:
encoding_url = xpath_text(encoding, 'FILENAME') encoding_url = xpath_text(encoding, 'FILENAME')
if encoding_url: if encoding_url:
formats.append({ tbr = xpath_text(encoding, 'AVERAGEBITRATE', 1000)
if tbr:
tbr = int_or_none(tbr.replace(',', '.'))
f = {
'url': encoding_url, 'url': encoding_url,
'format_id': code.lower(), 'format_id': code.lower(),
'quality': pref, 'quality': pref,
'tbr': int_or_none(xpath_text(encoding, 'AVERAGEBITRATE')), 'tbr': tbr,
}) 'vcodec': xpath_text(encoding, 'CODEC'),
}
mobj = re.search(r'(\d+)x(\d+)_(\d+)\.mp4', encoding_url)
if mobj:
f.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
'tbr': tbr or int(mobj.group(3)),
})
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -2,7 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..utils import (
int_or_none,
float_or_none,
)
class FczenitIE(InfoExtractor): class FczenitIE(InfoExtractor):
@ -14,6 +17,8 @@ class FczenitIE(InfoExtractor):
'id': '41044', 'id': '41044',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Так пишется история: казанский разгром ЦСКА на «Зенит-ТВ»', 'title': 'Так пишется история: казанский разгром ЦСКА на «Зенит-ТВ»',
'timestamp': 1462283735,
'upload_date': '20160503',
}, },
} }
@ -21,28 +26,31 @@ class FczenitIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex( msi_id = self._search_regex(
r'<[^>]+class=\"photoalbum__title\">([^<]+)', webpage, 'title') r"(?s)config\s*=\s*{.+?video_id\s*:\s*'([^']+)'", webpage, 'msi id')
video_items = self._parse_json(self._search_regex( msi_data = self._download_json(
r'arrPath\s*=\s*JSON\.parse\(\'(.+)\'\)', webpage, 'video items'), 'http://player.fc-zenit.ru/msi/video', msi_id, query={
video_id) 'video': msi_id,
})['data']
def merge_dicts(*dicts): title = msi_data['name']
ret = {}
for a_dict in dicts:
ret.update(a_dict)
return ret
formats = [{ formats = [{
'url': compat_urlparse.urljoin(url, video_url), 'format_id': q.get('label'),
'tbr': int(tbr), 'url': q['url'],
} for tbr, video_url in merge_dicts(*video_items).items()] 'height': int_or_none(q.get('label')),
} for q in msi_data['qualities'] if q.get('url')]
self._sort_formats(formats) self._sort_formats(formats)
tags = [tag['label'] for tag in msi_data.get('tags', []) if tag.get('label')]
return { return {
'id': video_id, 'id': video_id,
'title': video_title, 'title': title,
'thumbnail': msi_data.get('preview'),
'formats': formats, 'formats': formats,
'duration': float_or_none(msi_data.get('duration')),
'timestamp': int_or_none(msi_data.get('date')),
'tags': tags,
} }

View File

@ -1,50 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',
'md5': 'ee9114957692f01fb1263ed87039112a',
'info_dict': {
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'md5:feef3041cb09724e0bdc02843348f5f4',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
title = self._html_search_meta('twitter:title', page, 'title', fatal=True)
description = self._html_search_meta('twitter:description', page, 'title')
data = self._download_xml(
'http://www.firstpost.com/getvideoxml-%s.xml' % video_id, video_id,
'Downloading video XML')
item = data.find('./playlist/item')
thumbnail = item.find('./image').text
formats = [
{
'url': details.find('./file').text,
'format_id': details.find('./label').text.strip(),
'width': int(details.find('./width').text.strip()),
'height': int(details.find('./height').text.strip()),
} for details in item.findall('./source/file_details') if details.find('./file').text
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@ -1,51 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
js_to_json,
)
class FKTVIE(InfoExtractor):
IE_NAME = 'fernsehkritik.tv'
_VALID_URL = r'https?://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_TEST = {
'url': 'http://fernsehkritik.tv/folge-1',
'md5': '21f0b0c99bce7d5b524eb1b17b1c6d79',
'info_dict': {
'id': '1',
'ext': 'mp4',
'title': 'Folge 1 vom 10. April 2007',
'thumbnail': r're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
episode = self._match_id(url)
webpage = self._download_webpage(
'http://fernsehkritik.tv/folge-%s/play' % episode, episode)
title = clean_html(self._html_search_regex(
'<h3>([^<]+)</h3>', webpage, 'title'))
thumbnail = self._search_regex(r'POSTER\s*=\s*"([^"]+)', webpage, 'thumbnail', fatal=False)
sources = self._parse_json(self._search_regex(r'(?s)MEDIA\s*=\s*(\[.+?\]);', webpage, 'media'), episode, js_to_json)
formats = []
for source in sources:
furl = source.get('src')
if furl:
formats.append({
'url': furl,
'format_id': determine_ext(furl),
})
self._sort_formats(formats)
return {
'id': episode,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
}

View File

@ -2,57 +2,132 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from .uplynk import UplynkPreplayIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
smuggle_url, HEADRequest,
int_or_none,
parse_age_limit,
parse_duration,
try_get,
unified_timestamp,
update_url_query, update_url_query,
) )
class FOXIE(AdobePassIE): class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[\da-fA-F]+)'
_TEST = { _TESTS = [{
'url': 'http://www.fox.com/watch/255180355939/7684182528', # clip
'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/',
'md5': 'ebd296fcc41dd4b19f8115d8461a3165', 'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
'info_dict': { 'info_dict': {
'id': '255180355939', 'id': '4b765a60490325103ea69888fb2bd4e8',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Official Trailer: Gotham', 'title': 'Aftermath: Bruce Wayne Develops Into The Dark Knight',
'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.', 'description': 'md5:549cd9c70d413adb32ce2a779b53b486',
'duration': 129, 'duration': 102,
'timestamp': 1400020798, 'timestamp': 1504291893,
'upload_date': '20140513', 'upload_date': '20170901',
'uploader': 'NEWA-FNG-FOXCOM', 'creator': 'FOX',
'series': 'Gotham',
}, },
'add_ie': ['ThePlatform'], 'params': {
} 'skip_download': True,
},
}, {
# episode, geo-restricted
'url': 'https://www.fox.com/watch/087036ca7f33c8eb79b08152b4dd75c1/',
'only_matching': True,
}, {
# episode, geo-restricted, tv provided required
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
settings = self._parse_json(self._search_regex( video = self._download_json(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', 'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id,
webpage, 'drupal settings'), video_id) video_id, headers={
fox_pdk_player = settings['fox_pdk_player'] 'apikey': 'abdcbed02c124d393b39e818a4312055',
release_url = fox_pdk_player['release_url'] 'Content-Type': 'application/json',
query = { 'Referer': url,
'mbr': 'true', })
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
info = self._search_json_ld(webpage, video_id, fatal=False) title = video['name']
info.update({ release_url = video['videoRelease']['url']
'_type': 'url_transparent',
'ie_key': 'ThePlatform', description = video.get('description')
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}), duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
rating = video.get('contentRating')
age_limit = parse_age_limit(rating)
data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {}
creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
season_number = int_or_none(video.get('seasonNumber'))
episode = video.get('name')
episode_number = int_or_none(video.get('episodeNumber'))
release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'):
resource = self._get_mvpd_resource(
'fbc-fox', title, video.get('guid'), rating)
release_url = update_url_query(
release_url, {
'auth': self._extract_mvpd_auth(
url, video_id, 'fbc-fox', resource)
})
subtitles = {}
for doc_rel in video.get('documentReleases', []):
rel_url = doc_rel.get('url')
if not url or doc_rel.get('format') != 'SCC':
continue
subtitles['en'] = [{
'url': rel_url,
'ext': 'scc',
}]
break
info = {
'id': video_id, 'id': video_id,
}) 'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'age_limit': age_limit,
'creator': creator,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'release_year': release_year,
'subtitles': subtitles,
}
urlh = self._request_webpage(HEADRequest(release_url), video_id)
video_url = compat_str(urlh.geturl())
if UplynkPreplayIE.suitable(video_url):
info.update({
'_type': 'url_transparent',
'url': video_url,
'ie_key': UplynkPreplayIE.ie_key(),
})
else:
m3u8_url = self._download_json(release_url, video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
info['formats'] = formats
return info return info

View File

@ -2,7 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .anvato import AnvatoIE from .anvato import AnvatoIE
from ..utils import js_to_json
class FOX9IE(AnvatoIE): class FOX9IE(AnvatoIE):
@ -34,9 +33,9 @@ class FOX9IE(AnvatoIE):
video_id = self._parse_json( video_id = self._parse_json(
self._search_regex( self._search_regex(
r'AnvatoPlaylist\s*\(\s*(\[.+?\])\s*\)\s*;', r"this\.videosJson\s*=\s*'(\[.+?\])';",
webpage, 'anvato playlist'), webpage, 'anvato playlist'),
video_id, transform_source=js_to_json)[0]['video'] video_id)[0]['video']
return self._get_anvato_videos( return self._get_anvato_videos(
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b', 'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',

View File

@ -3,7 +3,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
@ -14,10 +13,7 @@ from ..utils import (
parse_duration, parse_duration,
determine_ext, determine_ext,
) )
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
class FranceTVBaseInfoExtractor(InfoExtractor): class FranceTVBaseInfoExtractor(InfoExtractor):
@ -291,10 +287,6 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
page_title = mobj.group('title') page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title) webpage = self._download_webpage(url, page_title)
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, DailymotionCloudIE.ie_key())
dailymotion_urls = DailymotionIE._extract_urls(webpage) dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls: if dailymotion_urls:
return self.playlist_result([ return self.playlist_result([
@ -308,31 +300,32 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
return self._extract_video(video_id, catalogue) return self._extract_video(video_id, catalogue)
class GenerationQuoiIE(InfoExtractor): class GenerationWhatIE(InfoExtractor):
IE_NAME = 'france2.fr:generation-quoi' IE_NAME = 'france2.fr:generation-what'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<id>[^/?#]+)' _VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://generation-quoi.france2.fr/portrait/garde-a-vous', 'url': 'http://generation-what.francetv.fr/portrait/video/present-arms',
'info_dict': { 'info_dict': {
'id': 'k7FJX8VBcvvLmX4wA5Q', 'id': 'wtvKYUG45iw',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Génération Quoi - Garde à Vous', 'title': 'Generation What - Garde à vous - FRA',
'uploader': 'Génération Quoi', 'uploader': 'Generation What',
'uploader_id': 'UCHH9p1eetWCgt4kXBYCb3_w',
'upload_date': '20160411',
}, },
'params': { }, {
# It uses Dailymotion 'url': 'http://generation-what.francetv.fr/europe/video/present-arms',
'skip_download': True, 'only_matching': True,
}, }]
}
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % display_id) webpage = self._download_webpage(url, display_id)
info_json = self._download_webpage(info_url, display_id) youtube_id = self._search_regex(
info = json.loads(info_json) r"window\.videoURL\s*=\s*'([0-9A-Za-z_-]{11})';",
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'], webpage, 'youtube id')
ie='Dailymotion') return self.url_result(youtube_id, 'Youtube', youtube_id)
class CultureboxIE(FranceTVBaseInfoExtractor): class CultureboxIE(FranceTVBaseInfoExtractor):
@ -363,6 +356,7 @@ class CultureboxIE(FranceTVBaseInfoExtractor):
raise ExtractorError('Video %s is not available' % name, expected=True) raise ExtractorError('Video %s is not available' % name, expected=True)
video_id, catalogue = self._search_regex( video_id, catalogue = self._search_regex(
r'"http://videos\.francetv\.fr/video/([^@]+@[^"]+)"', webpage, 'video id').split('@') r'["\'>]https?://videos\.francetv\.fr/video/([^@]+@.+?)["\'<]',
webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue) return self._extract_video(video_id, catalogue)

View File

@ -1,37 +1,34 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor from .common import InfoExtractor
class FreespeechIE(InfoExtractor): class FreespeechIE(InfoExtractor):
IE_NAME = 'freespeech.org' IE_NAME = 'freespeech.org'
_VALID_URL = r'https?://(?:www\.)?freespeech\.org/video/(?P<title>.+)' _VALID_URL = r'https?://(?:www\.)?freespeech\.org/stories/(?P<id>.+)'
_TEST = { _TEST = {
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0', 'url': 'http://www.freespeech.org/stories/fcc-announces-net-neutrality-rollback-whats-stake/',
'info_dict': { 'info_dict': {
'id': 'poKsVCZ64uU', 'id': 'waRk6IPqyWM',
'ext': 'webm', 'ext': 'mp4',
'title': 'Obama, Romney Campaign in Colorado Ahead of Debate', 'title': 'What\'s At Stake - Net Neutrality Special',
'description': 'Obama, Romney Campaign in Colorado Ahead of Debate', 'description': 'Presented by MNN and FSTV',
'uploader': 'freespeechtv', 'upload_date': '20170728',
'uploader_id': 'freespeechtv', 'uploader_id': 'freespeechtv',
'upload_date': '20121002', 'uploader': 'freespeechtv',
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
title = mobj.group('title') webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(url, title) youtube_url = self._search_regex(
info_json = self._search_regex(r'jQuery.extend\(Drupal.settings, ({.*?})\);', webpage, 'info') r'data-video-url="([^"]+)"',
info = json.loads(info_json) webpage, 'youtube url')
return { return {
'_type': 'url', '_type': 'url',
'url': info['jw_player']['basic_video_node_player']['file'], 'url': youtube_url,
'ie_key': 'Youtube', 'ie_key': 'Youtube',
} }

View File

@ -57,7 +57,7 @@ class FunimationIE(InfoExtractor):
try: try:
data = self._download_json( data = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/auth/login/', 'https://prod-api-funimationnow.dadcdigital.com/api/auth/login/',
None, 'Logging in as %s' % username, data=urlencode_postdata({ None, 'Logging in', data=urlencode_postdata({
'username': username, 'username': username,
'password': password, 'password': password,
})) }))

View File

@ -0,0 +1,43 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import extract_attributes
class FunkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funk\.net/(?:mix|channel)/(?:[^/]+/)*(?P<id>[^?/#]+)'
_TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/0/59d517e741dca10001252574/',
'md5': '4d40974481fa3475f8bccfd20c5361f8',
'info_dict': {
'id': '716599',
'ext': 'mp4',
'title': 'Neue Rechte Welle',
'description': 'md5:a30a53f740ffb6bfd535314c2cc5fb69',
'timestamp': 1501337639,
'upload_date': '20170729',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/0/59d52049999264000182e79d/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
domain_id = NexxIE._extract_domain_id(webpage) or '741'
nexx_id = extract_attributes(self._search_regex(
r'(<div[^>]id=["\']mediaplayer-funk[^>]+>)',
webpage, 'media player'))['data-id']
return self.url_result(
'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
video_id=nexx_id)

View File

@ -3,27 +3,31 @@ from __future__ import unicode_literals
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..utils import ( from ..utils import (
update_url_query,
extract_attributes, extract_attributes,
int_or_none,
parse_age_limit, parse_age_limit,
smuggle_url, smuggle_url,
update_url_query,
) )
class FXNetworksIE(AdobePassIE): class FXNetworksIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:fxnetworks|simpsonsworld)\.com/video/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.fxnetworks.com/video/719841347694', 'url': 'http://www.fxnetworks.com/video/1032565827847',
'md5': '1447d4722e42ebca19e5232ab93abb22', 'md5': '8d99b97b4aa7a202f55b6ed47ea7e703',
'info_dict': { 'info_dict': {
'id': '719841347694', 'id': 'dRzwHC_MMqIv',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Vanpage', 'title': 'First Look: Better Things - Season 2',
'description': 'F*ck settling down. You\'re the Worst returns for an all new season August 31st on FXX.', 'description': 'Because real life is like a fart. Watch this FIRST LOOK to see what inspired the new season of Better Things.',
'age_limit': 14, 'age_limit': 14,
'uploader': 'NEWA-FNG-FX', 'uploader': 'NEWA-FNG-FX',
'upload_date': '20160706', 'upload_date': '20170825',
'timestamp': 1467844741, 'timestamp': 1503686274,
'episode_number': 0,
'season_number': 2,
'series': 'Better Things',
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}, { }, {
@ -64,6 +68,9 @@ class FXNetworksIE(AdobePassIE):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}), 'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'series': video_data.get('data-show-title'),
'episode_number': int_or_none(video_data.get('data-episode')),
'season_number': int_or_none(video_data.get('data-season')),
'thumbnail': video_data.get('data-large-thumb'), 'thumbnail': video_data.get('data-large-thumb'),
'age_limit': parse_age_limit(rating), 'age_limit': parse_age_limit(rating),
'ie_key': 'ThePlatform', 'ie_key': 'ThePlatform',

View File

@ -1,70 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
parse_duration,
remove_start,
)
class GamersydeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gamersyde\.com/hqstream_(?P<display_id>[\da-z_]+)-(?P<id>\d+)_[a-z]{2}\.html'
_TEST = {
'url': 'http://www.gamersyde.com/hqstream_bloodborne_birth_of_a_hero-34371_en.html',
'md5': 'f38d400d32f19724570040d5ce3a505f',
'info_dict': {
'id': '34371',
'ext': 'mp4',
'duration': 372,
'title': 'Bloodborne - Birth of a hero',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
playlist = self._parse_json(
self._search_regex(
r'(?s)playlist: \[({.+?})\]\s*}\);', webpage, 'files'),
display_id, transform_source=js_to_json)
formats = []
for source in playlist['sources']:
video_url = source.get('file')
if not video_url:
continue
format_id = source.get('label')
f = {
'url': video_url,
'format_id': format_id,
}
m = re.search(r'^(?P<height>\d+)[pP](?P<fps>\d+)fps', format_id)
if m:
f.update({
'height': int(m.group('height')),
'fps': int(m.group('fps')),
})
formats.append(f)
self._sort_formats(formats)
title = remove_start(playlist['title'], '%s - ' % video_id)
thumbnail = playlist.get('image')
duration = parse_duration(self._search_regex(
r'Length:</label>([^<]+)<', webpage, 'duration', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -14,7 +14,7 @@ from ..utils import (
class GameSpotIE(OnceIE): class GameSpotIE(OnceIE):
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?' _VALID_URL = r'https?://(?:www\.)?gamespot\.com/(?:video|article)s/(?:[^/]+/\d+-|embed/)(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/', 'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825', 'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
@ -35,6 +35,12 @@ class GameSpotIE(OnceIE):
'params': { 'params': {
'skip_download': True, # m3u8 downloads 'skip_download': True, # m3u8 downloads
}, },
}, {
'url': 'https://www.gamespot.com/videos/embed/6439218/',
'only_matching': True,
}, {
'url': 'https://www.gamespot.com/articles/the-last-of-us-2-receives-new-ps4-trailer/1100-6454469/',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -52,7 +58,7 @@ class GameSpotIE(OnceIE):
manifest_url = f4m_url manifest_url = f4m_url
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False)) f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
m3u8_url = streams.get('m3u8_stream') m3u8_url = dict_get(streams, ('m3u8_stream', 'adaptive_stream'))
if m3u8_url: if m3u8_url:
manifest_url = m3u8_url manifest_url = m3u8_url
m3u8_formats = self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
@ -60,7 +66,7 @@ class GameSpotIE(OnceIE):
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats) formats.extend(m3u8_formats)
progressive_url = dict_get( progressive_url = dict_get(
streams, ('progressive_hd', 'progressive_high', 'progressive_low')) streams, ('progressive_hd', 'progressive_high', 'progressive_low', 'other_lr'))
if progressive_url and manifest_url: if progressive_url and manifest_url:
qualities_basename = self._search_regex( qualities_basename = self._search_regex(
r'/([^/]+)\.csmil/', r'/([^/]+)\.csmil/',
@ -105,7 +111,8 @@ class GameSpotIE(OnceIE):
onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri') onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
if onceux_url: if onceux_url:
formats.extend(self._extract_once_formats(re.sub( formats.extend(self._extract_once_formats(re.sub(
r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', ''))) r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url),
http_formats_preference=-1))
if not formats: if not formats:
for quality in ['sd', 'hd']: for quality in ['sd', 'hd']:

View File

@ -22,6 +22,8 @@ from ..utils import (
HEADRequest, HEADRequest,
is_html, is_html,
js_to_json, js_to_json,
KNOWN_EXTENSIONS,
mimetype2ext,
orderedSet, orderedSet,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
@ -57,10 +59,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import ( from .dailymotion import DailymotionIE
DailymotionIE,
DailymotionCloudIE,
)
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE from .viewlift import ViewLiftEmbedIE
@ -99,6 +98,8 @@ from .mediaset import MediasetIE
from .joj import JojIE from .joj import JojIE
from .megaphone import MegaphoneIE from .megaphone import MegaphoneIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .channel9 import Channel9IE
from .vshare import VShareIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -1088,23 +1089,24 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'upload_date': '20150212', 'upload_date': '20150212',
'uploader': 'The National Archives UK', 'uploader': 'The National Archives UK',
'description': 'md5:a236581cd2449dd2df4f93412f3f01c6', 'description': 'md5:8078af856dca76edc42910b61273dbbf',
'uploader_id': 'NationalArchives08', 'uploader_id': 'NationalArchives08',
'title': 'Webinar: Using Discovery, The National Archives online catalogue', 'title': 'Webinar: Using Discovery, The National Archives online catalogue',
}, },
}, },
# jwplayer rtmp # jwplayer rtmp
{ {
'url': 'http://www.suffolk.edu/sjc/', 'url': 'http://www.suffolk.edu/sjc/live.php',
'info_dict': { 'info_dict': {
'id': 'sjclive', 'id': 'live',
'ext': 'flv', 'ext': 'flv',
'title': 'Massachusetts Supreme Judicial Court Oral Arguments', 'title': 'Massachusetts Supreme Judicial Court Oral Arguments',
'uploader': 'www.suffolk.edu', 'uploader': 'www.suffolk.edu',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Only has video a few mornings per month, see http://www.suffolk.edu/sjc/',
}, },
# Complex jwplayer # Complex jwplayer
{ {
@ -1113,6 +1115,7 @@ class GenericIE(InfoExtractor):
'id': 'videos', 'id': 'videos',
'ext': 'mp4', 'ext': 'mp4',
'title': 'king machine trailer 1', 'title': 'king machine trailer 1',
'description': 'Browse King Machine videos & audio for sweet media. Your eyes will thank you.',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
@ -1130,13 +1133,55 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
} }
}, },
{
# JWPlatform iframe
'url': 'https://www.mediaite.com/tv/dem-senator-claims-gary-cohn-faked-a-bad-connection-during-trump-call-to-get-him-off-the-phone/',
'md5': 'ca00a040364b5b439230e7ebfd02c4e9',
'info_dict': {
'id': 'O0c5JcKT',
'ext': 'mp4',
'upload_date': '20171122',
'timestamp': 1511366290,
'title': 'Dem Senator Claims Gary Cohn Faked a Bad Connection During Trump Call to Get Him Off the Phone',
},
'add_ie': [JWPlatformIE.ie_key()],
},
{
# Video.js embed, multiple formats
'url': 'http://ortcam.com/solidworks-урок-6-настройка-чертежа_33f9b7351.html',
'info_dict': {
'id': 'yygqldloqIk',
'ext': 'mp4',
'title': 'SolidWorks. Урок 6 Настройка чертежа',
'description': 'md5:baf95267792646afdbf030e4d06b2ab3',
'upload_date': '20130314',
'uploader': 'PROстое3D',
'uploader_id': 'PROstoe3D',
},
'params': {
'skip_download': True,
},
},
{
# Video.js embed, single format
'url': 'https://www.vooplayer.com/v3/watch/watch.php?v=NzgwNTg=',
'info_dict': {
'id': 'watch',
'ext': 'mp4',
'title': 'Step 1 - Good Foundation',
'description': 'md5:d1e7ff33a29fc3eb1673d6c270d344f4',
},
'params': {
'skip_download': True,
},
},
# rtl.nl embed # rtl.nl embed
{ {
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen', 'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
'playlist_mincount': 5, 'playlist_mincount': 5,
'info_dict': { 'info_dict': {
'id': 'aanslagen-kopenhagen', 'id': 'aanslagen-kopenhagen',
'title': 'Aanslagen Kopenhagen | RTL Nieuws', 'title': 'Aanslagen Kopenhagen',
} }
}, },
# Zapiks embed # Zapiks embed
@ -1268,6 +1313,7 @@ class GenericIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video is unavailable.',
}, },
# Pladform embed # Pladform embed
{ {
@ -1281,6 +1327,7 @@ class GenericIE(InfoExtractor):
'duration': 694, 'duration': 694,
'age_limit': 0, 'age_limit': 0,
}, },
'skip': 'HTTP Error 404: Not Found',
}, },
# Playwire embed # Playwire embed
{ {
@ -1301,6 +1348,14 @@ class GenericIE(InfoExtractor):
'id': '518726732', 'id': '518726732',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Facebook Creates "On This Day" | Crunch Report', 'title': 'Facebook Creates "On This Day" | Crunch Report',
'description': 'Amazon updates Fire TV line, Tesla\'s Model X spotted in the wild',
'timestamp': 1427237531,
'uploader': 'Crunch Report',
'upload_date': '20150324',
},
'params': {
# m3u8 download
'skip_download': True,
}, },
}, },
# SVT embed # SVT embed
@ -1352,16 +1407,20 @@ class GenericIE(InfoExtractor):
'upload_date': '20140107', 'upload_date': '20140107',
'timestamp': 1389118457, 'timestamp': 1389118457,
}, },
'skip': 'Invalid Page URL',
}, },
# NBC News embed # NBC News embed
{ {
'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html', 'url': 'http://www.vulture.com/2016/06/letterman-couldnt-care-less-about-late-night.html',
'md5': '1aa589c675898ae6d37a17913cf68d66', 'md5': '1aa589c675898ae6d37a17913cf68d66',
'info_dict': { 'info_dict': {
'id': '701714499682', 'id': 'x_dtl_oa_LettermanliftPR_160608',
'ext': 'mp4', 'ext': 'mp4',
'title': 'PREVIEW: On Assignment: David Letterman', 'title': 'David Letterman: A Preview',
'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.', 'description': 'A preview of Tom Brokaw\'s interview with David Letterman as part of the On Assignment series powered by Dateline. Airs Sunday June 12 at 7/6c.',
'upload_date': '20160609',
'timestamp': 1465431544,
'uploader': 'NBCU-NEWS',
}, },
}, },
# UDN embed # UDN embed
@ -1378,6 +1437,7 @@ class GenericIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Failed to parse JSON Expecting value'],
}, },
# Ooyala embed # Ooyala embed
{ {
@ -1385,7 +1445,7 @@ class GenericIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs', 'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
'ext': 'mp4', 'ext': 'mp4',
'description': 'VIDEO: INDEX/MATCH versus VLOOKUP.', 'description': 'Index/Match versus VLOOKUP.',
'title': 'This is what separates the Excel masters from the wannabes', 'title': 'This is what separates the Excel masters from the wannabes',
'duration': 191.933, 'duration': 191.933,
}, },
@ -1409,22 +1469,6 @@ class GenericIE(InfoExtractor):
'timestamp': 1432570283, 'timestamp': 1432570283,
}, },
}, },
# Dailymotion Cloud video
{
'url': 'http://replay.publicsenat.fr/vod/le-debat/florent-kolandjian,dominique-cena,axel-decourtye,laurence-abeille,bruno-parmentier/175910',
'md5': 'dcaf23ad0c67a256f4278bce6e0bae38',
'info_dict': {
'id': 'x2uy8t3',
'ext': 'mp4',
'title': 'Sauvons les abeilles ! - Le débat',
'description': 'md5:d9082128b1c5277987825d684939ca26',
'thumbnail': r're:^https?://.*\.jpe?g$',
'timestamp': 1434970506,
'upload_date': '20150622',
'uploader': 'Public Sénat',
'uploader_id': 'xa9gza',
}
},
# OnionStudios embed # OnionStudios embed
{ {
'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537', 'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
@ -1581,22 +1625,6 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': ['BrightcoveLegacy'], 'add_ie': ['BrightcoveLegacy'],
}, },
# Nexx embed
{
'url': 'https://www.funk.net/serien/5940e15073f6120001657956/items/593efbb173f6120001657503',
'info_dict': {
'id': '247746',
'ext': 'mp4',
'title': "Yesterday's Jam (OV)",
'description': 'md5:09bc0984723fed34e2581624a84e05f0',
'timestamp': 1492594816,
'upload_date': '20170419',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
},
# Facebook <iframe> embed # Facebook <iframe> embed
{ {
'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html', 'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
@ -1879,6 +1907,25 @@ class GenericIE(InfoExtractor):
'title': 'Building A Business Online: Principal Chairs Q & A', 'title': 'Building A Business Online: Principal Chairs Q & A',
}, },
}, },
{
# multiple HTML5 videos on one page
'url': 'https://www.paragon-software.com/home/rk-free/keyscenarios.html',
'info_dict': {
'id': 'keyscenarios',
'title': 'Rescue Kit 14 Free Edition - Getting started',
},
'playlist_count': 4,
},
{
# vshare embed
'url': 'https://youtube-dl-demo.neocities.org/vshare.html',
'md5': '17b39f55b5497ae8b59f5fbce8e35886',
'info_dict': {
'id': '0f64ce6',
'title': 'vl14062007715967',
'ext': 'mp4',
}
}
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -2128,7 +2175,7 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id) return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats( info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, doc,
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0], mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_url=url) mpd_url=url)
self._sort_formats(info_dict['formats']) self._sort_formats(info_dict['formats'])
@ -2166,7 +2213,7 @@ class GenericIE(InfoExtractor):
# And then there are the jokers who advertise that they use RTA, # And then there are the jokers who advertise that they use RTA,
# but actually don't. # but actually don't.
AGE_LIMIT_MARKERS = [ AGE_LIMIT_MARKERS = [
r'Proudly Labeled <a href="http://www.rtalabel.org/" title="Restricted to Adults">RTA</a>', r'Proudly Labeled <a href="http://www\.rtalabel\.org/" title="Restricted to Adults">RTA</a>',
] ]
if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS): if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS):
age_limit = 18 age_limit = 18
@ -2228,7 +2275,7 @@ class GenericIE(InfoExtractor):
# Look for embedded rtl.nl player # Look for embedded rtl.nl player
matches = re.findall( matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"', r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
webpage) webpage)
if matches: if matches:
return self.playlist_from_matches(matches, video_id, video_title, ie='RtlNl') return self.playlist_from_matches(matches, video_id, video_title, ie='RtlNl')
@ -2243,36 +2290,11 @@ class GenericIE(InfoExtractor):
if vid_me_embed_url is not None: if vid_me_embed_url is not None:
return self.url_result(vid_me_embed_url, 'Vidme') return self.url_result(vid_me_embed_url, 'Vidme')
# Look for embedded YouTube player # Look for YouTube embeds
matches = re.findall(r'''(?x) youtube_urls = YoutubeIE._extract_urls(webpage)
(?: if youtube_urls:
<iframe[^>]+?src=|
data-video-url=|
<embed[^>]+?src=|
embedSWF\(?:\s*|
<object[^>]+data=|
new\s+SWFObject\(
)
(["\'])
(?P<url>(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/
(?:embed|v|p)/.+?)
\1''', webpage)
if matches:
return self.playlist_from_matches( return self.playlist_from_matches(
matches, video_id, video_title, lambda m: unescapeHTML(m[1])) youtube_urls, video_id, video_title, ie=YoutubeIE.ie_key())
# Look for lazyYT YouTube embed
matches = re.findall(
r'class="lazyYT" data-youtube-id="([^"]+)"', webpage)
if matches:
return self.playlist_from_matches(matches, video_id, video_title, lambda m: unescapeHTML(m))
# Look for Wordpress "YouTube Video Importer" plugin
matches = re.findall(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
if matches:
return self.playlist_from_matches(matches, video_id, video_title, lambda m: m[-1])
matches = DailymotionIE._extract_urls(webpage) matches = DailymotionIE._extract_urls(webpage)
if matches: if matches:
@ -2652,7 +2674,7 @@ class GenericIE(InfoExtractor):
# Look for UDN embeds # Look for UDN embeds
mobj = re.search( mobj = re.search(
r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage) r'<iframe[^>]+src="(?:https?:)?(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
if mobj is not None: if mobj is not None:
return self.url_result( return self.url_result(
compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed') compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed')
@ -2662,11 +2684,6 @@ class GenericIE(InfoExtractor):
if senate_isvp_url: if senate_isvp_url:
return self.url_result(senate_isvp_url, 'SenateISVP') return self.url_result(senate_isvp_url, 'SenateISVP')
# Look for Dailymotion Cloud videos
dmcloud_url = DailymotionCloudIE._extract_dmcloud_url(webpage)
if dmcloud_url:
return self.url_result(dmcloud_url, 'DailymotionCloud')
# Look for OnionStudios embeds # Look for OnionStudios embeds
onionstudios_url = OnionStudiosIE._extract_url(webpage) onionstudios_url = OnionStudiosIE._extract_url(webpage)
if onionstudios_url: if onionstudios_url:
@ -2856,6 +2873,16 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
vzaar_urls, video_id, video_title, ie=VzaarIE.ie_key()) vzaar_urls, video_id, video_title, ie=VzaarIE.ie_key())
channel9_urls = Channel9IE._extract_urls(webpage)
if channel9_urls:
return self.playlist_from_matches(
channel9_urls, video_id, video_title, ie=Channel9IE.ie_key())
vshare_urls = VShareIE._extract_urls(webpage)
if vshare_urls:
return self.playlist_from_matches(
vshare_urls, video_id, video_title, ie=VShareIE.ie_key())
def merge_dicts(dict1, dict2): def merge_dicts(dict1, dict2):
merged = {} merged = {}
for k, v in dict1.items(): for k, v in dict1.items():
@ -2874,13 +2901,20 @@ class GenericIE(InfoExtractor):
# Look for HTML5 media # Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls') entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries: if entries:
for entry in entries: if len(entries) == 1:
entry.update({ entries[0].update({
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
}) })
else:
for num, entry in enumerate(entries, start=1):
entry.update({
'id': '%s-%s' % (video_id, num),
'title': '%s (%d)' % (video_title, num),
})
for entry in entries:
self._sort_formats(entry['formats']) self._sort_formats(entry['formats'])
return self.playlist_result(entries) return self.playlist_result(entries, video_id, video_title)
jwplayer_data = self._find_jwplayer_data( jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json) webpage, video_id, transform_source=js_to_json)
@ -2889,6 +2923,46 @@ class GenericIE(InfoExtractor):
jwplayer_data, video_id, require_title=False, base_url=url) jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict) return merge_dicts(info, info_dict)
# Video.js embed
mobj = re.search(
r'(?s)\bvideojs\s*\(.+?\.src\s*\(\s*((?:\[.+?\]|{.+?}))\s*\)\s*;',
webpage)
if mobj is not None:
sources = self._parse_json(
mobj.group(1), video_id, transform_source=js_to_json,
fatal=False) or []
if not isinstance(sources, list):
sources = [sources]
formats = []
for source in sources:
src = source.get('src')
if not src or not isinstance(src, compat_str):
continue
src = compat_urlparse.urljoin(url, src)
src_type = source.get('type')
if isinstance(src_type, compat_str):
src_type = src_type.lower()
ext = determine_ext(src).lower()
if src_type == 'video/youtube':
return self.url_result(src, YoutubeIE.ie_key())
if src_type == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
elif src_type == 'application/x-mpegurl' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': src,
'ext': (mimetype2ext(src_type) or
ext if ext in KNOWN_EXTENSIONS else 'mp4'),
})
if formats:
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
# Looking for http://schema.org/VideoObject # Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld( json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject') webpage, video_id, default={}, expected_type='VideoObject')
@ -2982,7 +3056,7 @@ class GenericIE(InfoExtractor):
# be supported by youtube-dl thus this is checked the very last (see # be supported by youtube-dl thus this is checked the very last (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser) # https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
embed_url = self._html_search_meta('twitter:player', webpage, default=None) embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url: if embed_url and embed_url != url:
return self.url_result(embed_url) return self.url_result(embed_url)
if not found: if not found:

View File

@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor): class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/)?(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher', 'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': { 'info_dict': {
@ -44,6 +44,9 @@ class GfycatIE(InfoExtractor):
'categories': list, 'categories': list,
'age_limit': 0, 'age_limit': 0,
} }
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -0,0 +1,22 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
)
class GigyaBaseIE(InfoExtractor):
def _gigya_login(self, auth_data):
auth_info = self._download_json(
'https://accounts.eu1.gigya.com/accounts.login', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(auth_data))
error_message = auth_info.get('errorDetails') or auth_info.get('errorMessage')
if error_message:
raise ExtractorError(
'Unable to login: %s' % error_message, expected=True)
return auth_info

View File

@ -61,7 +61,7 @@ class GooglePlusIE(InfoExtractor):
'width': int(width), 'width': int(width),
'height': int(height), 'height': int(height),
} for width, height, video_url in re.findall( } for width, height, video_url in re.findall(
r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent.com.*?)"', webpage)] r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent\.com.*?)"', webpage)]
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
@ -25,6 +26,22 @@ class HeiseIE(InfoExtractor):
'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20', 'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*/gallery/$', 'thumbnail': r're:^https?://.*/gallery/$',
} }
}, {
# YouTube embed
'url': 'http://www.heise.de/newsticker/meldung/Netflix-In-20-Jahren-vom-Videoverleih-zum-TV-Revolutionaer-3814130.html',
'md5': 'e403d2b43fea8e405e88e3f8623909f1',
'info_dict': {
'id': '6kmWbXleKW4',
'ext': 'mp4',
'title': 'NEU IM SEPTEMBER | Netflix',
'description': 'md5:2131f3c7525e540d5fd841de938bd452',
'upload_date': '20170830',
'uploader': 'Netflix Deutschland, Österreich und Schweiz',
'uploader_id': 'netflixdach',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html', 'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True, 'only_matching': True,
@ -40,6 +57,16 @@ class HeiseIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(yt_urls, video_id, title, ie=YoutubeIE.ie_key())
container_id = self._search_regex( container_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID') webpage, 'container ID')
@ -47,12 +74,6 @@ class HeiseIE(InfoExtractor):
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"', r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID') webpage, 'sequenz ID')
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
doc = self._download_xml( doc = self._download_xml(
'http://www.heise.de/videout/feed', video_id, query={ 'http://www.heise.de/videout/feed', video_id, query={
'container': container_id, 'container': container_id,

View File

@ -1,22 +1,47 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError,
determine_ext, determine_ext,
ExtractorError,
int_or_none, int_or_none,
) )
class HotStarIE(InfoExtractor): class HotStarBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['IN']
def _download_json(self, *args, **kwargs):
response = super(HotStarBaseIE, self)._download_json(*args, **kwargs)
if response['resultCode'] != 'OK':
if kwargs.get('fatal'):
raise ExtractorError(
response['errorDescription'], expected=True)
return None
return response['resultObj']
def _download_content_info(self, content_id):
return self._download_json(
'https://account.hotstar.com/AVS/besc', content_id, query={
'action': 'GetAggregatedContentDetails',
'appVersion': '5.0.40',
'channel': 'PCTV',
'contentId': content_id,
})['contentInfo'][0]
class HotStarIE(HotStarBaseIE):
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})' _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{ _TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273', 'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
'info_dict': { 'info_dict': {
'id': '1000076273', 'id': '1000076273',
'ext': 'mp4', 'ext': 'mp4',
'title': 'On Air With AIB - English', 'title': 'On Air With AIB',
'description': 'md5:c957d8868e9bc793ccb813691cc4c434', 'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
'timestamp': 1447227000, 'timestamp': 1447227000,
'upload_date': '20151111', 'upload_date': '20151111',
@ -34,23 +59,11 @@ class HotStarIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True, query=None):
json_data = super(HotStarIE, self)._download_json(
url_or_request, video_id, note, fatal=fatal, query=query)
if json_data['resultCode'] != 'OK':
if fatal:
raise ExtractorError(json_data['errorDescription'])
return None
return json_data['resultObj']
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json(
'http://account.hotstar.com/AVS/besc', video_id, query={ video_data = self._download_content_info(video_id)
'action': 'GetAggregatedContentDetails',
'channel': 'PCTV',
'contentId': video_id,
})['contentInfo'][0]
title = video_data['episodeTitle'] title = video_data['episodeTitle']
if video_data.get('encrypted') == 'Y': if video_data.get('encrypted') == 'Y':
@ -99,3 +112,51 @@ class HotStarIE(InfoExtractor):
'episode_number': int_or_none(video_data.get('episodeNumber')), 'episode_number': int_or_none(video_data.get('episodeNumber')),
'series': video_data.get('contentTitle'), 'series': video_data.get('contentTitle'),
} }
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
_VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com/tv/[^/]+/(?P<content_id>\d+))/(?P<type>[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.hotstar.com/tv/pratidaan/14982/episodes/14812/9993',
'info_dict': {
'id': '14812',
},
'playlist_mincount': 75,
}, {
'url': 'http://www.hotstar.com/tv/pratidaan/14982/popular-clips/9998/9998',
'only_matching': True,
}]
_ITEM_TYPES = {
'episodes': 'EPISODE',
'popular-clips': 'CLIPS',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
base_url = mobj.group('url')
content_id = mobj.group('content_id')
playlist_type = mobj.group('type')
content_info = self._download_content_info(content_id)
playlist_id = compat_str(content_info['categoryId'])
collection = self._download_json(
'https://search.hotstar.com/AVS/besc', playlist_id, query={
'action': 'SearchContents',
'appVersion': '5.0.40',
'channel': 'PCTV',
'moreFilters': 'series:%s;' % playlist_id,
'query': '*',
'searchOrder': 'last_broadcast_date desc,year desc,title asc',
'type': self._ITEM_TYPES.get(playlist_type, 'EPISODE'),
})
entries = [
self.url_result(
'%s/_/%s' % (base_url, video['contentId']),
ie=HotStarIE.ie_key(), video_id=video['contentId'])
for video in collection['response']['docs']
if video.get('contentId')]
return self.playlist_result(entries, playlist_id)

View File

@ -11,45 +11,20 @@ from ..utils import (
class HowStuffWorksIE(InfoExtractor): class HowStuffWorksIE(InfoExtractor):
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm' _VALID_URL = r'https?://[\da-z-]+\.(?:howstuffworks|stuff(?:(?:youshould|theydontwantyouto)know|toblowyourmind|momnevertoldyou)|(?:brain|car)stuffshow|fwthinking|geniusstuff)\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'
_TESTS = [ _TESTS = [
{ {
'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm', 'url': 'http://www.stufftoblowyourmind.com/videos/optical-illusions-video.htm',
'md5': '76646a5acc0c92bf7cd66751ca5db94d',
'info_dict': { 'info_dict': {
'id': '450221', 'id': '855410',
'ext': 'flv',
'title': 'Cool Jobs - Iditarod Musher',
'description': 'Cold sleds, freezing temps and warm dog breath... an Iditarod musher\'s dream. Kasey-Dee Gardner jumps on a sled to find out what the big deal is.',
'display_id': 'cool-jobs-iditarod-musher',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
'info_dict': {
'id': '453464',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Survival Zone: Food and Water In the Savanna', 'title': 'Your Trickster Brain: Optical Illusions -- Science on the Web',
'description': 'Learn how to find both food and water while trekking in the African savannah. In this video from the Discovery Channel.', 'description': 'md5:e374ff9561f6833ad076a8cc0a5ab2fb',
'display_id': 'survival-zone-food-and-water-in-the-savanna',
'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm', 'url': 'http://shows.howstuffworks.com/more-shows/why-does-balloon-stick-to-hair-video.htm',
'info_dict': {
'id': '440011',
'ext': 'mp4',
'title': 'Sword Swallowing #1 by Dan Meyer',
'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
'display_id': 'sword-swallowing-1-by-dan-meyer',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm',
'only_matching': True, 'only_matching': True,
} }
] ]

View File

@ -104,7 +104,7 @@ class HRTiIE(HRTiBaseIE):
(?: (?:
hrti:(?P<short_id>[0-9]+)| hrti:(?P<short_id>[0-9]+)|
https?:// https?://
hrti\.hrt\.hr/\#/video/show/(?P<id>[0-9]+)/(?P<display_id>[^/]+)? hrti\.hrt\.hr/(?:\#/)?video/show/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?
) )
''' '''
_TESTS = [{ _TESTS = [{
@ -129,6 +129,9 @@ class HRTiIE(HRTiBaseIE):
}, { }, {
'url': 'hrti:2181385', 'url': 'hrti:2181385',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://hrti.hrt.hr/video/show/3873068/cuvar-dvorca-dramska-serija-14',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -170,7 +173,7 @@ class HRTiIE(HRTiBaseIE):
class HRTiPlaylistIE(HRTiBaseIE): class HRTiPlaylistIE(HRTiBaseIE):
_VALID_URL = r'https?://hrti.hrt.hr/#/video/list/category/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?' _VALID_URL = r'https?://hrti\.hrt\.hr/(?:#/)?video/list/category/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?'
_TESTS = [{ _TESTS = [{
'url': 'https://hrti.hrt.hr/#/video/list/category/212/ekumena', 'url': 'https://hrti.hrt.hr/#/video/list/category/212/ekumena',
'info_dict': { 'info_dict': {
@ -182,6 +185,9 @@ class HRTiPlaylistIE(HRTiBaseIE):
}, { }, {
'url': 'https://hrti.hrt.hr/#/video/list/category/212/', 'url': 'https://hrti.hrt.hr/#/video/list/category/212/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://hrti.hrt.hr/video/list/category/212/ekumena',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -203,7 +203,7 @@ class PCMagIE(IGNIE):
_VALID_URL = r'https?://(?:www\.)?pcmag\.com/(?P<type>videos|article2)(/.+)?/(?P<name_or_id>.+)' _VALID_URL = r'https?://(?:www\.)?pcmag\.com/(?P<type>videos|article2)(/.+)?/(?P<name_or_id>.+)'
IE_NAME = 'pcmag' IE_NAME = 'pcmag'
_EMBED_RE = r'iframe.setAttribute\("src",\s*__util.objToUrlString\("http://widgets\.ign\.com/video/embed/content.html?[^"]*url=([^"]+)["&]' _EMBED_RE = r'iframe\.setAttribute\("src",\s*__util.objToUrlString\("http://widgets\.ign\.com/video/embed/content\.html?[^"]*url=([^"]+)["&]'
_TESTS = [{ _TESTS = [{
'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data', 'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',

View File

@ -8,7 +8,10 @@ from ..compat import (
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import determine_ext from ..utils import (
determine_ext,
update_url_query,
)
from .bokecc import BokeCCBaseIE from .bokecc import BokeCCBaseIE
@ -68,21 +71,22 @@ class InfoQIE(BokeCCBaseIE):
'play_path': playpath, 'play_path': playpath,
}] }]
def _extract_cookies(self, webpage): def _extract_cf_auth(self, webpage):
policy = self._search_regex(r'InfoQConstants.scp\s*=\s*\'([^\']+)\'', webpage, 'policy') policy = self._search_regex(r'InfoQConstants\.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
signature = self._search_regex(r'InfoQConstants.scs\s*=\s*\'([^\']+)\'', webpage, 'signature') signature = self._search_regex(r'InfoQConstants\.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
key_pair_id = self._search_regex(r'InfoQConstants.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id') key_pair_id = self._search_regex(r'InfoQConstants\.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % ( return {
policy, signature, key_pair_id) 'Policy': policy,
'Signature': signature,
'Key-Pair-Id': key_pair_id,
}
def _extract_http_video(self, webpage): def _extract_http_video(self, webpage):
http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL') http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
http_video_url = update_url_query(http_video_url, self._extract_cf_auth(webpage))
return [{ return [{
'format_id': 'http_video', 'format_id': 'http_video',
'url': http_video_url, 'url': http_video_url,
'http_headers': {
'Cookie': self._extract_cookies(webpage)
},
}] }]
def _extract_http_audio(self, webpage, video_id): def _extract_http_audio(self, webpage, video_id):
@ -91,22 +95,20 @@ class InfoQIE(BokeCCBaseIE):
if not http_audio_url: if not http_audio_url:
return [] return []
cookies_header = {'Cookie': self._extract_cookies(webpage)}
# base URL is found in the Location header in the response returned by # base URL is found in the Location header in the response returned by
# GET https://www.infoq.com/mp3download.action?filename=... when logged in. # GET https://www.infoq.com/mp3download.action?filename=... when logged in.
http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url) http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url)
http_audio_url = update_url_query(http_audio_url, self._extract_cf_auth(webpage))
# audio file seem to be missing some times even if there is a download link # audio file seem to be missing some times even if there is a download link
# so probe URL to make sure # so probe URL to make sure
if not self._is_valid_url(http_audio_url, video_id, headers=cookies_header): if not self._is_valid_url(http_audio_url, video_id):
return [] return []
return [{ return [{
'format_id': 'http_audio', 'format_id': 'http_audio',
'url': http_audio_url, 'url': http_audio_url,
'vcodec': 'none', 'vcodec': 'none',
'http_headers': cookies_header,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,5 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -7,7 +8,6 @@ from ..compat import compat_str
from ..utils import ( from ..utils import (
get_element_by_attribute, get_element_by_attribute,
int_or_none, int_or_none,
limit_length,
lowercase_escape, lowercase_escape,
try_get, try_get,
) )
@ -130,13 +130,21 @@ class InstagramIE(InfoExtractor):
video_url = media.get('video_url') video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height')) height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width')) width = int_or_none(media.get('dimensions', {}).get('width'))
description = media.get('caption') description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
thumbnail = media.get('display_src') thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date')) timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name') uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username') uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count')) def get_count(key, kind):
return int_or_none(try_get(
media, (lambda x: x['edge_media_%s' % key]['count'],
lambda x: x['%ss' % kind]['count'])))
like_count = get_count('preview_like', 'like')
comment_count = get_count('to_comment', 'comment')
comments = [{ comments = [{
'author': comment.get('user', {}).get('username'), 'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'), 'author_id': comment.get('user', {}).get('id'),
@ -212,7 +220,7 @@ class InstagramIE(InfoExtractor):
class InstagramUserIE(InfoExtractor): class InstagramUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])' _VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])'
IE_DESC = 'Instagram user profile' IE_DESC = 'Instagram user profile'
IE_NAME = 'instagram:user' IE_NAME = 'instagram:user'
_TEST = { _TEST = {
@ -221,82 +229,79 @@ class InstagramUserIE(InfoExtractor):
'id': 'porsche', 'id': 'porsche',
'title': 'porsche', 'title': 'porsche',
}, },
'playlist_mincount': 2, 'playlist_count': 5,
'playlist': [{
'info_dict': {
'id': '614605558512799803_462752227',
'ext': 'mp4',
'title': '#Porsche Intelligent Performance.',
'thumbnail': r're:^https?://.*\.jpg',
'uploader': 'Porsche',
'uploader_id': 'porsche',
'timestamp': 1387486713,
'upload_date': '20131219',
},
}],
'params': { 'params': {
'extract_flat': True, 'extract_flat': True,
'skip_download': True, 'skip_download': True,
'playlistend': 5,
} }
} }
def _real_extract(self, url): def _entries(self, uploader_id):
mobj = re.match(self._VALID_URL, url) query = {
uploader_id = mobj.group('username') '__a': 1,
}
entries = [] def get_count(kind):
page_count = 0 return int_or_none(try_get(
media_url = 'http://instagram.com/%s/media' % uploader_id node, lambda x: x['%ss' % kind]['count']))
while True:
for page_num in itertools.count(1):
page = self._download_json( page = self._download_json(
media_url, uploader_id, 'https://instagram.com/%s/' % uploader_id, uploader_id,
note='Downloading page %d ' % (page_count + 1), note='Downloading page %d' % page_num,
) fatal=False, query=query)
page_count += 1 if not page:
break
for it in page['items']: nodes = try_get(page, lambda x: x['user']['media']['nodes'], list)
if it.get('type') != 'video': if not nodes:
break
max_id = None
for node in nodes:
node_id = node.get('id')
if node_id:
max_id = node_id
if node.get('__typename') != 'GraphVideo' and node.get('is_video') is not True:
continue
video_id = node.get('code')
if not video_id:
continue continue
like_count = int_or_none(it.get('likes', {}).get('count'))
user = it.get('user', {})
formats = [{ info = self.url_result(
'format_id': k, 'https://instagram.com/p/%s/' % video_id,
'height': v.get('height'), ie=InstagramIE.ie_key(), video_id=video_id)
'width': v.get('width'),
'url': v['url'],
} for k, v in it['videos'].items()]
self._sort_formats(formats)
thumbnails_el = it.get('images', {}) description = try_get(
thumbnail = thumbnails_el.get('thumbnail', {}).get('url') node, [lambda x: x['caption'], lambda x: x['text']['id']],
compat_str)
thumbnail = node.get('thumbnail_src') or node.get('display_src')
timestamp = int_or_none(node.get('date'))
# In some cases caption is null, which corresponds to None comment_count = get_count('comment')
# in python. As a result, it.get('caption', {}) gives None like_count = get_count('like')
title = (it.get('caption') or {}).get('text', it['id']) view_count = int_or_none(node.get('video_views'))
entries.append({ info.update({
'id': it['id'], 'description': description,
'title': limit_length(title, 80),
'formats': formats,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'webpage_url': it.get('link'), 'timestamp': timestamp,
'uploader': user.get('full_name'), 'comment_count': comment_count,
'uploader_id': user.get('username'),
'like_count': like_count, 'like_count': like_count,
'timestamp': int_or_none(it.get('created_time')), 'view_count': view_count,
}) })
if not page['items']: yield info
break
max_id = page['items'][-1]['id'].split('_')[0]
media_url = (
'http://instagram.com/%s/media?max_id=%s' % (
uploader_id, max_id))
return { if not max_id:
'_type': 'playlist', break
'entries': entries,
'id': uploader_id, query['max_id'] = max_id
'title': uploader_id,
} def _real_extract(self, url):
uploader_id = self._match_id(url)
return self.playlist_result(
self._entries(uploader_id), uploader_id, uploader_id)

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import uuid import uuid
import xml.etree.ElementTree as etree import xml.etree.ElementTree as etree
import json import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@ -142,9 +143,9 @@ class ITVIE(InfoExtractor):
f['url'] = rtmp_url f['url'] = rtmp_url
formats.append(f) formats.append(f)
ios_playlist_url = params.get('data-video-playlist') ios_playlist_url = params.get('data-video-playlist') or params.get('data-video-id')
hmac = params.get('data-video-hmac') hmac = params.get('data-video-hmac')
if ios_playlist_url and hmac: if ios_playlist_url and hmac and re.match(r'https?://', ios_playlist_url):
headers = self.geo_verification_headers() headers = self.geo_verification_headers()
headers.update({ headers.update({
'Accept': 'application/vnd.itv.vod.playlist.v2+json', 'Accept': 'application/vnd.itv.vod.playlist.v2+json',
@ -159,12 +160,12 @@ class ITVIE(InfoExtractor):
'token': '' 'token': ''
}, },
'device': { 'device': {
'manufacturer': 'Apple', 'manufacturer': 'Safari',
'model': 'iPad', 'model': '5',
'os': { 'os': {
'name': 'iPhone OS', 'name': 'Windows NT',
'version': '9.3', 'version': '6.1',
'type': 'ios' 'type': 'desktop'
} }
}, },
'client': { 'client': {
@ -173,10 +174,10 @@ class ITVIE(InfoExtractor):
}, },
'variantAvailability': { 'variantAvailability': {
'featureset': { 'featureset': {
'min': ['hls', 'aes'], 'min': ['hls', 'aes', 'outband-webvtt'],
'max': ['hls', 'aes'] 'max': ['hls', 'aes', 'outband-webvtt']
}, },
'platformTag': 'mobile' 'platformTag': 'dotcom'
} }
}).encode(), headers=headers, fatal=False) }).encode(), headers=headers, fatal=False)
if ios_playlist: if ios_playlist:

View File

@ -30,7 +30,7 @@ class JeuxVideoIE(InfoExtractor):
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
title = self._html_search_meta('name', webpage) or self._og_search_title(webpage) title = self._html_search_meta('name', webpage) or self._og_search_title(webpage)
config_url = self._html_search_regex( config_url = self._html_search_regex(
r'data-src(?:set-video)?="(/contenu/medias/video.php.*?)"', r'data-src(?:set-video)?="(/contenu/medias/video\.php.*?)"',
webpage, 'config URL') webpage, 'config URL')
config_url = 'http://www.jeuxvideo.com' + config_url config_url = 'http://www.jeuxvideo.com' + config_url

View File

@ -24,7 +24,7 @@ class JWPlatformIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<script[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})', r'<(?:script|iframe)[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
webpage) webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')

View File

@ -0,0 +1,149 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
unified_timestamp,
update_url_query,
)
class KakaoIE(InfoExtractor):
_VALID_URL = r'https?://tv\.kakao\.com/channel/(?P<channel>\d+)/cliplink/(?P<id>\d+)'
_API_BASE = 'http://tv.kakao.com/api/v1/ft/cliplinks'
_TESTS = [{
'url': 'http://tv.kakao.com/channel/2671005/cliplink/301965083',
'md5': '702b2fbdeb51ad82f5c904e8c0766340',
'info_dict': {
'id': '301965083',
'ext': 'mp4',
'title': '乃木坂46 バナナマン 「3期生紹介コーナーが始動顔高低差GPも」 『乃木坂工事中』',
'uploader_id': 2671005,
'uploader': '그랑그랑이',
'timestamp': 1488160199,
'upload_date': '20170227',
}
}, {
'url': 'http://tv.kakao.com/channel/2653210/cliplink/300103180',
'md5': 'a8917742069a4dd442516b86e7d66529',
'info_dict': {
'id': '300103180',
'ext': 'mp4',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'uploader_id': 2653210,
'uploader': '쇼 음악중심',
'timestamp': 1485684628,
'upload_date': '20170129',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
player_header = {
'Referer': update_url_query(
'http://tv.kakao.com/embed/player/cliplink/%s' % video_id, {
'service': 'kakao_tv',
'autoplay': '1',
'profile': 'HIGH',
'wmode': 'transparent',
})
}
QUERY_COMMON = {
'player': 'monet_html5',
'referer': url,
'uuid': '',
'service': 'kakao_tv',
'section': '',
'dteType': 'PC',
}
query = QUERY_COMMON.copy()
query['fields'] = 'clipLink,clip,channel,hasPlusFriend,-service,-tagList'
impress = self._download_json(
'%s/%s/impress' % (self._API_BASE, video_id),
video_id, 'Downloading video info',
query=query, headers=player_header)
clip_link = impress['clipLink']
clip = clip_link['clip']
title = clip.get('title') or clip_link.get('displayTitle')
tid = impress.get('tid', '')
query = QUERY_COMMON.copy()
query.update({
'tid': tid,
'profile': 'HIGH',
})
raw = self._download_json(
'%s/%s/raw' % (self._API_BASE, video_id),
video_id, 'Downloading video formats info',
query=query, headers=player_header)
formats = []
for fmt in raw.get('outputList', []):
try:
profile_name = fmt['profile']
fmt_url_json = self._download_json(
'%s/%s/raw/videolocation' % (self._API_BASE, video_id),
video_id,
'Downloading video URL for profile %s' % profile_name,
query={
'service': 'kakao_tv',
'section': '',
'tid': tid,
'profile': profile_name
}, headers=player_header, fatal=False)
if fmt_url_json is None:
continue
fmt_url = fmt_url_json['url']
formats.append({
'url': fmt_url,
'format_id': profile_name,
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'format_note': fmt.get('label'),
'filesize': int_or_none(fmt.get('filesize'))
})
except KeyError:
pass
self._sort_formats(formats)
thumbs = []
for thumb in clip.get('clipChapterThumbnailList', []):
thumbs.append({
'url': thumb.get('thumbnailUrl'),
'id': compat_str(thumb.get('timeInSec')),
'preference': -1 if thumb.get('isDefault') else 0
})
top_thumbnail = clip.get('thumbnailUrl')
if top_thumbnail:
thumbs.append({
'url': top_thumbnail,
'preference': 10,
})
return {
'id': video_id,
'title': title,
'description': clip.get('description'),
'uploader': clip_link.get('channel', {}).get('name'),
'uploader_id': clip_link.get('channelId'),
'thumbnails': thumbs,
'timestamp': unified_timestamp(clip_link.get('createTime')),
'duration': int_or_none(clip.get('duration')),
'view_count': int_or_none(clip.get('playCount')),
'like_count': int_or_none(clip.get('likeCount')),
'comment_count': int_or_none(clip.get('commentCount')),
'formats': formats,
}

View File

@ -287,6 +287,9 @@ class KalturaIE(InfoExtractor):
# skip for now. # skip for now.
if f.get('fileExt') == 'chun': if f.get('fileExt') == 'chun':
continue continue
# DRM-protected video, cannot be decrypted
if f.get('fileExt') == 'wvm':
continue
if not f.get('fileExt'): if not f.get('fileExt'):
# QT indicates QuickTime; some videos have broken fileExt # QT indicates QuickTime; some videos have broken fileExt
if f.get('containerFormat') == 'qt': if f.get('containerFormat') == 'qt':

View File

@ -1,5 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .canvas import CanvasIE
from .common import InfoExtractor from .common import InfoExtractor
@ -7,7 +8,7 @@ class KetnetIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes', 'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes',
'md5': 'd907f7b1814ef0fa285c0475d9994ed7', 'md5': '6bdeb65998930251bbd1c510750edba9',
'info_dict': { 'info_dict': {
'id': 'zomerse-filmpjes', 'id': 'zomerse-filmpjes',
'ext': 'mp4', 'ext': 'mp4',
@ -15,6 +16,20 @@ class KetnetIE(InfoExtractor):
'description': 'Gluur mee met Ghost Rockers op de filmset', 'description': 'Gluur mee met Ghost Rockers op de filmset',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
} }
}, {
# mzid in playerConfig instead of sources
'url': 'https://www.ketnet.be/kijken/nachtwacht/de-greystook',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv',
'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, { }, {
'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016', 'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
'only_matching': True, 'only_matching': True,
@ -38,6 +53,12 @@ class KetnetIE(InfoExtractor):
'player config'), 'player config'),
video_id) video_id)
mzid = config.get('mzid')
if mzid:
return self.url_result(
'https://mediazone.vrt.be/api/v1/ketnet/assets/%s' % mzid,
CanvasIE.ie_key(), video_id=mzid)
title = config['title'] title = config['title']
formats = [] formats = []

View File

@ -114,7 +114,7 @@ class LivestreamIE(InfoExtractor):
smil_url = video_data.get('smil_url') smil_url = video_data.get('smil_url')
if smil_url: if smil_url:
formats.extend(self._extract_smil_formats(smil_url, video_id)) formats.extend(self._extract_smil_formats(smil_url, video_id, fatal=False))
m3u8_url = video_data.get('m3u8_url') m3u8_url = video_data.get('m3u8_url')
if m3u8_url: if m3u8_url:
@ -338,7 +338,7 @@ class LivestreamOriginalIE(InfoExtractor):
info = { info = {
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._search_regex(r'channelLogo.src\s*=\s*"([^"]+)"', webpage, 'thumbnail', None), 'thumbnail': self._search_regex(r'channelLogo\.src\s*=\s*"([^"]+)"', webpage, 'thumbnail', None),
} }
video_data = self._download_json(stream_url, content_id) video_data = self._download_json(stream_url, content_id)
is_live = video_data.get('isLive') is_live = video_data.get('isLive')

View File

@ -11,7 +11,7 @@ from ..utils import (
class LnkGoIE(InfoExtractor): class LnkGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lnkgo\.alfa\.lt/visi-video/(?P<show>[^/]+)/ziurek-(?P<id>[A-Za-z0-9-]+)' _VALID_URL = r'https?://(?:www\.)?lnkgo\.(?:alfa\.)?lt/visi-video/(?P<show>[^/]+)/ziurek-(?P<id>[A-Za-z0-9-]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://lnkgo.alfa.lt/visi-video/yra-kaip-yra/ziurek-yra-kaip-yra-162', 'url': 'http://lnkgo.alfa.lt/visi-video/yra-kaip-yra/ziurek-yra-kaip-yra-162',
'info_dict': { 'info_dict': {
@ -42,6 +42,9 @@ class LnkGoIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, # HLS download 'skip_download': True, # HLS download
}, },
}, {
'url': 'http://www.lnkgo.lt/visi-video/aktualai-pratesimas/ziurek-putka-trys-klausimai',
'only_matching': True,
}] }]
_AGE_LIMITS = { _AGE_LIMITS = {
'N-7': 7, 'N-7': 7,

View File

@ -94,7 +94,7 @@ class LyndaBaseIE(InfoExtractor):
class LyndaIE(LyndaBaseIE): class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda' IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos' IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:lynda\.com|educourse\.ga)/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)'
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]' _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
@ -110,6 +110,9 @@ class LyndaIE(LyndaBaseIE):
}, { }, {
'url': 'https://www.lynda.com/player/embed/133770?tr=foo=1;bar=g;fizz=rt&fs=0', 'url': 'https://www.lynda.com/player/embed/133770?tr=foo=1;bar=g;fizz=rt&fs=0',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://educourse.ga/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'only_matching': True,
}] }]
def _raise_unavailable(self, video_id): def _raise_unavailable(self, video_id):
@ -253,7 +256,7 @@ class LyndaCourseIE(LyndaBaseIE):
# Course link equals to welcome/introduction video link of same course # Course link equals to welcome/introduction video link of same course
# We will recognize it as course link # We will recognize it as course link
_VALID_URL = r'https?://(?:www|m)\.lynda\.com/(?P<coursepath>[^/]+/[^/]+/(?P<courseid>\d+))-\d\.html' _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>[^/]+/[^/]+/(?P<courseid>\d+))-\d\.html'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -5,7 +5,7 @@ from .common import InfoExtractor
class MakerTVIE(InfoExtractor): class MakerTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?maker\.tv/(?:[^/]+/)*video|makerplayer.com/embed/maker)/(?P<id>[a-zA-Z0-9]{12})' _VALID_URL = r'https?://(?:(?:www\.)?maker\.tv/(?:[^/]+/)*video|makerplayer\.com/embed/maker)/(?P<id>[a-zA-Z0-9]{12})'
_TEST = { _TEST = {
'url': 'http://www.maker.tv/video/Fh3QgymL9gsc', 'url': 'http://www.maker.tv/video/Fh3QgymL9gsc',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e', 'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',

View File

@ -22,7 +22,7 @@ class MangomoloBaseIE(InfoExtractor):
format_url = self._html_search_regex( format_url = self._html_search_regex(
[ [
r'file\s*:\s*"(https?://[^"]+?/playlist.m3u8)', r'file\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"' r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url') ], webpage, 'format url')
formats = self._extract_wowza_formats( formats = self._extract_wowza_formats(

View File

@ -0,0 +1,48 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class ManyVidsIE(InfoExtractor):
_VALID_URL = r'(?i)https?://(?:www\.)?manyvids\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://www.manyvids.com/Video/133957/everthing-about-me/',
'md5': '03f11bb21c52dd12a05be21a5c7dcc97',
'info_dict': {
'id': '133957',
'ext': 'mp4',
'title': 'everthing about me (Preview)',
'view_count': int,
'like_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'data-(?:video-filepath|meta-video)\s*=s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'video URL', group='url')
title = '%s (Preview)' % self._html_search_regex(
r'<h2[^>]+class="m-a-0"[^>]*>([^<]+)', webpage, 'title')
like_count = int_or_none(self._search_regex(
r'data-likes=["\'](\d+)', webpage, 'like count', default=None))
view_count = int_or_none(self._html_search_regex(
r'(?s)<span[^>]+class="views-wrapper"[^>]*>(.+?)</span', webpage,
'view count', default=None))
return {
'id': video_id,
'title': title,
'view_count': view_count,
'like_count': like_count,
'formats': [{
'url': video_url,
}],
}

View File

@ -0,0 +1,77 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
int_or_none,
js_to_json,
mimetype2ext,
parse_filesize,
)
class MassengeschmackTVIE(InfoExtractor):
IE_NAME = 'massengeschmack.tv'
_VALID_URL = r'https?://(?:www\.)?massengeschmack\.tv/play/(?P<id>[^?&#]+)'
_TEST = {
'url': 'https://massengeschmack.tv/play/fktv202',
'md5': 'a9e054db9c2b5a08f0a0527cc201e8d3',
'info_dict': {
'id': 'fktv202',
'ext': 'mp4',
'title': 'Fernsehkritik-TV - Folge 202',
},
}
def _real_extract(self, url):
episode = self._match_id(url)
webpage = self._download_webpage(url, episode)
title = clean_html(self._html_search_regex(
'<h3>([^<]+)</h3>', webpage, 'title'))
thumbnail = self._search_regex(r'POSTER\s*=\s*"([^"]+)', webpage, 'thumbnail', fatal=False)
sources = self._parse_json(self._search_regex(r'(?s)MEDIA\s*=\s*(\[.+?\]);', webpage, 'media'), episode, js_to_json)
formats = []
for source in sources:
furl = source.get('src')
if not furl:
continue
furl = self._proto_relative_url(furl)
ext = determine_ext(furl) or mimetype2ext(source.get('type'))
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
furl, episode, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': furl,
'format_id': determine_ext(furl),
})
for (durl, format_id, width, height, filesize) in re.findall(r'''(?x)
<a[^>]+?href="(?P<url>(?:https:)?//[^"]+)".*?
<strong>(?P<format_id>.+?)</strong>.*?
<small>(?:(?P<width>\d+)x(?P<height>\d+))?\s+?\((?P<filesize>[\d,]+\s*[GM]iB)\)</small>
''', webpage):
formats.append({
'url': durl,
'format_id': format_id,
'width': int_or_none(width),
'height': int_or_none(height),
'filesize': parse_filesize(filesize),
'vcodec': 'none' if format_id.startswith('Audio') else None,
})
self._sort_formats(formats, ('width', 'height', 'filesize', 'tbr'))
return {
'id': episode,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
}

Some files were not shown because too many files have changed in this diff Show More