1
0
mirror of https://github.com/l1ving/youtube-dl synced 2025-02-11 05:03:17 +08:00

Merge pull request #15 from rg3/master

update 14th jun
This commit is contained in:
siddht1 2016-06-14 23:09:25 +05:30 committed by GitHub
commit 636f9bfb9a
15 changed files with 297 additions and 40 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.14*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.12** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.14**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.06.12 [debug] youtube-dl version 2016.06.14
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -173,3 +173,5 @@ Kevin Deldycke
inondle inondle
Tomáš Čech Tomáš Čech
Déstin Reed Déstin Reed
Roman Tsiupa
Artur Krysiak

View File

@ -142,9 +142,9 @@ After you have ensured this site is distributing it's content legally, you can f
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`. 8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). 9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this: 10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py

View File

@ -935,9 +935,9 @@ After you have ensured this site is distributing it's content legally, you can f
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`. 8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). 9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this: 10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
@ -964,7 +964,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc']) ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
``` ```
Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object. Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file: Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:

View File

@ -15,6 +15,7 @@
set -e set -e
skip_tests=true skip_tests=true
gpg_sign_commits=""
buildserver='localhost:8142' buildserver='localhost:8142'
while true while true
@ -24,6 +25,10 @@ case "$1" in
skip_tests=false skip_tests=false
shift shift
;; ;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver) --buildserver)
buildserver="$2" buildserver="$2"
shift 2 shift 2
@ -69,7 +74,7 @@ sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..." /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version" git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..." /bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version" git tag -s -m "Release $version" "$version"
@ -116,7 +121,7 @@ git clone --branch gh-pages --single-branch . build/gh-pages
"$ROOT/devscripts/gh-pages/update-copyright.py" "$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py" "$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update git add *.html *.html.in update
git commit -m "release $version" git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages git push "$ORIGIN_URL" gh-pages
) )

View File

@ -535,6 +535,7 @@
- **revision3:embed** - **revision3:embed**
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RockstarGames**
- **RottenTomatoes** - **RottenTomatoes**
- **Roxwel** - **Roxwel**
- **RTBF** - **RTBF**
@ -699,6 +700,7 @@
- **TVPlay**: TV3Play and related services - **TVPlay**: TV3Play and related services
- **Tweakers** - **Tweakers**
- **twitch:chapter** - **twitch:chapter**
- **twitch:clips**
- **twitch:past_broadcasts** - **twitch:past_broadcasts**
- **twitch:profile** - **twitch:profile**
- **twitch:stream** - **twitch:stream**
@ -793,10 +795,11 @@
- **WNL** - **WNL**
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑 - **xiami:album**: 虾米音乐 - 专辑

View File

@ -85,7 +85,7 @@ class ExternalFD(FileDownloader):
cmd, stderr=subprocess.PIPE) cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate() _, stderr = p.communicate()
if p.returncode != 0: if p.returncode != 0:
self.to_stderr(stderr) self.to_stderr(stderr.decode('utf-8', 'replace'))
return p.returncode return p.returncode

View File

@ -649,6 +649,7 @@ from .revision3 import (
from .rice import RICEIE from .rice import RICEIE
from .ringtv import RingTVIE from .ringtv import RingTVIE
from .ro220 import Ro220IE from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .rottentomatoes import RottenTomatoesIE from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE from .roxwel import RoxwelIE
from .rtbf import RTBFIE from .rtbf import RTBFIE
@ -862,6 +863,7 @@ from .twitch import (
TwitchProfileIE, TwitchProfileIE,
TwitchPastBroadcastsIE, TwitchPastBroadcastsIE,
TwitchStreamIE, TwitchStreamIE,
TwitchClipsIE,
) )
from .twitter import ( from .twitter import (
TwitterCardIE, TwitterCardIE,
@ -978,7 +980,10 @@ from .weiqitv import WeiqiTVIE
from .wimp import WimpIE from .wimp import WimpIE
from .wistia import WistiaIE from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE from .wrzuta import (
WrzutaIE,
WrzutaPlaylistIE,
)
from .wsj import WSJIE from .wsj import WSJIE
from .xbef import XBefIE from .xbef import XBefIE
from .xboxclips import XboxClipsIE from .xboxclips import XboxClipsIE

View File

@ -95,7 +95,6 @@ class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda' IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos' IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)' _VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
_NETRC_MACHINE = 'lynda'
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]' _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools import itertools
@ -39,7 +40,25 @@ class PornHubIE(InfoExtractor):
'dislike_count': int, 'dislike_count': int,
'comment_count': int, 'comment_count': int,
'age_limit': 18, 'age_limit': 18,
} },
}, {
# non-ASCII title
'url': 'http://www.pornhub.com/view_video.php?viewkey=1331683002',
'info_dict': {
'id': '1331683002',
'ext': 'mp4',
'title': '重庆婷婷女王足交',
'uploader': 'cj397186295',
'duration': 1753,
'view_count': int,
'like_count': int,
'dislike_count': int,
'comment_count': int,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d', 'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
'only_matching': True, 'only_matching': True,
@ -76,19 +95,25 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg, 'PornHub said: %s' % error_msg,
expected=True, video_id=video_id) expected=True, video_id=video_id)
# video_title from flashvars contains whitespace instead of non-ASCII (see
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
title = self._html_search_meta(
'twitter:title', webpage, default=None) or self._search_regex(
(r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
webpage, 'title', group='title')
flashvars = self._parse_json( flashvars = self._parse_json(
self._search_regex( self._search_regex(
r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'), r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
video_id) video_id)
if flashvars: if flashvars:
video_title = flashvars.get('video_title')
thumbnail = flashvars.get('image_url') thumbnail = flashvars.get('image_url')
duration = int_or_none(flashvars.get('video_duration')) duration = int_or_none(flashvars.get('video_duration'))
else: else:
video_title, thumbnail, duration = [None] * 3 title, thumbnail, duration = [None] * 3
if not video_title:
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<', r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
@ -137,7 +162,7 @@ class PornHubIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'uploader': video_uploader, 'uploader': video_uploader,
'title': video_title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'view_count': view_count, 'view_count': view_count,

View File

@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class RockstarGamesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rockstargames\.com/videos(?:/video/|#?/?\?.*\bvideo=)(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.rockstargames.com/videos/video/11544/',
'md5': '03b5caa6e357a4bd50e3143fc03e5733',
'info_dict': {
'id': '11544',
'ext': 'mp4',
'title': 'Further Adventures in Finance and Felony Trailer',
'description': 'md5:6d31f55f30cb101b5476c4a379e324a3',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1464876000,
'upload_date': '20160602',
}
}, {
'url': 'http://www.rockstargames.com/videos#/?video=48',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'https://www.rockstargames.com/videoplayer/videos/get-video.json',
video_id, query={
'id': video_id,
'locale': 'en_us',
})['video']
title = video['title']
formats = []
for video in video['files_processed']['video/mp4']:
if not video.get('src'):
continue
resolution = video.get('resolution')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', resolution or '', 'height', default=None))
formats.append({
'url': self._proto_relative_url(video['src']),
'format_id': resolution,
'height': height,
})
if not formats:
youtube_id = video.get('youtube_id')
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video.get('description'),
'thumbnail': self._proto_relative_url(video.get('screencap')),
'timestamp': parse_iso8601(video.get('created')),
'formats': formats,
}

View File

@ -16,6 +16,7 @@ from ..compat import (
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
js_to_json,
orderedSet, orderedSet,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
@ -454,3 +455,45 @@ class TwitchStreamIE(TwitchBaseIE):
'formats': formats, 'formats': formats,
'is_live': True, 'is_live': True,
} }
class TwitchClipsIE(InfoExtractor):
IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://clips.twitch.tv/ea/AggressiveCobraPoooound',
'md5': '761769e1eafce0ffebfb4089cb3847cd',
'info_dict': {
'id': 'AggressiveCobraPoooound',
'ext': 'mp4',
'title': 'EA Play 2016 Live from the Novo Theatre',
'thumbnail': 're:^https?://.*\.jpg',
'creator': 'EA',
'uploader': 'stereotype_',
'uploader_id': 'stereotype_',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
clip = self._parse_json(
self._search_regex(
r'(?s)clipInfo\s*=\s*({.+?});', webpage, 'clip info'),
video_id, transform_source=js_to_json)
video_url = clip['clip_video_url']
title = clip['channel_title']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'creator': clip.get('broadcaster_display_name') or clip.get('broadcaster_login'),
'uploader': clip.get('curator_login'),
'uploader_id': clip.get('curator_display_name'),
}

View File

@ -5,8 +5,10 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
qualities, qualities,
remove_start,
) )
@ -26,16 +28,17 @@ class WrzutaIE(InfoExtractor):
'uploader_id': 'laboratoriumdextera', 'uploader_id': 'laboratoriumdextera',
'description': 'md5:7fb5ef3c21c5893375fda51d9b15d9cd', 'description': 'md5:7fb5ef3c21c5893375fda51d9b15d9cd',
}, },
'skip': 'Redirected to wrzuta.pl',
}, { }, {
'url': 'http://jolka85.wrzuta.pl/audio/063jOPX5ue2/liber_natalia_szroeder_-_teraz_ty', 'url': 'http://vexling.wrzuta.pl/audio/01xBFabGXu6/james_horner_-_into_the_na_39_vi_world_bonus',
'md5': 'bc78077859bea7bcfe4295d7d7fc9025', 'md5': 'f80564fb5a2ec6ec59705ae2bf2ba56d',
'info_dict': { 'info_dict': {
'id': '063jOPX5ue2', 'id': '01xBFabGXu6',
'ext': 'ogg', 'ext': 'mp3',
'title': 'Liber & Natalia Szroeder - Teraz Ty', 'title': 'James Horner - Into The Na\'vi World [Bonus]',
'duration': 203, 'description': 'md5:30a70718b2cd9df3120fce4445b0263b',
'uploader_id': 'jolka85', 'duration': 95,
'description': 'md5:2d2b6340f9188c8c4cd891580e481096', 'uploader_id': 'vexling',
}, },
}] }]
@ -45,7 +48,10 @@ class WrzutaIE(InfoExtractor):
typ = mobj.group('typ') typ = mobj.group('typ')
uploader = mobj.group('uploader') uploader = mobj.group('uploader')
webpage = self._download_webpage(url, video_id) webpage, urlh = self._download_webpage_handle(url, video_id)
if urlh.geturl() == 'http://www.wrzuta.pl/':
raise ExtractorError('Video removed', expected=True)
quality = qualities(['SD', 'MQ', 'HQ', 'HD']) quality = qualities(['SD', 'MQ', 'HQ', 'HD'])
@ -80,3 +86,73 @@ class WrzutaIE(InfoExtractor):
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'age_limit': embedpage.get('minimalAge', 0), 'age_limit': embedpage.get('minimalAge', 0),
} }
class WrzutaPlaylistIE(InfoExtractor):
"""
this class covers extraction of wrzuta playlist entries
the extraction process bases on following steps:
* collect information of playlist size
* download all entries provided on
the playlist webpage (the playlist is split
on two pages: first directly reached from webpage
second: downloaded on demand by ajax call and rendered
using the ajax call response)
* in case size of extracted entries not reached total number of entries
use the ajax call to collect the remaining entries
"""
IE_NAME = 'wrzuta.pl:playlist'
_VALID_URL = r'https?://(?P<uploader>[0-9a-zA-Z]+)\.wrzuta\.pl/playlista/(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR/moja_muza',
'playlist_mincount': 14,
'info_dict': {
'id': '7XfO4vE84iR',
'title': 'Moja muza',
},
}, {
'url': 'http://heroesf70.wrzuta.pl/playlista/6Nj3wQHx756/lipiec_-_lato_2015_muzyka_swiata',
'playlist_mincount': 144,
'info_dict': {
'id': '6Nj3wQHx756',
'title': 'Lipiec - Lato 2015 Muzyka Świata',
},
}, {
'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
uploader = mobj.group('uploader')
webpage = self._download_webpage(url, playlist_id)
playlist_size = int_or_none(self._html_search_regex(
(r'<div[^>]+class=["\']playlist-counter["\'][^>]*>\d+/(\d+)',
r'<div[^>]+class=["\']all-counter["\'][^>]*>(.+?)</div>'),
webpage, 'playlist size', default=None))
playlist_title = remove_start(
self._og_search_title(webpage), 'Playlista: ')
entries = []
if playlist_size:
entries = [
self.url_result(entry_url)
for _, entry_url in re.findall(
r'<a[^>]+href=(["\'])(http.+?)\1[^>]+class=["\']playlist-file-page',
webpage)]
if playlist_size > len(entries):
playlist_content = self._download_json(
'http://%s.wrzuta.pl/xhr/get_playlist_offset/%s' % (uploader, playlist_id),
playlist_id,
'Downloading playlist JSON',
'Unable to download playlist JSON')
entries.extend([
self.url_result(entry['filelink'])
for entry in playlist_content.get('files', []) if entry.get('filelink')])
return self.playlist_result(entries, playlist_id, playlist_title)

View File

@ -5,8 +5,10 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
decode_packed_codes,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
NO_DEFAULT,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
) )
@ -23,20 +25,24 @@ class XFileShareIE(InfoExtractor):
('thevideobee.to', 'TheVideoBee'), ('thevideobee.to', 'TheVideoBee'),
('vidto.me', 'Vidto'), ('vidto.me', 'Vidto'),
('streamin.to', 'Streamin.To'), ('streamin.to', 'Streamin.To'),
('xvidstage.com', 'XVIDSTAGE'),
) )
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1]) IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
_VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)' _VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
% '|'.join(re.escape(site) for site in list(zip(*_SITES))[0])) % '|'.join(re.escape(site) for site in list(zip(*_SITES))[0]))
_FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<' _FILE_NOT_FOUND_REGEXES = (
r'>(?:404 - )?File Not Found<',
r'>The file was removed by administrator<',
)
_TESTS = [{ _TESTS = [{
'url': 'http://gorillavid.in/06y9juieqpmi', 'url': 'http://gorillavid.in/06y9juieqpmi',
'md5': '5ae4a3580620380619678ee4875893ba', 'md5': '5ae4a3580620380619678ee4875893ba',
'info_dict': { 'info_dict': {
'id': '06y9juieqpmi', 'id': '06y9juieqpmi',
'ext': 'flv', 'ext': 'mp4',
'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ', 'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
@ -78,6 +84,17 @@ class XFileShareIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny trailer', 'title': 'Big Buck Bunny trailer',
}, },
}, {
'url': 'http://xvidstage.com/e0qcnl03co6z',
'info_dict': {
'id': 'e0qcnl03co6z',
'ext': 'mp4',
'title': 'Chucky Prank 2015.mp4',
},
}, {
# removed by administrator
'url': 'http://xvidstage.com/amfy7atlkx25',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -87,7 +104,7 @@ class XFileShareIE(InfoExtractor):
url = 'http://%s/%s' % (mobj.group('host'), video_id) url = 'http://%s/%s' % (mobj.group('host'), video_id)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None: if any(re.search(p, webpage) for p in self._FILE_NOT_FOUND_REGEXES):
raise ExtractorError('Video %s does not exist' % video_id, expected=True) raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = self._hidden_inputs(webpage) fields = self._hidden_inputs(webpage)
@ -113,10 +130,23 @@ class XFileShareIE(InfoExtractor):
r'>Watch (.+) ', r'>Watch (.+) ',
r'<h2 class="video-page-head">([^<]+)</h2>'], r'<h2 class="video-page-head">([^<]+)</h2>'],
webpage, 'title', default=None) or self._og_search_title(webpage)).strip() webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
video_url = self._search_regex(
[r'file\s*:\s*["\'](http[^"\']+)["\'],', def extract_video_url(default=NO_DEFAULT):
r'file_link\s*=\s*\'(https?:\/\/[0-9a-zA-z.\/\-_]+)'], return self._search_regex(
webpage, 'file url') (r'file\s*:\s*(["\'])(?P<url>http.+?)\1,',
r'file_link\s*=\s*(["\'])(?P<url>http.+?)\1',
r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http.+?)\2\)',
r'<embed[^>]+src=(["\'])(?P<url>http.+?)\1'),
webpage, 'file url', default=default, group='url')
video_url = extract_video_url(default=None)
if not video_url:
webpage = decode_packed_codes(self._search_regex(
r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))",
webpage, 'packed code'))
video_url = extract_video_url()
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None) r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2016.06.12' __version__ = '2016.06.14'