'provider' renamed to 'extractor', which should be more natural

2025-02-05 05:05:36 +08:00 · 2012-09-30 23:37:24 +03:00 · 2012-09-30 23:37:24 +03:00 · c0558a0046
commit c0558a0046
parent e0c5e9a532 13e69f546c
6 changed files with 64 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -43,9 +43,9 @@ which means you can modify it, redistribute it or use it however you like.
                             %(autonumber)s to get an automatically incremented
                             number, %(ext)s for the filename extension,
                             %(upload_date)s for the upload date (YYYYMMDD),
-                             %(provider) for the provider (youtube, metacafe, etc),
-                             %(id)s for the video id and %% for a literal percent.
-                             Use - to output to stdout.
+                             %(extractor)s for the extractor used (youtube, metacafe,
+                             etc), %(id)s for the video id and %% for a literal
+                             percent. Use - to output to stdout.
    -a, --batch-file FILE    file containing URLs to download ('-' for stdin)
    -w, --no-overwrites      do not overwrite files
    -c, --continue           resume partially downloaded files
@@ -148,6 +148,10 @@ Please note that Python 2.5 is not supported anymore.

 Since June 2012 (#342) youtube-dl is packed as an executable zipfile, simply unzip it (might need renaming to `youtube-dl.zip` first on some systems) or clone the git repo to see the code. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make compile`.

+### The exe throws a *Runtime error from Visual C++*
+
+To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
+
 # COPYRIGHT

 youtube-dl is released into the public domain by the copyright holders.
--- a/BIN
+++ b/BIN
--- a/youtube-dl.1
+++ b/youtube-dl.1
@@ -54,9 +54,10 @@ redistribute it or use it however you like.
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ title,\ %(uploader)s\ for\ the\ uploader\ name,
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ %(autonumber)s\ to\ get\ an\ automatically\ incremented
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ number,\ %(ext)s\ for\ the\ filename\ extension,
-\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ %(upload_date)s\ for\ the\ upload\ date\ (YYYYMMDD),\ and
-\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ %%\ for\ a\ literal\ percent.\ Use\ -\ to\ output\ to
-\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ stdout.
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ %(upload_date)s\ for\ the\ upload\ date\ (YYYYMMDD),
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ %(extractor)s\ for\ the\ extractor\ used\ (youtube,\ metacafe,
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ etc),\ %(id)s\ for\ the\ video\ id\ and\ %%\ for\ a\ literal
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ percent.\ Use\ -\ to\ output\ to\ stdout.
 -a,\ --batch-file\ FILE\ \ \ \ file\ containing\ URLs\ to\ download\ (\[aq]-\[aq]\ for\ stdin)
 -w,\ --no-overwrites\ \ \ \ \ \ do\ not\ overwrite\ files
 -c,\ --continue\ \ \ \ \ \ \ \ \ \ \ resume\ partially\ downloaded\ files
@@ -172,7 +173,7 @@ You can update youtube-dl with \f[C]sudo\ youtube-dl\ --update\f[].
 youtube requires an additional signature since September 2012 which is
 not supported by old versions of youtube-dl.
 You can update youtube-dl with \f[C]sudo\ youtube-dl\ --update\f[].
-.SS SyntaxError:Non-ASCII character
+.SS SyntaxError: Non-ASCII character
 .PP
 The error
 .IP
@@ -193,11 +194,24 @@ out like this:
 \f[C]
 git\ clone\ git://github.com/rg3/youtube-dl.git
 cd\ youtube-dl
-python\ -m\ youtube-dl\ --help
+python\ -m\ youtube_dl\ --help
 \f[]
 .fi
 .PP
 Please note that Python 2.5 is not supported anymore.
+.SS What is this binary file? Where has the code gone?
+.PP
+Since June 2012 (#342) youtube-dl is packed as an executable zipfile,
+simply unzip it (might need renaming to \f[C]youtube-dl.zip\f[] first on
+some systems) or clone the git repo to see the code.
+If you modify the code, you can run it by executing the
+\f[C]__main__.py\f[] file.
+To recompile the executable, run \f[C]make\ compile\f[].
+.SS The exe throws a \f[I]Runtime error from Visual C++\f[]
+.PP
+To run the exe you need to install first the Microsoft Visual C++ 2008
+Redistributable
+Package (http://www.microsoft.com/en-us/download/details.aspx?id=29).
 .SH COPYRIGHT
 .PP
 youtube-dl is released into the public domain by the copyright holders.
--- a/youtube-dl.exe
+++ b/youtube-dl.exe
--- a/youtube_dl/InfoExtractors.py
+++ b/youtube_dl/InfoExtractors.py
@@ -97,7 +97,25 @@ class InfoExtractor(object):
 class YoutubeIE(InfoExtractor):
 	"""Information extractor for youtube.com."""

-	_VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/|tube\.majestyc\.net/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
+	_VALID_URL = r"""^
+	                 (
+	                     (?:https?://)?                                       # http(s):// (optional)
+	                     (?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/|
+	                     	tube\.majestyc\.net/)                             # the various hostnames, with wildcard subdomains
+	                     (?!view_play_list|my_playlists|artist|playlist)      # ignore playlist URLs
+	                     (?:                                                  # the various things that can precede the ID:
+	                         (?:(?:v|embed|e)/)                               # v/ or embed/ or e/
+	                         |(?:                                             # or the v= param in all its forms
+	                             (?:watch(?:_popup)?(?:\.php)?)?              # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
+	                             (?:\?|\#!?)                                  # the params delimiter ? or # or #!
+	                             (?:.+&)?                                     # any other preceding param (like /?s=tuff&v=xxxx)
+	                             v=
+	                         )
+	                     )?                                                   # optional -> youtube.com/xxxx is OK
+	                 )?                                                       # all until now is optional -> you can pass the naked ID
+	                 ([0-9A-Za-z_-]+)                                         # here is it! the YouTube video ID
+	                 (?(1).+)?                                                # if we found the ID, everything can follow
+	                 $"""
 	_LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
 	_LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
 	_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
@@ -136,6 +154,10 @@ class YoutubeIE(InfoExtractor):
 	}	
 	IE_NAME = u'youtube'

+	def suitable(self, url):
+		"""Receives a URL and returns True if suitable for this IE."""
+		return re.match(self._VALID_URL, url, re.VERBOSE) is not None
+
 	def report_lang(self):
 		"""Report attempt to set language."""
 		self._downloader.to_screen(u'[youtube] Setting language')
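
The two hunks above only work together: the multi-line `_VALID_URL` depends on `re.VERBOSE` being passed wherever it is matched, which is why `suitable` is overridden. A minimal standalone sketch of that pairing follows, written for Python 3. The pattern is copied from the diff with shortened comments; the module-level `VALID_URL`, `suitable` and `video_id` names are illustrative only and not part of youtube-dl.

```python
import re

# Pattern taken from the diff; comments shortened. In verbose mode,
# whitespace is ignored and an unescaped '#' starts a comment, so the
# literal '#' in the params delimiter has to be written as '\#'.
VALID_URL = r"""^
    (
        (?:https?://)?                                    # optional scheme
        (?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/|
           tube\.majestyc\.net/)                          # hostnames
        (?!view_play_list|my_playlists|artist|playlist)   # skip playlist URLs
        (?:
            (?:(?:v|embed|e)/)                            # v/, embed/ or e/
            |(?:
                (?:watch(?:_popup)?(?:\.php)?)?           # watch variants or nothing
                (?:\?|\#!?)                               # ?, # or #!
                (?:.+&)?                                  # params before v=
                v=
            )
        )?
    )?
    ([0-9A-Za-z_-]+)                                      # the video ID
    (?(1).+)?                                             # trailing text only after a URL prefix
    $"""


def suitable(url):
    """Return True if the URL (or naked ID) matches the pattern."""
    return re.match(VALID_URL, url, re.VERBOSE) is not None


def video_id(url):
    """Return the captured video ID, or None when nothing matches."""
    mobj = re.match(VALID_URL, url, re.VERBOSE)
    return mobj.group(2) if mobj else None


print(video_id('http://www.youtube.com/watch?v=BaW_jenozKc'))
print(video_id('http://youtu.be/BaW_jenozKc'))
print(suitable('http://www.youtube.com/playlist?list=PL123'))
```

Without the `re.VERBOSE` flag the embedded whitespace and comments would be treated as literal pattern text, so the same string would no longer match ordinary URLs.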
@@ -270,7 +292,7 @@ class YoutubeIE(InfoExtractor):
 			url = 'http://www.youtube.com/' + urllib.unquote(mobj.group(1)).lstrip('/')

 		# Extract video id from URL
-		mobj = re.match(self._VALID_URL, url)
+		mobj = re.match(self._VALID_URL, url, re.VERBOSE)
 		if mobj is None:
 			self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
 			return
@@ -594,7 +616,7 @@ class MetacafeIE(InfoExtractor):
 class DailymotionIE(InfoExtractor):
 	"""Information Extractor for Dailymotion"""

-	_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^_/]+)_([^/]+)'
+	_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^/]+)'
 	IE_NAME = u'dailymotion'

 	def __init__(self, downloader=None):
@@ -615,9 +637,9 @@ class DailymotionIE(InfoExtractor):
 			self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
 			return

-		video_id = mobj.group(1)
+		video_id = mobj.group(1).split('_')[0].split('?')[0]

-		video_extension = 'flv'
+		video_extension = 'mp4'

 		# Retrieve video webpage to extract further information
 		request = urllib2.Request(url)
@@ -631,20 +653,23 @@ class DailymotionIE(InfoExtractor):

 		# Extract URL, uploader and title from webpage
 		self.report_extraction(video_id)
-		mobj = re.search(r'(?i)addVariable\(\"sequence\"\s*,\s*\"([^\"]+?)\"\)', webpage)
+		mobj = re.search(r'\s*var flashvars = (.*)', webpage)
 		if mobj is None:
 			self._downloader.trouble(u'ERROR: unable to extract media URL')
 			return
-		sequence = urllib.unquote(mobj.group(1))
-		mobj = re.search(r',\"sdURL\"\:\"([^\"]+?)\",', sequence)
+		flashvars = urllib.unquote(mobj.group(1))
+		if 'hqURL' in flashvars: max_quality = 'hqURL'
+		elif 'sdURL' in flashvars: max_quality = 'sdURL'
+		else: max_quality = 'ldURL'
+		mobj = re.search(r'"' + max_quality + r'":"(.+?)"', flashvars)
+		if mobj is None:
+			mobj = re.search(r'"video_url":"(.*?)",', flashvars)
 		if mobj is None:
 			self._downloader.trouble(u'ERROR: unable to extract media URL')
 			return
-		mediaURL = urllib.unquote(mobj.group(1)).replace('\\', '')
+		video_url = urllib.unquote(mobj.group(1)).replace('\\/', '/')

-		# if needed add http://www.dailymotion.com/ if relative URL
-
-		video_url = mediaURL
+		# TODO: support choosing qualities

 		mobj = re.search(r'<meta property="og:title" content="(?P<title>[^"]*)" />', webpage)
 		if mobj is None:
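
The Dailymotion hunk above replaces the single `sdURL` lookup with a quality fallback chain. The sketch below isolates that logic as a pure function so it can be tried without a network request. It is written for Python 3 (`urllib.parse.unquote` instead of `urllib.unquote`); the function name `pick_media_url` and the sample `flashvars` strings are made up for illustration and do not reflect real Dailymotion pages.

```python
import re
from urllib.parse import unquote  # Python 3 location of urllib.unquote


def pick_media_url(flashvars):
    """Pick the best available media URL from a flashvars string.

    Mirrors the order used in the diff: hqURL, then sdURL, then ldURL,
    with "video_url" as a last resort. Returns None if nothing is found.
    """
    flashvars = unquote(flashvars)
    if 'hqURL' in flashvars:
        max_quality = 'hqURL'
    elif 'sdURL' in flashvars:
        max_quality = 'sdURL'
    else:
        max_quality = 'ldURL'
    mobj = re.search(r'"' + max_quality + r'":"(.+?)"', flashvars)
    if mobj is None:
        mobj = re.search(r'"video_url":"(.*?)",', flashvars)
    if mobj is None:
        return None
    # The page escapes slashes as '\/'; undo only that escape.
    return unquote(mobj.group(1)).replace('\\/', '/')


# Hypothetical inputs, shaped like the strings the regexes expect.
sample = '{"sdURL":"http:\\/\\/example.com\\/sd.mp4","hqURL":"http:\\/\\/example.com\\/hq.mp4"}'
fallback = '{"video_url":"http%3A%2F%2Fexample.com%2Fv.mp4","other":1}'

print(pick_media_url(sample))
print(pick_media_url(fallback))
```

Replacing only `\/` rather than every backslash, as the new code does, leaves any other backslashes in the URL untouched.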
--- a/youtube_dl/init.py
+++ b/youtube_dl/init.py
@@ -269,7 +269,7 @@ def parseOpts():
 			action='store_true', dest='autonumber',
 			help='number downloaded files starting from 00000', default=False)
 	filesystem.add_option('-o', '--output',
-			dest='outtmpl', metavar='TEMPLATE', help='output filename template. Use %(stitle)s to get the title, %(uploader)s for the uploader name, %(autonumber)s to get an automatically incremented number, %(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), and %% for a literal percent. Use - to output to stdout.')
+			dest='outtmpl', metavar='TEMPLATE', help='output filename template. Use %(stitle)s to get the title, %(uploader)s for the uploader name, %(autonumber)s to get an automatically incremented number, %(ext)s for the filename extension, %(upload_date)s for the upload date (YYYYMMDD), %(extractor)s for the extractor used (youtube, metacafe, etc), %(id)s for the video id and %% for a literal percent. Use - to output to stdout.')
 	filesystem.add_option('-a', '--batch-file',
 			dest='batchfile', metavar='FILE', help='file containing URLs to download (\'-\' for stdin)')
 	filesystem.add_option('-w', '--no-overwrites',
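
The help text above describes `%(name)s` placeholders, which is the syntax of Python's printf-style formatting with a mapping. As a rough illustration, assuming the template is expanded with the `%` operator against a dict of video fields (the diff does not show that code), the new `%(extractor)s` and `%(id)s` fields would behave as below. The `info` values and the `render` helper are hypothetical.

```python
# Hypothetical metadata for one video; only the key names come from the help text.
info = {
    'extractor': 'youtube',
    'id': 'BaW_jenozKc',
    'stitle': 'some_title',
    'ext': 'mp4',
}


def render(template, fields):
    """Expand an output template with the % operator (assumed mechanism)."""
    return template % fields


print(render('%(extractor)s-%(id)s.%(ext)s', info))
print(render('100%%-%(stitle)s.%(ext)s', info))  # %% yields a literal percent

# The old README text wrote the field as '%(provider)' with no trailing
# conversion character; a placeholder in that form is rejected by Python.
try:
    render('%(extractor)', info)
except ValueError as err:
    print('rejected:', err)
```

This is also why the README fix matters beyond the rename: the trailing `s` is part of the placeholder, not punctuation.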