pydl.ScrapeFile Class Reference

Public Member Functions
def	__init__ (self, input_url, download_path, file_id, report)

def	follow (self, url)

def	chunk_report (self, bytes_so_far, chunk_size, total_size)

def	chunk_read (self, response, ofilename, chunk_size=8192, report_hook=None)

def	download (self)

def	test_thread (self)

Public Attributes
	filename

	download_path

	ofilename

	report

Detailed Description

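Resolves the download URL for a file, following a meta refresh redirect when the page has one, and downloads the file in chunks while reporting progress.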

Definition at line 98 of file pydl.py.

Constructor & Destructor Documentation

def pydl.ScrapeFile.__init__	(	self,
		input_url,
		download_path,
		file_id,
		report
	)

Definition at line 102 of file pydl.py.

     def __init__(self, input_url, download_path, file_id, report):
         #TODO: handle other site layouts (like sourceforge)
         purl = urllib.parse.urlparse(input_url)
         self.filename = purl.path.split('/')[-1]
 
         #build the download url 
         download_path_start = purl.scheme + "://" + purl.netloc + download_path + purl.path.split('/')[file_id]
 
         #TODO: add try block in caller
         self.download_path = purl.scheme + "://" + purl.netloc + self.follow(download_path_start)
 
         # set the output filename
         self.ofilename = self.download_path.split('/')[-1]
 
         # report class instance reference
         self.report = report
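
The URL assembly in the constructor can be traced without any network access. The following is a minimal sketch, assuming a hypothetical input_url, download_path and file_id (none of these values come from pydl.py); it only reproduces the string handling, not the call to follow().

```python
from urllib.parse import urlparse

# Hypothetical inputs, chosen only to illustrate the string assembly.
input_url = "https://example.com/files/12345/archive.zip"
download_path = "/download/"
file_id = 2  # index into the '/'-separated path segments

purl = urlparse(input_url)
segments = purl.path.split("/")  # ['', 'files', '12345', 'archive.zip']

# Same expressions as in __init__, applied to the sample values.
filename = segments[-1]
download_path_start = purl.scheme + "://" + purl.netloc + download_path + segments[file_id]

print(filename)             # archive.zip
print(download_path_start)  # https://example.com/download/12345
```

Because the path starts with '/', the first segment is an empty string, so file_id counts from 1 for the first real path component.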
 

Member Function Documentation

def pydl.ScrapeFile.chunk_read	(	self,
		response,
		ofilename,
		chunk_size = 8192,
		report_hook = None
	)

Definition at line 138 of file pydl.py.

Referenced by pydl.ScrapeFile.download().

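     def chunk_read(self, response, ofilename, chunk_size=8192, report_hook=None):
         """ Write the response body to ofilename in chunk_size pieces
         """
         # Content-Length can be absent (e.g. chunked transfer); fall back to 0
         total_size = int(response.headers.get("Content-Length", 0))
         bytes_so_far = 0
 
         with open(ofilename, 'wb') as ofd:
             while True:
                 chunk = response.read(chunk_size)
 
                 # an empty read means the body is exhausted
                 if not chunk:
                     break
 
                 bytes_so_far += len(chunk)
 
                 if report_hook:
                     report_hook(bytes_so_far, chunk_size, total_size)
 
                 ofd.write(chunk)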
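
The read loop can be exercised offline by substituting an in-memory stream for the HTTP response. This is a sketch under that assumption: copy_in_chunks is a hypothetical standalone function (not part of pydl.py) that takes total_size as an argument instead of reading a Content-Length header.

```python
import io
import os
import tempfile

def copy_in_chunks(stream, out_path, total_size, chunk_size=8192, report_hook=None):
    """Copy a readable binary stream to out_path, reporting after each chunk."""
    bytes_so_far = 0
    with open(out_path, "wb") as ofd:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty read: stream exhausted
                break
            bytes_so_far += len(chunk)
            ofd.write(chunk)
            if report_hook:
                report_hook(bytes_so_far, chunk_size, total_size)
    return bytes_so_far

payload = b"x" * 20000  # stand-in for a response body
calls = []              # records bytes_so_far at each hook call

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.bin")
    written = copy_in_chunks(
        io.BytesIO(payload), out_path, len(payload),
        report_hook=lambda done, size, total: calls.append(done),
    )
    size_on_disk = os.path.getsize(out_path)

print(calls)  # [8192, 16384, 20000]
```

The last chunk is shorter than chunk_size, which is why the hook receives the running byte count rather than a chunk index.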
 

def pydl.ScrapeFile.chunk_report	(	self,
		bytes_so_far,
		chunk_size,
		total_size
	)

Report the latest downloaded file chunk

Definition at line 132 of file pydl.py.

References pydl.ScrapeFile.ofilename.

Referenced by pydl.ScrapeFile.download(), and pydl.ScrapeFile.test_thread().

     def chunk_report(self, bytes_so_far, chunk_size, total_size):
         """ Report the latest downloaded file chunk
         """
         self.report.progressX[self.ofilename] = {"bytes_so_far": bytes_so_far, "total_size": total_size}
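
The report object is only known here through its progressX mapping, keyed by output filename. The sketch below assumes a hypothetical Report class with a percent() helper (neither is shown in this page) to illustrate how a consumer might read the entries that chunk_report writes.

```python
class Report:
    """Hypothetical stand-in for the report object passed to ScrapeFile."""

    def __init__(self):
        # filename -> {"bytes_so_far": int, "total_size": int}
        self.progressX = {}

    def percent(self, name):
        """Return download progress for name as a percentage."""
        entry = self.progressX[name]
        if entry["total_size"] <= 0:
            return 0.0  # total unknown, avoid dividing by zero
        return 100.0 * entry["bytes_so_far"] / entry["total_size"]

report = Report()
# Same dictionary shape that chunk_report stores.
report.progressX["archive.zip"] = {"bytes_so_far": 2560, "total_size": 10240}

print(report.percent("archive.zip"))  # 25.0
```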
 
 

def pydl.ScrapeFile.download ( self )

Definition at line 157 of file pydl.py.

References pydl.ScrapeFile.chunk_read(), pydl.ScrapeFile.chunk_report(), pydl.ScrapeFile.download_path, and pydl.ScrapeFile.ofilename.

     def download(self):
 
         # just in case we need to look like a browser
         hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
                'Accept-Encoding': 'none',
                'Accept-Language': 'en-US,en;q=0.8',
                'Connection': 'keep-alive'}
 
         self.ofilename = self.download_path.split('/')[-1]
         req = urllib.request.Request(self.download_path, headers=hdr)
 
         # close the connection once the body has been written out
         with closing(urllib.request.urlopen(req)) as response:
             self.chunk_read(response, self.ofilename, report_hook=self.chunk_report)
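
The request setup can be inspected without opening a connection. A minimal sketch, assuming a hypothetical download URL and a shortened header dictionary; it builds the same kind of urllib.request.Request object as download() and stops before urlopen().

```python
import urllib.request

# Hypothetical URL and a shortened header set, for illustration only.
download_path = "https://example.com/download/archive.zip"
hdr = {"User-Agent": "Mozilla/5.0"}

# Output filename is the last path component, as in download().
ofilename = download_path.split("/")[-1]

# Constructing a Request does not touch the network.
req = urllib.request.Request(download_path, headers=hdr)

print(ofilename)                     # archive.zip
print(req.full_url)                  # https://example.com/download/archive.zip
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Request stores header names capitalized ("User-agent"), so lookups with get_header() need that spelling.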
 


def pydl.ScrapeFile.follow	(	self,
		url
	)

Follow the url through redirects

Definition at line 119 of file pydl.py.

     def follow(self, url):
         """ Follow the url through redirects
         """
         while True:
             with closing(urllib.request.urlopen(url)) as stream:
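                 # 'refresh' rather than 'next', which would shadow the builtin
                 refresh = parse(stream).xpath("//meta[@http-equiv = 'refresh']/@content")
                 if refresh:
                     # content looks like "0; url=/path"; take the part after ';'
                     url = refresh[0].split(";")[1].strip().replace("url=", "")
                     # temp hack: return after one hop because the hop level is known
                     return url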
                 else:
                     return stream.geturl()
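
The content attribute of a meta refresh tag has the form "delay; url=target". The following sketch shows one way the target could be extracted more defensively than with split(";")[1]; refresh_target is a hypothetical helper, not part of pydl.py, and it assumes only that the delay and the URL are separated by the first semicolon.

```python
import re

# Matches "url=<target>" with optional whitespace, case-insensitively.
_URL_PART = re.compile(r"^\s*url\s*=\s*(.*)$", re.IGNORECASE)

def refresh_target(content):
    """Return the URL from a meta refresh content value, or None if absent."""
    # Split only once so semicolons inside the URL are preserved.
    parts = content.split(";", 1)
    if len(parts) < 2:
        return None  # delay only, no redirect target
    match = _URL_PART.match(parts[1])
    if not match:
        return None
    # Drop surrounding whitespace and optional quotes around the URL.
    return match.group(1).strip().strip("'\"")

print(refresh_target("0; url=/files/archive.zip"))      # /files/archive.zip
print(refresh_target("0;URL='http://example.com/a;b'"))  # http://example.com/a;b
print(refresh_target("5"))                               # None
```

The returned value may be a relative path, which matches how __init__ prefixes the scheme and host onto the result of follow().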
 

def pydl.ScrapeFile.test_thread ( self )

Definition at line 173 of file pydl.py.

References pydl.ScrapeFile.chunk_report().

     def test_thread(self):
         chunk_size = 8192
         # lower bound of 1: an increment of 0 would make the loop below spin forever
         inc = randint(1, 1024)
 
         total_size = 10240
         bytes_so_far = 0
 
         while bytes_so_far < total_size:
             time.sleep(.5)
             # clamp so the random increment never overshoots total_size
             bytes_so_far = min(bytes_so_far + inc, total_size)
 
             self.chunk_report(bytes_so_far, chunk_size, total_size)