PyDL  Version: 0.0.1
pydl.ScrapeFile Class Reference

Public Member Functions

def __init__ (self, input_url, download_path, file_id, report)
 
def follow (self, url)
 
def chunk_report (self, bytes_so_far, chunk_size, total_size)
 
def chunk_read (self, response, ofilename, chunk_size=8192, report_hook=None)
 
def download (self)
 
def test_thread (self)
 

Public Attributes

 filename
 
 download_path
 
 ofilename
 
 report
 

Detailed Description

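Class that resolves the download URL for a file, downloads the file in chunks, and reports progress through the supplied report object.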

Definition at line 98 of file pydl.py.

Constructor & Destructor Documentation

def pydl.ScrapeFile.__init__ (self, input_url, download_path, file_id, report)

Definition at line 102 of file pydl.py.

    def __init__(self, input_url, download_path, file_id, report):
        # TODO: handle other site layouts (like SourceForge)
        purl = urllib.parse.urlparse(input_url)
        self.filename = purl.path.split('/')[-1]

        # Build the download URL
        download_path_start = purl.scheme + "://" + purl.netloc + download_path + purl.path.split('/')[file_id]

        # TODO: add try block in caller
        # urljoin accepts both a relative refresh target and the absolute URL
        # that follow() returns when the page has no refresh tag
        self.download_path = urllib.parse.urljoin(download_path_start, self.follow(download_path_start))

        # Set the output filename
        self.ofilename = self.download_path.split('/')[-1]

        # Report class instance reference
        self.report = report

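As a rough illustration of how the constructor assembles the first URL, the sketch below repeats the same concatenation outside the class. The function name `build_start_url` and the example.com URL are hypothetical; only the `urlparse` call and the path indexing come from the listing.

```python
from urllib.parse import urlparse


def build_start_url(input_url, download_path, file_id):
    """Repeat the constructor's string concatenation (hypothetical helper)."""
    purl = urlparse(input_url)
    # The path starts with '/', so the first element of the split is ''
    segments = purl.path.split('/')
    return purl.scheme + "://" + purl.netloc + download_path + segments[file_id]


# Hypothetical input chosen only to make the indexing visible
input_url = "https://example.com/files/1234/tool.zip"
start_url = build_start_url(input_url, "/download/", 2)
filename = urlparse(input_url).path.split('/')[-1]

print(start_url)   # https://example.com/download/1234
print(filename)    # tool.zip
```

Because index 0 of the split path is an empty string, `file_id` effectively counts path segments starting from 1.
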

Member Function Documentation

def pydl.ScrapeFile.chunk_read (self, response, ofilename, chunk_size = 8192, report_hook = None)

Definition at line 138 of file pydl.py.

Referenced by pydl.ScrapeFile.download().

    def chunk_read(self, response, ofilename, chunk_size=8192, report_hook=None):
        # Content-Length may be absent (e.g. chunked transfer); fall back to 0
        total_size = int(response.headers.get("Content-Length", 0))
        bytes_so_far = 0

        with open(ofilename, 'wb') as ofd:
            while True:
                chunk = response.read(chunk_size)

                # An empty read means the response body is exhausted
                if not chunk:
                    break

                bytes_so_far += len(chunk)

                if report_hook:
                    report_hook(bytes_so_far, chunk_size, total_size)

                ofd.write(chunk)

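The read loop can be tried without a network connection. The following is a minimal sketch, assuming in-memory `io.BytesIO` objects stand in for the HTTP response and the output file; `read_in_chunks` is a hypothetical free function, not part of pydl.

```python
import io


def read_in_chunks(stream, sink, total_size, chunk_size=8192, report_hook=None):
    """Copy stream to sink chunk by chunk and return the bytes copied."""
    bytes_so_far = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # empty read: the stream is exhausted
            break
        sink.write(chunk)
        bytes_so_far += len(chunk)
        if report_hook:
            report_hook(bytes_so_far, chunk_size, total_size)
    return bytes_so_far


payload = b"x" * 20000        # hypothetical body, larger than two chunks
progress = []                 # records bytes_so_far at each hook call
sink = io.BytesIO()

copied = read_in_chunks(
    io.BytesIO(payload), sink, len(payload),
    report_hook=lambda done, size, total: progress.append(done),
)

print(copied)     # 20000
print(progress)   # [8192, 16384, 20000]
```

The last chunk is shorter than `chunk_size`, which is why the hook receives the running byte count rather than a chunk count.
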
def pydl.ScrapeFile.chunk_report (self, bytes_so_far, chunk_size, total_size)
Report the latest downloaded file chunk

Definition at line 132 of file pydl.py.

References pydl.ScrapeFile.ofilename.

Referenced by pydl.ScrapeFile.download(), and pydl.ScrapeFile.test_thread().

    def chunk_report(self, bytes_so_far, chunk_size, total_size):
        """ Report the latest downloaded file chunk
        """
        self.report.progressX[self.ofilename] = {"bytes_so_far": bytes_so_far, "total_size": total_size}

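The report object itself is not documented on this page. A minimal sketch, assuming it only needs a `progressX` dictionary keyed by output filename, might look like the following; the `Report` class and its `percent` method are hypothetical.

```python
class Report:
    """Hypothetical stand-in for the report object passed to ScrapeFile."""

    def __init__(self):
        # Maps output filename -> {"bytes_so_far": int, "total_size": int}
        self.progressX = {}

    def percent(self, name):
        """Return download progress for one file as a percentage."""
        entry = self.progressX[name]
        if not entry["total_size"]:
            return 0.0  # total unknown: avoid dividing by zero
        return 100.0 * entry["bytes_so_far"] / entry["total_size"]


report = Report()
# Same dictionary shape that chunk_report writes
report.progressX["tool.zip"] = {"bytes_so_far": 2560, "total_size": 10240}
print(report.percent("tool.zip"))   # 25.0
```
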
def pydl.ScrapeFile.download (self)

Definition at line 157 of file pydl.py.

References pydl.ScrapeFile.chunk_read(), pydl.ScrapeFile.chunk_report(), pydl.ScrapeFile.download_path, and pydl.ScrapeFile.ofilename.

    def download(self):

        # Just in case we need to look like a browser
        hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
               'Accept-Encoding': 'none',
               'Accept-Language': 'en-US,en;q=0.8',
               'Connection': 'keep-alive'}

        self.ofilename = self.download_path.split('/')[-1]
        req = urllib.request.Request(self.download_path, headers=hdr)

        # Close the connection once the body has been written to disk
        with urllib.request.urlopen(req) as response:
            self.chunk_read(response, self.ofilename, report_hook=self.chunk_report)

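The request setup can be inspected without opening a connection. The sketch below builds a `urllib.request.Request` with a shortened header set and reads the values back; the example.com URL and the trimmed User-Agent string are placeholders, not values used by pydl.

```python
import urllib.request

# Placeholder URL and a shortened header set, for illustration only
download_url = "https://example.com/download/tool.zip"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.8",
}

req = urllib.request.Request(download_url, headers=headers)

# Request stores header names via str.capitalize(), hence "User-agent"
user_agent = req.get_header("User-agent")
# Same rule the listing uses to derive the output filename
ofilename = req.full_url.split('/')[-1]

print(user_agent)   # Mozilla/5.0 (X11; Linux x86_64)
print(ofilename)    # tool.zip
```

Nothing is sent until `urllib.request.urlopen(req)` is called, so this part is safe to run offline.
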


def pydl.ScrapeFile.follow (self, url)
Follow the URL through an HTML meta-refresh redirect

Definition at line 119 of file pydl.py.

    def follow(self, url):
        """ Follow the URL through an HTML meta-refresh redirect
        """
        while True:
            with closing(urllib.request.urlopen(url)) as stream:
                # Named 'refresh' to avoid shadowing the built-in next()
                refresh = parse(stream).xpath("//meta[@http-equiv = 'refresh']/@content")
                if refresh:
                    url = refresh[0].split(";")[1].strip().replace("url=", "")
                    # Temp hack: return here because of the known hop level
                    return url
                else:
                    return stream.geturl()

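The string handling applied to the refresh tag's `content` attribute can be exercised in isolation. Below is a minimal sketch, assuming the value has the usual `delay; url=target` shape; `refresh_target` is a hypothetical helper and is somewhat stricter than the one-line expression in the listing.

```python
def refresh_target(content):
    """Return the URL part of a meta-refresh content value, or None."""
    # Split off the delay; the remainder may itself contain ';'
    _delay, sep, rest = content.partition(";")
    if not sep:
        return None
    key, eq, value = rest.strip().partition("=")
    if not eq or key.strip().lower() != "url":
        return None
    # Some pages quote the target
    return value.strip().strip("'\"")


print(refresh_target("0; url=/files/latest/tool.zip"))  # /files/latest/tool.zip
print(refresh_target("5"))                              # None
```

Returning `None` for a value without a target lets the caller fall back to the URL it already has instead of raising `IndexError`.
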
def pydl.ScrapeFile.test_thread (self)

Definition at line 173 of file pydl.py.

References pydl.ScrapeFile.chunk_report().

    def test_thread(self):
        chunk_size = 8192
        # Start at 1: an increment of 0 would never reach total_size
        inc = randint(1, 1024)

        total_size = 10240
        bytes_so_far = 0

        while bytes_so_far < total_size:
            time.sleep(.5)
            bytes_so_far += inc

            # Compensate for random number overage
            if bytes_so_far > total_size:
                bytes_so_far = total_size

            self.chunk_report(bytes_so_far, chunk_size, total_size)

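The simulated progress loop can also be written without the sleep, which makes it easy to check that it terminates. This is a sketch under stated assumptions: `simulate_progress` is a hypothetical generator, and the seeded `random.Random` instance is used only to make runs repeatable.

```python
from random import Random


def simulate_progress(total_size=10240, max_step=1024, seed=0):
    """Yield simulated byte counts that rise to exactly total_size."""
    rng = Random(seed)
    # Lower bound of 1 guarantees the loop makes progress
    step = rng.randint(1, max_step)
    bytes_so_far = 0
    while bytes_so_far < total_size:
        # Clamp the final step so the count never exceeds the total
        bytes_so_far = min(bytes_so_far + step, total_size)
        yield bytes_so_far


steps = list(simulate_progress())
print(steps[-1])   # 10240
```

Each yielded value could be passed to a progress callback in place of the direct `chunk_report` call.
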


Member Data Documentation

pydl.ScrapeFile.download_path

Definition at line 111 of file pydl.py.

Referenced by pydl.ScrapeFile.download().

pydl.ScrapeFile.filename

Definition at line 105 of file pydl.py.

pydl.ScrapeFile.ofilename

Definition at line 114 of file pydl.py.

Referenced by pydl.ScrapeFile.chunk_report(), and pydl.ScrapeFile.download().

pydl.ScrapeFile.report

Definition at line 117 of file pydl.py.


The documentation for this class was generated from the following file: