Return self.download(url,headers,proxy,num_retries- 1,data) # URL is not available in the cache pass #else: #if result is not None and self.num_retries >0 and 500 0 and 500<=code< 600: Sleep_secs = lay - (datetime.now() - last_accessed).secondsĬlass Downloader: def _init_ (self,delay= 5,user_agent= 'wswp',proxies=None,num_retries= 1,timeout= 60,opener=None,cache=None): If lay > 0 and last_accessed is not None: # timestamp of when a domain was last accessed """ def _init_ (self, delay): # amount of delay between downloads for each domain The following is the code to implement the above function:Īmong them, the content of the downloader: #-*- coding=utf-8 -*- import reĭEFAULT_TIMEOUT= 60 class Throttle: """Throttle downloading by sleeping between requests to same domain Each line of traversal in the CSV file, extract data from it There are two contents in the table, namely ranking and domain name –ĭraw the data to include the following four steps:Ģ. Alexa website list is provided in the form of an electronic table.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |