It would be great to have DuckDuckGo implemented within the icrawler framework. I created my own script, based on other code (attribution provided below). My code does not conform to the icrawler framework style: it does nothing more than search for images on DDG and return their URLs. I've looked through the icrawler framework, but I'm not proficient enough to implement it in that style. If you like, I could put together a pull request that provides a minimally viable DDG engine within the framework. Alternatively, I'm posting the code here in case someone else wants to implement it themselves:
#1 · Closed · bibbu994 opened this issue on Sep 6, 2020 · 0 comments
### image_search_ddg.py
### C. Bryan Daniels
### 9/1/2020
### Adapted from https://github.com/deepanprabhu/duckduckgo-images-api
###
import re
import sys
import time

import requests

# Browser-like headers sent with each request to the i.js image endpoint
headers = {'authority': 'duckduckgo.com',
           'accept': 'application/json, text/javascript, */*; q=0.01',
           'sec-fetch-dest': 'empty',
           'x-requested-with': 'XMLHttpRequest',
           'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
           'sec-fetch-site': 'same-origin',
           'sec-fetch-mode': 'cors',
           'referer': 'https://duckduckgo.com/',
           'accept-language': 'en-US,en;q=0.9'}

def image_search_ddg(keywords, max_n=100, max_retries=3):
    """Search DuckDuckGo for 'keywords' and return up to 'max_n' unique image URLs."""
    url = 'https://duckduckgo.com/'
    # Step 1: request the search page and extract the 'vqd' token the image endpoint needs
    res = requests.post(url, data={'q': keywords})
    searchObj = re.search(r'vqd=([\d-]+)\&', res.text)
    if not searchObj:
        print('Token parsing failed!', file=sys.stderr)
        return []
    # Step 2: page through the JSON results, following the 'next' link on each page
    params = (('l', 'us-en'), ('o', 'json'), ('q', keywords), ('vqd', searchObj.group(1)),
              ('f', ',,,'), ('p', '1'), ('v7exp', 'a'))
    requestUrl = url + 'i.js'
    urls, seen = [], set()
    failures = 0
    while len(urls) < max_n:
        try:
            res = requests.get(requestUrl, headers=headers, params=params)
            res.raise_for_status()
            data = res.json()
        except (requests.RequestException, ValueError):
            # A bare 'except: pass' would retry forever; back off and stop after max_retries
            failures += 1
            if failures > max_retries:
                break
            time.sleep(failures)
            continue
        failures = 0
        for obj in data.get('results', []):
            if obj['image'] not in seen:  # count only unique URLs toward max_n
                seen.add(obj['image'])
                urls.append(obj['image'])
                if len(urls) >= max_n:
                    break
        if 'next' not in data:
            break
        requestUrl = url + data['next']
    return urls

if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        max_n = int(sys.argv[2]) if len(sys.argv) == 3 else 100
        for image_url in image_search_ddg(sys.argv[1], max_n):
            print(image_url)
    else:
        print("usage: python image_search_ddg.py KEYWORDS [MAX_N]")
Originally posted by @prairie-guy in hellock/icrawler#82
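One possible step toward the framework style is to pull the script's two parsing steps (extracting the `vqd` token and reading each `i.js` JSON page) out of the network code so they can be tested offline. The sketch below is a hypothetical refactor, not part of the original script and not icrawler's API: `parse_vqd`, `parse_page` and `collect_urls` are illustrative names, and `fetch` is an assumed callable standing in for the `requests.get(...).json()` call. The `{'file_url': ...}` dicts mirror the convention icrawler's parsers appear to use when handing URLs to the downloader, which should be checked against the installed icrawler version. DuckDuckGo's token format and response shape are undocumented and may have changed since 2020.

```python
import re
from typing import Callable, Dict, Iterator, List, Optional

# Same pattern the script uses to pull the 'vqd' token out of the search page
VQD_RE = re.compile(r'vqd=([\d-]+)\&')

def parse_vqd(html: str) -> Optional[str]:
    """Return the vqd token embedded in the search page HTML, or None."""
    match = VQD_RE.search(html)
    return match.group(1) if match else None

def parse_page(data: dict) -> Iterator[Dict[str, str]]:
    """Yield one task dict per image result on a single i.js JSON page."""
    for obj in data.get('results', []):
        if 'image' in obj:
            yield {'file_url': obj['image']}

def collect_urls(fetch: Callable[[str], dict], first_path: str, max_n: int) -> List[str]:
    """Follow 'next' links via the injected fetch callable until max_n unique URLs are found."""
    seen, urls = set(), []
    path: Optional[str] = first_path
    while path is not None and len(urls) < max_n:
        data = fetch(path)
        for task in parse_page(data):
            url = task['file_url']
            if url not in seen:
                seen.add(url)
                urls.append(url)
                if len(urls) >= max_n:
                    break
        path = data.get('next')  # the last page has no 'next' key
    return urls

# Offline demo: a dict of canned pages stands in for the network
fake_pages = {
    'i.js': {'results': [{'image': 'a.jpg'}, {'image': 'b.jpg'}], 'next': 'i.js?s=2'},
    'i.js?s=2': {'results': [{'image': 'b.jpg'}, {'image': 'c.jpg'}]},
}
print(collect_urls(fake_pages.__getitem__, 'i.js', max_n=10))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

Injecting `fetch` keeps the pagination and de-duplication logic independent of `requests`, so the same functions could later sit behind whatever component does the HTTP work in a framework integration.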