bibbu994 · 2020-09-06T19:22:45Z

It would be great to have Duck Duck Go implemented within the icrawler framework. I created my own script, based upon other code (attribution provided below). My code does not conform to the icrawler framework style. It does nothing more than search from images on DDG and return URLs. I’ve looked through the icrawler framework and I’m not proficient to be able to implement it in this style. If you like, I could put something together as a pull request that would provide a minimally viable DDG engine within the framework. Alternatively, I post the code here is someone else wants to implement it themselves:


### image_search_ddg.py                                                                                                                               
### C. Bryan Daniels                                                                                                                                  
### 9/1/2020                                                                                                                                          
### Adopted from https://github.com/deepanprabhu/duckduckgo-images-api                                                                                
###                                                                                                                                                   

import requests, re, json, time, sys

headers = {'authority':'duckduckgo.com','accept':'application/json,text/javascript,*/*; q=0.01','sec-fetch-dest':'empty',
        'x-requested-with':'XMLHttpRequest',
        'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/80.0.3987.163 Safari/537.36',
        'sec-fetch-site':'same-origin','sec-fetch-mode':'cors','referer':'https://duckduckgo.com/','accept-language':'en-US,en;q=0.9'}

def image_search_ddg(keywords,max_n=100):
    """Search for 'keywords' with DuckDuckGo and return a unique urls of 'max_n' images"""
    url = 'https://duckduckgo.com/'
    params = {'q':keywords}
    res = requests.post(url,data=params)
    searchObj = re.search(r'vqd=([\d-]+)\&',res.text)
    if not searchObj: print('Token Parsing Failed !'); return
    params = (('l','us-en'),('o','json'),('q',keywords),('vqd',searchObj.group(1)),('f',',,,'),('p','1'),('v7exp','a'))
    requestUrl = url + 'i.js'
    urls = []
    while True:
        try:
            res = requests.get(requestUrl,headers=headers,params=params)
            data = json.loads(res.text)
            for obj in data['results']:
                urls.append(obj['image'])
                max_n = max_n - 1
                if max_n < 1: return print_uniq(urls)
            if 'next' not in data: return print_uniq(urls)
            requestUrl = url + data['next']
        except:
            pass

def print_uniq(urls):
    for url in set(urls):
        print(url)

if __name__ == "__main__": 
    if len(sys.argv)    == 2: image_search_ddg(sys.argv[1])
    elif len(sys.argv)  == 3: image_search_ddg(sys.argv[1],int(sys.argv[2]))
    else: print("usage: search(keywords,max_n=100)")

Originally posted by @prairie-guy in hellock/icrawler#82

The text was updated successfully, but these errors were encountered:

bibbu994 self-assigned this Sep 6, 2020

bibbu994 closed this as completed Oct 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bibbu994 commented Sep 6, 2020

Comments

bibbu994 commented Sep 6, 2020