KeywordQuery
hollingsworthd edited this page Dec 8, 2014 · 23 revisions
JSON Object
Type | Name | Description |
---|---|---|
String | site | Required -- URL of search page. |
String[] | instances | Required (unless specified in screenslicer.config) -- IP addresses of ScreenSlicer servers. |
String | keywords | Text to enter into search box. |
String[] | urlWhitelist | Substrings that URLs of results must contain. |
String[] | urlPatterns | Regular expressions that URLs of results must match. Follows the syntax of String-escaped Java Patterns. |
HtmlNode[] | urlMatchNodes | HtmlNodes that result URL nodes must match. |
HtmlNode | matchParent | Override to specify a particular type of node that's a parent of result nodes to extract. |
HtmlNode | matchResult | Override to specify a particular type of node that's a result node to extract. |
Boolean | requireResultAnchor | Whether results must have anchors. Defaults to true. |
HtmlNode | searchSubmitClick | Click to submit search. |
Boolean | proactiveUrlFiltering | Whether to apply the urlWhitelist and urlPatterns before analyzing the page to extract results. This generally produces a more accurate extraction. Defaults to false. |
UrlTransform[] | urlTransforms | Converts result URLs to another format, based on regular expressions. |
Integer | pages | Maximum number of search pages to extract, unless the results maximum has already been reached. Defaults to 1. Set to 0 or less to disable this maximum. |
Integer | results | Maximum number of results to extract, unless the pages maximum has already been reached. Defaults to 0 (no maximum). Set to 0 or less to disable this maximum. |
Boolean | fetch | Whether to get the content at each result URL. Defaults to false. |
Boolean | fetchCached | Whether to try a public web cache instead of visiting the result URL directly. Defaults to false. |
Boolean | fetchInNewWindow | Whether to fetch results in a new window. Defaults to true. |
Boolean | extract | Whether to extract results or just return the HTML. Defaults to true. |
HtmlNode[] | preAuthClicks | Clicks on HTML elements prior to authentication. |
HtmlNode[] | preSearchClicks | Clicks on HTML elements prior to searching. |
HtmlNode[] | postSearchClicks | Clicks on HTML elements after searching. |
HtmlNode[] | postFetchClicks | Clicks on HTML elements at a result page after fetching it. |
HtmlNode[] | proceedClicks | Clicks on HTML elements to get successive pages of results. |
Credentials | credentials | Credentials for authentication. |
Integer | timeout | Page load timeout, in seconds. Defaults to 25 seconds. |
Boolean | throttle | Whether to throttle requests. Defaults to true. |
Proxy | proxy | Proxy settings. Defaults to a local Tor SOCKS 5 connection on port 9050. |
Proxy[] | proxies | Proxy settings for multiple proxies. If more than one proxy has the same type, one of them is chosen pseudo-randomly. Defaults to null. |
Map<String, Object> | browserPrefs | Browser preferences. |
Map<String, String> | httpHeaders | HTTP headers added to each request. |
String | runGuid | GUID assigned to any ScreenSlicer request. Defaults to a new GUID (just a random string). |
Boolean | continueSession | Whether to retain the browser session from the prior request. Defaults to false, which is generally advisable. |
Boolean | collapse | Whether to return only a unique ID for each SearchResult, which can later be used to request the actual content. Useful for very large result sets. Defaults to false. |
KeywordQuery | keywordQuery | KeywordQuery to perform at each fetched result. |
FormQuery | formQuery | FormQuery to perform at each fetched result. |
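
The fields above can be combined into a request body roughly as follows. This is a minimal sketch, not an official example: the field names come from the table, but every value (the site URL, the instance IP, the keywords, the URL pattern) is a made-up placeholder, and how the body is submitted to a ScreenSlicer server is not covered on this page. It mainly illustrates that `urlPatterns` entries end up string-escaped in the JSON, so a regex backslash is written twice.

```python
import json

# All values below are hypothetical placeholders; only the field names
# come from the KeywordQuery table.
keyword_query = {
    "site": "https://example.com/search",  # required: URL of the search page
    "instances": ["127.0.0.1"],            # required unless set in screenslicer.config
    "keywords": "open source scraper",     # text entered into the search box
    "urlWhitelist": ["example.com/"],      # result URLs must contain this substring
    "urlPatterns": [r"/articles/\d+"],     # regex that result URLs must match
    "pages": 3,                            # stop after at most 3 search pages
    "results": 0,                          # 0 or less: no maximum on results
    "fetch": True,                         # also get the content at each result URL
    "timeout": 25,                         # page load timeout, in seconds
}

# Serializing escapes the backslash, so the pattern appears as
# "/articles/\\d+" in the JSON text.
request_body = json.dumps(keyword_query, indent=2)
print(request_body)
```

Fields left out of the object fall back to the defaults listed in the table, so a request only needs to name the settings it wants to change.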