
KeywordQuery

hollingsworthd edited this page Dec 8, 2014 · 23 revisions

JSON Object

| Type | Name | Description |
|---|---|---|
| `String` | `site` | Required. URL of the search page. |
| `String[]` | `instances` | Required unless specified in `screenslicer.config`. IP addresses of the ScreenSlicer servers. |
| `String` | `keywords` | Text to enter into the search box. |
| `String[]` | `urlWhitelist` | Substrings that result URLs must contain. |
| `String[]` | `urlPatterns` | Regular expressions that result URLs must match, written in Java `Pattern` syntax and string-escaped (for example, the pattern `\d+` is written `"\\d+"`). |
| `HtmlNode[]` | `urlMatchNodes` | HtmlNodes that result URL nodes must match. |
| `HtmlNode` | `matchParent` | Override that specifies the type of node that is a parent of the result nodes to extract. |
| `HtmlNode` | `matchResult` | Override that specifies the type of node that is a result node to extract. |
| `Boolean` | `requireResultAnchor` | Whether results must have anchors. Defaults to true. |
| `HtmlNode` | `searchSubmitClick` | Element to click to submit the search. |
| `Boolean` | `proactiveUrlFiltering` | Whether to apply `urlWhitelist` and `urlPatterns` before analyzing the page to extract results. This generally produces a more accurate extraction. Defaults to false. |
| `UrlTransform[]` | `urlTransforms` | Converts result URLs to another format using regular expressions. |
| `Integer` | `pages` | Maximum number of search pages to extract; extraction stops earlier if the `results` maximum is reached first. Defaults to 1. A value of 0 or less disables this maximum. |
| `Integer` | `results` | Maximum number of results to extract; extraction stops earlier if the `pages` maximum is reached first. Defaults to 0. A value of 0 or less disables this maximum. |
| `Boolean` | `fetch` | Whether to fetch the content at each result URL. Defaults to false. |
| `Boolean` | `fetchCached` | Whether to try a public web cache instead of visiting each result URL directly. Defaults to false. |
| `Boolean` | `fetchInNewWindow` | Whether to fetch results in a new window. Defaults to true. |
| `Boolean` | `extract` | Whether to extract results; if false, only the HTML is returned. Defaults to true. |
| `HtmlNode[]` | `preAuthClicks` | HTML elements to click before authentication. |
| `HtmlNode[]` | `preSearchClicks` | HTML elements to click before searching. |
| `HtmlNode[]` | `postSearchClicks` | HTML elements to click after searching. |
| `HtmlNode[]` | `postFetchClicks` | HTML elements to click on a result page after fetching it. |
| `HtmlNode[]` | `proceedClicks` | HTML elements to click to get successive pages of results. |
| `Credentials` | `credentials` | Credentials for authentication. |
| `Integer` | `timeout` | Page load timeout in seconds. Defaults to 25. |
| `Boolean` | `throttle` | Whether to throttle requests. Defaults to true. |
| `Proxy` | `proxy` | Proxy settings. Defaults to a local Tor SOCKS5 proxy on port 9050. |
| `Proxy[]` | `proxies` | Settings for multiple proxies. If more than one proxy has the same type, one of them is chosen pseudo-randomly. Defaults to null. |
| `Map<String, Object>` | `browserPrefs` | Browser preferences. |
| `Map<String, String>` | `httpHeaders` | HTTP headers added to each request. |
| `String` | `runGuid` | GUID assigned to the request; every ScreenSlicer request has one. Defaults to a newly generated GUID (a random string). |
| `Boolean` | `continueSession` | Whether to retain the browser session from the prior request. Defaults to false, which is generally advisable. |
| `Boolean` | `collapse` | Whether to return only a unique ID for each SearchResult, which can later be used to request the actual content. Useful for very large result sets. Defaults to false. |
| `KeywordQuery` | `keywordQuery` | KeywordQuery to perform on each fetched result. |
| `FormQuery` | `formQuery` | FormQuery to perform on each fetched result. |
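A KeywordQuery is a JSON object whose keys are the field names in the table above. The minimal Python sketch below serializes one with the standard `json` module. The helper `build_keyword_query` is hypothetical, the site URL and server IP are placeholders, and how the body is delivered to a ScreenSlicer server is not covered here, so the sketch stops at producing the JSON text.

```python
import json


def build_keyword_query(site, instances, keywords, **options):
    """Return a KeywordQuery as a JSON string (hypothetical helper).

    site is required. instances is required unless it is set in
    screenslicer.config, so it is left out when None or empty. Any other
    field from the table can be passed by name, e.g. pages=2 or fetch=True.
    """
    query = {"site": site, "keywords": keywords}
    if instances:
        query["instances"] = list(instances)
    query.update(options)
    return json.dumps(query)


# Placeholder values: the site URL and server IP are examples only.
body = build_keyword_query(
    "https://www.example.com/search",
    ["10.0.0.1"],
    "open source",
    pages=2,
    results=50,
    urlPatterns=["^https?://[^/]+/articles/\\d+"],
    fetch=True,
)
print(body)
```

Because `json.dumps` escapes backslashes, the regular expression `\d+` appears in the JSON as `\\d+`, which is the string-escaped form that `urlPatterns` expects.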
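The `pages` and `results` fields interact as two independent limits: extraction stops as soon as either enabled maximum is reached, and a value of 0 or less disables a maximum. The Python sketch below models that stopping rule only. `extract_with_limits` and the `fetch_page` callback are hypothetical names, not ScreenSlicer APIs, and trimming the output to exactly `results` items mid-page is an assumption of this sketch.

```python
from typing import Callable, List


def extract_with_limits(fetch_page: Callable[[int], List[str]],
                        pages: int = 1, results: int = 0) -> List[str]:
    """Collect results page by page, honoring the pages/results maxima.

    Defaults mirror the table (pages=1, results=0). A maximum of 0 or less
    is treated as disabled. fetch_page(n) is a hypothetical callback that
    returns the results on page n, or [] when there are no more pages.
    """
    collected: List[str] = []
    page_num = 0
    while pages <= 0 or page_num < pages:
        batch = fetch_page(page_num)
        if not batch:
            break  # no further pages to proceed to
        page_num += 1
        for item in batch:
            collected.append(item)
            # Assumption: stop mid-page once the results maximum is hit.
            if results > 0 and len(collected) >= results:
                return collected
    return collected


def fake_page(n: int) -> List[str]:
    """Three fake pages of ten results each, then no more pages."""
    return [f"p{n}-r{i}" for i in range(10)] if n < 3 else []


print(len(extract_with_limits(fake_page)))                        # 10 (pages=1 default)
print(len(extract_with_limits(fake_page, pages=0)))               # 30 (no maxima)
print(len(extract_with_limits(fake_page, pages=0, results=15)))   # 15
print(len(extract_with_limits(fake_page, pages=2, results=100)))  # 20
```

With both maxima disabled, extraction runs until the site offers no further pages, which is why the sketch relies on `fetch_page` eventually returning an empty list.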
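The table does not say whether a result URL must satisfy one or all of the entries in `urlWhitelist` and `urlPatterns`, nor whether a pattern must match the whole URL. The Python sketch below illustrates one plausible reading under stated assumptions: a URL passes `urlWhitelist` if it contains at least one listed substring, passes `urlPatterns` if at least one pattern matches somewhere in it (like Java's `Matcher.find()`), and must pass both filters. `filter_result_urls` is a hypothetical name, and Python's `re` syntax only approximates Java `Pattern` syntax.

```python
import re
from typing import Iterable, List, Optional


def filter_result_urls(urls: Iterable[str],
                       url_whitelist: Optional[List[str]] = None,
                       url_patterns: Optional[List[str]] = None) -> List[str]:
    """Keep URLs that pass both filters (hypothetical helper).

    Assumptions, not confirmed by this page: a URL passes the whitelist if
    it contains at least one listed substring, and passes the patterns if
    at least one pattern matches somewhere in it. A missing or empty list
    disables that filter.
    """
    compiled = [re.compile(p) for p in (url_patterns or [])]
    kept = []
    for url in urls:
        if url_whitelist and not any(s in url for s in url_whitelist):
            continue
        if compiled and not any(p.search(url) for p in compiled):
            continue
        kept.append(url)
    return kept


urls = [
    "https://example.com/articles/42",
    "https://example.com/about",
    "https://ads.example.net/articles/7",
]
print(filter_result_urls(urls, url_whitelist=["example.com"],
                         url_patterns=[r"/articles/\d+$"]))
# ['https://example.com/articles/42']
```

Setting `proactiveUrlFiltering` to true would, per the table, apply filters like these before the page is analyzed rather than after results are extracted.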