Paged JSON support, 'highlight' and 'snippet' URL parameters #61

Open
wants to merge 3 commits into
base: master

Yetangitu

This PR encompasses the following changes:

  • changed the JSON endpoint to allow paged JSON; omit the page parameter (or use page=0) to get the unpaged result
  • added snippets and highlight parameters to the search URL; use highlight=0 to disable search term highlighting and snippets=0 to disable snippet generation
  • added optional ujson module support

The first two changes can be used in combination with the Searx meta-search plugin to Recoll to enable Searx to search Recoll sites.
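As a rough illustration of how a client such as the Searx plugin might address the endpoint, the sketch below builds request URLs that use the page, highlight and snippets parameters described above. The base URL, the helper name build_json_url and the 'query' parameter name are assumptions made for this example and are not taken from the PR.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical location of a recoll-webui instance; adjust to your setup.
BASE_URL = "http://localhost:8080"


def build_json_url(terms, page=0, highlight=True, snippets=True):
    """Build a URL for the /json endpoint.

    page=0 (or leaving the parameter out) asks for the unpaged result;
    highlight=0 and snippets=0 switch off highlighting and snippets.
    The 'query' parameter name is an assumption.
    """
    params = {"query": terms}
    if page:
        params["page"] = page
    if not highlight:
        params["highlight"] = 0
    if not snippets:
        params["snippets"] = 0
    return BASE_URL + "/json?" + urlencode(params)


if __name__ == "__main__":
    # Second result page, with search term highlighting disabled.
    print(build_json_url("invoice 2018", page=2, highlight=False))
```

Leaving the optional parameters out of the URL keeps the defaults (unpaged, highlighted, with snippets), which matches the behaviour described in the list above.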

@@ -183,7 +192,7 @@ def endMatch(self):
return '</span>'
#}}}
#{{{ recoll_search
def recoll_search(q, dosnippets=True):
Author

Options are communicated through q, so the dosnippets parameter is no longer needed.

@@ -315,7 +327,6 @@ def edit(resnum):
@bottle.route('/json')
def get_json():
query = get_query()
query['page'] = 0
Author

This allows paged JSON to be generated; set page=0 or omit the page parameter to get an unpaged result.

qs = query_to_recoll_string(query)
bottle.response.headers['Content-Type'] = 'text/csv'
bottle.response.headers['Content-Disposition'] = 'attachment; filename=recoll-%s.csv' % normalise_filename(qs)
res, nres, timer = recoll_search(query, False)
Author

Options are communicated through query, so the extra parameter is no longer needed.

import csv
import StringIO
import ConfigParser
import string
import shlex
import urllib

Author

ujson is quite a bit faster, but there seem to be some concerns about its correctness.
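Optional support of this kind is commonly written as a guarded import. The following is a minimal sketch of that pattern, not the exact code from this PR; the to_json helper is a hypothetical name used only for illustration.

```python
try:
    # ujson is a third-party module that offers dumps/loads like the
    # standard json module, so it can stand in for the calls used here.
    import ujson as json
except ImportError:
    # Fall back to the standard library when ujson is not installed.
    import json


def to_json(results):
    """Serialize a result structure with whichever module was imported."""
    return json.dumps(results)


if __name__ == "__main__":
    print(to_json({"nres": 1, "results": [{"title": "example"}]}))
```

Because the fallback is silent, the same code runs with or without ujson installed; only the serialization speed differs.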

@@ -34,7 +41,7 @@
'context': 30,
'stem': 1,
'timefmt': '%c',
- 'dirdepth': 3,
+ 'dirdepth': 2,
Author

The web interface can get unbearably slow with a deep tree; setting the default depth to 2 solves this issue.

    * add support for extra databases through the RECOLL_EXTRA_DBS
      parameter; set this to the path of one or more (colon-separated)
      xapiandb directories to have the webui query those databases.
      The code assumes that a recoll.conf file exists one level below
      the indicated database directories, from which it pulls
      the indexed top directories. When the 'dir' parameter is used to
      limit a query to a given directory, only those databases
      (one or more) that match the indicated top directory are queried.

    * JSON endpoint now publishes number of results in 'nres' parameter

    * recoll_search now gracefully declines to produce nonexistent result
      pages instead of crashing and burning
Yetangitu commented Apr 8, 2018

The last commits add support for using more than one database by setting RECOLL_EXTRA_DBS to the path of one or more (colon-separated) xapiandb directories (analogous to the way the Qt GUI handles multiple databases). The code assumes that a recoll.conf file exists one level below the indicated database directories, from which it pulls the indexed top directories. When the dir parameter is used to limit a query to a given directory, only those databases with matching topdirs are queried.

The reason for implementing this scheme is to limit the total set size for directed queries as this increases performance (less data to search through) and flexibility (easier to add indexed directories).
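The scheme described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commits: the function names are hypothetical, "one level below" is read here as "in the directory that contains the xapiandb directory", and the top directories are passed in as a plain mapping instead of being read from recoll.conf.

```python
import os


def parse_extra_dbs(value):
    """Split a colon-separated RECOLL_EXTRA_DBS value into a list of paths."""
    return [path for path in value.split(":") if path]


def assumed_conf_path(dbdir):
    """Where recoll.conf is expected under the assumption described above:
    in the directory that contains the xapiandb directory."""
    return os.path.join(os.path.dirname(dbdir.rstrip("/")), "recoll.conf")


def select_dbs(query_dir, topdirs_by_db):
    """Return the databases that have a top directory containing query_dir.

    Paths are assumed to be absolute.
    """
    query_dir = os.path.normpath(query_dir)
    selected = []
    for db, topdirs in topdirs_by_db.items():
        for top in topdirs:
            top = os.path.normpath(top)
            # query_dir lies at or below top when top is their common path.
            if os.path.commonpath([query_dir, top]) == top:
                selected.append(db)
                break
    return selected


if __name__ == "__main__":
    dbs = parse_extra_dbs(os.environ.get("RECOLL_EXTRA_DBS", ""))
    print([assumed_conf_path(db) for db in dbs])
```

Restricting a directed query to the matching databases is what keeps the searched set small, which is the performance argument made above.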

ghost commented Oct 2, 2019

Hi,

I merged the "Paged JSON support, 'highlight' and 'snippet' URL parameters" into https://opensourceprojects.eu/p/recollwebui/code which is where I (recoll dev) maintain the webui until koniu reappears.

I can't merge the extradbs thing because the assumption about the config directory relative to the db one is wrong. If dbdir is set, the xapiandb can live anywhere.

For the main interface, extradbs contains index directories, which makes it clear that the configuration parameters are ignored.

In your case, I would use a different environment variable, and list configuration directories, from which you can retrieve both topdirs and dbdir (if the latter is not set, the xapian index indeed lives in the xapiandb subdir of the config).
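A minimal sketch of this suggestion, assuming a simplified reading of recoll.conf: only plain "name = value" lines before the first bracketed section are considered, and a relative dbdir is not handled. The function names are hypothetical, and the name of the environment variable that would list the configuration directories is left open, as it is in the comment.

```python
import os
import shlex


def read_conf(text):
    """Parse 'name = value' lines from the top-level part of a config text.

    Simplification for this sketch: parsing stops at the first
    '[section]' header, and comment lines start with '#'.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def index_dir(confdir, conf):
    """Use dbdir when it is set, else the xapiandb subdir of the config."""
    dbdir = conf.get("dbdir")
    if not dbdir:
        return os.path.join(confdir, "xapiandb")
    return os.path.expanduser(dbdir)


def topdirs(conf):
    """Split the topdirs value, honouring quotes around paths with spaces."""
    return shlex.split(conf.get("topdirs", ""))
```

With both values read from the same configuration directory, the web UI would no longer need to guess where the configuration lives relative to the index.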

ameisehaufen pushed a commit to ameisehaufen/kmrecollwebui that referenced this pull request Jan 4, 2021
Merge from koniu/recoll-webui#61

- Changed JSON endpoint to allow paged json, use without page parameter (or
  page=0) to get unpaged result
- Added snippets and highlight parameters to search url, use highlight=0 to
  disable search term highlighting, snippets=0 to disable snippet
  generation.
- Added optional ujson module support