Copyright © 2001-2016 Andrew Aksyonoff
Copyright © 2008-2016 Sphinx Technologies Inc, http://sphinxsearch.com
Sphinx initial author (and a benevolent dictator ever since):
Andrew Aksyonoff, http://shodan.ru
Past and present employees of Sphinx Technologies Inc who should be noted for their work on Sphinx (in alphabetical order):
People who contributed to Sphinx and their contributions (in no particular order):
Robert "coredev" Bengtsson (Sweden), initial version of PostgreSQL data source
Len Kranendonk, Perl API
Dmytro Shteflyuk, Ruby API
Extract everything from the distribution tarball (haven't you already?) and go to the sphinx subdirectory. (We are using version 2.3.2-beta here for the sake of example only; be sure to change this to the specific version you're using.)
There are two ways of getting Sphinx for Ubuntu: regular deb packages and the Launchpad PPA repository.
Deb packages:
Sphinx requires a few libraries to be installed on Debian/Ubuntu. Use apt-get to download and install these dependencies:
$ sudo apt-get install mysql-client unixodbc libpq5
Now you can install Sphinx:
$ sudo dpkg -i sphinxsearch_2.3.2-beta-1~trusty_amd64.deb
PPA repository (Ubuntu only).
Installing Sphinx from the Sphinxsearch PPA repository is much easier: you will get all the dependencies, and you can also update Sphinx to the latest version with the same command.
First, add Sphinxsearch repository and update the list of packages:
$ sudo add-apt-repository ppa:builds/sphinxsearch-rel23
$ sudo apt-get update
Install/update sphinxsearch package:
$ sudo apt-get install sphinxsearch
Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.
Extract everything from the .zip file you have downloaded - sphinx-2.3.2-beta-win32.zip, or sphinx-2.3.2-beta-win32-pgsql.zip if you need PostgreSQL support as well. (We are using version 2.3.2-beta here for the sake of example only; be sure to change this to the specific version you're using.)
You can use Windows Explorer in Windows XP and up to extract the files,
or a freeware package like 7Zip to open the archive.
For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx.
is obsolete and will be removed in the near future.
docinfo=inline is deprecated. You can now use ondisk_attrs or ondisk_attrs_default instead.
workers=threads is now the default for all operating systems; the other modes will be removed in the future.
mem_limit=128M is a new default.
Removed the CLI search (which confused people more than it helped) and sql_query_info.
Deprecated SetMatchMode() API call.
Changed default thread_stack value to 1M. Deprecated SetOverride() API call.
(excluding title and content, which are full-text fields) as attributes, indexing them, and then using API calls to set up filtering, sorting, and grouping. Here is an example.
Example sphinx.conf part:
...
sql_query = SELECT id, title, content, \
    author_id, forum_id, post_date FROM my_forum_posts
sql_attr_uint = forum_id
sql_attr_timestamp = post_date
...
Example application code (in PHP):
// only search posts by author whose ID is 123
$cl->SetFilter ( "author_id", array ( 123 ) );
Obviously, that's not much of a difference for a 2,000-row table, but when it comes to indexing a 10-million-row MyISAM table, ranged queries might be of some help.
sql_query_post vs. sql_query_post_index
The difference between the post-query and the post-index query is that the post-query is run immediately when Sphinx has received all the documents, but further indexing may still fail for some other reason. On the contrary, by the time the post-index query gets executed, it is guaranteed that the indexing was successful.
2.0.3-release, binlog_max_log_size defaults to 0.) There are 3 different binlog flushing strategies, controlled by the binlog_flush directive, which takes the values of 0, 1, or 2. 0 means to flush the log to OS and sync it to disk every second; 1 means flush and sync every transaction; and 2 (the default mode) means flush every transaction, but sync every second.
is not very good for disk use and crash recovery time. Starting with 2.0.1-beta you can configure searchd to perform a periodic RAM chunk flush to fix that problem using the rt_flush_period directive. With periodic flushes enabled, searchd will keep a separate thread, checking whether RT index RAM chunks need to be written back to disk. Once that happens, the respective RAM chunks are saved back to disk.
// SphinxQL
mysql_query ( "SELECT ... OPTION ranker=sph04" );
Legacy matching modes automatically select a ranker as follows:
SPH_MATCH_ALL uses SPH_RANK_PROXIMITY ranker;
SPH_MATCH_ANY uses SPH_RANK_MATCHANY ranker;
Returns the current timestamp as an INTEGER. Introduced in version 0.9.9-rc1.
Returns the integer year (in 1969..2038 range) from a timestamp argument, according to the current timezone. Introduced in version 2.0.1-beta.
Returns the integer year and month code (in 196912..203801 range) from a timestamp argument, according to the current timezone. Introduced in version 2.0.1-beta.
Returns the integer year, month, and date code (in 19691231..20380119 range) from a timestamp argument, according to the current timezone. Introduced in version 2.0.1-beta.
Returns the integer second (in 0..59 range) from a timestamp argument, according to the current timezone. Introduced in version 2.3.2-beta.
Returns the integer minute (in 0..59 range) from a timestamp argument, according to the current timezone. Introduced in version 2.3.2-beta.
Returns the integer hour (in 0..23 range) from a timestamp argument, according to the current timezone. Introduced in version 2.3.2-beta.
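For instance, several of these functions can be combined in a single query. A minimal sketch (the posts index name and its post_date timestamp attribute are illustrative assumptions):
SELECT id, YEAR(post_date) AS y, YEARMONTHDAY(post_date) AS ymd,
    HOUR(post_date) AS h, MINUTE(post_date) AS m, SECOND(post_date) AS s
FROM posts LIMIT 10;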
SELECT REMAP(userid, karmapoints, (1, 67), (999, 0)) FROM users;
SELECT REMAP(id%10, salary, (0), (0.0)) FROM employees;
The RAND(seed) function was added in 2.3.2-beta. It returns a random float between 0 and 1. Optionally, an integer seed value can be specified.
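A minimal sketch of both forms (the items index name is an illustrative assumption); with a fixed seed, the generated values are reproducible:
SELECT id, RAND() AS r FROM items ORDER BY r ASC LIMIT 10;
SELECT id, RAND(1234) AS r FROM items ORDER BY r ASC LIMIT 10;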
SPH_SORT_RELEVANCE is equivalent to sorting by "@weight DESC, @id ASC" in extended sorting mode, SPH_SORT_ATTR_ASC is equivalent to "attribute ASC, @weight DESC, @id ASC", and SPH_SORT_ATTR_DESC to "attribute DESC, @weight DESC, @id ASC" respectively.
In SPH_SORT_TIME_SEGMENTS mode, attribute values are split into so-called time segments, and then sorted by time segment first, and by relevance second.
modifying RT indexes with INSERT, REPLACE, and DELETE, and much more. Full SphinxQL reference is available in Chapter 8, SphinxQL reference.
Starting with 2.3.2-beta, the Sphinx search daemon supports the HTTP protocol and can be accessed with regular HTTP clients. Supported endpoints:
/ - default response, returns a simple HTML page
/search - allows a simple full-text search; parameters can be: index (an index or list of indexes), match (equivalent of MATCH()), select (as the SELECT clause), group (grouping attribute), order (SQL-like sorting), limit (equivalent of LIMIT 0,N)
curl -X POST 'http://sphinxsearch:9308/search/' -d 'index=forum&match=@subject php sphinx&select=id,subject,author_id&limit=5'
/sql - allows running a SphinxQL SELECT, set as the query parameter
curl -X POST 'http://sphinxsearch:9308/sql/' -d 'query=select id,subject,author_id from forum where match('@subject php sphinx') group by author_id order by id desc limit 0,5'
The result for the /sql/ and /search/ endpoints is an array of attrs, matches, and meta, the same as for SphinxAPI, encoded as a JSON object.
Multi-queries, or query batches, let you send multiple queries to Sphinx in one go (more formally, one network request).
AddQuery() and RunQueries(). You can also run multiple queries with SphinxQL, see Section 8.45, “Multi-statement queries”. (In fact, a regular Query() call is internally implemented as a single AddQuery() call immediately followed by a RunQueries() call.) AddQuery() captures the current state of all the query settings set by previous API calls, and memorizes the query. RunQueries() actually sends all the memorized queries, and returns multiple result sets. There are no restrictions on the queries at all, except just a sanity check on the number of queries in a single batch (see Section 12.4.23, “max_batch_queries”).
Why use multi-queries? Generally, it all boils down to performance.
First, by sending requests to searchd
in a batch instead of one by one, you always save a bit by doing fewer network round-trips.
There's a common two-word part ("barack obama") that can be computed only once, then cached and shared across the queries. And common subtree optimization does just that. Per-query cache size is strictly controlled by the subtree_docs_cache and subtree_hits_cache directives (so that caching all sixteen gazillions of documents that match "i am" does not exhaust the RAM and instantly kill your server).
even more improvements, and that's from production instances, not just synthetic tests.
Introduced to Sphinx in version 2.0.1-beta to supplement string sorting, collations essentially affect the string attribute comparisons. They specify both the character set encoding and the strategy that Sphinx uses to compare strings when doing ORDER BY or GROUP BY with a string attribute involved.
case-insensitive (_ci) and case-sensitive (_cs) comparisons respectively.
By default they will use C locale, effectively resorting to bytewise
comparisons. To change that, you need to specify a different available
locale using the collation_libc_locale
directive. The list of locales available on your system can usually be obtained
with the locale
command:
SET collation_connection
statement. All subsequent SphinxQL queries will use this collation. SphinxAPI and SphinxSE queries will use the server default collation, as specified in the collation_server configuration directive. Sphinx currently defaults to libc_ci collation. Collations should affect all string attribute comparisons, including those within ORDER BY and GROUP BY, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use charset_table.
The query cache, added in 2.3.1-beta, stores a compressed result set in memory, and then reuses it for subsequent queries where possible. You can configure it using the following directives:
qcache_max_bytes, a limit on the RAM use for cached queries storage. Defaults to 16 MB. Setting qcache_max_bytes to 0 completely disables the query cache.
qcache_thresh_msec, the minimum wall query time to cache. Queries that completed faster than this will not be cached. Defaults to 3000 msec, or 3 seconds.
qcache_ttl_sec, cached entry TTL, or time to live. Queries will stay cached for this long. Defaults to 60 seconds, or 1 minute.
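The same limits can also be adjusted at runtime via the SET GLOBAL statements described in the SphinxQL reference; a minimal sketch (the specific values are arbitrary):
SET GLOBAL qcache_max_bytes = 33554432;
SET GLOBAL qcache_thresh_msec = 500;
SET GLOBAL qcache_ttl_sec = 300;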
for obvious security reasons: securing a single folder is easy; letting
anyone install arbitrary code into searchd
is a risk.
You can load and unload them dynamically into searchd
with CREATE FUNCTION and DROP FUNCTION SphinxQL statements
respectively. Also, you can seamlessly reload UDFs (and other plugins) with the RELOAD PLUGINS statement.
Sphinx keeps track of the currently loaded functions, that is, every time you create or drop a UDF, searchd writes its state to the sphinxql_state file as a plain good old SQL script.
Once you successfully load a UDF, you can use it in your SELECT or other statements just as well as any of the built-in functions.
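For instance, a minimal sketch reusing the avgmva UDF and the udfexample library mentioned elsewhere in this manual (the tags MVA attribute is an illustrative assumption):
mysql> CREATE FUNCTION avgmva RETURNS FLOAT SONAME 'udfexample.so';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT id, avgmva(tags) AS q FROM test1;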
create a dynamic library (either .so or .dll), most likely in C or C++;
load that plugin into searchd using CREATE PLUGIN;
invoke it using the plugin specific calls (typically using this or that OPTION).
to unload or reload a plugin, use DROP PLUGIN and RELOAD PLUGINS respectively.
Note that while UDFs are first-class plugins, they are nevertheless installed using a separate CREATE FUNCTION statement. It lets you specify the return type neatly, so there was especially little reason to ruin backwards compatibility and change the syntax.
Dynamic plugins are supported in workers=threads mode only. Multiple plugins (and/or UDFs) may reside in a single library file. So you might choose to either put all your project-specific plugins in a single common uber-library, or you might choose to have a separate library for every plugin.
to rename the indexes (renaming the existing ones to include .old
and renaming the .new
to replace them), and then start serving
from the newer files. Depending on the setting of
seamless_rotate, there may be a slight delay
in being able to search the newer indexes. Example usage:
$ indexer --rotate --all
the "new" index; if not found, attributes from the new index are used. If the user has updated attributes in the index, but not in the actual source used for the index, all updates will be lost when reindexing; using --keep-attrs -enables saving the updated attribute values from the previous index +enables saving the updated attribute values from the previous index. +Starting with 2.3.2-beta it is possible to specify a path for index files to used instead of reference path from config: +
indexer myindex --keep-attrs=/path/to/index/files
--dump-rows <FILE>
dumps rows fetched
by SQL source(s) into the specified file, in a MySQL compatible syntax.
Initiates a clean shutdown. New queries will not be handled; but queries that are already started will not be forcibly interrupted.
Initiates index rotation. Depending on the value of the seamless_rotate setting, new queries might be shortly stalled; clients will receive temporary errors.
Forces reopen of searchd log and query log files, letting you implement log file rotation.
SphinxQL is our SQL dialect that exposes all of the search daemon functionality using a standard SQL syntax with a few Sphinx-specific extensions.
Starting with 2.0.1-beta, GROUP BY on a string attribute is supported, with respect for current collation (see Section 5.13, “Collations”).
Starting with 2.2.1-beta, you can query Sphinx to return (no more than) N top matches for each group according to WITHIN GROUP ORDER BY.
SELECT id FROM products GROUP 3 BY category
ORDER BY timeseg DESC, w DESC
Starting with 2.0.1-beta, WITHIN GROUP ORDER BY on a string attribute is supported, with respect for current collation (see Section 5.13, “Collations”).
HAVING clause. This is used to filter on GROUP BY values. It was added in 2.2.1-beta.
evaluated just for a subset of values.
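A minimal sketch (the index and attribute names are illustrative assumptions):
SELECT gid, COUNT(*) AS cnt FROM myindex GROUP BY gid HAVING cnt > 100;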
Starting with 2.0.1-beta, ORDER BY on a string attribute is supported, with respect for current collation (see Section 5.13, “Collations”).
Starting with 2.0.2-beta, ORDER BY RAND() syntax is supported. Note that this syntax is actually going to randomize the weight values and then order the matches by those randomized weights.
can also noticeably impact performance.
'max_query_time' - integer (max search time threshold, msec)
'max_predicted_time' - integer (max predicted search time, see Section 12.4.45, “predicted_time_costs”)
'ranker' - any of 'proximity_bm25', 'bm25', 'none', 'wordcount', 'proximity', 'matchany', 'fieldmask', 'sph04', 'expr', or 'export' (refer to Section 5.4, “Search results ranking” for more details on each ranker)
for an ORDER BY RAND() query, for example: ... OPTION rand_seed=1234.
By default, a new and different seed value is autogenerated for every query.
'low_priority' - runs the query with idle priority, introduced in 2.3.2-beta.
Example:
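A minimal sketch of combining several options in one OPTION clause (the index and field names are illustrative assumptions):
SELECT * FROM test WHERE MATCH('@title hello @body world')
OPTION ranker=bm25, max_matches=3000,
    field_weights=(title=10, body=3);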
FACET clause. This Sphinx specific extension enables faceted search with subtree optimization. It is capable of returning multiple result sets with a single SQL statement, without the need for complicated multi-queries. FACET clauses should be written at the very end of SELECT statements with spaces between them.
FACET {expr_list} [BY {expr_list}] [ORDER BY {expr | FACET()} {ASC | DESC}] [LIMIT [offset,] count]
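A minimal sketch (the facetdemo index and its brand_id and price attributes are illustrative assumptions):
SELECT * FROM facetdemo WHERE MATCH('phone')
FACET brand_id ORDER BY COUNT(*) DESC
FACET price ORDER BY FACET() ASC LIMIT 5;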
SHOW META shows additional meta-information about the latest query such as query time and keyword statistics. IO and CPU counters will only be available if searchd was started with --iostats and --cpustats switches respectively. Additional predicted_time, dist_predicted_time, [{local|dist}]_fetched_[{docs|hits|skips}] counters will only be available if searchd was configured with predicted time costs and the query had predicted_time in its OPTION clause.
mysql> SELECT * FROM test1 WHERE MATCH('test|one|two');
COLLATION_CONNECTION = collation_name
Selects the collation to be used for ORDER BY or GROUP BY on string values in the subsequent queries. Refer to Section 5.13, “Collations” for a list of known collation names. Introduced in version 2.0.1-beta.
CHARACTER_SET_RESULTS = charset_name
PROFILING = {0 | 1}
Enables query profiling in the current session. Defaults to 0. See also Section 8.33, “SHOW PROFILE syntax”. Introduced in version 2.1.1-beta.
Introduced in version 2.0.1-beta.
QCACHE_MAX_BYTES = <value>
Changes the query cache RAM use limit to a given value. Added in 2.3.1-beta.
QCACHE_THRESH_MSEC = <value>
Changes the query cache minimum wall time threshold to a given value. Added in 2.3.1-beta.
QCACHE_TTL_SEC = <value>
Changes the query cache TTL for a cached result to a given value. Added in 2.3.1-beta.
CALL KEYWORDS(text, index [, options])
CALL KEYWORDS statement, introduced in version 1.10-beta, splits text into particular keywords. It returns tokenized and normalized forms of the keywords, and, optionally, keyword statistics.
text is the text to break down to keywords. index is the name of the index from which to take the text processing settings. options, prior to 2.3.2-beta, is an optional boolean parameter that specifies whether to return document and hit occurrence statistics. Starting with 2.3.2-beta, options can also accept parameters for configuring folding, depending on tokenization settings:
stats - show statistics of keywords, default is 0
fold_wildcards - fold wildcards, default is 1
fold_lemmas - fold morphological lemmas, default is 0
fold_blended - fold blended words, default is 0
expansion_limit - override expansion_limit defined in configuration, default is 0 (use value from configuration)
call keywords(
    'que*',
    'myindex',
    1 as fold_wildcards,
    1 as fold_lemmas,
    1 as fold_blended,
    1 as expansion_limit,
    1 as stats);
Default values to match previous CALL KEYWORDS output are:
call keywords(
    'que*',
    'myindex',
    1 as fold_wildcards,
    0 as fold_lemmas,
    0 as fold_blended,
    0 as expansion_limit,
    0 as stats);
CALL QSUGGEST(word, index [, options])
CALL QSUGGEST statement, introduced in version 2.3.2-beta, enumerates for a given word all suggestions from the dictionary. This statement works only on indexes with infixing enabled and dict=keywords. It returns the suggested keywords, the Levenshtein distance between the suggested and original keyword, and the document statistics of the suggested keyword. Several options are supported for customization:
limit - return N top matches, default is 5
max_edits - keep only dictionary words whose Levenshtein distance is less than or equal to this value, default is 4
result_stats - provide Levenshtein distance and document count of the found words, default is 1 (enabled)
delta_len - keep only dictionary words whose length difference is less than this value, default is 3
max_matches - number of matches to keep, default is 25
reject - defaults to 4; rejected words are matches that are not better than those already in the match queue. They are put in a rejected queue that gets reset in case one actually can go in the match queue. This parameter defines the size of the rejected queue (as reject*max(max_matches,limit)). If the rejected queue is filled, the engine stops looking for potential matches.
result_line - alternate mode to display the data by returning all suggests, distances and docs each per one row, default is 0
mysql> CALL QSUGGEST('automaticlly', 'forum', 5 as limit, 4 as max_edits, 1 as result_stats, 3 as delta_len, 0 as result_line, 25 as max_matches, 4 as reject);
+---------------+----------+------+
| suggest       | distance | docs |
+---------------+----------+------+
| automatically | 1        | 282  |
| automaticly   | 1        | 6    |
| automaticaly  | 1        | 3    |
| automagically | 2        | 14   |
| automtically  | 2        | 1    |
+---------------+----------+------+
5 rows in set (0.00 sec)
SHOW TABLES [ LIKE pattern ]
+-------+-------------+
1 row in set (0.00 sec)
{DESC | DESCRIBE} index [ LIKE pattern ]
Starting from version 2.1.1-beta, an optional LIKE clause is supported. Refer to Section 8.3, “SHOW META syntax” for its syntax details.
CREATE FUNCTION udf_name RETURNS {INT | INTEGER | BIGINT | FLOAT | STRING}
| 4 | 1 | 7,40 | 23.500000 |
+------+--------+---------+-----------+
DROP FUNCTION udf_name
mysql> DROP FUNCTION avgmva;
Query OK, 0 rows affected (0.00 sec)
SHOW [{GLOBAL | SESSION}] VARIABLES [WHERE variable_name='xxx']
SHOW VARIABLES statement was added in version 2.0.1-beta to help certain connectors.
SHOW COLLATION
mysql> SHOW COLLATION;
Query OK, 0 rows affected (0.00 sec)
SHOW CHARACTER SET
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
UPDATE index SET col1 = newval1 [, ...] WHERE where_condition [OPTION opt_name = opt_value [, ...]]
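For instance, a minimal sketch (the index, column names, and id value are illustrative assumptions):
UPDATE myindex SET enabled=0, karma=10 WHERE id=123;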
ALTER TABLE index {ADD|DROP} COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING}]
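For instance, a minimal sketch (the rt index and the tags column are illustrative assumptions):
mysql> ALTER TABLE rt ADD COLUMN tags MULTI;
mysql> ALTER TABLE rt DROP COLUMN tags;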
4 rows in set (0.00 sec)
ATTACH INDEX diskindex TO RTINDEX rtindex
making ATTACH INDEX a one-time conversion operation only. Those restrictions may be lifted in future releases, as we add the needed functionality to the RT indexes. The complete list is as follows.
Target RT index needs to be empty. (See Section 8.31, “TRUNCATE RTINDEX syntax”)
Source disk index needs to have index_sp=0, boundary_step=0, stopword_step=1.
Source disk index needs to have an empty index_zones setting.
mysql> SELECT * FROM disk WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search
FLUSH RTINDEX rtindex
write would need to be replayed. Those writes normally happen either on a clean shutdown, or periodically with a (big enough!) interval between writes specified in the rt_flush_period directive. So such a backup made at an arbitrary point in time just might end up with way too much binary log data to replay.
mysql> FLUSH RTINDEX rt;
Query OK, 0 rows affected (0.05 sec)
FLUSH RAMCHUNK rtindex
Most likely, you want to use FLUSH RTINDEX instead. We suggest that you abstain from using just this statement unless you're absolutely sure what you're doing. The right way is to issue FLUSH RAMCHUNK followed by an OPTIMIZE command. Such a combination keeps RT index fragmentation to a minimum.
mysql> FLUSH RAMCHUNK rt;
Query OK, 0 rows affected (0.05 sec)
FLUSH HOSTNAMES
Added in 2.3.2-beta. Renews the IPs associated with agent host names. To always query the DNS for the host name IP, see the hostname_lookup directive.
mysql> FLUSH HOSTNAMES;
Query OK, 5 rows affected (0.01 sec)
TRUNCATE RTINDEX rtindex
You may want to use this if you are using RT indices as "delta index" files; when you build the main index, you need to wipe the delta index, and thus TRUNCATE RTINDEX. You also need to use this command before attaching an index; see Section 8.26, “ATTACH INDEX syntax”.
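A minimal sketch (assuming an RT index named rt):
mysql> TRUNCATE RTINDEX rt;
Query OK, 0 rows affected (0.05 sec)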
SHOW AGENT ['agent'|'index'|index] STATUS [ LIKE pattern ]
agents or distributed index. It includes values like the age of the last request, the last answer, the number of different kinds of errors and successes, etc. The statistics are shown for every agent for the last 1, 5 and 15 intervals, each of ha_period_karma seconds. The command exists only in SphinxQL.
mysql> SHOW AGENT STATUS;
+--------------------------------------+--------------------------------+
13 rows in set (0.00 sec)
SHOW INDEX index_name STATUS
+--------------------+-------------+
8 rows in set (0.00 sec)
SHOW INDEX index_name[.N | CHUNK N] SETTINGS
a particular chunk number for the RT indexes.
OPTIMIZE INDEX index_name
to the SHOW INDEX STATUS and SHOW STATUS statements respectively). The optimization thread can be IO-throttled; you can control the maximum number of IOs per second and the maximum IO size with the rt_merge_iops and rt_merge_maxiosize directives respectively. The optimization jobs queue is lost on daemon crash.
mysql> OPTIMIZE INDEX rt;
Query OK, 0 rows affected (0.00 sec)
SHOW DATABASES
Added in 2.2.1-beta. This is a dummy statement to support MySQL Workbench and other clients that require it. Currently, it does absolutely nothing.
CREATE PLUGIN plugin_name TYPE 'plugin_type' SONAME 'plugin_library'
mysql> CREATE PLUGIN myranker TYPE 'ranker' SONAME 'myplugins.so';
Query OK, 0 rows affected (0.00 sec)
DROP PLUGIN plugin_name TYPE 'plugin_type'
mysql> DROP PLUGIN myranker TYPE 'ranker';
Query OK, 0 rows affected (0.00 sec)
SHOW PLUGINS
+------+----------+----------------+-------+-------+
1 row in set (0.00 sec)
RELOAD PLUGINS FROM SONAME 'plugin_library'
mysql> RELOAD PLUGINS FROM SONAME 'udfexample.dll';
Query OK, 0 rows affected (0.00 sec)
SHOW THREADS [ OPTION columns=width ]
+------+----------+-------+----------+----------------------------------------------------+
3 rows in set (0.00 sec)
RELOAD INDEX idx [ FROM '/path/to/index_files' ]
mysql> RELOAD INDEX plain_index;
mysql> RELOAD INDEX plain_index FROM '/home/mighty/new_index_files';
Starting with version 2.0.1-beta, SphinxQL supports multi-statement queries, or batches. Possible inter-statement optimizations described in Section 5.12, “Multi-queries” do apply to SphinxQL just as well. The batched queries should be separated by a semicolon. Your MySQL client library needs to support the MySQL multi-query mechanism and multiple result sets. For instance, the mysqli interface in PHP and DBI/DBD libraries in Perl are known to work.
returned should match those that would be returned if the batched queries were sent one by one.
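A minimal batch sketch (the test1 index is an illustrative assumption); both statements travel in one network round-trip and produce two result sets:
SELECT * FROM test1 WHERE MATCH('one'); SHOW META;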
Since version 2.0.1-beta, SphinxQL supports C-style comment syntax. Everything from an opening /* sequence to a closing */ sequence is ignored. Such comments can span multiple lines, can not nest, and should not get logged. MySQL specific /*! ... */ comments are also currently ignored.
SELECT /*! SQL_CALC_FOUND_ROWS */ col1 FROM table1 WHERE ...
A complete alphabetical list of keywords that are currently reserved in SphinxQL syntax (and therefore can not be used as identifiers).
MOD, NOT, NULL, OR, ORDER, SELECT, TRUE
This section only applies to existing applications that use SphinxQL versions prior to 2.0.1-beta.
(Section 9.4.5, “SetGeoAnchor”) are now internally implemented using this computed expressions mechanism, using magic names '@expr' and '@geodist' respectively.
Example:
$cl->SetSelect ( "*, @weight+(user_karma+ln(pageviews))*0.1 AS myweight" );
$cl->SetSelect ( "exp_years, salary_gbp*{$gbp_usd_rate} AS salary_usd, IF(age>40,1,0) AS over40" );
Added in version 1.10-beta. Whether to handle $docs as data to extract snippets from (default behavior), or to treat it as file names, and load data from specified files on the server side. Starting with version 2.0.1-beta, up to dist_threads worker threads per request will be created to parallelize the work when this flag is enabled. Boolean, default is false. Starting with version 2.0.2-beta, building of the snippets can be parallelized between remote agents. Just set the 'dist_threads' param in the config to a value greater than 1, and then invoke the snippets generation over the distributed index, which contains only one(!) local agent and several remotes. Starting with version 2.1.1-beta, the snippets_file_prefix option is also in the game, and the final filename is calculated by concatenation of the prefix with the given name. In other words, when snippets_file_prefix is '/var/data' and the filename is 'text.txt', Sphinx will try to generate the snippets from the file '/var/datatext.txt', which is exactly '/var/data' + 'text.txt'.
They are very fast because they're working fully in RAM, but they can also
be made persistent: updates are saved on disk on clean searchd
shutdown initiated by SIGTERM signal. With additional restrictions, updates
are also possible on MVA attributes; refer to the mva_updates_pool
directive for details.
because to fix it, we need to be able either to reproduce and fix the bug, or to deduce what's causing it from the information that you provide. So here are some instructions on how to do that.
Nothing special to say here. Here is the link: http://sphinxsearch.com/bugs. Create a new ticket and describe your bug in detail so both you and the developers can save time.
In case of crashes we can sometimes get enough info to fix it from the backtrace.
Sphinx tries to write the crash backtrace to its log file. It may look like this:
that the binary is not stripped. Our official binary packages should be fine. (That, or we have the symbols stored.) However, if you manually build Sphinx from the source tarball, do not run the strip utility on that binary, and/or do not let your build/packaging system do that!
Uploading your data
To fix your bug, developers often need to reproduce it on their machines.
To fix your bug developers often need to reproduce it on their machines. To do this they need your sphinx.conf, index files, binlog (if present), sometimes data to index (like SQL tables or XMLpipe2 data files) and queries.
mssql type is currently only available on Windows. odbc type is available both on Windows natively and on Linux through the UnixODBC library.
Example:
type = mysql
and "127.0.0.1" will force TCP/IP usage. Refer to MySQL manual for more details. -
+Example:
sql_host = localhost
Optional, default is 3306 for mysql source type and 5432 for pgsql type. Applies to SQL source types (mysql, pgsql, mssql) only. Note that it depends on the sql_host setting whether this value will actually be used.
Example:
sql_port = 3306
SQL user to use when connecting to sql_host.
Mandatory, no default value.
Applies to SQL source types (mysql, pgsql, mssql) only.
Example:
sql_user = test
SQL user password to use when connecting to sql_host.
Mandatory, no default value.
Applies to SQL source types (mysql, pgsql, mssql) only.
Example:
sql_pass = mysecretpassword
SQL database (in MySQL terms) to use after connecting, and to perform further queries within. Mandatory, no default value. Applies to SQL source types (mysql, pgsql, mssql) only.
Example:
sql_db = test
On Linux, it would typically be /var/lib/mysql/mysql.sock. On FreeBSD, it would typically be /tmp/mysql.sock.
Note that it depends on sql_host setting whether this value will actually be used.
Example:
sql_sock = /tmp/mysql.sock
both in theory and in practice. However, enabling compression on 100 Mbps links may improve indexing time significantly (up to 20-30% of the total indexing time improvement was reported). Your mileage may vary.
Example:
mysql_connect_flags = 32 # enable compression
indexer and MySQL. The details on creating the certificates and setting up the MySQL server can be found in the MySQL documentation.
Example:
mysql_ssl_cert = /etc/ssl/client-cert.pem
mysql_ssl_key = /etc/ssl/client-key.pem
mysql_ssl_ca = /etc/ssl/cacert.pem
ODBC DSN (Data Source Name) specifies the credentials (host, user, password, etc) to use when connecting to an ODBC data source. The format depends on the specific ODBC driver used.
Example:
odbc_dsn = Driver={Oracle ODBC Driver};Dbq=myDBName;Uid=myUsername;Pwd=myPassword
sql_query_pre = SET SESSION query_cache_type=OFF
Example:
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
by default it builds with 32-bit IDs support, but the --enable-id64 option to configure allows building with 64-bit document and word IDs support.
Example:
sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, \
        title, content \
    FROM documents
it will automatically switch to a variant that matches keywords in those fields, computes a sum of matched payloads multiplied by field weights, and adds that sum to the final rank. -
Example:
sql_joined_field = \
    tagstext from query; \
    SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC
exactly equal to $start or $end from your query. The example in Section 3.8, “Ranged queries” illustrates that; note how it uses greater-or-equal and less-or-equal comparisons.
Example:
sql_query_range = SELECT MIN(id),MAX(id) FROM documents
Example:
sql_range_step = 1000
over the network when sending queries. (Because that might be too much of an impact when the K-list is huge.) You will need to set up separate per-server K-lists in that case.
Example:
sql_query_killlist = \
    SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
    SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex
such bitfields are packed together in 32-bit chunks in the .spa attribute data file. Bit size settings are ignored if using inline storage.
Example:
sql_attr_uint = group_id
sql_attr_uint = forum_id:9 # 9 bits for forum_id
Multi-value (there might be multiple attributes declared), optional.
Applies to SQL source types (mysql, pgsql, mssql) only. Equivalent to sql_attr_uint declaration with a bit count of 1.
Example:
sql_attr_bool = is_deleted # will be packed to 1 bit
Note that unlike sql_attr_uint, these values are signed. Introduced in version 0.9.9-rc1.
Example:
sql_attr_bigint = my_bigint_id
and UNIX_TIMESTAMP() in MySQL will not return anything expected. If you only need to work with dates, not times, consider the TO_DAYS() function in MySQL instead.
Example:
# sql_query = ... UNIX_TIMESTAMP(added_datetime) AS added_ts ...
sql_attr_timestamp = added_ts
One important usage of the float attributes is storing latitude and longitude values (in radians), for further usage in query-time geosphere distance calculations.
Example:
sql_attr_float = lat_radians
sql_attr_float = long_radians
RANGE-QUERY is SQL query used to fetch min and max ID values, similar to 'sql_query_range'
Example:
sql_attr_multi = uint tag from query; SELECT id, tag FROM tags
sql_attr_multi = bigint tag from ranged-query; \
    SELECT id, tag FROM tags WHERE id>=$start AND id<=$end; \
    SELECT MIN(id), MAX(id) FROM tags
declared using sql_attr_string will not be full-text indexed; you can use the sql_field_string directive for that.
Example:
sql_attr_string = title # will be stored but will not be indexed
You can read more on JSON attributes in http://sphinxsearch.com/blog/2013/08/08/full-json-support-in-trunk/.
Example:
sql_attr_json = properties
sql_column_buffers = <colname>=<size>[K|M] [, ...]
Example:
sql_query = SELECT id, mytitle, mycontent FROM documents
sql_column_buffers = mytitle=64K, mycontent=10M
value but does not full-text index it. In some cases it might be desired to both full-text index the column and store it as an attribute. sql_field_string lets you do exactly that. Both the field and the attribute will be named the same.
Example:
sql_field_string = title # will be both indexed and stored
in size are skipped. Any errors during the file loading (IO errors, missed limits, etc) will be reported as indexing warnings and will not early terminate the indexing. No content will be indexed for such files.
Example:
sql_file_field = my_file_path # load and index files referred to by my_file_path
For instance, updates on a helper table that permanently change the last successfully indexed ID should not be run from the post-fetch query; they should be run from the post-index query instead.
Example:
sql_query_post = DROP TABLE my_tmp_table
expanded to the maximum document ID which was actually fetched from the database during indexing. If no documents were indexed, $maxid will be expanded to 0.
Example:
sql_query_post_index = REPLACE INTO counters ( id, val ) \
    VALUES ( 'max_indexed_id', $maxid )
database server. It causes the indexer to sleep for a given amount of milliseconds once per each ranged query step. This sleep is unconditional, and is performed before the fetch query.
Example:
sql_ranged_throttle = 1000 # sleep for 1 sec before each query step
Specifies a command that will be executed and whose output will be parsed for documents. Refer to Section 3.9, “xmlpipe2 data source” for specific format description.
Example:
xmlpipe_command = cat /home/sphinx/test.xml
xmlpipe field declaration.
Multi-value, optional.
Applies to xmlpipe2 source type only. Refer to Section 3.9, “xmlpipe2 data source”.
Example:
xmlpipe_field = subject
xmlpipe_field = content
Makes the specified XML element indexed as both a full-text field and a string attribute. Equivalent to <sphinx:field name="field" attr="string"/> declaration within the XML file.
Example:
xmlpipe_field_string = subject
Multi-value, optional.
Applies to xmlpipe2 source type only. Syntax fully matches that of sql_attr_uint.
Example:
xmlpipe_attr_uint = author_id
Multi-value, optional.
Applies to xmlpipe2 source type only. Syntax fully matches that of sql_attr_bigint.
Example:
xmlpipe_attr_bigint = my_bigint_id
Multi-value, optional.
Applies to xmlpipe2 source type only. Syntax fully matches that of sql_attr_bool.
Example:
xmlpipe_attr_bool = is_deleted # will be packed to 1 bit
Multi-value, optional.
Applies to xmlpipe2 source type only. Syntax fully matches that of sql_attr_timestamp.
Example:
xmlpipe_attr_timestamp = published
Multi-value, optional.
Applies to xmlpipe2 source type only. Syntax fully matches that of sql_attr_float.
Example:
xmlpipe_attr_float = lat_radians
xmlpipe_attr_float = long_radians
that will constitute the MVA will be extracted, similar to how sql_attr_multi parses SQL column contents when the 'field' MVA source type is specified.
Example:
xmlpipe_attr_multi = taglist
that will constitute the MVA will be extracted, similar to how sql_attr_multi parses SQL column contents when the 'field' MVA source type is specified.
Example:
xmlpipe_attr_multi_64 = taglist
This setting declares a string attribute tag in the xmlpipe2 stream. The contents of the specified tag will be parsed and stored as a string value.
Example:
xmlpipe_attr_string = subject
XML tag are to be treated as a JSON document and stored into a Sphinx index for later use. Refer to Section 12.1.24, “sql_attr_json” for more details on the JSON attributes.
Example:
xmlpipe_attr_json = properties
The UTF-8 fixup feature lets you avoid that. When fixup is enabled, Sphinx will preprocess the incoming stream before passing it to the XML parser and replace invalid UTF-8 sequences with spaces.
Example:
xmlpipe_fixup_utf8 = 1
authentication when connecting to MS SQL Server. Note that when running searchd as a service, the account user can differ from the account you used to install the service.
Example:
mssql_winauth = 1
using the standard zlib algorithm (called deflate and also implemented by gunzip). When indexing on a different box than the database, this lets you offload the database, and save on network traffic. The feature is only available if zlib and zlib-devel were both available during build time.
Example:
unpack_zlib = col1
unpack_zlib = col2
using the modified zlib algorithm used by MySQL COMPRESS() and UNCOMPRESS() functions. When indexing on a different box than the database, this lets you offload the database, and save on network traffic. The feature is only available if zlib and zlib-devel were both available during build time.
Example:
unpack_mysqlcompress = body_compressed
unpack_mysqlcompress = description_compressed
data can not go over the buffer size. This option lets you control the buffer size, both to limit indexer memory use, and to enable unpacking of really long data fields if necessary.
Example:
unpack_mysqlcompress_maxsize = 1M
Index type setting lets you choose the needed type. By default, plain local index type will be assumed.
Example:
type = distributed
Example:
source = srcpart1
source = srcpart2
source = srcpart3
.spe stores skip-lists to speed up doc-list filtering
Example:
path = /var/data/test1
However, such cases are infrequent, and docinfo defaults to "extern". Refer to Section 3.3, “Attributes” for in-depth discussion and RAM usage estimates.
Example:
docinfo = inline
from root account, or be granted enough privileges otherwise. If mlock() fails, a warning is emitted, but the index continues working.
Example:
mlock = 1
a matching entry in the dictionary, stemmers will not be applied at all. Or in other words, wordforms can be used to implement stemming exceptions.
Example:
morphology = stem_en, libstemmer_sv
on how many actual keywords match the given substring (in other words, into how many keywords does the search term expand). The maximum number of keywords matched is restricted by the expansion_limit directive.
Essentially, keywords and CRC dictionaries represent two different trade-offs: either spend more time indexing but get top-speed worst-case searches (CRC dictionary), or only slightly impact indexing time but sacrifice worst-case searching time when the prefix expands into very many keywords (keywords dictionary).
Example:
dict = keywords
PRE, TABLE, TBODY, TD, TFOOT, TH, THEAD, TR, and UL.
Both sentences and paragraphs increment the keyword position counter by 1.
Example:
index_sp = 1
in a document. Once indexed, zones can then be used for matching with the ZONE operator, see Section 5.3, “Extended query syntax”.
Example:
index_zones = h*, th, title
Versions earlier than 2.1.1-beta only provided this feature for plain indexes.
at least as long as specified will be stemmed. So in order to avoid stemming 3-character keywords, you should specify 4 for the value. For more finely grained control, refer to the wordforms feature.
Example:
min_stemming_len = 4
of the index, sorted by the keyword frequency, see the --buildstops and --buildfreqs switches in Section 7.1, “indexer command reference”. Top keywords from that dictionary can usually be used as stopwords.
Example:
stopwords = /usr/local/sphinx/data/stopwords.txt
stopwords = stopwords-ru.txt stopwords-en.txt
s02e02 > season 2 episode 2
s3 e3 > season 3 episode 3
Example:
wordforms = /usr/local/sphinx/data/wordforms.txt
wordforms = /usr/local/sphinx/data/alternateforms.txt
wordforms = /usr/local/sphinx/private/dict*.txt
time it makes no sense to embed a 100 MB wordforms dictionary into a tiny delta index. So there needs to be a size threshold, and
embedded_limit
is that threshold.
Example:
embedded_limit = 32K
during indexing and searching respectively. Therefore, to pick up changes in the file it's required to reindex and restart searchd.
Example:
exceptions = /usr/local/sphinx/data/exceptions.txt
Only those words that are not shorter than this minimum will be indexed. For instance, if min_word_len is 4, then 'the' won't be indexed, but 'they' will be.
Example:
min_word_len = 4
Starting with 2.2.3-beta, the aliases "english" and "russian" are allowed in the character set mapping.
Example:
# default are English and Russian letters
charset_table = 0..9, A..Z->a..z, _, a..z, \
    U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451
The syntax is the same as for charset_table, but it's only allowed to declare characters, and not allowed to map them. Also, the ignored characters must not be present in charset_table. -
Example:
ignore_chars = U+AD
Perfect word matches can be differentiated from prefix matches, and ranked higher, by utilizing all of the following options: a) dict=keywords (on by default), b) index_exact_words=1 (off by default), and c) expand_keywords=1 (also off by default). Note that either with the legacy dict=crc mode (which you should ditch anyway!), or with any of the above options disabled, there is no data to differentiate between the prefixes and full words, and thus perfect word matches can't be ranked higher.
Example:
min_prefix_len = 3
identify keywords matching on just a single character, expand '*a*' to an OR operator over 100,000+ keywords, and evaluate that expanded query; in practice this will very definitely kill your server.)
Example:
min_infix_len = 3
and intentionally forbidden in that case. If required, you can still limit the length of a substring that you search for in the application code.
Example:
max_substring_len = 12
page contents. prefix_fields specifies which fields will be prefix-indexed; all other fields will be indexed in normal mode. The value format is a comma-separated list of field names.
Example:
prefix_fields = url, domain
Similar to prefix_fields, but lets you limit infix-indexing to given fields.
Example:
infix_fields = url, domain
good results, thanks to phrase based ranking: it will pull closer phrase matches (which in case of N-gram CJK words can mean closer multi-character word matches) to the top.
Example:
ngram_len = 1
this list defines characters, sequences of which are subject to N-gram extraction. Words comprised of other characters will not be affected by the N-gram indexing feature. The value format is identical to charset_table.
Example:
ngram_chars = U+3000..U+2FA1F
Phrase boundary condition will be raised if and only if such character is followed by a separator; this is to avoid abbreviations such as S.T.A.L.K.E.R or URLs being treated as several phrases.
Example:
phrase_boundary = ., ?, !, U+2026 # horizontal ellipsis
On phrase boundary, current word position will be additionally incremented by this number. See phrase_boundary for details.
Example:
phrase_boundary_step = 100
There are no restrictions on tag names; i.e. everything that looks like a valid tag start, or end, or a comment will be stripped.
Example:
html_strip = 1
Specifies HTML markup attributes whose contents should be retained and indexed even though other HTML markup is stripped. The format is per-tag enumeration of indexable attributes, as shown in the example below.
Example:
html_index_attrs = img=alt,title; a=title;
The value is a comma-separated list of element (tag) names whose contents should be removed. Tag names are case insensitive.
Example:
html_remove_elements = style, script
Note that by default all local indexes will be searched sequentially,
utilizing only 1 CPU or core. To parallelize processing of the local parts
in the distributed index, you should use dist_threads
directive, see Section 12.4.27, “dist_threads”.
Before dist_threads, there also was a legacy solution to configure searchd to query itself instead of using local indexes (refer to Section 12.2.31, “agent” for the details). However, that creates redundant CPU and network load, and dist_threads is now strongly suggested instead.
Example:
local = chunk1
local = chunk2
Starting with 2.2.9-release, the value can additionally enumerate per agent options such as:
ha_strategy - random, roundrobin, nodeads, noerrors (replaces index ha_strategy for a particular agent)
conn - pconn, persistent (same as agent_persistent)
agent = address1:index-list[[ha_strategy=value] | [conn=value] | [blackhole=value]]
# config on box2
# sharding an index over 3 servers
agent = box2:9312:chunk2
agent = box1:9312:chunk1[ha_strategy=nodeads]
agent = box2:9312:chunk2[conn=pconn]
agent = test:9312:any[blackhole=1]
New syntax added in 2.1.1-beta lets you define so-called agent mirrors that can be used interchangeably when processing a search query. Master server keeps track of mirror status (alive or dead) and response times, and does automatic failover and load balancing based on that.
By default, all queries are routed to the best of the mirrors. The best one is picked based on the recent statistics, as controlled by the ha_period_karma config directive. Master stores a number of metrics (total query count, error count, response time, etc) recently observed for every agent. It groups those by time spans, and karma is that time span length. The best agent mirror is then determined dynamically based on those metrics.
The karma period is in seconds and defaults to 60 seconds. Master stores up to 15 karma spans with per-agent statistics for instrumentation purposes (see SHOW AGENT STATUS statement). However, only the last 2 spans out of those are ever used for HA/LB logic.
When there are no queries, master sends a regular ping command every ha_ping_interval milliseconds in order to have some statistics and at least check whether the remote host is still alive. ha_ping_interval defaults to 1000 msec. Setting it to 0 disables pings, and statistics will only be accumulated based on actual queries.
Example:
# sharding index over 4 servers total
# in just 2 chunks but with 2 failover mirrors for each chunk
# box1, box2 carry chunk1 as local
is that the master will not open a new connection to the agent for every query and then close it. Rather, it will keep a connection open and attempt to reuse it for subsequent queries. The maximal number of such persistent connections per one agent host is limited by the persistent_connections_limit option of the searchd section.
Note that you have to set persistent_connections_limit to something greater than 0 if you want to use persistent agent connections. Otherwise, when persistent_connections_limit is not defined, it assumes zero persistent connections, and 'agent_persistent' acts exactly like plain 'agent'.
Persistent master-agent connections reduce TCP port pressure and save on connection handshakes. They are only supported in workers=threads mode. In other modes, simple non-persistent connections (i.e., one connection per operation) will be used, and a warning will show up in the console.
Example:
agent_persistent = remotebox:9312:index2
Also, all network errors on blackhole agents will be ignored. The value format is completely identical to the regular agent directive.
Example:
agent_blackhole = testbox:9312:testindex1,testindex2
successfully. If the timeout is reached but connect() does not complete, and retries are enabled, a retry will be initiated.
Example:
agent_connect_timeout = 300
a remote agent equals the sum of agent_connect_timeout and agent_query_timeout. Queries will not be retried if this timeout is reached; a warning will be produced instead.
Example:
agent_query_timeout = 10000 # our query can be long, allow up to 10 sec
This directive does not affect indexer in any way, it only affects searchd.
Example:
preopen = 1
This directive does not affect searchd in any way, it only affects indexer.
Example:
inplace_enable = 1
This directive does not affect searchd in any way, it only affects indexer.
Example:
inplace_hit_gap = 1M
This directive does not affect searchd in any way, it only affects indexer.
Example:
inplace_docinfo_gap = 1M
This directive does not affect searchd in any way, it only affects indexer.
Example:
inplace_reloc_factor = 0.1
This directive does not affect searchd in any way, it only affects indexer.
Example:
inplace_write_factor = 0.1
enables the exact form operator in the query language to work. This impacts the index size and the indexing time. However, searching performance is not impacted at all.
Example:
index_exact_words = 1
This directive does not affect searchd in any way, it only affects indexer.
Example:
overshort_step = 1
This directive does not affect searchd in any way, it only affects indexer.
Example:
stopword_step = 1
hitless, "simon says hello world" will be converted to ("simon says" & hello & world) query, matching all documents that contain "hello" and "world" anywhere in the document, and also "simon says" as an exact phrase. -
+Example:
hitless_words = all
This directive does not affect indexer in any way, it only affects searchd.
Example:
expand_keywords = 1
so that multiple different blended characters could be normalized into just one base form. This is useful when indexing multiple alternative Unicode codepoints with equivalent glyphs.
Example:
blend_chars = +, &, U+23
blend_chars = +, &->+ # 2.0.1 and above
Default behavior is to index the entire token, equivalent to blend_mode = trim_none.
Example:
blend_mode = trim_tail, skip_pure
hence, specifying a 512 MB limit and only inserting 3 MB of data should result in allocating 3 MB, not 512 MB.
Example:
rt_mem_limit = 512M
in INSERT statements without an explicit list of inserted columns will have to be in the same order as configured.
Example:
rt_field = author
rt_field = title
rt_field = content
Multi-value (an arbitrary number of attributes is allowed), optional. Declares an unsigned 32-bit attribute. Introduced in version 1.10-beta.
Example:
rt_attr_uint = gid
Multi-value (there might be multiple attributes declared), optional. Declares a 1-bit unsigned integer attribute. Introduced in version 2.1.2-release.
Example:
rt_attr_bool = available
Multi-value (an arbitrary number of attributes is allowed), optional.
Declares a signed 64-bit attribute.
Introduced in version 1.10-beta.
Example:
rt_attr_bigint = guid
Multi-value (an arbitrary number of attributes is allowed), optional.
Declares a single precision, 32-bit IEEE 754 format float attribute.
Introduced in version 1.10-beta.
Example:
rt_attr_float = gpa
Declares the UNSIGNED INTEGER (unsigned 32-bit) MVA attribute.
Multi-value (i.e. there may be more than one such attribute declared), optional.
Applies to RT indexes only.
Example:
rt_attr_multi = my_tags
Declares the BIGINT (signed 64-bit) MVA attribute.
Multi-value (i.e. there may be more than one such attribute declared), optional.
Applies to RT indexes only.
Example:
rt_attr_multi_64 = my_wide_tags
Timestamp attribute declaration.
Multi-value (an arbitrary number of attributes is allowed), optional.
Introduced in version 1.10-beta.
Example:
rt_attr_timestamp = date_added
String attribute declaration.
Multi-value (an arbitrary number of attributes is allowed), optional.
Introduced in version 1.10-beta.
Example:
rt_attr_string = author
Introduced in version 2.1.1-beta.
Refer to Section 12.1.24, “sql_attr_json” for more details on the JSON attributes.
Example:
rt_attr_json = properties
index. Essentially, this directive controls how exactly the master does
load balancing between the configured mirror agent nodes.
As of 2.1.1-beta, the following strategies are implemented:
ha_strategy = random
The default balancing mode. Simple linear random distribution among the mirrors.
That is, equal selection probabilities are assigned to every mirror. Kind of similar
to round-robin (RR), but unlike RR, it does not impose a strict selection order.
The default simple random strategy does not take mirror status, error rate,
and, most importantly, actual response latencies into account. So to accommodate
heterogeneous clusters and/or temporary spikes in agent node load, we have
ha_strategy = noerrors
Latency-weighted probabilities, but mirrors with a worse error/success ratio
are excluded from the selection.
ha_strategy = roundrobin
Simple round-robin selection, that is, selecting the 1st mirror
in the list, then the 2nd one, then the 3rd one, etc., and then repeating the process
once the last mirror in the list is reached. Unlike with the randomized strategies,
RR imposes a strict querying order (1, 2, 3, ..,
to index a current word pair or not.
bigram_freq_words
lets you define a list of such keywords.
Example:
bigram_freq_words = the, a, you, i
For most use cases, both_freq
would be the best mode, but
your mileage may vary.
Example:
bigram_index = both_freq
and its extension towards multiple fields, called BM25F. They require
per-document length and per-field lengths, respectively. Hence the additional
directive.
Example:
index_field_lengths = 1
installed in the system, and Sphinx must be built with the
--with-re2
switch. Binary packages should come with RE2
built in.
Example:
# index '13-inch' as '13inch'
regexp_filter = \b(\d+)\" => \1inch
stopwords_unstemmed directive fixes that issue. When it's enabled,
stopwords are applied before stemming (and therefore to the original
word forms), and the tokens are stopped when token == stopword.
Example:
stopwords_unstemmed = 1
first, then converting those to .idf format using --buildidf,
then merging all .idf files across the cluster using --mergeidf.
Refer to Section 7.4, “indextool command reference” for more information.
Example:
global_idf = /usr/local/sphinx/var/global.idf
RLP context configuration file. Mandatory if RLP is used. Added in 2.2.1-beta.
Example:
rlp_context = /home/myuser/RLP/rlp-context.xml
Note that this option also affects RT indexes. When it is enabled, all attribute updates will be disabled, and all disk chunks of RT indexes will behave as described above. However, inserting and deleting documents from RT indexes is still possible with ondisk_attrs enabled.
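Example (an illustrative per-index setting, using the 0/1 convention described above):
ondisk_attrs = 1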
and the database server can time out. You can resolve that
either by raising timeouts on the SQL server side or by lowering
mem_limit.
Example:
mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes
(that's mostly limited by disk head seek time). Limiting indexing I/O
to a fraction of that can help reduce search performance degradation
caused by indexing.
Example:
max_iops = 40
by max_iops setting. At the time of this
writing, all I/O calls should be under 256 KB (default internal buffer size)
anyway, so max_iosize
values higher than 256 KB should not affect anything.
Example:
max_iosize = 1048576
Maximum allowed field size for XMLpipe2 source type, bytes.
Optional, default is 2 MB.
Example:
max_xmlpipe2_field = 8M
mem_limit. Note that several (currently up to 4) buffers for different files will be allocated, proportionally increasing the RAM usage.
Example:
write_buffer = 4M
(for example) 2 MB in size, but max_file_field_buffer
value is 128 MB, peak buffer usage would still be only 2 MB. However,
files over 128 MB would be entirely skipped.
-
+Example:
max_file_field_buffer = 128M
makes all connections to that port bypass the thread pool and always
forcibly create a new dedicated thread. That's useful for emergency management in case
of a severe overload, when the daemon would either stall or not let you
connect via a regular port.
Examples:
listen = localhost
listen = localhost:5000
listen = 192.168.0.1:5000
You can also use 'syslog' as the file name. In this case the events will be sent to the syslog daemon.
To use the syslog option, Sphinx must be configured with '--with-syslog' at build time.
Example:
log = /var/log/searchd.log
In this case all search queries will be sent to the syslog daemon with LOG_INFO priority,
prefixed with '[query]' instead of a timestamp.
To use the syslog option, Sphinx must be configured with '--with-syslog' at build time.
Example:
query_log = /var/log/query.log
on the fly, using SET GLOBAL query_log_format=sphinxql
syntax.
Refer to Section 5.9, “searchd
query log formats” for more discussion and format
details.
Example:
query_log_format = sphinxql
Network client request read timeout, in seconds.
Optional, default is 5 seconds.
searchd
will forcibly close the client connections which fail to send a query within this timeout.
Example:
read_timeout = 1
Maximum time to wait between requests (in seconds) when using persistent connections. Optional, default is five minutes.
Example:
client_timeout = 3600
Maximum time to wait between requests (in seconds) when using the
SphinxQL interface. Optional, default is 15 minutes. Introduced in 2.3.2-beta.
Example:
sphinxql_timeout = 900
Maximum amount of worker threads (or in other words, concurrent queries
to run in parallel).
Optional, default is 0 (unlimited) in workers=threads,
or 1.5 times the CPU cores count in workers=thread_pool mode.
an internal network thread, and only the 2 actually active queries will be
subject to the max_children limit. When the limit is reached, any additional
incoming connections will still be accepted, and any additional
queries will get enqueued
until there are free worker threads. The queries will only start failing
with a temporary 'maxed out' error once the queue length limit is also reached.
Thus, in thread_pool mode it makes little sense to raise max_children much
higher than the amount of CPU cores. Usually that will only hurt CPU
contention and decrease the general throughput.
Example:
max_children = 10
Number of network threads for workers=thread_pool mode, default is 1.
Useful for extremely high query rates, when just 1 thread is not enough to manage all the incoming queries.
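Example (an illustrative value; the right number is an assumption that depends on your query rate):
net_workers = 4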
Maximum pending queries queue length for workers=thread_pool mode, default is 0 (unlimited).
This directive lets you constrain queue length and start rejecting incoming queries at some point with a "maxed out" message.
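Example (an illustrative limit, not a recommendation):
queue_max_length = 500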
searchd
process ID file name.
Mandatory.
@@ -11512,10 +11637,10 @@
of searchd
; to stop searchd
;
to notify it that it should rotate the indexes. Can also be used for
different external automation scripts.
Example:
pid_file = /var/run/searchd.pid
Prevents searchd
stalls while rotating indexes with huge amounts of data to precache.
Optional, default is 1 (enable seamless rotation). On Windows systems seamless rotation is disabled by default.
@@ -11551,10 +11676,10 @@
memory usage during the rotation (because both old and new copies of
.spa/.spi/.spm
data need to be in RAM while
preloading new copy). Average usage stays the same.
Example:
seamless_rotate = 1
Whether to forcibly preopen all indexes on startup. Optional, default is 1 (preopen everything). @@ -11574,17 +11699,17 @@
They also make searchd
use more file
handles. In most scenarios it's therefore preferred and
recommended to preopen indexes.
Example:
preopen_indexes = 1
Whether to unlink .old index copies on successful rotation.
Optional, default is 1 (do unlink).
Example:
unlink_old = 0
When calling UpdateAttributes()
to update document attributes in
real-time, changes are first written to the in-memory copy of attributes
@@ -11597,20 +11722,20 @@
between those intervals is set with attr_flush_period
, in seconds.
It defaults to 0, which disables periodic flushing, but flushing will still occur at normal shutdown.
Example:
attr_flush_period = 900 # persist updates to disk every 15 minutes
Maximum allowed network packet size.
Limits both query packets from clients, and response packets from remote agents in distributed environment.
Only used for internal sanity checks, does not directly affect RAM use or performance.
Optional, default is 8M.
Introduced in version 0.9.9-rc1.
Example:
max_packet_size = 32M
Shared pool size for in-memory MVA updates storage. Optional, default size is 1M. @@ -11625,28 +11750,28 @@
In the meantime, MVA updates are intended to be used as a measure to quickly
catch up with the latest changes in the database until the next index rebuild;
not as a persistent storage mechanism.
Example:
mva_updates_pool = 16M
Maximum allowed per-query filter count.
Only used for internal sanity checks, does not directly affect RAM use or performance.
Optional, default is 256.
Introduced in version 0.9.9-rc1.
Example:
max_filters = 1024
Maximum allowed per-filter values count.
Only used for internal sanity checks, does not directly affect RAM use or performance.
Optional, default is 4096.
Introduced in version 0.9.9-rc1.
Example:
max_filter_values = 16384
TCP listen backlog. Optional, default is 5. @@ -11657,10 +11782,10 @@
fail with "connection refused" message. listen_backlog directive controls
the length of the connection queue. Non-Windows builds should work fine with
the default value.
Example:
listen_backlog = 20
Per-keyword read buffer size. Optional, default is 256K. @@ -11669,10 +11794,10 @@
two associated read buffers (one for document list and one for hit list).
This setting lets you control their sizes, increasing per-query RAM use,
but possibly decreasing IO time.
Example:
read_buffer = 1M
Unhinted read size. Optional, default is 32K. @@ -11685,43 +11810,43 @@
unhinted read size, but raising it for smaller lists. It will not affect
RAM use because the read buffer will already be allocated. So it should
not be greater than read_buffer.
Example:
read_unhinted = 32K
Limits the amount of queries per batch. Optional, default is 32.
Makes searchd perform a sanity check of the amount of queries
submitted in a single batch when using multi-queries.
Set it to 0 to skip the check.
Example:
max_batch_queries = 256
Max common subtree document cache size, per-query. Optional, default is 0 (disabled).
Limits RAM usage of a common subtree optimizer (see Section 5.12, “Multi-queries”).
At most this much RAM will be spent to cache document entries per each query.
Setting the limit to 0 disables the optimizer.
Example:
subtree_docs_cache = 8M
Max common subtree hit cache size, per-query. Optional, default is 0 (disabled).
Limits RAM usage of a common subtree optimizer (see Section 5.12, “Multi-queries”).
At most this much RAM will be spent to cache keyword occurrences (hits) per each query.
Setting the limit to 0 disables the optimizer.
Example:
subtree_hits_cache = 16M
Multi-processing mode (MPM).
Optional; allowed values are thread_pool and threads.
does not suffer from overheads of creating a new thread per every new connection
and managing a lot of parallel threads. As of 2.3.1, we still retain workers=threads
for the transition period, but thread pool is scheduled to become the only MPM mode.
Example:
workers = thread_pool
Max local worker threads to use for parallelizable requests (searching a distributed index; building a batch of snippets). Optional, default is 0, which means to disable in-request parallelism. @@ -11782,7 +11907,7 @@
Up to dist_threads
threads are created to process
those files. That speeds up snippet extraction when the total amount
of document data to process is significant (hundreds of megabytes).
Example:
index dist_test
{
    type = distributed
    dist_threads = 4
}
Binary log (aka transaction log) files path. Optional, default is build-time configured data directory. @@ -11821,11 +11946,11 @@
Otherwise, the default path, which in most cases is the same as the working
folder, may point to a folder with no write access (for example, /usr/local/var/data).
In this case, searchd will not start at all.
Example:
binlog_path = # disable logging
binlog_path = /var/data # /var/data/binlog.001 etc will be created
Binary log transaction flush/sync mode. Optional, default is 2 (flush every transaction, sync every second). @@ -11853,10 +11978,10 @@
cases, the default hybrid mode 2 provides a nice balance of speed and
safety, with full RT index data protection against daemon crashes, and
some protection against hardware ones.
Example:
binlog_flush = 1 # ultimate safety, low speed
Maximum binary log file size. Optional, default is 0 (do not reopen binlog file based on size). @@ -11865,10 +11990,10 @@
A new binlog file will be forcibly opened once the current binlog file
reaches this limit. This achieves a finer granularity of logs and can yield
more efficient binlog disk usage under certain borderline workloads.
Example:
binlog_max_log_size = 16M
A prefix to prepend to the local file names when generating snippets. Optional, default is empty. @@ -11889,10 +12014,10 @@
This might be useful, for instance, when the document storage locations
(be those local storage or NAS mountpoints) are inconsistent across the servers.
Example:
snippets_file_prefix = /mnt/common/server1/
Default server collation. Optional, default is libc_ci. @@ -11900,22 +12025,22 @@
Specifies the default collation used for incoming requests.
The collation can be overridden on a per-query basis.
Refer to Section 5.13, “Collations” for the list of available collations and other details.
Example:
collation_server = utf8_ci
Server libc locale. Optional, default is C. Introduced in version 2.0.1-beta.
Specifies the libc locale, affecting the libc-based collations.
Refer to Section 5.13, “Collations” for the details.
Example:
collation_libc_locale = fr_FR
A server version string to return via MySQL protocol. Optional, default is empty (return Sphinx version). @@ -11931,10 +12056,10 @@
mysql_version_string
directive and have searchd
report a different version to clients connecting over MySQL protocol.
(By default, it reports its own version.)
Example:
mysql_version_string = 5.0.37
RT indexes RAM chunk flush check period, in seconds. Optional, default is 10 hours. @@ -11946,10 +12071,10 @@
periodic flush checks, and eligible RAM chunks can get saved, enabling
consequential binlog cleanup. See Section 4.4, “Binary logging” for more details.
Example:
rt_flush_period = 3600 # 1 hour
Per-thread stack size. Optional, default is 1M. @@ -11972,10 +12097,10 @@
with up to 250 levels, 150K for up to 700 levels, etc. If the stack size limit
is not met, searchd
fails the query and reports
the required stack size in the error message.
Example:
thread_stack = 256K
The maximum number of expanded keywords for a single wildcard. Optional, default is 0 (no limit). @@ -11989,26 +12114,26 @@
of such expansions. Setting expansion_limit = N
restricts expansions to no more than N of the most frequent
matching keywords (per each wildcard in the query).
Example:
expansion_limit = 16
Threaded server watchdog. Optional, default is 1 (watchdog enabled). Introduced in version 2.0.1-beta.
A crashed query in threads
multi-processing mode
(workers = threads)
can take down the entire server. With watchdog feature enabled,
searchd
additionally keeps a separate lightweight
process that monitors the main server process, and automatically
restarts the latter in case of abnormal termination. Watchdog
is enabled by default.
Example:
watchdog = 0 # disable watchdog
Path to a file where current SphinxQL state will be serialized. Available since version 2.1.1-beta. @@ -12018,10 +12143,10 @@
If you load UDF functions but Sphinx crashes, when it gets (automatically)
restarted, your UDFs and global variables will no longer be available;
using persistent state helps ensure a graceful recovery with no such surprises.
Example:
sphinxql_state = uservars.sql
Interval between agent mirror pings, in milliseconds. Optional, default is 1000. @@ -12034,10 +12159,10 @@
by this directive.
To disable pings, set ha_ping_interval to 0.
Example:
ha_ping_interval = 0
Agent mirror statistics window size, in seconds. Optional, default is 60. @@ -12057,23 +12182,23 @@
Despite that at most 2 blocks are used for mirror selection,
up to 15 last blocks are actually stored, for instrumentation purposes.
They can be inspected using the
SHOW AGENT STATUS
statement.
Example:
ha_period_karma = 120
The maximum number of simultaneous persistent connections to remote persistent agents. Each time we connect to an agent defined under 'agent_persistent', we try to reuse an existing connection (if any), or connect and save the connection for the future. However, we cannot hold an unlimited number of such persistent connections, since each one holds a worker on the agent side (and eventually we will receive the 'maxed out' error, when all of them are busy). This directive limits the number. It affects the number of connections to each agent's host, across all distributed indexes.
It is reasonable to set the value equal to or less than the max_children option of the agents.
Example:
persistent_connections_limit = 29 # assume that each host of agents has max_children = 30 (or 29).
A maximum number of I/O operations (per second) that the RT chunks merge thread is allowed to start. Optional, default is 0 (no limit). Added in 2.1.1-beta. @@ -12083,10 +12208,10 @@
RT optimization activity will not generate more disk iops (I/Os per second)
than the configured limit. Modern SATA drives can perform up to around 100 I/O
operations per second, and limiting rt_merge_iops can reduce search performance
degradation caused by merging.
Example:
rt_merge_iops = 40
A maximum size of an I/O operation that the RT chunks merge thread is allowed to start. @@ -12096,14 +12221,14 @@
This directive lets you throttle down the I/O impact arising from
the OPTIMIZE
statements. I/Os bigger than this limit will be
broken down into 2 or more I/Os, which will then be accounted as separate I/Os
with regards to the rt_merge_iops
limit. Thus, it is guaranteed that all the optimization activity will not
generate more than (rt_merge_iops * rt_merge_maxiosize) bytes of disk I/O
per second.
Example:
rt_merge_maxiosize = 1M
Costs for the query time prediction model, in nanoseconds. Optional, default is "doc=64, hit=48, skip=2048, match=64" (without the quotes). @@ -12150,10 +12275,10 @@
is somewhat more error prone.) It is not necessary to specify all 4 costs
at once, as the missing ones will take the default values. However, we
strongly suggest specifying all of them, for readability.
Example:
predicted_time_costs = doc=128, hit=96, skip=4096, match=128
searchd --stopwait wait time, in seconds. Optional, default is 3 seconds. @@ -12164,10 +12289,10 @@
flushing attributes and updating binlog. And that requires some time.
searchd --stopwait will wait up to shutdown_timeout seconds for the daemon
to finish its jobs. The suitable time depends on your index size and load.
Example:
shutdown_timeout = 5 # wait for up to 5 seconds
Instance-wide defaults for ondisk_attrs directive. Optional, default is 0 (all attributes are loaded in memory). This @@ -12175,50 +12300,67 @@
served by this copy of searchd. Per-index directives take precedence,
and will overwrite this instance-wide default value, allowing for
fine-grained control.
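Example (illustrative; the value 1 is an assumption following the 0/1 convention of ondisk_attrs):
ondisk_attrs_default = 1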
Limit (in milliseconds) that prevents the query from being written to the query log. Optional, default is 0 (all queries are written to the query log). This directive specifies that only queries with execution times that exceed the specified limit will be logged.
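Example (an illustrative threshold that logs only queries slower than one second):
query_log_min_msec = 1000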
Instance-wide default for the agent_connect_timeout parameter. Values defined
per-index in distributed (network) indexes take precedence.
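Example (illustrative value, in milliseconds, mirroring the per-index directive above):
agent_connect_timeout = 300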
Instance-wide default for the agent_query_timeout parameter. Values defined
per-index in distributed (network) indexes take precedence, and the setting
may also be overridden per-query using the OPTION clause.
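Example (illustrative value, in milliseconds, mirroring the per-index directive above):
agent_query_timeout = 10000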
Integer, specifies how many times Sphinx will try to connect and query remote agents
in a distributed index before reporting a fatal query error. Default is 0 (i.e. no retries).
This value may also be specified on a per-query basis using the 'OPTION retry_count=XXX' clause.
If the per-query option exists, it will override the one specified in config.
Note that if you use agent mirrors in the definition of your distributed
index, then before every connection attempt Sphinx will select a different
mirror, according to the specified ha_strategy.
For example, if you have 10 mirrors and know for sure that at least one of them
is alive, then a correct query is guaranteed to get an answer when you specify
ha_strategy = roundrobin and
agent_retry_count = 9 in your config.
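Example (an illustrative setting, not a recommendation):
agent_retry_count = 3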
Integer, in milliseconds. Specifies the delay Sphinx waits before retrying
to query a remote agent in case it fails.
The value only makes sense if a non-zero agent_retry_count
or a non-zero per-query OPTION retry_count is specified. Default is 500.
This value may also be specified on a per-query basis using the 'OPTION retry_delay=XXX' clause.
If the per-query option exists, it will override the one specified in config.
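Example (illustrative value, in milliseconds):
agent_retry_delay = 250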
Hostname renew strategy. By default, IP addresses of agent host names are cached at daemon start to avoid extra DNS flood.
In some cases the IP can change dynamically (e.g. cloud hosting) and it might be desirable not to cache the IPs. Setting this option to 'request' disables the caching and queries the DNS on each query.
The IP addresses can also be manually renewed with the FLUSH HOSTNAMES command. Added in 2.3.2-beta.
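Example (a sketch for environments with dynamic agent IPs):
hostname_lookup = request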
Integer, in bytes. The maximum RAM allocated for cached result sets. Default is 0, meaning disabled. Added in 2.3.1-beta. Refer to the query cache for details.
Example:
qcache_max_bytes = 16777216
Integer, in milliseconds. The minimum wall time threshold for a query result to be cached. Defaults to 3000, or 3 seconds. 0 means cache everything. Added in 2.3.1-beta.
Refer to the query cache for details.
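Example (an illustrative threshold that caches any query slower than half a second):
qcache_thresh_msec = 500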
Integer, in seconds. The expiration period for a cached result set. Defaults to 60, or 1 minute.
The minimum possible value is 1 second. Added in 2.3.1-beta. Refer to the query cache for details.
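Example (an illustrative five-minute expiration):
qcache_ttl_sec = 300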
the base dictionary path. File names are hardcoded and specific to a given
lemmatizer; the Russian lemmatizer uses the ru.pak dictionary file.
The dictionaries can be obtained from the Sphinx website.
Example:
lemmatizer_base = /usr/local/share/sphinx/dicts/
By default, JSON format errors are ignored (ignore_attr
) and
the indexer tool will just show a warning. Setting this option to fail_index
will rather make indexing fail at the first JSON format error.
Example:
on_json_attr_error = ignore_attr
of strings; if the option is 0, such values will be indexed as strings.
This conversion applies to any data source, that is, JSON attributes
originating from either SQL or XMLpipe2 sources will all be affected.
Example:
json_autoconv_numbers = 1
will be automatically brought to lower case when indexing.
This conversion applies to any data source, that is, JSON attributes
originating from either SQL or XMLpipe2 sources will all be affected.
Example:
json_autoconv_keynames = lowercase
Path to the RLP root folder. Mandatory if RLP is used. Added in 2.2.1-beta.
Example:
rlp_root = /home/myuser/RLP
RLP environment configuration file. Mandatory if RLP is used. Added in 2.2.1-beta.
Example:
rlp_environment = /home/myuser/RLP/rlp-environment.xml
Do not set this value to more than 10 MB, because Sphinx splits large documents into 10 MB chunks before processing them with the RLP.
This option has effect only if morphology = rlp_chinese_batched
is specified.
Added in 2.2.1-beta.
Example:
rlp_max_batch_size = 100k
Maximum number of documents batched before processing them by the RLP. Optional, default is 50.
This option has effect only if morphology = rlp_chinese_batched
is specified.
Added in 2.2.1-beta.
Example:
rlp_max_batch_docs = 100
Specifies the trusted directory from which the UDF libraries can be loaded. Requires
workers = threads
to take effect.
Example:
plugin_dir = /usr/local/sphinx/lib
added query cache
added thread pool mode, and the respective workers = thread_pool, max_children, net_workers, queue_max_length directives
added RELOAD INDEX SphinxQL statement
added sphinxql_timeout directive
fixed #2503 update of attributes at index prevents binlog from clean
fixed #2516 suggest for index with exact_word or morphology options
fixed #2507 .NET Connector overflow exception (unsigned id support)
Fixed initial round-robin counter
Thread-safety checks added (backported)
Refactored dl-staff
added per-index statistics to 'show index status'
fixed #2502 final calculation of expression at RT index (optimized calls count)
Refactored ha-staff
Added begin() and end() to CSphVector, CSphTightVector
fixed error handle for API protocol net loop
Fixed crash on exit (revealed in test 234 on Ubuntu 16.04)
added token_filter and string list filter to API (php, python); set client ver to 32; fixed filter string list escape; updated token_filter plugin interface
Backported behavior for pthread_mutex_timedlock, SCHED_IDLE
Fast runaround for issue #877
fixed #2496 profiler counts multiple sequential queries with thread_pool worker
fixed #1825 added support for embedded zeroes in fields for pgsql, odbc data sources
PHP sphinx api: renamed SphinxClient c-tr to __construct
fixed #2461 crash of daemon with worker thread_pool on high load of fast queries
fixed uninitialized m_bSync variable
fixed #2461 crash of daemon with worker thread_pool on high load of fast queries
fixed #2456 daemon stuck on rotating index due to high amount of search threads
Fixed internal date calculation which caused different result of day(NUM) function in different timezones
fixed #2400 crash of daemon on CALL KEYWORDS to RT index with disk chunks and regexp filter; added regression to test 194
added #2393 feature wildcards for CALL KEYWORDS; bunch of options (fold_wildcards, fold_lemmas, fold_blended, expansion_limit, stats); added cases to test 254; fixed github #17
fixed #2390 latency at workers thread_pool added net-loop wakeup on job done added send at the end of job then transfer left data to net-loop added spin-wait at polling wait added socket_pair emulation for windows version of net-loop added eventfd checks to configure
added length() for expressions, disabled Expr_Time_c hashing, fixed test_253
fixed Expr_Time and Expr_Timediff always returning empty strings
fixed minor expression hash calc bug
added a big test for GetHash in expressions; added Expr_Now_c; fixed template expression name check
fixed several filters vs qcache issues
check filter expression tree when caching queries
fixed #2384 replace large index list at message to distributed index name; added regression to test 153
fixed #2384 fold large index list at message from distributed index; added regression to test 153
fixed #2372 ALL(mva) filter passed from master to agent as legacy filter; added regressions to test 244; set master version to 13
fixed #2371 warning on query via API with filter on MVA attribute; added cases to test 244
fixed query cache vs filters with expressions
fixed #2351 ALTER RECONFIGURE skipped for RT index with only re2 or rlp changes; added regression as test 252 set binlog version to 6
fixed daemon to work with --nodetach option after previous commit breaks it
fixed #2358 mmap memory to be fork-less fixed bitvec copying fixing false socket shutdown at net-loop added ping handling to net-loop instead API command added feature to distributed index to break kill-list of local indexes sequence
fixed a memory leak on inserts with aot enabled
fixed #2062 attribute name shadows field with same name; added check at ALTER and RT index config; added regressions to test 214
fixed #2330 daemon shutdown stopped waiting searching threads
fixed dlopen bug on linux while reloading udf
fixed (searchd.cpp split issues): stats mutex leak and crash of dashboard at distributed index setup due to config reload; added tests 248, 249
fixed #2299 crash of indexer due to empty xmlpipe2 source with embedded schema; added regression to test 68
fixed RLP vs non-CJK fields (missing trailing zero)
refactored RLP to work as a field filter (preprocessor)
fixed RLP enabled build
fixed ubertest to pass on different linux platforms
added SphinxQL support for comparison, IN, and BETWEEN conditions over ANY/ALL(mva); and added missing "ident NOT BETWEEN x AND y" syntax
fixed #2277 network connection timeout overflow for agent with worker = thread_pool added test 243
fixed mantis-2156 (COUNT(DISTINCT attr) does not work with strings)
updated old links to code.google.com to new links to github.com
fixed embedded zeroes in rt inserts
fixed mantis-1825 (no support for embedded zeroes in fields)
Removed CodeBlocks. Modified .gitignore for clion
fixed examples version in documentation, rebuilt docs
added #2262 new blend_mode trim_all added cases to test 192
fixed #2261 ngram chars presence at charset_table, now it warns for such config added regression to tests fixed test 19
fixed multiform handling (multiform + lemmatizer case) in CALL KEYWORDS
fixed libre2.patch to be in sync with latest re2 changes
Eliminated gcc warnings in http_parser.c. Eliminated msvc warning in sphinxquery.cpp.
Windows yy.cmd synced with bash yy.sh script
lex/bison files and rules fixed for bison >1.875
do not create tokenizer for every document in batch insert, create it just once and reuse instead
fixed bug #1766 (UPDATE does not correctly update negative values for bigint and float attributes)
fixed hits duplicates at RT index on document indexing fixed aggregate depended expression at RT index fixed tests 162, 192, 205 to pass rt mode updated visual studio 2013 project file
optimized away crazy memmove() in CSV/TSV parser, much faster CSV/TSV indexing (more than 10x on a synthetic test)
field lengths are no longer required to be last in schema
initial per-index field lengths support for RT, fixes test 217 --rt
fixed CSphMatchVariant::ToDocid conversion to match plain index behavior (fixes test 047 --rt)
fixed duplicates handling vs RT INSERT (first row wins now, not the last one)
added fetched_* counters collection to rt (fixes test_209 in --rt mode)
fixed keyword expansion in rt with docinfo=inline (fixes test_126 in --rt mode)
unified CSphIndex::SetupQueryTokenizer and sphSetupQueryTokenizer implementations, fixes most (but not all) of test 165 --rt
fixed off-by-1 in non-stemmed stopword check; fixed that lemmas got stemmed; fixed that wordforms could get applied twice through exact_dict; and rebuilt test 207 accordingly
improved RT insert speed (7% gain in my batch insert test case)
indextool needs to preread checkpoints and infixes too
fixed mlock option on caching index files
fixed #2223 query cache last entry eviction during search cause daemon to hung
Expr_Rand_c speedup and fixes, thread-safe XorShift64, updated test 125
fixed #2053 added RAND() function
fixed #2230 memory corruption at daemon on inserting data into RT with bad HTML markup added regression to tests
fixed span length and lcs calculation in proximity queries
fixed performance on reading a lot of small buffers
fixed #2223 crash at watchdog shutdown on some OSes like centos, rhel
optimize RT inserts
refactoring
improving insertion speed into RT index (5% gain in my test)
refactoring, removed unneeded code
added RELOAD INDEX to SphinxQL
fixed #2209 prohibited order by MVA, added error message
fixed undefined reference to void ISphOutputBuffer::SendT in release version
new qcache defaults
lets handle 32bit weights in qcache
fixed a couple of memory leaks
fixed typo in vs2008 proj; added lost files to codeblocks projects
searchd.cpp splitted
fixed agent dashboard setup due to remove of workers
added test_232, positions coming out of the matching engine
fixed several bugs in qcache (bug #2191 and some more)
use RAII on CSphMutex instead of separate initialization method, fixed clang warnings
added feature #2195 memory mapping of all index files with separated caching thread daemon (re)start should be immediately and fix of 'old' ondisk* issue fixed update of attributes for indexes with ondisk* option got rig and prohibit 32bit to 64bit index conversion on load got rid of all shared memory code
fixes in variant_match model generation (more compatible attr types, and better diff report)
fixed HTML stripper handling of broken PI (processing instruction) tags
added #2179 SphinxQL client timeout searchd section option sphinxql_timeout, default value is 900 seconds
added query cache
added thread pool mode, and the respective workers = thread_pool, max_children, net_workers, queue_max_length directives
added vip suffixes to listener protocols (sphinx_vip, mysql41_vip)
removed fork and prefork modes
removed prefork_rotation_throttle
directive
added RELOAD PLUGINS SphinxQL statement
added FLUSH ATTRIBUTES SphinxQL statement
fixed #2167, --keep_attrs
did not work with --rotate
fixed #2499 crash of daemon at phrase node with star shift; added regressions to test 41
Backported RE2 patch and solutions from master
fixed #2488 performance issue with matching hitless terms
fixed #2498 wrong profiling report (was filter instead get_hits)
fixed #2320 rt index crashes on groupby() for large JSON fields
fixed indextool --check vs nested JSON objects
added #2310, --replay-flags=ignore-open-errors
switch to replay binlogs even if some files are missing
added #2234, support for empty string values (stringattr='') in WHERE clause
added #2233, support for IN()
filters with string values
added #2232, string collation support in SELECT expressions
added #2121, "where flt<>val" support, "where fltcol=intval" and "where fltcol!=intval" conditions
added #2119, new indexer
exit code 2 on a --rotate
failure
fixed #2207, unified min_prefix_len
, min_infix_len
behavior between RT and plain indexes
fixed #2020, unified (and greatly shortened) the list of SphinxQL reserved keywords between indexer checks, SphinxQL parser checks, and the documentation
fixed #2251, expressions dependent on aggregation results (eg. as in SELECT MAX(id) m1, m1+10 m2) were not computed properly in RT indexes
fixed #2146, OPTIMIZE could occasionally break big RT indexes (by violating 4/16 GB string/MVA per chunk size limits)
fixed #2118, multi-wordforms with clashing prefixes were processed in a wrong order
fixed #1926, disabled and later re-enabled indexes were not picked up again by searchd
on SIGHUP
fixed #2312, using FACTORS() along with a subtree cache could crash (because on wrong qpos values from the cache passed to the ranker)
fixed #2310, comparing a non-existent JSON field with a string constant (as in jcol.some_typo='abc') could crash
fixed #2309, UDFs with BIGINT return were saved without a type into sphinxql_state file
fixed #2305, punctuation chars not mentioned in charset_table could still occasionally affect term position in the query
fixed #2242, added whitespaces support to SNIPPET() before_match/after_matches options, and fixed the handling of repeated %PASSAGE_ID% macros
fixed #2238, added a few safeguards to prevent crashes/freezes on loading damaged RT RAM chunks
fixed #2237, ATTACH-ing a part of a distributed index did not correctly invalidate it, could crash
fixed #2235, UPDATE ... OPTION strict=1
did not work with plain indexes
fixed #2225, searchd
crashed on startup if agent host string was empty
fixed #2127, indextool
did not handle RT indexes with updated JSON attributes in them
fixed #2117, GEODIST() calls with hash {in=deg,out=mi} arguments on a distributed index did not parse correctly
fixed searchd
crash when trying to load a damaged index with an incorrect row count
fixed indextool
MVA checks (an index error could sometimes be mistakenly reported)
fixed #2228, removed searchd
shutdown behavior on failed connection
fixed #2208, ZONESPANLIST() support for RT indexes
fixed #2201, indextool
false positive error on RT index
fixed #2201, crash with string comparison at expressions and expression ranker
fixed #2199, invalid packedfactors JSON output for index with stopwords
fixed #2197, TRUNCATE fails to remove disk chunk files after calling OPTIMIZE
fixed #2196, .NET connector issue (UTC_TIMESTAMP() support)
fixed #2176, agent used ha_strategy=random
instead of specified in config
fixed #1979, snippets generation and span length and lcs calculation in proximity queries
fixed truncated results (and a potential crash) vs long enough ZONESPANLIST() result
added #2166, per agent HA strategy for distributed indexes
fixed #2182, incorrect query results with multiple same destination wordforms
fixed #2181, improved error message on incorrect filters
fixed #2178, ZONESPAN operator for queries with more than two words
fixed #2172, incorrect results with field position fulltext operators
fixed WLCCS ranking factor computation
fixed memory leak on queries with ZONEs
added #2112, string equal comparison support for IF() function (for JSON and string attributes)
fixed #2158, crash at RT index after morphology changed to AOT after index was created
fixed #2155, stopwords got missed on disk chunk save at RT index
fixed #2151, agents statistics missed in case of huge amount of agents
fixed #2139, escape all special characters in JSON result set, according to RFC 4627
fixed #2003, lemmatize_XX_all handling of short and exact words
fixed #1912, reduce indextool
memory usage during a check of a huge index
fixed off by one errors in filtering of BIGINT
attributes
fixed seamless rotation in prefork mode
fixed snippets crash with blend chars at the beginning of a string
fixed #2104, ALL()/ANY()/INDEXOF() support for distributed indexes
fixed #2102, show agent status misses warnings from agents
fixed #2100, crash of indexer
while loading stopwords with tokenizer plugin
fixed #2098, arbitrary JSON subkeys and IS NULL for distributed indexes
fixed possibly memory leak in plugin creation function
indexation of duplicate documents
added OPTION rand_seed which affects ORDER BY RAND()
fixed #2042, indextool
fails with field mask on 32+ fields
fixed #2031, wrong encoding with UnixODBC/Oracle source
fixed #2056, several bugs in RLP tokenizer
fixed #2054, SHOW THREADS hangs if queries in prefork mode
fixed #2057, WARNING at indexer
on duplicated wordforms
fixed #2066, snippet generation with weight_order enabled
fixed exception parsing in queries
fixed crash in config parser
fixed MySQL protocol response when daemon maxed out
added ALTER RTINDEX rt1 RECONFIGURE which allows to change RT index settings on the fly
added SHOW INDEX idx1 SETTINGS statement
added ability to specify several destination forms for the same source wordform (as a result, N:M mapping is now available)
added blended chars support to exceptions
added FACTORS() alias for PACKEDFACTORS() function
added LIMIT
clause for the FACET keyword
added JSON-formatted output to PACKEDFACTORS()
function
added #1999 ATAN2() function
added connections counter and also avg and max timers to agent status
added searchd
configuration keys agent_connect_timeout, agent_query_timeout, agent_retry_count and agent_retry_delay
GROUPBY() function now returns strings for string attributes
optimized json_autoconv_numbers option speed
optimized tokenizing with exceptions on
fixed #1970, speeding up ZONE and ZONESPAN operators
fixed #2027, slow queries to multiple indexes with large kill-lists
fixed #2022, blend characters of matched word must not be outside of snippet passage
fixed #2018, different wildcard behaviour in RT and plain indexes
fixed buffer overrun when sizing packed factors (with way too many fields) in expression ranker
fixed cpu time logging for cases where work is done in child threads or agents
added #1920, charset_table aliases
added #1887, filtering over string attributes
added #1689, GROUP BY JSON attributes
improved speed of concurrent insertion in RT indexes
removed max_matches config key
fixed #1942, crash in SHOW THREADS command
fixed #1922, crash on snippet generation for queries with duplicated words
fixed #1870, crash on ORDER BY JSON attributes
fixed template index removing on rotation
added #1604, CALL KEYWORDS can show now multiple lemmas for a keyword
added ALTER TABLE DROP COLUMN
added ALTER for JSON/string/MVA attributes
added REMAP() function which surpasses SetOverride() API
added an argument to PACKEDFACTORS() to disable ATC calculation (syntax: PACKEDFACTORS({no_atc=1}))
added exact phrase query syntax
added flag '--enable-dl'
to configure script which works with libmysqlclient
, libpostgresql
, libexpat
, libunixobdc
added new plugin system: CREATE/DROP PLUGIN, SHOW PLUGINS, plugin_dir now in common, index/query_token_filter plugins
added ondisk_attrs support for RT indexes
added position shift operator to phrase operator
added possibility to add user-defined rankers (via plugins)
changed #1797, per-term statistics report (expanded terms fold to their respective substrings)
changed default thread_stack value to 1M
changed local directive in a distributed index which takes now a list (eg. local=shard1,shard2,shard3
)
deprecated SetMatchMode() API call
deprecated SetOverride() API call
removed deprecated str2wordcount
attributes
removed support for client versions 0.9.6 and below
workers=prefork
idf
, tfidf_unnormalized
and tfidf_normalized
flagslccs
, wlccs
, exact_order
, min_gaps
, and atc
ranking factorssphinx_get_XXX_factors()
, a faster interface to access PACKEDFACTORS() in UDFspredicted_time
, dist_predicted_time
, fetched_docs
, fetched_hits
counters to SHOW METAtotal_tokens
and disk_bytes
counters to SHOW INDEX STATUStotal_tokens
and disk_bytes
counters to SHOW INDEX STATUSsearchd
config sectionsearchd
config sectionindex_weights
option for that caseworkers=threads
and 1000s of threads'if ( stem(token)==stem(abc) ) emit(def)'
xmlpipe
data source v1, compat_sphinxql_magics
directive, SetWeights()
SphinxAPI call, and SPH_SORT_CUSTOM SphinxAPI modeidf=tfidf_normalized
was ignored for distributed queriesindex_weights
predicted_time
was not accumulated with dist_threadspredicted_time
was not accumulated with dist_threadslcs
and min_best_span_pos ranking factor values when any expansion (expand_keywords or lemmatize) occurred
fixed #1994, parsing of empty JSON arrays
fixed #1987, handling of index_exact_words with AOT morphology and infixes on
fixed #1984, teaching HTML parser to handle hex numbers
fixed #1983, master and agents networking issue
fixed #1977, escaping of characters doesn't work with exceptions
fixed #1968, parsing of WEIGHT() function (queries to distributed indexes affected)
fixed #1933, quorum operator works incorrectly if its number is an exception
fixed #1932, fixed daemon index recovery after failed rotation
fixed #1923, crash at indexer
with dict=keywords
fixed #1918, fixed crash while hitless words are used within fulltext operators which require hits
fixed #1878, daemon doesn't reset regexp_filter after rotation with seamless_rotate=0
fixed #1682, field end modifier doesn't work with words containing blended chars
fixed #1917, field limit propagation outside of group
fixed #1915, exact form passes to index skipping stopwords filter
fixed #1905, multiple lemmas at the end of a field
fixed #1903, indextool
check mode for hitless indexes and indexes with large amount of documents
fixed unnecessary escaping in JSON result set
fixed Quick Tour documentation chapter
fixed #1857, crash in arabic stemmer
fixed #1875, fixed crash on adding documents with long words in dict=keyword index with morphology and infixes enabled
fixed #1876, crash on words with large codepoints and infix searches
fixed #1880, crash on multiquery with one incorrect query
fixed #1848, infixes and morphology clash
fixed #1823, indextool
fails to handle indexes with lemmatizer morphology
fixed #1799, crash in queries to distributed indexes with GROUP BY on multiple values
fixed #1718, expand_keywords
option lost in disk chunks of RT indexes
fixed documentation on rt_flush_period
fixed network protocol issue which results in timeouts of libmysqlclient
for big Sphinx responses
fixed #1778, indexes with more than 255 attributes
fixed #1777, ORDER BY WEIGHT()
fixed #1796, missing results in queries with quorum operator of indexes with some lemmatizer
fixed #1780, incorrect results while querying indexes with wordforms, some lemmatizer and enable_star=1
fixed, SHOW PROFILE for fullscan queries
fixed, --with-re2 check
fixed #1753, path to re2 sources could not be set using --with-re2
, options --with-re2-libs
and --with-re2-includes
added to configure
fixed #1739, erroneous conversion of RAM chunk into disk chunk when loading id32 index with id64 binary
fixed #1738, unlinking RAM chunk when converting it to disk chunk
fixed #1710, unable to filter by attributes created by index_field_lengths=1
fixed #1716, random crash with with multiple running threads
fixed crash while querying index with lemmatizer and wordforms
added FLUSH RAMCHUNK statement
added SHOW PLAN statement
added support for GROUP BY on multiple attributes
added BM25F() function to SELECT
expressions (now works with the expression based ranker)
added indextool --fold
command and -q
switch
JSON
attributes (up to 5-20% faster SELECTs
using JSON objects)
optimized xmlpipe2 indexing (up to 9 times faster on some schemas)
fixed #1684, COUNT(DISTINCT smth) with implicit GROUP BY
returns correct value now
fixed #1672, exact token AOT vs lemma (indexer
skips exact form of token that passed AOT through tokenizer)
fixed #1659, fail while loading empty infix dictionary with dict=keywords
fixed #1638, force explicit JSON type conversion for aggregate functions
fixed #1606, hard interruption of the daemon by Ctrl+C (SIGINT) signal
fixed #1592, duplicates vs expression ranker
fixed #1578, SORT BY string attribute via API attr_asc
\ attr_desc
fixed #1575, crash of daemon on MVA receive from agents with dist_threads enabled
fixed #1574, agent got kill list of local indexes of distributed index
fixed #1573, ranker expression vs expanded terms
fixed #1572, BM25F
vs negative terms
fixed #1439, filters on float values in JSON issue, string values quoting issue
fixed #1399, filter error message on string attribute
fixed #1384, added possibility to define any own DSN line with source=mssql (like as in source=odbc
)
fixed ATTACH vs wordforms or stopwords; after daemon was restarted this setting was getting lost in RT indexes
fixed balancing of agents in HA
fixed co-working of index_exact_word
+ AOT lemmatizer
fixed epoll invoking and turned on by default
fixed string case error with JSON attributes in select list of a query
fixed TOP_COUNT
usage in misc/suggest
and updated to PHP 5.3 and UTF-8
added query profiling (SET PROFILING=1 and SHOW PROFILE statements)
added AOT-based Russian lemmatizer (morphology={lemmatize_ru | lemmatize_ru_all}, lemmatizer_base, and lemmatizer_cache directives)
added wordbreaker, a tool to split compounds into individual words
added JSON attributes support (sql_attr_json, on_json_attr_error, json_autoconv_numbers, json_autoconv_keynames directives)
added initial subselects support, SELECT * FROM (SELECT ... ORDER BY cond1 LIMIT X) ORDER BY cond2 LIMIT Y
added bigram indexing, and phrase searching with bigrams (bigram_index, bigram_freq_words directives)
added HA/LB support, ha_strategy and agent_persistent directives, SHOW AGENT STATUS statement
added RT index optimization (OPTIMIZE INDEX statement, rt_merge_iops and rt_merge_maxiosize directives)
added wildcards support to dict=keywords (eg. "t?st*")
added substring search support (min_infix_len=2 and above) to dict=keywords
added --checkconfig switch to indextool to check config file for correctness (bug #1395)
added global IDF support (global_idf directive, OPTION global_idf)
added "term1 term2 term3"/0.5 quorum fraction syntax (bug #1372)
added an option to apply stopwords before morphology, stopwords_unstemmed directive
added an alternative method to compute keyword IDFs, OPTION idf=plain
added boolean query optimizations, OPTION boolean_simplify=1 (bug #1294)
added stringptr return type support to UDFs, and CREATE FUNCTION ... RETURNS STRING syntax
added early query termination by predicted execution time (OPTION max_predicted_time, and predicted_time_costs directive)
added index_field_lengths directive, BM25A() and BM25F() functions to expression ranker
added ranker=export, and PACKEDFACTORS() function
added support for attribute files over 4 GB (bug #1274)
added addr2line output to crash reports (bug #1265)
added OPTION ignore_nonexistent_columns to UPDATE, and a respective UpdateAttributes() argument
added --keep-attrs switch to indexer
added --with-static-mysql, --with-static-pgsql switches to configure
added ZONESPANLIST() builtin function
added regexp_filter directive, regexp document/query filtering support (uses RE2)
added min_idf, max_idf, sum_idf ranking factors
added uservars persistence, and sphinxql_state directive (bug #1132)
added ZONESPAN operator
added snippets_file_prefix directive
added Arabic stemmer, morphology=stem_ar directive (bug #519)
added OPTION sort_method={pq | kbuffer}, an alternative match sorting method
added SPZ (sentence, paragraph, zone) support to RT indexes
added support for upto 255 keywords in quorum operator (bug #1030)
added multi-threaded agent querying (bug #1000)
added SHOW INDEX indexname STATUS statement
added LIKE clause support to multiple SHOW xxx statements
added SNIPPET() function
added GROUP_CONCAT() aggregate function
added iostats and cpustats to SHOW META
added support for DELETE statement over distributed indexes (bug #1104)
added EXIST('attr_name', default_value) builtin function (bug #1037)
added SHOW VARIABLES WHERE variable_name='xxx' syntax
added TRUNCATE RTINDEX statement
changed that UDFs are now allowed in fork/prefork modes via sphinxql_state startup script
changed that compat_sphinxql_magics now defaults to 0
changed that small enough exceptions, wordforms, stopwords files are now embedded into the index header
changed that rt_mem_limit can now be over 2 GB (bug #1059)
optimized multi-keyword searching (added skiplists)
optimized filtering and scan in several frequent cases (single-value, 2-arg, 3-arg WHERE clauses)
fixed #1778, SENTENCE and PARAGRAPH operators and infix stars clash
fixed #1774, stack overflow on parsing large expressions
fixed #1744, daemon failed to write to log file bigger than 4G
fixed #1705, expression ranker handling of indexes with more than 32 fields
fixed #1520, SetLimits() API documentation
fixed #1491, documentation: space character is prohibited in charset_table
fixed memory leak in expressions with max_window_hits
fixed rt_flush_period - less strict internal check and more frequent flushes overall
fixed #1655, special characters like ()?* were not processed correctly by exceptions
fixed #1651, CREATE FUNCTION can now be used with BIGINT return type
fixed #1649, incorrect warning message (about statistics mismatch) was returned when mixing wildcards and regular keywords
fixed #1603, passing MVA64 arguments to non-MVA functions caused unpredicted behavior and crashes (now explicitly forbidden)
fixed #1601, negative numbers in IN() clause caused a syntax error
fixed #757, wordforms shared between multiple indexes with different tokenizer settings failed to load (they now load with a warning)
fixed that batch queries did not batch in some cases (because of internal expression alias issues)
fixed that CALL KEYWORDS occasionally gave incorrect error messages
fixed searchd crashes on ATTACHing plain indexes with MVAs
fixed several deadlocks and other threading issues
fixed incorrect sorting order with utf8_general_ci
fixed that in some cases incorrect attribute values were returned when using expression aliases
optimized xmlpipe2 indexing
added a warning for missed stopwords, exception, wordforms files on index load and in indextool --check
fixed #1515, log strings over 2KB were clipped when query_log_format=plain
fixed #1514, RT index disk chunk lose attribute update on daemon restart
fixed #1512, crash while formatting log messages
fixed #1511, crash on indexing PostgreSQL data source with MVA attributes
fixed #1483, snippets limits fix
fixed #1481, shebang config changes check on rotation
fixed #1479, port handling in PHP Sphinx API
fixed #1474, daemon crash when a SphinxQL packet overflows max_packet_size
fixed #1472, crash on loading index into indextool for checking
fixed #1465, expansion_limit got lost in index rotation
fixed #1427, #1506, utf8 3 and 4-bytes codepoints
fixed #1405, between with mixed int float values
fixed #1475, memory leak in the expression parser
fixed #1457, error messages over 2KB were clipped
fixed #1454, searchd did not display an error message when the binlog path did not exist
fixed #1441, SHOW META in a query batch was returning the last non-batch error
added a console message about crashes during index loading at startup
added more debug info about failed index loading
fixed #1322, J connector seems to be broken in rel20, but works in trunk
fixed #1321, 'set names utf8' passes, but 'set names utf-8' doesn't because of syntax error '-'
fixed #1318, unhandled float comparison operators at filter
fixed #1317, FD leaks on thread seamless rotation
fixed #1302, daemon random crashes on OS X
fixed #1301, indexer fails to send rotate signal
fixed #1300, lost index settings on attach
fixed #1285, crash on running searchd with syslog and watchdog
fixed #1279, linking against explicitly disabled iconv; also added --with-libexpat to configure options, which is sometimes required on systems without XML support
fixed sample config file
fixed x64 configurations for libstemmer
fixed #1258, xmlpipe2 refused to index indexes with docinfo=inline
fixed #1257, legacy groupby modes vs dist_threads could occasionally return wrong search results (race condition)
fixed #1253, missing single-word query performance optimization (simplified ranker) vs prefix-expanded keywords vs dict=keywords
fixed #1252, COUNT(*) vs dist_threads could occasionally crash (race condition)
fixed #1251, missing expression support in the IN() function
fixed #1245, FlushAttributes mistakenly disabled by attr_flush_period=0 setting
fixed #1244, per-API-command (search, update, etc) statistics were not updated by SphinxQL requests
fixed #1243, misc issues (broken statistics, weights, checks) with very long keywords having blended parts in RT indexes
fixed #1240, embedded xmlpipe2 schema with more attributes than the sphinx.conf one caused indexer to crash
fixed that blended vs multiforms vs min_word_len could hang the query parser
fixed missing command-line switches documentation
fixed #605, pack vs mysql compress
fixed #783, #862, #917, #985, #990, #1032 documentation bugs
fixed #885, bitwise AND/OR were not available via API
fixed #984, crash on indexing data with MAGIC_CODE_ZONE symbol
fixed #1050, expression ranker vs agents
fixed #1054, max_query_time not handled properly on searching at RT index
fixed #1055, expansion_limit on searching at RT disk chunks
fixed #1057, daemon crashes on generating snippet with 0 documents provided
fixed #1060, load_files_scattered did not work
fixed #1065, libsphinxclient vs distribute index (agents)
fixed #1119, missing expression ranker support in SphinxSE
fixed #1120, negative total_found, docs and hits counter on huge indexes
fixed #1031, SphinxQL parsing syntax for MVA at insert/replace statements
fixed #1027, stalls on attribute update in high-concurrency load
fixed #1026, daemon crash on malformed API command
fixed #1021, max_children option was ignored with workers=threads
fixed build of SphinxSE with MySQL 5.1
fixed crash log for 'fork' and 'prefork' workers
added keywords dictionary (dict=keywords) support to RT indexes
added MVA, index_exact_words support to RT indexes (#888)
added MVA64 (a set of BIGINTs) support to both disk and RT indexes (rt_attr_multi_64 directive)
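A sketch of an RT index definition combining these additions (the path and all names are hypothetical):

index rt
{
    type = rt
    path = /var/data/rt
    rt_field = title
    rt_attr_uint = gid
    # rt_attr_multi_64 declares an MVA64 attribute, a set of BIGINTs
    rt_attr_multi_64 = tags64
    dict = keywords
    index_exact_words = 1
}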
added an expression-based ranker, and a number of new ranking factors
added ATTACH INDEX statement that converts a disk index to RT index
added WHERE clause support to UPDATE statement
added bigint, float, and MVA attribute support to UPDATE statement
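Taken together, a plausible SphinxQL session (index, attribute, and value names are hypothetical):

ATTACH INDEX diskindex TO RTINDEX rt;
UPDATE rt SET price=19.5, tags=(3,7,11) WHERE MATCH('phone') AND in_stock=1;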
added support for up to 256 searchable fields (was up to 32 before)
added FIBONACCI() function to expressions
added load_files_scattered option to snippets
added implicit attribute type promotions in multi-index result sets (#939)
optimized search performance with many ZONE operators
improved suggestion tool (added Levenshtein limit, removed extra DB fetch)
improved