Database

Text Search

We use a modified version of textsearch_ja (main repository). Changes to its source code, which allow searching for stop words and part-of-speech tags, are marked with EDIT comments.

Instructions for setting this up were adapted from https://stackoverflow.com/a/76150756/10499803.

Querying

When the japanese_with_types text search configuration is used, every token in a Japanese sentence is converted into two lexemes in the tsvector, a basic lexeme and a part-of-speech lexeme:

Format	Input	Output
`basic`	`話せ`	`話す`
`＃type・s1・s2・s3`	`早い`	`＃形容詞・自立`

Full list of part-of-speech tags here under the ChaSen section.
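
To check which lexemes a given word produces, the tsvector can be inspected directly. This is a minimal sketch, assuming the japanese_with_types configuration is installed in the current database; the input word is taken from the table above.

```sql
-- Show the basic lexeme and the ＃-prefixed part-of-speech lexeme for one word.
select to_tsvector('japanese_with_types', '早い');

-- Prefix-match on the part-of-speech lexeme. If the table above holds,
-- 早い is tagged ＃形容詞・自立, so this should match.
select to_tsvector('japanese_with_types', '早い') @@ '＃形容詞:*'::tsquery;
```

The `:*` suffix makes the tsquery a prefix match, so `＃形容詞:*` matches any tag that starts with ＃形容詞, whatever its subcategories.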

Sample query:

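-- Rows containing the phrase その人 and at least one lexeme starting with ＃動詞 (a verb tag).
with
matching as (
  select *
  from documents
  where textsearch_index_jp_col @@ (phraseto_tsquery('japanese', 'その人') && '＃動詞:*'::tsquery)
  -- keep the 200 best candidates (assumes a higher score is better, as in the outer query)
  order by score desc
  limit 200
)
select *
from (
  -- keep one row per distinct jp value, preferring the highest score
  select distinct on (jp) * from matching order by jp, score desc
) deduped
order by score desc
limit 100;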
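
The sample query relies on a tsvector column named textsearch_index_jp_col. The following is one possible way to define and index such a column, not necessarily how this repository does it. It assumes PostgreSQL 12 or later and a text column jp on documents (the column the query deduplicates on); the index name is hypothetical.

```sql
-- Assumption: documents.jp holds the Japanese sentence as text.
-- The generated column stays in sync with jp automatically.
alter table documents
  add column textsearch_index_jp_col tsvector
  generated always as (to_tsvector('japanese_with_types', jp)) stored;

-- A GIN index lets the @@ operator avoid a sequential scan.
-- The index name is made up for this example.
create index documents_textsearch_jp_idx
  on documents using gin (textsearch_index_jp_col);
```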

The japanese and japanese_with_types configurations never remove stop words. For English, use the english_nostop configuration, which also keeps stop words.
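
For reference, a stop-word-preserving English configuration can be built from a Snowball dictionary that is given no StopWords parameter. The sketch below shows that general technique; the dictionary name english_stem_nostop is hypothetical, and the english_nostop configuration in this repository may be defined differently.

```sql
-- A Snowball stemmer created without a StopWords parameter stems words
-- but never discards them. The dictionary name is made up for this example.
create text search dictionary english_stem_nostop (
  template = snowball,
  language = english
);

-- Start from the built-in english configuration and swap in the new dictionary
-- for every word-like token type.
create text search configuration english_nostop (copy = english);

alter text search configuration english_nostop
  alter mapping for asciiword, asciihword, hword_asciipart, word, hword, hword_part
  with english_stem_nostop;

-- Compare the two configurations on the same input.
select to_tsvector('english', 'the cat is on the mat');        -- stop words dropped
select to_tsvector('english_nostop', 'the cat is on the mat'); -- stop words kept
```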
