Appendix A. FTS Complete Tutorial

This tutorial is about how to setup typical FTS application using PostgreSQL.

We create our configuration pg, which will be default for locale ru_RU.UTF-8. To be safe, we do this in transaction.

begin;
CREATE FULLTEXT CONFIGURATION public.pg  LOCALE 'ru_RU.UTF-8'  LIKE english WITH MAP AS DEFAULT;

We'll use postgresql specific dictionary using synonym template dictionary and store it under PG_ROOT/share/dicts_data directory. The dictionary looks like:

postgres    pg
pgsql       pg
postgresql  pg

CREATE FULLTEXT DICTIONARY pg_dict OPTION  'pg_dict.txt' LIKE synonym;

Register ispell dictionary en_ispell using ispell_template template.

CREATE FULLTEXT DICTIONARY en_ispell 
OPTION 'DictFile="english-utf8.dict",
        AffFile="english-utf8.aff",
        StopFile="english-utf8.stop"'
LIKE ispell_template;

Use the same stop-word list for snowball stemmeren_stem, which is available on default.

ALTER FULLTEXT DICTIONARY en_stem SET OPTION 'english-utf8.stop';

Modify mappings for Latin words for configuration 'pg'

ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword
                             WITH pg_dict,en_ispell,en_stem;

We won't index/search some tokens

DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;

Now, we could test our configuration.

select * from ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');


end;

We have a table pgweb, which contains 11239 documents from PostgreSQL web site. Only relevant columns are shown.

=# \d pgweb
           Table "public.pgweb"
  Column   |       Type        | Modifiers
-----------+-------------------+-----------
 tid       | integer           | not null
 path      | character varying | not null
 body      | character varying |
 title     | character varying |
 dlm       | integer           |

First we should take care about default FTS configuration - we want our public.pg to be default. To do so, we need to redefine search_path, since we already have predefined default full-text configuration (for ru_RU.UTF-8 locale) in pg_catalog.

=# \dF
pg_catalog | russian_utf8          | ru_RU.UTF-8     | Y
public     | pg                    | ru_RU.UTF-8     | Y

=# show tsearch_conf_name;
    tsearch_conf_name
-------------------------
 pg_catalog.russian_utf8

=# SET search_path=public, pg_catalog;

=# show tsearch_conf_name;
 tsearch_conf_name
-------------------
 public.pg

The very simple full-text search without ranking is already available here. Select top 10 fresh documents (dlm is last-modified date in seconds since 1970), which contains query create table.

=# select title from pgweb where textcat(title,body)  @@  
          plainto_tsquery('create table') order by dlm desc limit 10;

We can create index to speedup search.

=# create index pgweb_idx on pgweb using gin(textcat(title,body));

For clarity, we omitted here coalesce function to prevent unwanted effect of NULL concatenation.

To implement FTS with ranking support we need tsvector column to store preprocessed document, which is a concatenation of title and body. We assign different labels to them to preserve information about origin of every word.

=# alter table pgweb add column fts_index tsvector;
=# update pgweb set fts_index = 
    setweight( to_tsvector( coalesce (title,'')), 'A' ) || ' ' ||
    setweight( to_tsvector(coalesce (body,'')),'D');

Then we create GIN index to speedup search.

=# create index fts_idx on pgweb using gin(fts_index);

After vacuuming, we are ready to perform full-text search.

=# select rank_cd(fts_index, q)as rank, title from pgweb, 
   plainto_tsquery('create table') q  
   where q @@ fts_index order by rank desc limit 10;