2.10. Debugging

The ts_debug function makes it easy to test your full-text configuration.

ts_debug( [cfgname | oid ],document TEXT) RETURNS SETOF tsdebug

It displays information about every token in document, as produced by the parser and processed by the dictionaries defined in the configuration specified by cfgname or oid.
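
The square brackets in the synopsis suggest that the configuration argument can be omitted. A minimal sketch of both call forms follows; that the second form falls back to the current default configuration is an assumption, not something this section states.

-- explicit configuration name
select * from ts_debug('pg_catalog.english', 'The Brightest supernovaes');
-- configuration omitted (assumed to use the default configuration)
select * from ts_debug('The Brightest supernovaes');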

The tsdebug type is defined as:

CREATE TYPE tsdebug AS (
     "Alias" text,
     "Description" text,
     "Token" text,
     "Dicts list"   text[],
     "Lexized token" text
);

To demonstrate how ts_debug works, we first create a public.english configuration and an ispell dictionary for the English language. You may skip this step and experiment with the standard english configuration instead.

CREATE FULLTEXT CONFIGURATION public.english LIKE pg_catalog.english WITH MAP AS DEFAULT;
CREATE FULLTEXT DICTIONARY en_ispell
OPTION 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
        AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
        StopFile="/usr/local/share/dicts/english.stop"'
LIKE ispell_template;
ALTER FULLTEXT MAPPING ON public.english FOR lword WITH en_ispell,en_stem;
=# select * from ts_debug('public.english','The Brightest supernovaes');
 Alias |  Description  |    Token    |              Dicts list               |          Lexized token
-------+---------------+-------------+---------------------------------------+---------------------------------
 lword | Latin word    | The         | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {}
 blank | Space symbols |             |                                       |
 lword | Latin word    | Brightest   | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {bright}
 blank | Space symbols |             |                                       |
 lword | Latin word    | supernovaes | {public.en_ispell,pg_catalog.en_stem} | pg_catalog.en_stem: {supernova}
(5 rows)

In this example, the word 'Brightest' was recognized by the parser as a Latin word (alias lword) and passed through the dictionaries public.en_ispell and pg_catalog.en_stem. It was recognized by public.en_ispell, which reduced it to its base form bright. The word supernovaes is unknown to the public.en_ispell dictionary, so it was passed to the next dictionary, which did recognize it (in fact, pg_catalog.en_stem is a stemming dictionary and recognizes everything, which is why it is placed at the end of the dictionary stack).

The word The was recognized by the public.en_ispell dictionary as a stop-word (Section 1.3.6) and will not be indexed.

You can always explicitly specify which columns you want to see:

=# select "Alias", "Token", "Lexized token" 
from ts_debug('public.english','The Brightest supernovaes');
 Alias |    Token    |          Lexized token          
-------+-------------+---------------------------------
 lword | The         | public.en_ispell: {}
 blank |             | 
 lword | Brightest   | public.en_ispell: {bright}
 blank |             | 
 lword | supernovaes | pg_catalog.en_stem: {supernova}
(5 rows)
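
Because ts_debug returns an ordinary set of rows, its output can also be filtered with a WHERE clause. A minimal sketch, assuming the public.english configuration created above, that hides the whitespace tokens:

select "Alias", "Token", "Lexized token"
from ts_debug('public.english','The Brightest supernovaes')
where "Alias" <> 'blank';

Given the output shown above, this should leave only the three lword rows. The same approach can narrow the output to a single token type when debugging a long document.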