Working with UTF8 and KOI8

I got confused when I tried PostgreSQL (7.3.3) with Cyrillic text and UNICODE. After reading the documentation (charset.sgml) I realized what I had done wrong :) I thought that I could work with several databases in different encodings. Well, I can use createdb -E encoding, but the only thing that matters for text operations is the locale fixed at the 'initdb' stage!

The key phrase:

The nature of some locale categories is that their value has to be fixed for the lifetime of a database cluster. That is, once initdb has run, you cannot change them anymore. LC_COLLATE and LC_CTYPE are those categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns will become corrupt. PostgreSQL enforces this by recording the values of LC_COLLATE and LC_CTYPE that are seen by initdb. The server automatically adopts those two values when it is started.
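
To illustrate why the ordering rules have to stay fixed, here is a minimal Python sketch. It is not PostgreSQL code and does not use any locale; it only sorts three Russian words by their raw bytes in KOI8-R and in UTF-8, roughly what a plain byte-wise comparison would do. The word list and variable names are made up for this illustration.

```python
# Illustration only: byte-wise ordering of the same Cyrillic words
# under two encodings. The word list is a made-up example.
words = ["где", "юг", "вода"]

# Sort by the bytes each word has in KOI8-R and in UTF-8.
by_koi8 = sorted(words, key=lambda w: w.encode("koi8_r"))
by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))

print("KOI8-R byte order:", by_koi8)
print("UTF-8 byte order: ", by_utf8)
```

The two orders come out different (KOI8-R does not store letters in alphabetical order), so an index built under one ordering could not be searched correctly under the other.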
Test bed:
Linux, Slackware 8.1, libc 2.2.5, PostgreSQL 7.3.3, Perl 5.6.1

Steps to success:

Conclusion:

PostgreSQL works well with Cyrillic text and UTF8.

Bad news:

I discovered that the upper() and lower() functions don't work in my setup. Read http://fts.postgresql.org/db/msg.html?mid=1070198 for details.
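
One plausible explanation, which is only an assumption here and is not confirmed by anything above, is that case conversion done one byte at a time cannot handle multibyte UTF-8 characters. The Python sketch below mimics that behaviour with bytes.upper(), which converts ASCII letters only; it does not call PostgreSQL, and the sample word is made up.

```python
# Illustration only: byte-wise vs character-wise upper-casing.
text = "привет"  # made-up sample word

utf8_bytes = text.encode("utf-8")

# bytes.upper() touches only ASCII letters, so the multibyte
# Cyrillic sequences come back unchanged.
bytewise = utf8_bytes.upper()

# str.upper() works on whole characters and handles Cyrillic.
charwise = text.upper()

print("byte-wise result unchanged:", bytewise == utf8_bytes)
print("character-wise result:", charwise)
```

If the server's upper() behaves like the byte-wise variant, Cyrillic text stored as UTF8 would pass through unchanged, which would match the symptom described.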



Leave a message? oleg@sai.msu.su