Use of Text indexes

30 May 2010

What can we use these text indexes for? There are many options:

  1. Large blocks of text (including XML)
  2. Binary documents (Word, PDF and many more)
  3. Case-insensitive search
  4. Diacritic-insensitive search
  5. Fuzzy search
  6. Stemming search

These are the cases in which I choose a text index. There are many more possibilities, such as retrieving the position and score of a search hit.

The first two are, I think, the most common uses of a text index. The others are used less often, which is a pity, because options 3 and 4 are very simple to implement with Oracle Text.

The third option is the one for which many people choose a function-based index (they use upper or lower to get a case-insensitive search), but Oracle Text is case-insensitive by default.

The fourth option can be built into the text index, so that a search term written without diacritics immediately finds the word that contains the diacritic letter. This can be very useful for languages that use diacritics (German, French, Hungarian and many more).

Fuzzy search finds words that look similar to the search term, normally to cope with spelling errors. Fuzzy search is not available for all languages supported by Oracle Text.

Stemming search finds words that share the same linguistic root. For example, a search for cat also finds cats. Stemming is also not available for all languages supported by Oracle Text.
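To make the idea behind options 3, 4 and 5 a bit more concrete, here is a minimal conceptual sketch in Python. It is not how Oracle Text works internally and it uses no Oracle API; the function names (normalize, build_index, search, fuzzy_search) and the sample documents are made up for illustration. It only shows the principle: if every token is folded to lower case and stripped of diacritics when the index is built, and the search term gets the same treatment, then case and diacritics no longer matter. The fuzzy part relies on the similarity matching in Python's difflib, which is just one possible way to catch spelling errors.

```python
import unicodedata
from difflib import get_close_matches
from typing import Dict, Set


def normalize(token: str) -> str:
    """Fold case and strip diacritics, e.g. 'Müller' -> 'muller'."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    # Drop the combining marks that NFKD split off from the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every normalized token to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], term: str) -> Set[int]:
    """Exact lookup after normalizing the term the same way as the index."""
    return index.get(normalize(term), set())


def fuzzy_search(index: Dict[str, Set[int]], term: str,
                 cutoff: float = 0.8) -> Set[int]:
    """Lookup that also accepts indexed tokens similar to the term."""
    hits: Set[int] = set()
    similar = get_close_matches(normalize(term), list(index), n=5, cutoff=cutoff)
    for key in similar:
        hits |= index[key]
    return hits


if __name__ == "__main__":
    # Hypothetical sample data, only for demonstration.
    docs = {1: "Café Müller", 2: "cafe muller", 3: "Tea house"}
    index = build_index(docs)
    print(sorted(search(index, "CAFE")))           # [1, 2]
    print(sorted(search(index, "müller")))         # [1, 2]
    print(sorted(fuzzy_search(index, "Mulller")))  # [1, 2]
```

The point of the sketch is that the normalization happens once, at index time, so the query itself stays simple. That is the same reason a text index is attractive compared with wrapping every query in upper or lower.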

Later on I will give some examples of how you can do some of the above.