Fuzzy Searching

A fuzzy search can locate all occurrences of a word, together with all other words that are "close" in spelling to the original word. You specify the degree of closeness to the original word. ZyLAB's fuzzy search is optimized for OCR errors, misspellings, and spelling variations in names derived from a non-Roman script (such as Cyrillic, Arabic, Farsi, Hindi, Hebrew, Chinese, or Japanese).

A major advantage of ZyLAB's fuzzy algorithms is that ZyIMAGE fuzzy search is language- and application-independent and does not need to be trained. It maintains excellent precision, even at high fuzzy degrees, and performs almost as well on large datasets as on small ones. It also finds matching words when their first character differs from that of the query word.

Examples of Fuzzy Matching

Think of fuzzy search in terms of how similar one word is to another. To change one word into another, you can add, delete, or replace single characters. Each change of a single character counts as one degree. For example:

- To change "commuter" into "computer" requires one replacement: the second "m" with "p". One degree.
- To change "computw" into "computer" requires one replacement and one addition: replace "w" with "e" and add "r". Two degrees.
- To change "coinputer" into "computer" requires one replacement and one deletion: replace "i" with "m" and delete "n". Two degrees.
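
This way of counting degrees corresponds to the standard Levenshtein edit distance. The Python sketch below computes it for the three examples above. It illustrates the concept only: the function name `edit_distance` is hypothetical and not part of any ZyLAB product or API, and ZyLAB's actual matching algorithm is not documented here and may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Count the minimum number of single-character additions, deletions
    and replacements needed to turn `a` into `b` (Levenshtein distance)."""
    # prev[j] = distance between the prefix of `a` processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # add cb
                prev[j - 1] + (ca != cb),  # replace ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


for word in ("commuter", "computw", "coinputer"):
    print(word, edit_distance(word, "computer"))
```

The loop prints degrees of 1, 2, and 2, matching the three examples.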

The higher the degree, the greater the margin of error; the lower the degree, the less leeway is allowed in matching a search term with words in your files.

Degree of Fuzzy

By default, the degree of fuzzy ranges from 1 to 4. We recommend a degree of 2 for searching normal text; this tolerates the errors that broken and joined characters cause in scanned text. If you need to search for long words, set the degree to 3 or 4.

An additional constraint takes the length of the search term into account, to prevent the retrieval of too many irrelevant shorter words. This constraint limits the degree for a given word to the lesser of the fuzzy degree setting and 0.5 times the word's length. For example, if you set the fuzzy degree to 4 and the search term is six characters long, the actual degree of fuzzy is 0.5 × 6 = 3 rather than 4.
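
A minimal sketch of this length constraint follows. The text does not say how half of an odd length is rounded, so this sketch assumes it is rounded down (a degree counts whole character changes). The helper names `effective_degree` and `fuzzy_matches` are hypothetical, and Levenshtein distance stands in for ZyLAB's own matching.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character additions,
    deletions and replacements turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def effective_degree(setting: int, term: str) -> int:
    """Lesser of the fuzzy degree setting and 0.5 x the term's length.

    Assumption: half of an odd length is rounded down.
    """
    return min(setting, len(term) // 2)


def fuzzy_matches(term: str, word: str, setting: int) -> bool:
    """True if `word` lies within the effective fuzzy degree of `term`."""
    return edit_distance(word, term) <= effective_degree(setting, term)


print(effective_degree(4, "report"))   # 6 characters, so capped at 3
print(fuzzy_matches("cat", "car", 4))  # cap is 1; one change, so True
print(fuzzy_matches("cat", "dog", 4))  # three changes exceed the cap, so False
```

With a setting of 4, a three-letter term such as "cat" is still limited to one change. This is what keeps short query words from matching large numbers of unrelated short words.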

Combined with progressive search, you can first run a search with a fuzzy degree of 2 and then narrow that result set with an additional, non-fuzzy part of the query. This helps you overcome the "too much noise" problem, where a fuzzy search retrieves too many irrelevant hits.
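
The two-stage approach can be sketched on a toy in-memory collection. The file names, their contents, and the helpers `fuzzy_hits` and `exact_hits` are all hypothetical. Whitespace tokenization, Levenshtein distance, and rounding half the term length down are assumptions. The sketch does not reflect how ZyFIND indexes or searches documents.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between `a` and `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_hits(files: dict[str, str], term: str, degree: int) -> set[str]:
    """Stage 1: files containing a word within the fuzzy degree of `term`,
    capped at half the term's length (rounded down, an assumption)."""
    limit = min(degree, len(term) // 2)
    return {name for name, text in files.items()
            if any(edit_distance(w, term) <= limit for w in text.lower().split())}


def exact_hits(files: dict[str, str], term: str, within: set[str]) -> set[str]:
    """Stage 2: keep only files from `within` that contain `term` exactly."""
    return {name for name in within if term in files[name].lower().split()}


# Hypothetical sample collection
files = {
    "a.txt": "Quarterly computer invoice",
    "b.txt": "Coinputer repair invoice",  # OCR error for "computer"
    "c.txt": "Daily commuter schedule",   # fuzzy noise
    "d.txt": "Computer manual",
}

stage1 = fuzzy_hits(files, "computer", degree=2)
stage2 = exact_hits(files, "invoice", within=stage1)
print(sorted(stage1))  # ['a.txt', 'b.txt', 'c.txt', 'd.txt']
print(sorted(stage2))  # ['a.txt', 'b.txt']
```

The fuzzy first stage still catches the OCR error in b.txt. The exact second stage removes the "commuter" noise in c.txt.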

Progressive Search

Progressive Search is one of the ways ZyFIND lets you run a "search within a search." The Progressive Search checkbox is located to the left of the Search Statement text box. Select it to restrict the next search to the set of files retrieved by the immediately preceding search. A progressive search can be carried out only once per query. In all other respects, a progressive search behaves like an unrestricted search.

Although Progressive Search remains enabled until you turn it off, a search with no results resets this mechanism, so that the next search again applies to the entire index.
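
The behavior described above can be modeled as a small state machine. The Python sketch below is a toy model only: the class `ProgressiveSession`, its attributes, and the sample index are hypothetical and are not a ZyFIND API. It assumes each file is represented by a set of its words.

```python
from typing import Optional


class ProgressiveSession:
    """Toy model of Progressive Search (hypothetical, not a ZyFIND API)."""

    def __init__(self, index: dict[str, set[str]]):
        self.index = index                         # file name -> its words
        self.progressive = False                   # the checkbox state
        self.previous: Optional[set[str]] = None   # last non-empty result set

    def search(self, term: str) -> set[str]:
        # Restrict to the immediately preceding results only when the
        # checkbox is on and a previous result set exists.
        if self.progressive and self.previous is not None:
            scope = self.previous
        else:
            scope = set(self.index)
        hits = {name for name in scope if term in self.index[name]}
        # A search with no results resets the mechanism, so the next
        # search applies to the entire index again.
        self.previous = hits if hits else None
        return hits


index = {
    "a.txt": {"computer", "invoice"},
    "b.txt": {"computer", "manual"},
    "c.txt": {"printer", "invoice"},
}
session = ProgressiveSession(index)
session.progressive = True
r1 = session.search("computer")  # whole index: a.txt, b.txt
r2 = session.search("invoice")   # within r1: a.txt
r3 = session.search("scanner")   # within r2: no results, so reset
r4 = session.search("invoice")   # whole index again: a.txt, c.txt
```

Note that the empty search for "scanner" does not switch Progressive Search off. It only clears the stored result set, so the following search covers the full index.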