Search using fuzzy methods


fuzzy()

The fuzzy() predicate uses optimal string alignment distance calculations to match properties designated as StrFields. Variations in the letters used in words, such as misspellings, are the focus of this predicate. The edit distance specified is the maximum number of single-character edits allowed between the search string and a matching value. Optimal string alignment distance counts an insertion, deletion, or substitution of one letter, or a transposition of two adjacent letters, as one edit; the examples here all use transpositions.

Find the exact person name of James Beard:

g.V().hasLabel('person').has('name', fuzzy('James Beard', 0)).values('name')

The 0 designates that the result must be an exact match. This query results in:

==>James BEARD

Changing the last value in a fuzzy() predicate will find misspellings:

g.V().hasLabel('person').has('name', fuzzy('James Beard', 1)).values('name')

The 1 designates that the result matches with an edit distance of at most one. This query results in:

==>James BEARD
==>Jmaes BEARD

If a person vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 1 finds this misspelling because of the single transposition of the letters a and m.

Note that searching for a misspelling will find the records with the correct spelling, as well as the misspelled name:

g.V().hasLabel('person').has('name', fuzzy('Jmase Beard', 2)).values('name')

The 2 designates that the result matches with an edit distance of at most two. This query results in:

==>James BEARD
==>Jmaes BEARD
==>Jmase BEARD

If person vertices exist with the misspellings Jmaes Beard and Jmase Beard, the query shown will find all three vertices. The search string Jmase Beard matches itself with no edits. The value of 2 also finds Jmaes Beard, which is a single transposition away (the letters s and e), and the correct spelling James Beard, which is two transpositions away (the letters m and a, then s and e).
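The edit distances quoted in these examples can be checked outside the database. The following is a minimal Python sketch of optimal string alignment distance, not the search engine's actual implementation. The helper names osa_distance and fuzzy_match are hypothetical, and lowercasing both strings before comparing is an assumption suggested by the sample output (James BEARD matching 'James Beard'), not documented behavior.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute,
    or transpose two adjacent characters, each costing one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ma" <-> "am"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_match(stored: str, query: str, max_edits: int) -> bool:
    # Assumption: comparison ignores case (hypothetical helper).
    return osa_distance(stored.lower(), query.lower()) <= max_edits


names = ["James BEARD", "Jmaes BEARD", "Jmase BEARD"]
for max_edits in (0, 1, 2):
    hits = [n for n in names if fuzzy_match(n, "Jmase Beard", max_edits)]
    print(max_edits, hits)
# 0 ['Jmase BEARD']
# 1 ['Jmaes BEARD', 'Jmase BEARD']
# 2 ['James BEARD', 'Jmaes BEARD', 'Jmase BEARD']
```

Under these assumptions, raising the edit distance from 0 to 2 widens the match set one name at a time, which mirrors the query results shown above for an edit distance of 2.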

Specifying an edit distance of 3 or greater matches too many terms for useful results. The resulting search index will be too large to efficiently filter queries.

tokenFuzzy()

The tokenFuzzy() predicate is similar to fuzzy(), but searches for variation across individual tokens in analyzed textual data (TextFields).

Find the recipe name that includes the word Wild while searching for the word with a one-letter misspelling:

g.V().hasLabel('recipe').has('name', tokenFuzzy('Wlid',1)).values('name')

The 1 designates that a one-letter misspelling (one transposition) is acceptable. This query results in:

==>Wild Mushroom Stroganoff
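The difference between matching a whole value and matching individual tokens can be illustrated with a short Python sketch. This is only an approximation under stated assumptions: the real analyzer's tokenization is not described here, so the sketch assumes lowercasing and whitespace splitting. The helper names osa_distance and token_fuzzy_match, and the second recipe name, are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (one edit per insertion,
    deletion, substitution, or adjacent transposition)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def token_fuzzy_match(text: str, term: str, max_edits: int) -> bool:
    # Assumption: the analyzer lowercases and splits on whitespace.
    tokens = text.lower().split()
    return any(osa_distance(tok, term.lower()) <= max_edits for tok in tokens)


# "Beef Bourguignon" is a made-up second value for contrast.
recipes = ["Wild Mushroom Stroganoff", "Beef Bourguignon"]
print([r for r in recipes if token_fuzzy_match(r, "Wlid", 1)])
# ['Wild Mushroom Stroganoff']

# Compared against the whole value instead of each token,
# the same term is far more than one edit away.
print(osa_distance("wild mushroom stroganoff", "wlid") <= 1)
# False
```

In this sketch the term Wlid matches because one token, wild, is a single transposition away, even though the full recipe name is many edits from the search term.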

© 2024 DataStax | Privacy policy | Terms of use
