Synonyms for query expansion in a search index - Azure Cognitive Search (2023)

Within a search service, synonym maps are a global resource that associates equivalent terms, expanding the scope of a query without the user having to provide the term. For example, assuming that "dog", "canine", and "puppy" are mapped synonyms, a query on "canine" will match a document containing "dog".

Create synonyms

A synonym map is an asset that can be created once and used by many indexes. The service tier determines how many synonym maps you can create, ranging from three synonym maps for the Free and Basic tiers up to 20 for the Standard tiers.

You can create multiple synonym maps for different languages, such as English and French versions, or lexicons if your content includes technical or obscure terminology. Although you can create multiple synonym maps in your search service, within an index, a field definition can only have one synonym map designation.

A synonym map consists of a name, a format, and rules that function as synonym map entries. The only supported format is solr, and the solr format determines how rules are constructed.

POST /synonymmaps?api-version=2020-06-30
{
  "name": "geo-synonyms",
  "format": "solr",
  "synonyms": "US, United States, United States of America\nWashington, Wash., WA => WA\n"
}
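The request above can also be assembled from a script. The following is a minimal sketch using only the Python standard library; the service URL and the key are placeholders, and the helper name build_synonym_map_request is hypothetical rather than part of any SDK. It builds the request without sending it.

```python
import json
import urllib.request


def build_synonym_map_request(endpoint, api_key, name, rules):
    """Build (but do not send) a POST request that creates a synonym map."""
    body = {
        "name": name,
        "format": "solr",
        # Every rule is terminated by a newline character.
        "synonyms": "".join(rule + "\n" for rule in rules),
    }
    return urllib.request.Request(
        url=f"{endpoint}/synonymmaps?api-version=2020-06-30",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder endpoint and key: replace both before sending.
request = build_synonym_map_request(
    "https://example.search.windows.net",
    "<admin-api-key>",
    "geo-synonyms",
    ["US, United States, United States of America", "Washington, WA => WA"],
)
```

Once the endpoint and key are real, the request could be sent with urllib.request.urlopen(request).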

To create a synonym map, do it programmatically (the portal does not support synonym map definitions):

  • Create synonym map (REST API). This reference is the most descriptive.
  • SynonymMap class (.NET) and Add synonyms using C#
  • SynonymMap class (Python)
  • SynonymMap interface (JavaScript)
  • SynonymMap class (Java)

Define rules

Mapping rules adhere to the open-source synonym filter specification of Apache Solr, described in this document: SynonymFilter. The solr format supports two kinds of rules:

  • equivalence (where the terms are equal substitutes in the query)

  • explicit mappings (where terms are mapped to one explicit term before querying)

Each rule must be delimited by the newline character (\n). You can define up to 5,000 rules per synonym map in a free service and 20,000 rules per map in other tiers. Each rule can have up to 20 expansions (or items in a rule). For more information, see Synonym limits.
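The limits above can be checked before a synonym map is uploaded. This is a rough sketch, not a service feature: check_rules is a hypothetical helper, and it assumes that the "expansions" in a rule are simply its comma-separated terms, counted on both sides of an optional =>.

```python
import re

MAX_TERMS_PER_RULE = 20  # "up to 20 expansions" per rule, as stated above


def check_rules(rules, max_rules=20000):
    """Return a list of human-readable problems found in the rule strings."""
    problems = []
    if len(rules) > max_rules:
        problems.append(f"{len(rules)} rules exceed the limit of {max_rules}")
    for number, rule in enumerate(rules, start=1):
        if "\n" in rule:
            problems.append(f"rule {number} contains a newline")
        # Split on '=>' first, then on commas that are not backslash-escaped.
        terms = [
            term
            for side in rule.split("=>")
            for term in re.split(r"(?<!\\),", side)
            if term.strip()
        ]
        if len(terms) > MAX_TERMS_PER_RULE:
            problems.append(f"rule {number} has {len(terms)} terms")
    return problems
```

Pass max_rules=5000 when checking rules intended for a free service.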

Query parsers will lowercase any uppercase or mixed-case terms, but if you want to preserve special characters in the string, such as a comma or hyphen, add the appropriate escape characters when creating the synonym map.

Equivalency rules

Rules for equivalent terms are delimited by commas within the rule itself. In the first example, a query on USA will expand to USA OR "United States" OR "United States of America". Note that if you want to match on a phrase, the query itself must be a quoted phrase query.

In the equivalence case, a query for dog will expand the query to also include puppy and canine.

{"format": "solr","synonyms": " USA, United States, United States of America\n dog, puppy, canine\n coffee, latte, coffee cup, java\n"}

Explicit mapping

Rules for an explicit mapping are denoted by an arrow, =>. When specified, a sequence of terms in a search query that matches the left-hand side of => is replaced with the alternatives on the right-hand side at query time.

In the explicit case, a query for Washington, Wash., or WA will be rewritten as WA, and the query engine will only look for matches on the term WA. Explicit mapping applies only in the direction specified, so in this case it does not rewrite the query WA to Washington.

{"format": "solr","synonyms": " Washington, Wash., WA => WA\n California, CA => CA\n"}
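The difference between the two rule types can be illustrated with a small simulation. This is a toy model for illustration only, not the search service's implementation: parse_rules and expand are hypothetical helpers, terms are lowercased to mimic the query parser, and escaped commas and multi-word matching are ignored.

```python
def parse_rules(synonyms):
    """Parse newline-delimited rules into (sources, targets) pairs."""
    parsed = []
    for line in synonyms.split("\n"):
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by right-hand terms.
            left, right = line.split("=>", 1)
            sources = [term.strip().lower() for term in left.split(",")]
            targets = [term.strip().lower() for term in right.split(",")]
        else:
            # Equivalence: every term expands to the whole group.
            sources = targets = [term.strip().lower() for term in line.split(",")]
        parsed.append((sources, targets))
    return parsed


def expand(term, rules):
    """Return the terms a single query term would be rewritten to."""
    term = term.lower()
    for sources, targets in rules:
        if term in sources:
            return targets
    return [term]


rules = parse_rules("dog, puppy, canine\nWashington, WA => WA\n")
print(" OR ".join(expand("dog", rules)))         # dog OR puppy OR canine
print(" OR ".join(expand("Washington", rules)))  # wa
print(" OR ".join(expand("WA", rules)))          # wa
```

The last call shows the one-way nature of explicit mapping: WA is never rewritten to Washington.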

Escaping special characters

In full-text search, synonyms are analyzed during query processing just like any other query term, which means that the rules about reserved and special characters apply to the terms in your synonym map. The list of characters that require escaping differs between the simple syntax and the full syntax:

  • simple syntax + | " ( ) ' \
  • full syntax + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

Note that if you need to preserve characters that the default analyzer would otherwise discard during indexing, you should substitute an analyzer that preserves them. Options include the Microsoft natural language analyzers, which preserve hyphenated words, or a custom analyzer for more complex patterns. For more information, see Partial terms, patterns, and special characters.

The following example shows how to escape a character with a backslash:

{"format": "solr","synonyms": "WA\, USA, WA, Washington\n"}

Because the backslash is itself a special character in other languages such as JSON and C#, you will probably need to double-escape it. For example, the JSON sent to the REST API for the synonym map above would look like this:

{"format":"solr","synonyms": "WA\\, USA, WA, Washington"}
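A JSON serializer normally performs this double escaping for you. The short sketch below uses Python's standard json module to show that a rule containing a single backslash is written with two backslashes in the JSON text; the variable names are illustrative only.

```python
import json

# Raw string: the rule contains exactly one backslash before the comma.
rule = r"WA\, USA, WA, Washington"

payload = json.dumps({"format": "solr", "synonyms": rule})
print(payload)
# {"format": "solr", "synonyms": "WA\\, USA, WA, Washington"}

# Parsing the JSON text gives back the single-backslash rule.
assert json.loads(payload)["synonyms"] == rule
```

Escaping by hand is only needed when the JSON body is typed out literally, for example in a REST client.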

Upload and manage synonym maps

You can create or update a synonym map without disrupting query and indexing workloads. A synonym map is a standalone object (like indexes or data sources), and as long as no field is using it, updates will not cause indexing or queries to fail. However, once a synonym map has been added to a field definition, deleting that synonym map causes any query that includes the affected fields to fail with a 404 error.

Creating, updating, and deleting a synonym map is always a whole-document operation, which means that you cannot update or delete parts of the synonym map incrementally. Updating even a single rule requires reloading the entire map.

Assign synonyms to fields

After uploading a synonym map, you can enable synonyms on fields of type Edm.String or Collection(Edm.String), provided the fields have "searchable": true. As noted, a field definition can use only one synonym map.

POST /indexes?api-version=2020-06-30
{
  "name": "hotels-sample-index",
  "fields": [
    {
      "name": "description",
      "type": "Edm.String",
      "searchable": true,
      "synonymMaps": [ "en-synonyms" ]
    },
    {
      "name": "description_fr",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "fr.microsoft",
      "synonymMaps": [ "fr-synonyms" ]
    }
  ]
}
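The field requirements above can be enforced when an index definition is generated in code. The helper below is a hypothetical sketch, not an SDK function: it works on plain dictionaries shaped like the JSON field definitions in this article and always assigns exactly one synonym map.

```python
ALLOWED_TYPES = {"Edm.String", "Collection(Edm.String)"}


def with_synonym_map(field, synonym_map_name):
    """Return a copy of a field definition with one synonym map assigned."""
    if field.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"field {field.get('name')!r} is not a string field")
    if not field.get("searchable"):
        raise ValueError(f"field {field.get('name')!r} is not searchable")
    updated = dict(field)
    # A field definition can reference only one synonym map.
    updated["synonymMaps"] = [synonym_map_name]
    return updated


description = with_synonym_map(
    {"name": "description", "type": "Edm.String", "searchable": True},
    "en-synonyms",
)
```

Returning a copy keeps the original definition unchanged, which makes it easier to compare the index before and after synonyms are enabled.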

Query on equivalent or mapped fields

Adding synonyms does not impose new requirements on query construction. You can query for terms and phrases just as you did before adding synonyms. The only difference is that if a query term exists in the synonym map, the query engine expands or rewrites the term or phrase based on the rule.

How synonyms are used during query execution

Synonyms are a query expansion technique that supplements the contents of an index with equivalent terms, but only for fields that have a synonym assignment. If a field-scoped query excludes a synonym-enabled field, you will not see matches from the synonym map.

For synonym-enabled fields, synonyms are subject to the same text analysis as the associated field. For example, if a field is analyzed using the standard Lucene analyzer, the synonym terms will also be subject to the standard Lucene analyzer at query time. If you want to preserve punctuation, such as periods or hyphens, in the synonym term, apply a content-preserving analyzer to the field.

Internally, the synonyms feature rewrites the original query with synonyms using the OR operator. For this reason, hit highlighting and scoring profiles treat the original term and its synonyms as equivalent.

Synonyms only apply to free-form text queries and do not support filters, facets, autocomplete, or suggestions. Autocomplete and suggestions are based solely on the original term; synonym matches do not appear in the response.

Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regular expression terms are not expanded.

If you need a single query that applies synonym expansion together with wildcard, regular expression, or fuzzy search, you can combine the queries using the OR syntax. For example, to combine synonyms with wildcards in the simple query syntax, the term would be <query> | <query>*.

If you have an existing index in a development (non-production) environment, experiment with a small dictionary to see how adding synonyms changes the search experience, including the impact on scoring profiles, hit highlighting, and suggestions.
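The combined term can be produced with a small helper. This is a minimal sketch with a hypothetical function name: it handles a single bare term and does not escape reserved characters.

```python
def synonyms_or_prefix(term):
    """Combine a plain term (synonym-expanded) with its prefix wildcard form."""
    return f"{term} | {term}*"


# Body for a search request that uses the simple query syntax.
search_body = {"search": synonyms_or_prefix("dog"), "queryType": "simple"}
print(search_body["search"])  # dog | dog*
```

The left side of the | operator is eligible for synonym expansion, while the right side is a prefix query and is matched as written.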

Next steps

Create a synonym map (REST API)

Article information

Author: Duncan Muller

Last Updated: 12/05/2022