Within a search service, synonym maps are a global resource that associates equivalent terms, expanding the scope of a query without the user having to provide the term. For example, assuming that "dog", "canine", and "puppy" are mapped synonyms, a query on "canine" will match a document containing "dog".
Create synonyms
A synonym map is an asset that can be created once and used by many indexes. The service tier determines how many synonym maps you can create, from three synonym maps for the free and basic tiers, up to 20 for the standard tiers.
You can create multiple synonym maps for different languages, such as English and French versions, or lexicons if your content includes technical or obscure terminology. Although you can create multiple synonym maps in your search service, within an index, a field definition can only have one synonym map designation.
A synonym map consists of a name, a format, and rules that function as synonym map entries. The only supported format is solr, and the solr format determines how rules are constructed.
POST /synonymmaps?api-version=2020-06-30
{
  "name": "geo-synonyms",
  "format": "solr",
  "synonyms": "US, United States, United States of America\nWashington, Wash., WA => WA\n"
}
Create a synonym map programmatically (the portal does not support synonym map definitions):
- Create synonym map (REST API). This reference is the most descriptive.
- SynonymMap class (.NET) and Add synonyms using C#
- SynonymMap class (Python)
- SynonymMap interface (JavaScript)
- SynonymMap class (Java)
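As a rough illustration of the programmatic route, the following Python sketch assembles the JSON body for a create request like the one shown above. The helper name `build_synonym_map` is hypothetical, and actually sending the request (service endpoint, authentication headers) is deliberately left out; only the payload shape is taken from the REST example.

```python
import json

def build_synonym_map(name, rules):
    """Return the request body for a synonym map as a dict.

    `rules` is a list of rule strings; each rule ends with a newline
    in the final "synonyms" value.
    """
    return {
        "name": name,
        "format": "solr",  # the only supported format
        "synonyms": "\n".join(rules) + "\n",
    }

body = build_synonym_map(
    "geo-synonyms",
    [
        "USA, United States, United States of America",
        "Washington, WA => WA",
    ],
)

# JSON text that would be POSTed to /synonymmaps?api-version=2020-06-30
payload = json.dumps(body)
```

Keeping rules in a list and joining them at the last moment avoids hand-editing a long string full of `\n` sequences.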
Establish rules
The mapping rules adhere to the open-source synonym filter specification of Apache Solr, which is described in this document: synonym filter. The solr format supports two types of rules:
- equivalence (where the terms are equal substitutes in the query)
- explicit mappings (where terms are mapped to one explicit term before querying)
Each rule must be delimited by the newline character (\n). You can define up to 5,000 rules per synonym map in a free service and 20,000 rules per map in other tiers. Each rule can have up to 20 expansions (or elements in a rule). For more information, see Synonym limits.
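The limits above can be checked before uploading. The sketch below is only an approximation: the function name `check_limits` is hypothetical, and counting every term on both sides of `=>` as an "expansion" is an assumption about how the service counts, not something stated here.

```python
import re

MAX_RULES_FREE = 5000    # rules per map, free service
MAX_RULES_OTHER = 20000  # rules per map, other tiers
MAX_EXPANSIONS = 20      # terms per rule

def check_limits(synonyms, free_tier=False):
    """Return a list of limit problems found in a newline-delimited rule string."""
    rules = [r for r in synonyms.split("\n") if r.strip()]
    problems = []
    max_rules = MAX_RULES_FREE if free_tier else MAX_RULES_OTHER
    if len(rules) > max_rules:
        problems.append(f"{len(rules)} rules exceeds the limit of {max_rules}")
    for i, rule in enumerate(rules, start=1):
        # Split on commas that are not escaped with a backslash.
        terms = [
            t
            for side in rule.split("=>")
            for t in re.split(r"(?<!\\),", side)
            if t.strip()
        ]
        if len(terms) > MAX_EXPANSIONS:
            problems.append(f"rule {i} has {len(terms)} terms (limit {MAX_EXPANSIONS})")
    return problems
```

An empty list means no limit was exceeded under these assumptions; the service remains the final authority.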
Query parsers will lowercase any uppercase or mixed-case terms, but if you want to preserve special characters in the string, such as a comma or hyphen, add the appropriate escape characters when creating the synonym map.
Equivalence rules
Rules for equivalent terms are delimited by commas within the same rule. In the first example, a query on USA will expand to USA OR "United States" OR "United States of America". Note that if you want to match a phrase, the query itself must be a quoted phrase query.
In the equivalence case, a query for dog will expand the query to also include puppy and canine.
{
  "format": "solr",
  "synonyms": "USA, United States, United States of America\ndog, puppy, canine\ncoffee, latte, coffee cup, java\n"
}
Explicit mapping
Rules for an explicit mapping are denoted by an arrow =>. When specified, a term sequence in a search query that matches the left-hand side of => is replaced by the alternatives on the right-hand side at query time.
In the explicit case, a query for Washington, Wash., or WA will be rewritten as WA, and the query engine will only search for matches on the term WA. Explicit mapping only applies in the specified direction and does not rewrite the query WA to Washington in this case.
{
  "format": "solr",
  "synonyms": "Washington, WA => WA\nCalifornia, CA => CA\n"
}
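To make the difference between the two rule types concrete, here is a toy Python model of how a single query term might be expanded or rewritten. It is not the service's implementation: `parse_rules` and `expand` are hypothetical names, escaped commas are ignored, and only single-term lookups are modeled.

```python
def parse_rules(synonyms):
    """Split a solr-format string into (equivalence groups, explicit mappings)."""
    groups, mappings = [], {}
    for line in synonyms.split("\n"):
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = [t.strip().lower() for t in right.split(",")]
            for term in left.split(","):
                mappings[term.strip().lower()] = targets
        else:
            groups.append([t.strip().lower() for t in line.split(",")])
    return groups, mappings

def expand(term, groups, mappings):
    """Return the terms a single query term would be searched as."""
    term = term.lower()
    if term in mappings:      # explicit mapping: rewrite, left to right only
        return mappings[term]
    for group in groups:      # equivalence: every term in the group is ORed
        if term in group:
            return group
    return [term]             # no rule applies

groups, mappings = parse_rules("dog, puppy, canine\nWashington, WA => WA\n")
expanded_dog = expand("dog", groups, mappings)
rewritten_state = expand("Washington", groups, mappings)
```

In this model `expand("WA", ...)` stays on `wa` and never produces `washington`, which mirrors the one-directional nature of explicit mappings.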
Escaping special characters
In full-text search, synonyms are parsed during query processing like any other query term, which means that the rules about special and reserved characters apply to the terms in your synonym map. The list of characters that require escaping differs between the simple syntax and the full syntax:
- simple syntax: + | " ( ) ' \
- full syntax: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
Note that if you need to preserve characters that the default analyzer would discard during indexing, you should substitute an analyzer that preserves them. Some options include the Microsoft natural language analyzers, which preserve hyphenated words, or a custom analyzer for more complex patterns. For more information, see Partial terms, patterns and special characters.
The following example shows how to escape a character with a backslash:
{"format": "solr", "synonyms": "WA\, USA, WA, Washington\n"}
Since the backslash is a special character in other languages like JSON and C#, you may need to use a double escape. For example, the JSON sent to the REST API for the synonym map above would look like this:
{"format":"solr","synonyms": "WA\\, USA, WA, Washington"}
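The double escape is easy to get wrong by hand. As a small sketch, Python's standard `json` module can produce it: the string held in memory contains one backslash, and `json.dumps` doubles it when serializing. The variable names are illustrative only.

```python
import json

# The rule as the service should receive it: backslash + comma keeps
# "WA, USA" together as one term. In a Python literal, "\\" is ONE backslash.
rule = "WA\\, USA, WA, Washington"

# Serializing to JSON escapes the backslash again, giving "WA\\, USA, ..." on the wire.
body = json.dumps({"format": "solr", "synonyms": rule})
```

Building the payload with a JSON serializer rather than string concatenation means only the language-level escape has to be written manually.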
Upload and manage synonym maps
You can create or update a synonym map without disrupting query and indexing workloads. A synonym map is a standalone object (like an index or a data source), and as long as no field uses it, updates will not cause indexing or query errors. However, once a synonym map has been added to a field definition, deleting that synonym map causes any query that includes the affected fields to fail with a 404 error.
Creating, updating, and deleting a synonym map is always a whole-document operation, which means that you cannot update or delete parts of the synonym map incrementally. Updating even a single rule requires a reload.
Assign synonyms to fields
After loading a synonym map, you can enable synonyms on fields of type Edm.String or Collection(Edm.String) that have "searchable": true. As noted, a field definition can use only one synonym map.
POST /indexes?api-version=2020-06-30
{
  "name": "hotels-sample-index",
  "fields": [
    {
      "name": "description",
      "type": "Edm.String",
      "searchable": true,
      "synonymMaps": [ "en-synonyms" ]
    },
    {
      "name": "description_fr",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "fr.microsoft",
      "synonymMaps": [ "fr-synonyms" ]
    }
  ]
}
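The field requirements described above (string type, searchable, a single synonym map) can be checked locally before submitting an index definition. This is a minimal sketch with a hypothetical helper, `synonym_field_problems`; it treats a missing `searchable` attribute as false, which may not match the service's own defaults.

```python
ALLOWED_TYPES = {"Edm.String", "Collection(Edm.String)"}

def synonym_field_problems(field):
    """Return reasons why a field's synonymMaps assignment looks invalid."""
    maps = field.get("synonymMaps", [])
    if not maps:
        return []  # nothing assigned, nothing to check
    problems = []
    if field.get("type") not in ALLOWED_TYPES:
        problems.append("type must be Edm.String or Collection(Edm.String)")
    if field.get("searchable") is not True:
        problems.append("field must be searchable")
    if len(maps) > 1:
        problems.append("only one synonym map per field")
    return problems

description = {
    "name": "description",
    "type": "Edm.String",
    "searchable": True,
    "synonymMaps": ["en-synonyms"],
}
description_problems = synonym_field_problems(description)
```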
Query on equivalent or mapped fields
Adding synonyms does not impose new requirements on query construction. You can query for terms and phrases just as you did before adding synonyms. The only difference is that if a query term exists in the synonym map, the query engine expands or rewrites the term or phrase based on the rule.
How synonyms are used during query execution
Synonyms are a query expansion technique that supplements the contents of an index with equivalent terms, but only for fields that have a synonym assignment. If a field-scoped query excludes a synonym-enabled field, you will not see matches from the synonym map.
For synonym-enabled fields, the synonyms are subject to the same text analysis as the associated field. For example, if a field is analyzed with the standard Lucene analyzer, the synonym terms will also be subject to the standard Lucene analyzer at query time. If you want to preserve punctuation, such as periods or hyphens, in the synonym term, apply a content-preserving analyzer to the field.
Internally, the synonyms feature rewrites the original query with synonyms using the OR operator. For this reason, hit highlighting and scoring profiles treat the original term and its synonyms as equivalent.
Synonyms only apply to free-form text queries and do not support filters, facets, autocomplete, or suggestions. Autocomplete and suggestions are based solely on the original term; synonym matches do not appear in the response.
Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regular expression terms are not expanded.
If you need a single query that applies synonym expansion together with wildcard, regular expression, or fuzzy search, you can combine the queries using the OR syntax. For example, to combine synonyms with wildcards in the simple query syntax, the term would be <query> | <query>*.
If you have an existing index in a development (non-production) environment, experiment with a small dictionary to see how adding synonyms changes the search experience, including the impact on scoring profiles, hit highlighting, and suggestions.
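The OR combination can be generated rather than typed. The helper below is a hypothetical sketch that builds the simple-syntax search string for one term; the surrounding request body, including the `queryType` value, is shown only as an assumed example of where the string would go.

```python
def with_wildcard(term):
    """Combine a plain term (eligible for synonym expansion) with its prefix form."""
    return f"{term} | {term}*"

search_text = with_wildcard("dog")

# Assumed shape of a search request body using the simple query syntax.
request_body = {"search": search_text, "queryType": "simple"}
```

The plain term on the left remains eligible for synonym expansion, while the wildcard term on the right is matched as a prefix without expansion.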
Next steps
Create a synonym map (REST API)