An extension of #76 - I've just realised that wildcard field names are going to be a bit problematic. When parsing text from a query, the QueryTokenizer needs to know which index tokenizer to use when processing the search text.
Consider this index:

    var index = new FullTextIndexBuilder<int>()
        .WithDefaultTokenization(t => t.WithStemming()) // Stemming on all fields by default
        .WithObjectTokenization<Customer>(o => o
            .WithKey(c => c.Id)
            .WithField(
                "Name",
                c => c.Name,
                tokenizationOptions: fo => fo.WithTokenization(t => t)) // No stemming on the Name field
            .WithDynamicFields("Tags", c => c.TagDictionary, "Tag_")
        )
        .Build();

The default index tokenizer uses stemming, whereas the Name field has its own index tokenizer configured without stemming. If we allowed wildcard field names like [Na*]=Something, it would no longer be clear which tokenizer to use for the search text Something (especially if we ended up with another field starting with Na).
So I think as things stand, the options are:
- Support wildcards, but duplicate the search parts for each matched field, e.g. [Tag_*]=foo would be equivalent to searching for [Tag_One]=foo | [Tag_Two]=foo | [Tag_Three]=foo
- Support searching across all fields emitted by a named dynamic field provider using some other syntax, e.g. [?Tags]=foo (syntax TBD). A single dynamic field provider will only ever have one index tokenizer associated with it, so this should work.
The first option would have a performance impact on the query, and we're probably going to need to build in some search optimisations to cache the search results emitted by a query to save the same search predicate being performed multiple times.
The second option is a bit more limited, but at least solves the issue across a specific dynamic field source.
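
To make the first option concrete, here is a rough sketch of the expansion step, working purely on query text. `WildcardFieldExpander` and `Expand` are hypothetical names that don't exist in the library. The sketch also assumes the set of known field names is available at query parse time, that only a trailing `*` is supported, and that field names match case-sensitively. Once each field is named explicitly, the query parser could pick the right index tokenizer per field in the usual way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the library's API.
public static class WildcardFieldExpander
{
    // Rewrites "[Tag_*]=foo" style input into one explicit field search per
    // matching field, joined with the OR operator.
    public static string Expand(
        string fieldPattern,
        string searchText,
        IEnumerable<string> knownFields)
    {
        // No trailing wildcard: nothing to expand.
        if (!fieldPattern.EndsWith("*", StringComparison.Ordinal))
        {
            return $"[{fieldPattern}]={searchText}";
        }

        // Strip the trailing '*' to get the prefix to match against.
        var prefix = fieldPattern.Substring(0, fieldPattern.Length - 1);

        // Assumption: field names are matched case-sensitively.
        var parts = knownFields
            .Where(f => f.StartsWith(prefix, StringComparison.Ordinal))
            .Select(f => $"[{f}]={searchText}")
            .ToList();

        if (parts.Count == 0)
        {
            throw new ArgumentException(
                $"No fields match the pattern '{fieldPattern}'.",
                nameof(fieldPattern));
        }

        return string.Join(" | ", parts);
    }
}
```

For example, calling `WildcardFieldExpander.Expand("Tag_*", "foo", new[] { "Name", "Tag_One", "Tag_Two", "Tag_Three" })` produces the expanded query from the first option. The search text is repeated once per matched field, which is where the performance concern and the case for caching come from.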