
Query syntax: Support wildcard field searches/searching across all dynamic fields from a specific provider #77

@mikegoatly


An extension of #76 - I've just realised that wildcard field names are going to be a bit problematic. When parsing text from a query, the QueryTokenizer needs to know which index tokenizer to use when processing the search text.

Consider this index:

var index = new FullTextIndexBuilder<int>()
    .WithDefaultTokenization(t => t.WithStemming()) // Stemming on all fields by default
    .WithObjectTokenization<Customer>(o => o
        .WithKey(c => c.Id)
        .WithField(
            "Name",
            c => c.Name,
            tokenizationOptions: fo => fo.WithTokenization(t => t)) // No stemming on the Name field
        .WithDynamicFields("Tags", c => c.TagDictionary, "Tag_")
    )
    .Build();

The default index tokenizer uses stemming, whereas the field Name has its own index tokenizer configured without stemming. If we allowed wildcard field names like [Na*]=Something, it would no longer be clear which tokenizer to use for the search text Something, especially if we ended up with another field starting with Na that used a different tokenizer.

So I think as things stand, the options are:

  1. Support wildcards, but duplicate the search parts for each matched field, e.g. [Tag_*]=foo would be equivalent to searching for [Tag_One]=foo | [Tag_Two]=foo | [Tag_Three]=foo
  2. Support searching across all fields emitted by a named dynamic field provider using some other syntax, e.g. [?Tags]=foo (Syntax TBD). A single dynamic field provider will only ever have one index tokenizer associated with it, so this should work.
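
To make option 1 concrete, here is a rough sketch of the expansion step. It is not LIFTI code: WildcardFieldExpander and Expand are hypothetical names, the list of known field names is passed in by hand, only a trailing * is handled, and field names are compared case-sensitively, which is an assumption. It works purely on query text, so each expanded clause would still be parsed against its own field's tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LIFTI): rewrites a wildcard field filter
// into one clause per matching field, joined with the OR operator (|).
public static class WildcardFieldExpander
{
    public static string Expand(string fieldPattern, string searchText, IEnumerable<string> knownFields)
    {
        // No trailing wildcard: leave the field filter untouched.
        if (!fieldPattern.EndsWith("*", StringComparison.Ordinal))
        {
            return $"[{fieldPattern}]={searchText}";
        }

        // Assumption: field names are matched by case-sensitive prefix.
        var prefix = fieldPattern.Substring(0, fieldPattern.Length - 1);
        var clauses = knownFields
            .Where(f => f.StartsWith(prefix, StringComparison.Ordinal))
            .Select(f => $"[{f}]={searchText}")
            .ToList();

        if (clauses.Count == 0)
        {
            throw new ArgumentException($"No fields match '{fieldPattern}'", nameof(fieldPattern));
        }

        // Each clause is later parsed with the tokenizer of its own field.
        return string.Join(" | ", clauses);
    }
}
```

With the known fields Name, Tag_One, Tag_Two and Tag_Three, Expand("Tag_*", "foo", fields) produces the query from option 1. The number of clauses grows with the number of matched fields.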

The first option would have a performance impact on the query, and we're probably going to need to build in some search optimisations that cache the results emitted by a query, so that the same search predicate isn't evaluated multiple times.
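
One possible shape for that caching, purely as a sketch: memoise the result of a predicate for the lifetime of a single query, keyed by the tokenizer and the search text. PredicateResultCache, the tokenizerId string and the evaluate delegate are all made up for illustration. How this would hook into the real query pipeline, and what the cached result type would be, is undecided.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-query cache (not part of LIFTI). The key assumes that two
// fields sharing a tokenizer turn the same search text into the same lookup.
public sealed class PredicateResultCache<TResult>
{
    private readonly Dictionary<(string TokenizerId, string SearchText), TResult> cache =
        new Dictionary<(string TokenizerId, string SearchText), TResult>();

    // Counts how many times the underlying lookup actually ran.
    public int Evaluations { get; private set; }

    public TResult GetOrEvaluate(string tokenizerId, string searchText, Func<TResult> evaluate)
    {
        var key = (tokenizerId, searchText);
        if (!this.cache.TryGetValue(key, out var result))
        {
            // First time this predicate is seen in the query: run it once.
            result = evaluate();
            this.Evaluations++;
            this.cache[key] = result;
        }

        return result;
    }
}
```

The expanded Tag_ clauses would all share one entry because they come from the same dynamic field provider, while a field with a different tokenizer, such as Name, would get its own. Any per-field filtering would still need to happen on top of the cached result.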

The second option is a bit more limited, but at least solves the issue across a specific dynamic field source.
