For example, http://test.musicbrainz.org/recording/9444485e-0cd5-4750-a35f-6d5fae25876d can be found by both of these searches. At the moment the matching is forgiving, but if I implemented the fix as is for your track, the first query would no longer work.

In advance of a resolution to this, I thought I would mention something that I was previously unaware of: the indexed search indexes comments as well. So if you add a comment to this song, perhaps something the song is colloquially known as ('spade'), you could search for that text instead.

Suggestion by Nikki: only do the special character conversion if analyzing without it results in an empty token.

That didn't work because the CharFilter can only come before the Tokenizer in a pipeline, so once we have tokenized the text and found that it returns nothing, we cannot go back and apply the filter. But I have fixed it by modifying the tokenizer so that it keeps punctuation instead of throwing it away, and groups tokens that contain only punctuation differently from tokens that contain alphanumeric text and punctuation. A filter then strips the punctuation characters from tokens that also contain alphanumeric text, but keeps them in punctuation-only tokens. (When I say punctuation I actually mean any character that is not a letter, number, Chinese character or whitespace.)
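The tokenizer-plus-filter approach described above can be illustrated with a small sketch. This is not the search server's actual code: the function names, the "PUNCT"/"ALNUM" labels, and the choice to split on whitespace are all assumptions made for illustration, and Python's `str.isalnum()` stands in for the "letter, number or Chinese" test.

```python
def is_punct(ch):
    # "Punctuation" as defined in the comment above: anything that is not
    # a letter, number, Chinese character or whitespace. str.isalnum() is
    # True for CJK ideographs, so they count as alphanumeric here.
    return not (ch.isalnum() or ch.isspace())


def tokenize(text):
    # Assumption: split on whitespace only, so punctuation survives
    # tokenization. Each token is labelled by what it contains.
    tokens = []
    for raw in text.split():
        kind = "PUNCT" if all(is_punct(c) for c in raw) else "ALNUM"
        tokens.append((raw, kind))
    return tokens


def strip_punctuation_filter(tokens):
    # Strip punctuation from tokens that also contain alphanumeric text,
    # but leave punctuation-only tokens untouched.
    out = []
    for text, kind in tokens:
        if kind == "ALNUM":
            text = "".join(c for c in text if not is_punct(c))
        out.append(text)
    return out


def analyze(text):
    return strip_punctuation_filter(tokenize(text))
```

With this sketch, a punctuation-only search such as `analyze("♠")` still produces a token, while `analyze("Hello, world!")` drops the punctuation as before.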
The search server generally ignores punctuation and other unusual characters when matching, which gives the best results. But if your search consists only of punctuation, you have a problem, because the search gets converted to an empty search. We have a way round this (http://jira.musicbrainz.org/browse/SEARCH-33): we convert a range of punctuation to ASCII text, e.g. '!!!' to 'ApostropheApostropheApostrophe'. The only trouble is that the conversion doesn't need to match the whole string, so if we added '♠' -> 'Spade' it would also change names that merely include '♠' as one character, which we don't want to convert. So I can't fix this until I've resolved that.
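A rough sketch of why a per-character conversion is a problem here. The mapping table and function name are hypothetical, and the '♠' -> 'Spade' entry is the proposed addition discussed above, not something the server is known to contain.

```python
# Hypothetical mapping table; only the proposed '♠' entry is shown.
CHAR_MAP = {"♠": "Spade"}


def convert_chars(text, mapping=CHAR_MAP):
    # Replace each mapped character wherever it occurs. Nothing here
    # checks whether the whole string consists of mapped characters.
    return "".join(mapping.get(c, c) for c in text)
```

A title that is only `♠` becomes `Spade`, which is the goal, but a name such as `A♠B` becomes `ASpadeB` as well, which is the unwanted side effect.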