Searching for the character ♠, which is the title of a track by Marilyn Manson, produces no results:
Search results should include at least http://test.musicbrainz.org/recording/3fb639b1-5f34-4b3e-a092-79d346675a69
The search server generally ignores punctuation and other unusual characters when matching, which gives the best results. But if your search consists only of punctuation you have a problem, because the search gets converted to an empty search. We have a way round this (http://jira.musicbrainz.org/browse/SEARCH-33): we convert a range of punctuation to ASCII text, e.g. '!!!' to 'ApostropheApostropheApostrophe'. The only trouble is that the conversion doesn't need to match the whole string, so if we added '♠' -> 'Spade' it would also change names which merely include '♠' as one character, which we don't want to convert. So I can't fix this until I've resolved that.
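To illustrate the problem described above, here is a minimal Python sketch of a character mapping that runs before tokenizing. It is not the search server's actual code: `CHAR_MAP`, `char_filter`, `tokenize`, `analyze` and the title "Ace♠High" are all made up for the example, and only the '♠' -> 'Spade' entry comes from this ticket.

```python
import re

# Hypothetical mapping table; only the '♠' -> 'Spade' entry comes from the ticket.
CHAR_MAP = {"♠": "Spade"}


def char_filter(text):
    """Replace every mapped character wherever it occurs (runs before tokenizing)."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def tokenize(text):
    """Keep runs of letters and digits, lowercased; drop everything else."""
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


def analyze(text, use_char_filter):
    """Run the optional character mapping, then tokenize."""
    return tokenize(char_filter(text) if use_char_filter else text)


if __name__ == "__main__":
    # A punctuation-only title: empty without the mapping, searchable with it.
    print(analyze("♠", use_char_filter=False))
    print(analyze("♠", use_char_filter=True))
    # A made-up title that merely contains the character: the mapping rewrites it too.
    print(analyze("Ace♠High", use_char_filter=False))
    print(analyze("Ace♠High", use_char_filter=True))
```

Without the mapping the title "♠" analyzes to nothing, which is the bug. With the mapping it becomes searchable, but the made-up title that only contains '♠' is rewritten as well, so a query that used to match it may stop working.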
can be found by both of these searches at the moment
It is forgiving, but if I implemented the fix as is for your track then the 1st query would no longer work.
In advance of a resolution to this, I thought I would mention something that I was previously unaware of.
The indexed search indexes comments as well. So if you add a comment to this song, perhaps something the song is colloquially known as ('spade'), you could search for that text instead.
Suggestion by Nikki: only do the special character conversion if analyzing without it results in an empty token.
That didn't work because a CharFilter only comes before the Tokenizer in a pipeline, so once we have tokenized the text and found that it returns nothing, we cannot go back and use the filter. But I have fixed it by modifying the tokenizer to keep punctuation instead of throwing it away, and to group tokens that contain only punctuation differently from tokens containing both alphanumeric text and punctuation. A filter then strips the punctuation characters out of the alphanumeric tokens, but keeps them in punctuation-only tokens.
(When I say punctuation I actually mean any character that is not a letter, number, Chinese or whitespace).
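A rough Python sketch of the idea behind that fix, not the actual tokenizer: the names `tokenize_keep_punct`, `strip_punct_filter` and `analyze`, the "ALNUM"/"PUNCT" tags, and the choice to split on whitespace only are all assumptions made for illustration.

```python
import unicodedata


def is_word_char(ch):
    """Letters and digits count as word characters (Chinese ideographs are category Lo)."""
    return unicodedata.category(ch)[0] in ("L", "N")


def tokenize_keep_punct(text):
    """Split on whitespace only, keep punctuation, and tag each token by type."""
    tokens = []
    for raw in text.split():
        kind = "ALNUM" if any(is_word_char(c) for c in raw) else "PUNCT"
        tokens.append((raw, kind))
    return tokens


def strip_punct_filter(tokens):
    """Strip punctuation from ALNUM tokens; leave PUNCT-only tokens untouched."""
    out = []
    for raw, kind in tokens:
        if kind == "ALNUM":
            raw = "".join(c for c in raw if is_word_char(c))
        out.append(raw.lower())
    return out


def analyze(text):
    """Tokenize while keeping punctuation, then apply the type-aware filter."""
    return strip_punct_filter(tokenize_keep_punct(text))


if __name__ == "__main__":
    print(analyze("♠"))
    print(analyze("!!!"))
    print(analyze("Don't Stop!"))
```

Because the decision is made per token after tokenizing, a title that is only punctuation survives as a token of its own, while punctuation inside ordinary words is still ignored as before.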