
Key: SEARCH-51
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Normal
Assignee: Paul Taylor
Reporter: Alex Mauer
Votes: 0
Watchers: 0
MusicBrainz Search Server

Searching for certain characters returns no results, even if they're valid.

Created: 11/Oct/10 08:53 PM   Updated: 20/Oct/11 11:05 AM   Resolved: 20/Oct/11 11:03 AM
Component/s: None
Affects Version/s: None
Fix Version/s: 2011-12-08

Issue Links:
Depends
 


Description

Searching for the character ♠, which is the title of a track by Marilyn Manson, produces no results:
http://test.musicbrainz.org/search?query=%E2%99%A0&type=recording

Search results should include at least http://test.musicbrainz.org/recording/3fb639b1-5f34-4b3e-a092-79d346675a69



Paul Taylor added a comment - 20/Jan/11 12:29 PM

The search server generally ignores punctuation and other unusual characters when matching, because this gives the best results. But if a search consists only of punctuation, there is a problem: the search gets converted to an empty search. We have a way round this (http://jira.musicbrainz.org/browse/SEARCH-33): we convert a range of punctuation to ASCII text, e.g. '!!!' to 'ApostropheApostropheApostrophe'. The trouble is that the conversion doesn't need to match the whole string, so if we added '♠' -> 'Spade' it would also change names that merely include '♠' as one character, which we don't want to convert. So I can't fix this until I've resolved that.
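To illustrate the problem, here is a minimal Java sketch, assuming a simple per-character replacement (this is not the search server's actual code). It shows why adding '♠' -> 'Spade' to a conversion that does not need to match the whole string is unwanted. The class name, the mapping table and the example name "Love♠Hate" are hypothetical.

```java
import java.util.Map;

public class NaiveSymbolConversion {
    // Hypothetical mapping table: U+2660 (the spade symbol) -> "Spade" is the proposed addition.
    private static final Map<Character, String> SYMBOL_NAMES = Map.of('\u2660', "Spade");

    // Replaces every mapped symbol wherever it occurs in the text, the way a
    // filter applied before tokenization would; it never looks at the whole string.
    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String name = SYMBOL_NAMES.get(c);
            out.append(name != null ? name : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("\u2660"));          // Spade (wanted)
        System.out.println(convert("Love\u2660Hate"));  // LoveSpadeHate (unwanted)
    }
}
```

The first call is the behaviour wanted for this issue; the second shows a name that only contains ♠ being turned into a different word.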



VxJasonxV added a comment - 06/Aug/11 11:52 PM

In advance of a resolution to this, I thought I would mention something that I was previously unaware of.

The indexed search also indexes comments. So if you add a comment to this song, for example the name it is colloquially known by ('spade'), you can search for that text instead.


Paul Taylor added a comment - 16/Oct/11 09:47 PM

Suggestion by Nikki: only do the special-character conversion if analyzing without it results in an empty token.
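At the plain-string level, the suggestion could look like the following hypothetical Java sketch: strip punctuation as usual, and fall back to spelling out symbols by name only when nothing is left. The class, method names and mapping table are assumptions for illustration only.

```java
import java.util.Map;

public class FallbackSymbolConversion {
    // Hypothetical mapping table keyed by code point; U+2660 (spade) -> "Spade".
    private static final Map<Integer, String> SYMBOL_NAMES = Map.of(0x2660, "Spade");

    // Normal path: drop every code point that is not a letter, digit or whitespace.
    static String stripPunctuation(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> Character.isLetterOrDigit(cp) || Character.isWhitespace(cp))
            .forEach(out::appendCodePoint);
        return out.toString().trim();
    }

    // Fallback path: replace mapped symbols by their names, leave other characters as-is.
    static String convertSymbols(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String name = SYMBOL_NAMES.get(cp);
            if (name != null) {
                out.append(name);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Only convert symbols when the normal analysis would produce nothing.
    static String normalize(String text) {
        String stripped = stripPunctuation(text);
        return stripped.isEmpty() ? convertSymbols(text) : stripped;
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u2660"));         // Spade
        System.out.println(normalize("Love\u2660Hate")); // LoveHate
    }
}
```

As the following comment explains, this ordering cannot be expressed directly in the Lucene analysis chain, because a CharFilter runs before the Tokenizer and so cannot react to what the tokenizer produced.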


Paul Taylor added a comment - 20/Oct/11 11:03 AM

That didn't work, because a CharFilter can only come before the Tokenizer in the pipeline, so once we have tokenized the text and found that it returns nothing, we cannot go back and apply the filter. But I have fixed it by modifying the tokenizer to keep punctuation rather than throw it away, and to group tokens that contain only punctuation differently from tokens that contain alphanumeric text and punctuation. A filter then strips the punctuation characters out of alphanumeric tokens but keeps them in punctuation-only tokens.

(By punctuation I mean any character that is not a letter, number, Chinese character or whitespace.)
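The fix can be sketched roughly as follows. This is a simplified, hypothetical Java illustration, not the actual Lucene Tokenizer and TokenFilter used by the search server: splitting on whitespace stands in for the real tokenizer, and the class and method names are made up. It applies the definition of punctuation given above; Chinese ideographs already count as letters for Character.isLetterOrDigit.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationAwareTokens {
    // "Punctuation" as defined above: anything that is not a letter, number or whitespace.
    static boolean isPunctuation(int cp) {
        return !Character.isLetterOrDigit(cp) && !Character.isWhitespace(cp);
    }

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Tokenizer step (assumed): split on whitespace only, so punctuation is kept.
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            boolean punctuationOnly =
                token.codePoints().allMatch(PunctuationAwareTokens::isPunctuation);
            if (punctuationOnly) {
                // Punctuation-only token: keep it intact so a search for it can match.
                terms.add(token);
            } else {
                // Mixed token: the filter step strips the punctuation characters.
                StringBuilder sb = new StringBuilder();
                token.codePoints().filter(cp -> !isPunctuation(cp)).forEach(sb::appendCodePoint);
                terms.add(sb.toString());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("\u2660"));             // [♠]
        System.out.println(analyze("Love\u2660Hate !!!")); // [LoveHate, !!!]
    }
}
```

In this sketch a search for ♠ produces the term ♠, which can match the indexed track title, while a name that merely contains ♠ is still matched on its alphanumeric part.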


Paul Taylor added a comment - 20/Oct/11 11:03 AM

Fixed.