
Key: MBS-4698
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Normal
Assignee: nikki
Reporter: Johannes Weißl
Votes: 1
Watchers: 3
MusicBrainz Server

Amazon links sometimes get incorrectly cleaned up by URLCleanup.js

Created: 12/May/12 11:53 AM   Updated: 20/Aug/12 09:51 AM   Resolved: 20/Aug/12 09:51 AM
Component/s: Edit system, JavaScript
Affects Version/s: None
Fix Version/s: Bug fixes, 2012-08-20




Johannes Weißl added a comment - 12/May/12 12:23 PM

Hmm, I don't have a perfect solution for this, so I'm unassigning myself. My best attempt would be to match "/dp/([A-Z0-9]{10})" or "/product/([A-Z0-9]{10})" first. The regular expression that causes the bug is:

/(?:\/|\ba=)([A-Z0-9]{10})(?:[/?&%#]|$)/

https://github.com/metabrainz/musicbrainz-server/blob/master/root/static/scripts/edit/MB/Control/URLCleanup.js#L197
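
To illustrate the suggestion above, here is a minimal sketch comparing the loose pattern with a path-anchored one. The URL and the ID "B00EXAMPLE" are invented for illustration, and the anchored pattern is only one possible reading of the "/dp/" and "/product/" idea, not the code that ended up in URLCleanup.js.

```javascript
// The loose pattern quoted in the comment above.
const looseAsin = /(?:\/|\ba=)([A-Z0-9]{10})(?:[/?&%#]|$)/;

// Assumed path-anchored alternative: only accept an ID that
// directly follows /dp/ or /product/.
const anchoredAsin = /\/(?:dp|product)\/([A-Z0-9]{10})(?:[/?&%#]|$)/;

// Hypothetical URL: "SOMETHING1" is a made-up 10-character path segment
// and "B00EXAMPLE" is a made-up ID, not a real Amazon product.
const exampleUrl = "http://www.amazon.com/SOMETHING1/dp/B00EXAMPLE";

const looseMatch = exampleUrl.match(looseAsin);
const anchoredMatch = exampleUrl.match(anchoredAsin);

console.log(looseMatch[1]);    // SOMETHING1 (the first 10-character segment wins)
console.log(anchoredMatch[1]); // B00EXAMPLE
```

The loose pattern takes the leftmost slash-delimited run of ten uppercase letters or digits, which is why anchoring on the path prefix helps in a case like this.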


Nicolás Tamargo added a comment - 26/May/12 01:34 AM

Don't all ASINs start with B? That would already make the chances of it picking the wrong thing much smaller, wouldn't it?


Aurélien Mino added a comment - 26/May/12 09:26 AM

ASINs for audiobooks, for example, don't start with B.


Nicolás Tamargo added a comment - 26/May/12 09:30 AM

And their first three characters aren't letter-number-number either? (Just trying to find something to make this somewhat more precise.)


Johannes Weißl added a comment - 27/May/12 07:41 PM

No, for books (including audiobooks) the ASIN is often the ISBN, so it is all numbers.


nikki added a comment - 01/Jun/12 07:45 PM
musicbrainz=# select url from url where url ~ '^http://www.amazon.(com|ca|co.uk|fr|de|it|es|co.jp|cn)/gp/product/[0-9A-Z]{10}$' and url !~ '^http://www.amazon.(com|ca|co.uk|fr|de|it|es|co.jp|cn)/gp/product/(B[0-9A-Z]{9}|[0-9]{9}[0-9X])$';
 url 
-----
(0 rows)

musicbrainz=# select url from url where url ~ '^http://www.amazon.(com|ca|co.uk|fr|de|it|es|co.jp|cn)/gp/product/[0-9A-Z]{10}$' and url !~ '^http://www.amazon.(com|ca|co.uk|fr|de|it|es|co.jp|cn)/gp/product/(B00[0-9A-Z]{7}|[0-9]{9}[0-9X])$';
 url                      
-----------------------------------------------
 http://www.amazon.com/gp/product/BT00CHI1V2
 http://www.amazon.co.uk/gp/product/BT00CHI1V2
(2 rows)

musicbrainz=#

I think my preference would be to try multiple replacements. The first would try to match the typical URL formats (as hrglgrmpf mentioned) with a relatively strict regex for the ASIN (e.g. the one from the second query), which should work for the vast majority of cases. If that fails, we would fall back to the current method of looking for anything that looks like it could be an ASIN.
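
The two-stage idea could be sketched roughly as follows. This is an illustration under assumptions, not the implemented fix: `findAsin` is a hypothetical helper name, the "/dp/" and "/product/" prefixes come from the earlier comment, and the strict ASIN shape is copied from the second query.

```javascript
// Strict ASIN shapes taken from the second query above:
// "B00" plus seven characters, or an ISBN-10-like string.
const STRICT_ASIN = "(B00[0-9A-Z]{7}|[0-9]{9}[0-9X])";

// Stage 1: typical URL formats with the strict ASIN shape.
const strictPattern = new RegExp(
  "/(?:dp|product)/" + STRICT_ASIN + "(?:[/?&%#]|$)"
);

// Stage 2: the current loose pattern, used only as a fallback.
const loosePattern = /(?:\/|\ba=)([A-Z0-9]{10})(?:[/?&%#]|$)/;

// findAsin is a hypothetical name, not a function from URLCleanup.js.
// It returns the captured ID, or null if neither stage matches.
function findAsin(url) {
  const match = url.match(strictPattern) || url.match(loosePattern);
  return match ? match[1] : null;
}
```

With this sketch, an unusual ID such as BT00CHI1V2 from the query output fails the strict stage but is still found by the fallback, so no currently accepted URL stops matching.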


Johannes Weißl added a comment - 01/Jun/12 07:55 PM

+1 for that... I think it is the best thing we can do... do you want to implement it?


Nicolás Tamargo added a comment - 02/Jun/12 09:54 PM

Whoever rewrites the cleanup code should also take into account the issue with artist profiles in MBS-4806 (I was going to look at it but it really makes more sense to deal with both issues in one go).


nikki added a comment - 10/Aug/12 09:05 AM