Project: Picard
Issue: PICARD-686

Enhanced Match Weighting


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Normal
    • Component/s: Lookup & Match

      From: http://forums.musicbrainz.org/viewtopic.php?id=5443

      Is the match percentage (the final one shown in the right-hand pane) based purely on the number of matching fields, or is there some kind of weighting?
      And if there is a weighting, can it be tuned?
      Looking at some of the matched records, I see things highlighted in yellow, such as:
      artist: [orig] Ac/Dc [new] AC/DC
      Album Name: [orig] Cool Album. [new] cool album
      To me, these would be 95%+ matches...
      The same goes for other tracks where nothing was changed, but some fields were added.
      Then, another record shows the same album name, track name, time, etc., but a totally different artist name. To me, this should show up as red.
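      The yellow examples above differ in different ways: "Ac/Dc" vs "AC/DC" only in case, "Cool Album." vs "cool album" in case and punctuation. A minimal Python sketch of how a matcher could tell such differences apart, so that each kind could be penalized separately. The category names, the `classify_difference` and `soundex` helpers, and the order of the checks are hypothetical, not how Picard currently compares metadata. The Soundex used here is classic American Soundex, which only looks at ASCII letters.

```python
from __future__ import annotations

import re

# Classic American Soundex digit groups; vowels, y, h and w have no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """American Soundex code of `word`; non-ASCII letters are ignored."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate equal digits; vowels do
            prev = digit
    return (code + "000")[:4]


def _words(s: str) -> list[str]:
    """Case-folded words with punctuation and other non-alphanumerics removed."""
    return re.sub(r"[^\w\s]|_", "", s).casefold().split()


def classify_difference(orig: str, new: str) -> str:
    """Label how two values of the same field differ (hypothetical categories)."""
    if orig == new:
        return "identical"
    if orig.casefold() == new.casefold():
        return "case"
    a, b = _words(orig), _words(new)
    if a == b:
        return "punctuation"  # differs in punctuation, and possibly case
    if sorted(a) == sorted(b):
        return "reordered"    # same words, different order
    if len(a) == len(b) and all(soundex(x) and soundex(x) == soundex(y)
                                for x, y in zip(a, b)):
        return "soundex"      # every word sounds alike but is spelled differently
    return "different"
```

      For the two examples above, `classify_difference("Ac/Dc", "AC/DC")` returns `"case"` and `classify_difference("Cool Album.", "cool album")` returns `"punctuation"`.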

      It would be nice to have some kind of weighting system to tune these... something like (as a really rough example):
      start at 100% match,
      reduce by:

      • 1% for case errors
      • 3% for extra punctuation/non-alphanumeric character differences
      • 5% for the same Soundex code but a different spelling in [Artist, Album, Track]
        + 5% for all the same words in the title, just rearranged
      • 5% for differences in the [Released] field
      • 1% for differences in minor fields
        + 10% for a music signature match
      • 25% for differences in key fields
        + 20% for being the only missing track in an album
        ...
      In the end, this could easily be added to scripting rules and make the tool much more hands-off. And by letting users tune the weights, each person could fit the matching to their own collection. For example, if you know all your content was imported from CDs, then the one unmatched track is almost certainly the one missing track, no matter what the matcher thinks. Or, if you know you are importing a batch of tracks where case matters, or where non-alphanumeric characters in the titles are significant, you could increase the penalty for those differences.
      Thoughts? Is this already what is happening?
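      The "start at 100% and adjust" scheme above can be sketched in a few lines of Python, with user overrides standing in for per-collection tuning. The default numbers come from the rough example in the post; reading "music signature match" and "only missing track" as bonuses and every other item as a penalty is an interpretation, and the event names and `match_score` function are hypothetical. Detecting the events themselves (case-only differences, Soundex matches, and so on) is left out.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Optional

# Percentage-point adjustments taken from the rough example in the post.
# Negative values are penalties, positive values bonuses; treating the two
# positive signals as bonuses is an interpretation. Event names are hypothetical.
DEFAULT_ADJUSTMENTS: dict[str, float] = {
    "case": -1.0,                # case errors
    "punctuation": -3.0,         # extra punctuation / non-alphanumeric differences
    "soundex": -5.0,             # same Soundex code, different spelling
    "reordered": -5.0,           # same words in the title, rearranged
    "released": -5.0,            # difference in the Released field
    "minor_field": -1.0,         # difference in a minor field
    "key_field": -25.0,          # difference in a key field
    "signature_match": 10.0,     # music signature matches
    "only_missing_track": 20.0,  # the only track missing from an album
}


def match_score(events: Iterable[str],
                overrides: Optional[Mapping[str, float]] = None) -> float:
    """Start at 100% and apply one adjustment per observed event.

    `overrides` lets a user re-tune individual adjustments for their collection.
    The result is clamped to the 0-100 range. Unknown event names raise KeyError
    rather than being silently ignored.
    """
    adjustments = {**DEFAULT_ADJUSTMENTS, **(overrides or {})}
    score = 100.0
    for event in events:
        score += adjustments[event]
    return max(0.0, min(100.0, score))
```

      With these defaults, a record whose only differences are one case change and one punctuation change scores `match_score(["case", "punctuation"])` = 96.0, in line with the "95+%" intuition above, while a record that differs only in a key field such as the artist scores 75.0. Passing `{"case": -10.0}` as `overrides` makes each case difference cost 10 points instead, for collections where case matters.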

      ------

      Basically, I was envisioning this as something plugin-enabled, so that, either in existing plugins or in a new class of them, you could write your own weighting approach and then balance the approaches against each other somewhere in the UI. This way, international users could avoid Soundex (or replace it with something more specific to their language), or, if you don't like approach B, you could fix it.
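      The plugin idea above could look roughly like this: each weighting approach registers itself under a name, and a user-controlled balance (for example, sliders in a settings dialog) decides how much each approach contributes. This is a hypothetical Python sketch; `register_weighting`, `combined_score`, and the two example approaches are invented for illustration and are not part of Picard's actual plugin API.

```python
from __future__ import annotations

from typing import Callable, Dict, Mapping

# A weighting approach compares original and candidate metadata and returns a
# similarity between 0.0 (no match) and 1.0 (perfect match).
Weighting = Callable[[Mapping[str, str], Mapping[str, str]], float]

_REGISTRY: Dict[str, Weighting] = {}


def register_weighting(name: str) -> Callable[[Weighting], Weighting]:
    """Decorator a plugin could use to contribute an approach (hypothetical API)."""
    def decorator(func: Weighting) -> Weighting:
        _REGISTRY[name] = func
        return func
    return decorator


def combined_score(orig: Mapping[str, str], new: Mapping[str, str],
                   balance: Mapping[str, float]) -> float:
    """Weighted average of all registered approaches, as a percentage.

    `balance` holds the user's relative weight per approach; approaches that are
    missing from it, or weighted 0 or less, are switched off and not evaluated.
    """
    weights = {name: max(0.0, balance.get(name, 0.0)) for name in _REGISTRY}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one registered approach needs a positive weight")
    score = sum(weights[name] * func(orig, new)
                for name, func in _REGISTRY.items() if weights[name] > 0)
    return 100.0 * score / total


@register_weighting("exact_fields")
def exact_fields(orig: Mapping[str, str], new: Mapping[str, str]) -> float:
    # Fraction of fields present in both records whose values are exactly equal.
    shared = orig.keys() & new.keys()
    if not shared:
        return 0.0
    return sum(orig[k] == new[k] for k in shared) / len(shared)


@register_weighting("casefold_fields")
def casefold_fields(orig: Mapping[str, str], new: Mapping[str, str]) -> float:
    # Same comparison, but ignoring case ("Ac/Dc" counts as equal to "AC/DC").
    shared = orig.keys() & new.keys()
    if not shared:
        return 0.0
    return sum(orig[k].casefold() == new[k].casefold() for k in shared) / len(shared)
```

      For `{"artist": "Ac/Dc", "album": "Back in Black"}` against `{"artist": "AC/DC", "album": "Back in Black"}`, a balance of `{"exact_fields": 1, "casefold_fields": 3}` gives 87.5. Because the result is a weighted average, it stays a percentage however many approaches are installed, and setting one approach's weight to 0 (say, a Soundex-based one for non-English titles) switches it off without uninstalling it.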

            Assignee: Unassigned
            Reporter: Dan Strohl (dstrohl)
            Votes: 0
            Watchers: 1