
Key: MBS-2021
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Normal
Assignee: Michael Wiencek
Reporter: nikki
Votes: 22
Watchers: 15

MusicBrainz Server

Set recording times automatically

Created: 18/May/11 07:11 AM   Updated: Tuesday 05:10 AM  Due: 31/Mar/14   Resolved: Tuesday 05:10 AM
Component/s: User interface
Affects Version/s: None
Fix Version/s: 2014-04-14



Description

I think recording times should be automatically set as the average of the track times (ignoring tracks where the time isn't set). As it is, you have to set both the track times and recording times manually, which is twice as much work and confuses people - MBS-2005

I would say only make the recording times editable when there aren't any tracks (i.e. standalone recordings)
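
A minimal sketch of the averaging proposed above, assuming track lengths are integers in milliseconds and an unset time is represented as `None`. The function name is hypothetical and is not taken from the MusicBrainz Server code.

```python
from typing import Optional, Sequence


def average_track_length(track_lengths: Sequence[Optional[int]]) -> Optional[int]:
    """Average the known track lengths (ms), ignoring unset (None) ones."""
    known = [length for length in track_lengths if length is not None]
    if not known:
        return None  # no track has a time, so the recording stays unset
    return round(sum(known) / len(known))


print(average_track_length([181000, None, 183000]))  # prints 182000
```

With no timed tracks at all (a standalone recording), the function returns `None`, which is the case where the description suggests keeping the recording time editable.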



Aurélien Mino added a comment - 08/Jun/11 05:53 AM

IMO it should work both ways:
if a recording has a defined length but is linked to only one track, and that track has no length, the track time should be set to the recording's.


voiceinsideyou added a comment - 08/Jun/11 06:11 AM

Yup, I agree.


monxton added a comment - 05/Jul/11 09:53 AM

The issue http://tickets.musicbrainz.org/browse/MBS-2302 has been marked as a dup of this one. The problem there is that if you add a release, then attach a disc ID, the track times get fixed by the bot, but the recording times do not. I observed this, for example, in the release http://musicbrainz.org/release/4c0dfe6a-87eb-4aef-8820-99eeae4a9bcb. I am particularly concerned about this release because it is comprised of new recordings of previously released tracks, with different artists performing, so it is important that the list of recordings shows that they have significantly different recording lengths to avoid the recordings being merged by a tidy editor.

So my first question is, will the fix include a sweep of recently-added disc IDs, or should I just hand-edit the recording times now?

However, having read this issue, I'd also like to express concern about the proposed solution in general. The description is quite terse, but as it is talking about average track times, I'm guessing that it is saying that each time a new track is associated or disassociated with a recording, a new average will be calculated and applied to the recording. It is very easy to get confused about whether or not a set of tracks correspond to the same recording. I don't agree that recording lengths should be averaged in that way. It's just too tricksy and is likely to result in lower-quality data, IMO.

I'd definitely like to see a bot handle the cases where either the recording length or the track length is unset and is first associated with a track or recording with a specified length, or the length of the associated track or recording is first set, since in the majority of cases there is only ever one track associated with one recording.

I guess there are other hierarchies of quality of length data too: lengths from disc IDs are more significant than those from digital data, lengths from albums more than those from compilations, so they should not be given equal weight.


Johannes Weißl added a comment - 03/Aug/11 04:02 AM

Hmm, I'm against setting track list times from recordings! It should only be the other way round! Track list times are a property of the medium, and shouldn't be set automatically.


voiceinsideyou added a comment - 03/Aug/11 01:18 PM

hrglgrmphf: Sure, but 99.999% (randomly made up statistic) of track times are surely within +/- a second of the recording time, so doing it both ways has got to be a net productivity win. It can always be overridden, and no-one is arguing that it should replace existing times. It would also help with confidence of matching in taggers, which will typically be using the track times rather than the recording times.

monxton: Yeah, automatically averaging is probably a bit dubious now I think about it further. Sometimes tracks on mediums have minutes of silence added for bonus tracks that start as the next track, for example - you don't want that screwing up the recording time and then being unable to edit it to override it.

I would suggest that this could be mitigated by only averaging if the existing times are within range X seconds of one another, and only if the recording has no existing time. (probably a rare situation since it will tend to be set as soon as the first track with length is added I guess) If the lengths are too different, this could be highlighted in a report of "tracks with very different lengths linked to a recording with no length" for manual review.
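
One way to read this mitigation, as a sketch only: lengths are assumed to be in milliseconds, `None` means unset, and the 2000 ms default for "X seconds" is an arbitrary placeholder, since the comment does not name a value. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def propose_recording_length(
    recording_length: Optional[int],
    track_lengths: Sequence[Optional[int]],
    max_spread_ms: int = 2000,  # the "X seconds" threshold; this value is an assumption
) -> Tuple[Optional[int], bool]:
    """Return (proposed_length, needs_manual_review)."""
    if recording_length is not None:
        return recording_length, False  # never override an existing time
    known = [t for t in track_lengths if t is not None]
    if not known:
        return None, False  # nothing to derive a time from
    if max(known) - min(known) > max_spread_ms:
        return None, True  # too different: belongs in the manual-review report
    return round(sum(known) / len(known)), False


print(propose_recording_length(None, [200000, 201000]))  # prints (200500, False)
print(propose_recording_length(None, [200000, 620000]))  # prints (None, True)
```

The second call models a track padded with minutes of silence: no time is set and the recording is flagged for the report instead.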

Otherwise perhaps there needs to be some advanced mode to the release editor that allows you to set checkboxes for recordings you want to update to the same as the track time - but that would be making the RE even more complicated than it already is :-/ I guess an extended implementation of MBS-2513 including times.


Johannes Weißl added a comment - 03/Aug/11 01:53 PM

Hmm, you are right, I am also against any average! In most cases, the difference between the average and just a random recording is negligible, so it doesn't matter. And in extreme cases it means that, instead of one correct duration (of multiple), we have a duration that was never correct.

But I must say I'm still strongly against setting track list times automatically. The unknown ":" value gives me as an editor the important information that nobody has entered/checked a duration for that track. Otherwise it would be even harder to detect incorrectly merged recordings (e.g. a vinyl single recording that is different).

A UI option to manually set track times to recordings (or the other way round) would be my preferred solution!


voiceinsideyou added a comment - 03/Aug/11 02:27 PM

Doing the right thing automatically most of the time (as long as there is a way to manually correct it if it is wrong) scales a LOT better than manually setting everything. Manual approaches really should be the last option. I personally believe we as a community continually underestimate how much work is involved, and how many years it will take, to manually do stuff across the DB (for popular artists this is not so much of a problem, but think of the mass of VA releases, the less edited artists) - for the sake of avoiding limited risk of false positives in a handful of cases - when in fact there have probably been far more edits made manually that are blatantly incorrect than the system would make.

Editing time spent manually setting times is editing time that can NOT be spent adding ARs, moving other data into NGS structure etc.


Johannes Weißl added a comment - 03/Aug/11 03:09 PM

I'll try to explain: doing such a move (recording->track duration) automatically will only destroy data, and add absolutely nothing to the database. Everyone (Picard tagger, BBC, ...) can do such an automatic move on their own when displaying a track list any time. Without manual checking the track times are not worth anything...
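
The "automatic move" a client could do on its own at display time might look like the sketch below. It assumes the client already holds both lengths (in milliseconds, `None` when unset); the function name is hypothetical and does not correspond to any Picard or web service API.

```python
from typing import Optional


def display_length(
    track_length: Optional[int], recording_length: Optional[int]
) -> Optional[int]:
    """Prefer the track's own time; fall back to the recording's if it is unset."""
    return track_length if track_length is not None else recording_length


print(display_length(None, 215000))    # prints 215000
print(display_length(214000, 215000))  # prints 214000
```

Nothing is written back to the database here, so an unset track time stays visibly unset for editors.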


voiceinsideyou added a comment - 03/Aug/11 03:57 PM

Completely disagree; and have no idea how adding data that doesn't exist could possibly be considered "destroy[ing] data".


Oliver Charles added a comment - 03/Aug/11 04:00 PM

A database is established on true facts. I agree with hrglgrmpf that deriving a recording time from a track time is not necessarily the truth, but just a good guess.


monxton added a comment - 03/Aug/11 10:55 PM

The current situation is that if the recording is created automatically when a release is added, then its length is set to the length of the track. But if the track length is set later, for example from a Disc ID, then the recording length remains unset.

Your point about the recording length not necessarily being the same as the track length is true. But it is no more and no less true at the time the release is created than when the track length is set at a later time.

If I add a Disc ID and the recording lengths are unset, it seems crazy not to take the opportunity to set the recording lengths to the best data we have. This is not even an option at present; I can only copy each of the lengths manually.


Johannes Weißl added a comment - 03/Aug/11 11:59 PM

My comments were just about the other way round: recording duration --> track duration. I do not want that.

I support automatically setting the duration for the recording if it is unset (but after that it can only be changed manually, no average etc.)!


voiceinsideyou added a comment - 04/Aug/11 03:33 AM

Suffice to say, we've created a royal confusion on this ticket


Oliver Charles added a comment - 09/Sep/11 01:49 PM

This requires a decision on what to do before I can start work on it.


Jim DeLaHunt added a comment - 03/Oct/11 10:12 AM

I agree with monxton: "If I add a Disc ID and the recording lengths are unset, it seems crazy not to take the opportunity to set the recording lengths to the best data we have."

I partially disagree with Oliver Charles: "A database is established on true facts." No, I think MusicBrainz is a database of claims about recorded music by various people and machines, along with some Style guidelines explaining what kind of claims we prefer to use. Some of those claims describe reality well, some describe reality poorly, and some are partly right but inaccurate. Some match our guidelines well, some poorly, and sometimes our guidelines are so hard to understand that we can't agree whether a particular claim matches well or poorly. We have voting and editing mechanisms to ratchet the contents of the database closer to claims that describe reality well and in line with our guidelines.

I think disc IDs provide high quality claims about the length of Recordings, especially for Recordings which are attached only to one Track. If the Recording currently has an empty duration, the disc ID's claim is even more compelling.


Paul Taylor added a comment - 11/Nov/11 09:40 AM

Agree with voiceinsideyou's general point that we should be doing this automatically when it adds data that is missing. This is how I see it:

I don't like the average idea.
If we have a recording time and track time unset, and we add a time to one, it should definitely add the time to the other, and should work both ways.
If we change a track time and the recording time is already set but the recording only appears on one release then the recording time shouldn't be updated automatically, but there should be an option/button to enable this if the editor wants.
If we change a recording time and the recording is on one release only, we should give the option/button again.
If we change a track time and the recording time is already set and exists on multiple releases, do nothing.
If we change a recording time and the track time is already set and exists on multiple releases, do nothing.
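
The rules above can be sketched as a small decision function. This is one interpretation, not an agreed design: it treats track-to-recording and recording-to-track changes symmetrically, and it copies whenever the other side is unset regardless of release count, which the list does not state explicitly. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    COPY = "copy automatically"
    OFFER = "offer an option/button"
    NOTHING = "do nothing"


def on_length_change(other_side_length: Optional[int], release_count: int) -> Action:
    """Decide what to do with the other side (track or recording) after an edit.

    other_side_length: current length (ms) of the side that was NOT edited.
    release_count: number of releases the recording appears on.
    """
    if other_side_length is None:
        return Action.COPY  # missing data: fill it in, in either direction
    if release_count == 1:
        return Action.OFFER  # already set, single release: let the editor choose
    return Action.NOTHING  # already set, multiple releases: leave it alone


print(on_length_change(None, 1).value)    # prints copy automatically
print(on_length_change(200000, 1).value)  # prints offer an option/button
print(on_length_change(200000, 3).value)  # prints do nothing
```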

hrglgrmp:
'I'll try to explain: doing such a move (recording->track duration) automatically will only destroy data, and add absolutely nothing to the database. Everyone (Picard tagger, BBC, ...) can do such an automatic move on their own when displaying a track list any time. Without manual checking the track times are not worth anything...'

Disagree, it takes an awful lot more webservice queries to have to consider both track times/recording times. The big problem is not so much when they are slightly different, but when set in one case but not the other.


Johannes Weißl added a comment - 15/Nov/11 12:09 AM

"Disagree, it takes an awful lot more webservice queries to have to consider both track times/recording times. The big problem is not so much when they are slightly different, but when set in one case but not the other."

The webservice argument is not a strong one in my opinion, e.g. there could very well be a parameter ("get_track_times_from_recordings_when_unset"), not that I want to add anything like that, just as an example.

"If we have a recording time and track time unset, and we add a time to one, it should definitely add the time to the other, and should work both ways."

I don't think that this is good. Adding durations to recordings usually happens through merges. Adding durations to tracks with unset duration after recording merges would be complete nonsense.


Johannes Weißl added a comment - 18/Nov/11 03:28 AM

@Oliver: I'd like to implement this, are you currently working on this?

My plan is this:
1. Make a modbot script that sets recording time (if unset!) of any recordings that are associated with only one track which has a time set (should be safe!)
2. Adapt the FixTrackLength() script to also set recording time (if unset!)

What do you think about this?
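
Step 1 of this plan could be sketched as follows. The data shapes are assumptions made for illustration (plain dictionaries keyed by a recording identifier, lengths in milliseconds, `None` for unset); this is not the modbot or `FixTrackLength()` code, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def fill_unset_recording_lengths(
    recording_lengths: Dict[str, Optional[int]],
    tracks_by_recording: Dict[str, List[Optional[int]]],
) -> Dict[str, Optional[int]]:
    """Copy the track time to recordings that are unset and have exactly one track."""
    updated = dict(recording_lengths)
    for rec_id, length in recording_lengths.items():
        tracks = tracks_by_recording.get(rec_id, [])
        # Only the uncontroversial case: unset recording, a single timed track.
        if length is None and len(tracks) == 1 and tracks[0] is not None:
            updated[rec_id] = tracks[0]
    return updated


recordings = {"rec-a": None, "rec-b": None, "rec-c": 190000}
tracks = {"rec-a": [201000], "rec-b": [180000, 185000], "rec-c": [191000]}
print(fill_unset_recording_lengths(recordings, tracks))
# prints {'rec-a': 201000, 'rec-b': None, 'rec-c': 190000}
```

Recordings with several tracks (`rec-b`) or an existing time (`rec-c`) are left untouched, which is the part of the problem the plan deliberately defers.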


nikki added a comment - 18/Nov/11 11:07 AM

That doesn't seem like it would fix the whole problem though. There would still be a problem when the times have already been set and someone edits one of them. What would happen as well if two tracks linked to a recording have a time set, but the recording itself doesn't? Would it pick one of them, average them or just ignore those?


Oliver Charles added a comment - 18/Nov/11 12:23 PM

@hrglgrmpf: I don't think there's a clear decision on what to do, I wouldn't work on this yet


Johannes Weißl added a comment - 18/Nov/11 07:00 PM

@nikki: It may not solve the whole problem, but a large part of it. Why not solve the non-controversial part first, and then the (maybe) controversial one? As for the question, I would always take the shortest time, it makes the most sense (because we merge recordings that contain silence at the beginning or end). However, like I said, I would like to solve the non-controversial part first; it is easy to extend it later.

@Oliver/all: I think setting the recording time of recordings a) whose time is not set and b) which have only one linked track, which has a time, is not disputed by anyone. Is the technical solution (modbot) ok? Should I post to style for agreement?


Oliver Charles added a comment - 12/Jan/12 10:43 PM

As this issue has been delayed, I am moving it back to be rescheduled.


PATATE12 added a comment - 28/Nov/12 09:55 PM

If it's auto-average, please set more weight to tracks with Disc ID.


nikki added a comment - 07/Jan/13 02:38 PM

To try and get some progress started on this, I've made a poll on https://wiki.musicbrainz.org/User:Nikki/Recording_lengths


nikki added a comment - 26/Feb/13 12:31 PM

Robert Kaye added a comment - 11/Mar/13 10:44 PM

Given all things being pretty much equal, I'm going to choose the easier option: median.

(Everyone knows what a median is and you don't have to look up silly things like mode and medoid. )


nikki added a comment - 12/Mar/13 05:53 AM

To clarify: That means luks's suggestion of sorted_lengths[len(sorted_lengths)/2].
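
A runnable sketch of that expression, with two assumptions: lengths are in milliseconds, and unset (`None`) track lengths are skipped before sorting. In Python 3 the index needs integer division (`//`), because `/` yields a float that cannot be used as a list index. The function name is hypothetical.

```python
from typing import Optional, Sequence


def median_track_length(track_lengths: Sequence[Optional[int]]) -> Optional[int]:
    """Pick the middle element of the sorted known lengths (upper middle if even)."""
    sorted_lengths = sorted(t for t in track_lengths if t is not None)
    if not sorted_lengths:
        return None
    return sorted_lengths[len(sorted_lengths) // 2]


print(median_track_length([200000, 180000, 190000]))  # prints 190000
print(median_track_length([1000, 2000, 3000, 4000]))  # prints 3000
```

Unlike an average, this always returns a length that some track actually has; with an even number of tracks it takes the upper of the two middle values instead of interpolating.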



PATATE12 added a comment - 06/Sep/13 01:41 PM

Sorry, it seems I lost my bitbucket password.
Does it take TOC/DiscID priority into account yet, Oliver?


Oliver Charles added a comment - 06/Sep/13 02:03 PM

TOCs are not considered in this calculation directly. Recording length is taken from the median of all track lengths.


Oliver Charles added a comment - 14/Nov/13 11:19 AM

Re-opening as this really needs to go through a proper testing period on my sandbox (which is currently in use for other testing).


Duke Yin added a comment - 26/Nov/13 08:35 PM - edited

Before Recording durations are done away with, are we running a script to copy Recording durations to Tracklists where the duration is missing? (Only for Recordings where all the tracklist durations are unknown)

I also believe that shortest duration is a better implementation than median duration. There could be 12 printings of a release where there are 7 minutes of silence, and only 1 printing of a release where the duration is actually correct. Since tracklists are merged with release events (meaning that we're duplicating tracklists) in the current schema, there's no way to take a meaningful median of track durations.
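
The scenario above can be worked through numerically. The 4-minute "correct" track length is an invented figure for illustration; only the 12-versus-1 split and the 7 minutes of silence come from the comment. The median uses the middle-element selection quoted earlier in the thread.

```python
# Hypothetical lengths in milliseconds.
correct = 4 * 60_000            # assumed true length of the track: 4:00
padded = correct + 7 * 60_000   # same track with 7 minutes of trailing silence

lengths = [padded] * 12 + [correct]  # 12 padded printings, 1 correct printing
sorted_lengths = sorted(lengths)

median = sorted_lengths[len(sorted_lengths) // 2]  # middle element of 13
shortest = sorted_lengths[0]

print(median)    # prints 660000
print(shortest)  # prints 240000
```

Here the median lands on the padded length because the padded printings are in the majority, while the shortest value recovers the unpadded one. The shortest value would in turn be wrong if any printing carried a truncated or mistyped time.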


PATATE12 added a comment - 26/Nov/13 09:04 PM

I wouldn't like us to copy Recording durations to Tracklists where the duration is missing, because I purposely left many durations empty when they did not come from a physical release in hand.

Apart from that, I agree with your remarks: longer tracks are often padded with silence, especially the last tracks before bonus tracks.
Also, times from a CD TOC should count twice as much in the computation.


nikki added a comment - 27/Nov/13 04:58 AM

No, we are not copying durations to tracks. The functionality in the release editor to copy the recording durations to the tracks is not going away either.

The method for selecting the duration was already discussed and decided. The median is not perfect, but neither are any of the other possibilities.


monxton added a comment - 27/Nov/13 11:05 AM

FWIW (which I know is as good as nothing) I think the chosen solution is a miserable compromise.



Ian McEwen added a comment - 15/Apr/14 05:10 AM

Merged!