As this ticket has open points, I'm moving it to decision required. As per http://blog.musicbrainz.org/?p=1301 I'm not aware of any objections to being able to use ISO 639-3 codes. This ticket has a number of votes, and there have been a number of requests in other places for languages only available in ISO 639-3.

My proposal is: The list of languages we use should be based on ISO 639-3. This would affect release languages and, if implemented, work lyrics languages (see Addressing point 1: Addressing point 2: Addressing point 3:

We should also make sure the releases on http://musicbrainz.org/search?query=lang%3A%28bih+him+afa+alg+apa+ath+aus+bad+bai+bat+ber+bnt+btk+cai+cau+cel+cmc+col+cpe+cpf+cpp+crp+cus+day+dra+fiu+gem+ijo+inc+ine+ira+iro+kar+khi+kro+map+mkh+mno+mun+myn+nah+nai+nic+nub+oto+paa+phi+pra+roa+sai+sal+sem+sgn+sio+sit+sla+smi+son+ssa+tai+tup+tut+wak+wen+ypk+znd%29&type=release&limit=25&advanced=1 Then we should set the frequency of those codes to 0 (although I think other than 'art', they are set to 0 already). I presume edits store old data and therefore we wouldn't want to remove those rows.

Notes: 2. I assume we would implement this as a new column. The existing column names are confusing though, as they're based on the number of letters, not the ISO version; perhaps that's something we should fix if we're doing a schema change anyway.

I'm not sure it makes sense to keep the old ISO 639-2 columns if we're not using them anywhere; as far as I know we aren't using iso_code_2 or iso_code_3b now, only iso_code_3t. We only have 186 iso_code_2 values in the DB anyway (of 485 languages). Pending your and developer approval, I'd say changing the language schema to (id serial primary key, iso_code char(3) not null, name varchar(100) not null, frequency int not null default 0) makes more sense. This also makes the language and script tables more alike (script has iso_number as well; not sure what that is/if it's used).
Pardon, looks like iso_code_3b is used by /ws/1, so we need a plan there. How would /ws/2 deal with this change, too?

added schema change component

/ws/1 shouldn't even be using 2/B, I've entered

I don't think we can remove the 639-2/T codes. As I said: "I presume edits store old data and therefore we wouldn't want to remove those rows." I had a look at http://musicbrainz.org/edit/8144388

{"language_id":"91","script_id":"28","old":[{"release_name":"Bwarouz","language_id":"0","release_ids":["612553"],"script_id":"0"}]}
so the name in the edit is coming from id 91 in the language table. The edits don't use ISO codes though, so it shouldn't be a problem for displaying the name. We just need to make sure current data can be displayed properly when ISO codes are used, e.g. by the WS. My suggestion of implementing MBS-4342 so that Picard can use it to figure out which codes are valid would also rely on both 639-2 and 639-3 codes being present and returned.

If we use the ISO versions for naming the columns, i.e. iso_code_1, iso_code_2t, iso_code_2b and iso_code_3, it would be clearer. I'm not sure about removing the iso_code_2 and iso_code_3b columns. I just have a feeling that we might find them useful later on when we do more work on i18n. It would be really annoying if we removed them only to realise a few months later that we need a mapping between 639-1 and 639-3.

This ticket is under consideration for the May 15th schema change release. However, it is underspecified at this point in time. In order to keep this ticket in consideration for the release, please do the following:
User interface changes: No visible changes. Templates which display the language ISO code should be updated to use the value from the new column.

Database changes: Add one column, "iso_code_3", to the language table. Ideally we would also rename the existing columns because they're misleadingly named (see my previous comment), but it is not a requirement.

So ISO 639-3 is essentially a superset of ISO 639-2. I'm not sure it is really an issue that TLAN specifically says ISO-639-2, because ISO-639-3 just didn't exist when the spec

The webservice (lookup and search) also needs updating to read from the new column; I'm unsure whether it should always return the ISO-639-3 value instead of the ISO-639-2 value, or both.

Do I understand correctly that the suggestion is to have a single language concept that is used both for releases (written language of track list) and works (vocal language of lyrics)? It's a bit tricky to say that a track list is in a specific dialect; are we sure that all ISO 639-3 codes actually make sense when applied to text? In other words, should we have a separate set of languages to represent text and audio?

Yes, that's correct. I think the vast majority of ISO 639-3 codes would make sense for text at some point (I assume you're thinking of Chinese, in which case I would point out things like http://musicbrainz.org/release-group/5214f8d2-138a-333f-b50f-814a61856934 If we do end up wanting to completely prevent people from using some codes for some situations, I don't think it would make sense to use separate lists, since the majority of the list would be identical. Instead I would just adjust the (poorly named) "frequency" column we already use to determine when languages are shown.

OK, I guess we can deal with the problem when it occurs, by blacklisting certain combinations, or something.
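For reference, the column renaming suggested above can be written down as an explicit old-to-new mapping. This is only a sketch: the mapping is inferred from the comments in this ticket (the existing names count letters, the proposed names follow the ISO 639 part) and has not been agreed on, and the helper `rename_statements` is a hypothetical name used for illustration.

```python
# Inferred mapping of existing column names to the names proposed in this
# ticket. This is an assumption drawn from the discussion, not a decision.
RENAMES = {
    "iso_code_2": "iso_code_1",    # two-letter codes are ISO 639-1
    "iso_code_3t": "iso_code_2t",  # three-letter terminology codes are ISO 639-2/T
    "iso_code_3b": "iso_code_2b",  # three-letter bibliographic codes are ISO 639-2/B
}


def rename_statements(table="language", renames=RENAMES):
    """Build one ALTER TABLE ... RENAME COLUMN statement per mapping entry."""
    return [
        f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
        for old, new in renames.items()
    ]


if __name__ == "__main__":
    for statement in rename_statements():
        print(statement)
```

None of the new names collides with an existing one, so the statements can run in any order; the new iso_code_3 column would be added separately.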
So far, it seems the suggestion is: ALTER TABLE language ADD COLUMN iso_code_3 CHAR(3) NOT NULL; I'm in favour of renaming the other columns at the same time; could you please provide a mapping of old column name to new column name? Does this ticket require any UI changes?

Not that I can see: same list, more contents. As mentioned in nikki's comment, I think the changes to our web services need to be discussed, however. Are we adding a new element/attribute for this code, or replacing the iso_code_3t value currently shown (which is really 2t) with ISO 639-3?

I discussed the question raised by ocharles in the previous comment with nikki and ianmcorvidae on IRC. We concluded that we should just use the code from the new iso_code_3 column in the existing <language> element of the XML. Only 67 Part 2/T codes have no equivalent value in Part 3. These languages will remain in the database with NULL in the iso_code_3 column. As I implemented these changes, the webservice will emit the Part 2/T code if the Part 3 code is NULL, to avoid crashing. After the affected releases have been migrated we may want to remove these codes from the table, so that the iso_code_3 column can get NOT NULL and UNIQUE constraints; after that change the fallback to Part 2/T can be removed from the webservice code.

Can I request the ISO 639-3 code "ksh" to be activated for the schema change release? It's the dialect from Cologne, Germany, and there are lots of artists with releases in that dialect, e.g.:
http://chatlogs.musicbrainz.org/musicbrainz-devel/2011/2011-12/2011-12-05.html
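
The webservice fallback described above (emit the Part 3 code, or the Part 2/T code when the Part 3 value is NULL) can be sketched roughly as follows. This is not the actual server code: the function name `language_code` and its parameter names are hypothetical, and the example values are for illustration only.

```python
from typing import Optional


def language_code(part3: Optional[str], part2t: Optional[str]) -> Optional[str]:
    """Pick the code to emit in the <language> element.

    Prefer the ISO 639-3 code; fall back to the ISO 639-2/T code for
    languages whose Part 3 value is NULL (None here).
    """
    if part3 is not None:
        return part3
    return part2t


if __name__ == "__main__":
    # Illustrative values: a language with a Part 3 code, and a collective
    # code such as 'art' that only has a Part 2/T code.
    print(language_code("deu", "deu"))
    print(language_code(None, "art"))
```

Once the iso_code_3 column can be made NOT NULL, the fallback branch becomes dead code and can be dropped, as noted in the comment above.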