
Key: MBS-4241
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Normal
Assignee: Ian McEwen
Reporter: Ian McEwen
Votes: 0
Watchers: 0
MusicBrainz Server

msgids for translation should not use anything but ASCII

Created: 26/Jan/12 02:44 AM   Updated: 09/Jul/12 07:14 AM   Resolved: 09/Jul/12 07:14 AM
Component/s: Internationalization
Affects Version/s: None
Fix Version/s: Bug fixes, 2012-07-09

Issue Links:
Duplicate: this issue is duplicated by MBS-4651

Description

http://ml.imperia.org/libintl-perl/2005/08/16/6289725913ab0bbaa57235297ed06db9/
https://rt.cpan.org/Public/Bug/Display.html?id=49758

See also http://i18n.mbsandbox.org/: any msgid containing a literal em-dash (rather than the HTML entity) fails to translate, and the same goes for smart quotes; I haven't looked further. A good example is the blurb starting "Like Wikipedia ..." on the front page, along with the three links below it. Those are translated in en-aq (our testing pseudo-translation) and de at least, and probably in many others.

This is a side effect of needing to use the pure-Perl gettext module (Locale::gettext_pp). Non-ASCII msgids work fine with the XS version, but that one doesn't allow the language to change per request: it caches the language of the first request and uses it from then on. (At least we think so; it may be that our way of language-switching isn't doing anything, and we should use nl_putenv and LC_ALL rather than setting $ENV{LANGUAGE}.) Clearly one thing or the other needs changing. I have no clear way to tackle the XS language-caching issue (whichever way it plays out), hence this ticket.



Ian McEwen added a comment - 26/Jan/12 11:20 PM

https://gist.github.com/1685724 is a diff showing every place where we currently have non-ASCII msgids (based on the mb_server.pot in use on Transifex as of this writing).
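
A minimal sketch of how non-ASCII msgids like these could be listed from a .pot file. It is written in Python rather than the server's Perl, and it is not the method used to produce the gist above. The function name non_ascii_msgids and the sample catalog text are hypothetical, and the parser only handles plain msgid / msgid_plural entries with quoted continuation lines.

```python
import re

# Matches the first double-quoted string on a line, allowing backslash escapes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def non_ascii_msgids(pot_text):
    """Return (line_number, msgid) pairs for msgids that are not pure ASCII."""
    found = []
    current = None  # text of the msgid being accumulated, or None
    start = 0       # line number where the current msgid began

    def flush():
        nonlocal current
        if current is not None and not current.isascii():
            found.append((start, current))
        current = None

    for lineno, line in enumerate(pot_text.splitlines(), start=1):
        stripped = line.strip()
        match = QUOTED.search(stripped)
        if stripped.startswith(("msgid ", "msgid_plural ")):
            flush()
            current = match.group(1) if match else ""
            start = lineno
        elif stripped.startswith('"') and current is not None:
            # Continuation line of a multi-line msgid.
            current += match.group(1) if match else ""
        else:
            # msgstr, comments, blank lines: the msgid (if any) is complete.
            flush()
    flush()
    return found


if __name__ == "__main__":
    sample = 'msgid "Hello"\nmsgstr ""\n\nmsgid "Like Wikipedia \u2014 "\n"continued"\nmsgstr ""\n'
    for lineno, msgid in non_ascii_msgids(sample):
        print(lineno, msgid)
```

Escape sequences inside msgids are left undecoded, which is enough for an ASCII check because .pot escapes are themselves ASCII.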


Ian McEwen added a comment - 31/Jan/12 08:48 PM

Assigning to Oliver to solicit feedback, as he asked me to.


Ian McEwen made changes - 31/Jan/12 08:48 PM
Field Original Value New Value
Assignee Oliver Charles [ acid2 ]


Ian McEwen added a comment - 31/Jan/12 08:52 PM

Note: this may in fact be two bugs. We should probably use US-ASCII for msgids anyway, for the sake of ease of use for coders, regardless of how we solve the problem that Locale::gettext_pp doesn't translate non-US-ASCII msgids.


Oliver Charles added a comment - 03/Feb/12 04:11 PM

Things shouldn't be breaking, and working around the breakage is not a solution. I don't mind moving to US-ASCII for msgids, which seems logical, but accidentally copying a certain character in should not break things. I don't like the sound of munging $ENV to change the language; that sounds wrong, so I'd look into a different approach there.


Oliver Charles made changes - 03/Feb/12 04:11 PM
Assignee Oliver Charles [ acid2 ] Ian McEwen [ ianmcorvidae ]


Ian McEwen added a comment - 03/Feb/12 11:56 PM

All available i18n libraries for Perl have this or other problems. It really seems like we have exactly zero good options.

I'll defer to pronik's rant on the topic: http://rassie.org/archives/247

http://stackoverflow.com/questions/4441399/perl-utf-8-problems-with-localemaketext is also an interesting read. At one point it links to https://metacpan.org/module/Locale::Maketext::Gettext, which points out that Locale::Maketext, by default, isn't multibyte-safe (even for translations!). Falling back to Gettext message catalogs avoids this, but means we have to use Maketext's format, which has problems (see pronik's post) that make Gettext a better idea.

Gettext messes with the environment either way: gettext_xs goes through nl_putenv (and seems to have other lurking problems), while gettext_pp means playing with $ENV and can't do non-ASCII msgids. That is a problem for anything multithreaded (as noted in a comment on pronik's rant).
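
To make the multithreading concern concrete, here is a small sketch in Python (an analogy only, not libintl-perl's actual behaviour or API). It contrasts a lookup driven by the process-wide LANGUAGE environment variable with one that takes the language as an explicit argument. The CATALOGS dictionary and all function names are hypothetical, and the interleaving is simulated sequentially rather than with real threads so the outcome is deterministic.

```python
import os

# Hypothetical in-memory catalogs standing in for compiled message files.
CATALOGS = {
    "de": {"Hello": "Hallo"},
    "fr": {"Hello": "Bonjour"},
}


def translate_via_environment(msgid):
    """Pick the language from process-wide state, like an env-driven lookup."""
    lang = os.environ.get("LANGUAGE", "")
    return CATALOGS.get(lang, {}).get(msgid, msgid)


def translate_explicit(lang, msgid):
    """Pick the language from an argument, so no shared state is involved."""
    return CATALOGS.get(lang, {}).get(msgid, msgid)


def interleaved_requests():
    """Simulate request A being interrupted by request B before it renders."""
    os.environ["LANGUAGE"] = "de"  # request A selects German
    os.environ["LANGUAGE"] = "fr"  # request B selects French before A renders
    a_env = translate_via_environment("Hello")    # A now sees B's language
    a_explicit = translate_explicit("de", "Hello")  # A keeps its own language
    return a_env, a_explicit


if __name__ == "__main__":
    print(interleaved_requests())
```

The environment-driven lookup returns request B's language for request A, while the explicit variant is unaffected by what other requests have set.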

Clearly we need to write Locale::Maketext::Gettext::Maketext::Gettext::Maketext::Gettext, that passes our strings back and forth between the two as many times as possible in order to avoid all the bugs in both implementations.


Ian McEwen added a comment - 04/Feb/12 12:11 AM

We may be able to circumvent the environment problem by using dcgettext and dcngettext instead of dgettext and dngettext, ref: http://search.cpan.org/dist/libintl-perl/lib/Locale/Messages.pm


Ian McEwen made changes - 04/May/12 02:37 PM
Link This issue is duplicated by MBS-4651 [ MBS-4651 ]


Ian McEwen added a comment - 12/Jun/12 06:17 AM

Okay, so, found a solution: Locale::Util has a web_set_locale function (the web_ prefix is a bit deceptive; it just takes a list of languages and a character set) that seems to switch languages correctly under gettext_xs, which doesn't explode on non-ASCII msgids.

I'll submit a patch once http://codereview.musicbrainz.org/r/1941/ is done.


Ian McEwen made changes - 12/Jun/12 06:31 PM
Status Open [ 1 ] Review Submitted [ 5 ]


Oliver Charles made changes - 25/Jun/12 10:49 AM
Status Review Submitted [ 5 ] In Beta Testing [ 10002 ]


Oliver Charles added a comment - 25/Jun/12 10:49 AM

This is in beta testing with the origin/split-domains branch (one merge conflict addressed since Ian's branch).


Oliver Charles made changes - 25/Jun/12 10:49 AM
Fix Version/s Bug fixes, 2012-07-09 [ 10147 ]


Oliver Charles made changes - 09/Jul/12 07:14 AM
Status In Beta Testing [ 10002 ] Closed [ 6 ]
Resolution Fixed [ 1 ]