copied from http://bugs.musicbrainz.org/ticket/4396
It would be really nice if lines in the chat logs that are actually UTF-8 were displayed as such. At the moment all lines are converted from latin1; it would be better if only lines that aren't valid UTF-8 were converted.
UTF-8 displaying as complete gibberish: http://chatlogs.musicbrainz.org/2009/2009-01/2009-01-09.html#T01-01-59-679352
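The behaviour requested above can be sketched roughly as follows. This is only an illustration, not the chatlog code itself: the function name `decode_log_line` is made up for the example, and it assumes each log line is available as raw bytes before any conversion happens.

```python
def decode_log_line(raw: bytes) -> str:
    """Decode one raw chat-log line, preferring UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid UTF-8 sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this fallback cannot fail.
        return raw.decode("latin-1")
```

A line that happens to be valid UTF-8 is always treated as UTF-8, so a latin1 line whose bytes also form valid UTF-8 would be misread. In practice that is rare for ordinary text.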
But this problem is already solved now, isn't it?
I don't see any more garbage text in the chatlogs. Do you?
Looking at the Trac ticket, it seems to have been left open because the archives didn't get converted. If we're not going to convert the archives, then it's fixed. If we are, then it's not.
Re Trac: "If anyone has any suggestions for how to auto-decode byte-soup using the correct encoding (and thence to utf-8), then this can be fixed more fully." Mozilla's chardet might do this for us.
Also, a cleanup script should definitely be run on the archives. If someone can point me to where the relevant code lives, I can look into it.