Currently there is no charset in the Content-Type HTTP response header. That is bad.
It causes problems with non-ASCII characters: http://forum.openstreetmap.org/viewforum.php?id=21
Different messages are in different encodings (koi8-r, cp1251, utf-8), and because of this problem the forum is almost unusable…
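To illustrate why the missing charset hurts, here is a minimal Python sketch (not taken from the forum software; the sample word and the helper name guess_view are just assumptions for the demo). It encodes one Russian word in the three encodings mentioned above and shows what a reader sees when the browser assumes the wrong one.

```python
# The same Russian word, encoded the three ways seen on the forum.
text = "Привет"
raw = {name: text.encode(name) for name in ("koi8-r", "cp1251", "utf-8")}


def guess_view(data: bytes, assumed: str) -> str:
    """Decode bytes the way a browser would if it assumed `assumed`.

    Hypothetical helper for this demo; undecodable bytes become U+FFFD.
    """
    return data.decode(assumed, errors="replace")


# Only the diagonal (sent and read in the same encoding) is readable.
for actual, data in raw.items():
    for assumed in raw:
        print(f"sent as {actual:7} read as {assumed:7} -> {guess_view(data, assumed)}")
```

Without a charset in the header the browser has to guess, so on a page that mixes posts in all three encodings at least some of them will always come out wrong.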
Actually, my point wasn’t that I have a problem with it, and I would never demand something like that from people working on this worthy project. I just thought that if I had problems with it, surely other names with more non-standard letters must be very problematic.
I apologize if it was taken as a demand.
The simplest (but maybe not the best) way to fix this problem is to add this line to php.ini:
default_charset = "utf-8"
But it may have side effects for other sites that use the same php.ini (in that case, php_value default_charset "utf-8" can be added to Apache’s httpd.conf for the forum only).
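A quick way to check whether such a change took effect is to look at the Content-Type value the server sends back. The sketch below only parses a header string using Python's standard email.message module; the function name charset_of and the sample header values are assumptions for illustration, and fetching the real header from the forum is left out.

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper: it reuses the standard library's MIME header
    parsing instead of splitting the string by hand.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Header without a charset, as described in the first post.
print(charset_of("text/html"))
# Header as it should look once default_charset is set.
print(charset_of("text/html; charset=utf-8"))
```

If the first form is what the forum returns, the browser is still left guessing the encoding.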
I think the reason is that people post to the HTML form with bad encodings, i.e. normally punbb will see that you are using e.g. utf-8 and convert that to HTML entities and latin-1.
If I paste Russian into a form, it works OK for me by default.
If I change the page encoding to utf-8, I’ll get gibberish…
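Roughly what that conversion looks like, as a Python sketch. This only demonstrates the idea of Latin-1 plus numeric HTML entities; whether punbb or the browser performs the conversion is not established here, and the sample word is an assumption.

```python
import html

text = "Привет"

# Characters that Latin-1 cannot represent are written as &#NNNN; references.
as_entities = text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")
print(as_entities)

# Reading the references back gives the original text again.
restored = html.unescape(as_entities)
print(restored)

# If the form is submitted as raw UTF-8 instead and those bytes are later
# treated as Latin-1, the result is the gibberish described above.
garbled = text.encode("utf-8").decode("latin-1")
print(repr(garbled))
```

So the entity form survives a Latin-1 page, while raw UTF-8 bytes on the same page do not.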
I think I made the proper changes to get the UTF8 encoding working. At least I see the username of Morten Juhl-Johansen Zölde-Fejér correctly now. Unfortunately, some existing posts appear to be unreadable (like in this thread). I have no idea what to do about this.
A move to another forum software version is still planned because this forum version is not designed to work properly with UTF8.
Right, the hosting provider has processed my ticket requesting a MySQL 5 database, and I’ve already confirmed UTF8 actually working on that forum. So the next thing is to copy the forum, confirm that everything is working and make the forum migration permanent.
This will require the forum to become read-only for a while though.
Using Notepad++, some garbled posts were fixed all right, but others became garbled even though they were perfectly readable in the first place. So there’s no simple solution.
Luckily, I found a post about a guy who had exactly the same problem as I do: The forum database has some characters in utf8, some characters in UTF-8, some in the database as HTML equivalents (&#20998;) and some characters that are just a total mystery.
So, it looks like I need to close down the forum for a few days to sort things out for good. The other guy writes that he had a lot of manual fixes to do besides some automated fixes. I think I’m only going to perform those changes that can be done automatically, which should cover most things anyway.
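For what it's worth, an automated pass of this kind could look something like the Python sketch below. It is not the script actually used for the migration. The function name repair and the fallback parameter are hypothetical, and it assumes the legacy encoding is known per subforum (for example cp1251 for Russia), because cp1251 and koi8-r cannot be told apart reliably from the bytes alone.

```python
import html


def repair(raw: bytes, fallback: str = "cp1251") -> str:
    """Best-effort conversion of one stored post to a Unicode string.

    Hypothetical helper. Valid UTF-8 is kept as it is; anything else is
    assumed to be in `fallback`, which the caller picks per subforum.
    """
    try:
        text = raw.decode("utf-8")  # strict: raises on non-UTF-8 bytes
    except UnicodeDecodeError:
        text = raw.decode(fallback, errors="replace")
    # Turn numeric references such as &#20998; back into real characters.
    return html.unescape(text)


print(repair("Привет".encode("utf-8")))
print(repair("Привет".encode("cp1251"), fallback="cp1251"))
print(repair(b"&#20998;"))
```

Note that html.unescape also converts references like &lt; back into markup characters, so this is only safe if posts are escaped again on output. Posts that were encoded twice, or that mix encodings within one message, would still need the manual treatment.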
Moving the forum to a new database has begun. Posting replies won’t be possible during the move. I’m currently busy correcting every post in a non-UTF-8 charset that has been posted in the last few days (from 21 February on). Russia, Finland, etc. are finished, but I’m still busy processing Germany. Please be patient…
The forum migration is complete. Most messages were migrated successfully; however, a few still contain weird characters. I hope the migration wasn’t too inconvenient.
The forum website has been adapted to do all its queries using UTF-8, and the database tables have been changed to use the utf8_unicode_ci collation by default instead of utf8_swedish (which is why I needed the forum to move to a new database: the old database did not have the UTF-8 charset).
All data has been reimported and manually converted to the correct UTF-8 charset where needed (in particular, most posts since 21 February needed manual treatment).
Anyway, I think the forum now fully supports UTF-8. Enjoy!