Text encoding on forum - please switch to UTF-8

Please make this forum UTF-8 clean.

Currently there is no charset in the Content-Type HTTP response header, which is bad.

This causes problems with non-ASCII characters:
http://forum.openstreetmap.org/viewforum.php?id=21
Different messages are in different encodings (KOI8-R, CP1251, UTF-8). Because of this problem, this forum is almost unusable…
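
To make the complaint concrete, here is a minimal Python sketch of checking whether a Content-Type header value declares a charset. The helper name declared_charset is made up for illustration, and the header strings are example values, not captured from the forum.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when absent
    return msg.get_content_charset()


# Example header values (assumed, not taken from the real server)
print(declared_charset("text/html"))
print(declared_charset("text/html; charset=UTF-8"))
```

When the function returns None, the browser has to guess the encoding, which is how one page can end up showing posts in several encodings at once.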

It certainly doesn’t do much for my name.
Surely it must be quite a problem here?

Not sure how easy this is with the forum. I’ll try to make some time free for it, but have patience. I’m quite busy lately.

Hmm, the forum must change because your username is displayed somewhat garbled. Mmmmkay… :confused:

Actually, my point wasn’t that I have a problem with it, and I would never demand something like that from people working on this worthy project. I just thought that if I had problems with it, surely other names with more non-standard letters must be very problematic.
I apologize if it was taken as a demand.

I guess I misinterpreted your post, so I apologize for my strong reaction too

I would appreciate it if someone has some good tips/links for adding utf-8 support to PunBB.

Never dealt with PunBB, but this might be a good starting point: http://www.punres.org/viewtopic.php?id=4393

The simplest (but maybe not the best) way to fix this problem is to add this line to php.ini:

default_charset = "utf-8"

But it may have side effects for other sites that use the same php.ini. In that case, php_value default_charset "utf-8" can be added to Apache’s httpd.conf for the forum only.

I think the reason is that people post to the HTML form with bad encodings, i.e. normally PunBB will see that you are using e.g. UTF-8 and convert that to HTML entities and Latin-1.

  1. If I paste Russian into a form, it works OK for me by default.
  2. If I change the page encoding to UTF-8, I’ll get gibberish…

Sorry no more time for tests.

There is one thread that has the problem:
http://forum.openstreetmap.org/viewtopic.php?id=1375

The web page has the header:

And it encodes UTF-8 characters in the Russian forum like this:
берего ← These should be HTML entities :slight_smile: something like: & # 12123 ;

What about UTF8? Any updates?

I think I made the proper changes to get the UTF8 encoding working. At least I see the username of Morten Juhl-Johansen Zölde-Fejér correctly now. Unfortunately, some existing posts appear to be unreadable (like in this thread). I have no idea what to do about this.

A move to another forum software version is still planned because this forum version is not designed to work properly with UTF8.

Right, the hosting provider has processed my ticket requesting a MySQL 5 database, and I’ve already confirmed UTF8 actually working on that forum. So the next thing is to copy the forum, confirm that everything is working and make the forum migration permanent.

This will require the forum to become read-only for a while though.

Perhaps your provider can assist you with the conversion of older data to UTF-8? He might have experience with this.

Well, it appears that Notepad++ has some nice functions which assist in re-encoding the posts… I have not completed my tests yet, though.

Using Notepad++, some garbled posts were fixed all right, but others became garbled even though they were perfectly readable in the first place. So there’s no simple solution.

Luckily, I found a post by a guy who had exactly the same problem as I do: the forum database has some characters in utf8, some characters in UTF-8, some stored as HTML equivalents (& # 20998 ; ) and some characters that are just a total mystery.

http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_utf8_t.html
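
A rough Python sketch of why a blanket re-encode damages posts that were already fine, and of a more cautious per-post approach. This is not the method from the linked article or the one used for this forum; repair_post is a hypothetical helper, and it only handles two cases: HTML entities, and UTF-8 that was misread as Windows-1252. Posts in KOI8-R or CP1251 are outside its scope.

```python
import html


def repair_post(text):
    """Best-effort cleanup of one post; returns the text unchanged when unsure."""
    # Step 1: turn HTML entities such as &#20998; into real characters.
    text = html.unescape(text)
    # Step 2: undo "UTF-8 read as Windows-1252", but only if the round trip
    # yields valid UTF-8. Otherwise keep the text as it is.
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_post("ZÃ¶lde-FejÃ©r"))  # garbled input
print(repair_post("Zölde-Fejér"))    # already correct input
```

The try/except is the important part: text that is already correct usually fails the round trip and is left alone, whereas converting every post unconditionally is what garbles the readable ones. Note that html.unescape also decodes entities like &lt;, so it would need care on a real database where those are stored on purpose.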

Unfortunately, the encoding is wrong in all sorts of tables and fields, e.g. the osm_users table shows for one user:

username = Morten Juhl-Johansen Zölde-Fejér
realname = Morten Juhl-Johansen Zölde-Fejér

So, it looks like I need to close down the forum for a few days to sort things out for good. The other guy writes that he had a lot of manual fixes to do besides some automated fixes. I think I’m only going to perform those changes that can be done automatically, which should cover most things anyway.

Moving the forum to a new database has begun. Posting replies won’t be possible during the move. I’m currently busy correcting every post in a non-UTF-8 charset that was posted in the last few days (from 21 February on). Russia, Finland, etc. are finished, but I’m still busy processing Germany. Please be patient…

The forum migration is complete. Most messages could successfully be migrated, however a few still contain weird characters. I hope the migration wasn’t too inconvenient.

The forum website has been adapted to do all its queries using UTF-8, and the database tables have been changed to use the utf8_unicode_ci collation by default instead of utf8_swedish. This is why I needed the forum to move to a new database: the old database did not have the UTF-8 charset.

All data has been reimported and manually converted to the correct UTF-8 charset where needed (in particular, most posts since 21 February needed manual treatment).

Anyway, I think the forum now fully supports UTF-8. Enjoy!

Good work, but I’m going to miss the Swedish tables, or smorgasbords as we call them … :smiley:

That is great - thank you.