Detecting UTF-8 correctly *solved*
Site Admin
Joined: 13 Jul 2003 |
Posts: 8344 |
|
|
|
Posted: Tue Mar 07, 2006 7:47 am |
|
|
|
|
|
I would recommend splitting mixed file(s) into separate files, each with its own encoding.
Quote: | UTF-8 encoded files are not detected as such |
PhpED does not attempt to "detect" anything. It just follows your instructions. If you set Default system encoding, it will be used.
Quote: | PHPEd does not add a BOM |
BOM stands for Byte Order Mark and it has no relation to single-byte encodings like ISO-8859-1 or UTF-8.
If you need BOM'ed encodings, use the UTF-16 family, which has LE (little endian) and BE (big endian).
|
|
|
| | |
Joined: 08 Mar 2006 |
Posts: 63 |
|
|
|
Posted: Tue Mar 07, 2006 8:18 am |
|
|
|
|
|
ddmitrie wrote: | I would recommend splitting mixed file(s) into separate files, each with its own encoding. |
It seems we have a serious misunderstanding here. Of course I am not trying to use different encodings in the same file. That is obviously impossible.
ddmitrie wrote: | Quote: | UTF-8 encoded files are not detected as such |
PhpED does not attempt to "detect" anything. It just follows your instructions. If you set Default system encoding, it will be used. |
Well, if that is the case, how can PHPEd know that a file is UTF-8 after it has been saved as such (from within PHPEd)? It is still just a bunch of bytes.
ddmitrie wrote: | Quote: | PHPEd does not add a BOM |
BOM stands for Byte Order Mark and it has no relation to single-byte encodings like ISO-8859-1 or UTF-8.
If you need BOM'ed encodings, use the UTF-16 family, which has LE (little endian) and BE (big endian). |
No, I know that. I just mentioned that PHPEd doesn't add a BOM (which certainly can be used for UTF-8 files) to show that the file content hasn't changed at all, but still my UTF-8 files were not detected as such until after I saved them from within PHPEd.
|
|
|
| | |
Site Admin
Joined: 13 Jul 2003 |
Posts: 8344 |
|
|
|
Posted: Tue Mar 07, 2006 8:48 am |
|
|
|
|
|
Quote: | Well, if that is the case, how can PHPEd know that a file is UTF-8 after it has been saved as such (from within PHPEd)? It is still just a bunch of bytes |
PhpED remembered that a different encoding (UTF-8) was selected when you saved the file.
The next time you open the file, it applies this encoding instead of the "system default".
Quote: | No, I know that. I just mentioned that PHPEd doesn't add a BOM (which certainly can be used for UTF-8 files) |
A BOM mostly makes sense for encodings that use 2 or 4 bytes per symbol, and since UTF-8 is a single-byte encoding, the use of a BOM with it is unknown to me.
For example, many XML files are utf-8 encoded. Have you ever seen any BOMs in them?
|
|
|
| | |
Joined: 08 Mar 2006 |
Posts: 63 |
|
|
|
Posted: Tue Mar 07, 2006 8:59 am |
|
|
|
|
|
ddmitrie wrote: | Quote: | Well, if that is the case, how can PHPEd know that a file is UTF-8 after it has been saved as such (from within PHPEd)? It is still just a bunch of bytes |
PhpED remembered that a different encoding (UTF-8) was selected when you saved the file.
The next time you open the file, it applies this encoding instead of the "system default". |
OK, so the answer to my question is, no, PHPEdit doesn't know which files are UTF-8 encoded until they have been saved as such by the program itself. That is pretty inconvenient. Other editors can usually detect the encoding using some suitable heuristics (such as detecting a BOM mark).
ddmitrie wrote: | Quote: | No, I know that. I just mentioned that PHPEd doesn't add a BOM (which certainly can be used for UTF-8 files) |
A BOM mostly makes sense for encodings that use 2 or 4 bytes per symbol, and since UTF-8 is a single-byte encoding, the use of a BOM with it is unknown to me.
For example, many XML files are utf-8 encoded. Have you ever seen any BOMs in them? |
Information about Byte Order Marks: http://www.unicode.org/faq/utf_bom.html#25
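The BOM heuristic mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PhpED or any other editor implements it; the function name `detect_bom` and the returned encoding labels are assumptions made for this example. The byte signatures themselves come from Python's standard `codecs` module.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM starts with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM,
    otherwise (None, 0). Hypothetical helper for illustration only."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A file without a BOM simply yields `(None, 0)`, so a check like this can only confirm an encoding, never rule one out.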
|
|
|
| | |
Site Admin
Joined: 13 Jul 2003 |
Posts: 8344 |
|
|
|
Posted: Tue Mar 07, 2006 9:09 am |
|
|
|
|
|
First, it's PhpED, not PHPEdit.
And it's true that it does not use a BOM.
The BOM is really rarely used, so I personally do not think it's a big deal at all.
If you have a UTF-8 file, just open it as UTF-8 (select UTF-8 in the appropriate combo box in the File Open dialog) and it will work fine.
Regarding "suitable heuristics", do they work reliably and return correct results in all cases?
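One common heuristic for files without a BOM is to try a strict UTF-8 decode and see whether it succeeds. A minimal Python sketch follows; the function name `looks_like_utf8` is made up for this example and nothing here describes what any particular editor does. It does not return correct results in all cases: a Latin-1 file whose non-ASCII bytes happen to form valid UTF-8 sequences (for example the two bytes C3 A9) would be misdetected, and pure ASCII files give no signal either way.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data contains non-ASCII bytes and decodes
    as strict UTF-8. Hypothetical helper for illustration only."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 too, but it says nothing about which
    # encoding the author intended, so it is reported as "not detected".
    return any(b >= 0x80 for b in data)
```

In practice, text in a single-byte encoding such as ISO-8859-1 rarely passes this check by accident, because UTF-8 lead bytes must be followed by the right number of continuation bytes.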
|
|
Site Admin
Joined: 13 Jul 2003 |
Posts: 8344 |
|
|
|
Posted: Tue Mar 07, 2006 1:43 pm |
|
|
|
|
|
fileenc.cfg contains the encodings for all the files you have opened. This file is XML and you may edit it directly.
Quote: | And as to heuristics; I would say this is a pretty good indicator, don't you agree?
<?xml version="1.0" encoding="utf-8"?> |
No doubt, but in the case of PHP it would never work.
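To illustrate both points, here is a rough Python sketch of reading the encoding from an XML declaration. The function name `xml_declared_encoding` and the 200-byte window are assumptions for this example, and it only covers ASCII-compatible encodings (a UTF-16 file would not match). A PHP source file has no such declaration, so the function returns None for it.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, optionally preceded by a UTF-8 BOM.
_XML_DECL = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^>]*?encoding\s*=\s*["\']'
    rb'([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_declared_encoding(data: bytes):
    """Return the lowercased encoding named in the XML declaration,
    or None if there is none. Hypothetical helper for illustration only."""
    m = _XML_DECL.match(data[:200])
    return m.group(1).decode("ascii").lower() if m else None
```

For `<?xml version="1.0" encoding="utf-8"?>` this yields `"utf-8"`; for a file starting with `<?php` it yields None, so some other rule has to decide.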
|
|
Site Admin
Joined: 13 Jul 2003 |
Posts: 8344 |
|
|
|
Posted: Wed Mar 08, 2006 6:26 pm |
|
|
|
|
|
We released build 4510.
It now recognizes the BOM, as well as the encoding declared in XML and HTML files.
Thanks for pointing out the problem.
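For readers wondering what recognizing the encoding for HTML can look like, here is a rough Python sketch that reads the charset from a meta tag. It is not taken from PhpED, whose implementation is not described in this thread; the function name `html_meta_charset` and the 1024-byte window are assumptions for this example.

```python
import re

# Finds charset=... inside a <meta> tag. Covers both <meta charset="...">
# and the older http-equiv form with content="text/html; charset=...".
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)


def html_meta_charset(data: bytes):
    """Return the lowercased charset declared in a meta tag near the top
    of the document, or None. Hypothetical helper for illustration only."""
    m = _META_CHARSET.search(data[:1024])
    return m.group(1).decode("ascii").lower() if m else None
```

Like the XML declaration, this only works when the declaration itself is written in an ASCII-compatible encoding.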
All times are GMT - 5 Hours