Tests of Unicode file encodings and Japanese/English display
I saved a file containing Japanese and English text with different charsets and file save types. All files have lang=ja set in the html tag. I tested them on (the browser formerly known as) Chimera 0.6, Safari beta v62 and MSIE 5.2 (all for Mac). Here are the results. Sorry this is so ugly!
- divtest4.html - charset=iso-8859-1, file format=standard
- layout and English display ok; Japanese only displays if your browser's default text encoding is Shift-JIS or Japanese MacOS.
- divtest5.html - charset=utf-8, file format=standard
- as above, but a different kind of mojibake (garbled text for the Japanese)
- divtest6.html - charset=iso-8859-1, file format=Unicode, including a Byte Order Mark and with Mac line break characters
- displays fine on Chimera and Safari, but MSIE displays the raw html code. It's interesting that the Japanese displays considering the charset - it looks like the file format is 'stronger' than the charset declaration.
- divtest7.html - charset=utf-8, file format=Unicode, + BOM/Mac line breaks
- as above
- divtest8.html - charset=utf-8, file format=Unicode, + BOM/Unix line breaks
- as above
- divtest9.html - charset=utf-8, file format=Unicode, + BOM/DOS line breaks
- as above
- divtesta.html - charset=utf-8, file format=Unicode, + BOM/Unix line breaks, plus utf-8 encoding
- both types of text and the layout now display in MSIE; however, there's an additional line at the top of the file containing a Euro symbol (€). Safari also displays a blank line (about 10px extra space over divtest9.html). The body tag is beside the content div - there's no whitespace character there. (From the BBEdit manual: "UTF-8 encoding is a more compact variant of Unicode that uses 8-bit tokens where possible to encode frequently-used sequences from the file. (This format makes it easier to view and edit content in non-Unicode-aware editors.)")
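
To make the "file format is stronger than the charset" observation a bit more concrete, here is a rough Python sketch of what a BOM-aware reader might be doing. It is only an illustration under assumptions of mine, not a description of how Chimera, Safari or MSIE actually work: I am assuming that "file format=Unicode" with a BOM means UTF-16, and that "file format=standard" means a legacy Japanese encoding such as Shift-JIS. The names sniff_bom and decode_page are made up for this example.

```python
import codecs

# Byte order marks to look for, paired with a Python codec that strips them.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return None


def decode_page(data, declared_charset):
    """Hypothetical helper: a BOM, if present, overrides the declared charset."""
    return data.decode(sniff_bom(data) or declared_charset, errors="replace")


sample = "<p>日本語 and English</p>"

# Assumption: "file format=Unicode" plus a BOM means UTF-16.
utf16_file = sample.encode("utf-16")
# The BOM wins over the (wrong) iso-8859-1 declaration, so the text survives.
bom_aware = decode_page(utf16_file, "iso-8859-1")
# A reader that ignores the BOM and trusts the declaration gets garbage.
bom_ignored = utf16_file.decode("iso-8859-1")

# UTF-8 with a BOM: a reader that does not strip the BOM is left with
# three stray characters at the very start of the page.
utf8_file = sample.encode("utf-8-sig")
stray_prefix = utf8_file.decode("iso-8859-1")[:3]

# Assumption: "file format=standard" means a legacy encoding like Shift-JIS.
sjis_file = sample.encode("shift_jis")
# No BOM here, so the declared charset decides, and each wrong charset
# produces its own kind of mojibake.
as_latin1 = sjis_file.decode("iso-8859-1")
as_utf8 = sjis_file.decode("utf-8", errors="replace")

# ascii() escapes non-ASCII characters so this prints on any terminal.
for label, text in [
    ("UTF-16, BOM respected", bom_aware),
    ("UTF-16, BOM ignored", bom_ignored[:12]),
    ("UTF-8 BOM read as iso-8859-1", stray_prefix),
    ("Shift-JIS read as iso-8859-1", as_latin1),
    ("Shift-JIS read as utf-8", as_utf8),
]:
    print(label, ascii(text))
```

If the browsers behave anything like this, it would fit the results above: with a BOM the declared charset stops mattering (divtest6 to divtest9), and without one the two charsets garble the Japanese differently (divtest4 and divtest5). The stray characters from an unstripped UTF-8 BOM might be related to the extra line in divtesta.html, but I have not verified that.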