I'm using cmd on Windows with chcp 65001. This is my code:
print u'\u0110 \u0110' + '\n'
(a character cmd can't display) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u'\u0110 \u0110'
IOError: [Errno 2] No such file or directory
But when I use this code:
print u' \u0110 \u0110' + '\n'
(a space)(the character I want) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u' \u0110 \u0110' + '\n'
IOError: [Errno 2] No such file or directory
My questions are:
- Why does Python 2.7 need a leading space when printing a Unicode character?
- How do I fix IOError: [Errno 2]?
On Windows you can’t print arbitrary strings using print.
There are some workarounds, as shown here: How to make python 3 print() utf8. But, despite the title of that question, you can’t use this to actually print UTF-8 using code page 65001: the console repeats the last few bytes after finishing (as described further down).
#! python2
import sys

enc = sys.stdout.encoding

def outputUnicode(t):
    bytes = t.encode(enc, 'replace')
    sys.stdout.write(bytes)

outputUnicode(u'The letter \u0110\n')
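To see what the 'replace' error handler in the workaround above does, independent of any console, the following minimal sketch encodes the same string for two fixed code pages. The helper name encode_for_console is hypothetical, and the code pages are passed explicitly rather than read from sys.stdout.encoding (which can be None when output is redirected).

```python
from __future__ import print_function


def encode_for_console(text, encoding):
    """Encode `text` for a console code page (hypothetical helper).

    Characters that the code page cannot represent are replaced
    with '?' instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, 'replace')


# U+0110 is missing from code page 850, so it is replaced by '?'.
print(repr(encode_for_console(u'The letter \u0110\n', 'cp850')))

# Code page 852 contains U+0110, so it survives as a single byte.
print(repr(encode_for_console(u'The letter \u0110\n', 'cp852')))
```

The trade-off is that output never fails, but characters outside the active code page are silently lost.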
You can change the code page of the console using chcp to a code page which contains the characters you want to print. In your case, for instance, run chcp 852.
These are the results on my box when I print the following strings. I’m using code page 850, which is the default for English systems:
u"\u00abHello\u00bb"   # "«Hello»"
u"\u0110"              # "Đ"
u"\u4f60\u597d"        # "你好"
u"a\u2192b\u2192c"     # "a→b→c"
The first string prints correctly, since all of its characters are in code page 850. The next three fail with an error like this:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0110' in position 0: character maps to <undefined>
Change the code page to 852 and the second command will work.
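Whether a string will print under a given code page can be checked without touching the console, by trying to encode it with the matching Python codec. This is only a sketch: the helper can_encode is a made-up name, and it assumes Python's cp850 and cp852 codecs match the console code pages of the same number.

```python
from __future__ import print_function


def can_encode(text, codepage):
    """Return True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


samples = [
    u"\u00abHello\u00bb",   # «Hello»
    u"\u0110",              # Đ
    u"\u4f60\u597d",        # 你好
    u"a\u2192b\u2192c",     # a→b→c
]

for cp in ("cp850", "cp852"):
    # One True/False per sample string, in the order listed above.
    print(cp, [can_encode(s, cp) for s in samples])
```

Only the first string encodes under cp850, the first two encode under cp852, and the last two encode under neither.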
There is a UTF-8 code page (65001), but it doesn’t work with Python 2.7.
In Python 3.4 the results are the same. If you change the code page to 65001, you’ll get slightly less broken behaviour.
\Python34\python.exe -c "print(u'a\u2192b\u2192c')"
The two extra characters (�c) are a consequence of non-standard behaviour in the C standard library on Windows. They’re a repeat of the last 2 bytes in the UTF-8 encoding of the string.
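The repeated bytes can be reproduced by hand. The sketch below encodes the same string as UTF-8 and decodes only its last two bytes, the way a console would if those bytes were written a second time. It illustrates the symptom only and does not call the Windows C library; the variable names are arbitrary.

```python
from __future__ import print_function

text = u'a\u2192b\u2192c'
encoded = text.encode('utf-8')

# Each arrow (U+2192) takes three bytes in UTF-8: E2 86 92.
print(repr(encoded))

# The last two bytes are the final byte of the second arrow plus 'c'.
tail = encoded[-2:]
print(repr(tail))

# A lone continuation byte (0x92) is not valid UTF-8, so it decodes
# to the replacement character U+FFFD, followed by 'c'.
garbage = tail.decode('utf-8', 'replace')
print(repr(garbage))
```

The decoded tail is U+FFFD followed by c, which matches the two stray characters seen after the expected output.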