pythonutf8: Handling Chinese text in Python, gb2312 <==> utf8

Published: Thursday, February 12, 2009

The package is in the attachment.
See also:
http://[email protected]

Author: quijote

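I offer this as a starting point for discussion. These are notes I collected and organized some time ago. They are rather disorganized, but also fairly comprehensive.

They cover Windows, Python 2.3, and PyQt. PyGTK and Tkinter are similar to PyQt in that they all use Unicode.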

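I think the best approach would be to build a library that works directly with the GB18030 character set.
I collected a GB18030 mapping table of more than 830 KB, so the two tables needed for both directions come to more than 1.6 MB.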

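On Windows 2000 SP3 with Python 2.2:

from Tkinter import *

# Decode the GB-encoded byte string to Unicode using the Windows ANSI code page.
w = Button(text="中国".decode("mbcs"), font="simhei", command='exit')
w.pack()
mainloop()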

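This approach treats the symptom, not the cause. Sometimes I still mix up mbcs (GB) strings and Unicode strings.

It has another drawback: because it relies on mbcs, it only works on Windows.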
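As a hedged illustration of the portability problem: "mbcs" is a Windows-only codec that follows the system's ANSI code page, whereas an explicitly named codec ("gbk" below) behaves the same on every platform. This sketch uses modern Python 3 rather than the Python 2.2 of the original, and the helper names has_codec and decode_gb are made up for the example.

```python
import codecs


def has_codec(name):
    """Return True if this Python build provides the named codec."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_gb(raw, codec="gbk"):
    """Decode GB-encoded bytes with an explicitly named codec."""
    return raw.decode(codec)


raw = "中国".encode("gbk")     # stand-in for bytes read from a GB-encoded file
label_text = decode_gb(raw)    # same result on Windows, Linux, and macOS
print(label_text)
print("mbcs available:", has_codec("mbcs"))  # normally True only on Windows
```

The decoded string could then be passed to a widget in place of "中国".decode("mbcs").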

Another solution is to install the Python codecs package:
http://sourceforge.net/projects/python-codecs/
A SourceForge project working [...]

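# Python 2 example. The first assignment is reconstructed from the output below.
gbstring = "大家好"
uni = unicode(gbstring, "gb2312")    # gb2312 bytes -> Unicode
gstring = uni.encode("gb2312")       # Unicode -> gb2312 bytes

print "Original gb2312 encoded string:"
print gbstring
print "Transcode to Unicode encoding:"
print repr(uni)
print "Print as a gb2312 encoded string:"
print gstring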

------------------------------------------------------------
Output:
Original gb2312 encoded string:
大家好
Transcode to Unicode encoding:
u'\u5927\u5bb6\u597d'
Print as a gb2312 encoded string:
大家好
------------------------------------------------------------------------------
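For readers on Python 3, where unicode() no longer exists, the same round trip can be sketched with bytes.decode and str.encode. This is a modern equivalent under that assumption, not the original script, and the variable names are illustrative.

```python
# Python 3 sketch of the gb2312 <-> Unicode round trip shown above.
gb_bytes = "大家好".encode("gb2312")   # stand-in for text read from a gb2312 source
text = gb_bytes.decode("gb2312")       # bytes -> Unicode string (str)
back = text.encode("gb2312")           # Unicode string -> gb2312 bytes again

print(ascii(text))        # '\u5927\u5bb6\u597d', the same code points as above
print(back == gb_bytes)   # True: the round trip is lossless
```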
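The drawbacks of this approach:

It is a bit cumbersome (unicode(gbstring, "gb2312")).

It only works for gb2312, not for gb18030 (there is no unicode<-->gb18030 table).
I collected a gb18030 mapping table of more than 830 KB, so the two tables needed for both directions come to more than 1.6 MB.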

The advantage is that it is very portable: it works on Windows and on other systems, and with Tkinter, PyQt, PyGTK, and wxPython alike.
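Current Python releases ship a built-in "gb18030" codec, so separate mapping tables are not needed there. The sketch below (Python 3, with a made-up helper can_encode) illustrates the gb2312-only limitation noted above: characters outside gb2312 fail with that codec but round-trip through gb18030.

```python
def can_encode(text, codec):
    """Return True if every character in text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


rare = "镕"            # a hanzi present in GBK/GB18030 but not in GB2312
beyond = "\U00020000"  # a CJK Extension B character outside the BMP

for sample in (rare, beyond):
    print(ascii(sample),
          "gb2312:", can_encode(sample, "gb2312"),
          "gb18030:", can_encode(sample, "gb18030"))

# GB18030 covers all of Unicode, so this round trip is lossless.
assert beyond.encode("gb18030").decode("gb18030") == beyond
```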

---------------------------------------------------------------------------
By the way:
eucgb2321, 2321? 2312? This one got me confused ^_^
EUCGB2321_CN is a Chinese character encoding used on Unix.

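I originally used Mr. Du Wenshan's Chinese localization package (http://dohao.org), but it is no longer updated promptly, so I had to look for another way.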

Advice from a Python developer

From: Martin v. Loewis ([email protected])
Subject: Re: Chinese language support of Python?

[...]9&atid=309579

which allows you to declare the source encoding for IDLE.

In either case, you cannot use Chinese in Unicode literals. Instead,
you should always use

unicode("chinese string", "chinese encoding")

For portability, and if your editors support it, I recommend to use
UTF-8 as the "chinese encoding".

Regards,
Martin
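Martin's advice, to always decode with an explicitly named encoding and to prefer UTF-8 for portability, can be sketched in Python 3 as follows. The sample text and variable names are illustrative assumptions; in Python 2 the decode step would be written unicode(data, "utf-8").

```python
text = "中文"

# The same text produces different bytes under different encodings ...
as_utf8 = text.encode("utf-8")
as_gb = text.encode("gb2312")
print(len(as_utf8), len(as_gb))

# ... so decoding needs the matching encoding name.
assert as_utf8.decode("utf-8") == text
assert as_gb.decode("gb2312") == text

# UTF-8 is portable because it can represent any Unicode text ...
mixed = "中文 한국어"
assert mixed.encode("utf-8").decode("utf-8") == mixed

# ... whereas a national encoding such as gb2312 cannot.
try:
    mixed.encode("gb2312")
except UnicodeEncodeError:
    print("gb2312 cannot encode the Korean part")
```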

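Here is another example. It runs under Python 2.3a1 and no longer needs .encode("gb2312").

Unicode support really does seem to have improved a great deal in Python 2.3, and this looks like the best solution available at the moment.

!!! Note: the editor must save the file in UTF-8. Text files saved this way by Windows editors usually begin with the bytes EF BB BF (the UTF-8 byte order mark); FF FE marks UTF-16 little-endian instead.

This does not run under Python 2.2!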
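A quick way to check which byte order mark a file starts with is to compare its first bytes against the constants in the standard codecs module. The sketch below is in Python 3; the function name sniff_bom and the sample byte strings are made up for illustration.

```python
import codecs

# Byte order marks defined by the standard library.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),       # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
]


def sniff_bom(data):
    """Return the codec name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xef\xbb\xbfprint"))   # UTF-8 with BOM
print(sniff_bom(b"\xff\xfep\x00"))       # UTF-16 little-endian
print(sniff_bom(b"print"))               # no BOM
```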

Someone pointed out to me that Windows fonts can be used.

exunicode.py

# -*- coding: utf-8 -*-

from Tkinter import *
w = Button(text="大家好", font=("SIMSUN", 8, 'bold'), command='exit')
w.pack()
mainloop()

3 PEP 263: Source Code Encodings
Python source files can now be declared as being in different character set
encodings. Encodings are declared by including a specially formatted comment
in the first or second line of the source file. For example, a UTF-8 file
can be declared with:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

Without such an encoding declaration, the default encoding used is ISO-8859-1,
also known as Latin1.
The encoding declaration [...]

>>> unicode(s)
u'\u6211\u4eec'
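To see which encoding a declaration comment selects, Python 3's standard tokenize module can be asked directly. This is a small sketch under the assumption of a Python 3 interpreter, where the default without a declaration is UTF-8 rather than the Latin-1 mentioned in the quoted text. The helper name declared_encoding and the sample sources are made up for the example.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to read this source text."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


utf8_src = b"#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\nprint('hi')\n"
gb_src = b"# -*- coding: gb2312 -*-\nprint('hi')\n"

print(declared_encoding(utf8_src))   # declaration on the second line
print(declared_encoding(gb_src))     # declaration on the first line
```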

The following is for PyQt:

from qt import QString

s="A string that contains just ASCII characters"
u=u"\u963f\u554a - a string with a few chinese characters"

qs=QString(s)
qu=QString(u)

print str(qs)
print str(qu)

Output:
>C:\Python22\pythonw -u unicode1.py
A string that contains just ASCII characters
阿啊 - a string with a few chinese characters
>Exit code: 0

An improved version:
from qt import QString

s="A string that contains just ASCII characters"
#u=u"\u963f\u554a - a string with a few chinese characters"
u1="我们 a string with a few chinese characters"
#u=unicode(u1)
qs=QString(s)
qu=QString(unicode("我们--a string with a few chines" ))

print str(qs)
print str(qu)

Output:
>C:\Python22\pythonw -u unicode1.py
A string that contains just ASCII characters
我们--a string with a few chines
>Exit code: 0
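The improved version calls unicode("...") without naming an encoding, so it depends on the interpreter's default encoding matching the bytes in the source file. The Python 3 sketch below illustrates that dependency with bytes.decode(), whose default is UTF-8. It is an analogy under that assumption, not PyQt code, and the variable names are illustrative.

```python
import sys

gb_bytes = "我们".encode("gb2312")   # stand-in for a GB-encoded byte string

print(sys.getdefaultencoding())      # the interpreter-wide default codec

try:
    implicit = gb_bytes.decode()     # no codec named: UTF-8 is assumed
except UnicodeDecodeError:
    implicit = None                  # the default does not match the data

explicit = gb_bytes.decode("gb2312")  # naming the codec removes the dependency
print(implicit, explicit)
```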

Also, designing the interface in Qt Designer produces a *.ui file, which is in UTF-8 format.
Use qtuic.exe in the Python directory to convert it to Python.

Also, for some reason Wenshan's patch seems to be missing sys.setappdefaultencoding?

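Appendix:
Python Multi-Byte Character Support Pack (MBCSP) 1.0

MBCSP is a multi-byte character support pack for the latest Python, 2.2.1. Its goal is to solve the multi-byte display problem in Python once and for all.

Stock Python often displays multi-byte text such as Chinese, Korean, or Japanese incorrectly. You will frequently see something like "\xc4\xe3\xba\xc3", especially when working with databases, which makes results awkward to inspect even though nothing has actually gone wrong.

I edited the Python 2.2.1 source files to produce MBCSP 1.0. It is fully compatible with Python 2.2.1 and strengthens its multi-byte handling.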
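The escaped form quoted above is simply how undecoded GB bytes are shown. A minimal Python 3 sketch, assuming the bytes are gb2312-encoded (the rows list is a made-up stand-in for a database result):

```python
raw = b"\xc4\xe3\xba\xc3"     # the byte string quoted above

print(repr(raw))              # undecoded: shown as escaped bytes
print(raw.decode("gb2312"))   # decoded: shown as readable text

# Decoding every byte-string column of a (hypothetical) result set.
rows = [(1, raw)]
readable = [(key, value.decode("gb2312")) for key, value in rows]
print(readable)
```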

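There are two ways to install MBCSP, and both require Python 2.2.1 to be installed first. If you prefer to run an installer, use mbcsp100-py221.exe and follow its steps one by one.

The second way has three steps, as follows: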

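1. Download python22.dll and replace the existing file of the same name, which is usually in the installation directory or in the system/system32 folder. After replacing it, run python. You will see an extra line of text at the top of the window: "With MultiByte Character Surport Surplied by dohao.org". This means your Python now supports multi-byte characters.

2. Download site.py and replace the file of the same name in the \lib folder of the Python installation directory. This enables multi-byte support in some applications, such as IDLE.

3. If you use IDLE often, download OutputWindow.py and IOBinding.py and replace the files of the same name in the \tools\idle folder of the Python installation directory. IDLE will then display multi-byte characters correctly.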

Note: after installation, display Chinese characters in Tkinter like this: Tkinter.Label(text=unicode("中文汉字"))

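The files above are platform-specific. Once installation is complete, you can use multi-byte characters in your variable names, class names, function names, and so on, and multi-byte text from a database will display correctly.

If you need files for another system, or for Python 2.1 or earlier, let me know and I will add them here.

New: MBCSP100-py213.zip (English version)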
