- =========================================
- NLTK Python 2.x - 3.x Compatibility Layer
- =========================================
- NLTK comes with a Python 2.x/3.x compatibility layer, ``nltk.compat``
- (which is loosely based on `six <http://packages.python.org/six/>`_)::
- >>> from nltk import compat
- >>> compat.PY3
- False
- >>> # and so on
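As a rough illustration of what such a layer provides, the sketch below derives a version flag and a text type from ``sys.version_info``. Only ``PY3`` appears in the doctest above; ``text_type`` and ``is_text`` are six-style names assumed here for illustration and are not confirmed to be part of ``nltk.compat``.

```python
import sys

# True on Python 3.x and later, False on Python 2.x.
PY3 = sys.version_info[0] >= 3

# Hypothetical six-style alias for the native text (unicode) type.
# The name ``unicode`` is only looked up when PY3 is False.
text_type = str if PY3 else unicode  # noqa: F821


def is_text(value):
    """Return True if ``value`` is unicode text on the running interpreter."""
    return isinstance(value, text_type)
```

Code that has to branch on the interpreter version can then test ``PY3`` once instead of repeating the ``sys.version_info`` check.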
- @python_2_unicode_compatible
- ----------------------------
- Under Python 2.x, ``__str__`` and ``__repr__`` methods must
- return byte strings.
- The ``@python_2_unicode_compatible`` decorator allows writing these methods
- in a way that is compatible with Python 3.x:
- 1) wrap a class with this decorator,
- 2) define ``__str__`` and ``__repr__`` methods returning unicode text
- (that's what they must return under Python 3.x),
- and they will be fixed under Python 2.x to return byte strings::
- >>> from nltk.compat import python_2_unicode_compatible
- >>> @python_2_unicode_compatible
- ... class Foo(object):
- ...     def __str__(self):
- ...         return u'__str__ is called'
- ...     def __repr__(self):
- ...         return u'__repr__ is called'
- >>> foo = Foo()
- >>> foo.__str__().__class__
- <type 'str'>
- >>> foo.__repr__().__class__
- <type 'str'>
- >>> print(foo)
- __str__ is called
- >>> foo
- __repr__ is called
- Original versions of ``__str__`` and ``__repr__`` are available as
- ``__unicode__`` and ``unicode_repr``::
- >>> foo.__unicode__().__class__
- <type 'unicode'>
- >>> foo.unicode_repr().__class__
- <type 'unicode'>
- >>> unicode(foo)
- u'__str__ is called'
- >>> foo.unicode_repr()
- u'__repr__ is called'
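The behaviour shown so far can be approximated by a small class decorator. The following is a minimal sketch and not NLTK's actual implementation: the name ``unicode_compatible_sketch``, the helper ``_encoded``, the ``py3`` parameter and the choice of UTF-8 as the encoding are all assumptions made for illustration.

```python
import sys

PY3 = sys.version_info[0] >= 3


def _encoded(method, encoding="utf-8"):
    """Wrap a text-returning method so that it returns encoded bytes."""
    def wrapper(self):
        return method(self).encode(encoding)
    return wrapper


def unicode_compatible_sketch(klass, py3=PY3):
    """Hypothetical stand-in for the decorator; not NLTK's actual code.

    ``py3`` is an assumed parameter that lets the Python 2.x branch be
    exercised for illustration on any interpreter.
    """
    if py3:
        return klass  # text-returning methods are already correct
    # Keep the originals under the names used in this document ...
    klass.__unicode__ = klass.__str__
    klass.unicode_repr = klass.__repr__
    # ... and replace them with versions that return byte strings.
    klass.__str__ = _encoded(klass.__unicode__)
    klass.__repr__ = _encoded(klass.unicode_repr)
    return klass


class Foo(object):
    def __str__(self):
        return u'__str__ is called'

    def __repr__(self):
        return u'__repr__ is called'


# Force the Python 2.x branch so that the effect is visible.
Foo = unicode_compatible_sketch(Foo, py3=False)
foo = Foo()
str_type = type(foo.__str__())   # the byte string type
text = foo.__unicode__()         # the original unicode text
```

The example calls ``foo.__str__()`` directly rather than ``str(foo)``: when the Python 2.x branch is forced on a Python 3.x interpreter, ``str()`` rejects a ``__str__`` that returns bytes.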
- There is no need to wrap a subclass with ``@python_2_unicode_compatible``
- if it doesn't override ``__str__`` and ``__repr__``::
- >>> class Bar(Foo):
- ...     pass
- >>> bar = Bar()
- >>> bar.__str__().__class__
- <type 'str'>
- However, if a subclass overrides ``__str__`` or ``__repr__``,
- wrap it again::
- >>> class BadBaz(Foo):
- ...     def __str__(self):
- ...         return u'Baz.__str__'
- >>> baz = BadBaz()
- >>> baz.__str__().__class__ # this is incorrect!
- <type 'unicode'>
- >>> @python_2_unicode_compatible
- ... class GoodBaz(Foo):
- ...     def __str__(self):
- ...         return u'Baz.__str__'
- >>> baz = GoodBaz()
- >>> baz.__str__().__class__
- <type 'str'>
- >>> baz.__unicode__().__class__
- <type 'unicode'>
- Applying ``@python_2_unicode_compatible`` to a subclass
- does not break methods that were not overridden::
- >>> baz.__repr__().__class__
- <type 'str'>
- >>> baz.unicode_repr().__class__
- <type 'unicode'>
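One way to get this behaviour is to mark each wrapped method and skip methods that already carry the mark, so an inherited, already fixed ``__repr__`` is not wrapped a second time. The sketch below shows the idea under stated assumptions: ``fix_method``, ``_encoded`` and the ``_was_fixed`` attribute are hypothetical names, and UTF-8 is an assumed encoding. It is not taken from NLTK's source.

```python
def _encoded(method, encoding="utf-8"):
    """Wrap a text-returning method so that it returns encoded bytes."""
    def wrapper(self):
        return method(self).encode(encoding)
    wrapper._was_fixed = True  # hypothetical marker attribute
    return wrapper


def fix_method(klass, name, alias):
    """Fix ``klass.<name>`` once, keeping the original as ``klass.<alias>``.

    Returns True if the method was wrapped, False if it was inherited
    from a class that had already been fixed.
    """
    method = getattr(klass, name)
    if getattr(method, "_was_fixed", False):
        return False
    setattr(klass, alias, method)
    setattr(klass, name, _encoded(method))
    return True


class Foo(object):
    def __str__(self):
        return u'Foo.__str__'

    def __repr__(self):
        return u'Foo.__repr__'


fix_method(Foo, "__str__", "__unicode__")
fix_method(Foo, "__repr__", "unicode_repr")


class Baz(Foo):
    def __str__(self):  # overridden, so it needs fixing again
        return u'Baz.__str__'


str_wrapped = fix_method(Baz, "__str__", "__unicode__")    # newly wrapped
repr_wrapped = fix_method(Baz, "__repr__", "unicode_repr")  # skipped
baz = Baz()
```

Without the marker, the inherited ``__repr__`` would be encoded twice, which fails or corrupts the output depending on the interpreter.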
- unicode_repr
- ------------
- Under Python 3.x, ``repr(unicode_string)`` has no leading "u" prefix.
- The ``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
- ``"%r" % obj`` to make the output consistent under Python 2.x and 3.x::
- >>> from nltk.compat import unicode_repr
- >>> print(repr(u"test"))
- u'test'
- >>> print(unicode_repr(u"test"))
- 'test'
- It may also be used to get the original unescaped repr (as unicode)
- of objects whose class was fixed by the ``@python_2_unicode_compatible``
- decorator::
- >>> @python_2_unicode_compatible
- ... class Foo(object):
- ...     def __repr__(self):
- ...         return u'<Foo: foo>'
- >>> foo = Foo()
- >>> repr(foo)
- '<Foo: foo>'
- >>> unicode_repr(foo)
- u'<Foo: foo>'
- For other objects it returns the same value as ``repr``::
- >>> unicode_repr(5)
- '5'
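The three cases above (unicode strings, objects fixed by the decorator, and everything else) suggest a dispatch along the following lines. This is a sketch inferred from the documented behaviour, not NLTK's code; ``unicode_repr_sketch`` and ``text_type`` are names assumed for illustration.

```python
import sys

PY3 = sys.version_info[0] >= 3
# ``unicode`` is only looked up when running under Python 2.x.
text_type = str if PY3 else unicode  # noqa: F821


def unicode_repr_sketch(obj):
    """Return a repr of ``obj`` without the Python 2.x "u" prefix."""
    if PY3:
        return repr(obj)           # nothing to fix on Python 3.x
    if hasattr(obj, "unicode_repr"):
        return obj.unicode_repr()  # class fixed by the decorator
    if isinstance(obj, text_type):
        return repr(obj)[1:]       # drop the leading "u"
    return repr(obj)               # same value as repr() otherwise


string_repr = unicode_repr_sketch(u"test")
number_repr = unicode_repr_sketch(5)
```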
- It may be a good idea to use ``unicode_repr`` instead of the ``%r``
- string formatting specifier inside the ``__repr__`` or ``__str__``
- methods of classes fixed by ``@python_2_unicode_compatible``,
- so that the output is consistent between Python 2.x and 3.x.
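A short sketch of this recommendation follows. ``Token`` is a hypothetical class, the decorator is left out to keep the example small, and the ``repr`` fallback is an assumption that only holds on Python 3.x, for installations where ``nltk.compat.unicode_repr`` cannot be imported.

```python
try:
    # Available in NLTK releases that still ship the compatibility layer.
    from nltk.compat import unicode_repr
except ImportError:
    # Assumed fallback for this sketch (Python 3.x only): plain repr()
    # already omits the "u" prefix there.
    unicode_repr = repr


class Token(object):  # hypothetical example class
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # u'<Token: %r>' % self.text would give <Token: u'cat'> on 2.x.
        return u'<Token: %s>' % unicode_repr(self.text)


token_repr = Token(u'cat').__repr__()
```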