=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================

NLTK comes with a Python 2.x/3.x compatibility layer, nltk.compat
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> # and so on

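As a rough illustration of what a flag such as ``compat.PY3`` stands for, a compatibility layer can derive it from ``sys.version_info``. This is a minimal sketch, not necessarily how ``nltk.compat`` defines it; ``text_type`` and ``binary_type`` are names assumed here for illustration.

```python
import sys

# True on any Python 3.x interpreter, False on Python 2.x.
PY3 = sys.version_info[0] == 3

# Hypothetical helper names (not taken from this document):
# the text type is ``str`` on 3.x and ``unicode`` on 2.x,
# the binary type is ``bytes`` on 3.x and ``str`` on 2.x.
text_type = type(u"")
binary_type = type(b"")
```

Deriving the types from literals avoids referring to the name ``unicode``, which does not exist under Python 3.x.
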
@python_2_unicode_compatible
----------------------------

Under Python 2.x, ``__str__`` and ``__repr__`` methods must
return bytestrings.

The ``@python_2_unicode_compatible`` decorator allows writing these methods
in a way compatible with Python 3.x:

1) wrap a class with this decorator,
2) define ``__str__`` and ``__repr__`` methods returning unicode text
   (that's what they must return under Python 3.x),

and they are then fixed under Python 2.x to return byte strings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

Original versions of ``__str__`` and ``__repr__`` are available as
``__unicode__`` and ``unicode_repr``::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'

There is no need to wrap a subclass with ``@python_2_unicode_compatible``
if it doesn't override ``__str__`` and ``__repr__``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``,
it must be wrapped again::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__ # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass
shouldn't break methods that were not overridden::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>

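The behaviour shown in this section can be approximated with a small class decorator. The following is only a sketch under stated assumptions, not the actual ``nltk.compat`` code: ``unicode_compatible_sketch``, ``_to_bytes`` and the ``_was_fixed`` marker are hypothetical names, and escaping non-ASCII characters with ``backslashreplace`` is an assumed strategy.

```python
import sys

PY3 = sys.version_info[0] >= 3


def _to_bytes(method):
    """Wrap a text-returning method so that it returns an ASCII byte string."""
    def wrapper(self):
        return method(self).encode("ascii", "backslashreplace")
    # Marker that lets the decorator recognise an already wrapped method,
    # e.g. one inherited from a decorated parent class.
    wrapper._was_fixed = True
    return wrapper


def unicode_compatible_sketch(klass):
    """Hypothetical stand-in for a python_2_unicode_compatible-style decorator."""
    if not getattr(klass.__str__, "_was_fixed", False):
        klass.__unicode__ = klass.__str__        # keep the original text version
        if not PY3:
            klass.__str__ = _to_bytes(klass.__unicode__)
    if not getattr(klass.__repr__, "_was_fixed", False):
        klass.unicode_repr = klass.__repr__      # keep the original text version
        if not PY3:
            klass.__repr__ = _to_bytes(klass.unicode_repr)
    return klass


@unicode_compatible_sketch
class Foo(object):
    def __str__(self):
        return u'__str__ is called'

    def __repr__(self):
        return u'__repr__ is called'
```

In this sketch the marker is what keeps an inherited, already wrapped ``__repr__`` from being wrapped a second time when a subclass that only overrides ``__str__`` is decorated again. Under Python 3.x the decorator only adds the ``__unicode__`` and ``unicode_repr`` aliases.
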
unicode_repr
------------

Under Python 3.x, ``repr(unicode_string)`` doesn't have a leading "u" letter.

The ``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr

    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It may also be used to get the original unescaped repr (as unicode)
of objects whose class was fixed by the ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'

It may be a good idea to use ``unicode_repr`` instead of the ``%r``
string formatting specifier inside ``__repr__`` or ``__str__``
methods of classes fixed by ``@python_2_unicode_compatible``
to make the output consistent between Python 2.x and 3.x.
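
The recommendation above can be illustrated with a self-contained sketch. ``unicode_repr_sketch`` is a hypothetical stand-in that mirrors the behaviour documented in this section, not the ``nltk.compat`` implementation, and ``Token`` is an invented example class. In NLTK code the class would also carry the ``@python_2_unicode_compatible`` decorator.

```python
import sys

PY3 = sys.version_info[0] >= 3


def unicode_repr_sketch(obj):
    """Hypothetical stand-in for a unicode_repr-style helper."""
    if PY3:
        return repr(obj)              # text reprs have no "u" prefix on 3.x
    if hasattr(obj, "unicode_repr"):
        return obj.unicode_repr()     # original repr kept by the class decorator
    if isinstance(obj, type(u"")):    # the ``unicode`` type on 2.x
        return repr(obj)[1:]          # drop the leading "u"
    return repr(obj)


class Token(object):
    """Invented example class whose repr embeds a text attribute."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # "%r" would render the attribute as u'test' under Python 2.x;
        # the helper renders it as 'test' under both major versions.
        return u"<Token: %s>" % unicode_repr_sketch(self.text)


print(repr(Token(u"test")))  # <Token: 'test'>
```
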