.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

==========
 Stemmers
==========

Overview
~~~~~~~~

Stemmers remove morphological affixes from words, leaving only the
word stem.

    >>> from __future__ import print_function
    >>> from nltk.stem import *
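
To make the idea of affix removal concrete, the following is a minimal
sketch of naive suffix stripping. It is a toy illustration only: the
suffix list, the ``min_stem`` threshold, and the name ``strip_suffix``
are assumptions made for this example and are not part of NLTK or of
any stemmer tested in this file.

```python
# Toy illustration only: a hypothetical suffix list, not an NLTK algorithm.
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(word, suffixes=SUFFIXES, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # No suffix matched, or the remaining stem would be too short.
    return word


if __name__ == "__main__":
    for word in ["meeting", "owned", "mules", "is"]:
        print(word, "->", strip_suffix(word))
```

Real stemmers such as the ones tested below apply many ordered rules
with conditions on the remaining stem, so their output will differ from
this sketch on most words.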

Unit tests for the Porter stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    >>> from nltk.stem.porter import *

Create a new Porter stemmer.

    >>> stemmer = PorterStemmer()

Test the stemmer on various pluralised words.

    >>> plurals = ['caresses', 'flies', 'dies', 'mules', 'denied',
    ...            'died', 'agreed', 'owned', 'humbled', 'sized',
    ...            'meeting', 'stating', 'siezing', 'itemization',
    ...            'sensational', 'traditional', 'reference', 'colonizer',
    ...            'plotted']
    >>> singles = [stemmer.stem(plural) for plural in plurals]
    >>> print(' '.join(singles)) # doctest: +NORMALIZE_WHITESPACE
    caress fli die mule deni die agre own humbl size meet
    state siez item sensat tradit refer colon plot
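
For orientation, the plural handling at the start of the published
Porter algorithm (step 1a) can be sketched in a few lines. This is a
standalone sketch of that single step as described in Porter's paper,
not NLTK's implementation; the function name ``step_1a`` is chosen for
this example. Note that the test output above stems ``dies`` to
``die``, so ``PorterStemmer`` evidently treats that word differently
from the plain rule shown here.

```python
def step_1a(word):
    """Apply the four suffix rules of step 1a of the published Porter algorithm."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]  # IES  -> I    (flies -> fli)
    if word.endswith("ss"):
        return word       # SS   -> SS   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]  # S    -> ""   (mules -> mule)
    return word


if __name__ == "__main__":
    for word in ["caresses", "flies", "mules", "caress"]:
        print(word, "->", step_1a(word))
```

The rules are tried longest suffix first, so ``caresses`` matches
``sses`` before the bare ``s`` rule can apply.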

Unit tests for Snowball stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    >>> from nltk.stem.snowball import SnowballStemmer

See which languages are supported.

    >>> print(" ".join(SnowballStemmer.languages)) # doctest: +NORMALIZE_WHITESPACE
    arabic danish dutch english finnish french german hungarian italian
    norwegian porter portuguese romanian russian spanish swedish

Create a new instance of a language-specific subclass.

    >>> stemmer = SnowballStemmer("english")

Stem a word.

    >>> print(stemmer.stem("running"))
    run

Decide not to stem stopwords.

    >>> stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
    >>> print(stemmer.stem("having"))
    have
    >>> print(stemmer2.stem("having"))
    having
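
The effect of ``ignore_stopwords=True`` can be pictured as a check that
runs before any stemming rule. The sketch below illustrates that idea
under stated assumptions: the stopword set, ``toy_stem``, and
``stem_unless_stopword`` are hypothetical names invented for this
example, and it does not reproduce how ``SnowballStemmer`` is
implemented internally.

```python
# Hypothetical sample set; NLTK uses its own stopword lists.
STOPWORDS = frozenset({"having", "being", "the"})


def toy_stem(word):
    """Stand-in stemmer for illustration: strip a trailing 'ing'."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word


def stem_unless_stopword(word, stem=toy_stem, stopwords=STOPWORDS):
    """Return stopwords unchanged; pass every other word to the stemmer."""
    if word in stopwords:
        return word
    return stem(word)


if __name__ == "__main__":
    print(toy_stem("having"))              # stemmed like any other word
    print(stem_unless_stopword("having"))  # left untouched as a stopword
    print(stem_unless_stopword("meeting"))
```

Leaving stopwords untouched keeps very common function words in their
surface form, which can be convenient when a later processing step
filters them by exact match.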

The 'english' stemmer handles some words better than the original
'porter' stemmer. For example, it reduces "generously" to "generous",
whereas the 'porter' stemmer truncates it to "gener".

    >>> print(SnowballStemmer("english").stem("generously"))
    generous
    >>> print(SnowballStemmer("porter").stem("generously"))
    gener

.. note::

    Extra stemmer tests can be found in `nltk.test.unit.test_stem`.