=====================================================
Paice's evaluation statistics for stemming algorithms
=====================================================

Given a list of words grouped by their real lemmas, and the same words grouped
by the stems that the stemming algorithm under evaluation assigns to them, the
Paice class computes the Understemming Index (UI), Overstemming Index (OI),
Stemming Weight (SW) and Error-Rate Relative to Truncation (ERRT).

>>> from nltk.metrics import Paice

-------------------------------------
Understemming and Overstemming values
-------------------------------------

>>> lemmas = {'kneel': ['kneel', 'knelt'],
...           'range': ['range', 'ranged'],
...           'ring': ['ring', 'rang', 'rung']}
>>> stems = {'kneel': ['kneel'],
...          'knelt': ['knelt'],
...          'rang': ['rang', 'range', 'ranged'],
...          'ring': ['ring'],
...          'rung': ['rung']}
>>> p = Paice(lemmas, stems)
>>> p.gumt, p.gdmt, p.gwmt, p.gdnt
(4.0, 5.0, 2.0, 16.0)
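
The four totals above count unordered word pairs: GDMT is the number of pairs that share a lemma, GDNT the number of pairs that do not, GUMT the number of same-lemma pairs that the stemmer left under different stems, and GWMT the number of different-lemma pairs that it put under the same stem. As a rough sketch, they can be reproduced by enumerating the pairs directly. The helper name pair_totals is hypothetical, this is not presented as NLTK's own implementation, and it assumes every word appears in exactly one lemma group and one stem group.

```python
from itertools import combinations

lemmas = {'kneel': ['kneel', 'knelt'],
          'range': ['range', 'ranged'],
          'ring': ['ring', 'rang', 'rung']}
stems = {'kneel': ['kneel'],
         'knelt': ['knelt'],
         'rang': ['rang', 'range', 'ranged'],
         'ring': ['ring'],
         'rung': ['rung']}


def pair_totals(lemmas, stems):
    """Return (gumt, gdmt, gwmt, gdnt) by enumerating unordered word pairs.

    Hypothetical helper for illustration only.
    """
    # Map every word to the lemma group and the stem group it belongs to.
    lemma_of = {w: lemma for lemma, words in lemmas.items() for w in words}
    stem_of = {w: stem for stem, words in stems.items() for w in words}

    gumt = gdmt = gwmt = gdnt = 0.0
    for a, b in combinations(sorted(lemma_of), 2):
        same_lemma = lemma_of[a] == lemma_of[b]
        same_stem = stem_of[a] == stem_of[b]
        if same_lemma:
            gdmt += 1          # the pair should be merged
            if not same_stem:
                gumt += 1      # but the stemmer kept it apart
        else:
            gdnt += 1          # the pair should stay apart
            if same_stem:
                gwmt += 1      # but the stemmer merged it
    return gumt, gdmt, gwmt, gdnt


print(pair_totals(lemmas, stems))  # (4.0, 5.0, 2.0, 16.0)
```

With 7 words there are 21 pairs in total, 5 of which share a lemma, which is where the 5.0 and 16.0 come from.
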
>>> p.ui, p.oi, p.sw
(0.8..., 0.125..., 0.15625...)
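
The three indices follow from the four totals. A minimal sketch, assuming the usual definitions from Paice's method (UI = GUMT / GDMT, OI = GWMT / GDNT, SW = OI / UI). The variable names are illustrative and the input numbers are copied from the example output above.

```python
# Totals copied from the example output above.
gumt, gdmt, gwmt, gdnt = 4.0, 5.0, 2.0, 16.0

ui = gumt / gdmt   # share of desired merges that the stemmer missed
oi = gwmt / gdnt   # share of desired non-merges that the stemmer merged anyway
sw = oi / ui       # stemming weight: larger values indicate a heavier stemmer

print(ui, oi, sw)
```

Here 4 of the 5 desired merges were missed (UI = 0.8) and 2 of the 16 desired non-merges were merged (OI = 0.125).
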
>>> p.errt
1.0
>>> [('{0:.3f}'.format(a), '{0:.3f}'.format(b)) for a, b in p.coords]
[('0.000', '1.000'), ('0.000', '0.375'), ('0.600', '0.125'), ('0.800', '0.125')]
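
ERRT relates the stemmer's (UI, OI) point P to the truncation line, assumed here to be the polyline through the points in p.coords. A rough geometric sketch, assuming the definition ERRT = |OP| / |OT|, where O is the origin and T is the point where the ray from O through P crosses that line. The function name errt_from_coords and the tolerance eps are hypothetical, the coordinates are the rounded values printed above, and NLTK's own computation may differ in detail.

```python
def errt_from_coords(ui, oi, coords, eps=1e-9):
    """Return |OP| / |OT| for P = (ui, oi), or None if the ray misses the polyline.

    Hypothetical helper for illustration only.
    """
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        dx, dy = x2 - x1, y2 - y1
        # Solve t * (ui, oi) = (x1, y1) + s * (dx, dy) for t and s.
        denom = ui * dy - oi * dx
        if abs(denom) < eps:
            continue  # the ray is parallel to this segment
        t = (x1 * dy - y1 * dx) / denom   # T = t * P
        s = (x1 * oi - y1 * ui) / denom   # position of T along the segment
        if t > eps and -eps <= s <= 1 + eps:
            return 1.0 / t                # |OP| / |OT|
    return None


coords = [(0.0, 1.0), (0.0, 0.375), (0.6, 0.125), (0.8, 0.125)]
print(errt_from_coords(0.8, 0.125, coords))  # 1.0
```

For this data P = (0.8, 0.125) coincides with the last point of the polyline, so T = P and the ratio is 1.0.
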