
.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

==================
Discourse Checking
==================

    >>> from nltk import *
    >>> from nltk.sem import logic
    >>> logic._counter._value = 0

Introduction
============

The NLTK discourse module makes it possible to test consistency and
redundancy of simple discourses, using theorem-proving and
model-building from `nltk.inference`.

The ``DiscourseTester`` constructor takes a list of sentences as a
parameter.

    >>> dt = DiscourseTester(['a boxer walks', 'every boxer chases a girl'])

The ``DiscourseTester`` parses each sentence into a list of logical
forms. Once we have created a ``DiscourseTester`` object, we can
inspect various properties of the discourse. First off, we might want
to double-check what sentences are currently stored as the discourse.

    >>> dt.sentences()
    s0: a boxer walks
    s1: every boxer chases a girl

As you will see, each sentence receives an identifier `s`\ :subscript:`i`.
We might also want to check what grammar the ``DiscourseTester`` is
using (by default, ``book_grammars/discourse.fcfg``):

    >>> dt.grammar() # doctest: +ELLIPSIS
    % start S
    # Grammar Rules
    S[SEM = <app(?subj,?vp)>] -> NP[NUM=?n,SEM=?subj] VP[NUM=?n,SEM=?vp]
    NP[NUM=?n,SEM=<app(?det,?nom)> ] -> Det[NUM=?n,SEM=?det] Nom[NUM=?n,SEM=?nom]
    NP[LOC=?l,NUM=?n,SEM=?np] -> PropN[LOC=?l,NUM=?n,SEM=?np]
    ...

A different grammar can be invoked by using the optional ``gramfile``
parameter when a ``DiscourseTester`` object is created.

Readings and Threads
====================

Depending on the grammar used, we may find that some sentences have
more than one logical form. To check this, use the ``readings()``
method. Given a sentence identifier of the form `s`\ :subscript:`i`,
each reading of that sentence is given an identifier
`s`\ :sub:`i`-`r`\ :sub:`j`.

    >>> dt.readings()
    <BLANKLINE>
    s0 readings:
    <BLANKLINE>
    s0-r0: exists z1.(boxer(z1) & walk(z1))
    s0-r1: exists z1.(boxerdog(z1) & walk(z1))
    <BLANKLINE>
    s1 readings:
    <BLANKLINE>
    s1-r0: all z2.(boxer(z2) -> exists z3.(girl(z3) & chase(z2,z3)))
    s1-r1: all z1.(boxerdog(z1) -> exists z2.(girl(z2) & chase(z1,z2)))

In this case, the only source of ambiguity lies in the word *boxer*,
which receives two translations: ``boxer`` and ``boxerdog``. The
intention is that one of these corresponds to the ``person`` sense and
one to the ``dog`` sense. In principle, we would also expect to see a
quantifier scope ambiguity in ``s1``. However, the simple grammar we
are using, namely `sem4.fcfg <sem4.fcfg>`_, doesn't support quantifier
scope ambiguity.

We can also investigate the readings of a specific sentence:

    >>> dt.readings('a boxer walks')
    The sentence 'a boxer walks' has these readings:
    exists x.(boxer(x) & walk(x))
    exists x.(boxerdog(x) & walk(x))

Given that each sentence is two-way ambiguous, we potentially have
four different discourse 'threads', taking all combinations of
readings. To see these, specify the ``threaded=True`` parameter on
the ``readings()`` method. Again, each thread is assigned an
identifier of the form `d`\ :sub:`i`. Following the identifier is a
list of the readings that constitute that thread.

    >>> dt.readings(threaded=True) # doctest: +NORMALIZE_WHITESPACE
    d0: ['s0-r0', 's1-r0']
    d1: ['s0-r0', 's1-r1']
    d2: ['s0-r1', 's1-r0']
    d3: ['s0-r1', 's1-r1']

Of course, this simple-minded approach doesn't scale: a discourse
with, say, three sentences, each of which has 3 readings, will
generate 27 different threads. It is an interesting exercise to
consider how to manage discourse ambiguity more efficiently.
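The combinatorics here is just a Cartesian product of the per-sentence reading lists, which can be sketched in plain Python (illustrative only; ``DiscourseTester`` builds its threads internally):

```python
from itertools import product

# Reading identifiers per sentence, as reported by readings() above.
readings = [
    ["s0-r0", "s0-r1"],
    ["s1-r0", "s1-r1"],
]

# Every thread is one choice of reading per sentence.
threads = {f"d{i}": list(combo) for i, combo in enumerate(product(*readings))}
for tid, combo in threads.items():
    print(f"{tid}: {combo}")

# With three sentences of 3 readings each, the product has 3 ** 3 = 27 threads.
assert len(list(product(*[range(3)] * 3))) == 27
```

This reproduces the ``d0`` ... ``d3`` listing above, and makes the exponential blow-up with discourse length easy to see.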

Checking Consistency
====================

Now, we can check whether some or all of the discourse threads are
consistent, using the ``models()`` method. With no parameter, this
method will try to find a model for every discourse thread in the
current discourse. However, we can also specify just one thread, say ``d1``.

    >>> dt.models('d1')
    --------------------------------------------------------------------------------
    Model for Discourse Thread d1
    --------------------------------------------------------------------------------
    % number = 1
    % seconds = 0
    <BLANKLINE>
    % Interpretation of size 2
    <BLANKLINE>
    c1 = 0.
    <BLANKLINE>
    f1(0) = 0.
    f1(1) = 0.
    <BLANKLINE>
    boxer(0).
    - boxer(1).
    <BLANKLINE>
    - boxerdog(0).
    - boxerdog(1).
    <BLANKLINE>
    - girl(0).
    - girl(1).
    <BLANKLINE>
    walk(0).
    - walk(1).
    <BLANKLINE>
    - chase(0,0).
    - chase(0,1).
    - chase(1,0).
    - chase(1,1).
    <BLANKLINE>
    Consistent discourse: d1 ['s0-r0', 's1-r1']:
    s0-r0: exists z1.(boxer(z1) & walk(z1))
    s1-r1: all z1.(boxerdog(z1) -> exists z2.(girl(z2) & chase(z1,z2)))
    <BLANKLINE>

There are various formats for rendering **Mace4** models --- here,
we have used the 'cooked' format (which is intended to be
human-readable). There are a number of points to note.

#. The entities in the domain are all treated as non-negative
   integers. In this case, there are only two entities, ``0`` and
   ``1``.

#. The ``-`` symbol indicates negation. So ``0`` is the only
   ``boxer`` and the only thing that ``walk``\ s. Nothing is a
   ``boxerdog``, or a ``girl``, or in the ``chase`` relation. Thus the
   universal sentence is vacuously true.

#. ``c1`` is an introduced constant that denotes ``0``.

#. ``f1`` is a Skolem function, but it plays no significant role in
   this model.
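A model of this kind is just an assignment of sets (and sets of tuples) to predicate symbols over a small integer domain, and a reading can be checked against it directly. The following is a hand-rolled sketch, not NLTK's evaluation machinery; the names ``s0_r0`` and ``s1_r1`` are purely illustrative:

```python
# Mirror the Mace4 model shown above as plain Python sets.
domain = {0, 1}
model = {
    "boxer": {0},        # boxer(0) holds; boxer(1) does not
    "boxerdog": set(),   # nothing is a boxerdog
    "girl": set(),
    "walk": {0},
    "chase": set(),      # no pair is in the chase relation
}

def s0_r0(m):
    # exists z1.(boxer(z1) & walk(z1))
    return any(x in m["boxer"] and x in m["walk"] for x in domain)

def s1_r1(m):
    # all z1.(boxerdog(z1) -> exists z2.(girl(z2) & chase(z1,z2)))
    # Vacuously true here, since the boxerdog extension is empty.
    return all(
        x not in m["boxerdog"]
        or any(y in m["girl"] and (x, y) in m["chase"] for y in domain)
        for x in domain
    )

print(s0_r0(model), s1_r1(model))  # both readings hold, so thread d1 has a model
```

Both readings of thread ``d1`` come out true in this model, which is exactly what "consistent discourse" means: a single interpretation satisfies every reading in the thread.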

We might now want to add another sentence to the discourse, and there
is a method ``add_sentence()`` for doing just this.

    >>> dt.add_sentence('John is a boxer')
    >>> dt.sentences()
    s0: a boxer walks
    s1: every boxer chases a girl
    s2: John is a boxer

We can now test all the properties as before; here, we just show a
couple of them.

    >>> dt.readings()
    <BLANKLINE>
    s0 readings:
    <BLANKLINE>
    s0-r0: exists z1.(boxer(z1) & walk(z1))
    s0-r1: exists z1.(boxerdog(z1) & walk(z1))
    <BLANKLINE>
    s1 readings:
    <BLANKLINE>
    s1-r0: all z1.(boxer(z1) -> exists z2.(girl(z2) & chase(z1,z2)))
    s1-r1: all z1.(boxerdog(z1) -> exists z2.(girl(z2) & chase(z1,z2)))
    <BLANKLINE>
    s2 readings:
    <BLANKLINE>
    s2-r0: boxer(John)
    s2-r1: boxerdog(John)

    >>> dt.readings(threaded=True) # doctest: +NORMALIZE_WHITESPACE
    d0: ['s0-r0', 's1-r0', 's2-r0']
    d1: ['s0-r0', 's1-r0', 's2-r1']
    d2: ['s0-r0', 's1-r1', 's2-r0']
    d3: ['s0-r0', 's1-r1', 's2-r1']
    d4: ['s0-r1', 's1-r0', 's2-r0']
    d5: ['s0-r1', 's1-r0', 's2-r1']
    d6: ['s0-r1', 's1-r1', 's2-r0']
    d7: ['s0-r1', 's1-r1', 's2-r1']

If you are interested in a particular thread, the ``expand_threads()``
method will remind you of what readings it consists of:

    >>> thread = dt.expand_threads('d1')
    >>> for rid, reading in thread:
    ...     print(rid, str(reading.normalize()))
    s0-r0 exists z1.(boxer(z1) & walk(z1))
    s1-r0 all z1.(boxer(z1) -> exists z2.(girl(z2) & chase(z1,z2)))
    s2-r1 boxerdog(John)

Suppose we have already defined a discourse, as follows:

    >>> dt = DiscourseTester(['A student dances', 'Every student is a person'])

Now, when we add a new sentence, is it consistent with what we already
have? The ``consistchk=True`` parameter of ``add_sentence()`` allows
us to check:

    >>> dt.add_sentence('No person dances', consistchk=True)
    Inconsistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists z1.(student(z1) & dance(z1))
    s1-r0: all z1.(student(z1) -> person(z1))
    s2-r0: -exists z1.(person(z1) & dance(z1))
    <BLANKLINE>

    >>> dt.readings()
    <BLANKLINE>
    s0 readings:
    <BLANKLINE>
    s0-r0: exists z1.(student(z1) & dance(z1))
    <BLANKLINE>
    s1 readings:
    <BLANKLINE>
    s1-r0: all z1.(student(z1) -> person(z1))
    <BLANKLINE>
    s2 readings:
    <BLANKLINE>
    s2-r0: -exists z1.(person(z1) & dance(z1))

So let's retract the inconsistent sentence:

    >>> dt.retract_sentence('No person dances', verbose=True) # doctest: +NORMALIZE_WHITESPACE
    Current sentences are
    s0: A student dances
    s1: Every student is a person

We can now verify that the result is consistent.

    >>> dt.models()
    --------------------------------------------------------------------------------
    Model for Discourse Thread d0
    --------------------------------------------------------------------------------
    % number = 1
    % seconds = 0
    <BLANKLINE>
    % Interpretation of size 2
    <BLANKLINE>
    c1 = 0.
    <BLANKLINE>
    dance(0).
    - dance(1).
    <BLANKLINE>
    person(0).
    - person(1).
    <BLANKLINE>
    student(0).
    - student(1).
    <BLANKLINE>
    Consistent discourse: d0 ['s0-r0', 's1-r0']:
    s0-r0: exists z1.(student(z1) & dance(z1))
    s1-r0: all z1.(student(z1) -> person(z1))
    <BLANKLINE>

Checking Informativity
======================

Let's assume that we are still trying to extend the discourse *A
student dances.* *Every student is a person.* We add a new sentence,
but this time, we check whether it is informative with respect to what
has gone before.

    >>> dt.add_sentence('A person dances', informchk=True)
    Sentence 'A person dances' under reading 'exists x.(person(x) & dance(x))':
    Not informative relative to thread 'd0'

In fact, we are just checking whether the new sentence is entailed by
the preceding discourse.

    >>> dt.models()
    --------------------------------------------------------------------------------
    Model for Discourse Thread d0
    --------------------------------------------------------------------------------
    % number = 1
    % seconds = 0
    <BLANKLINE>
    % Interpretation of size 2
    <BLANKLINE>
    c1 = 0.
    <BLANKLINE>
    c2 = 0.
    <BLANKLINE>
    dance(0).
    - dance(1).
    <BLANKLINE>
    person(0).
    - person(1).
    <BLANKLINE>
    student(0).
    - student(1).
    <BLANKLINE>
    Consistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists z1.(student(z1) & dance(z1))
    s1-r0: all z1.(student(z1) -> person(z1))
    s2-r0: exists z1.(person(z1) & dance(z1))
    <BLANKLINE>
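The entailment check behind ``informchk`` can be mimicked by brute force on small models: a new sentence is uninformative if every interpretation satisfying the discourse already satisfies it. The sketch below (self-contained, not NLTK's implementation) enumerates every extension of ``student``, ``person`` and ``dance`` over a two-element domain, which happens to suffice for this example:

```python
from itertools import chain, combinations

domain = [0, 1]

def subsets(xs):
    # All subsets of xs, used to enumerate candidate predicate extensions.
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def entails(premises, conclusion):
    # Brute-force check over all interpretations of the three unary
    # predicates on a fixed two-element domain. (A counterexample found
    # here refutes entailment; absence of one is only evidence for it.)
    for student in subsets(domain):
        for person in subsets(domain):
            for dance in subsets(domain):
                m = {"student": set(student), "person": set(person), "dance": set(dance)}
                if all(p(m) for p in premises) and not conclusion(m):
                    return False
    return True

s0 = lambda m: any(x in m["student"] and x in m["dance"] for x in domain)  # A student dances
s1 = lambda m: all(x in m["person"] for x in m["student"])                 # Every student is a person
s2 = lambda m: any(x in m["person"] and x in m["dance"] for x in domain)   # A person dances

print(entails([s0, s1], s2))  # 'A person dances' adds nothing new
```

Dropping ``s1`` from the premises makes ``s2`` informative again: a model with a dancing student who is not a person falsifies it.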

Adding Background Knowledge
===========================

Let's build a new discourse, and look at the readings of the component sentences:

    >>> dt = DiscourseTester(['Vincent is a boxer', 'Fido is a boxer', 'Vincent is married', 'Fido barks'])
    >>> dt.readings()
    <BLANKLINE>
    s0 readings:
    <BLANKLINE>
    s0-r0: boxer(Vincent)
    s0-r1: boxerdog(Vincent)
    <BLANKLINE>
    s1 readings:
    <BLANKLINE>
    s1-r0: boxer(Fido)
    s1-r1: boxerdog(Fido)
    <BLANKLINE>
    s2 readings:
    <BLANKLINE>
    s2-r0: married(Vincent)
    <BLANKLINE>
    s3 readings:
    <BLANKLINE>
    s3-r0: bark(Fido)

This gives us a lot of threads:

    >>> dt.readings(threaded=True) # doctest: +NORMALIZE_WHITESPACE
    d0: ['s0-r0', 's1-r0', 's2-r0', 's3-r0']
    d1: ['s0-r0', 's1-r1', 's2-r0', 's3-r0']
    d2: ['s0-r1', 's1-r0', 's2-r0', 's3-r0']
    d3: ['s0-r1', 's1-r1', 's2-r0', 's3-r0']

We can eliminate some of the readings, and hence some of the threads,
by adding background information.

    >>> import nltk.data
    >>> bg = nltk.data.load('grammars/book_grammars/background.fol')
    >>> dt.add_background(bg)
    >>> dt.background()
    all x.(boxerdog(x) -> dog(x))
    all x.(boxer(x) -> person(x))
    all x.-(dog(x) & person(x))
    all x.(married(x) <-> exists y.marry(x,y))
    all x.(bark(x) -> dog(x))
    all x y.(marry(x,y) -> (person(x) & person(y)))
    -(Vincent = Mia)
    -(Vincent = Fido)
    -(Mia = Fido)

The background information allows us to reject three of the threads as
inconsistent. To see what remains, use the ``filter=True`` parameter
on ``readings()``.

    >>> dt.readings(filter=True) # doctest: +NORMALIZE_WHITESPACE
    d1: ['s0-r0', 's1-r1', 's2-r0', 's3-r0']
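Schematically, filtering keeps just those threads whose readings are jointly satisfiable with the background axioms. In this example every reading is a ground atom about Vincent or Fido, so the axioms ``boxer(x) -> person(x)``, ``boxerdog(x) -> dog(x)``, ``bark(x) -> dog(x)`` and ``all x.-(dog(x) & person(x))`` reduce the check to making sure no individual is sorted as both ``person`` and ``dog``. The following sketch (illustrative, not NLTK's algorithm) reproduces the filtering:

```python
# Each reading, closed under the background rules, sorts one individual:
# boxers and married things are persons; boxerdogs and barkers are dogs.
readings = {
    "s0-r0": ("Vincent", "person"), "s0-r1": ("Vincent", "dog"),
    "s1-r0": ("Fido", "person"),    "s1-r1": ("Fido", "dog"),
    "s2-r0": ("Vincent", "person"),
    "s3-r0": ("Fido", "dog"),
}

threads = {
    "d0": ["s0-r0", "s1-r0", "s2-r0", "s3-r0"],
    "d1": ["s0-r0", "s1-r1", "s2-r0", "s3-r0"],
    "d2": ["s0-r1", "s1-r0", "s2-r0", "s3-r0"],
    "d3": ["s0-r1", "s1-r1", "s2-r0", "s3-r0"],
}

def consistent(thread):
    # A thread survives iff no individual is forced into two disjoint sorts.
    sorts = {}
    for rid in thread:
        ind, sort = readings[rid]
        sorts.setdefault(ind, set()).add(sort)
    return all(len(s) == 1 for s in sorts.values())

surviving = [d for d, t in threads.items() if consistent(t)]
print(surviving)
```

Only ``d1`` survives: 'Fido barks' forces Fido to be a dog (hence the ``boxerdog`` reading of ``s1``), while 'Vincent is married' forces Vincent to be a person (hence the ``boxer`` reading of ``s0``).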

The ``models()`` method gives us more information about the surviving thread.

    >>> dt.models()
    --------------------------------------------------------------------------------
    Model for Discourse Thread d0
    --------------------------------------------------------------------------------
    No model found!
    <BLANKLINE>
    --------------------------------------------------------------------------------
    Model for Discourse Thread d1
    --------------------------------------------------------------------------------
    % number = 1
    % seconds = 0
    <BLANKLINE>
    % Interpretation of size 3
    <BLANKLINE>
    Fido = 0.
    <BLANKLINE>
    Mia = 1.
    <BLANKLINE>
    Vincent = 2.
    <BLANKLINE>
    f1(0) = 0.
    f1(1) = 0.
    f1(2) = 2.
    <BLANKLINE>
    bark(0).
    - bark(1).
    - bark(2).
    <BLANKLINE>
    - boxer(0).
    - boxer(1).
    boxer(2).
    <BLANKLINE>
    boxerdog(0).
    - boxerdog(1).
    - boxerdog(2).
    <BLANKLINE>
    dog(0).
    - dog(1).
    - dog(2).
    <BLANKLINE>
    - married(0).
    - married(1).
    married(2).
    <BLANKLINE>
    - person(0).
    - person(1).
    person(2).
    <BLANKLINE>
    - marry(0,0).
    - marry(0,1).
    - marry(0,2).
    - marry(1,0).
    - marry(1,1).
    - marry(1,2).
    - marry(2,0).
    - marry(2,1).
    marry(2,2).
    <BLANKLINE>
    --------------------------------------------------------------------------------
    Model for Discourse Thread d2
    --------------------------------------------------------------------------------
    No model found!
    <BLANKLINE>
    --------------------------------------------------------------------------------
    Model for Discourse Thread d3
    --------------------------------------------------------------------------------
    No model found!
    <BLANKLINE>
    Inconsistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0', 's3-r0']:
    s0-r0: boxer(Vincent)
    s1-r0: boxer(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)
    <BLANKLINE>
    Consistent discourse: d1 ['s0-r0', 's1-r1', 's2-r0', 's3-r0']:
    s0-r0: boxer(Vincent)
    s1-r1: boxerdog(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)
    <BLANKLINE>
    Inconsistent discourse: d2 ['s0-r1', 's1-r0', 's2-r0', 's3-r0']:
    s0-r1: boxerdog(Vincent)
    s1-r0: boxer(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)
    <BLANKLINE>
    Inconsistent discourse: d3 ['s0-r1', 's1-r1', 's2-r0', 's3-r0']:
    s0-r1: boxerdog(Vincent)
    s1-r1: boxerdog(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)
    <BLANKLINE>

.. This will not be visible in the html output: create a tempdir to
   play in.

    >>> import tempfile, os
    >>> tempdir = tempfile.mkdtemp()
    >>> old_dir = os.path.abspath('.')
    >>> os.chdir(tempdir)

In order to play around with your own version of background knowledge,
you might want to start off with a local copy of ``background.fol``:

    >>> nltk.data.retrieve('grammars/book_grammars/background.fol')
    Retrieving 'nltk:grammars/book_grammars/background.fol', saving to 'background.fol'

After you have modified the file, the ``load_fol()`` function will parse
the strings in the file into expressions of ``nltk.sem.logic``.

    >>> from nltk.inference.discourse import load_fol
    >>> mybg = load_fol(open('background.fol').read())

The result can be loaded as an argument of ``add_background()`` in the
manner shown earlier.

.. This will not be visible in the html output: clean up the tempdir.

    >>> os.chdir(old_dir)
    >>> for f in os.listdir(tempdir):
    ...     os.remove(os.path.join(tempdir, f))
    >>> os.rmdir(tempdir)
    >>> nltk.data.clear_cache()

Regression Testing from book
============================

    >>> logic._counter._value = 0

    >>> from nltk.tag import RegexpTagger
    >>> tagger = RegexpTagger(
    ...     [('^(chases|runs)$', 'VB'),
    ...      ('^(a)$', 'ex_quant'),
    ...      ('^(every)$', 'univ_quant'),
    ...      ('^(dog|boy)$', 'NN'),
    ...      ('^(He)$', 'PRP')
    ... ])
    >>> rc = DrtGlueReadingCommand(depparser=MaltParser(tagger=tagger))
    >>> dt = DiscourseTester(map(str.split, ['Every dog chases a boy', 'He runs']), rc)
    >>> dt.readings()
    <BLANKLINE>
    s0 readings:
    <BLANKLINE>
    s0-r0: ([z2],[boy(z2), (([z5],[dog(z5)]) -> ([],[chases(z5,z2)]))])
    s0-r1: ([],[(([z1],[dog(z1)]) -> ([z2],[boy(z2), chases(z1,z2)]))])
    <BLANKLINE>
    s1 readings:
    <BLANKLINE>
    s1-r0: ([z1],[PRO(z1), runs(z1)])

    >>> dt.readings(show_thread_readings=True)
    d0: ['s0-r0', 's1-r0'] : ([z1,z2],[boy(z1), (([z3],[dog(z3)]) -> ([],[chases(z3,z1)])), (z2 = z1), runs(z2)])
    d1: ['s0-r1', 's1-r0'] : INVALID: AnaphoraResolutionException

    >>> dt.readings(filter=True, show_thread_readings=True)
    d0: ['s0-r0', 's1-r0'] : ([z1,z3],[boy(z1), (([z2],[dog(z2)]) -> ([],[chases(z2,z1)])), (z3 = z1), runs(z3)])

    >>> logic._counter._value = 0

    >>> from nltk.parse import FeatureEarleyChartParser
    >>> from nltk.sem.drt import DrtParser
    >>> grammar = nltk.data.load('grammars/book_grammars/drt.fcfg', logic_parser=DrtParser())
    >>> parser = FeatureEarleyChartParser(grammar, trace=0)
    >>> trees = parser.parse('Angus owns a dog'.split())
    >>> print(list(trees)[0].label()['SEM'].simplify().normalize())
    ([z1,z2],[Angus(z1), dog(z2), own(z1,z2)])