.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

=========================
 Feature Grammar Parsing
=========================

.. include:: ../../../nltk_book/definitions.rst

Grammars can be parsed from strings.

>>> from __future__ import print_function
>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))

In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Let's assume that we have saved feat0cfg_ as a
file named ``'feat0.fcfg'`` and placed it in the NLTK ``data``
directory. We can inspect it as follows:

.. _feat0cfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/feat0.fcfg

>>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
% start S
# ###################
# Grammar Productions
# ###################
# S expansion productions
S -> NP[NUM=?n] VP[NUM=?n]
# NP expansion productions
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl]
# VP expansion productions
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
# ###################
# Lexical Productions
# ###################
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg]-> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'

Assuming we have saved feat0cfg_ as a file named ``'feat0.fcfg'``,
the function ``parse.load_parser`` allows us to read the grammar into
NLTK, ready for use in parsing.

>>> cp = parse.load_parser('grammars/book_grammars/feat0.fcfg', trace=1)
>>> sent = 'Kim likes children'
>>> tokens = sent.split()
>>> tokens
['Kim', 'likes', 'children']
>>> trees = cp.parse(tokens)
|.Kim .like.chil.|
|[----]    .    .| [0:1] 'Kim'
|.    [----]    .| [1:2] 'likes'
|.    .    [----]| [2:3] 'children'
|[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
|[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
|[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
|.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
|.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
|.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
|.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
|.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
|.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
|[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *
>>> for tree in trees: print(tree)
(S[]
  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
  (VP[NUM='sg', TENSE='pres']
    (TV[NUM='sg', TENSE='pres'] likes)
    (NP[NUM='pl'] (N[NUM='pl'] children))))

The parser works directly with the underspecified productions given
by the grammar. That is, the Predictor rule does not attempt to
compile out all admissible feature combinations before trying to
expand the non-terminals on the left hand side of a production.
However, when the Scanner matches an input word against a lexical
production that has been predicted, the new edge will typically
contain fully specified features; e.g., the edge
[PropN[`num`:feat: = `sg`:fval:] |rarr| 'Kim', (0, 1)]. Recall from
Chapter 8 that the Fundamental (or Completer) Rule in standard CFGs
is used to combine an incomplete edge that's expecting a nonterminal
*B* with a following, complete edge whose left hand side matches *B*.
In our current setting, rather than checking for a complete match, we
test whether the expected category *B* will `unify`:dt: with the left
hand side *B'* of a following complete edge. We will explain in more
detail in Section 9.2 how unification works; for the moment, it is
enough to know that as a result of unification, any variable values
of features in *B* will be instantiated by constant values in the
corresponding feature structure in *B'*, and these instantiated
values will be used in the new edge added by the Completer. This
instantiation can be seen, for example, in the edge
[NP [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)]
in Example 9.2, where the feature `num`:feat: has been assigned the
value `sg`:fval:.
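
This variable-binding step can be tried in isolation with
``nltk.unify``; a minimal sketch (not part of the chart trace above),
where the variable ``?n`` plays the role of the underspecified
feature in *B* and the constant comes from *B'*:

>>> b = nltk.FeatStruct("[NUM = ?n]")
>>> b_prime = nltk.FeatStruct("[NUM = sg]")
>>> print(nltk.unify(b, b_prime))
[ NUM = 'sg' ]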

Feature structures in NLTK are declared with the ``nltk.FeatStruct``
constructor. Atomic feature values can be strings or integers.

>>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
>>> print(fs1)
[ NUM   = 'sg'   ]
[ TENSE = 'past' ]

We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

>>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
>>> print(fs1['GND'])
fem

We can also define feature structures which have complex values, as
discussed earlier.

>>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
>>> print(fs2)
[       [ GND = 'fem' ] ]
[ AGR = [ NUM = 'pl'  ] ]
[       [ PER = 3     ] ]
[                       ]
[ POS = 'N'             ]
>>> print(fs2['AGR'])
[ GND = 'fem' ]
[ NUM = 'pl'  ]
[ PER = 3     ]
>>> print(fs2['AGR']['PER'])
3
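
Deeply nested values can also be reached in one step by indexing with
a feature path, i.e. a tuple of feature names; a minimal sketch of
the same lookup as above:

>>> print(fs2['AGR', 'PER'])
3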

Feature structures can also be constructed by passing a bracketed
string to the ``nltk.FeatStruct`` constructor. Note that in this
case, atomic feature values do not need to be enclosed in quotes.

>>> f1 = nltk.FeatStruct("[NUMBER = sg]")
>>> f2 = nltk.FeatStruct("[PERSON = 3]")
>>> print(nltk.unify(f1, f2))
[ NUMBER = 'sg' ]
[ PERSON = 3    ]

>>> f1 = nltk.FeatStruct("[A = [B = b, D = d]]")
>>> f2 = nltk.FeatStruct("[A = [C = c, D = d]]")
>>> print(nltk.unify(f1, f2))
[     [ B = 'b' ] ]
[ A = [ C = 'c' ] ]
[     [ D = 'd' ] ]

Feature Structures as Graphs
----------------------------

Feature structures are not inherently tied to linguistic objects;
they are general purpose structures for representing knowledge. For
example, we could encode information about a person in a feature
structure:

>>> person01 = nltk.FeatStruct("[NAME=Lee, TELNO='01 27 86 42 96', AGE=33]")
>>> print(person01)
[ AGE   = 33               ]
[ NAME  = 'Lee'            ]
[ TELNO = '01 27 86 42 96' ]

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature
structure is prefixed with an integer in parentheses, such as ``(1)``,
and any subsequent reference to that structure uses the notation
``->(1)``, as shown below.

>>> fs = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
...                          SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
>>> print(fs)
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]

There can be any number of tags within a single feature structure.

>>> fs3 = nltk.FeatStruct("[A=(1)[B=b], C=(2)[], D->(1), E->(2)]")
>>> print(fs3)
[ A = (1) [ B = 'b' ] ]
[                     ]
[ C = (2) []          ]
[                     ]
[ D -> (1)            ]
[ E -> (2)            ]

>>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
>>> fs2 = nltk.FeatStruct(CITY='Paris')
>>> print(nltk.unify(fs1, fs2))
[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]

Unification is symmetric:

>>> nltk.unify(fs1, fs2) == nltk.unify(fs2, fs1)
True

Unification is associative:

>>> fs3 = nltk.FeatStruct(TELNO='01 27 86 42 96')
>>> nltk.unify(nltk.unify(fs1, fs2), fs3) == nltk.unify(fs1, nltk.unify(fs2, fs3))
True

Unification between `FS`:math:\ :subscript:`0` and
`FS`:math:\ :subscript:`1` will fail if the two feature structures
share a path |pi|, but the value of |pi| in `FS`:math:\ :subscript:`0`
is a distinct atom from the value of |pi| in
`FS`:math:\ :subscript:`1`. In NLTK, this is implemented by setting
the result of unification to be ``None``.

>>> fs0 = nltk.FeatStruct(A='a')
>>> fs1 = nltk.FeatStruct(A='b')
>>> print(nltk.unify(fs0, fs1))
None
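
Since failure is signalled by the return value rather than by an
exception, client code typically guards on ``None``; a minimal sketch
of that pattern:

>>> result = nltk.unify(fs0, fs1)
>>> result is None
True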

Now, if we look at how unification interacts with structure-sharing,
things become really interesting.

>>> fs0 = nltk.FeatStruct("""[NAME=Lee,
...                           ADDRESS=[NUMBER=74,
...                                    STREET='rue Pascal'],
...                           SPOUSE= [NAME=Kim,
...                                    ADDRESS=[NUMBER=74,
...                                             STREET='rue Pascal']]]""")
>>> print(fs0)
[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]

>>> fs1 = nltk.FeatStruct("[SPOUSE=[ADDRESS=[CITY=Paris]]]")
>>> print(nltk.unify(fs0, fs1))
[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [           [ CITY   = 'Paris'      ] ] ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]

>>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
...                           SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
>>> print(fs2)
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]
>>> print(nltk.unify(fs2, fs1))
[               [ CITY   = 'Paris'      ] ]
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]

>>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
>>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
>>> print(fs2)
[ ADDRESS1 = ?x ]
[ ADDRESS2 = ?x ]
>>> print(nltk.unify(fs1, fs2))
[ ADDRESS1 = (1) [ NUMBER = 74           ] ]
[                [ STREET = 'rue Pascal' ] ]
[                                          ]
[ ADDRESS2 -> (1)                          ]
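
In the unified result, the two paths lead to a single shared object,
so the reentrancy is real rather than just notational; a quick check
of this, as a sketch using Python object identity:

>>> shared = nltk.unify(fs1, fs2)
>>> shared['ADDRESS1'] is shared['ADDRESS2']
True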

>>> sent = 'who do you claim that you like'
>>> tokens = sent.split()
>>> cp = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1)
>>> trees = cp.parse(tokens)
|.w.d.y.c.t.y.l.|
|[-] . . . . . .| [0:1] 'who'
|. [-] . . . . .| [1:2] 'do'
|. . [-] . . . .| [2:3] 'you'
|. . . [-] . . .| [3:4] 'claim'
|. . . . [-] . .| [4:5] 'that'
|. . . . . [-] .| [5:6] 'you'
|. . . . . . [-]| [6:7] 'like'
|# . . . . . . .| [0:0] NP[]/NP[] -> *
|. # . . . . . .| [1:1] NP[]/NP[] -> *
|. . # . . . . .| [2:2] NP[]/NP[] -> *
|. . . # . . . .| [3:3] NP[]/NP[] -> *
|. . . . # . . .| [4:4] NP[]/NP[] -> *
|. . . . . # . .| [5:5] NP[]/NP[] -> *
|. . . . . . # .| [6:6] NP[]/NP[] -> *
|. . . . . . . #| [7:7] NP[]/NP[] -> *
|[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
|[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
|. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
|. [-> . . . . .| [1:2] S[+INV] -> V[+AUX] * NP[] VP[] {}
|. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
|. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
|. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
|. . [-> . . . .| [2:3] S[-INV] -> NP[] * VP[] {}
|. . [-> . . . .| [2:3] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . [-> . . . .| [2:3] S[-INV] -> NP[] * S[]/NP[] {}
|. [---> . . . .| [1:3] S[+INV] -> V[+AUX] NP[] * VP[] {}
|. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
|. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
|. . . [-> . . .| [3:4] VP[] -> V[-AUX, SUBCAT='clause'] * SBar[] {}
|. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
|. . . . [-] . .| [4:5] Comp[] -> 'that' *
|. . . . [-> . .| [4:5] SBar[] -> Comp[] * S[-INV] {}
|. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
|. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
|. . . . . [-> .| [5:6] S[-INV] -> NP[] * VP[] {}
|. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . . . . [-> .| [5:6] S[-INV] -> NP[] * S[]/NP[] {}
|. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
|. . . . . . [->| [6:7] VP[] -> V[-AUX, SUBCAT='trans'] * NP[] {}
|. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
|. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
|. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
|. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
|. . [---------]| [2:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
|[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *
>>> trees = list(trees)
>>> for tree in trees: print(tree)
(S[-INV]
  (NP[+WH] who)
  (S[+INV]/NP[]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[]/NP[]
      (V[-AUX, SUBCAT='clause'] claim)
      (SBar[]/NP[]
        (Comp[] that)
        (S[-INV]/NP[]
          (NP[-WH] you)
          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))

A different parser should give the same parse trees, but perhaps in a
different order:

>>> cp2 = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1,
...                         parser=parse.FeatureEarleyChartParser)
>>> trees2 = cp2.parse(tokens)
|.w.d.y.c.t.y.l.|
|[-] . . . . . .| [0:1] 'who'
|. [-] . . . . .| [1:2] 'do'
|. . [-] . . . .| [2:3] 'you'
|. . . [-] . . .| [3:4] 'claim'
|. . . . [-] . .| [4:5] 'that'
|. . . . . [-] .| [5:6] 'you'
|. . . . . . [-]| [6:7] 'like'
|> . . . . . . .| [0:0] S[-INV] -> * NP[] VP[] {}
|> . . . . . . .| [0:0] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|> . . . . . . .| [0:0] S[-INV] -> * NP[] S[]/NP[] {}
|> . . . . . . .| [0:0] S[-INV] -> * Adv[+NEG] S[+INV] {}
|> . . . . . . .| [0:0] S[+INV] -> * V[+AUX] NP[] VP[] {}
|> . . . . . . .| [0:0] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
|> . . . . . . .| [0:0] NP[+WH] -> * 'who' {}
|[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
|[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
|. > . . . . . .| [1:1] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|. > . . . . . .| [1:1] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
|. > . . . . . .| [1:1] V[+AUX] -> * 'do' {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
|. > . . . . . .| [1:1] VP[] -> * V[+AUX] VP[] {}
|. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
|. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
|. . > . . . . .| [2:2] VP[] -> * V[+AUX] VP[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . > . . . . .| [2:2] NP[-WH] -> * 'you' {}
|. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
|. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . . > . . . .| [3:3] V[-AUX, SUBCAT='clause'] -> * 'claim' {}
|. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
|. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
|. . . . > . . .| [4:4] SBar[]/?x[] -> * Comp[] S[-INV]/?x[] {}
|. . . . > . . .| [4:4] Comp[] -> * 'that' {}
|. . . . [-] . .| [4:5] Comp[] -> 'that' *
|. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
|. . . . . > . .| [5:5] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|. . . . . > . .| [5:5] NP[-WH] -> * 'you' {}
|. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
|. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . . . . . > .| [6:6] V[-AUX, SUBCAT='trans'] -> * 'like' {}
|. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
|. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
|. . . . . . . #| [7:7] NP[]/NP[] -> *
|. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
|. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
|. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
|. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
|[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *
>>> sorted(trees) == sorted(trees2)
True

Let's load a German grammar:

>>> cp = parse.load_parser('grammars/book_grammars/german.fcfg', trace=0)
>>> sent = 'die Katze sieht den Hund'
>>> tokens = sent.split()
>>> trees = cp.parse(tokens)
>>> for tree in trees: print(tree)
(S[]
  (NP[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom']
    (Det[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom'] die)
    (N[AGR=[GND='fem', NUM='sg', PER=3]] Katze))
  (VP[AGR=[NUM='sg', PER=3]]
    (TV[AGR=[NUM='sg', PER=3], OBJCASE='acc'] sieht)
    (NP[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc']
      (Det[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc'] den)
      (N[AGR=[GND='masc', NUM='sg', PER=3]] Hund))))
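
An agreement failure simply yields no parses. For instance, the
masculine accusative determiner cannot combine with a feminine noun;
a minimal sketch, assuming ``Katze`` has only singular feminine
entries in ``german.fcfg``:

>>> trees = cp.parse('die Katze sieht den Katze'.split())
>>> print(len(list(trees)))
0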

Grammar with Binding Operators
------------------------------

The `bindop.fcfg`_ grammar is a semantic grammar that uses the lambda
calculus. Each element has a core semantics, which is a single
lambda-calculus expression, and a set of binding operators, which
bind variables.

.. _bindop.fcfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/bindop.fcfg

In order to make the binding operators work right, they need to
instantiate their bound variable every time they are added to the
chart. To do this, we use a special subclass of ``Chart``, called
``InstantiateVarsChart``.

>>> from nltk.parse.featurechart import InstantiateVarsChart
>>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=1,
...                        chart_class=InstantiateVarsChart)
>>> print(cp.grammar())
Grammar with 15 productions (start state = S[])
S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] VP[SEM=[BO=?b2, CORE=?vp]]
VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] NP[SEM=[BO=?b2, CORE=?obj]]
VP[SEM=?s] -> IV[SEM=?s]
NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] N[SEM=[BO=?b2, CORE=?n]]
Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a'
N[SEM=[BO={/}, CORE=<dog>]] -> 'dog'
N[SEM=[BO={/}, CORE=<dog>]] -> 'cat'
N[SEM=[BO={/}, CORE=<dog>]] -> 'mouse'
IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks'
IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'eats'
IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'walks'
TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds'
TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'walks'
NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'john'
NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'alex'

A simple intransitive sentence:

>>> from nltk.sem import logic
>>> logic._counter._value = 100

>>> trees = cp.parse('john barks'.split())
|. john.barks.|
|[-----]     .| [0:1] 'john'
|.     [-----]| [1:2] 'barks'
|[-----]     .| [0:1] NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] -> 'john' *
|[----->     .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
|.     [-----]| [1:2] IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks' *
|.     [-----]| [1:2] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
|[===========]| [0:2] S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]]
  (NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] john)
  (VP[SEM=[BO={/}, CORE=<\x.bark(x)>]]
    (IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] barks)))

A transitive sentence:

>>> trees = cp.parse('john feeds a dog'.split())
|.joh.fee. a .dog.|
|[---]   .   .   .| [0:1] 'john'
|.   [---]   .   .| [1:2] 'feeds'
|.   .   [---]   .| [2:3] 'a'
|.   .   .   [---]| [3:4] 'dog'
|[---]   .   .   .| [0:1] NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] -> 'john' *
|[--->   .   .   .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
|.   [---]   .   .| [1:2] TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds' *
|.   [--->   .   .| [1:2] VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] * NP[SEM=[BO=?b2, CORE=?obj]] {?b1: {/}, ?v: <LambdaExpression \x y.feed(y,x)>}
|.   .   [---]   .| [2:3] Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a' *
|.   .   [--->   .| [2:3] NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] * N[SEM=[BO=?b2, CORE=?n]] {?b1: {/}, ?det: <LambdaExpression \Q P.exists x.(Q(x) & P(x))>}
|.   .   .   [---]| [3:4] N[SEM=[BO={/}, CORE=<dog>]] -> 'dog' *
|.   .   [-------]| [2:4] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]] -> Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] N[SEM=[BO={/}, CORE=<dog>]] *
|.   .   [------->| [2:4] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.exists x.(dog(x) & P(x)),z2)}, ?subj: <IndividualVariableExpression z2>}
|.   [-----------]| [1:4] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]] -> TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<z2>]] *
|[===============]| [0:4] S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<\y.feed(y,z3)>]] *
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] john)
  (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]]
      (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
      (N[SEM=[BO={/}, CORE=<dog>]] dog))))

Turn down the verbosity:

>>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=0,
...                        chart_class=InstantiateVarsChart)

Reuse the same lexical item twice:

>>> trees = cp.parse('john feeds john'.split())
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2), bo(\P.P(John),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.P(John),z104)}, CORE=<z104>]] john)
  (VP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.P(John),z105)}, CORE=<z105>]] john)))

>>> trees = cp.parse('a dog feeds a dog'.split())
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z106)}, CORE=<z106>]]
    (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
    (N[SEM=[BO={/}, CORE=<dog>]] dog))
  (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z107)}, CORE=<z107>]]
      (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
      (N[SEM=[BO={/}, CORE=<dog>]] dog))))