.. Copyright (C) 2001-2019 NLTK Project
.. For license information, see LICENSE.TXT

========
FrameNet
========

The FrameNet corpus is a lexical database of English that is both human-
and machine-readable, based on annotating examples of how words are used
in actual texts. FrameNet is based on a theory of meaning called Frame
Semantics, deriving from the work of Charles J. Fillmore and colleagues.
The basic idea is straightforward: the meanings of most words can best
be understood on the basis of a semantic frame, that is, a description
of a type of event, relation, or entity and the participants in it. For
example, the concept of cooking typically involves a person doing the
cooking (Cook), the food that is to be cooked (Food), something to hold
the food while cooking (Container), and a source of heat
(Heating_instrument). In the FrameNet project, this is represented as a
frame called Apply_heat, and the Cook, Food, Heating_instrument and
Container are called frame elements (FEs). Words that evoke this frame,
such as fry, bake, boil, and broil, are called lexical units (LUs) of
the Apply_heat frame. The job of FrameNet is to define the frames
and to annotate sentences to show how the FEs fit syntactically around
the word that evokes the frame.
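
To make the terminology concrete, here is a minimal plain-Python sketch that models the Apply_heat frame as a set of FE names and checks a hand-made annotation of the sentence "Matilde fried the catfish in a heavy iron skillet." The `Frame` class and the `annotate()` helper are hypothetical illustrations, not part of NLTK, and the FE assignment shown is an assumed annotation rather than one taken from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical toy frame: a name plus the names of its frame elements."""
    name: str
    frame_elements: frozenset


# Built from the description above; not an NLTK corpus object.
APPLY_HEAT = Frame(
    "Apply_heat",
    frozenset({"Cook", "Food", "Container", "Heating_instrument"}),
)


def annotate(frame, sentence, target, fillers):
    """Return a simple annotation record for `sentence`.

    Raises ValueError if the target word or any FE filler does not occur in
    the sentence, or if an FE label does not belong to `frame`.
    """
    if target not in sentence:
        raise ValueError(f"target {target!r} does not occur in the sentence")
    for fe, text in fillers.items():
        if fe not in frame.frame_elements:
            raise ValueError(f"{fe} is not an FE of {frame.name}")
        if text not in sentence:
            raise ValueError(f"{text!r} does not occur in the sentence")
    return {"frame": frame.name, "target": target, "FEs": dict(fillers)}


# Assumed (hand-made) annotation of an example sentence from this document.
record = annotate(
    APPLY_HEAT,
    "Matilde fried the catfish in a heavy iron skillet.",
    target="fried",
    fillers={"Cook": "Matilde", "Food": "the catfish",
             "Container": "a heavy iron skillet"},
)
print(record["FEs"]["Food"])  # the catfish
```

The check against `frame_elements` mirrors the idea that an annotation may only use the roles the frame defines.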

------
Frames
------

A Frame is a script-like conceptual structure that describes a
particular type of situation, object, or event along with the
participants and props that are needed for that Frame. For
example, the "Apply_heat" frame describes a common situation
involving a Cook, some Food, and a Heating_instrument, and is
evoked by words such as bake, blanch, boil, broil, brown,
simmer, steam, etc.

The roles of a Frame are called "frame elements" (FEs), and the
frame-evoking words are called "lexical units" (LUs).

FrameNet includes relations between Frames. Several types of
relations are defined, of which the most important are:

- Inheritance: An IS-A relation. The child frame is a subtype
  of the parent frame, and each FE in the parent is bound to
  a corresponding FE in the child. An example is the
  "Revenge" frame, which inherits from the
  "Rewards_and_punishments" frame.

- Using: The child frame presupposes the parent frame as
  background, e.g. the "Speed" frame "uses" (or presupposes)
  the "Motion" frame; however, not all parent FEs need to be
  bound to child FEs.

- Subframe: The child frame is a subevent of a complex event
  represented by the parent, e.g. the "Criminal_process" frame
  has subframes of "Arrest", "Arraignment", "Trial", and
  "Sentencing".

- Perspective_on: The child frame provides a particular
  perspective on an un-perspectivized parent frame. A pair of
  examples consists of the "Hiring" and "Get_a_job" frames,
  which perspectivize the "Employment_start" frame from the
  Employer's and the Employee's point of view, respectively.
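
These relation types can be pictured as labelled edges between frame names. The sketch below stores only the relations mentioned in the list above as hypothetical (relation, parent, child) triples and queries them; it does not read FrameNet data, and `children()` and `parents()` are illustrative helpers, not NLTK functions.

```python
# Hypothetical (relation type, parent, child) triples, limited to the
# relations named above. For Subframe the "parent" is the complex event;
# for Perspective_on it is the un-perspectivized frame.
RELATIONS = [
    ("Inheritance", "Rewards_and_punishments", "Revenge"),
    ("Using", "Motion", "Speed"),
    ("Subframe", "Criminal_process", "Arrest"),
    ("Subframe", "Criminal_process", "Arraignment"),
    ("Subframe", "Criminal_process", "Trial"),
    ("Subframe", "Criminal_process", "Sentencing"),
    ("Perspective_on", "Employment_start", "Hiring"),
    ("Perspective_on", "Employment_start", "Get_a_job"),
]


def children(parent, relation_type):
    """Return the child frames linked to `parent` by `relation_type`, sorted."""
    return sorted(c for r, p, c in RELATIONS if r == relation_type and p == parent)


def parents(child):
    """Return sorted (relation type, parent) pairs in which `child` is the child."""
    return sorted((r, p) for r, p, c in RELATIONS if c == child)


print(children("Criminal_process", "Subframe"))
# ['Arraignment', 'Arrest', 'Sentencing', 'Trial']
print(parents("Revenge"))
# [('Inheritance', 'Rewards_and_punishments')]
```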

To get a list of all of the Frames in FrameNet, you can use the
`frames()` function. If you supply a regular expression pattern to the
`frames()` function, you will get a list of all Frames whose names match
that pattern:

    >>> from pprint import pprint
    >>> from operator import itemgetter
    >>> from nltk.corpus import framenet as fn
    >>> from nltk.corpus.reader.framenet import PrettyList
    >>> x = fn.frames(r'(?i)crim')
    >>> x.sort(key=itemgetter('ID'))
    >>> x
    [<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
    >>> PrettyList(sorted(x, key=itemgetter('ID')))
    [<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
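
Conceptually, filtering by name is a regular-expression search over the frame names followed by a sort on the 'ID' key, as the doctest does with `itemgetter('ID')`. The following plain-Python sketch assumes search semantics (the pattern may match anywhere in the name) and uses a hypothetical three-frame list whose IDs and names come from the examples in this document; it is not NLTK's implementation.

```python
import re
from operator import itemgetter

# Hypothetical stand-ins for frame records (IDs and names from the examples
# in this document); this is not the NLTK corpus reader.
FRAMES = [
    {"ID": 500, "name": "Criminal_investigation"},
    {"ID": 202, "name": "Arrest"},
    {"ID": 200, "name": "Criminal_process"},
]


def frames_matching(pattern):
    """Return the frames whose name matches `pattern` anywhere (re.search),
    sorted by their 'ID' key."""
    hits = [f for f in FRAMES if re.search(pattern, f["name"])]
    return sorted(hits, key=itemgetter("ID"))


# The inline (?i) flag makes the match case-insensitive.
print([f["name"] for f in frames_matching(r"(?i)crim")])
# ['Criminal_process', 'Criminal_investigation']
```

Without `(?i)`, the lowercase pattern `crim` would match none of these names, which is why the doctest adds the flag.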

To get the details of a particular Frame, you can use the `frame()`
function, passing in the frame's ID number:

    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> f = fn.frame(202)
    >>> f.ID
    202
    >>> f.name
    'Arrest'
    >>> f.definition # doctest: +ELLIPSIS
    "Authorities charge a Suspect, who is under suspicion of having committed a crime..."
    >>> len(f.lexUnit)
    11
    >>> pprint(sorted([x for x in f.FE]))
    ['Authorities',
     'Charges',
     'Co-participant',
     'Manner',
     'Means',
     'Offense',
     'Place',
     'Purpose',
     'Source_of_legal_authority',
     'Suspect',
     'Time',
     'Type']
    >>> pprint(f.frameRelations)
    [<Parent=Intentionally_affect -- Inheritance -> Child=Arrest>, <Complex=Criminal_process -- Subframe -> Component=Arrest>, ...]

The `frame()` function shown above returns a dict object containing
detailed information about the Frame. See the documentation on the
`frame()` function for the specifics.

You can also search for Frames by their Lexical Units (LUs). The
`frames_by_lemma()` function returns a list of all frames that contain
LUs in which the 'name' attribute of the LU matches the given regular
expression. Note that LU names are composed of "lemma.POS", where the
"lemma" part can be made up of either a single lexeme (e.g. 'run') or
multiple lexemes (e.g. 'a little') (see below).

    >>> PrettyList(sorted(fn.frames_by_lemma(r'(?i)a little'), key=itemgetter('ID'))) # doctest: +ELLIPSIS
    [<frame ID=189 name=Quanti...>, <frame ID=2001 name=Degree>]
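
The lemma-based search can be sketched the same way: match the pattern against each full "lemma.POS" name and collect the distinct frames of the matching LUs. The toy index below uses only LU-to-frame pairs mentioned elsewhere in this document; `frames_by_lemma_sketch()` is a hypothetical helper, and the search semantics are an assumption.

```python
import re

# Hypothetical toy index of (LU name, frame name) pairs from this document.
LU_INDEX = [
    ("bake.v", "Apply_heat"),
    ("bake.v", "Cooking_creation"),
    ("bake.v", "Absorb_heat"),
    ("fry.v", "Apply_heat"),
    ("foresee.v", "Expectation"),
]


def frames_by_lemma_sketch(pattern):
    """Return the sorted, de-duplicated frame names that have at least one LU
    whose full "lemma.POS" name matches `pattern` (re.search assumed)."""
    return sorted({frame for lu, frame in LU_INDEX if re.search(pattern, lu)})


print(frames_by_lemma_sketch(r"(?i)bake"))
# ['Absorb_heat', 'Apply_heat', 'Cooking_creation']
```

Because the pattern is tested against the whole name, an anchored pattern such as `r"^fry\."` is a simple way to avoid matching longer lemmas that merely contain the search string.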

-------------
Lexical Units
-------------

A lexical unit (LU) is a pairing of a word with a meaning. For
example, the "Apply_heat" Frame describes a common situation
involving a Cook, some Food, and a Heating_instrument, and is
*evoked* by words such as bake, blanch, boil, broil, brown,
simmer, steam, etc. These frame-evoking words are the LUs in the
Apply_heat frame. Each sense of a polysemous word is a different
LU.

We have used the word "word" in talking about LUs, but the reality
is more complex. When we say that the word "bake" is
polysemous, we mean that the lemma "bake.v" (which has the
word-forms "bake", "bakes", "baked", and "baking") is linked to
three different frames:

- Apply_heat: "Michelle baked the potatoes for 45 minutes."
- Cooking_creation: "Michelle baked her mother a cake for her birthday."
- Absorb_heat: "The potatoes have to bake for more than 30 minutes."

These constitute three different LUs, with different
definitions.

Multiword expressions such as "given name" and hyphenated words
like "shut-eye" can also be LUs. Idiomatic phrases such as
"middle of nowhere" and "give the slip (to)" are also defined as
LUs in the appropriate frames ("Isolated_places" and "Evading",
respectively), and their internal structure is not analyzed.

FrameNet provides multiple annotated examples of each sense of a
word (i.e. each LU). Moreover, the set of examples
(approximately 20 per LU) illustrates all of the combinatorial
possibilities of the lexical unit.

Each LU is linked to a Frame, and hence to the other words which
evoke that Frame. This makes the FrameNet database similar to a
thesaurus, grouping together semantically similar words.
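
The thesaurus-like grouping follows directly from the LU-to-frame links: LUs that share a frame are frame mates, and a polysemous lemma such as "bake.v" contributes one LU per frame. The sketch uses a hypothetical, deliberately incomplete list of (LU, frame) pairs taken from this section; the real frames contain more LUs, and `frame_mates()` is an illustrative helper, not an NLTK function.

```python
from collections import defaultdict

# Hypothetical, incomplete (LU name, frame) pairs drawn from this section.
LUS = [
    ("bake.v", "Apply_heat"),
    ("boil.v", "Apply_heat"),
    ("fry.v", "Apply_heat"),
    ("steam.v", "Apply_heat"),
    ("bake.v", "Cooking_creation"),
    ("bake.v", "Absorb_heat"),
    ("foresee.v", "Expectation"),
]

# Group LU names by the frame they evoke, like entries in a thesaurus.
by_frame = defaultdict(set)
for lu, frame in LUS:
    by_frame[frame].add(lu)


def frame_mates(lu_name):
    """Map each frame evoked by `lu_name` to the other LUs evoking that frame."""
    return {
        frame: sorted(by_frame[frame] - {lu_name})
        for lu, frame in LUS
        if lu == lu_name
    }


print(frame_mates("fry.v"))
# {'Apply_heat': ['bake.v', 'boil.v', 'steam.v']}
print(sorted(frame_mates("bake.v")))
# ['Absorb_heat', 'Apply_heat', 'Cooking_creation']
```

The second call returns one key per sense, matching the three "bake" LUs listed earlier.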

In the simplest case, frame-evoking words are verbs such as
"fried" in:

    "Matilde fried the catfish in a heavy iron skillet."

Sometimes event nouns may evoke a Frame. For example,
"reduction" evokes "Cause_change_of_scalar_position" in:

    "...the reduction of debt levels to $665 million from $2.6 billion."

Adjectives may also evoke a Frame. For example, "asleep" may
evoke the "Sleep" frame as in:

    "They were asleep for hours."

Many common nouns, such as artifacts like "hat" or "tower",
typically serve as dependents rather than clearly evoking their
own frames.

A list of lexical units can be obtained with the `lus()` function,
which takes an optional regular expression pattern that is matched
against the names of the lexical units:

    >>> from pprint import pprint
    >>> PrettyList(sorted(fn.lus(r'(?i)a little'), key=itemgetter('ID')))
    [<lu ID=14733 name=a little.n>, <lu ID=14743 name=a little.adv>, ...]

You can obtain detailed information on a particular LU by calling the
`lu()` function and passing in an LU's 'ID' number:

    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> fn.lu(256).name
    'foresee.v'
    >>> fn.lu(256).definition
    'COD: be aware of beforehand; predict.'
    >>> fn.lu(256).frame.name
    'Expectation'
    >>> fn.lu(256).lexemes[0].name
    'foresee'

Note that LU names take the form of a dotted string (e.g. "run.v" or
"a little.adv") in which a lemma precedes the "." and a part of speech
(POS) follows it. The lemma may be composed of a single lexeme
(e.g. "run") or of multiple lexemes (e.g. "a little"). The POS
abbreviations used in LU names are:

- v: verb
- n: noun
- a: adjective
- adv: adverb
- prep: preposition
- num: number
- intj: interjection
- art: article
- c: conjunction
- scon: subordinating conjunction

For more on the information contained in the dict returned by the
`lu()` function, see the documentation of the `lu()` function.
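
Because an LU name is a lemma and a POS joined by a final dot, it can be taken apart with plain string operations. In this sketch, `parse_lu_name()` is a hypothetical helper (NLTK exposes the lexemes directly, as in `fn.lu(256).lexemes` above); splitting the lemma on whitespace is only an approximation of its lexemes, and the POS table simply restates the list above.

```python
# POS abbreviations as listed above.
POS_NAMES = {
    "v": "verb", "n": "noun", "a": "adjective", "adv": "adverb",
    "prep": "preposition", "num": "number", "intj": "interjection",
    "art": "article", "c": "conjunction", "scon": "subordinating conjunction",
}


def parse_lu_name(lu_name):
    """Split an LU name of the form "lemma.POS" into (lemma, POS, lexemes).

    The split is on the last ".", and the lemma is split into approximate
    lexemes on whitespace. Raises ValueError if there is no dot or the POS
    is not in POS_NAMES.
    """
    lemma, sep, pos = lu_name.rpartition(".")
    if not sep or pos not in POS_NAMES:
        raise ValueError(f"not a lemma.POS name: {lu_name!r}")
    return lemma, pos, lemma.split()


print(parse_lu_name("a little.adv"))
# ('a little', 'adv', ['a', 'little'])
print(POS_NAMES[parse_lu_name("foresee.v")[1]])
# verb
```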

-------------------
Annotated Documents
-------------------

The FrameNet corpus contains a small set of annotated documents. A list
of these documents can be obtained by calling the `docs()` function,
which, like `frames()`, accepts an optional regular expression that is
matched against the document names:

    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> d = fn.docs('BellRinging')[0]
    >>> d.corpname
    'PropBank'
    >>> d.sentence[49] # doctest: +ELLIPSIS
    full-text sentence (...) in BellRinging:
    <BLANKLINE>
    <BLANKLINE>
    [POS] 17 tags
    <BLANKLINE>
    [POS_tagset] PENN
    <BLANKLINE>
    [text] + [annotationSet]
    <BLANKLINE>
    `` I live in hopes that the ringers themselves will be drawn into
                 *****          *******                    *****
                 Desir          Cause_t                    Cause
                 [1]            [3]                        [2]
    <BLANKLINE>
     that fuller life .
          ******
          Comple
          [4]
     (Desir=Desiring, Cause_t=Cause_to_make_noise, Cause=Cause_motion, Comple=Completeness)
    <BLANKLINE>

    >>> d.sentence[49].annotationSet[1] # doctest: +ELLIPSIS
    annotation set (...):
    <BLANKLINE>
    [status] MANUAL
    <BLANKLINE>
    [LU] (6605) hope.n in Desiring
    <BLANKLINE>
    [frame] (366) Desiring
    <BLANKLINE>
    [GF] 2 relations
    <BLANKLINE>
    [PT] 2 phrases
    <BLANKLINE>
    [text] + [Target] + [FE] + [Noun]
    <BLANKLINE>
    `` I live in hopes that the ringers themselves will be drawn into
       - ^^^^ ^^ ***** ----------------------------------------------
       E supp su       Event
    <BLANKLINE>
     that fuller life .
    -----------------
    <BLANKLINE>
     (E=Experiencer, su=supp)
    <BLANKLINE>
    <BLANKLINE>
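
In these displays, each marker row is aligned character by character with the text row above it: in the annotation set, `*` sits under the target ("hopes"), `-` under frame-element spans, and `^` under the items labelled supp. A minimal sketch of that alignment idea follows; `underline()` is a hypothetical helper, not NLTK's rendering code, and the spans are end-exclusive character offsets chosen to reproduce the first marker row of the annotation set.

```python
def underline(text, spans):
    """Return a marker row aligned with `text`.

    `spans` holds (start, end, mark) triples with end-exclusive character
    offsets; every position in a span gets `mark`, all other positions stay
    blank, and trailing blanks are dropped.
    """
    row = [" "] * len(text)
    for start, end, mark in spans:
        for i in range(start, min(end, len(text))):
            row[i] = mark
    return "".join(row).rstrip()


line = "`` I live in hopes that the ringers themselves will be drawn into"
spans = [
    (3, 4, "-"),    # "I"             (Experiencer)
    (5, 9, "^"),    # "live"          (supp)
    (10, 12, "^"),  # "in"            (supp)
    (13, 18, "*"),  # "hopes"         (target)
    (19, 65, "-"),  # "that ... into" (Event, continues on the next line)
]
print(line)
print(underline(line, spans))
```

Offsets rather than word indices are used because the display aligns on characters, which is why a span such as Event can wrap onto the following text line.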