An Overview of The Event Extraction Task in NLP

Events possess a rich structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful.

This importance motivates the event extraction task. In contrast to the event detection task, which aims to discover related stories in a continuous stream of news articles, event extraction tries to extract the information about an event that is reported in one or more documents.

Task Description

The ACE program provides annotated data and evaluation tools for a variety of information extraction tasks. There are five basic kinds of extraction targets supported by ACE: entities, times, values, relations, and events.

The ACE Event Model

  • An event is something that happens.
  • An event extent is a sentence within which a taggable event is described.
  • An event trigger is the word that most clearly expresses the occurrence of an event. It can be a main verb, an adjective, a past participle, a noun, or a pronoun.
  • Event participants are the Entities that are involved in that event.
  • Event attributes are frequently entities and values within the scope of an event that are not properly participants.
  • Event arguments are event participants and event attributes.
  • Event properties describe the event itself, e.g., when and whether it really took place. Currently, they are the features Polarity, Tense, Genericity, and Modality.
  • A participant role: each event type and sub-type will have its own set of potential participant roles for the entities which occur within the scopes of its exemplars.
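
To make these definitions concrete, here is a minimal sketch of how an event mention could be represented in code. The class names, field names, role labels, and default property values are our own illustrative assumptions, not an official ACE schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    text: str  # the participant or attribute mention
    role: str  # its role in the event (illustrative label)


@dataclass
class EventMention:
    extent: str      # sentence in which the event is described
    trigger: str     # word that most clearly expresses the event
    event_type: str  # illustrative type label
    subtype: str     # illustrative sub-type label
    arguments: List[Argument] = field(default_factory=list)
    # The four event properties; the default values are assumptions.
    polarity: str = "Positive"
    tense: str = "Past"
    genericity: str = "Specific"
    modality: str = "Asserted"

    def roles(self) -> Dict[str, str]:
        """Map each role name to the text of the argument that fills it."""
        return {arg.role: arg.text for arg in self.arguments}


mention = EventMention(
    extent="Rebels attacked the village on Tuesday.",
    trigger="attacked",
    event_type="Conflict",
    subtype="Attack",
    arguments=[
        Argument("Rebels", "Attacker"),
        Argument("the village", "Target"),
        Argument("Tuesday", "Time"),
    ],
)
print(mention.roles())
```

Here "Rebels" and "the village" are participants, while "Tuesday" is an attribute; all three are event arguments in the sense defined above.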

In the ACE model, only “interesting” events (events that fall into one of 34 predefined categories) are extracted.

As part of this program, corpora that support the tasks (entities, relations, events) were developed for English, Chinese, and Arabic. Moreover, the guidelines for annotating an Arabic event extraction corpus according to the ACE model were described in [1].

Methodologies

We distinguish between two main approaches to event extraction, data-driven and knowledge-driven, by analogy with the classic distinction made in the field of modeling.

Data-Driven Event Extraction

Despite their differences, all data-driven approaches focus on discovering statistical relations, i.e., facts that are supported by statistical evidence. Examples of discovered facts are words or concepts that are statistically associated with one another. However, a statistical relation is not necessarily a semantically valid or meaningful one.

Several examples of the usage of the data-driven approaches for event extraction can be found in the literature. For instance, [2] broke down the ACE task of extracting events into a series of classification sub-tasks, each of which is handled by a machine-learned classifier:

  1. Trigger identification: finding event triggers in text and assigning them an event type. This was modeled as a word classification task.
  2. Argument identification: determining which entity mentions are arguments of each event mention. This was modeled as a pair-classification task, i.e., each event mention is paired with each of the entity mentions occurring in the same sentence to form a single classification instance.
  3. Attribute assignment: determining the values of the modality, polarity, genericity, and tense attributes for each event mention. A separate classifier was trained for each attribute.
  4. Event coreference: determining which event mentions refer to the same event. Each event mention in a document is paired with every other event mention, and a classifier assigns to each pair of mentions the probability that the paired mentions corefer.
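
The pairing used in steps 2 and 4 can be sketched as follows. This is only an illustration of how the classification instances are formed; the function names and the toy document are our own assumptions, and the classifiers themselves are left out.

```python
from itertools import combinations


def argument_instances(sentences):
    """Pair every event mention with every entity mention in the same sentence."""
    instances = []
    for sentence in sentences:
        for trigger in sentence["triggers"]:
            for entity in sentence["entities"]:
                instances.append((trigger, entity))
    return instances


def coreference_instances(event_mentions):
    """Pair each event mention with every other event mention in the document."""
    return list(combinations(event_mentions, 2))


# Toy document: each sentence lists its event triggers and entity mentions.
doc = [
    {"triggers": ["attacked"], "entities": ["Rebels", "the village"]},
    {"triggers": ["raid", "died"], "entities": ["three people"]},
]

arg_pairs = argument_instances(doc)
all_triggers = [t for sentence in doc for t in sentence["triggers"]]
coref_pairs = coreference_instances(all_triggers)
print(len(arg_pairs), len(coref_pairs))  # prints "4 3"
```

Each pair would then be turned into a feature vector and passed to the corresponding classifier.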

Clustering techniques were also employed to solve this task. For instance, [3] aimed at real-time news event extraction, focusing especially on violence and disaster events.

They developed an event extraction engine, which for each detected violent event produces a frame, whose main slots are: date and location, number of killed and injured, kidnapped people, actors, and type of event.

First, their event extraction system used linear patterns to extract entities that have specific semantic roles in a news cluster. Second, they merged the individual extracted pieces into event descriptions by applying an information aggregation algorithm.
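
A rough sketch of this two-step idea is shown below. The regular expressions and the majority-vote aggregation are simplified assumptions of ours for illustration; they are not the actual patterns or aggregation algorithm used in [3].

```python
import re
from collections import Counter

# Hypothetical linear patterns: each one fills a single frame slot.
PATTERNS = {
    "killed": re.compile(r"(\d+) (?:people|civilians|soldiers) (?:were )?killed"),
    "injured": re.compile(r"(\d+) (?:people|civilians|soldiers) (?:were )?injured"),
    "location": re.compile(r"\bin ([A-Z][a-z]+)"),
}


def extract_slots(text):
    """Apply every pattern to one article and keep the first match per slot."""
    slots = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[slot] = match.group(1)
    return slots


def aggregate(cluster):
    """Merge per-article slots by keeping the most frequently reported value."""
    votes = {}
    for article in cluster:
        for slot, value in extract_slots(article).items():
            votes.setdefault(slot, Counter())[value] += 1
    return {slot: counter.most_common(1)[0][0] for slot, counter in votes.items()}


# Toy news cluster: three reports about the same incident.
cluster = [
    "A bomb exploded in Baghdad and 12 people were killed.",
    "Officials said 12 civilians were killed and 30 people were injured in Baghdad.",
    "At least 10 people killed in Baghdad blast.",
]
frame = aggregate(cluster)
print(frame)
```

Working on a cluster rather than a single article lets conflicting reports (12 versus 10 killed in this toy example) be resolved in the aggregation step.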

They used machine learning algorithms to acquire the patterns. However, because ML approaches are never 100% accurate, they manually filtered out implausible patterns and added hand-crafted ones.

The authors of [4] simplified the problem into a sentence-level classification problem. They used machine learning classification methods to differentiate between sentences that describe one or more events and those that do not.
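
As an illustration of sentence-level classification, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing on a handful of made-up sentences. The choice of Naive Bayes, the labels, and the training data are our assumptions; they are not taken from [4].

```python
import math
from collections import Counter


def tokenize(sentence):
    return sentence.lower().replace(".", "").split()


class NaiveBayes:
    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.label_counts[label] / total)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(sentence):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training data: "event" sentences describe something that happened.
train = [
    ("A bomb exploded near the market.", "event"),
    ("Gunmen attacked a police station.", "event"),
    ("Two soldiers were killed in the attack.", "event"),
    ("The market is popular with tourists.", "no_event"),
    ("The city has a large police force.", "no_event"),
    ("Analysts expect a quiet week.", "no_event"),
]
model = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
print(model.predict("A bomb exploded in the city."))
print(model.predict("The city is popular with tourists."))
```

A realistic system would need far more training data and richer features than bag-of-words counts.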

Knowledge-Driven Event Extraction

In contrast to data-driven methods, knowledge-driven methods are often based on patterns that express rules representing expert knowledge. They are inherently based on linguistic and lexicographic knowledge, as well as existing human knowledge about the contents of the text to be processed.

In [5], the authors aimed to recognize events in Arabic text and were interested in annotating verbal events only. Rather than depending on large lexical resources, they constructed a minimal set of general, simple hand-crafted rules for recognizing time and location expressions.
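
To give a feel for what such hand-crafted rules look like, here is a small sketch written for English text. The rules below are our own simplified assumptions and are unrelated to the actual Arabic rules of [5].

```python
import re

DAYS = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
MONTHS = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)

# Hypothetical time rules, applied in order.
TIME_RULES = [
    re.compile(rf"\b\d{{1,2}} {MONTHS} \d{{4}}\b"),  # e.g. "5 March 2013"
    re.compile(rf"\b{DAYS}\b"),                      # e.g. "Monday"
    re.compile(r"\b(?:yesterday|today|tomorrow)\b", re.IGNORECASE),
]

# Hypothetical location rule: a capitalized word after a locative preposition.
LOCATION_RULE = re.compile(r"\b(?:in|at|near) ([A-Z][a-z]+)")


def annotate(sentence):
    """Return the time and place expressions matched by the rules."""
    times = [m.group(0) for rule in TIME_RULES for m in rule.finditer(sentence)]
    places = LOCATION_RULE.findall(sentence)
    return {"time": times, "place": places}


result = annotate("The minister arrived in Algiers on Monday and left on 5 March 2013.")
print(result)
```

Rules this simple are easy to write and inspect, but they also over-generate: the location rule would wrongly tag "March" in "in March", which is why real rule sets need careful ordering and exceptions.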

Another work on Arabic is [6]. The authors considered only verbs and nouns as events, treating adjectives as less significant. Their system was built using GATE. It identified predefined named entities (names of “people”, “places”, “organization”, and “date”) and the relations between those entities and the defined events.

Extracting an event consisted of discovering the links between the “trigger” of the event and its arguments. These links were established through a syntactic (dependency) analysis and extraction rules that exploit this analysis. To implement this step, they used the JAPE transducer provided by the GATE toolkit, while triggers were identified using manually collected gazetteers.
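
The combination of a trigger gazetteer with rules over a dependency analysis can be sketched as follows. This is plain Python rather than JAPE, and the gazetteer entries, the relation-to-role rules, and the pre-parsed toy sentence are all hypothetical; it is not the system described in [6].

```python
# Hypothetical gazetteer: trigger lemma -> event type.
TRIGGER_GAZETTEER = {"attack": "Conflict.Attack", "visit": "Contact.Meet"}

# Hypothetical extraction rules: dependency relation on the trigger -> role.
ROLE_RULES = {"nsubj": "Agent", "obj": "Target", "obl": "Circumstance"}


def extract_events(tokens, dependencies):
    """Link each gazetteer trigger to its arguments via dependency relations.

    tokens: list of (word, lemma) pairs.
    dependencies: list of (head_index, relation, dependent_index) triples.
    """
    events = []
    for index, (word, lemma) in enumerate(tokens):
        event_type = TRIGGER_GAZETTEER.get(lemma)
        if event_type is None:
            continue  # not a known trigger
        arguments = {}
        for head, relation, dependent in dependencies:
            if head == index and relation in ROLE_RULES:
                arguments[ROLE_RULES[relation]] = tokens[dependent][0]
        events.append({"trigger": word, "type": event_type, "arguments": arguments})
    return events


# Toy sentence "Rebels attacked Mosul on Tuesday", assumed to be already parsed.
tokens = [
    ("Rebels", "rebel"),
    ("attacked", "attack"),
    ("Mosul", "Mosul"),
    ("on", "on"),
    ("Tuesday", "Tuesday"),
]
dependencies = [(1, "nsubj", 0), (1, "obj", 2), (1, "obl", 4), (4, "case", 3)]
events = extract_events(tokens, dependencies)
print(events)
```

In a real pipeline, the tokens, lemmas, and dependency triples would come from a parser, and the rules would also check the named-entity type of each candidate argument.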

Conclusion

The event extraction task aims to extract the information related to the events mentioned in texts. It is considered useful for many NLP tasks, including information retrieval, question answering, and summarization.

Extracting events is a complex task consisting of multiple sub-tasks of varying difficulty, involving detection of event triggers, assignment of attributes, identification of arguments and assignment of roles, and determination of event co-reference.

The task has many simplifications and variations in the literature. Many methods, both data-driven and knowledge-driven, have been explored to solve it.

In this article, we introduced the task of event extraction. If you are interested in this task, you can review the rest of our series on the related topic of event detection.

References

[1] Linguistic Data Consortium. “ACE (Automatic Content Extraction) Arabic Annotation Guidelines for Entities.” (2005).

[2] Ahn, David. “The stages of event extraction.” Proceedings of the Workshop on Annotating and Reasoning about Time and Events. 2006.

[3] Tanev, Hristo, Jakub Piskorski, and Martin Atkinson. “Real-time news event extraction for global monitoring systems.” Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, TP 267.

[4] Naughton, Martina, Nicholas Kushmerick, and Joseph Carthy. “Event extraction from heterogeneous news sources.” Proceedings of the AAAI Workshop on Event Extraction and Synthesis. 2006.

[5] Aliane, Hassina, Wassila Guendouzi, and Amina Mokrani. “Annotating events, time and place expressions in Arabic texts.” Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013. 2013.

[6] Hkiri, Emna, Souheyl Mallat, and Mounir Zrigui. “Events automatic extraction from Arabic texts.” Natural Language Processing: Concepts, Methodologies, Tools, and Applications. IGI Global, 2020. 1686-1704.

Further Reading

[1] Hogenboom, Frederik, et al. “An Overview of Event Extraction from Text.” DeRiVE@ ISWC. 2011.
