Natural Language Toolkit

...software, data sets and tutorials for natural language processing...

Electronic Grammar: Focus on Grammar

 

From NLTK

These tasks focus on the structure of sentences and phrases, in particular word order in sentences and the structure of noun phrases and prepositional phrases. We'll also look at how a grammar is constructed, and how syntactic (structural) ambiguity can change the meaning of an utterance.

Part 1: Grammar (2-3 periods)

The order of words in a sentence is important. Changing the order can change the meaning, or produce nonsense, as the following examples show:

  • Strathmore beat Essendon
  • Essendon beat Strathmore
  • Strathmore Essendon beat
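
The effect of word order can be checked with a few lines of plain Python. This is only a sketch: the mini-lexicon and the single accepted pattern N V N (noun, verb, noun) are assumptions made for illustration, and the helper names are hypothetical.

```python
from itertools import permutations

# Hypothetical mini-lexicon: N = noun, V = verb (labels chosen for this sketch).
LEXICON = {"Strathmore": "N", "Essendon": "N", "beat": "V"}

def word_classes(words):
    """Return the word-class pattern of a word sequence, e.g. 'N V N'."""
    return " ".join(LEXICON[w] for w in words)

def is_well_ordered(words):
    """Accept only the subject-verb-object pattern N V N."""
    return word_classes(words) == "N V N"

# Try every possible ordering of the three words.
for order in permutations(["Strathmore", "beat", "Essendon"]):
    status = "ok" if is_well_ordered(order) else "nonsense"
    print(" ".join(order), "->", word_classes(order), "->", status)
```

Only two of the six orderings fit the pattern, and those two name different winners.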

Noun Phrases

Noun phrases are expressions that refer to real or fictional entities, like three blind mice, or the present king of France. Noun phrases can be used as the subject or object of a verb. A simple test for a noun phrase is to replace it with a pronoun and see if the resulting sentence is still grammatical, e.g.

  • Jack and Jill went up the hill
  • They went up the hill

Workbook questions:
  1. Write down three noun phrases, then write down the corresponding word classes (use the table with the tagger codes from the previous section to help). For example, given the red chair we would write down D J N
  2. Write down several more of these using the following format:
NP -> D J N
NP -> etc...
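
The step from a phrase to a rule can also be sketched in code. The lookup table below is a hypothetical stand-in for the tagger codes table (only D, J and N are taken from the example above), and np_rule is a made-up helper name.

```python
# Hypothetical word-class lookup; the codes D, J, N follow the example above.
WORD_CLASS = {
    "the": "D", "a": "D",
    "red": "J", "green": "J",
    "chair": "N", "couch": "N",
}

def np_rule(phrase):
    """Turn a noun phrase into a rule such as 'NP -> D J N'."""
    classes = [WORD_CLASS[word] for word in phrase.lower().split()]
    return "NP -> " + " ".join(classes)

for phrase in ["the red chair", "a green couch", "a couch"]:
    print(phrase, ":", np_rule(phrase))
```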

Running the Parser

Parsing is a process that takes a sequence of words and fits them into a hierarchical sentence structure. We do parsing when we take a mathematical expression like 1 + 2 * 3 and decide that it means 1 + (2 * 3) - remember BODMAS. We can visualise this nested structure using a diagram as follows:

image:Parsing-example.png
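
Python parses arithmetic in the same way, so its standard ast module can be used to check the nesting. The sketch below is an illustration, not part of NLTK; bracket and show_structure are hypothetical helper names, and only the four basic operators are handled.

```python
import ast

# Map AST operator node types to their printed symbols.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def bracket(node):
    """Render an arithmetic AST with every nested operation in parentheses."""
    if isinstance(node, ast.BinOp):
        op = SYMBOLS[type(node.op)]
        return "(" + bracket(node.left) + " " + op + " " + bracket(node.right) + ")"
    if isinstance(node, ast.Constant):
        return str(node.value)
    raise ValueError("unsupported expression")

def show_structure(expression):
    """Parse an expression string and return its fully bracketed form."""
    tree = ast.parse(expression, mode="eval")
    return bracket(tree.body)

print(show_structure("1 + 2 * 3"))   # prints (1 + (2 * 3))
```

The brackets match the nesting shown in the diagram above.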

This kind of diagram is called a parse tree. In this section we will generate some parse trees for noun phrases. You will need to go through the following steps:

  1. Start IDLE, and open the editor window using File -> New Window
  2. Cut and paste the following program into the editor, and then save it to a file grammar1.py
  3. Run the program using Run -> Run Module


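from nltk import CFG
from nltk.draw.tree import draw_trees
from nltk.parse.chart import ChartParser, TD_STRATEGY

def parse(sentence, cfg):
    # Assumes NLTK 3 or later.
    grammar = CFG.fromstring(cfg)               # build a grammar from the rule text
    parser = ChartParser(grammar, TD_STRATEGY)  # top-down chart parser
    words = sentence.split()                    # split the sentence into words
    draw_trees(*parser.parse(words))            # draw every parse tree found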

grammar = """
   NP -> P | D J N
   D -> 'a'
   J -> 'red' | 'green'
   N -> 'chair' | 'couch'
"""

phrase = 'a red chair'
parse(phrase, grammar)

It should produce the following output:

image:Noun_phrase.png

Workbook questions:
  1. Extend the grammar to cover the following noun phrases: three red chairs, all girls, ... (make sure that it doesn't allow red three chairs, ...)

Prepositional Phrases

A prepositional phrase specifies properties of an entity or action, such as location, ownership, time, instrument.

  • I sat on the couch
  • Kim is in the kitchen
  • The parents of my friend have arrived
  • I broke the window with the cricket ball
  • The test will be held on Tuesday

Prepositional phrases consist of a preposition followed by a noun phrase. We can write a rule for this: PP -> I NP, and add it to our grammar for noun phrases, e.g.:

grammar = """
  NP -> D N
  PP -> I NP
  I -> 'on' | 'in'
"""

As we have seen, prepositional phrases modify noun phrases, e.g. the cat (on the couch), the window (in the kitchen). So we need to add the rule:

  NP -> NP PP

This rule says that a noun phrase consists of a noun phrase followed by a prepositional phrase. We can apply the rule to itself, for phrases like the cat (on the couch (by the window)). This is a kind of recursion, like what happens when you stand between a pair of mirrors, or view a picture that contains itself.

megamonalisa_recursion.jpg
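
The recursion can be imitated with two Python functions that call each other, one per rule. This is a sketch under assumptions: the word lists are fixed by hand, and the function names are hypothetical.

```python
# Hypothetical word lists: one noun phrase and one preposition per level.
NOUNS = ["the cat", "the couch", "the window"]
PREPS = ["on", "by"]

def noun_phrase(level=0):
    """NP -> D N, or NP -> NP PP while there are nouns left to attach."""
    base = NOUNS[level]
    if level == len(NOUNS) - 1:
        return base                       # base case: no PP attached
    return base + " (" + prep_phrase(level) + ")"

def prep_phrase(level):
    """PP -> I NP: a preposition followed by a (possibly nested) noun phrase."""
    return PREPS[level] + " " + noun_phrase(level + 1)

print(noun_phrase())   # prints the cat (on the couch (by the window))
```

Each call to noun_phrase may call prep_phrase, which calls noun_phrase again, until the list of nouns runs out.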

Workbook questions:
  1. Modify the grammar with these rules, then parse on the couch and in the kitchen, and copy the resulting parse trees into your workbook.

Parsing whole sentences

When you parse a whole sentence, you generally chunk it into parts and interpret it subconsciously. Part of this you've seen in the exercise above, with prepositional phrases such as on the couch and in the kitchen. Earlier this semester we discussed what all those chunks were called: Noun Phrase, Verb Phrase, Prepositional Phrase, Adjective Phrase, etc.

Verb Phrases and Noun Phrases

Verb phrases are generally composed of a verb plus a noun phrase. For example, in chased the dog, chased is a verb and the dog is a noun phrase. When writing rules to train the computer to parse the sentence, you break those rules down into parts that are easier to understand.

The following code does just that. The grammar below (a very simplistic one!) handles sentences that are made up of a noun phrase (NP) and a verb phrase (VP). The NP is made up of a determiner (Det) and a noun (N), and a VP is made up of a verb (V) followed by another NP.

Once these kinds of rules have been defined and words have been added to the N and V rules, the computer can decide whether a sentence is grammatical according to the grammar you have given it.

grammar = """
   S -> NP VP
   VP -> V NP
   NP -> Det N
   Det -> 'the'
   N -> 'cat' | 'dog'
   V -> 'chased'
"""

sent = 'the cat chased the dog'
parse(sent, grammar)
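
To see roughly how a top-down parser reaches a yes/no decision, here is a minimal sketch in plain Python that does not use NLTK. The rules are the same as in the grammar above, but the dictionary layout and the names match and is_grammatical are assumptions made for this illustration.

```python
# The rules from the grammar above, written as a plain dictionary.
# Each key is a symbol; anything that is not a key is treated as a word.
RULES = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["Det", "N"]],
    "Det": [["the"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["chased"]],
}

def match(symbols, words, pos):
    """Yield every position reached after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in RULES:                         # symbol: try each of its rules
        for alternative in RULES[first]:
            for mid in match(alternative, words, pos):
                yield from match(rest, words, mid)
    elif pos < len(words) and words[pos] == first:   # word: compare directly
        yield from match(rest, words, pos + 1)

def is_grammatical(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.split()
    return any(end == len(words) for end in match(["S"], words, 0))

print(is_grammatical("the cat chased the dog"))   # True
print(is_grammatical("the cat the dog chased"))   # False
```

This simple method would loop forever on a rule that starts with its own symbol, such as NP -> NP PP, which is one reason a chart parser is the safer choice for larger grammars.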

Part 2: Ambiguity (1-2 periods)

Earlier in the unit we also discussed structural ambiguity, where the syntax (structure) of a sentence allows more than one meaning to be attributed to it.

Some ambiguous sentences:

  • I hopped on the bed with the lace trim.
  • I shot the elephant in my pajamas.
  • I put the box in the table by the window in the kitchen.
  • I saw the girl on the hill with a telescope.
  • I painted the cat on the couch.
  • I drank the water on the table.

These are ambiguous because the prepositional phrase could modify the meaning of the previous noun, or the previous verb.

Workbook questions:
  1. Pick one of the sentences above, or make up one of your own, and explain at least two possible interpretations.

Parsing ambiguous sentences

It is easy to extend a grammar to cover ambiguous sentences. The following grammar has rules which allow a prepositional phrase to modify a verb phrase (i.e. VP PP), or a noun phrase (i.e. NP PP).

grammar = """
   S -> NP VP
   VP -> V NP | VP PP
   NP -> Det N | NP PP
   PP -> P NP
   NP -> 'I'
   Det -> 'the' | 'my'
   N -> 'elephant' | 'pajamas'
   V -> 'shot'
   P -> 'in'
"""

sent = 'I shot the elephant in my pajamas'
parse(sent, grammar)

image:elephant.png

Workbook questions:
  1. The output of this program is shown in the above diagram. Explain the interpretation of each of the trees. Which one is the nonsense one?
  2. Choose one of the other ambiguous sentences we saw above, and modify the grammar so that it can be used to parse the sentence, and produce two or more trees. Copy these into your workbook, and paraphrase each one.
  3. Create a sentence with lots of prepositional phrases, and see how many parse trees you can get. Write down the sentence in your workbook, and specify how many interpretations (parse trees) you found.
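
The number of trees grows quickly as prepositional phrases are added. The sketch below counts trees in plain Python without drawing them. It is an illustration rather than NLTK's own method: it assumes every rule has one or two symbols on the right-hand side, the words kitchen and morning are added to the grammar for the example, and count_parses is a hypothetical helper name.

```python
from functools import lru_cache

# The ambiguous grammar from above, plus two extra nouns for longer sentences.
RULES = {
    "S":   [("NP", "VP")],
    "VP":  [("V", "NP"), ("VP", "PP")],
    "NP":  [("Det", "N"), ("NP", "PP"), ("I",)],
    "PP":  [("P", "NP")],
    "Det": [("the",), ("my",)],
    "N":   [("elephant",), ("pajamas",), ("kitchen",), ("morning",)],
    "V":   [("shot",)],
    "P":   [("in",)],
}

def count_parses(sentence):
    """Count the parse trees of a sentence without building them."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol, start, end):
        """Number of trees rooted in `symbol` covering words[start:end]."""
        if symbol not in RULES:                      # a word, not a symbol
            return int(end - start == 1 and words[start] == symbol)
        total = 0
        for rhs in RULES[symbol]:
            if len(rhs) == 1:                        # one symbol: same span
                total += count(rhs[0], start, end)
            else:                                    # two symbols: try each split
                left, right = rhs
                for mid in range(start + 1, end):
                    total += count(left, start, mid) * count(right, mid, end)
        return total

    return count("S", 0, len(words))

base = "I shot the elephant in my pajamas"
for extra in ["", " in the kitchen", " in the kitchen in the morning"]:
    print(count_parses(base + extra), ":", base + extra)
```

With one, two and three prepositional phrases the counts are 2, 5 and 14.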

Just for fun: highly ambiguous sentences

Example for time flies like an arrow

grammar = """
   S -> NP VP | VP
   PP -> P NP
   NP -> N | Det N | N N | NP PP | N VP
   VP -> V | V NP | VP PP | VP ADVP
   ADVP -> ADV NP
   Det -> 'a' | 'an' | 'the'
   N -> 'flies' | 'banana' | 'fruit' | 'arrow' | 'time'
   V -> 'like' | 'flies' | 'time'
   P -> 'on' | 'in' | 'by'
   ADV -> 'like'
"""

sent = 'time flies like an arrow'
parse(sent, grammar)

Going Further...

If you've found these materials helpful and want to go further, you might like to start reading the NLTK book on natural language processing. It will teach you Python and some linguistics, and introduce you to the exciting field of computational linguistics. If you enjoy programming you might like to enrol in the National Computer Science Challenge.
