Lecture 21

Syntax and Semantics

Any language, computer or human, has these two components. The syntax of a language is a set of rules about how valid sentences are constructed. The semantics of a langauge concerns the meaning of a sentence. Semantics are very difficult to describe but we can easily describe the sytnax of a language.

The syntax of a language is described by a grammar, grammars must be precise and unambiguous so there is no possibility of misinterpretation.

BNF

One method for describing the syntax of a language is called Backus-Naur Form (BNF).

BNF consists of a set of rules called production rules, each rule has the form:

<symbol> := <symbol1>|<symbol2><symbol3>| C<symbol>

Parts of a BNF production rule:

:=  meaning "is defined as"
|   meaning "or"
< > angle brackets used to surround symbols.
The angle brackets distinguish syntax rule symbols (also called non-terminal symbols) from terminal symbols which are written exactly as they are to be represented. A BNF rule defining a nonterminal has the form:
nonterminal := sequence_of_alternatives consisting
               of strings of terminals or nonterminals
               separated by the meta-symbol |
For example, a BNF production rule for a mini-language might be:
<program> :=  program
                   <declaration_sequence>
               begin
                   <statements_sequence>
               end ;
This shows that a mini-language program consists of the keyword "program" followed by the declaration sequence, then the keyword "begin" and the statements sequence, finally the keyword "end" and a semicolon.

recursive rules are used to define sequences. '..' may be used to simplify long lists of terminal symbols.

<letter> := a..z|A..Z
<digit> := 0..9
<alnum> := <letter>|<digit>
<id> := <letter>|<id><alnum>
This says that an id is a single letter or a letter followed by any number of letters or digits.

Now here is the definition of BNF expressed in BNF: ( non terminals are shown in italics and terminals in bold underlined to avoid confusion)

<rule> := <<id>> := <expression>
<expression> := <term> | <term> | <expression>
<term> := <<id>> | <terminal> | 
          <term><<id>> | <term><terminal>
BNF is not only important for describing syntax rules in books, it is very commonly used (with variants) by syntactic tools. See for example any book on LEX and YACC, the standard UNIX parser generators.
Most Prolog interpreters have an extention that allows you to write BNF grammars. The syntax is slightly different to the one given above, a terminal is enclosed in square brackets, terms are separated by commas and --> is used instead of :=
i.e.
<rule> := <id> --> <expression>
<expression> := <term> | <term> | <expression>
<term> := <id> | [<terminal>] | 
          <term><id> | <term>[<terminal>]

Rules

Query

The BNF rules for a simplified C language:
<program> := <decls>|<functiondef>|<program><decls>|
             <program><functiondef>

<all_stmts> := <stmtlist><stop_stmt>|<stmtlist>

<stmtlist> := <stmtlist><stmt>|<stmt>
<stmt> := <compstmt>|<expr>';'|
         if(<condexpr>)<stmt>|
         if(<condexpr>)<stmt>else<stmt>|
         while(<condexpr>)<stmt>|
         do<stmt>while(condexpr);|
         for(<exprseq>;<condexpr>;<exprseq>)<stmt>|
         switch(val)<casestmts>|;
<compsmt> := {<all_stmts>}|
         {<decls><all_stmts>}
<type> := int|char|unsigned|unsigned char|
          float|short|long|unsigned short|
          unsigned long|double|<usertype>|
          struct <structname> {<decls>}|
          <type> *
<typedef> := typedef <type> <usertype>
<functiondef> := <type> <id>(<args>)<compsmt>|
                 <type> <id>()<compsmt>
<args> := <type> <id>|<args>,<type> <id>
<decls> := <type> <id>;|<type> <id>;<decls>
<exprseq> := <expr>|<exprseq>,<expr>
<stop_stmt> := break;|continue;|
               return;|return <val>;|return(<val>);
<casestmts> := {<cases><default>}
<cases> := <cases><case>|<case>
<case> := case <const>:<stmtlist><stop_stmt>
<default> := default:<stmtlist><stop_stmt>
<expr> := <rhs>|<modify_expr>
<call_expr> := <id>(<arglist>)
<arglist> := <arglist>,<val>|<val>
<modify_expr> := <varname>=<rhs>|*<id>=<rhs>
<rhs> := <binary_expr>|<unary_expr>
<unary_expr> := <simp_expr>|*<id>|&<varname>|
       <call_expr>|<unop><val>|(<type>)<varname>
<binary_expr> := <val><binop><val>

<unop> := +|-|~|!
<binop> :=  -|+|/|*|%|&|&&||||||^
<relop> : <|<=|>|>=
<condexpr> := <val>|<val><relop><val>
<simp_expr> := <varname>|<const>

<varname> := <idlist>.<id>|<id>
<arrayref> := <id><reflist>
<reflist> := [<val>]|<reflist>[<val>]
<idlist> := <id>|<idlist>.<id>