Appendix B

Language Grammar


This appendix provides a commented Java grammar. A grammar is a series of rules that have the following form:

nonterminal = meta-expression ;

Any quoted symbols (such as "0") indicate a literal symbol or keyword. The comments should help you understand the more formal grammar.

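Meta-expressions use the following additional notation:

a | b      either a or b
( a )      grouping
a?         zero or one occurrence of a
a*         zero or more occurrences of a
a+         one or more occurrences of a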

The grammar is not exactly BNF (Backus-Naur Form), but it gets the job done. I prefer a more formal notation, but it's Sun's prerogative to choose the grammar specification format.

Note
The following terminal symbols are undefined: DocComment, Identifier, Number, String, and Character. "Undefined" here means that a terminal symbol is referenced on the right-hand side of a rule but never appears on the left-hand side of one.

CompilationUnit =
  PackageStatement? ImportStatement* TypeDeclaration*
;

A compilation unit is the outermost definition in this grammar. An application or applet can be made up of multiple compilation units. The stars after the last two names indicate zero or more occurrences, and the question mark indicates zero or one occurrence.
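As a rough illustration, the source file below is one compilation unit containing two import statements and three type declarations (one of them a lone semicolon). The names UnitDemo and UnitMarker and the commented-out package name are hypothetical, chosen only for this sketch.

```java
// package demo.grammar;   // PackageStatement? (hypothetical name, left out so the file runs anywhere)
import java.util.Vector;   // ImportStatement: a single class
import java.io.*;          // ImportStatement: every type in a package

// TypeDeclaration: a class
class UnitDemo {
    static int count() {
        Vector<String> names = new Vector<String>();
        names.addElement("first");
        names.addElement("second");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(count());
    }
}

;  // TypeDeclaration: a lone semicolon is also accepted

// TypeDeclaration: an interface
interface UnitMarker { }
```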

PackageStatement =
  'package' PackageName ';'
;
ImportStatement =
  'import' PackageName '.' '*' ';'
| 'import' ( ClassName | InterfaceName ) ';'
;
TypeDeclaration =
  ClassDeclaration
| InterfaceDeclaration
| ';'
;

This rule forces all fields and methods to appear within a class. In C, the corresponding rule would also offer variable and function definitions as options.

ClassDeclaration =
  Modifier* 'class' Identifier
  ('extends' ClassName)?
  ('implements' InterfaceName (',' InterfaceName)*)?
  '{' FieldDeclaration* '}'
;

This rule looks a little confusing because of its optional notation. A class declaration may have zero or more modifiers before the keyword class. The extends clause and the implements clause are optional. If the implements clause is present, it may list multiple comma-separated interface names. The declaration must end with a pair of braces, but no fields have to be defined within them.
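The following sketch shows the rule at its two extremes: a class that uses none of the optional parts, and one that uses all of them. Every name here (Measurable, Labeled, Figure, Tile, Bare) is made up for the illustration.

```java
// Hypothetical interfaces and base class, used only for this example.
interface Measurable { double area(); }
interface Labeled { String label(); }
abstract class Figure { }

// No modifiers, no extends clause, no implements clause, empty body.
class Bare { }

// One modifier, an extends clause, and an implements clause with two names.
final class Tile extends Figure implements Measurable, Labeled {
    private final double side;

    Tile(double side) { this.side = side; }

    public double area() { return side * side; }
    public String label() { return "tile"; }

    public static void main(String[] args) {
        Tile t = new Tile(3.0);
        System.out.println(t.label() + " " + t.area());
    }
}
```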

InterfaceDeclaration =
  Modifier* 'interface' Identifier
  ('extends' InterfaceName (',' InterfaceName)*)?
  '{' FieldDeclaration* '}'
;

Interface declarations are similar to class declarations. The main difference is that an interface can extend one or more existing interfaces, whereas a class can extend only one class.
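A minimal sketch of an interface that extends two others might look like the following. The interface and class names are hypothetical.

```java
interface Source { int read(); }
interface Sink { void write(int value); }

// 'extends' followed by a comma-separated list of interface names.
interface Channel extends Source, Sink { }

// A class that implements Channel must supply the methods of both parents.
class OneSlotChannel implements Channel {
    private int slot;

    public void write(int value) { slot = value; }
    public int read() { return slot; }

    public static void main(String[] args) {
        OneSlotChannel ch = new OneSlotChannel();
        ch.write(7);
        System.out.println(ch.read());
    }
}
```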

FieldDeclaration =
  DocComment? MethodDeclaration
| DocComment? ConstructorDeclaration
| DocComment? VariableDeclaration
| StaticInitializer
| ';'
;

DocComment is an undefined terminal. It takes the form of a multiline comment beginning with /** and ending with the standard */.

MethodDeclaration =
  Modifier* Type Identifier '(' ParameterList? ')' ( '[' ']' )*
  ( '{' Statement* '}' | ';' )
;

A grammar can permit syntactic constructs that the compiler later rejects as semantic errors. Notice that the body of a method is optional. Syntactically, this is correct. Semantically, however, it is correct only if a native or abstract modifier is present.
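To make the distinction concrete, here is a small sketch with hypothetical names (Greeter, PoliteGreeter). It also exercises the seldom-used brackets that the rule allows after the parameter list.

```java
abstract class Greeter {
    // The body is replaced by ';'. This is accepted only because the
    // method carries the abstract modifier.
    abstract String greet(String who);

    // Brackets after the parameter list: this method returns int[].
    int lengths(String a, String b)[] {
        return new int[] { a.length(), b.length() };
    }

    // String broken(String who);   // parses, but the compiler rejects it:
    //                              // no body and neither abstract nor native
}

class PoliteGreeter extends Greeter {
    String greet(String who) { return "Hello, " + who; }

    public static void main(String[] args) {
        System.out.println(new PoliteGreeter().greet("Ada"));
    }
}
```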

ConstructorDeclaration =
  Modifier* Identifier '(' ParameterList? ')'
  '{' Statement* '}'
;
VariableDeclaration =
  Modifier* Type VariableDeclarator (',' VariableDeclarator)* ';'
;
VariableDeclarator =
  Identifier ('[' ']')* ('=' VariableInitializer)?
;

A variable declarator may specify an array: int name[]. It also is legal for a Type to specify an array: int[] name. Either form is correct.
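The two placements can be compared side by side in a short sketch. The class and variable names are arbitrary.

```java
class DeclaratorDemo {
    static int total() {
        int a[] = new int[3];        // brackets on the declarator
        int[] b = new int[3];        // brackets on the type
        int[] c[] = new int[2][3];   // both at once: c has type int[][]
        int d = 4, e[] = { 1, 2 };   // one declaration, two declarators
        return a.length + b.length + c.length + c[0].length + d + e.length;
    }

    public static void main(String[] args) {
        System.out.println(total());
    }
}
```

Mixing both placements, as in the declaration of c, is legal but hard to read. Most code settles on one form.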

VariableInitializer =
  Expression
| '{' (VariableInitializer ( ',' VariableInitializer )* ','? )? '}'
;

The second rule is for array initializations:

int x[] = { 1, 2, 3, 5, 9 };

The preceding statement creates an array of integers with a length of 5.
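Because a VariableInitializer may itself contain brace-enclosed initializers, the same rule covers nested arrays, empty braces, and a trailing comma. A brief sketch, with arbitrary names:

```java
class InitializerDemo {
    static int[] flat = { 1, 2, 3, 5, 9 };
    static int[][] nested = { { 1, 2 }, { 3, 4, 5 }, };  // trailing comma is allowed
    static int[] empty = { };                            // the inner list is optional

    public static void main(String[] args) {
        System.out.println(flat.length);
        System.out.println(nested[1].length);
        System.out.println(empty.length);
    }
}
```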

StaticInitializer =
  'static' '{' Statement* '}'
;

You used a static initializer in Chapter 10, "Native Methods and Java," to load a native library.
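The Chapter 10 listing is not reproduced here. As a stand-in, the sketch below shows the general shape of a static initializer by filling a lookup table. The class name SquareTable is hypothetical, and the library name in the comment is a placeholder rather than the one used in Chapter 10.

```java
class SquareTable {
    static final int[] SQUARES = new int[10];

    // StaticInitializer: runs once, when the class is initialized.
    static {
        for (int i = 0; i < SQUARES.length; i++) {
            SQUARES[i] = i * i;
        }
        // A class with native methods would typically make a call such as
        // System.loadLibrary("mynative") here. "mynative" is a placeholder.
    }

    public static void main(String[] args) {
        System.out.println(SQUARES[9]);
    }
}
```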

ParameterList =
  Parameter (',' Parameter)*
;
Parameter =
  Type Identifier ('[' ']')*
;
Statement =
  VariableDeclaration
| Expression ';'
| '{' Statement* '}'
| 'if' '(' Expression ')' Statement ('else' Statement)?
| 'while' '(' Expression ')' Statement
| 'do' Statement 'while' '(' Expression ')' ';'
| 'for' '(' (VariableDeclaration | Expression ';' | ';')
            Expression? ';' Expression? ')' Statement
| 'try' Statement ('catch' '(' Parameter ')' Statement)*
   ('finally' Statement)?
| 'switch' '(' Expression ')' '{' Statement* '}'
| 'synchronized' '(' Expression ')' Statement
| 'return' Expression? ';'
| 'throw' Expression ';'
| 'case' Expression ':'
| 'default' ':'
| Identifier ':' Statement
| 'break' Identifier? ';'
| 'continue' Identifier? ';'
| ';'
;

Unlike C, break and continue have an optional identifier. This enables branching to a label. A for loop may declare a new variable, just as in C++. Notice that each of the three parts of the loop header is optional.
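A short sketch of labeled break and continue follows. The class, method, and label names are invented for the example.

```java
class LabelDemo {
    // Returns the index of the first row that contains target, or -1.
    static int firstRowWith(int[][] grid, int target) {
        int found = -1;
        search:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = r;
                    break search;     // leaves both loops at once
                }
            }
        }
        return found;
    }

    // Counts the rows that contain no negative entry.
    static int nonNegativeRows(int[][] grid) {
        int count = 0;
        rows:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] < 0) {
                    continue rows;    // skips to the next row
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] grid = { { 1, 2 }, { 3, -4 }, { 5, 6 } };
        System.out.println(firstRowWith(grid, -4));
        System.out.println(nonNegativeRows(grid));
    }
}
```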

Several control statements (if, while, and for) specify an expression in parentheses. Semantically, the expression must evaluate to a boolean type or an error is issued.
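If you are coming from C, this is the restriction you are most likely to trip over. A small sketch, with a hypothetical class name:

```java
class ConditionDemo {
    static int countDown(int n) {
        int steps = 0;
        // while (n) { ... }   // C idiom; the Java compiler rejects it
        //                     // because n is an int, not a boolean
        while (n > 0) {        // the comparison yields a boolean
            n--;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(countDown(3));
    }
}
```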

Expression =
  Expression '+' Expression
| Expression '-' Expression
| Expression '*' Expression
| Expression '/' Expression
| Expression '%' Expression
| Expression '^' Expression
| Expression '&' Expression
| Expression '|' Expression
| Expression '&&' Expression
| Expression '||' Expression
| Expression '<<' Expression
| Expression '>>' Expression
| Expression '>>>' Expression
| Expression '=' Expression
| Expression '+=' Expression
| Expression '-=' Expression
| Expression '*=' Expression
| Expression '/=' Expression
| Expression '%=' Expression
| Expression '^=' Expression
| Expression '&=' Expression
| Expression '|=' Expression
| Expression '<<=' Expression
| Expression '>>=' Expression
| Expression '>>>=' Expression
| Expression '<' Expression
| Expression '>' Expression
| Expression '<=' Expression
| Expression '>=' Expression
| Expression '==' Expression
| Expression '!=' Expression
| Expression '.' Expression
| Expression ',' Expression
| Expression 'instanceof' ( ClassName | InterfaceName )
| Expression '?' Expression ':' Expression
| '++' Expression
| '--' Expression
| Expression '++'
| Expression '--'
| '-' Expression
| '!' Expression
| '~' Expression
| '(' Expression ')'
| '(' Type ')' Expression
| Expression '(' ArgList? ')'
| 'new' ClassName '(' ArgList? ')'
| 'new' TypeSpecifier ( '[' Expression ']' )+ ('[' ']')*
| 'new' '(' Expression ')'
| 'true'
| 'false'
| 'null'
| 'super'
| 'this'
| Identifier
| Number
| String
| Character
;

Comparison expressions always evaluate to a boolean value.

Declaring new arrays can be confusing. The syntax states that there must be a new keyword followed by a type and one or more defined dimensions: new int[2][3]. A trailing undefined dimension is also allowed: new int[2][3][]. The following is not legal because there must be one or more defined dimensions: new int[].
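The legal and illegal forms can be tried out in a sketch like this one. The class name is arbitrary, and the forms that do not compile are left as comments.

```java
class NewArrayDemo {
    // Two defined dimensions: a 2-by-3 array of zeros.
    static int[][] full() { return new int[2][3]; }

    // Two defined dimensions plus one trailing undefined dimension:
    // the innermost arrays are not allocated yet, so each slot is null.
    static int[][][] partial() { return new int[2][3][]; }

    // int[] bad = new int[];         // does not compile: no defined dimension
    // int[][] bad2 = new int[][3];   // does not compile: a defined dimension
    //                                // cannot follow an undefined one

    public static void main(String[] args) {
        System.out.println(full()[1].length);
        System.out.println(partial()[0][0] == null);
    }
}
```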

ArgList =
  Expression (',' Expression )*
;
Type =
  TypeSpecifier ('[' ']')*
;

Here is the second method for declaring an array: int[] name.

TypeSpecifier =
  'boolean'
| 'byte'
| 'char'
| 'short'
| 'int'
| 'float'
| 'long'
| 'double'
| ClassName
| InterfaceName
;
Modifier =
  'public'
| 'private'
| 'protected'
| 'static'
| 'final'
| 'native'
| 'synchronized'
| 'abstract'
| 'threadsafe'
| 'transient'
;
PackageName =
  Identifier
| PackageName '.' Identifier
;
ClassName =
  Identifier
| PackageName '.' Identifier
;
InterfaceName =
  Identifier
| PackageName '.' Identifier
;