The General Problem of Describing Syntax
A language, whether natural (such as English) or artificial (such as C or Java), is a set of strings of characters from some alphabet.
The strings of a language are called sentences or statements. The syntax rules of a language specify which strings of characters
from the language's alphabet are in the language.
Formal descriptions of the syntax of programming languages, for simplicity's sake, often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes.
Lexemes include numeric literals, operators, and special words, among others. We can think of a program as a string of lexemes rather than of characters.
Lexemes are partitioned into groups. For example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers.
Each lexeme group is represented by a name, or token. So, a token of a language is a category of its lexemes.
Consider the following Java statement:
index = 2 * count + 17;
Lexemes | Tokens |
---|---|
index | identifier |
= | equal-sign |
2 | int-literal |
* | mult-operator |
count | identifier |
+ | plus-operator |
17 | int-literal |
; | semicolon |
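The mapping from lexemes to tokens in the table can be sketched as a small lexer. This is a minimal illustration, not a real Java lexer: the token names are taken from the table, while the regular expressions, the `TOKEN_SPEC` list, and the `tokenize` function are assumptions introduced here. The input is assumed to be the statement formed by the lexemes in the table, `index = 2 * count + 17;`.

```python
import re

# Hypothetical token categories, named after the table above.
# Each entry pairs a token name with a pattern describing its lexemes.
TOKEN_SPEC = [
    ("int-literal", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("equal-sign", re.compile(r"=")),
    ("mult-operator", re.compile(r"\*")),
    ("plus-operator", re.compile(r"\+")),
    ("semicolon", re.compile(r";")),
]


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        # Whitespace separates lexemes but is not a lexeme itself.
        if source[pos].isspace():
            pos += 1
            continue
        # Try each token category in turn at the current position.
        for token, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                pairs.append((match.group(), token))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return pairs


if __name__ == "__main__":
    for lexeme, token in tokenize("index = 2 * count + 17;"):
        print(f"{lexeme} | {token} |")
```

Note that `index` and `count` are different lexemes that fall into the same token, identifier, which is what it means for a token to be a category of lexemes.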
Two distinct ways of defining a language
A) Language Recognizers:
• A recognition device reads input strings of the language and decides whether each input string belongs to the language.
• Example: the syntax analyzer of a compiler, which determines whether a given program is syntactically correct.
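A recognizer can be sketched as a function that answers yes or no for each input string. The example below assumes a deliberately tiny language, chosen only for illustration, whose sentences have the form `identifier = operand {(+ | *) operand} ;`. The names `lexemes`, `is_identifier`, `is_operand`, and `recognizes` are hypothetical, and a real syntax analyzer handles far more than this.

```python
import re

# One lexeme of the toy language, with optional leading whitespace.
LEXEME = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=*+;])")


def lexemes(source):
    """Split source into lexemes; return None if an unknown character appears."""
    result = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = LEXEME.match(source, pos)
        if match is None:
            return None
        result.append(match.group(1))
        pos = match.end()
    return result


def is_identifier(lexeme):
    return re.fullmatch(r"[A-Za-z_]\w*", lexeme) is not None


def is_operand(lexeme):
    # An operand is an identifier or an integer literal.
    return is_identifier(lexeme) or re.fullmatch(r"\d+", lexeme) is not None


def recognizes(source):
    """Decide whether source is a sentence of the toy language."""
    lex = lexemes(source)
    if lex is None or len(lex) < 4:
        return False
    if not is_identifier(lex[0]) or lex[1] != "=" or lex[-1] != ";":
        return False
    body = lex[2:-1]
    # The body must alternate: operand, operator, operand, ...
    if len(body) % 2 == 0:
        return False
    for i, item in enumerate(body):
        if i % 2 == 0:
            if not is_operand(item):
                return False
        elif item not in ("+", "*"):
            return False
    return True


if __name__ == "__main__":
    print(recognizes("index = 2 * count + 17;"))
    print(recognizes("index = 2 * + 17;"))
```

The function only accepts or rejects; it says nothing about what a correct sentence should look like, which is the limitation of recognizers discussed in this section.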
B) Language Generators:
• A generator is a device that generates sentences of a language.
• People prefer certain forms of generators over recognizers because they can more easily read and understand them.
By contrast, the syntax-checking portion of a compiler (a language recognizer) is not as useful a language description for a programmer, because it can be used only in trial-and-error mode.
To determine the correct syntax of a particular statement using a compiler, the programmer can only submit a speculated version and note whether the compiler accepts it.
On the other hand, it is often possible to determine whether the syntax of a particular statement is correct by comparing it with the structure of the generator.
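To make the contrast concrete, a generator can be sketched as a procedure that produces sentences rather than judging them. The example below assumes the same kind of toy language, `identifier = operand {(+ | *) operand} ;`, restricted to a small sample vocabulary. The lists `IDENTIFIERS`, `LITERALS`, and `OPERATORS` and the function `generate` are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical sample vocabulary for the toy language.
IDENTIFIERS = ["index", "count"]
LITERALS = ["2", "17"]
OPERATORS = ["+", "*"]


def generate(max_operands):
    """Yield every sentence of the form
    identifier = operand {operator operand} ;
    that uses at most max_operands operands."""
    operands = IDENTIFIERS + LITERALS
    for target in IDENTIFIERS:
        for n in range(1, max_operands + 1):
            # Choose n operands and the n - 1 operators between them.
            for chosen in product(operands, repeat=n):
                for ops in product(OPERATORS, repeat=n - 1):
                    expr = chosen[0]
                    for op, operand in zip(ops, chosen[1:]):
                        expr += f" {op} {operand}"
                    yield f"{target} = {expr};"


if __name__ == "__main__":
    for sentence in generate(1):
        print(sentence)
```

Reading the body of `generate` shows the shape of every legal sentence directly: a target identifier, an equal sign, operands separated by operators, and a closing semicolon. A programmer can compare a statement against that structure instead of submitting guesses to a recognizer.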