- Uses a modified BNF (Backus–Naur Form) grammar notation
- Lexical analysis
- Rules start with the name defined by the rule, followed by `::=`.
- Rule syntax:

Rule | Description |
---|---|
`\|` | Separates alternatives; it is the least binding operator in this notation |
`( )` | Grouping |
`*` | 0 or more repetitions of the preceding item, binding as tightly as possible |
`+` | 1 or more repetitions of the preceding item, binding as tightly as possible |
`[ ]` | 0 or 1 occurrences (i.e. the enclosed phrase is optional) |
`" "` | Literal string |
(white space) | Only meaningful to separate tokens |
`...` | For lexical definitions, an inclusive range of ASCII characters (e.g. `"a"..."z"`) |
`<phrase>` | For lexical definitions, an informal description of the symbol being defined |
- Rules are normally contained on a single line.
- Rules with many alternatives may instead be formatted with each line after the first beginning with a vertical bar.
- Example of 2 rules:

```
name      ::=  lc_letter (lc_letter | "_")*
lc_letter ::=  "a"..."z"
```

- Explanation:
  - `lc_letter`, defined on the 2nd line, is used to define `name`.
  - `lc_letter` is any lowercase letter from `a` to `z`.
  - `name` must start with a lowercase letter, followed by 0 or more lowercase letters or underscores.
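Because these two rules describe a regular language, they can be checked with an equivalent regular expression. The sketch below is a hand translation for illustration only; `NAME_RE` and `is_name` are hypothetical names, not part of any grammar tooling.

```python
import re

# Hand translation of the example rules:
#   lc_letter ::= "a"..."z"                      ->  [a-z]
#   name      ::= lc_letter (lc_letter | "_")*   ->  [a-z][a-z_]*
NAME_RE = re.compile(r"[a-z][a-z_]*")


def is_name(text: str) -> bool:
    """Return True if the whole of `text` matches the `name` rule."""
    return NAME_RE.fullmatch(text) is not None


for candidate in ["spam", "spam_eggs", "_spam", "Spam", "spam1"]:
    print(candidate, is_name(candidate))
```

`fullmatch` is used rather than `match` so that trailing characters outside the rule (such as the digit in `spam1`) cause a rejection.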
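- Encoding declarations: when a comment on the 1st or 2nd line of a source file matches the regular expression `coding[=:]\s*([-\w.]+)`, it is processed as an encoding declaration, and the first group of the match names the encoding of the file. Without a declaration, the default encoding is UTF-8.
- E.g.

```
# -*- coding: <encoding-name> -*-
# vim:fileencoding=<encoding-name>
```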
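The regular expression above can be applied directly to see which name a declaration yields. This is a simplified sketch: `find_declared_encoding` is a hypothetical helper that only applies the quoted pattern to comment lines, while CPython's real check is stricter (for example, a 2nd-line declaration only counts if the 1st line is also a comment or blank). The standard library's `tokenize.detect_encoding()` performs the real check and is shown for comparison.

```python
import io
import re
import tokenize

# Pattern quoted in the notes above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def find_declared_encoding(lines):
    """Return the encoding named in a comment on the first two lines, else None."""
    for line in lines[:2]:
        if not line.lstrip().startswith("#"):
            continue  # only comments can carry a declaration
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


source = "#!/usr/bin/env python3\n# -*- coding: latin-1 -*-\nx = 1\n"
print(find_declared_encoding(source.splitlines()))

# tokenize.detect_encoding() reads raw bytes and also normalizes the name
# (e.g. "latin-1" is reported as "iso-8859-1").
encoding, _ = tokenize.detect_encoding(io.BytesIO(source.encode("latin-1")).readline)
print(encoding)
```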
- A program is divided into logical lines.
- A logical line is constructed from one or more physical lines using the rules below; its end is marked by a `NEWLINE` token generated by the tokenizer:
- A `\` at the end of a physical line explicitly joins it with the next line, e.g.:

```
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
```
- Expressions within `()`, `[]` or `{}` can be split over several physical lines without backslashes (implicit line joining), e.g.:

```
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
```
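- A comment, from a `#` that is not inside a string literal to the end of the physical line, is removed from the logical line.
- NB: A `\` at the end of a line does not continue a comment onto the next line.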
- Blank lines (containing only spaces, tabs, formfeeds and possibly a comment) are ignored: no `NEWLINE` token is generated for them.
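The joining rules above can be observed with the standard library's `tokenize` module, which emits one `NEWLINE` token per logical line and an `NL` token for line breaks that do not end a logical line (inside brackets, or on blank and comment-only lines). The helper `count_logical_lines` is a hypothetical name used only for this sketch.

```python
import io
import tokenize


def count_logical_lines(source: str) -> int:
    """Count NEWLINE tokens, i.e. logical lines, produced by the tokenizer."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)


explicit = "total = 1 + \\\n        2\n"            # 2 physical lines joined by a backslash
implicit = "values = [1,\n          2]\nx = 3\n"     # brackets join the first 2 physical lines
blank = "# only a comment\n\n   \nx = 1\n"          # comment/blank lines give NL, not NEWLINE

for src in (explicit, implicit, blank):
    print(count_logical_lines(src))
```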
- Each line's indentation level is computed from its leading spaces and tabs:
  - First, tabs are replaced (from left to right) by 1 to 8 spaces, so that the total number of characters up to and including the replacement is a multiple of 8.
  - The number of leading spaces is then the indentation level of the line.
  - Indentation cannot be split over several physical lines with `\`; the whitespace up to the first `\` determines the indentation level.
  - NB: A `TabError` is raised if a file mixes tabs and spaces such that the meaning depends on the worth of a tab in spaces.
  - WARNING: It is unwise to mix spaces and tabs, because text editors display tabs differently.
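The tab rule above is exactly what `str.expandtabs(8)` does: each tab becomes 1 to 8 spaces so that the following column is a multiple of 8. The sketch below is a simplified model assuming a single physical line as input; the helper name `indentation_level` is hypothetical, and form feeds and the `TabError` consistency check are ignored.

```python
def indentation_level(line: str) -> int:
    """Indentation level of a physical line, following the tab rule above.

    Simplified sketch: form feeds and the TabError consistency check are ignored.
    """
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    # expandtabs(8) pads each tab with spaces up to the next multiple of 8.
    return len(leading.expandtabs(8))


for text in ["x = 1", "    x = 1", "\tx = 1", "  \tx = 1", "\t  x = 1"]:
    print(repr(text), indentation_level(text))
```

Note that `"\tx"` and `"  \tx"` get the same level (8), which is why mixing tabs and spaces can hide structure from a reader.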
- For each logical line, `INDENT` and `DEDENT` tokens are generated from the line's indentation level and a stack, which initially holds a single 0:
  - If the line's indentation level is:
    - `=` stack top: nothing happens.
    - `>` stack top: the level is pushed onto the stack, and one `INDENT` token is generated.
    - `<` stack top: the level must already occur somewhere on the stack. Each higher level is popped off, and one `DEDENT` token is generated for each popped level.
  - At the end of the file, one `DEDENT` token is generated for each level remaining on the stack that is greater than 0.
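The stack rules above can be sketched as a small function. This is a simplified model, assuming each logical line has already been reduced to its indentation level (an `int`); the function name `indent_tokens` and the `LINE<n>` placeholder tokens are hypothetical, chosen only to make the output readable.

```python
def indent_tokens(levels):
    """Return INDENT/DEDENT tokens (plus LINE<n> markers) for a list of indentation levels."""
    stack = [0]
    tokens = []
    for lineno, level in enumerate(levels, start=1):
        if level > stack[-1]:
            stack.append(level)          # deeper: push and open a block
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:       # must match an enclosing level
                raise IndentationError(
                    f"line {lineno}: unindent does not match any outer indentation level"
                )
            while stack[-1] > level:     # close every deeper block
                stack.pop()
                tokens.append("DEDENT")
        tokens.append(f"LINE{lineno}")
    while stack[-1] > 0:                 # end of input: close remaining blocks
        stack.pop()
        tokens.append("DEDENT")
    return tokens


print(indent_tokens([0, 4, 8, 4, 0]))
```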
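- Examples of indentation errors:

```
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
```

- NB: The first three errors are detected by the parser; only the last is found by the tokenizer, because the indentation of `return r` does not match any level popped off the stack.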
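Both kinds of error surface as `IndentationError` when source is compiled, which can be checked with the built-in `compile()`. The helper `indentation_problem` is a hypothetical name for this sketch; the exact message wording may vary between Python versions, so only its presence is relied on.

```python
def indentation_problem(source: str):
    """Compile `source`; return the IndentationError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a subclass and would also be caught
        return err.msg
    return None


unexpected = "x = 1\n    y = 2\n"                  # indented with no enclosing block
inconsistent = "if x:\n        y = 1\n    z = 2\n"  # dedent to a level never pushed
fine = "if x:\n    y = 1\nz = 2\n"

for src in (unexpected, inconsistent, fine):
    print(indentation_problem(src))
```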
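- Identifiers are unlimited in length, and case is significant. Since Python 3.0, characters from outside the ASCII range are also allowed; see PEP 3131 for the allowed characters.
- The following identifiers are reserved as keywords and cannot be used as ordinary identifiers:

```
False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
```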
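Both rules can be checked from Python itself: `str.isidentifier()` applies the identifier syntax (including non-ASCII letters), and `keyword.iskeyword()` tests against the reserved list. The helper `is_usable_name` is a hypothetical name for this sketch.

```python
import keyword


def is_usable_name(text: str) -> bool:
    """True if `text` is a valid identifier that is not a reserved keyword."""
    return text.isidentifier() and not keyword.iskeyword(text)


for candidate in ["spam", "Spam", "café", "_private", "2fast", "class", "async"]:
    print(candidate, is_usable_name(candidate))
```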
- Reserved classes of identifiers, recognized by their leading and trailing underscores:

Class | Description |
---|---|
`_*` | Not imported by `from module import *`. See the `import` statement |
`__*__` | System-defined names, which are defined by the interpreter and standard library. WARNING: Use them only as explicitly documented; any other use may break without warning. |
`__*` | Class-private names. Used for name mangling to help avoid name clashes between “private” attributes of base and derived classes. |
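Name mangling rewrites a class-private name `__x` used inside class `C` to `_C__x`, so a base class and a subclass can each keep an attribute with the same private name. The classes `Base` and `Derived` below are illustrative only.

```python
class Base:
    def __init__(self):
        self.__token = "base"        # stored as _Base__token


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__token = "derived"     # stored as _Derived__token, no clash with Base


obj = Derived()
print(sorted(vars(obj)))
```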
- String literals:

```
stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U" | "f" | "F"
                     | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>
```
- Bytes literals:

```
bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::=  shortbyteschar | bytesescapeseq
longbytesitem  ::=  longbyteschar | bytesescapeseq
shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar  ::=  <any ASCII character except "\">
bytesescapeseq ::=  "\" <any ASCII character>
```
Prefix | Description |
---|---|
(none) | `str` literal |
`f` `F` | Formatted string literal (f-string) |
`b` `B` | `bytes` literal. ASCII characters only; bytes with a value of 128 or greater must be expressed with escapes. |
`r` `R` | Raw string or bytes literal: backslashes are treated as literal characters |
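A short sketch of how each prefix changes what a literal produces. The variable names are illustrative only; the last part uses `compile()` to confirm that a bytes literal containing a non-ASCII character is rejected at compile time.

```python
plain = "a\tb"                      # escape processed: 3 characters
raw = r"a\tb"                       # raw: backslash kept, 4 characters
data = b"caf\xc3\xa9"               # bytes: non-ASCII bytes written as escapes
width = 6
formatted = f"{'id':>{width}}|"     # f-string: expression plus a nested format spec

print(len(plain), len(raw))
print(type(data).__name__, data.decode("utf-8"))
print(formatted)

# A literal non-ASCII character inside a bytes literal is a SyntaxError.
try:
    compile('b"é"', "<example>", "eval")
    non_ascii_bytes_ok = True
except SyntaxError:
    non_ascii_bytes_ok = False
print(non_ascii_bytes_ok)
```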
Escape Sequence | Meaning |
---|---|
`\<newline>` | Backslash and newline ignored |
`\\` | Backslash (`\`) |
`\'` | Single quote (`'`) |
`\"` | Double quote (`"`) |
\a | ASCII Bell (BEL) |
\b | ASCII Backspace (BS) |
\f | ASCII Formfeed (FF) |
\n | ASCII Linefeed (LF) |
\r | ASCII Carriage Return (CR) |
\t | ASCII Horizontal Tab (TAB) |
\v | ASCII Vertical Tab (VT) |
\ooo | Character with octal value ooo (1,3) |
\xhh | Character with hex value hh (2,3) |

Escape sequences recognized only in string literals (not in bytes literals):

Escape Sequence | Meaning |
---|---|
\N{name} | Character named name in the Unicode database (4) |
\uxxxx | Character with 16-bit hex value xxxx (5) |
\Uxxxxxxxx | Character with 32-bit hex value xxxxxxxx (6) |
- Notes:
  1. As in Standard C, up to three octal digits are accepted.
  2. Unlike in Standard C, exactly two hex digits are required.
  3. In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.
  4. Changed in version 3.3: Support for name aliases has been added.
  5. Exactly four hex digits are required.
  6. Any Unicode character can be encoded this way. Exactly eight hex digits are required.
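The escape sequences and notes above can be checked interactively. The sketch below uses illustrative variable names and the standard `unicodedata` module to confirm the character names involved.

```python
import unicodedata

octal_and_hex = ("\101", "\x41")            # octal 101 and hex 41 are both 65 ("A")
named = "\N{GREEK SMALL LETTER ALPHA}"      # lookup in the Unicode database
short_form = "\u03b1"                       # exactly 4 hex digits
long_form = "\U0001F600"                    # exactly 8 hex digits, beyond 16 bits
byte_escape = b"\xff"                       # in bytes: the byte with value 255
str_escape = "\xff"                         # in str: the character U+00FF
joined = "spam\
eggs"                                       # backslash-newline is dropped: "spameggs"

print(octal_and_hex, named == short_form, len(long_form))
print(byte_escape[0], ord(str_escape), unicodedata.name(str_escape))
print(joined)
```

Note 3 is visible in the `\xff` pair: the same escape yields an integer byte value in `bytes` but a one-character `str` in a string literal.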