
  • Rules start with;

    <name defined by the rule> ::=

  • Rules syntax;

Rule           Description
|              Separates alternatives
( )            Grouping
*              0 or more repetitions, and binds as tightly as possible
+              1 or more repetitions, and binds as tightly as possible
[]             0 or 1 occurrences (i.e. phrase is optional)
' '            Literal string
(white space)  Separates tokens
...            For lexical definitions, inclusive range of ASCII characters (e.g. "a"..."z")
<phrase>       For lexical definitions, informal description of the symbol defined
  • Rules are normally contained on a single line

  • Rules with many alternatives may be formatted alternatively with each line after the first beginning with a vertical bar.

  • Example of 2 rules:

    name      ::=  lc_letter (lc_letter | "_")*
    lc_letter ::=  "a"..."z"
    • Explanation;
      • Note that lc_letter defined on 2nd line is used to define name
      • lc_letter is any lower case letter
      • name must start with a lowercase letter, which may be followed by 0 or more lowercase letters or underscores
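
The two example rules map directly onto a regular expression. A minimal sketch, assuming Python's `re` module; `NAME_RE` and `is_name` are illustrative names and not part of the grammar:

```python
import re

# name      ::= lc_letter (lc_letter | "_")*
# lc_letter ::= "a"..."z"
NAME_RE = re.compile(r"[a-z][a-z_]*")


def is_name(text):
    """Return True if the whole string matches the `name` rule."""
    return NAME_RE.fullmatch(text) is not None


print(is_name("lower_case"))  # True
print(is_name("_private"))    # False: must start with a lowercase letter
print(is_name("name2"))       # False: digits are not allowed by the rule
```
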
  • A comment on the 1st or 2nd line that matches the regexp coding[=:]\s*([-\w.]+) is processed as an encoding declaration

  • E.g.

    # -*- coding: <encoding-name> -*-
    # vim:fileencoding=<encoding-name>
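
The regexp quoted above can be tried against both comment styles. This is only a sketch of the matching step: it does not check that the line is a comment or that it is the 1st or 2nd line, and `declared_encoding` is a hypothetical helper name:

```python
import re

# The regexp from the rule above; group 1 captures the encoding name.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(comment_line):
    """Return the encoding named in the comment, or None if there is none."""
    match = CODING_RE.search(comment_line)
    return match.group(1) if match else None


print(declared_encoding("# -*- coding: latin-1 -*-"))  # latin-1
print(declared_encoding("# vim:fileencoding=utf-8"))    # utf-8
print(declared_encoding("# just a comment"))            # None
```
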
  • Program is divided into logical lines
  • A logical line is built from one or more physical lines using the rules below, and is terminated by a parser-generated NEWLINE token;
    • \ at EOL explicitly joins lines. e.g.;

      if 1900 < year < 2100 and 1 <= month <= 12 \
        and 1 <= day <= 31 and 0 <= hour < 24 \
        and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
          return 1
    • Expressions within (), [] or {} implicitly join lines. e.g.;

      month_names = ['Januari', 'Februari', 'Maart',      # These are the
             'April',   'Mei',      'Juni',       # Dutch names
             'Juli',    'Augustus', 'September',  # for the months
             'Oktober', 'November', 'December']   # of the year
    • # to EOL is removed from logical line

      • NB: \ at EOL does not continue a comment to next line
    • Lines with only white space are removed from logical line
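
The joining rules can be observed with the standard `tokenize` module, which emits one NEWLINE token per logical line. A minimal sketch; `SOURCE` and `count_logical_lines` are illustrative names. Note that `tokenize` reports comments and blank lines as COMMENT and NL tokens rather than dropping them, so only NEWLINE tokens are counted here:

```python
import io
import tokenize

# 7 physical lines: an explicit join, an implicit join, a blank line,
# a comment-only line and one ordinary statement.
SOURCE = (
    "total = 1 + \\\n"
    "    2\n"
    "items = [1,\n"
    "         2]  # trailing comment\n"
    "\n"
    "# a comment-only line\n"
    "x = 3\n"
)


def count_logical_lines(source):
    """Count NEWLINE tokens, i.e. the number of logical lines."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)


print(len(SOURCE.splitlines()), "physical lines")
print(count_logical_lines(SOURCE), "logical lines")
```
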

  • Each line's indentation level is computed using leading spaces and tabs;

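    • First, tabs are replaced (from left to right) with between 1 and 8 spaces, so that the total number of characters up to and including the replacement is a multiple of 8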
      • NB: A TabError is raised if a file mixes tabs and spaces such that the meaning depends on the worth of a tab in spaces
      • WARNING: It is unwise to mix spaces and tabs due to varying text editor behaviours
    • The number of leading spaces is the indentation level of the line
    • Indentation cannot be split over multiple physical lines using \. The white space up to the first \ determines the indentation level.
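
The tab rule matches what `str.expandtabs(8)` does, so the computation can be sketched as follows. `indentation_level` is a hypothetical helper; it ignores the TabError check and any other leading characters such as form feeds:

```python
def indentation_level(line):
    """Return the indentation level of one physical line.

    Leading tabs are expanded to the next multiple of 8 columns,
    then the leading spaces are counted.
    """
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    return len(leading.expandtabs(8))


print(indentation_level("    x = 1"))  # 4
print(indentation_level("\tx = 1"))    # 8
print(indentation_level("  \tx = 1"))  # 8: the tab pads columns 2..7
print(indentation_level("\t  x = 1"))  # 10
```
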
  • For each line, INDENT & DEDENT tokens are generated using the indentation level and a stack;

    • If the line's indentation level is;
      • = stack-top;
        • Nothing happens
      • > stack-top;
        • The line's indentation level is pushed onto the stack
        • Generate one INDENT
      • < stack-top;
        • NB: It must occur elsewhere on the stack, otherwise the dedent is inconsistent and an error is raised;
        • Each higher level on the stack is popped off
        • For each popped level, generate a DEDENT
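
The stack procedure above can be sketched in a few lines. This is an illustration rather than the tokenizer's actual code: `indent_tokens` is a hypothetical function that takes precomputed indentation levels, assumes the stack starts with a single 0, and (as an assumption beyond the list above) emits a DEDENT for every level still on the stack at end of input:

```python
def indent_tokens(levels):
    """Turn a list of per-line indentation levels into a token stream."""
    stack = [0]
    tokens = []
    for level in levels:
        if level > stack[-1]:
            # Deeper than the stack top: push and emit one INDENT.
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # Shallower: the level must already be on the stack.
            if level not in stack:
                raise IndentationError("inconsistent dedent")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        # Equal to the stack top: nothing happens.
        tokens.append(f"LINE({level})")
    # Assumption: close any blocks that are still open at end of input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens


print(indent_tokens([0, 4, 8, 0]))
# ['LINE(0)', 'INDENT', 'LINE(4)', 'INDENT', 'LINE(8)', 'DEDENT', 'DEDENT', 'LINE(0)']
```

A sequence such as `[0, 8, 4]` raises `IndentationError`, because level 4 was never pushed onto the stack.
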
  • Examples of indentation errors;

     def perm(l):                       # error: first line indented
    for i in range(len(l)):             # error: not indented
        s = l[:i] + l[i+1:]
            p = perm(l[:i] + l[i+1:])   # error: unexpected indent
            for x in p:
                    r.append(l[i:i+1] + x)
                return r                # error: inconsistent dedent
  • Identifiers are unlimited in length. Case is significant. Since Python v3.0, characters outside the ASCII range are also allowed; see PEP 3131 for allowed characters.
False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
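
The reserved words in the table above can be checked at runtime with the standard `keyword` module. A short sketch; the candidate strings are arbitrary examples:

```python
import keyword

# iskeyword() reports whether a string is a reserved word
# in the running interpreter.
for candidate in ("nonlocal", "lambda", "print", "self"):
    print(candidate, keyword.iskeyword(candidate))
```

`print` and `self` are ordinary identifiers, so they can be rebound, whereas the reserved words cannot be used as names at all.
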
Class   Description
_*      Not imported by from module import *. See import statement
__*__   System-defined names, which are defined by the interpreter and standard library. WARNING: Use only as explicitly documented, otherwise code may break without warning.
__*     Class-private names. Used for name mangling to help avoid name clashes between “private” attributes of base and derived classes.
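
Name mangling of class-private names can be seen by inspecting an instance. A minimal sketch; `Base`, `Derived` and `__token` are made-up names for illustration:

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__token = "derived"   # stored as _Derived__token, no clash


obj = Derived()
print(sorted(vars(obj)))  # ['_Base__token', '_Derived__token']
```
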

BNF definition

  • String literals

    stringliteral   ::=  [stringprefix](shortstring | longstring)
    stringprefix    ::=  "r" | "u" | "R" | "U" | "f" | "F"
                        | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
    shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
    longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
    shortstringitem ::=  shortstringchar | stringescapeseq
    longstringitem  ::=  longstringchar | stringescapeseq
    shortstringchar ::=  <any source character except "\" or newline or the quote>
    longstringchar  ::=  <any source character except "\">
    stringescapeseq ::=  "\" <any source character>
  • Byte literals

    bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
    bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
    shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
    longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
    shortbytesitem ::=  shortbyteschar | bytesescapeseq
    longbytesitem  ::=  longbyteschar | bytesescapeseq
    shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
    longbyteschar  ::=  <any ASCII character except "\">
    bytesescapeseq ::=  "\" <any ASCII character>


Prefix   Description
(none)   str type literal.
f F      Formatted string literal (f-string).
b B      bytes type literal. ASCII characters only. Bytes >= 128 must be expressed with escapes.
r R      Raw string or bytes literal. Backslashes are treated as literal characters.
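
A short sketch of how the prefixes change the resulting object; the variable names are arbitrary:

```python
name = "world"

plain = "a\tb"              # str: \t is one tab character
raw = r"a\tb"               # raw str: backslash and t are kept as-is
data = b"caf\xc3\xa9"       # bytes: values >= 128 written as escapes
formatted = f"hello {name}"  # f-string: {name} is evaluated
raw_formatted = rf"{name}\n"  # raw f-string: \n stays as two characters

print(len(plain), len(raw))  # 3 4
print(type(data).__name__)   # bytes
print(formatted)             # hello world
print(len(raw_formatted))    # 7
```
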

String and byte literal escape sequences

Escape Sequence Meaning
\newline Backslash and newline ignored
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\v ASCII Vertical Tab (VT)
\ooo Character with octal value ooo (1,3)
\xhh Character with hex value hh (2,3)

String literal only escape sequences

Escape Sequence Meaning
\N{name} Character named name in the Unicode database (4)
\uxxxx Character with 16-bit hex value xxxx (5)
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx (6)
  • Notes:
    1. As in Standard C, up to three octal digits are accepted.
    2. Unlike in Standard C, exactly two hex digits are required.
    3. In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.
    4. Changed in version 3.3: Support for name aliases has been added.
    5. Exactly four hex digits are required.
    6. Any Unicode character can be encoded this way. Exactly eight hex digits are required.
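
The escape forms in the tables can be compared side by side. A small sketch that writes the letter A five different ways, then contrasts the same hex escape in a bytes literal and a string literal (note 3); the variable names are arbitrary:

```python
octal = "\101"                       # up to three octal digits
hexa = "\x41"                        # exactly two hex digits
named = "\N{LATIN CAPITAL LETTER A}"  # Unicode character name
short = "\u0041"                     # exactly four hex digits
long_form = "\U00000041"             # exactly eight hex digits

print(octal, hexa, named, short, long_form)  # A A A A A

# Note 3: the same escape means a byte value in bytes
# and a Unicode code point in str.
byte_value = b"\xe9"
char_value = "\xe9"
print(byte_value[0], ord(char_value))  # 233 233
```
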