The first stage on the way from text to graphics is to parse the Python code and collect everything in it into data structures that can be used for generating graphics. I definitely wanted to reuse something that had already been developed. It turned out that the Python interpreter's dynamic library contains a function which can help: a C function which provides a syntax tree.

While working on this part I wrote a simple utility which prints the data structure produced by the Python interpreter function. Here is an example of a very simple Python file and the produced data structure. Generally it looks nice: there are line numbers and column numbers, and the node types correspond to the formal Python grammar specification. However, there are some problems too.
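
A rough idea of what such a printing utility does can be sketched with the standard `ast` module. This is only a stand-in under stated assumptions: it does not call the C function mentioned above, `ast` builds an abstract tree whose node names differ from the grammar-level node types, and `SAMPLE` and `dump_tree` are names made up for this illustration.

```python
import ast

# Hypothetical sample file content, invented for this illustration
SAMPLE = '''# A leading comment
import os

def f(x):
    return x + 1  # trailing comment
'''


def dump_tree(node, depth=0, out=None):
    """Collect one text line per tree node, indented by nesting depth."""
    if out is None:
        out = []
    # Not every node type carries a position, so fall back to None
    line = getattr(node, "lineno", None)
    col = getattr(node, "col_offset", None)
    position = f" [{line}:{col}]" if line is not None else ""
    out.append("  " * depth + type(node).__name__ + position)
    for child in ast.iter_child_nodes(node):
        dump_tree(child, depth + 1, out)
    return out


lines = dump_tree(ast.parse(SAMPLE))
print("\n".join(lines))
```

Each printed line shows a node type followed by its line and column where the node has them. Neither comment from the sample shows up anywhere in the output.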

The sample code has a few comments, and the syntax tree lost them. The line number and column number reported for the encoding are wrong. Even the encoding name is wrong: the file says it is latin-1 but the syntax tree reports iso-8859-1. It turned out that the Python interpreter code has a normalization procedure for the encoding spec. There are some problems with multiline string literals as well: the line numbers are not supplied. All these surprises had to be handled in the parser module. On the other hand, all the text parsing complexity is gone: all I had to do was walk the syntax tree and build data structures convenient for generating graphics.
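
Two of these surprises, the lost comments and the normalized encoding name, can be reproduced with the standard `tokenize` module. The sketch below is one possible way to get at that information and is not a description of how the parser module actually handles it. `SOURCE` is a made-up sample, and `tokenize` is a pure Python tokenizer used here in place of the interpreter's C code.

```python
import io
import tokenize

# Hypothetical file content as bytes, invented for this illustration
SOURCE = (
    b"# -*- coding: latin-1 -*-\n"
    b"# A leading comment\n"
    b"x = 1  # trailing comment\n"
)

# detect_encoding() looks at the first two lines at most and returns
# the encoding name after normalization, not as spelled in the file
encoding, _ = tokenize.detect_encoding(io.BytesIO(SOURCE).readline)

# The token stream, unlike the syntax tree, keeps the comments together
# with their (line, column) start positions
comments = [
    (tok.start[0], tok.start[1], tok.string)
    for tok in tokenize.tokenize(io.BytesIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

print(encoding)
for line, column, text in comments:
    print(line, column, text)
```

On CPython the reported encoding is iso-8859-1 even though the file spells it latin-1, which matches the normalization described above. The comments come back with 1-based line numbers and 0-based columns, so they can be attached to the nodes collected from the syntax tree.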