2 posts tagged “ironturing”
I've had a few issues with moving from VS2008 beta 2 to the release version, so have not really been able to progress with the code for IronTuring. This has lead me to step back and think a little harder about the syntax I am using in terms of implementation (I know design and implementation should be kept separate, but it should be in the back of your mind) and readability.
My initial syntax was based heavily on my introductory text to Turing machines, Algorithmics, and looked a little like this:
Where statea is the current state, stateb is the state to move to, a is the trigger, b is the character to be written upon activation of the trigger and R (or it could be L) is the direction in which the head should move.
statea stateb (a/b, R)
These statements are then listed, building up a state graph. In simple cases the syntax is concise whilst still clearly separating the various components. It also has the advantage of being quite similar to what most people would be used to.
However, in some edge (or perhaps not so edge) cases it can become a bit of a mess to understand. For instance, consider the following:
Here the intention is that, upon reading the character "(", "/" should be written. Not only is this potentially unclear and hard to read, it also (along with the use or R and L for direction indicators) presents a context-sensitive grammar. "(" could either be an open parenthesis or a character depending on where it appears. This makes writing the tokenizer/parser marginally more difficult.
statea stateb ((//, R)
Below are some examples of various methods to try and get around this ambiguity between characters and states:
I personally think the clearest is with the use of quotation marks. However, this then requires a lot of extra redundant typing. The use of ticks is a compromise between clarity and characters, and I think works fairly well given spacing.
statea stateb (a/b, R)
statea stateb ((//, R)
statea stateb (a / b , R)
statea stateb (( / / , R)statea stateb ('a/'b, R)
statea stateb ('(/'/, R)
statea stateb (''/',, R)
statea stateb ('a / 'b , R)
statea stateb ('( / '/ , R)
statea stateb ('' / ', , R)statea stateb ("a"/"b", R)
statea stateb ("("/"/", R)
statea stateb ("""/",", R)
statea stateb ("a" / "b" , R)
statea stateb ("(" / "/" , R)
statea stateb (""" / "," , R)
I couldn't help thinking, though, that part of the problem was perhaps an overuse of symbols in the first place. Did I really need to stick with convention here? I experimented with removing the excess symbols to give the following two examples, one using ticks to donate characters and create an unambiguous grammar, and one without to easy on repetitive typing. I also used < and > to replace L and R to aid in the unambiguousness of the grammar. I have used a sample program that detects palindromes in an alphabet of 2 characters to demonstrate this, as it helps to see it in context with unequal line lengths etc.
It's honestly very difficult to chose between the two. In terms of readability I think the unticked is marginally better, but with the ticked version we have the bonus of a context-free grammar. Ultimately I think, given the ease in this context in which a context-sensitive grammar could be implemented the unticked version wins out (of course this decision is not at all influenced by my desire, as you will see, to use the tick elsewhere).
mark YES $# $# <
mark movea 'a $# >
movea movea 'a 'a >
movea movea 'b 'b >
movea testa $# $# <
testa return 'a $# <
testa YES $# $# <
testa NO 'b 'b <
mark moveb 'b $# >
moveb moveb 'a 'a >
moveb moveb 'b 'b >
moveb testb $# $# <
testb return 'b $# <
testb YES $# $# <
testb NO 'a 'a <
return mark $# $# >mark YES $# $# <
mark movea a $# >
movea movea a a >
movea movea b b >
movea testa $# $# <
testa return a $# <
testa YES $# $# <
testa NO b b <
mark moveb b $# >
moveb moveb a a >
moveb moveb b b >
moveb testb $# $# <
testb return b $# <
testb YES $# $# <
testb NO a a <
return mark $# $# >
Also in this sample you can see the first glimpse at escaped characters. Almost any character you can have in a .NET unicode string you can read and write to the tape. However, not all of them can be written in the context of the syntax, such as whitespace (which is ignored). To get around this, escaped characters are needed. In this case the use of an escaped # indicates a null, or blank, character. That is, it effectively erases whatever was on the tape at that point. Any uninitialised/unassigned areas of the infinite tape are blank.
The $ symbol was chosen for a couple of reasons. Firstly, because the traditional "\" was too close to "/" for my liking, which already has a meaning as the read/write separator. If "\" was used statements would look like this:
Even with spacing it becomes difficult to read. Secondly, I had already decided that "$" would prefix commands, if just for purely arbitrary reasons.
mark YES (\# / \#, R)
mark movea (a / \# L)
However, the weight of the "$" symbol combined with the weight of the "#" symbol means that null (and other escaped characters) stand out more than others. As a solution to this a lighter prefix is needed. After experimenting with various symbols, and having already decided that the tick was not going to be used to specify a character, I settled upon the tick as the perfect escape symbol. The sample program now looks like this:
mark YES '# '# <
mark movea a '# >
movea movea a a >
movea movea b b >
movea testa '# '# <
testa return a '# <
testa YES '# '# <
testa NO b b <
mark moveb b '# >
moveb moveb a a >
moveb moveb b b >
moveb testb '# '# <
testb return b '# <
testb YES '# '# <
testb NO a a <
return mark '# '# >
A lot of the weight has been removed from the code and it is now arguably more readable.
I've experimented a little with grouping triggers together under states to remove a lot of the repetition, but as of yet haven't come up with a satisfactory syntax. However, I'm fairly sure that the final syntax will use grouping of some form, as it saves on repetitive typing and hence possible errors.
I've been wanting to look into compilers (lexer and parser) for a while now, and recently decided to scratch my itch and design and write the compiler for a toy language I've called IronTuring.
The language defines a simple Turing machine (a variation of a Finite State Automaton) which can then be compiled either into a command line program that accepts the content of the tape as an argument and prints the resultant tape, or a .NET library that does much the same except it will either return the tape as a string or a boolean result depending on the final state. It will compile into CIL, which means that any .NET language will be able to utilise the libraries.
Obviously the Turing machine was never designed to be of practical use, and to write anything more than the odd toy using it would require excessive time and energy when compared to a more feature complete environment. Practical use isn't really the point of this project, it's about how I get there and what I learn along the way, and to maximise that I will be assuming a practical use to maximise the outcome of my design decisions.
So why not implement something with an actual practical use? Well firstly I can't really think of one and secondly I like Turing machines.
