18 posts tagged “programming”
I quite often find myself needing to store relationships between objects and as such have developed a small library for defining relation tables.
What sort of things would you need to store relationships about?
- The similarity between music tracks.
- The number of words spoken in an IM conversation between people.
- The position of a playoff in a tournament.
- How much people feel they trust other people.
This sort of information is perhaps best visualised in a table:
| Colour | Red | Green | Blue |
|---|---|---|---|
| Red | Red | Yellow | Magenta |
| Green | Yellow | Green | Cyan |
| Blue | Magenta | Cyan | Blue |
Here we are storing the relationship of colour combinations. If we want to find out what green combied with blue makes we look across to the row headered green, then down to the column headered blue and find that they combine to make cyan.
A table such as this is essentially a dictionary that takes a sequence as a key. An example C# implimentation would be:
public class RelationTable : Dictionary<IList, TValue>
{
private class SequenceEqualityComparer : IEqualityComparer<IList>
{
public bool Equals(IList x, IList y)
{
if (x.Count != y.Count) return false;
for (int i = 0; i < x.Count; i++)
{
if (!x[i].Equals(y[i])) return false;
}
return true;
}
public int GetHashCode(IList obj)
{
// TODO: Find a better hash function.1
int tally = 0;
foreach (TKey item in obj)
{
int hash = item.GetHashCode();
tally += (hash * hash);
}
return tally;
}
}
public RelationTable() : base(new SequenceEqualityComparer()) { }
}
Notice we had to define our own comparer so that two different objects containing the same values will be considered equal.
This is all well and good, but if you take a closer look at the table you should be able to see that we are actually duplicating a lot of information. Red combined with blue makes magenta, but then so does blue combined with red. This is because the arguments are commutative - the order can change without affecting the result. Because of this we can get away with storing only half the information (well, \frac{n^2+n} {2} to be precise):
| Colour | Red | Green | Blue |
|---|---|---|---|
| Red | Red | Yellow | Magenta |
| Green | Yellow | Green | Cyan |
| Blue | Magenta | Cyan | Blue |
In such a representation we can be more efficient about what we store by saying it is a dictionary that takes an unordered set as a key. An example C# implimentation would be:
public class CommutativeRelationTable : Dictionary<HashSet<TKey>, TValue>
{
public CommutativeRelationTable() : base(new SetEqualityComparer()) { }
public class SetEqualityComparer : IEqualityComparer<HashSet<TKey>>
{
public bool Equals(HashSet<TKey> x, HashSet<TKey> y)
{
return x.All(i => y.Contains(i)) && y.All(i => x.Contains(i));
}
public int GetHashCode(HashSet<TKey> obj)
{
// TODO: Find a better hash function.1
int tally = 0;
foreach (TKey item in obj)
{
int hash = item.GetHashCode();
tally += (hash * hash);
}
return tally;
}
}
}
This adds a little more time to the equality comparison, but makes managing the table a lot easier if you find yourself needing to change values often.
However, what is perhaps a little more interesting is how to model a commutative relation table with dense data over a finite set of orderable keys. If you have such a situation you can model it as a linear array:
| Row: | 0 | 1 | 2 | 3 | ||||||
| Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Column: | 0 | 1 | 2 | 3 | 1 | 2 | 3 | 2 | 3 | 3 |
The row and column of the diagram above correspond to the row and column index's of the commutative relation table. So, for instance, if you wanted to find the relationship between the 1st and the 3rd element you would look at index 6 of the array. Likewise, if you wanted to find the relationship between the 2nd and 0th element you would look at index 2.
The psuedocode for finding the index given the row and column is as follows:
row ← min(a, b)
column ← max(a, b)
row_start ← row*(size + 1) - 0.5*(row)*(row-1)
column_offset ← column - row
index ← row_start + column_offset
where size is the number of items in the relation
Note that we must first order the key values before we find the index.
Hopefully other people will find this information useful, I know I've certainly been using these classes alot lately.
1 If you know of a better hashing algorithm for these cases people tell me.
As I hinted in the update in my last post, I discovered a very nice addition to Gmail 2.0 which allows for much better Greasemonkey integration, meaning it's possible for me to update the script.
There are now two different versions, one for Firefox and one for Safari*. The one for Firefox will work with both Gmail 1.0 and Gmail 2.0. However, the one for Safari will unfortunately only work with Gmail 1.0 as the integration point isn't exposed in Safari.
I'll work on better integration (rather than a popup window) sometime later this week. However now I really need to sleep.
* Why two different scripts rather than checking for the presence of the API? It comes down to the differences between the implimentation of Greasemonkey. Greasemonkey uses unsafeWindow whereas Greasekit and Opera use window.
A while ago I wrote a plugin for Firefox that added a link to gmail enabling you to easily delegate emails to Todoist. Unfortunately the update of Gmail to 2.0 broke the plugin. Fortunately you can still use the old interface, and I suspect that's what a few people have been doing.
I've looked a few times into how to update it but I could never get the new interfact (turns out it doesn't work with English (U.K.)). Finally today I got it working but there is a bit problem. Gmail 2.0 preloads the messages, making it quicker to view messages but there is also now no request when you view an email, which the plugin relied upon to know when an email was being looked at and hence to inject the button.
It's probably possible to do some fancy injecting, function overriding or specific browser plugin tricks. However, all of those are beyond my immediate capabilities and I would have to either trawl through and understand Gmails compressed code or learn a heck of a lot more about plugins.
As an apology, here is a greasemonkey script which does what the old code did (minus the greybox stuff, which I'll probably add sometime soon). It can be used with Firefox's Greasemonkey plugin, Safari's Greasekit plugin or Opera 8's user scripting functionality (It probably works with some IE plugin, but I can't find or get one to work).
Update: I just discovered Google have release an experimental API. It now works with Gmail 2.0 in Firefox.
Quite often I end up doing something like this in my code:
This covers every value in the ReturnType enum, yet (for a very good reason that I forgot) the compiler isn't able to detect that, because of this, returnType will always be initialized after the switch statement. The way I used to get around this was by using:Type returnType;
switch (function.ReturnType)
{
case ReturnType.Boolean :
returnType = typeof(bool);
break;
case ReturnType.State:
returnType = stateEnum;
break;
case ReturnType.Tape:
returnType = typeof(Tape);
break;
}
The problem with this is that it isn't obvious what the last "default" condition is. Then today I had a revelation (one I'm sure everyone else had a long time ago). It now looks like this:...
case ReturnType.State:
returnType = typeof(stateEnum);
break;
default:
returnType = typeof(Tape);
break;
Now not only is returnType seen to be initialized at every point it needs to be and every condition is clear, but we get a debug message if we add an item to the enum and forget to handle it. Okay, it's not as good as a compile time message but it's the next best thing.Type returnType;
switch (function.ReturnType)
{
case ReturnType.Boolean :
returnType = typeof(bool);
break;
case ReturnType.State:
returnType = stateEnum;
break;
case ReturnType.Tape:
returnType = typeof(Tape);
break;
default:
Debug.Assert(false, "Not implemented all the enums.");
return;
}
I've tried in the past to work out what a valid name looks like in CIL so I can implement it in my Turing.Net compiler and never got very far. Today I decided to just knuckle down and plow through the ECMA Specification. It's actually rather interesting, and I eventually came across section 8.5.1 Valid names*:
When you look at the standard and normalized forms it's a little scary, but it turns out the framework makes it all very easy. First, normalization:CLS Rule 4: Assemblies shall follow Annex 7 of Technical Report 15 of the Unicode Standard 3.0 governing
the set of characters permitted to start and be included in identifiers, available on-line at
http://www.unicode.org/unicode/reports/tr15/tr15-18.html. Identifiers shall be in the canonical format defined
by Unicode Normalization Form C. For CLS purposes, two identifiers are the same if their lowercase mappings
(as specified by the Unicode locale-insensitive, one-to-one lowercase mappings) are the same. That is, for two
identifiers to be considered different under the CLS they shall differ in more than simply their case. However,
in order to override an inherited definition the CLI requires the precise encoding of the original declaration be
used.
Told you it was easy. It defaults to Form C, but you can specify other forms too. As for checking whether a string conforms to the identifier we can use the very simple regular expression:s = s.Normalize();
The \p basically mean "match anything of this Unicode class" and if you look at the specification given on the Unicode website it matches up very nicely.start = (\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl})
extend = (\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})
start(start|extend)*
* It turns out there is a formal Common Language Specification which it comes under, as opposed to the watered down one I'd previously only been able to find.
I've been reading through the Ecma specification for the CLI in order to find out some things for Turing.Net and came across an interesting little fact, represented by the following code:
This is because, under the contract for type value, equality identity implies equality. However, "two floating point NaNs are defined by IEC 60559:1989 to always compare as unequal."* Well I thought it was interesting.Console.WriteLine(float.NaN.Equals(float.NaN)); // True
Console.WriteLine(float.NaN == float.NaN); // False
* http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-335.pdf p21 ac. Feb 2008
Previously I've posted about a technique to create conditional methods on generics, that is methods on a generic class that only appear if the generic type impliments some interface. Perhaps not the best solution to the problem, but an interesting one.
Last night I had a brainwave - perhaps you can create a much cleaner conditional method with extension methods. Extension methods were added to C# 3.0 as a syntactic sugar for turning:
intoDateTime.Now.Funkify(10);
such that you write the first line, but the compiler treats it as the second.ICR.ExtensionClasses.Funkify(DateTime.Now, 10);
Here's how you would use extension methods to create a conditional method on a generic class:
Then you would use it like so:public class MyClass<T>
{
public T Item1 { get; set; }
public T Item2 { get; set; }
}public static class MyClassExtensions
{
public static void MyMethod(this MyClass<IComparable> myClass)
{
return myClass.Item1.CompareTo(myClass.Item2);
}
}
There are a few limitations on how this works. Firstly the conditional method can only operate on the public members of the class, as it's really a static call from outside the class passing the instance in as a parameter. Secondly, when you are using it you can only use it if you cast the generic type to the interface type. This is perhaps a bit of a bonus - it makes the code arguably easier to understand (it makes it seem less arbitrary what classes have the method and what don't).MyClass<IComparable> comparableClass = new MyClass<ComparableType>();
int i = comparableClass.MyMethod();
My favourite feature of Visual Studio 2008, I've decided, is automatic properties. I say they're a feature of VS2008 because you can use them in code that targets .NET 2.0 - after all the compiler just expands them out like a code snippet. I've been forced to work with VS2005 recently (for the XNA Studio) and I really miss them, my code is begining to look like a mess with the expanded properties everywhere.
If you've not come across automatic properties before, they take this:
and turn it into:private string name;
/// <summary>
/// The name of the employee.
/// </summary>
public string Name
{
get { return name; }
set { name = value; }
}
A lot cleaner and more readable! The compiler generates the backing variable for you, which is great because most of the time properties are just simple accessors for fields. Of course, if you need to do more exciting stuff like validation you can still use the full property./// <summary>
/// The name of the employee.
/// </summary>
public string Name { get; set; }
Even if you know about automatic properties, you may not have come across this very useful trick:
If you've got an automatic backing variable, there is no way to assign a value to it, so you can't create read-only variables by ommiting the set accessor. The solution to this is to make the set accessor private such that the property can only be set from inside the containing class. What I really like about this is that it means, with a proper naming convention, it's now more obvious when you're modifying a property rather than a field. But then I guess it's not less obvious when you're modifying a read-only property. Swings and roundabouts./// <summary>
/// The amount of stock left.
/// </summary>
public int StockCount { get; private set; }
This is all a very long winded way of saying "I miss you automatic properties, but I'll be home soon, I promise!"
As I move more and more towards a functional style of programming I often find myself using referentially transparent functions in amongst the rest of my C# code. A referentially transparent function is basically one that guarantees that it has no side-effects (modifies anything outside of the function) such that every time you call it with a particular argument it will give you the same result.
In these cases it is trivial to optimise it using momoization - that is caching the results for given inputs. However, it is a lot of redundant code to repeat the same caching mechanism again and again, and actually quite difficult when you start to try and get the correct balance between time and space with sophisticated caching algorithms. The ideal place for this would be in the CLR which has much better knowledge about the current demands on memory (think garbage collection).
To this end, I tentatively propose const methods. These are basically methods that are guaranteed to be referentially transparent by denying access to variables and non-const methods outside of the function (of course reference to constants are allowed). These could then be optimised using several different techniques. A const method is implied to be static (an instance method would allow access to the mutable state of the instance object - this.MutableProperty).
Below is an example syntax:
public const int Factorial(int x)
{
return x*(x+1)/2;
}
The lack of reference to variables outside of the function doesn't restrict the use of variables inside the function:
private const int Factorial(int x)
{
int tally = 0;
for (int i = 1; i <= x; i++)
{
tally += i;
}
return tally;
}
Note that this is slightly different to a const method in C++ which, afaik (though I may be wrong), is an assurance that it won't modify any state, but it can still read from it (which leads to the confusing situation that a const method is not referentially transparent - that is a const method isn't constant for a given input).
A lot of the library functions would have to be modified - though only to add the "const" keyword in most cases (some could be trivially re-written to be const).
Over the past few weeks I've been building a steadily growing collection of helper functions. I've gotten to the stage where I've had to write enough reduce functions, manually fill enough arrays and rewritten enough foreach as for loops to get sick of it and begin writing my own helper library.
Below is what I would say is the most useful to others that I would like to share. The purpose of it is to allow one to utilise a foreach loop over an enumerable collection whilst still being able to get at the index. In the past I have often found myself needing to use a for loop instead of a foreach loop simply because I needed the index of the item in the collection. I didn't need to do anything fancy with it, I didn't need to increment it by more than 1, I just needed to be able to refer to it.
A method is written that extends anything that implements IEnumerable, returning a custom enumerator. All this enumerator does is itself enumerate over the collection, yielding the current item along with it's index. Below is a sample usage.using System;
using System.Collections.Generic;
using System.Collections;namespace ICR
{
public static class Collections
{
public static IEnumerable<IndexValuePair<T>> GetIndexValuePairs<T>(this IEnumerable<T> collection)
{
return new IndexValuePairsCollection<T>(collection);
}public sealed class IndexValuePair<T>
{
public int Index { get; private set; }
public T Value { get; private set; }public IndexValuePair(int index, T value)
{
Index = index;
Value = value;
}
}private class IndexValuePairsCollection<T> : IEnumerable<IndexValuePair<T>>
{
private IEnumerable<T> Collection { get; set; }public IndexValuePairsCollection(IEnumerable<T> collection)
{
Collection = collection;
}public IEnumerator<IndexValuePair<T>> GetEnumerator()
{
int index = 0;
foreach (T item in Collection)
{
yield return new IndexValuePair<T>(index++, item);
}
}IEnumerator IEnumerable.GetEnumerator()
{
return ((IndexValuePairsCollection<T>)this).GetEnumerator();
}
}}
}
Sequence.Range is another helper function I have written that returns an object that enumerates over ints/floats/doubles/characters in a given range.foreach (var character in Sequence.Range('a', 'z').GetIndexValuePairs())
{
Console.WriteLine("{0}: {1}", character.Index, character.Value);
}
This code is concise with a clear intent. Compare with the alternative.
This also highlights another advantage - in the second example the sequence had to be stored in a local variable to allow for the multiple references to it (characters.Count and characters[index]). We did not have to do this in the first, allowing for a tighter and cleaner scope control.List<char> characters = (List<char>)Sequence.Range('a', 'z');
for (int index = 0; index < characters.Count; index++)
{
Console.WriteLine("{0}: {1}", index, characters[index]);
}
Our solution also allows one to get the index of any IEnumerable, something which is even more untidy to do with a for loop. (N.B. This is not always appropriate, but depends on the context of the IEnumerable)
Feel free to use the code yourself if you think it would help. Any suggestions and feedback would be welcome,
