Relation Tables
I quite often find myself needing to store relationships between objects and as such have developed a small library for defining relation tables.
What sort of things would you need to store relationships about?
- The similarity between music tracks.
- The number of words spoken in an IM conversation between people.
- The position of a playoff in a tournament.
- How much people feel they trust other people.
This sort of information is perhaps best visualised in a table:
| Colour | Red | Green | Blue |
|---|---|---|---|
| Red | Red | Yellow | Magenta |
| Green | Yellow | Green | Cyan |
| Blue | Magenta | Cyan | Blue |
Here we are storing the relationship of colour combinations. If we want to find out what green combied with blue makes we look across to the row headered green, then down to the column headered blue and find that they combine to make cyan.
A table such as this is essentially a dictionary that takes a sequence as a key. An example C# implimentation would be:
public class RelationTable : Dictionary<IList, TValue>
{
private class SequenceEqualityComparer : IEqualityComparer<IList>
{
public bool Equals(IList x, IList y)
{
if (x.Count != y.Count) return false;
for (int i = 0; i < x.Count; i++)
{
if (!x[i].Equals(y[i])) return false;
}
return true;
}
public int GetHashCode(IList obj)
{
// TODO: Find a better hash function.1
int tally = 0;
foreach (TKey item in obj)
{
int hash = item.GetHashCode();
tally += (hash * hash);
}
return tally;
}
}
public RelationTable() : base(new SequenceEqualityComparer()) { }
}
Notice we had to define our own comparer so that two different objects containing the same values will be considered equal.
This is all well and good, but if you take a closer look at the table you should be able to see that we are actually duplicating a lot of information. Red combined with blue makes magenta, but then so does blue combined with red. This is because the arguments are commutative - the order can change without affecting the result. Because of this we can get away with storing only half the information (well, \frac{n^2+n} {2} to be precise):
| Colour | Red | Green | Blue |
|---|---|---|---|
| Red | Red | Yellow | Magenta |
| Green | Yellow | Green | Cyan |
| Blue | Magenta | Cyan | Blue |
In such a representation we can be more efficient about what we store by saying it is a dictionary that takes an unordered set as a key. An example C# implimentation would be:
public class CommutativeRelationTable : Dictionary<HashSet<TKey>, TValue>
{
public CommutativeRelationTable() : base(new SetEqualityComparer()) { }
public class SetEqualityComparer : IEqualityComparer<HashSet<TKey>>
{
public bool Equals(HashSet<TKey> x, HashSet<TKey> y)
{
return x.All(i => y.Contains(i)) && y.All(i => x.Contains(i));
}
public int GetHashCode(HashSet<TKey> obj)
{
// TODO: Find a better hash function.1
int tally = 0;
foreach (TKey item in obj)
{
int hash = item.GetHashCode();
tally += (hash * hash);
}
return tally;
}
}
}
This adds a little more time to the equality comparison, but makes managing the table a lot easier if you find yourself needing to change values often.
However, what is perhaps a little more interesting is how to model a commutative relation table with dense data over a finite set of orderable keys. If you have such a situation you can model it as a linear array:
| Row: | 0 | 1 | 2 | 3 | ||||||
| Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Column: | 0 | 1 | 2 | 3 | 1 | 2 | 3 | 2 | 3 | 3 |
The row and column of the diagram above correspond to the row and column index's of the commutative relation table. So, for instance, if you wanted to find the relationship between the 1st and the 3rd element you would look at index 6 of the array. Likewise, if you wanted to find the relationship between the 2nd and 0th element you would look at index 2.
The psuedocode for finding the index given the row and column is as follows:
row ← min(a, b)
column ← max(a, b)
row_start ← row*(size + 1) - 0.5*(row)*(row-1)
column_offset ← column - row
index ← row_start + column_offset
where size is the number of items in the relation
Note that we must first order the key values before we find the index.
Hopefully other people will find this information useful, I know I've certainly been using these classes alot lately.
1 If you know of a better hashing algorithm for these cases people tell me.
