A few years back whilst writing a file comparison and backup program, I ran into an issue whereby doing file and difference matching became very slow: searching and sorting through lists of objects was becoming very computationally expensive. In order to solve this problem, I decided to write a container class which allowed for efficient retrieval searching, and matching.
Even in the light of improvements to Delphi container classes in recent years, it is still a useful addition to my library code.
The source code for this, related units and a test app can be found here.
TIndexedStore: The Basics.
You can use TIndexedStore in much the same way as you might use a TList or other unordered container: There are four basic functions for adding an removing items from the store. It is assumed that the items being added are TObject, or can be cast to TObject.
- AddItem: This adds an Item, indicates whether it has succeeded, and gives you a TItemRec structure representing the item.
- DeleteItem: This takes a TItemRec structure representing an item, and removes it from the store, indicating whether it successfully did so. It does not delete the underlying item.
- GetAnItem: This gets the TItemRec structure for the first item in the store.
- GetAnotherItem: Given a TItemRec structure, this gets the TItemRec structure for the next item in the store.
These four functions allow for basic insertion, removal and iteration. You can assume when using these functions that items are in something like a linked list, and the order is (un)changed in much the same way as list insertion and removal.
The indexed store has a lot of internal structures which can be created when you add and remove items. TItemRec has links to not only the item you added, but also a whole bunch of internal structures.
You should not modify the internal fields of TItemRec. Use the Item field to retrieve the object you originally added. If you want to change an item, then instead of modifying TItemRec.Item, simply delete the old item and insert the new one.
Similarly, do not create, destroy or copy instances of TItemRec. It is just a linking class, and its lifetime is managed by the parent indexed store. You can assume it is valid as long as you have not issued a call to remove it.
As it is, the store is slightly less useful than a TList object. However, with the addition of indexes, it can be made much more powerful. Indexes allow for:
- Retrieval that is nearly as fast as TDictionary (hash table) access, at lg(n) cost.
- The ability to supply some ordering criterion, and to move through the collection of items according to that criterion. (First, next, previous, last).
- Addition of an Index at n lg(n) cost.
- Addition of items in the index at lg(n) cost.
Internally, each index is given by a balanced binary (AVL) tree, with one node per item added to the store.
The functions used to manage indexes are:
- AddIndex: Add an index, providing an Index Tag and an Index node Class Type.
- RemoveIndex: Remove an index, given the tag.
- FirstByIndex, LastByIndex: Find the first or last item according to the ordering given by a particular index.
- NextByIndex, PreviousByIndex: Find the next or previous item according to the ordering given by a particular index.
When you add or remove an index, you specify a tag. In most cases, these can be a simple as the numbers 1,2,3, or they can be cast to an enumerated type, or a pointer, as you require. This allows you to have multiple indexes, with the same basic comparison criterion, but different behaviour or selection on different fields depending on the tag.
I decided not to use templates for this particular container, so indexes have to be implemented via polymorphism. This is potentially slightly more typing, but many Delphi programmers are not so familiar with templates, and it makes debugging the code somewhat easier. Ambitious programmers might like to think of ways to reduce the code size using templates.
Index node class types, and implementing an index node.
In order to implement an Index, you need to provide some behaviour that allows items to be distinguished from another, via an order. This is very similar to (for example) the TListSortCompareFunc that you give to a TList to allow it to sort items. To do this, you create an index node class:
- Decide whether the index requires unique values, or whether it allows duplicates. This then allows you to derive your class from TIndexNode or TDuplicateValIndexNode.
- Then simply override the function CompareItems to distinguish between the two items. You can assume both pointers are assigned.
Here’s an example index node implementation where we add TMyObject classes which have an FKey field.
TSampleIndex = class(TDuplicateValIndexNode) protected function CompareItems(OwnItem, OtherItem: TObject): integer; override; end; function TSampleIndex.CompareItems(OwnItem, OtherItem: TObject): integer; begin Assert(Assigned(OwnItem)); Assert(Assigned(OtherItem)); result := CompareStr((OtherItem as TMyObject).FKey, (OwnItem as TMyObject).FKey); end;
Search index nodes.
If all you want to do is iterate by index, i.e. have several different ordering relations on the same set of classes, then that is entirely sufficient. However, you may also wish to search for specific values, using the FindByIndex function. This requires that you pass in a search value as an index node, and override the comparison function so that it compares existing nodes with the search value (or a pre-created object with fields set up), and not the item passed in. A touch convoluted, but here’s another example. First we declare and define our search value class:
TSampleIndexSearchVal = class(TSampleIndex) private FKeyedObject: TMyObject; protected function GetSearchKey: string; procedure SetSearchKey(Key: string); public constructor Create; override; destructor Destroy; override; function CompareItems(OwnItem, OtherItem: TObject): integer; override; property SearchKey: string read GetSearchKey write SetSearchKey; end;
Then there is a little boilerplate to maintain the internal class.
constructor TSampleIndexSearchVal .Create; begin inherited; FKeyedObject := TMyObject.Create; end; destructor TSampleIndexSearchVal .Destroy; begin FKeyedObject.Free; inherited; end; function TSampleIndexSearchVal .GetSearchKey: string; begin result := FKeyedObject.FKey; end; procedure TSampleIndexSearchVal .SetSearchKey(Key: string); begin FKeyedObject.FKey := Key; end;
Then we have the all-important CompareItems function, which compares directly with the stored search value.
function TSampleIndexSearchVal .CompareItems(OwnItem, OtherItem: TObject): integer; begin Assert(not Assigned(OwnItem)); result := inherited CompareItems(FKeyedObject, OtherItem); end;
More advanced index node classes.
If you want to have just the one index node class type, but customise its behaviour based on the index tag, then you can use the IndexLink property to get hold of the Tag that the index uses.
Changing items when they are indexed.
Depending on exactly what your indexes are, you may want to change underlying data in the classes, which then makes the indexes invalid, and searching and iteration might not work. There are two possible ways of handling this:
- If you only have to change one, or a small number of items, then remove and re-add them to the store to update the indexes. In cases where an index has to be unique, the re-add operation may fail, but at least you know exactly which item has a duplicate index.
- If you need to change all the items, then use the DeleteIndex function to remove the indexes affected, and AddIndex to re-add them. The add operation can fail if items have duplicate index values.