I am in the process of developing a trading application based in C++. I've realized that it would be useful to test code and algorithms off line, before connecting to a paper-trading system, or even a live scenario.
Saturday, November 1. 2008
MinHeap: C++ Template Implementation of Minimum Heap Algorithm
For code testing and algorithm testing, I need to send the quotes and trades I've collected through an order fullfilment simualation engine. As the solution I'm developing depends upon multiple symbols, I need a synchronization process to ensure the simulated quotes and trades are processed in the same temporal order as I recieved them in real time.
One option for solving this ordering problem would be to simply merge/sort all the vectors and process the single vector. I didn't like that idea as it wasn't amenable to incorporating time based events as they are generated by the fulfillment algorithm.
I ended up designing a 'carrier' to hold each vector. In addition to a few other house-keeping chores, The carrier manages the index into the vector for the latest datum to be processed. The various carriers are then linked into another vector, with the vector sorted in ascending date/time order based upon the current datum pointed to by the carrier.
My first kick at the can for maintaining the vector of carriers was a brute force sift down approach, with the sifting occuring after each datum was processed. This could turn out to be O(n*m/2) time where n is the number of carriers and m is the number of total datums across all carriers.
After looking at performance figures, I figured I could improve on this somewhat. The MinHeap algorithm looked like a suitable candidate.
The Standard Template Library implementation of the algorithm uses pops and pushes to get carriers into and out of the vector. This creates too much overhead. I basically needed a mechanism to maintain minheap order as the carriers are added to the vector, and then perform an in-place re-minheap when an datum from a carrier was 'consumed'. Once a carrier is exhausted of all datums in its vector, the carrier is moved to the end of the carrier vector with an 'archival' call.
Because, during the process of adding carriers to the vector, they are added at the end and sifted up, once carrier archival starts to occur, no further carriers can be added. A flag in the code is used to check for this condition.
The code is written for Visual Studio, but with a few minor mods, will hopefully run with GCC.
There are two template parameters: C, T. T is the basic type of the carrier, and C is used for determining the type of the operands for the lt operator in the SiftDown and SiftUp functions. I'm sure there is a more elegant way of handling the storage of pointers in the carrier vector and trying to perform comparisons of the class represented by the pointer. I havn't take the time to read up on that yet.
The following is the 'lt' operator of the class C:
static bool lt( CMergeCarrierBase *plhs, CMergeCarrierBase *prhs ) { return plhs->m_dt < prhs->m_dt; };
Here is the rest of the code:
#pragma once #include <vector> #include <assert.h> // http://cis.stvincent.edu/html/tutorials/swd/heaps/heaps.html // http://en.wikipedia.org/wiki/Binary_heap // http://en.wikipedia.org/wiki/Heapsort // http://people.cs.vt.edu/~shaffer/Book/C++2e/progs/minheap.h // http://www.staroceans.com/minmaxHeap1.htm // http://www.cppreference.com/wiki/stl/algorithm/is_heap is_heap() template<class T, class C> class CMinHeap { public: CMinHeap<T,C>( size_t size ); CMinHeap<T,C>( void ); virtual ~CMinHeap( void ); void Append( T ); // automatic sift up T RemoveEnd( void ); T GetRoot( void ) { assert( 0 != m_cntActiveItems ); return m_vT.front(); }; void ArchiveRoot( void ); // root item of no further use, swap to end and sift down void SiftDown( void ) { SiftDown( 0 ); }; // reorder new root after use and change bool Empty( void ) { return m_vT.empty(); }; size_t Size( void ) { return m_vT.size(); }; protected: inline size_t Parent( size_t ix ) { return ( ix - 1 ) / 2; }; inline size_t RightChild( size_t ix ) { return 2 * ( ix + 1 ); }; inline size_t LeftChild( size_t ix ) { return ( 2 * ix ) + 1; }; inline bool isLeaf( size_t ix ) { return LeftChild( ix ) >= m_cntActiveItems; }; inline bool hasOneLeaf( size_t ix ) { return m_cntActiveItems == RightChild( ix ); }; inline size_t ixLastItem( void ); void SiftDown( size_t ix ); // from ix downwards, reordering item void SiftUp( size_t ix ); // from ix upwards towards root, when appending new items inline void Swap( size_t ix, size_t iy ); private: std::vector<T> m_vT; size_t m_cntActiveItems; bool m_bArchivalStarted; // prevents further Appends }; template<class T, class C> CMinHeap<T,C>::CMinHeap(size_t size) : m_cntActiveItems( 0 ), m_bArchivalStarted( false ) { m_vT.reserve( size ); } template<class T, class C> CMinHeap<T,C>::CMinHeap( void ) : m_cntActiveItems( 0 ), m_bArchivalStarted( false ) { } template<class T, class C> CMinHeap<T,C>::~CMinHeap() { } template<class T, class C> size_t CMinHeap<T,C>::ixLastItem() { assert( 0 < m_cntActiveItems ); return m_cntActiveItems - 1; } template<class T, class C> void CMinHeap<T,C>::Append( T item ) { assert( !m_bArchivalStarted ); m_vT.push_back( item ); ++m_cntActiveItems; SiftUp( ixLastItem() ); } template<class T, class C> T CMinHeap<T,C>::RemoveEnd( void ) { assert( !m_vT.empty() ); T item = m_vT.back(); m_vT.pop_back(); return item; } template<class T, class C> void CMinHeap<T,C>::ArchiveRoot() { // swap with last item and SiftDown assert( 0 < m_cntActiveItems ); m_bArchivalStarted = true; Swap( 0, ixLastItem() ); --m_cntActiveItems; if ( 1 < m_cntActiveItems ) { // sift only if 2 or more items SiftDown(); } } template<class T, class C> void CMinHeap<T,C>::Swap( size_t ix, size_t iy ) { assert( ix < m_cntActiveItems ); assert( iy < m_cntActiveItems ); T tmp = m_vT.at( ix ); m_vT.at( ix ) = m_vT.at( iy ); m_vT.at( iy ) = tmp; } template<class T, class C> void CMinHeap<T,C>::SiftUp( size_t ix ) { size_t cur = ix; size_t parent; while ( 0 != cur ) { parent = Parent( cur ); if ( C::lt( m_vT.at( parent ), m_vT.at( cur ) ) ) { break; // done sifting } else { // swap and try again Swap( cur, parent ); cur = parent; } } } template<class T, class C> void CMinHeap<T,C>::SiftDown( size_t ix ) { size_t cur = ix; while ( !isLeaf( cur ) ) { if ( hasOneLeaf( cur ) ) { size_t left = LeftChild( cur ); if ( C::lt( m_vT.at( left ), m_vT.at( cur ) ) ) { Swap( left, cur ); cur = left; } else { break; } } else { // has two leaves size_t right = RightChild( cur ); size_t left = LeftChild( cur ); // right side has shorter distance by default for same or greater bool bGoRight = !( C::lt( m_vT.at( left ), m_vT.at( right ) ) ); if ( bGoRight ) { if ( C::lt( m_vT.at( right ), m_vT.at( cur ) ) ) { Swap( right, cur ); cur = right; } else { break; } } else { // test with left if ( C::lt( m_vT.at( left ), m_vT.at( cur ) ) ) { Swap( left, cur ); cur = left; } else { break; } } } } } // if we needed to build a heap from pre-assigned vector: // for (int i=n/2-1; i>=0; i--) siftdown(i);