## Sunday, August 2, 2015

### Systematic Strategies: A Simple Statistical Pattern Matching Algorithm

Time series pattern matching in finance borrows statistical techniques and mathematical tools from various other disciplines to make an educated guess about near-term price movement.

The basic idea is simple. Suppose we have the price evolution for the last n periods and want to know which way it is statistically biased to move over the next m periods. Given a set of past data, a pattern matching algorithm searches it for the best-fit n-period samples (matches) and analyzes the statistical properties of the m periods that followed each match. From this information we can make some probabilistic statements about the likely evolution of the price series we are interested in. Most pattern matching algorithms operate more or less this way. Where they differ is in how they define "best fit" (a measure of fit, i.e. likeness, or more generally a measure of difference, i.e. distance) and in how they find the matches (clustering techniques).
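As a rough illustration of the setup, here is a minimal sketch that splits one historical price series into its first n periods (the pattern to match against) and the return over the following m periods (the outcome to study). The function name `split_pattern_outcome` and the choice of a simple percentage return are my assumptions for illustration, not part of the original strategy code.

```python
def split_pattern_outcome(prices, n, m):
    """Split a price series into an n-period pattern and the next-m-period return.

    Hypothetical helper: returns None when the series is too short.
    """
    if n < 1 or m < 1 or len(prices) < n + m:
        return None
    pattern = prices[:n]
    # Simple return from the last bar of the pattern to m bars later (assumed definition).
    outcome = prices[n + m - 1] / prices[n - 1] - 1.0
    return pattern, outcome


pattern, outcome = split_pattern_outcome([100, 101, 102, 103, 104, 105], n=3, m=2)
print(pattern, round(outcome, 4))
```

Applying this to every historical day gives a library of (pattern, outcome) pairs; the matching step then only has to rank the patterns against today's n periods.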

One way to define distance is the conventional straight-line ("as the crow flies") distance, also known as Euclidean distance. To see how this applies to a price time series, assume we are interested in the daily data for the last 20 days. To define the distance from another 20-day sample, we compute the squared difference between the two series for each day (e.g. the first day of the first series against the first day of the second series), sum these up, and take the square root. There is a host of other ways in which we can measure distance. See here for a list, for example.
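The Euclidean distance between two equal-length series can be sketched as below. The `rebase` helper, which divides each series by its first value so that samples at different price levels become comparable, is an assumption I have added for illustration; the text does not say how the series are normalized.

```python
import math


def rebase(series):
    """Scale a series so it starts at 1.0 (assumed normalization, not from the post)."""
    first = series[0]
    return [x / first for x in series]


def euclidean_distance(a, b):
    """Square root of the sum of squared point-by-point differences."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))
print(euclidean_distance(rebase([100, 110]), rebase([200, 220])))
```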

There are many ways to pick the top matches. One popular and easy-to-implement method is k-means clustering.
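The sketch below does not implement k-means itself. It shows the simpler step of ranking candidate samples by distance and keeping the k closest, which is a nearest-neighbour style selection. The function name `top_matches` and the use of Euclidean distance are assumptions for illustration.

```python
import heapq
import math


def top_matches(query, candidates, k):
    """Return the k closest candidates as (distance, index) pairs, nearest first."""
    def dist(a, b):
        # Euclidean distance, used here only as a stand-in measure.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scored = [(dist(query, c), i) for i, c in enumerate(candidates)]
    return heapq.nsmallest(k, scored)


history = [[1, 2, 3], [10, 10, 10], [1, 2, 4]]
print(top_matches([1, 2, 3], history, k=2))
```

The returned indices can then be used to look up what happened after each matched sample.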

Once we have decided on a measure suitable for our purpose and a way to choose the top matches, the rest is easy. We need to define which statistical properties of the matches we are going to assess, how to interpret them, and how to generate buy or sell signals from them.

As an example I have taken NIFTY futures (NIFTY being the flagship index of the National Stock Exchange in India) with 1-minute bar data. To illustrate, see the figure below. Imagine that for a given date (1st July here) we see the price movement (the black line) up to a certain time in the day. We need to match this with history (the orange lines) and predict today's move. Finally, the green line shows the move that was actually realized. Below I describe the scheme of this strategy.

For the statistical parts: I have chosen a distance measure known as Markov Operator distance (see here, opens PDF). This is a bit more suitable for our purpose, as Euclidean distance is sensitive to the jagged movements typical of stock price evolution. Also, before measuring the distance, I have smoothed the data to filter out noise. We can, again, choose from a host of options (any low-pass filter will do). I have used a simple kernel regression smoothing. Note that since we are smoothing the data, the Euclidean distance measure should not perform particularly badly for us either. Lastly, for choosing the top matches I stick to k-means clustering as mentioned above.
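To give a feel for the smoothing step, here is a minimal sketch of kernel regression smoothing in its Nadaraya-Watson form with a Gaussian kernel over the bar index. The kernel choice and the `bandwidth` value are my assumptions; the post does not specify them, and the Markov Operator distance itself is not shown here.

```python
import math


def kernel_smooth(values, bandwidth=5.0):
    """Nadaraya-Watson smoother: each point becomes a Gaussian-weighted average
    of all points in the sample, with weights decaying with distance in bar index.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


jagged = [0.0, 1.0] * 10
smooth = kernel_smooth(jagged, bandwidth=2.0)
print(max(jagged) - min(jagged), max(smooth) - min(smooth))
```

A larger bandwidth removes more of the jaggedness but also flattens genuine moves, so it is one more parameter to tune.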

With this scheme, our strategy is simple. Each day we look at the price evolution over a given n periods. We then match this against daily 1-minute bar data since April 2015 and generate a trading signal based on the statistics of the top matches. Here I have chosen a set of very simple criteria - we buy (sell) if 1) the subsequent returns from the matches are positive (negative), 2) the Sharpe ratio is above a certain threshold and 3) the skew (as defined by the ratio of the 90th and 10th percentile moves) is above a certain threshold in our favor. The results are as below.
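The three conditions can be sketched as a small function over the subsequent returns of the top matches. The exact definitions here are my assumptions: the Sharpe ratio is taken as mean over standard deviation of the matched outcomes, the skew as the ratio of the favourable to the adverse decile move, and the threshold values `sharpe_min` and `skew_min` are placeholders rather than the ones used in the post.

```python
import statistics


def signal(outcomes, sharpe_min=0.5, skew_min=1.5):
    """Return +1 (buy), -1 (sell) or 0 (no trade) from the matches' next-period returns."""
    if len(outcomes) < 2:
        return 0
    mean = statistics.mean(outcomes)
    stdev = statistics.stdev(outcomes)
    if stdev == 0:
        return 0
    sharpe = mean / stdev
    deciles = statistics.quantiles(outcomes, n=10)
    p10, p90 = deciles[0], deciles[-1]  # 10th and 90th percentile moves

    # Buy: positive mean, strong Sharpe, upside decile dominates downside decile.
    if mean > 0 and sharpe > sharpe_min and p90 > 0:
        if p10 >= 0 or p90 / abs(p10) > skew_min:
            return 1
    # Sell: the mirror image of the buy rule.
    if mean < 0 and -sharpe > sharpe_min and p10 < 0:
        if p90 <= 0 or abs(p10) / p90 > skew_min:
            return -1
    return 0


ups = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.010, 0.011, 0.012]
print(signal(ups), signal([-x for x in ups]))
```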

The graphs show the performance of the strategy during late June and all of July 2015 (23 days in total). The right chart shows performance (profitable move captured) for different n (the number of periods we sample each day before deciding). For example, since NIFTY starts trading at 9:15 AM and we have 1-minute data, n=100 means we wait 100 minutes, i.e. until 10:55 AM, before taking a decision. As can be seen, performance degrades the longer we wait. The left chart shows the profit (the round dots) against the max drawdown and max upside (either end of the sticks) for each trade for n=100.
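The mapping from n to the decision time is simple clock arithmetic. A small sketch, assuming 1-minute bars and the 9:15 AM open mentioned above (the helper name `decision_time` is hypothetical):

```python
from datetime import datetime, timedelta

MARKET_OPEN = "09:15"  # NIFTY opening time, as stated in the text


def decision_time(n, open_time=MARKET_OPEN):
    """Clock time at which n one-minute bars have elapsed since the open."""
    opened = datetime.strptime(open_time, "%H:%M")
    return (opened + timedelta(minutes=n)).strftime("%H:%M")


print(decision_time(100))
```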

For a simple strategy, the performance is impressive. There is further scope for improvement: 1) extend the pattern search to any n-period stretch of the day instead of matching only the relevant time periods; 2) improve the distance function or the smoothing technique, for example Dynamic Time Warping is a candidate here, especially if we extend the scope as in 1; 3) improve the optimization of n, the Sharpe and skew thresholds, etc.; 4) improve execution in terms of take profit/stop loss.
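For reference, the textbook dynamic-programming form of Dynamic Time Warping looks roughly like the sketch below. It is not part of the strategy as tested; the absolute difference as the local cost and the absence of any window constraint are my assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost between the two points
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

Unlike the Euclidean distance, this allows the two samples to have different lengths and tolerates a pattern that unfolds slightly faster or slower, which is why it becomes more attractive once matches are no longer tied to the same time of day.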

[Edit 1: For those interested in the code and implementation, you can find it here. Pick up the two files on the k-nearest neighbor strategy.]

[Edit 2: This extends my previous post on intraday momentum back-testing. Technically they are similar, except that in the first case we take a simple snapshot to decide a signal, whereas here we take a sample of a certain length to generate a signal.]