Daily Archives: April 27, 2013

Investing Computer

One of the most interesting applications of computer technology is in the field of investing. It is striking that, with all the sophisticated systems and all the monetary rewards possible, there has not been a successful program that can guide a broker to make foolproof investment predictions... until now. It is a fact that, out of all the investors and resources on Wall Street, none of them does much better than slightly above random selection in picking the optimum investment portfolio. Numerous studies on this subject show that the very best investment advisors have perhaps a 10% or 15% improvement over random selection and that even the best analysts cannot sustain their success for very long.

There are lots of people who are able to see very near-term trends (on the order of a few days or a week or two, at most) and invest accordingly, but no one has figured out how to consistently predict stock rises and falls over the long term (more than 3 or 4 weeks out). That was the problem I attempted to solve, not because I want to be rich but because it seemed like an interesting challenge. It combines the math of finance and elements of psychology and sociology with computer logic.

I did a lot of research and determined that there is, in fact, no one who knows how to do it, but there is a lot of math research that says stock movements should be predictable using complex math, such as chaos theory. That means that I would have to create the math, and I am not that good at math. However, I do know how to design analytical software, so I decided to take a different approach and create a tool that would create the math for me. That I could do.

Let me explain the difference. In college, I took programming, and one assignment was to write a program that would solve a six-by-six numeric matrix multiplication problem, but we had to do it in 2,000 bytes of computer core memory. This required machine code and teaches optimum and efficient coding. It is actually very difficult to fit all the operations needed into just 2k of memory, and most of my classmates either did not complete the assignment or worked hundreds of hours on it. I took a different approach. I determined that the answer was going to be whole positive numbers, so I wrote a program that asked if “1” was the answer and checked to see if that solved the problem. When it didn’t, I added “1” to the answer and checked again. I repeated this until I got to the answer. My code was the most accurate and by far the fastest that the instructor had ever seen.
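The guess-and-check idea from that assignment can be sketched in a few lines of Python. This is only an illustration: the original check is not described here, so the function name `guess_and_check`, its `limit` parameter and the stand-in equation are all assumptions.

```python
def guess_and_check(is_solution, start=1, limit=1_000_000):
    """Try start, start + 1, ... until is_solution(candidate) is true."""
    candidate = start
    while candidate <= limit:
        if is_solution(candidate):
            return candidate
        candidate += 1  # wrong guess: add 1 and check again
    raise ValueError("no solution found within limit")


# Hypothetical stand-in problem (not the original assignment):
# which positive whole number x satisfies 7 * x == 301?
answer = guess_and_check(lambda x: 7 * x == 301)
print(answer)
```

The point of the design is that checking a guess can be far simpler than computing the answer directly, as long as the answer is known to lie in a small, countable range.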

I got the answer correct and fast but I didn’t really “solve” the problem.  That is how I decided to approach this investment problem.  I created a program that would take an educated guess at an algorithm that would predict future stock values.  If it was wrong, then I altered the algorithm slightly and tried again.  The initial guessed algorithm needed to be workable and the method of making the incremental changes had to be well thought out.

The answer is to use something called forward chaining neural nets with an internal learning or evolving capability. I could get really technical, but the gist of it is this: I first created a placeholder program (No. 1) that allows for hundreds of possible variables but has many of them set to 1 or zero. It then selects inputs from available data and assigns that data to the variable placeholders. It then defines a possible formula that might predict the movements of the stock market. This program has the option to add additional input parameters, constants, variables, input data and computations to the placeholder formula. It seeks out data to insert into the formula. In a sense, it allows the formula to evolve into totally new algorithms that might include content that has never been considered before.

Then I created a program (No. 2) that executes the formula created by program No. 1, using all the available input data and the selected parameters or constants, and generates specific stock predictions. This program uses a Monte Carlo kind of iteration in which all the parameters are varied over a range in various combinations and the calculations are then repeated. It can also place any given set of available data into various or multiple positions in the formula. It can take hundreds of thousands (up to millions) of executions of the formula to examine all the possible combinations of all of the possible variations of all the possible variables in all the possible locations in the formula.

Then I created a program (No. 3) that evaluates the results against known historical data. If the calculations of program No. 2 are not accurate, then this third program notifies the first program, which changes its inputs and/or its formula, and then the process repeats. This third program can keep track of trends that might indicate that the calculations are getting more accurate and makes appropriate edits in the previous programs. This allows the process to begin to focus on any algorithm that shows promise of leading to an accurate prediction capability.
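The three-program loop can be illustrated with a toy sketch in Python. This is not the actual system and not a neural net: it swaps in a simple random-search hill climb over a linear formula, and every name here (`propose_formula`, `vary`, `score`, `evolve`) and all of the data are made up for illustration.

```python
import random


def propose_formula(n_inputs, rng):
    """Stand-in for program No. 1: a placeholder formula, most weights zero."""
    weights = [0.0] * n_inputs
    weights[rng.randrange(n_inputs)] = 1.0
    return weights


def predict(weights, row):
    """Execute the formula on one row of input data."""
    return sum(w * x for w, x in zip(weights, row))


def vary(weights, rng, scale=0.1):
    """Stand-in for program No. 2: randomly perturb every parameter."""
    return [w + rng.gauss(0.0, scale) for w in weights]


def score(weights, rows, targets):
    """Stand-in for program No. 3: mean squared error against known history."""
    errors = [(predict(weights, r) - t) ** 2 for r, t in zip(rows, targets)]
    return sum(errors) / len(errors)


def evolve(rows, targets, rounds=2000, seed=0):
    """Repeat propose, vary and score, keeping only changes that help."""
    rng = random.Random(seed)
    best = propose_formula(len(rows[0]), rng)
    best_err = score(best, rows, targets)
    for _ in range(rounds):
        candidate = vary(best, rng)
        err = score(candidate, rows, targets)
        if err < best_err:  # feedback step: keep the more accurate formula
            best, best_err = candidate, err
    return best, best_err


# Synthetic "history": the target is a fixed mix of three made-up inputs.
data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
targets = [0.5 * a - 0.2 * b + 0.1 * c for a, b, c in rows]

initial_err = score(propose_formula(3, random.Random(0)), rows, targets)
weights, final_err = evolve(rows, targets)
print(initial_err, final_err)
```

In this toy version the formula can only change its weights; the system described above also changes which inputs and computations appear in the formula, which this sketch does not attempt.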

I then created a sort of super command override program that first replicates this entire three-program process and then manages the outputs of dozens of copies of programs No. 2 and No. 3, treating them as if they were one big processor. This master executive program can override the other three by injecting changes that have been learned in other sets of the three programs. This allowed me to set up multiple parallel versions of the three-program analysis and speed up the overall analysis many times over.
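A rough sketch of how such a master program might share results between copies is shown below. It is an assumption-laden toy: the copies run one after another rather than in parallel, the `error` function and its target values are invented, and the names `local_search` and `master` are hypothetical.

```python
import random


def error(params):
    """Hypothetical accuracy measure: squared distance from a made-up ideal."""
    target = [0.5, -0.2, 0.1]
    return sum((p - t) ** 2 for p, t in zip(params, target))


def local_search(params, rng, steps=50, scale=0.1):
    """One copy of the search loop: keep random tweaks that reduce the error."""
    best, best_err = params, error(params)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, scale) for p in best]
        err = error(candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best


def master(n_copies=8, generations=10, seed=0):
    """Run several copies, then inject the best result into every copy."""
    rngs = [random.Random(seed + i) for i in range(n_copies)]
    copies = [[rng.uniform(-1, 1) for _ in range(3)] for rng in rngs]
    leader = copies[0]
    for _ in range(generations):
        copies = [local_search(c, rng) for c, rng in zip(copies, rngs)]
        leader = min(copies, key=error)
        # Override step: every copy restarts from what the best copy learned.
        copies = [list(leader) for _ in copies]
    return leader


best = master()
print(error(best))
```

Because each copy keeps its own random number generator, the copies explore different tweaks in the next generation even though they all restart from the same leader.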

As you might imagine, this is a very computer-intensive program. The initial three programs were relatively small, but as the system developed, they expanded into first hundreds and then thousands of parallel copies. All of these copies read from data sets placed in a bank of DBMSs that represented hundreds of gigabytes of historical data. As the size of the calculations and data grew, I began to divide the data and processing among multiple computers.

I began with input financial performance data that was known during the period from 1980 through 2010. These 30 years of data include the full details of millions of data points about tens of thousands of stocks, as well as huge databases of socioeconomic data about the general economy, politics, international news, and research papers and surveys of the psychology of consumers, the general population and world leaders. I was surprised to find that a lot of this data had been accumulated for use in dozens of previous studies. In fact, most of the input data I used was from previous research studies and I was able to use it in its original form.

Program No. 1 used data that was readily available from these historical research records. Program No. 3 uses slightly more recent historical stock performance data. In this way, I can look at possible predictive calculations and then check them against real-world performance. For instance, I input historical 1980 data and see if it predicts what actually happened in 1981. Since I have all this data, I can see whether the 1980-based calculations accurately predict what happened in 1981. Then I advance the input and the predictions by a year. By repeating this for the entire 30 years of available data, I can try out millions of variations of the analysis algorithms. Once I find something that works on this historical data, I can advance it forward to input current data to predict future stock performance. If that works, then I can try using it to guide actual investments.
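The year-by-year checking described here is sometimes called walk-forward testing. A minimal sketch in Python follows; the function name `walk_forward`, the `fit` and `predict` callables, and the index levels are all made up for illustration and are not real market data.

```python
def walk_forward(series_by_year, fit, predict):
    """For each year N, fit on year N and score the prediction for year N + 1."""
    years = sorted(series_by_year)
    results = []
    for this_year, next_year in zip(years, years[1:]):
        model = fit(series_by_year[this_year])
        guess = predict(model)
        actual = series_by_year[next_year]
        results.append((next_year, guess, actual, abs(guess - actual)))
    return results


# Made-up index levels, for illustration only.
levels = {1980: 100.0, 1981: 104.0, 1982: 110.0, 1983: 108.0}

# Trivial hypothetical model: predict that next year equals this year.
results = walk_forward(levels, fit=lambda value: value, predict=lambda model: model)
for year, guess, actual, err in results:
    print(year, guess, actual, err)
```

Each candidate formula is only ever scored on a year it has not seen, which is what makes the historical check meaningful.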

This has actually been done before. Back in 1991, a professor of logic and math at MIT created a neural net to do just what I have described above. It was partially successful, but the software, the input data and the computer hardware back then were far less capable than what I used. In fact, I found that even my very powerful home computer systems were much too slow to process the massive volumes of data needed. To get past this problem, I created a distributed-processing version of my programs that allowed me to split up the calculations among a large number of computers. I then wrote a sort of computer virus that installed these various computational fragments on dozens of college and university computers around the country. Such programs are not uncommon on campus computers, and I was only using 2 or 3% of the total system assets, but collectively it was like using 500 high-end PCs or about three quarters of one supercomputer.

Even with all that processing power, it was more than 18 months and more than 9,700 hours of processing time on 67 different computers before I began to see a steady improvement in the predictive powers of the programs that were evolving.  By then, the formula and data inputs had evolved into a very complex algorithm that I would never have imagined but it was closing in on a more and more accurate version.  By early 2011, I was getting up to 85% accurate predictions of both short term and long term fluctuations in the S&P and Fortune 500 index as well as several other mutual fund indexes.

Short-term predictions were upwards of 95% accurate, but that was out only 24 to 96 hours. The long-term accuracy dropped off from 91% for 1 week out to just under 60% for 1 year out... but it was slowly getting better and better.

By June of last year, I decided to put some money into the plan. I invested $5,000 in a day-trader account and then allowed my software to instruct my trades. I limited the trades to one every 72 hours, and the commissions ate up a lot of the profits from such a small investment, but over a period of 6 months I had pushed that $5,000 to just over $29,000. This partially validated the predictive quality of the formulas, but it is just 2.5% of what it should be if my formulas were exactly accurate. I have since done mock investments of much higher sums and a longer investment interval and had some very good success. I have to be careful because if I show too much profit, I’ll attract a lot of attention and get investigated or hounded by news people, neither of which I want.

The entire system was steadily improving in its accuracy, but more and more of my distributed programs on the college systems were being caught and erased. These were simply duplicate parallel systems, but losing them began to slow the overall advance of the processing. I was at a point where I was making relatively minor refinements to a formula that had evolved from all of this analysis. Actually, it was not a single formula. To my surprise, what evolved was a sort of process of sequential interactive formulas that used a feedback loop of calculated data that was then used to analyze the next step in the process.

I tried once to reverse-engineer the whole algorithm, but it got very complex and there were steps that were totally baffling. I was able to figure out that it looked at the stock’s fundamentals, then it looked at the state of the economy, which was applied to the stock performance. All that seems quite logical, but then it processed dozens of “if-then” statements that related to micro, macro and global economics in a sort of logical scoring process that was then used to modify parameters of the stock performance. This looping and scoring repeated several times and seemed to be the area that was being refined in the final stages of my analysis.

By June of 2012, I was satisfied that I had accomplished my goal. I had a processing capability that was proving to be accurate in the 89 to 95% range for predictions out two to six weeks, but it was still learning and evolving when I took it offline. I had used the system enough to cover all the costs of the hardware and software I invested in this project, plus a little extra for a much-needed vacation. I never did do this for the money, but it is nice to know that it works and that if I ever need a source of funding for a project, I can get it.