Machine Learning Increases The Demand For Formal Theoretical Models of Entire Spaces
As the internet has developed into a huge source of valuable information for firms, information that must be harvested, sifted, and aggregated before it has value for managerial decision making, ‘big data’ and ‘machine learning’ have become buzzwords within business communities.
Harvesting, sifting, and aggregating data into valuable information has required the development of new algorithms, platforms, and software, because data harvested from the internet comes in a variety of forms. Comments by customers, which are non-numeric in character, and information on purchases, which by definition is numeric, are a case in point.
In order for comments penned by customers to be aggregable with data on purchases, the comments have to be harvested. Clearly, algorithms that do a good job of harvesting comments must differ from those that do a good job of harvesting numbers, yet in the end they must be able to convert ‘comments data’ into numeric data that can be aggregated with purchase data already in numeric form. Given that there typically will not exist any causal relation between, say, a customer’s comments and purchases, big data is not expected to generate anything more than correlations (patterns, trends, etc.) that enable more precise targeting of firms’ products at customers. The voluminous nature of data harvested from the internet, and the absence of any regular frequency for such data (differences in arrival rates or, equivalently, differences in velocity), create demand for new approaches to the mining of information, and hence demand for machine learning algorithms.
For firms, applying machine learning algorithms to big data facilitates what they have always done, which is to analyze data for insights into how better to maximize firm value.
If firms infer that an ethical action will improve firm value, there does not exist any demand for any other rationalization for their actions. The profit motive is a sufficient rationale.
Consider, however, applications of machine learning algorithms to big data in the context of scientific studies. Because machine learning algorithms inherently lack the capacity to ask ‘why’ questions, and are only able to generate patterns, trends, or correlations, their output does not arrive with any manual that facilitates rationalization of those patterns, trends, or correlations. Given this caveat, robust interpretation of such output in scientific studies requires the development of manuals, that is, the development of formal theoretical (mathematical) models.
The demand highlighted here for formal theoretical models is not new. Approached rightly, output from regression-based analyses has always been interpreted in the context of some formal mathematical rationalization for the parameters generated by regression models. Where formal theoretical models had yet to exist, demand for such models was generated. In the meantime, empirical results were endowed with some preliminary plausible interpretation.
In this respect, consider that with y as the outcome variable, x as a potentially causal variable, a as the value of y when x=0, e as what is not explained by x, and m as the effect of x on y, there exist contexts within which formal theoretical models have established that a regression model specified as:
y = mx + e
is more robust than the alternate conventional specification, which is:
y = a + mx + e
The possibility that ‘y=mx+e’ is more robust than ‘y=a+mx+e’ could only have been generated in the context of an independently constructed formal theoretical model; it is nigh impossible to arrive at merely via implementation of a regression model.
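As a minimal sketch of the two specifications, the Python below fits both ‘y=mx+e’ and ‘y=a+mx+e’ by ordinary least squares to the same data. The data are simulated and entirely hypothetical (a true slope of 2.0 and no intercept are assumptions made for illustration), and the function names are illustrative, not taken from any library. Both fits run without complaint, which is the point: the regression itself gives no instruction as to which specification to prefer.

```python
import random


def fit_through_origin(xs, ys):
    """Least-squares slope m for the specification y = m*x + e."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Least-squares (a, m) for the specification y = a + m*x + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    a = y_bar - m * x_bar
    return a, m


# Hypothetical data: assume theory says y is proportional to x (slope 2.0).
rng = random.Random(0)
xs = [float(i) for i in range(1, 51)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

m_origin = fit_through_origin(xs, ys)
a_free, m_free = fit_with_intercept(xs, ys)

print(f"y = m*x + e     : m = {m_origin:.3f}")
print(f"y = a + m*x + e : a = {a_free:.3f}, m = {m_free:.3f}")
```

In this simulated setting the intercept estimate is small but not exactly zero, so the data alone do not rule out either form; the choice between them has to come from outside the regression.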
As the world gets excited about big data and machine learning, it will do well to remember that if output from applications of machine learning algorithms to ‘big data’ in science and technology is not subjected to independently constructed formal theoretical (mathematical) models, we might just experience a return to the days when ‘changes in prices of cigarettes’ were deemed to be explained by ‘changes in teachers’ salaries’.
Given that prices of cigarettes are non-sticky (they change frequently across the months of a year), whereas teachers’ salaries are sticky (they change at most once a year), it was well recognized that any relation between prices of cigarettes and teachers’ salaries perhaps represented no more than the effects of inflation; it was no more than a correlation.
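The cigarettes and salaries example can be sketched with simulated data. In the Python below, every number is a hypothetical assumption (0.3% monthly inflation, a base price of 5.0, a base salary of 40,000, and the noise levels); nothing here is real data. Cigarette prices are re-priced monthly, salaries are reset once a year, and the only link between the two series is the common price index.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
months = range(120)  # ten hypothetical years of monthly observations
price_index = [1.003 ** t for t in months]  # assumed 0.3% monthly inflation

# Non-sticky series: cigarette prices move every month.
cigarettes = [5.0 * price_index[t] * (1 + rng.gauss(0, 0.02)) for t in months]

# Sticky series: teachers' salaries are reset only once a year.
salaries = []
current = 0.0
for t in months:
    if t % 12 == 0:
        current = 40_000.0 * price_index[t] * (1 + rng.gauss(0, 0.01))
    salaries.append(current)

# Correlation in nominal terms, then after removing the common inflation driver.
corr_nominal = pearson(cigarettes, salaries)
real_cigarettes = [c / p for c, p in zip(cigarettes, price_index)]
real_salaries = [s / p for s, p in zip(salaries, price_index)]
corr_real = pearson(real_cigarettes, real_salaries)

print(f"nominal correlation: {corr_nominal:.2f}")
print(f"real correlation:    {corr_real:.2f}")
```

Under these assumptions the nominal series move together closely, while the deflated series do not. Knowing to deflate by the price index is itself a piece of theory that the correlation does not supply.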
If we are to avoid spurious interpretations of output from applications of machine learning algorithms to analyses of big data, there must be strict demand for development of formal theoretical (mathematical) models that serve as benchmarks for interpretation of such output.
Machine learning algorithms are non-analytical algorithms, that is, they are asymptotically convergent series representations of desirable, but empirically non-feasible, analytical solutions. If the formal theoretical models designed to rationalize output from machine learning algorithms are themselves lacking in analytical interior solutions, such models lack any real independence from the output of machine learning algorithms.
Establishing the veracity of an asymptotic solution (from machine learning algorithms) by means of yet another asymptotic solution (from formal mathematical models that lack analytical interior solutions, and as such are themselves exercises in arriving at asymptotic solutions) cannot be deemed reliable or robust.
The demand, then, is not for proofs that the convergent algorithms that undergird machine learning work as expected; rather, it is for ‘analytical interior solutions’ that serve as litmus tests of the extent to which output from machine learning algorithms does not contradict the steady state ‘equilibrium’ properties of the space (ecosystem) under study.
It is impossible for steady states that subsist within a space (ecosystem) to be described by models that are lacking in analytical interior solutions.
By definition, an asymptotically convergent formal theoretical (mathematical) solution is one of many feasible solutions, and as such is inapplicable to the ascertainment of steady states (unique solutions or equilibriums) that subsist within a space (ecosystem).
In this respect, consider that while you can travel from Atlanta, Georgia to Boston, Massachusetts through the states of South Carolina, North Carolina, Virginia, Maryland, New York, and Connecticut (‘Route 1’), you can also arrive at Boston by travelling through Tennessee, Alabama, Ohio, New Jersey, New York, and Connecticut (‘Route 2’).
Routes 1 and 2 represent alternate convergent algorithms that, starting out from Atlanta, Georgia, enable arrival at Boston, Massachusetts. Without the analytical interior solution that precisely (exactly) identifies the geographical locations of Atlanta, Boston, and the United States of America, however, the assumption that the two convergent algorithms, Route 1 and Route 2, are equivalent would not have any basis in reality.
If outputs from applications of machine learning algorithms are not shown to have similar effects on analytically derived, unique, stable state interior solutions, the robustness of alternate algorithms is not discernible, and the outputs are susceptible to the subjective interpretations of researchers, and as such are manipulable.
In the absence of formal theoretical (mathematical) parameterizations of spaces (ecosystems) that generate unique analytical interior solutions, neither the equivalence of different convergent algorithms nor the robustness of interpretation of output from machine learning algorithms is discernible.
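One way to read the two-routes analogy is sketched below, under assumptions that are mine and not part of the argument above: a toy linear system x_next = a*x + b with hypothetical values a = 0.5 and b = 3.0, whose steady state has the closed form x* = b/(1 - a). Two different convergent procedures, fixed-point iteration and bisection, play the roles of Route 1 and Route 2, and the closed-form steady state plays the role of the independently known destination against which both are checked.

```python
def fixed_point_iteration(a, b, x0, tol=1e-12, max_iter=10_000):
    """'Route 1': iterate x -> a*x + b until successive values agree."""
    x = x0
    for _ in range(max_iter):
        x_next = a * x + b
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge (requires |a| < 1)")


def bisection(a, b, lo, hi, tol=1e-12):
    """'Route 2': bisect on g(x) = a*x + b - x, which is zero at the steady state."""
    def g(x):
        return a * x + b - x

    if g(lo) * g(hi) > 0:
        raise ValueError("steady state is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hypothetical parameters for the toy system.
a, b = 0.5, 3.0

# Analytical steady state, derived independently of either algorithm.
steady_state = b / (1 - a)

route_1 = fixed_point_iteration(a, b, x0=0.0)
route_2 = bisection(a, b, lo=-100.0, hi=100.0)

print(f"analytical steady state: {steady_state}")
print(f"route 1 error: {abs(route_1 - steady_state):.2e}")
print(f"route 2 error: {abs(route_2 - steady_state):.2e}")
```

Each procedure can only report that it has stopped moving. It is the closed-form value that allows the two results to be judged equivalent and correct.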
The art and science of arriving at formal theoretical (mathematical) models that have unique analytical interior solutions is a dying form. We will all be the worse for it if we assume that the arrival of machine learning algorithms obviates demand for formal theoretical models that have analytical interior solutions.
In this respect, it is important to note that models constructed using tools of differentiation (ODEs, PDEs, etc.) and integration (e.g., Fourier transforms) are only able to generate asymptotically convergent algorithms; they intrinsically lack the capacity to generate formal theoretical models that have interior analytic solutions.
It is time to reinvigorate attention to models that facilitate parameterization of entire spaces, that is, models that enable formal theoretical parameterization of stable states, and of the stable state interactions that subsist between parameters of a space (ecosystem). In Finance, an entire market, such as the Stock Market, is a space. In Astronomy, the Milky Way Galaxy, within which all entities are connected by space and time, and our Solar System are spaces. In technology, given that all of a car’s components are linked, directly or indirectly, to the car’s engine, a car qualifies as a space.
If ever there was a need to reinvigorate demand for the formal tools of abstract (pure) mathematics, tools that enable modeling of entire spaces, so that there exist tests of the extent to which output from machine learning algorithms does not contradict independently and analytically derived stable state equilibriums that have uniqueness properties, that time is now.