Personality Insights From Machine Learning


An artist’s visualization of synapses firing in the brain. See Tobias Gremmler’s Neuroscape.

This post might be a bit bizarre, but hey, isn't this the world of the esoteric? Two subjects I've really been enjoying are Machine Learning and Personality Types, so naturally I've been thinking about Jungian cognitive functions through the lens of machine learning algorithms. Some areas of Machine Learning are inspired by biology (e.g. the Perceptron, Neural Networks, and Convolutional Nets), but instead of asking how biology can make better machines, I ask what data science can tell us about people. In particular, I think algorithms and their individual weaknesses highlight some of the cognitive biases to which we may be predisposed. This article gets a bit technical and could be difficult to follow depending on how familiar you are with the Jungian functions themselves.

Introverted Sensing (Si)

In Jungian psychology, Introverted Sensing (Si) is a memory-based learning process that matches current events to history. The adage "history repeats itself" sums it up: if you've seen something happen in the past, Introverted Sensing projects that history into the future. I often see this kind of logic in news headlines, where, e.g., current recessions are compared to past ones. Si doesn't try to understand the underlying mechanisms behind those recessions or stock-price moves; it simply looks at surface-level data. It is very much a "sensory" assessment (in the Jungian sense) rather than an "intuitive" one.

Introverted Sensing (Si):
Recalling, Linking, Comparing and Contrasting
Says: “This reminds me of….”

– Paying attention to similarities and differences
– Becoming aware of differences from what was
– Noticing discrepancies
– Scanning memory for related information
– Focusing on past successes and failures
– Re-living a past experience.

Nearest Neighbor

x1 is some recorded event. The axes represent explanatory variables (e.g. inflation and unemployment). x represents the current (never-before-seen) environment along these variables. The goal is to figure out whether the economy will grow (blue dots) or contract (red dots).

In machine learning, there is a parallel to the Si cognitive function in a very popular algorithm called Nearest Neighbor (NN). One of its advantages is that it is very fast to learn and to implement: you simply record all the data points you've seen that might explain something (e.g. employment, oil prices, etc.) and then match the current environment to similar values of those variables. The thing in the past that most resembles the current situation is used to make the prediction. The drawback, from a computational perspective, is that matching current data to past data is slow, because you have to sift through very large histories. In other words, memorization is fast, but retrieval and event-matching are relatively slow. So an ISTJ who is Si-dominant and essentially sees the world through the lens of the Nearest Neighbor algorithm would be quite averse to new and evolving contexts, because matching incoming data to past experiences is uncomfortable and computationally expensive.
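To make that concrete, here is a minimal sketch of the 1-NN idea in Python (the variables and numbers are made up for illustration, not real data): memorize every past event, then answer a new question by recalling the single most similar one.

```python
import numpy as np

# Hypothetical "memory": past environments described by two explanatory
# variables (say, inflation % and unemployment %) and whether the economy grew.
history_X = np.array([[2.1, 4.5],
                      [9.8, 7.2],
                      [3.0, 5.1],
                      [12.4, 8.9]])
history_y = np.array(["grow", "contract", "grow", "contract"])

def nearest_neighbor_predict(x_new):
    """Predict by recalling the single most similar past event (1-NN)."""
    distances = np.linalg.norm(history_X - x_new, axis=1)
    return history_y[np.argmin(distances)]

# "This reminds me of..." -- match the current environment to memory.
print(nearest_neighbor_predict(np.array([3.4, 5.0])))   # -> 'grow'
```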

People tend towards NN-style logic/Introverted Sensing especially in contexts like economics, politics, or the social sciences, where the underlying mechanisms can be overwhelmingly complex, with circular feedback loops and (money) multipliers, because it is tempting to approach the problem from a theory-less perspective. However, data science teaches us that memory-based cognition in this setting is plagued by the curse of dimensionality. As the number of explanatory variables grows and things become more complex, the examples become more and more spread out in the explanatory hypercube. You need a very large volume of that hyperspace to fish out something that looks like what you've seen in the past, and chances are it won't be very similar at all. When one of the dimensions is off, the error gets multiplied through all the other measures.

The curse of dimensionality, illustrated with data in a unit cube. The figure on the right shows the side-length of the sub-cube needed to capture a fraction r of the volume of the data, for different numbers of explanatory variables p. With ten variables we need to cover 80% of the range of each coordinate to capture just 10% of the data.
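The caption's numbers follow from a one-line relationship: if data are spread uniformly in a p-dimensional unit cube, the sub-cube that captures a fraction r of them has edge length r^(1/p). A quick sanity check of that formula:

```python
def edge_length(r, p):
    """Side of the sub-cube needed to capture a fraction r of a p-dimensional unit cube."""
    return r ** (1.0 / p)

for p in (1, 2, 3, 10):
    print(f"p={p:2d}: capturing 10% of the data needs edge length {edge_length(0.1, p):.2f}")
# p=10 gives ~0.79, i.e. you must span roughly 80% of each coordinate
# just to reach 10% of the data.
```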

Introverted Intuition (Ni)

Introverted Intuition
Foreseeing, Conceptualizing, Understanding Patterns, Synthesizing and Symbolizing
Says: “Aha, that’s it!” or “This is how it will be.”

– Paying attention to future implications
– Becoming aware of universal meanings and symbols
– Noticing whole patterns or systems
– Scanning internal images for insights
– Focusing on depth and understanding
– Asking “What is the goal?”
– Seeking unconventional, innovative ideas
– Imagining and anticipating an experience

Relying merely on outcomes, as Si does, can lead to erroneous conclusions. For example, looking at the high inflation of the 1970s as a guide for the future, without understanding the nature of that inflation (i.e. stagflation), might be one of the reasons that otherwise quite astute investors like Cliff Asness (founder of the hedge fund AQR) wrongly sounded the inflation alarm during the recent financial crisis. Instead of relying on outcomes this way, we might prefer to model our environment and come up with hypotheses and theories. During the financial crisis of 2008, for example, Paul Krugman simply drew on Keynesian theory and the liquidity trap to explain why inflation would be impossible. The insight this brings is that not all seemingly similar environments produce the same outcome; if one dimension of similarity is off, the outcomes can be completely different.

Ni replaces the memory-based "non-parametric" approach with a "parametric" one. An example of something parametric is the normal distribution (remember the bell curve?), which has a mean and a standard deviation, and both of those are parameters. With a parametric approach the curse of dimensionality is avoided, but it is replaced with an inductive bias: the error that results from making assumptions about how the world works. The Introverted Sensing approach doesn't filter out any data and doesn't make any assumptions about how the world works, and is thus free from inductive bias. Introverted Intuition is similar to its sensing counterpart in that it also recalls something from memory, but instead of recalling data, it recalls maps and models, possibly from very different disciplines, that it has seen in the past. The issue then becomes one of model selection: finding the model that will most accurately predict the future.
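As a toy illustration of the trade-off (synthetic data, with the normality assumption baked in): the memory-based view keeps every observation, while the parametric view compresses them into just two numbers, at the price of assuming the bell curve really is the right model.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)   # synthetic observations

# Si-like / non-parametric: keep everything, answer questions by lookup.
memory = data.copy()                  # 10,000 stored values

# Ni-like / parametric: assume a normal model, keep only its parameters.
mu, sigma = data.mean(), data.std()   # 2 stored values

print(len(memory), "memorized values vs. two parameters:", round(mu, 2), round(sigma, 2))
```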

By focusing on prediction, model selection is in some sense removed from ultimate truth-seeking. It accepts that it can't create a function that corresponds to the underlying ground truth, and thus may be removed from how the real world actually works, but it is satisfied as long as the possibly fictional model is good enough. Moreover, this removal from nature is what makes model selection introverted and subjective. Classical statistics offers many approaches to model selection, including Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC); more modern techniques use cross-validation, which reserves a portion of the data for out-of-sample validation, with the goal of ensuring that the researcher didn't overfit the model.
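Here is a rough sketch of cross-validation as model selection, using scikit-learn on synthetic data (the candidate models, polynomials of various degrees, are just stand-ins for competing theories of the world):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=80)   # noisy "ground truth"

# Candidate models of increasing complexity.
for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: held-out MSE {-score:.3f}")
# Too simple underfits, too flexible overfits; cross-validation picks the model
# that predicts data it has never seen, not the one that fits the sample best.
```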

In other words, once an Ni-dominant person has formed a hypothesis or opinion, it is time to test that hypothesis in the real world: bringing a fresh product to market, or maybe just trolling the idea on the internet to see how well it can be argued. All of that is cross-validation for an INTJ who leads with Introverted Intuition. This cross-validation, however, happens in the outer world, and because it evaluates the model to see whether "it works", the process is really checking in with the partner function (Extraverted Thinking), which underlines the importance of that activity. If an Ni-dominant person forgets to cross-validate, the result is a subjective inference that bears little resemblance to the real world: no matter how ingenious the idea might be, it may be seriously flawed in its assumptions, causing the person to overfit the problem/data at hand, which of course leads to poor generalizability.

Extraverted Perceiving Functions

An attempt to classify the same data points via the AdaBoost algorithm. The fourth step takes a weighted average of the previous three steps, which occur in sequence in a way that focuses on errors. Each of the three steps is dumb on its own, almost playfully partitioning the space into rectangular regions that misclassify many of the points, but the final average does a really good job and can achieve highly non-linear boundaries, which works well with complicated data sets. This playful handling of the data could make it a candidate for Extraverted Intuition.

I speculate that Se can be likened to a Kalman filter, which makes use of well-developed priors/biases that help it quickly absorb and interpret incoming information in real time.
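Taking the analogy half-seriously, a toy one-dimensional Kalman filter might look like the sketch below (the random-walk state and the noise levels are assumed purely for illustration, not a claim about how Se actually works): a prior belief is blended with each noisy observation as it arrives.

```python
import numpy as np

def kalman_1d(observations, process_var=1e-3, obs_var=0.5):
    """Toy scalar Kalman filter: a prior belief updated by each noisy observation."""
    estimate, uncertainty = 0.0, 1.0              # assumed initial prior
    estimates = []
    for z in observations:
        uncertainty += process_var                # predict: the world may have drifted
        gain = uncertainty / (uncertainty + obs_var)  # how much to trust the new datum
        estimate += gain * (z - estimate)         # nudge belief toward the observation
        uncertainty *= (1 - gain)
        estimates.append(estimate)
    return estimates

rng = np.random.default_rng(2)
noisy = 1.0 + rng.normal(scale=0.7, size=50)      # noisy readings of a true value of 1.0
print(round(kalman_1d(noisy)[-1], 2))             # settles near 1.0
```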

Extraverted Intuition, which scans a wide array of possibilities without committing to any one of them, reminds me of ensemble methods, which can combine thousands of models, each pretty dumb in its own right, to produce amazingly accurate forecasts.

One such method is the popular AdaBoost algorithm shown in the diagram. The blue pluses correspond to one outcome (e.g. no cancer diagnosis) and the red minuses to the opposite (e.g. a cancer diagnosis). The objective is to take never-before-seen inputs, given as values along the two coordinates, and make new diagnoses from them.
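A sketch of the same recipe using scikit-learn's AdaBoostClassifier on a synthetic two-class data set (standing in for the pluses and minuses in the diagram; nothing here is real diagnostic data): many depth-one decision stumps, each weak on its own, combine into a strong, highly non-linear classifier.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Two interleaving classes, a stand-in for the pluses and minuses in the diagram.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The default base learner is a depth-1 decision tree ("stump"); boosting fits
# many of them in sequence, re-weighting the points each stump got wrong.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
```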

Truth-Seeking Behavior

The attitudes of our perceiving functions, i.e. whether they are introverted or extraverted, have ramifications for how we approximate truth. Consider that a machine learning algorithm seeks to learn how the world works by creating a function that gets as close to the truth as possible; it does this by minimizing error or, equivalently, by maximizing the probability of telling the truth, as shown below. Given this figure, our goal is to learn at what value of x the maximum is achieved. If we scan the curve very broadly, we are more likely to reach the global maximum, which is where the ultimate truth lies.

A multi-modal curve of the probability of being right as a function of x, with several local maxima and a single global maximum.

I would like to argue that Ne/Se users (e.g. ENTP, INTP, ESTP) are purer truth-seekers because they go after the global maximum. Ni/Si users develop theories or use memories, which can leave them at local maxima, and they are OK with this because they believe that new memories or theories will eventually let them jump to a different part of the hill. Ne and Se are much more exploratory of the whole space and slower to commit to any particular 'answer', but once they feel they have 'converged' to the truth, they are more likely to be resistant to new information or input.
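A toy way to see the difference, with an entirely made-up "probability of being right" curve: a greedy hill-climber that commits to a starting guess ends up on a local bump, while a broad scan of the whole range finds the global maximum.

```python
import numpy as np

def truthiness(x):
    """Made-up multi-modal curve: probability of being right as a function of x."""
    return np.exp(-(x - 4) ** 2) + 0.6 * np.exp(-(x + 2) ** 2 / 0.5)

def hill_climb(x, step=0.01):
    """Ni/Si-style search: commit to a starting theory and refine it locally."""
    while truthiness(x + step) > truthiness(x):
        x += step
    return x

xs = np.linspace(-6, 8, 10_000)
print("broad scan (Ne/Se):", round(xs[np.argmax(truthiness(xs))], 2))   # ~4.0, the global maximum
print("hill climb (Ni/Si):", round(hill_climb(-3.0), 2))                # ~-2.0, a local maximum
```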

Which Machine Learning Method is Best?

We all like to think that we are the best. However, in statistical learning there is no single best model that works optimally for all kinds of problems; this is sometimes called the no-free-lunch theorem. The reason is that a set of assumptions that works well in one domain may work poorly in another. In other words, people are endowed with what Howard Gardner called Multiple Intelligences, which differentiates people's intellectual superpowers into specific domains rather than seeing intelligence as a single general ability. Consequently, we need to develop many different types of models to cover the wide variety of data that occurs in the real world, and similarly we need many different types of people to solve various problems. For each model there can be many different algorithms we can use to train it, each making different speed-accuracy-complexity trade-offs. One could also imagine very complex systems requiring people to organize themselves into social ecosystems, whereby multiple intelligences interact synergistically in order to handle dynamically changing circumstances and challenges.
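A small illustration of the no-free-lunch point, using scikit-learn toy data sets (the specific models and data here are my own stand-ins, not a proof): a linear model and a tree-based model are compared on data with a roughly straight boundary and on data with a curved one, and neither tends to dominate both.

```python
from sklearn.datasets import make_classification, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

datasets = {
    "roughly linear boundary": make_classification(n_samples=400, n_features=2,
                                                    n_informative=2, n_redundant=0,
                                                    random_state=0),
    "curved boundary": make_moons(n_samples=400, noise=0.25, random_state=0),
}
models = {"logistic regression": LogisticRegression(),
          "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0)}

for data_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{data_name:24s} | {model_name:19s} | accuracy {acc:.2f}")
# Which model looks "best" depends on the data set it is judged on.
```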

 

References

Kadane, J. B., and Lazar, N. A., Methods and Criteria for Model Selection, Journal of the American Statistical Association, 2004

Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2008

Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012

