A Defense of MBTI (Part 1/2)

MBTI, a popular personality typing test, gets a lot of negative publicity, with headlines like “The Fad that won’t die” and YouTube videos that try to spell it out for us: MBTI is NOT scientific. So why are people so obsessed with it? If you want to defend MBTI, what can be said, or is it truly just for entertainment? Why should we use it even if psychologists don’t, and furthermore, is there a good defense for it?

Here is a summary of some of the important criticisms of MBTI:

  1. Reliability – A study has shown that only 50% of people taking the test get the same result twice, even over short periods of time. This statistic is known as Test-Retest Reliability.
  2. Validity – The issue of validity can be illustrated with IQ tests: are people who get high IQ scores actually smarter? Does the test measure what it purports to measure? Even if “Occupations attract people of different types”, MBTI isn’t useful or valid because it doesn’t correlate with satisfaction at work or job performance.
  3. Oversimplification – MBTI may be missing important variables, like neuroticism, which describes how emotionally stable a person is.
  4. False dichotomy – “People aren’t that simple” and there’s a continuous range of personalities and traits. MBTI, however, is binary and artificially forces you into one of the 16 types and says that you are either an Extravert or an Introvert, even if your underlying score was really close to the cut-off line. The underlying continuous scores are normally (rather than bi-modally) distributed, which shows that the types aren’t a natural classification.
  5. Forer Effect aka Barnum Effect – Descriptions are general enough to apply to anyone, but we identify with the descriptions anyway because they are positive and sound good. That MBTI has any validity is wishful thinking.
  6. Psychologists don’t use it.
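
Criticism 4 can be made concrete with a toy simulation. This is only a sketch under assumed numbers: each of the four scales is modelled as a standard-normal "true" score plus independent measurement noise at each sitting, with the cut-off at zero. The function name `simulate_retest_agreement` and the parameter `noise_sd` are hypothetical and are not taken from any MBTI study.

```python
import random


def simulate_retest_agreement(n_people=10_000, noise_sd=0.3, n_scales=4, seed=0):
    """Share of simulated people who get the same four-letter type twice.

    Assumptions (illustrative only): true scores are standard normal,
    each sitting adds independent Gaussian noise, and the cut-off is 0.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    same_type = 0
    for _ in range(n_people):
        unchanged = True
        for _ in range(n_scales):
            true_score = rng.gauss(0.0, 1.0)
            first = true_score + rng.gauss(0.0, noise_sd)
            second = true_score + rng.gauss(0.0, noise_sd)
            # A letter flips when the two sittings land on opposite sides of 0.
            if (first >= 0) != (second >= 0):
                unchanged = False
        same_type += unchanged
    return same_type / n_people


for sd in (0.0, 0.2, 0.5):
    rate = simulate_retest_agreement(noise_sd=sd)
    print(f"noise_sd={sd}: same type on retest = {rate:.1%}")
```

With no noise everyone keeps their type. As the noise grows, people whose true score sits near the cut-off start flipping letters, which is the mechanism the false-dichotomy criticism points at.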

Few would contend that personality traits don’t exist. Rather, the issue has more to do with how they are measured and whether it is at all sane to discretize the more than 7 billion unique individuals who live on this planet into a mere set of 16 types. It is brutally efficient, but are the results useful, neutral, or harmful? Perhaps we need 7 billion types, not 16.

The most common rebuttal that I see is that MBTI-skeptics have not spent enough time understanding the underlying Jungian “Cognitive Functions”, and that might be a very relevant critique, but can you really say that about psychologists and researchers like Boyle or Pittenger? I don’t think so. We could use a little more evidence.


The most commonly quoted article intended to throw a monkey-wrench into the machinery of MBTI is a 1979 paper by Howes and Carskadon, who demonstrated that across a 5-week test-retest interval, only 49% of the participants remained the same on all four dimensions of the MBTI.

In general, people who cite this paper tend to ignore the authors’ actual conclusion! And as you can see, it paints a completely different picture.

 “The results of the present study were extremely supportive of the reliability of the MBTI. Test-retest reliabilities of continuous scores on all four MBTI dimensions were clearly unaffected by changes in mood, despite the effectiveness of the mood manipulations themselves. This supports both the theory and the intent of the Indicator.”

“Test-retest reliabilities per se were also good in this study and somewhat superior to some others that have been reported.”

Not only do the authors point out the satisfactory reliability, but they also point to the indicator’s robustness to mood changes.

Cherry-Picked statistics

In 2002, two researchers performed a meta-analysis of MBTI studies and included 14 which reported reliability statistics. They concluded that “the MBTI tended to yield acceptable score reliabilities” and generally found Test-Retest Reliability statistics to be around 80%. The 50% statistic, which skeptics often cite, is the minimum reported statistic and ignores all the other studies, some of which show reliabilities over 90%! Why do MBTI poo-pooers pretend that the McCarley and Carskadon paper was the only study of its kind? And why not report the average instead?

Capraro 2002

Annie Murphy Paul, in The Cult of Personality Testing, writes that “the sixteen distinctive types described by the Myers-Briggs have no scientific basis whatsoever.” It is interesting that she should say that, given the abundance of evidence from the scientific literature:

“MBTI is a widely used measure with adequate reliability and validity” (Churchill & Bayne, 1998, p. 383)

“Estimated reliabilities of type categories appear to be satisfactory in most cases” (Carlyn, 1977, p. 465).

“Reliabilities for type categories appear to be satisfactory” (Buboltz, Johnson, Nichols, Miller, & Thomas, 2000, p. 135)

“These findings indicate that the MBTI measures four dimensions and the keyed items measure reliably the scales the items are expected to measure” (Fisher, Kent, & Fraser, 1998, p. 105)

“Its test-retest reliability is acceptable for its type” (Gallagher, 1998, p. 23)

“Studies that are available show satisfactory internal consistency of each of the four scales.” (Carlson 1985).

Given the above, which surely isn’t a comprehensive list, I have to call into question those who say there is no evidence for the reliability of the MBTI.

Odds Ratios and why it’s amazing

Second, even if 50% were somehow the right number, it is still a great result – it is even fair to call it solid. It would be unimpressive if we were flipping coins with only two outcomes, but consider rolling a die that comes up “Six” fifty percent of the time – such brilliant luck would have you banned from a casino. With MBTI it is even more impressive: there are 16 types, so if results were purely random (and every type equally likely) you would expect to obtain the same result only 1 out of every 16 times, or 6.25% of the time. Compare that to 50% and you can see that there is a big difference. The graph below shows that the more types you have, the harder it is for random questions to score a Test Re-test reliability higher than 50%.
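
The chance baseline in this argument can be sketched in a few lines of Python. The sketch assumes that a "completely random" test assigns a type independently at each sitting; the helper names `chance_agreement` and `uniform_chance_agreement` are made up for illustration. If real type frequencies are uneven, the chance rate is the sum of squared frequencies and comes out higher than 1/16.

```python
from fractions import Fraction


def chance_agreement(type_probs):
    """Probability that two independent random draws give the same type."""
    return sum(p * p for p in type_probs)


def uniform_chance_agreement(n_types):
    """Chance agreement when every type is equally likely (an assumption)."""
    return chance_agreement([Fraction(1, n_types)] * n_types)


for n in (2, 4, 8, 16):
    rate = uniform_chance_agreement(n)
    print(f"{n:>2} types: chance of the same result twice = {float(rate):.2%}")
```

For 16 equally likely types this reproduces the 6.25% figure quoted above.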


The ratio is 8x (=50/6.25), which says that you are 8 times more likely to score the same result than if the test were completely random, which is most certainly what is implied when it is lumped in together with Astrology. (Strictly speaking, this is a ratio of probabilities rather than an odds ratio.) Moreover, the same researchers (McCarley and Carskadon) found that 80% of test-takers received unchanged preferences on at least 3 scales, so there is a good chance that the test scores will help you narrow down the possibilities. So you can see, the test results are not random, and there is thus simply no way that the resonance that most test-takers have with their MBTI type is merely a fictitious imagining and unfortunate consequence of the “Forer Effect”.
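
As a rough check on the arithmetic, the sketch below computes the 50/6.25 figure alongside the conventional odds ratio, which is a different quantity. It also shows what a 50% all-four-letters rate would imply per scale if the four scales were independent and equally stable. That independence is an assumption made here for illustration, not a finding from the paper, and all function names are hypothetical.

```python
def probability_ratio(observed, chance):
    """How many times more likely the observed rate is than the chance rate."""
    return observed / chance


def odds(p):
    """Convert a probability into odds in favour."""
    return p / (1 - p)


def odds_ratio(observed, chance):
    """Conventional odds ratio between the observed and chance rates."""
    return odds(observed) / odds(chance)


def per_scale_agreement(all_scales_rate, n_scales=4):
    """Per-scale agreement implied by an all-scales rate, assuming independence."""
    return all_scales_rate ** (1 / n_scales)


observed = 0.50    # same four-letter type on retest
chance = 1 / 16    # random baseline with 16 equally likely types

print(f"probability ratio: {probability_ratio(observed, chance):.1f}")
print(f"odds ratio:        {odds_ratio(observed, chance):.1f}")
print(f"implied per-scale agreement: {per_scale_agreement(observed):.1%}")
```

The probability ratio is the 8x quoted in the text, while the odds ratio for the same two rates is about 15. Under the independence assumption, a 50% match on all four letters corresponds to roughly 84% agreement on each individual letter.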

reliable versions

A final note is that McCarley and Carskadon’s statistic pertains to an older version of the test and that the new one (“Form M”) has re-test reliabilities that are about 10-20% higher relative to the old version because it is better at discriminating scores. More information about the differences is available here along with claims that Test Re-Test reliabilities range from 85% to 96%.


Whereas “reliability” refers to how often a test taker achieves the same result, “validity” pertains to whether MBTI measures what you want it to measure. The left-most image helps us visualize what it would be like to have high reliability, but poor validity: all our shots cluster around the same point but they are far off target. Yet despite the clarity of these concepts, researchers seem confused about how to measure validity. Some say specifically that MBTI is valid if we can use it to predict job satisfaction and performance. Others think the only measure for a self-test is self-evaluation. The common attitude generally is that validity is very difficult to measure and that it is troublesome even if we did have a very solid model.

“Self-Reported measures of such variables are, in the strictest sense, not verifiable by other means… there is no direct means of cross-validating people’s descriptions of their feelings or intentions.” Podsakoff and Organ (1986)

On the one hand we have Pittenger claiming that “there is no evidence to show a positive relation between MBTI type and success within an occupation. That is, there is nothing to show that ESFPs are better or worse salespeople than INTJs are”. Results elsewhere “indicated that personality had neither a direct effect on satisfaction nor a moderating effect on the job characteristics-job satisfaction relation.” (Thomas et al. 2004)

And on the other hand we have Carlson saying that

“The recent voluminous literature on the scale (one bibliography lists approximately 700 references) reflects largely successful efforts to apply it in a large variety of educational, clinical, counseling, business, and research settings.” (Carlson 1985).

Another researcher highlights the real-world applicability by pointing out that “managerial level was… negatively correlated with… introversion and sensing” (Moutafi et al 2007).

This corroborates research on the “Big Five”, a different personality measure with which MBTI is correlated, where “It was found that Extraversion is a valid predictor of performance in jobs characterised by social interaction, such as sales personnel and managers” (Rothmann and Coetzer). We’ll return to the Big Five in Part 2.

On balance, it appears to me that there is evidence to support both arguments, but given the limited amount of data and evidence that I could find that pertained specifically to MBTI, it is hard to build a very confident case about its validity or lack thereof. And because of the dearth of evidence, it is easy to succumb to the common logical fallacy that a lack of evidence is evidence of no effect.


MBTI is criticized for missing important variables like Neuroticism, which indicates how you experience negative affects such as fear, sadness, embarrassment, anger, guilt and disgust. A high Neuroticism score indicates that a person is prone to having irrational ideas, being less able to control impulses, and coping poorly with stress, while a low score is indicative of emotional stability. Rothmann and Coetzer (1995) point out that Neuroticism “is the second most important characteristic that affects the employability of candidates” and a more recent study showed that it is “inversely related to job performance.” Wharton professor Adam Grant sums it up: “it’s an unfortunate oversight” that this dimension is missing from MBTI.

personality vs character

It is true that neuroticism is a personality trait in the way that psychologists define it, i.e. an enduring pattern of thought, emotion, and behavior. However, all the other traits (intuition-sensing, feeling-thinking, etc.) don’t measure you on a scale of good to bad, but neuroticism does. Hans Eysenck, the grandfather of trait theory, pointed out that “Neuroticism is the major factor of personality pathology”. And that is what it is, a personality pathology, which describes enduring patterns of cognition, emotion, and behavior that negatively affect a person’s adaptation. It is easy to conflate the ideas of Virtue Ethics as described by Aristotle and cognitive functions described by Carl Jung. As Dr. Mike, a professor of Psychology, asks in a YouTube interview (@13:43):

“What is the difference between character and personality? There’s a big difference in that introversion-extraversion is neutral amoral, but whether you are courageous or cowardly, Aristotle would say one is a virtue and one is a vice.”

Neuroticism is pretty much a vice. While there are some exceptions to the rule, in just about every study, Neuroticism has been linked to depression, anxiety, poor job performance, memory, and intelligence. Very little can be said about the benefits of being neurotic, except for a slight advantage as a motivator for preparation and for being more realistic in certain situations, which leads to the conclusion that neuroticism is something everybody should avoid.

Being left or right-handed is a preference, but whether your arms are strong or weak is a power. Whether you choose your left or right hand may not affect an outcome, but whether one of your arms is strong or weak could potentially allow you to achieve a dramatically different result. And that’s the difference between MBTI and certain other traits like neuroticism. While the former describes “how” we accomplish things, the latter describes “what” our limitations are and this depends on the power of our character.

To be as successful as possible in achieving personal growth, we should be all-inclusive, working with all dimensions of “how, what, and why” rather than working with a single map to navigate our personalities to the exclusion of others. But this doesn’t mean that each of these tools, when individually assessed, isn’t valid or is unworthy of our time and study. The Enneagram is a popular personality model that studies the motivations and the “why” of our actions, but I’m not aware of any models that study all three simultaneously.


This post only analyzes about half of the issues I promised to look at, so we can only conclude so much. Even so, there is a high degree of reliability in MBTI test results in spite of mood swings and the difficulty anyone has in truly knowing themselves well enough to respond accurately and consistently to personality survey questions. Part of the inherent difficulty is not with the test itself but with the test-taker. Yet the reliability outcomes are surprisingly high and are much higher than would be predicted under the “Forer” hypothesis of randomness. We saw that a 50% Test Re-Test rate is pretty high given how many types there are, and also that this statistic is usually quoted considerably higher than that (closer to 80%).

Unfortunately, the “validity” of MBTI was harder to pin down and could be argued both ways, which means that maybe it measures “something”, but frustratingly not what we want it to measure. There aren’t really any objective measures that can confirm the validity of self-reported measures like MBTI.

MBTI certainly isn’t an all-powerful model, nor does it claim to be. Myers herself said that MBTI is best viewed “as affording hypotheses for further testing and verification rather than infallible expectations of all behaviors”. In summary, the Jungian model investigates learning styles, describes our interests, and informs us about “how” we approach things, and we’ve seen that there is a big difference between personality assessments and assessments of character.

Next week we will investigate what, if anything, personalities have to do with satisfaction and performance at work, to further understand the issue of validity and how best to use (or not use) MBTI. I’m also pretty excited to share with you my data on distributions of type within various occupations, Picasso, the meaning of life, and why MBTI-critics don’t believe tall people exist.

Quantitative Finance Professional with a passion for happy living, self-improvement, nutrition, and minimalist running over maximalist distances.
