The differentiation hypothesis of g tested

© October 2013 Paul Cooijmans


A phenomenon that has been observed in various studies since the early twentieth century is the apparently decreasing role of g, the general factor in mental ability testing, at higher levels of I.Q. That is, in above-average samples, g accounts for a smaller proportion of the score variance than in below-average samples, while group factors and test specificity account for larger proportions of the variance than they do in below-average samples. Charles Spearman, the discoverer of g, was the first to note this and spoke of a "law of diminishing returns" with regard to g. Others have replicated his findings, and it has been suggested that, in the high range of I.Q., g breaks up into group factors (containing variance common to some but not all tests) and specificity (unique to a particular test). From this it would follow that it is impossible to measure g beyond a certain point or range, or at least that I.Q. scores become meaningless beyond that, no longer representing largely g. The 99th centile is typically named as the point where the breakup is supposed to occur; in any case it is placed somewhere in the range from the 98th to the 99.9th centile.

It must be noted that there are also studies that fail to confirm, or even contradict, this differentiation of g with increasing I.Q. Also, in the very low range (moderate to severe retardation), one finds that g decreases, apparently as a result of the non-hereditary causes that often underlie very low I.Q.'s (for instance, brain damage).

The differentiation hypothesis of g, and g's possible immeasurability in the high range, were my primary motivation to begin constructing tests specifically for the high range. I wanted to find out whether or not it is possible to meaningfully test intelligence in the high range. I suspected that the decreasing role of g at high I.Q. levels in mainstream testing was simply caused by the absence of really difficult problems in regular I.Q. tests, which makes them invalid for the high range. Later I also understood that the fact that some tests are purposely constructed to yield equal scores for males and females necessarily destroys high-range validity.

Over the years I have found that the intercorrelations and g loadings of high-range tests are on the whole comparable to those found among mainstream tests, so that, at first sight, the differentiation hypothesis is either not borne out or there is only a small amount of differentiation. On those occasions where significant low correlations are found, there tend to be identifiable causes such as poor test construction, floor effects on extremely hard tests, fraud in test-taking, and dishonesty or incompleteness of candidates in reporting scores on external tests. Most candidates, when asked to report their scores, report only their highest few scores and leave out the rest, and/or report retest scores or fraudulent scores, which depresses or makes meaningless the correlations found with tests by others; for this reason I have meanwhile stopped asking candidates for their scores on other tests altogether. For the several external tests for which I possess the full data, the correlations and g loadings are similar to those found among my own tests.

The method used

In the present study, the goal is to verify whether a decrease of g loading with increasing I.Q. can be detected within high-range tests. All tests that have received 60 or more submissions are selected; there are 18 such tests.

Estimated g factor loadings are computed separately for the top half and the bottom half of each test, as well as for the full range. The separation point is, for each test, the raw score corresponding to protonorm 413 (I.Q. 140), which is the median of scores on high-range tests according to the most recent norming of the protonorm table.
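The split at a cut score can be sketched as follows. This is a minimal illustration and not the procedure actually used: the text does not specify how the estimated g loading is derived from the score pairs, so the sketch only computes a plain Pearson correlation within each half. The data, the function names, and the choice to place scores equal to the cut score in the top half are all assumptions.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def split_correlations(pairs, cut_score):
    """Correlate object-test scores with other-test scores per half.

    `pairs` holds (object_test_score, other_test_score) tuples.
    Assumption: a score equal to `cut_score` counts as top half.
    """
    top = [p for p in pairs if p[0] >= cut_score]
    bottom = [p for p in pairs if p[0] < cut_score]
    r_top = pearson_r([x for x, _ in top], [y for _, y in top])
    r_bottom = pearson_r([x for x, _ in bottom], [y for _, y in bottom])
    return r_top, r_bottom

# Hypothetical score pairs, chosen only to make the two halves differ
example_pairs = [(1, 3), (2, 2), (3, 1), (4, 1), (5, 2), (6, 3)]
r_top, r_bottom = split_correlations(example_pairs, cut_score=4)
print(r_top, r_bottom)
```

In real use the cut score would be the raw score corresponding to protonorm 413 on the object test, and each half would need enough score pairs for the correlation to be meaningful.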

The mere fact of restricting the range like this depresses the g loading compared to computing it over the test's full range, so the partial values are expected to be lower than the test's full-range g loading. The point here, however, is to see whether or not the top half loadings are consistently lower than the bottom half ones.
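To give a rough idea of how much range restriction alone can lower a correlation, the sketch below applies the standard formula for direct range restriction (Thorndike's Case 2) in reverse. It rests on assumptions that need not hold for the data in this report: bivariate normal scores, a split exactly at the median, and a plain correlation rather than an estimated g loading. The function and constant names are hypothetical.

```python
import math

def restricted_correlation(full_r, sd_ratio):
    """Expected correlation after direct range restriction on one variable.

    full_r   : correlation in the unrestricted group
    sd_ratio : restricted SD divided by unrestricted SD (between 0 and 1)
    """
    return full_r * sd_ratio / math.sqrt(
        1 - full_r ** 2 + full_r ** 2 * sd_ratio ** 2
    )

# SD of a standard normal variable truncated at its median (either half)
HALF_NORMAL_SD_RATIO = math.sqrt(1 - 2 / math.pi)

for full_r in (0.6, 0.7, 0.8):
    half_r = restricted_correlation(full_r, HALF_NORMAL_SD_RATIO)
    print(f"full-range r = {full_r:.2f} -> half-range r = {half_r:.2f}")
```

Under these assumptions both halves are depressed by the same amount, which is why the comparison of interest is top half against bottom half rather than either half against the full range.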

The results per test

The n in parentheses is the total number of score pairs on which that estimated g factor loading is based. For each test in the table, the loadings are based on all of the other tests in the database for which there exist at least 5 score pairs with the object test. The tests are ordered by their full-range g loadings.
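The selection rule and the n in parentheses can be illustrated with a small sketch. The test names and pair counts are made up and the function name is hypothetical; only the threshold of 5 score pairs comes from the text.

```python
MIN_PAIRS = 5  # minimum number of score pairs, as stated in the text

def eligible_comparisons(pair_counts, min_pairs=MIN_PAIRS):
    """Keep only comparison tests sharing at least `min_pairs` score pairs."""
    return {test: n for test, n in pair_counts.items() if n >= min_pairs}

# Hypothetical numbers of score pairs between one object test and other tests
pair_counts = {"Test A": 40, "Test B": 4, "Test C": 5, "Test D": 12}

eligible = eligible_comparisons(pair_counts)
total_n = sum(eligible.values())  # corresponds to the n in parentheses
print(eligible, total_n)
```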

| Test (full-range g) | Top half g (n) | Bottom half g (n) |
|---|---|---|
| Long Test For Genius (.85) | 0.72 (252) | 0.76 (230) |
| Test For Genius - Revision 2004 (.81) | 0.74 (263) | 0.74 (311) |
| Short Test For Genius (.80) | 0.84 (91) | 0.64 (79) |
| The Final Test (.80) | 0.74 (462) | 0.70 (189) |
| Cooijmans Intelligence Test - Form 2 (.80) | 0.62 (234) | 0.81 (84) |
| Verbal section of Test For Genius - Revision 2004 (.78) | 0.73 (353) | 0.73 (414) |
| Spatial section of Test For Genius - Revision 2004 (.78) | 0.56 (278) | 0.75 (460) |
| Analogies subtest of Long Test For Genius (.78) | 0.62 (203) | 0.71 (336) |
| Cooijmans Intelligence Test - Form 1 (.75) | 0.56 (101) | 0.72 (68) |
| Space, Time, and Hyperspace (.74) | 0.61 (481) | 0.73 (422) |
| Genius Association Test (.74) | 0.71 (262) | 0.65 (391) |
| Association subtest of Long Test For Genius (.74) | 0.65 (307) | 0.44 (292) |
| Numbers (.71) | 0.44 (176) | 0.54 (281) |
| Lieshout International Mesospheric Intelligence Test (.67) | 0.48 (211) | 0.60 (255) |
| Qoymans Multiple-Choice #4 (.67) | 0.58 (237) | 0.57 (317) |
| Qoymans Multiple-Choice #2 (.60) | 0.81 (20) | 0.23 (106) |
| Qoymans Multiple-Choice #3 (.60) | 0.54 (83) | 0.58 (173) |
| Qoymans Multiple-Choice #1 (.47) | 0.58 (41) | 0.34 (150) |

Combined results

Below are the median values from the above table for all of the tests, and for the subsets of heterogeneous tests, verbal tests, spatial tests, and the numerical test.

| Set or subset of tests (median full-range g) | Median top half g | Median bottom half g |
|---|---|---|
| All (.76) | .62 | .675 |
| Heterogeneous (.80) | .72 | .74 |
| Verbal (.74) | .65 | .58 |
| Spatial (.74) | .56 | .73 |
| Numerical (.71) | .44 | .54 |
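The half-range medians for all 18 tests can be recomputed from the per-test table, as in the sketch below. It also counts in how many tests the top half loading is lower than, equal to, or higher than the bottom half loading. The variable names are illustrative; the values are copied from the per-test table above. The subsets are not recomputed because the text does not state which tests belong to which subset.

```python
from statistics import median

# (top half g, bottom half g) per test, in the order of the per-test table
loadings = [
    (0.72, 0.76), (0.74, 0.74), (0.84, 0.64), (0.74, 0.70), (0.62, 0.81),
    (0.73, 0.73), (0.56, 0.75), (0.62, 0.71), (0.56, 0.72), (0.61, 0.73),
    (0.71, 0.65), (0.65, 0.44), (0.44, 0.54), (0.48, 0.60), (0.58, 0.57),
    (0.81, 0.23), (0.54, 0.58), (0.58, 0.34),
]

median_top = median(top for top, _ in loadings)
median_bottom = median(bottom for _, bottom in loadings)

# Direction of the difference per test
top_lower = sum(top < bottom for top, bottom in loadings)
top_equal = sum(top == bottom for top, bottom in loadings)
top_higher = sum(top > bottom for top, bottom in loadings)

print(f"median top half g:    {median_top:.3f}")
print(f"median bottom half g: {median_bottom:.3f}")
print(f"top lower / equal / higher: {top_lower} / {top_equal} / {top_higher}")
```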

Over all of the tests combined, the top half g loading is somewhat lower than the bottom half one, which is consistent with the "law of diminishing returns" or differentiation hypothesis. However, this effect is almost entirely due to the spatial and numerical tests, while on the verbal tests the opposite is the case: top half loadings are clearly higher than bottom half loadings. On heterogeneous tests, the top half loadings are only marginally lower than the bottom half ones.

This test of the differentiation hypothesis will be repeated in the future with (even) more tests and more data, possibly with improvements to the method, and the tests themselves will be improved whenever possible. Specifically, it needs to be verified whether the result observed on the spatial and numerical tests (lower top than bottom half loadings) holds, or is unique to the particular set of spatial and numerical tests in this study.

For comparison and information, the full and partial g loadings are also computed for I.Q.'s by assessment, which have a higher g loading than almost any test:

| Test (full-range g) | Top half g (n) | Bottom half g (n) |
|---|---|---|
| Intelligence Quantifier by assessment (.86) | 0.85 (274) | 0.76 (209) |

So, this very pure measure of g shows an increase of g loading with I.Q. within the high range.


Conclusions

  1. There is some decrease of g loading going from the bottom half to the top half of the high range, but the g loading by no means becomes low, let alone disappears;
  2. The decrease of g loading is due to the spatial and numerical tests, while on the verbal tests there is an increase;
  3. On heterogeneous tests, the decrease of g loading is only marginal;
  4. Especially the decrease of g loading observed regarding spatial and numerical tests needs to be studied again with more or other spatial and numerical tests, to know whether it is (a) a general phenomenon or (b) specific to the particular spatial and numerical tests in this report.