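When dealing with high-range tests, two types of sex difference are conspicuous: a difference in participation (far fewer females than males take the tests) and a difference in scores (males score higher on average).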
The goal of this study is to find possible relations between either of the two sex differences on the one hand, and any of a number of test properties on the other hand. The test properties in question are hardness, estimated g factor loading, and contents type. What follows is first a legend of the variables that appear in the tables, then a number of tables with numerical results, and finally some conclusions in verbal form.
The sex of a test candidate is simply that which the candidate reported when registering to take the tests, and can be either female or male. No "in between" option has been offered to date.
The following fields appear in the main table of test variables:
Remark: The male-female score difference could also have been computed in other ways, such as using medians instead of means, or expressing it in raw score standard deviations instead of protonorms, but whichever way one chooses, there remains a fair amount of error caused by the low number of females per test. The current mode of calculating M-F has been chosen because it allows a conversion of the difference to I.Q. points (1 protonorm point is on average .18 I.Q. points in the range where the female and male averages lie); means are used instead of medians because with a very low number of (female) candidates on most of the tests that seems to give a better estimate of the average per test.
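The calculation described in the remark can be sketched as follows. This is a minimal illustration, assuming that M-F is the mean male protonorm score minus the mean female protonorm score on a test; the function names and the scores are hypothetical, and only the factor of .18 I.Q. points per protonorm point comes from the text.

```python
from statistics import mean

# Conversion factor quoted in the text: 1 protonorm point is on average
# about .18 I.Q. points in the range where the female and male averages lie.
IQ_PER_PROTONORM = 0.18


def male_female_difference(male_scores, female_scores):
    """M-F: mean male score minus mean female score, in protonorm points."""
    return mean(male_scores) - mean(female_scores)


def protonorm_to_iq(difference):
    """Convert a difference in protonorm points to approximate I.Q. points."""
    return difference * IQ_PER_PROTONORM


# Hypothetical protonorm scores for one test (not data from the study).
males = [420, 455, 390, 470, 440]
females = [400, 385]

diff = male_female_difference(males, females)
print(diff, protonorm_to_iq(diff))
```

With only two female scores, as in this made-up example, replacing a single female score changes the result considerably, which is the source of error the remark refers to.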
The following correlations between the various variables (columns in the table) have been computed:
This moderate correlation between the two indicators of the score difference reflects the amount of error in many a test's score difference, caused by the low number of females on most of the tests.
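The second indicator, the correlation of sex with score, can be illustrated with a point-biserial correlation, which is what an ordinary Pearson correlation amounts to when one of the two variables takes only two values. The sketch below is not necessarily the computation used for the table; the coding (male = 1, female = 0, so that a positive value means higher male scores), the function name, and the scores are assumptions made for illustration.

```python
from math import sqrt


def point_biserial(male_scores, female_scores):
    """Correlation of sex with score, assuming male = 1 and female = 0."""
    scores = list(male_scores) + list(female_scores)
    n = len(scores)
    n_m, n_f = len(male_scores), len(female_scores)
    mean_all = sum(scores) / n
    # Population standard deviation of all scores combined.
    sd_all = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_m = sum(male_scores) / n_m
    mean_f = sum(female_scores) / n_f
    # Standardised mean difference, weighted by the group proportions.
    return (mean_m - mean_f) / sd_all * sqrt(n_m * n_f / n ** 2)


# Hypothetical raw scores on one test (not data from the study).
males = [10, 12, 14]
females = [8, 10]
print(round(point_biserial(males, females), 3))
```

Unlike the difference of means, this value also depends on the spread of the scores and on the proportion of females among the candidates, which is one reason the two indicators need not agree closely.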
None of these are significant:
Note the last one: hardness and g loading are apparently not related, but are distinct properties. A highly g-loaded test is not necessarily a difficult test.
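A correlation between two test properties, and a rough check of its significance, can be sketched as below. The text does not state which significance test was used; the permutation test here is a stand-in chosen because it needs no distributional assumptions, and the hardness and g loading values are invented for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=10000, seed=0):
    """Two-sided p: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance so that ties are not lost to rounding.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-test values (not data from the study).
hardness = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52, 0.66, 0.45]
g_loading = [0.78, 0.62, 0.81, 0.70, 0.74, 0.85, 0.66, 0.73]

r = pearson_r(hardness, g_loading)
p = permutation_p_value(hardness, g_loading)
print(round(r, 2), round(p, 3))
```

With only a few dozen tests, a correlation has to be fairly large before such a check marks it as significant, so a non-significant result does not show that two properties are unrelated.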
|In prot.||In I.Q.||In rsex × score|
Apparently, and contrary to females' preference for one-sided verbal tests, the male-female score difference is on average smallest on heterogeneous tests, which contain a mixture of item types. The somewhat anomalous behaviour of Numerical (In prot. and I.Q.) results from the fact that there is only one numerical test in the study, so that the error of the score difference on that one test is not averaged out. This table also suggests that rsex × score is the better indicator of the sex difference.
|Test||PropM||Hardness||g||Cont||M-F||rsex × score|
Looking at the table with computed values per test, one thing stands out immediately: The four tests at the bottom, that have drawn the most female candidates (PropM lower than .9), have four things in common:
Another thing that can be observed about these tests is that, despite their popularity with females, female performance on them is not better but even slightly worse than over all of the tests combined. The median male-female difference on these four tests is 43.5 protonorm points, which is about 7.8 I.Q. points. The median of the correlations of sex with score is .255, which is even higher, relative to the corresponding value for all of the tests together (see "Total" in the table "Median score difference per contents type"). In other words, female candidates appear to have had an unlucky hand in their choice of tests. In actuality, the male-female difference is smallest on heterogeneous tests.
The highly significant correlation of .60 between PropM and g factor loadings implies that female candidates are seeking out tests with low g loadings, whether they are aware of it or not. The possibility that they somehow cause those loadings to be low by participating can be excluded with certainty, because the females who took the lowly g-loaded tests in this study had as good as no known scores on other tests, and have therefore not contributed to the computation of the estimated g factor loadings of those tests (which are based on a test's correlations with other tests). The fact that they had no scores on other tests is a logical consequence of the rareness of female high-range test candidates. It follows that females are actively avoiding tests with high g loadings, as if they possess a "sixth sense" - call it a g-spot - that enables them to detect how much g a task requires.
The low but significant correlation of PropM with hardness implies that female candidates are also avoiding difficult tests. This, together with their avoidance of highly g-loaded tests, provides a clue as to the low participation of females in high-range testing on the whole. High-range testing is all about difficult and g-loaded tests, and that appears to be why females avoid it.
The absence of significant correlations with the two indicators of the male-female score difference means that the sex difference is not greater (nor smaller) on difficult tests, and not greater (nor smaller) on highly g-loaded tests. In combination with the foregoing, this suggests that females are avoiding difficult and g-loaded tests for no reason. With regard to test hardness this is entirely logical; after all, it should make no difference to one's score whether one takes an easy or a hard test. As long as one's true level falls within the measured range, one's score should on average be the same at either level of hardness.
Also, the absence of a significant correlation with g factor loadings means that there is currently no evidence that the male-female difference on high-range tests is a difference in g. Had there been a significant positive correlation with g loadings, this would have suggested that the difference was at least partly due to a sex difference in g. The absence of such a correlation leaves open the possibility that the observed sex difference lies partly or wholly in non-g factors. To understand this, one needs to know that while most of the variance in I.Q. test scores is accounted for by the g factor (typically 60 to 70 %), other portions of the variance are due to so-called "group factors" like the verbal, numerical, and spatial factors. The observed sex difference may lie in group factors as well as, or instead of, in g.
The median sex difference is markedly smaller on heterogeneous tests (with a mixture of item types) than on homogeneous tests (with only one item type), as both indicators of the sex difference reveal. This lends support to the possibility that the difference lies partly or wholly in group factors rather than in g.
The total median of the size of the sex difference (over all of the 28 tests that have female scores) is 41 protonorm points in this sample of tests, an estimated 7.4 I.Q. points. This is smaller than the difference of 50 protonorm points and 9 I.Q. points found in the current protonorm table (from 2011 for males and 2013 for females), and smaller than the difference of 11 I.Q. points found in a 2004 report. These studies used different samples and different methods of computing the difference. The variation in outcome is due to "sampling error" rather than an actual shrinking of the difference; an analysis of the development of the sex difference over time would be problematic because of the rareness of female test scores. For clarity, all existing scores have simply been used for all of the tests.