Statistics of Numerical section of The Marathon Test

© Paul Cooijmans

Scores on Numerical section of The Marathon Test as of 22 January 2023

Contents type: Numerical.   Period: 2005-present

0 *
6 *
8 *
9 *
11 *
12 **
17 *
18 *
19 *
22 *
23 **
25 *
32 ***
33 ****
34 **
36 ****
37 ******
38 *
39 ***
40 ****
41 ****
42 ***
43 *******
44 *********

Correlation of Numerical section of The Marathon Test with other mental ability tests

(Test index) Test name  n  r
(21) Psychometric Qrosswords  11  0.94
(45) Numerical and spatial sections of The Marathon Test  61  0.92
(42) The Marathon Test  38  0.92
(115) De Laatste Test - Herziening 2019  5  0.91
(48) Narcissus' last stand  14  0.90
(43) Test For Genius - Revision 2010  9  0.90
(7) The Final Test  8  0.90
(118) Divine Psychometry  7  0.89
(211) Culture Fair Numerical Spatial Examination - Final version (Etienne Forsström)  5  0.89
(28) The Test To End All Tests  17  0.86
(1) Cartoons of Shock  11  0.86
(231) Tests by Greg Grove (aggregate)  5  0.86
(87) Cooijmans Intelligence Test - Form 2  9  0.86
(0) Test of the Beheaded Man  20  0.86
(119) A Relaxing Test  7  0.84
(36) Reflections In Peroxide  21  0.83
(39) Combined Numerical and Spatial sections of Test For Genius - Revision 2010  16  0.82
(47) Psychometrically Activated Grids Acerbate Neuroticism  9  0.82
(20) De Golfstroomtest - Herziening 2019  5  0.81
(107) The Alchemist Test  14  0.81
(27) Spatial section of Test For Genius - Revision 2004  21  0.80
(238) 916 Test (Laurent Dubois)  5  0.78
(2) Cooijmans Intelligence Test - Form 3  35  0.78
(23) Gliaweb Riddled Intelligence Test - Revision 2011  25  0.77
(114) Dicing with death  10  0.77
(40) Reason Behind Multiple-Choice - Revision 2008  22  0.77
(111) Test For Genius - Revision 2016  16  0.77
(32) Spatial section of The Marathon Test  62  0.76
(26) Verbal section of Test For Genius - Revision 2004  15  0.76
(66) Test For Genius - Revision 2004  8  0.75
(112) Combined Numerical and Spatial sections of Test For Genius - Revision 2016  22  0.75
(106) Cooijmans Intelligence Test - Form 4  21  0.74
(109) The Bonsai Test - Revision 2016  20  0.72
(19) Numerical section of Test For Genius - Revision 2010  32  0.72
(226) Logima Strictica 24 (Robert Lato)  4  0.71
(16) Lieshout International Mesospheric Intelligence Test  28  0.70
(113) The Piper's Test  10  0.70
(33) Problems In Gentle Slopes of the first degree  12  0.70
(30) Verbal section of The Marathon Test  38  0.69
(12) Cooijmans On-Line Test - Two-barrelled version  11  0.69
(105) Space, Time, and Hyperspace - Revision 2016  22  0.69
(242) Unknown and miscellaneous tests  16  0.67
(44) Associative LIMIT  21  0.67
(4) A Paranoiac's Torture: Intelligence Test Utilizing Diabolic Exactitude  17  0.67
(258) Tests by Mislav Predavec (aggregate)  5  0.64
(25) The Sargasso Test  21  0.64
(18) The Nemesis Test  19  0.63
(24) Reason - Revision 2008  22  0.63
(103) Problems In Gentle Slopes of the second degree  21  0.62
(225) Logima Strictica 36 (Robert Lato)  9  0.62
(10) Genius Association Test  20  0.61
(108) Verbal section of Test For Genius - Revision 2016  16  0.61
(37) Problems In Gentle Slopes of the third degree  17  0.60
(117) The Hammer Of Test-Hungry - Revision 2013  6  0.60
(3) Qoymans Multiple-Choice #5  22  0.58
(15) Letters  5  0.55
(35) Only idiots  8  0.55
(110) Cooijmans Intelligence Test 5  20  0.54
(41) The LAW - Letters And Words  5  0.51
(5) Daedalus Test  17  0.50
(11) Isis Test  15  0.49
(29) Words  7  0.46
(82) Reason  4  0.46
(46) Labyrinthine LIMIT  10  0.44
(234) Strict Logic Sequences Exam I (Jonathan Wai)  9  0.41
(104) The Final Test - Revision 2013  7  0.37
(80) Qoymans Multiple-Choice #4  5  0.15
(201) Wechsler Adult Intelligence Scales  5  0.14
(240) Strict Logic Spatial Exam 48 (Jonathan Wai)  9  0.06
(223) Strict Logic Sequences Exam II (Jonathan Wai)  4  0.00
(62) Reason Behind Multiple-Choice  4  -0.07
(215) Tests by Jason Betts (aggregate)  4  -0.44
(216) Tests by Ivan Ivec (aggregate)  4  -0.45

Weighted average of correlations: 0.704 (N = 1105, weighted sum = 777.89)

Conservatively estimated minimum g loading: 0.84
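
The two summary figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name weighted_mean_r is hypothetical, and the idea that the minimum g loading is the square root of the weighted average is an assumption that happens to agree with the published numbers, not something this page states. Because the table shows rounded correlations while the published weighted sum uses unrounded ones, recomputing from the table would only approximate 777.89.

```python
from math import sqrt

def weighted_mean_r(rows):
    """Return (N, weighted sum, n-weighted mean) for rows of (n, r)."""
    total_n = sum(n for n, _ in rows)
    weighted_sum = sum(n * r for n, r in rows)
    return total_n, weighted_sum, weighted_sum / total_n

# First three rows of the correlation table (rounded r values).
sample = [(11, 0.94), (61, 0.92), (38, 0.92)]
total_n, weighted_sum, mean_r = weighted_mean_r(sample)
print(total_n, round(mean_r, 3))  # 110 0.922

# Published totals for the whole table.
published_mean = 777.89 / 1105
print(round(published_mean, 3))        # 0.704
# Assumption: minimum g loading taken as sqrt of the weighted average.
print(round(sqrt(published_mean), 2))  # 0.84
```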

Ranking in the above table is based on the unrounded correlations. All available data are present in this table; no tests are left out except those with fewer than 4 score pairs. All known pairs are used, including possible floor/ceiling scores and outliers.

Estimated loadings of Numerical section of The Marathon Test on particular item types

These are estimated g factor loadings, but computed against homogeneous tests (containing only particular item types) rather than against non-compound heterogeneous tests. Although this tends to surprise the lay person, it is not uncommon for tests to have high loadings on item types they do not actually contain themselves. Such loadings reflect the empirical fact that most tests for mental abilities measure primarily g, regardless of their contents: the major part of test score variance is caused by g, and only a minor part by factors germane to particular item types. It is of key importance to understand that this is a fact of nature, a natural phenomenon, and not something that was built into the tests by the test constructors.

Type  n  g loading of Numerical section of The Marathon Test on that type
Verbal  176  0.81
Numerical  41  0.81
Spatial  146  0.83
Logical  43  0.75
Heterogeneous  386  0.85

N = 792

Compound tests have been left out of this table to avoid overlap.

Balanced g loading = 0.81
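
How the balanced g loading is computed is not stated here. As a sketch under that caveat, the published 0.81 agrees with a plain unweighted mean of the five per-type loadings, in which each item type counts equally regardless of its n. The variable names are illustrative, and the n-weighted mean is shown only for contrast.

```python
# Per-type figures from the table above: type -> (n, g loading).
loadings = {
    "Verbal": (176, 0.81),
    "Numerical": (41, 0.81),
    "Spatial": (146, 0.83),
    "Logical": (43, 0.75),
    "Heterogeneous": (386, 0.85),
}

# Assumption: "balanced" means every item type gets equal weight.
unweighted = sum(g for _, g in loadings.values()) / len(loadings)

# For contrast: weighting each type by its number of score pairs.
total_n = sum(n for n, _ in loadings.values())
weighted = sum(n * g for n, g in loadings.values()) / total_n

print(total_n)               # 792
print(round(unweighted, 2))  # 0.81
print(round(weighted, 2))    # 0.83
```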

National medians for Numerical section of The Marathon Test

Country  n  median score
Spain  4  42.5
Canada  3  42.0
Australia  2  41.5
China  2  41.0
Belgium  2  40.5
Germany  3  40.0
United_Kingdom  2  37.0
Sweden  2  35.0
India  2  34.5
Korea_South  4  34.5
United_States  15  33.0
Finland  4  29.5

For reasons of privacy, only countries with 2 or more candidates are included in this table. Ranking is by median, and then alphabetical.

Correlation with national I.Q.'s of Numerical section of The Marathon Test

Correlation of this test with national average I.Q.'s published by Lynn and Vanhanen:

Correlation of Numerical section of The Marathon Test with personal details

Personalia  n  r
P.S.I.A. True - Revision 2007  15  0.67
P.S.I.A. Neurotic - Revision 2007  15  0.57
P.S.I.A. Orderly - Revision 2007  15  0.53
P.S.I.A. Ethics factor - Revision 2007  15  0.50
Observed behaviour  9  0.48
P.S.I.A. Rational - Revision 2007  15  0.45
Educational level  63  0.42
P.S.I.A. System factor - Revision 2007  15  0.35
P.S.I.A. Introverted - Revision 2007  15  0.29
P.S.I.A. Cold - Revision 2007  15  0.23
P.S.I.A. Aspergoid - Revision 2007  15  0.11
P.S.I.A. Deviance factor - Revision 2007  15  0.08
Sex  64  0.07
P.S.I.A. Just - Revision 2007  15  0.07
P.S.I.A. Extreme - Revision 2007  15  0.03
Cooijmans Inventory of Neo-Marxist Attitudes  8  0.02
P.S.I.A. Cruel - Revision 2007  15  -0.04
Mother's educational level  60  -0.04
P.S.I.A. Rare - Revision 2007  15  -0.04
Gifted Adult's Inventory of Aspergerisms  11  -0.04
Father's educational level  58  -0.06
Year of birth  64  -0.17
Disorders (parents and siblings)  62  -0.19
P.S.I.A. Antisocial - Revision 2007  15  -0.25
Disorders (own)  63  -0.35

Estimated g factor loadings for restricted ranges

The number of score pairs on which each estimated g factor loading is based is given in parentheses. The goal is to test the hypothesis that g becomes less important, that is, accounts for a smaller proportion of the variance, at higher I.Q. levels. The mere fact of restricting the range like this also depresses the g loading compared to computing it over the test's full range, so it would be normal for these values to be lower than the test's full-range g loading.

Below 1st quartile  0.80 (377)
Below median  0.80 (625)
Above median  0.60 (530)
Above 3rd quartile  0.53 (239)
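
The depressing effect of range restriction mentioned above can be illustrated with the standard formula for direct range restriction (the inverse of Thorndike's Case 2 correction). This is a sketch only: it treats the loading as an ordinary correlation, assumes linearity and equal error variance across the range, and uses a hypothetical standard deviation ratio of 0.6. It is not the procedure used to obtain the figures in the table.

```python
from math import sqrt

def restricted_r(full_range_r, sd_ratio):
    """Expected correlation after direct range restriction.

    sd_ratio is (restricted SD) / (full-range SD) of the variable
    on which the range is restricted; 1.0 means no restriction.
    """
    r2 = full_range_r ** 2
    return full_range_r * sd_ratio / sqrt(1 - r2 + r2 * sd_ratio ** 2)

# Full-range value 0.84 from this page; the ratio 0.6 is hypothetical.
print(round(restricted_r(0.84, 0.6), 2))  # 0.68
# With no restriction the value is unchanged.
print(round(restricted_r(0.84, 1.0), 2))  # 0.84
```

Under these assumptions, a drop from the full-range value is expected from restriction alone, so only a difference between the lower and upper ranges bears on the hypothesis.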

Reliability

Reliability is very high and reflects the facts that there are no problematic items and that the test is long enough. However, this very high reliability of a test that is too easy also demonstrates that the reliability coefficient in itself is not a sufficient indicator of test quality.

Error

Scores by age

Age class  n  median score
70 to 74  1  34.0
65 to 69  2  28.5
55 to 59  3  44.0
50 to 54  5  42.0
45 to 49  7  41.0
40 to 44  10  39.5
35 to 39  9  33.0
30 to 34  4  37.5
25 to 29  10  34.0
22 to 24  5  40.0
20 or 21  3  43.0
18 or 19  4  34.5
17  1  33.0

N = 64

Scores by year taken

Year taken  n  median score
2005  1  40.0
2006  5  36.0
2007  1  43.0
2008  1  39.0
2009  2  41.0
2010  4  30.0
2011  1  43.0
2012  4  30.5
2013  2  21.5
2014  4  38.5
2015  2  24.5
2016  8  37.5
2017  2  25.5
2018  3  39.0
2019  6  33.0
2020  4  42.5
2021  7  37.0
2022  6  38.5
2023  1  43.0

r (year taken × median score) = -0.02 (N = 64)

Robustness and overall test quality

Item analysis

Item statistics are not published as that would help candidates. To detect bad items, answers and comments from candidates are studied, as well as, for each problem, the correlation with total score on the remaining problems (item-rest correlation) and the proportion of candidates getting it wrong (hardness of the item). Possible bad items are revised, replaced, or removed, possibly resulting in a revised version of the test.
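
The two item statistics named above can be sketched as follows. The response matrix is invented purely for illustration (1 = right, 0 = wrong), since real item data are not published, and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)

def item_statistics(responses):
    """Return a (hardness, item-rest correlation) pair for each item.

    responses holds one list of 0/1 item scores per candidate.
    """
    stats = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        # Total score on the remaining problems, excluding item i.
        rest = [sum(row) - row[i] for row in responses]
        hardness = 1 - sum(item) / len(item)  # proportion getting it wrong
        stats.append((hardness, pearson_r(item, rest)))
    return stats

# Hypothetical data: 5 candidates, 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
for hardness, r in item_statistics(responses):
    print(round(hardness, 2), round(r, 2))
```

An item with a low or negative item-rest correlation would be a candidate for closer inspection, alongside the answers and comments from candidates.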