Statistics of The Sargasso Test

Scores on The Sargasso Test as of 30 April 2022

Content types: verbal, numerical, spatial, logical. Period: 2007 to present

n: 64
Median: 33.0
Quartile deviation: 4.0
Range: 40
Maximum possible: 65
Male median: 33.0 (n = 61)
Female median: 30.0 (n = 3)
Resolution: 0.41 (0.58 if half points are frequently given)

Score	Candidates
7	*
16	*
20	*
23	*
25	**
26	**
27	*
28	***
29	*****
29.5	*
30	*******
31	***
32	*
33	****
34	***
35	**
35.5	***
36	***
36.5	*
37	****
38	***
39	****
40	*
40.5	*
41	*
42	****
46	*
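The summary statistics above can be checked against the histogram. The following is a minimal Python sketch. It assumes that "quartile deviation" means the semi-interquartile range, (Q3 - Q1) / 2, because the page does not define it. The names HISTOGRAM, expand and quartile_deviation are hypothetical.

```python
import statistics

# Histogram transcribed from the table above: raw score -> number of candidates.
HISTOGRAM = {
    7: 1, 16: 1, 20: 1, 23: 1, 25: 2, 26: 2, 27: 1, 28: 3, 29: 5, 29.5: 1,
    30: 7, 31: 3, 32: 1, 33: 4, 34: 3, 35: 2, 35.5: 3, 36: 3, 36.5: 1,
    37: 4, 38: 3, 39: 4, 40: 1, 40.5: 1, 41: 1, 42: 4, 46: 1,
}

def expand(histogram):
    """Turn a score -> count mapping back into a sorted list of scores."""
    return sorted(score for score, count in histogram.items() for _ in range(count))

def quartile_deviation(scores):
    """Semi-interquartile range (Q3 - Q1) / 2; an assumed definition."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return (q3 - q1) / 2

scores = expand(HISTOGRAM)
print(len(scores))                  # 64
print(statistics.median(scores))    # 33.0
print(quartile_deviation(scores))   # 4.0
```

The tallies add up to the reported n of 64. With this data, the median and quartiles land on tied scores, so the choice of quantile interpolation method does not change the results.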

Correlation of The Sargasso Test with other tests by I.Q. Tests for the High Range

(Test index) Test name	n	r
(55) Spatial Insight Test	4	0.98
(54) Test of Shock and Awe	6	0.89
(21) Psychometric Qrosswords	11	0.80
(30) Verbal section of The Marathon Test	19	0.78
(20) De Golfstroomtest - Herziening 2019	4	0.76
(48) Narcissus' last stand	16	0.75
(118) Divine Psychometry	7	0.75
(57) Space, Time, and Hyperspace	6	0.75
(113) The Piper's Test	12	0.74
(42) The Marathon Test	16	0.74
(28) The Test To End All Tests	23	0.74
(108) Verbal section of Test For Genius - Revision 2016	16	0.72
(7) The Final Test	16	0.71
(106) Cooijmans Intelligence Test - Form 4	27	0.69
(114) Dicing with death	10	0.69
(0) Test of the Beheaded Man	26	0.68
(33) Problems In Gentle Slopes of the first degree	12	0.67
(87) Cooijmans Intelligence Test - Form 2	11	0.67
(36) Reflections In Peroxide	20	0.65
(44) Associative LIMIT	29	0.64
(111) Test For Genius - Revision 2016	16	0.64
(31) Numerical section of The Marathon Test	19	0.63
(23) Gliaweb Riddled Intelligence Test - Revision 2011	24	0.63
(3) Qoymans Multiple-Choice #5	34	0.63
(83) KIT Intelligence Test - first attempts	4	0.62
(32) Spatial section of The Marathon Test	19	0.62
(45) Numerical and spatial sections of The Marathon Test	18	0.61
(37) Problems In Gentle Slopes of the third degree	24	0.60
(43) Test For Genius - Revision 2010	11	0.60
(109) The Bonsai Test - Revision 2016	29	0.59
(35) Intelligence Quantifier by assessment	21	0.59
(18) The Nemesis Test	20	0.57
(10) Genius Association Test	32	0.57
(2) Cooijmans Intelligence Test - Form 3	33	0.56
(80) Qoymans Multiple-Choice #4	16	0.55
(26) Verbal section of Test For Genius - Revision 2004	23	0.54
(107) The Alchemist Test	11	0.54
(4) A Paranoiac's Torture: Intelligence Test Utilizing Diabolic Exactitude	22	0.53
(41) The LAW - Letters And Words	6	0.52
(47) Psychometrically Activated Grids Acerbate Neuroticism	11	0.50
(66) Test For Genius - Revision 2004	16	0.50
(40) Reason Behind Multiple-Choice - Revision 2008	34	0.50
(29) Words	6	0.49
(112) Combined Numerical and Spatial sections of Test For Genius - Revision 2016	19	0.49
(24) Reason - Revision 2008	34	0.48
(1) Cartoons of Shock	22	0.47
(16) Lieshout International Mesospheric Intelligence Test	31	0.46
(105) Space, Time, and Hyperspace - Revision 2016	19	0.45
(15) Letters	7	0.40
(27) Spatial section of Test For Genius - Revision 2004	24	0.39
(11) Isis Test	23	0.38
(19) Numerical section of Test For Genius - Revision 2010	27	0.38
(110) Cooijmans Intelligence Test 5	20	0.34
(53) Qoymans Multiple-Choice #3	6	0.34
(79) Association subtest of Long Test For Genius	4	0.30
(12) Cooijmans On-Line Test - Two-barrelled version	15	0.29
(103) Problems In Gentle Slopes of the second degree	18	0.27
(39) Combined Numerical and Spatial sections of Test For Genius - Revision 2010	13	0.25
(82) Reason	8	0.24
(75) Analogies of Long Test For Genius	4	0.23
(68) Numbers	11	0.22
(84) Bonsai Test	6	0.21
(5) Daedalus Test	12	0.18
(116) Gliaweb Riddled Intelligence Test (old version)	4	0.16
(62) Reason Behind Multiple-Choice	7	0.15
(104) The Final Test - Revision 2013	11	0.11
(117) The Hammer Of Test-Hungry - Revision 2013	10	0.10
(69) Odds	5	0.03

Weighted average of correlations: 0.538 (N = 1100, weighted sum = 591.84)

Conservatively estimated minimum g loading: 0.73

Ranking in the above table is based on the unrounded correlations. All available data are included; no tests are left out except those with fewer than 4 score pairs. All known pairs are used, including possible floor or ceiling scores and outliers.

Correlation of The Sargasso Test with tests by others

(Test index) Test name	n	r
(240) Strict Logic Spatial Exam 48	4	0.96
(212) Raven's Advanced Progressive Matrices (raw)	4	0.93
(236) International High IQ Society Miscellaneous tests	4	0.91
(211) Culture Fair Numerical Spatial Examination - Final version	6	0.87
(235) Nonverbal Cognitive Performance Examination	4	0.79
(223) Strict Logic Sequences Exam II	5	0.65
(239) Titan Test	7	0.60
(201) Wechsler Adult Intelligence Scales	10	0.25
(242) Unknown and miscellaneous tests	34	0.24
(225) Logima Strictica 36	10	0.13
(234) Strict Logic Sequences Exam I	8	-0.02
(220) Cattell Culture Fair	6	-0.29
(231) Mysterium Entrance Exam	6	-0.47

Weighted average of correlations: 0.318 (N = 108, weighted sum = 34.31)

Please be aware that correlations with these external tests are in most cases affected (typically depressed) by one or more of the following: (1) little overlap with the object test, because of the much lower ceilings and inherent ceiling effects of the tests used in regular psychology; (2) candidates reporting scores selectively, for instance reporting only the higher ones and withholding lower ones; (3) candidates reporting incorrect scores, or having incorrect scores reported for them by psychometricians.
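The "weighted average of correlations" reported above appears to be each correlation weighted by its number of score pairs: sum(n * r) / sum(n). The sketch below applies that reading to the external-test table. The function and variable names are hypothetical. Because it uses the rounded r values shown in the table, the weighted sum comes out slightly different from the published one, which presumably uses unrounded correlations.

```python
# External-test correlations transcribed from the table above: (n, r) pairs.
EXTERNAL = [
    (4, 0.96), (4, 0.93), (4, 0.91), (6, 0.87), (4, 0.79), (5, 0.65),
    (7, 0.60), (10, 0.25), (34, 0.24), (10, 0.13), (8, -0.02),
    (6, -0.29), (6, -0.47),
]

def weighted_average(pairs):
    """Return (N, weighted sum, n-weighted mean correlation)."""
    total_n = sum(n for n, _ in pairs)
    weighted_sum = sum(n * r for n, r in pairs)
    return total_n, weighted_sum, weighted_sum / total_n

total_n, weighted_sum, mean_r = weighted_average(EXTERNAL)
print(total_n, round(weighted_sum, 2), round(mean_r, 3))  # 108 34.27 0.317
```

N matches the published 108 exactly. The rounded inputs give 34.27 instead of 34.31, and a mean of 0.317 instead of 0.318.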

Estimated loadings of The Sargasso Test on particular item types

These are estimated g factor loadings computed against homogeneous tests (containing only particular item types), as opposed to non-compound heterogeneous tests. Although this tends to surprise the lay person, it is not uncommon for tests to have high loadings on item types they do not themselves contain. Such loadings reflect the empirical fact that most tests of mental ability measure primarily g, regardless of their contents: the major part of test score variance is caused by g, and only a minor part by factors specific to particular item types. It is of key importance to understand that this is a fact of nature, a natural phenomenon, and not something built into the tests by their constructors.

Type	n	g loading of The Sargasso Test on that type
Verbal	234	0.77
Numerical	62	0.63
Spatial	103	0.71
Logical	54	0.61
Heterogeneous	406	0.75

N = 859

Compound tests have been left out of this table to avoid overlap.

Balanced g loading = 0.69
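The page does not define "balanced". One reading reproduces the published 0.69: a plain unweighted mean over the five item types, so that the large heterogeneous and verbal groups do not dominate. This is an inference from the numbers, not a documented procedure. The sketch below contrasts it with an n-weighted mean. The function names are hypothetical.

```python
# g loadings by item type, transcribed from the table above: type -> (n, loading).
LOADINGS = {
    "Verbal": (234, 0.77),
    "Numerical": (62, 0.63),
    "Spatial": (103, 0.71),
    "Logical": (54, 0.61),
    "Heterogeneous": (406, 0.75),
}

def unweighted_mean(loadings):
    """Each item type counts equally, regardless of its n (assumed 'balanced')."""
    return sum(g for _, g in loadings.values()) / len(loadings)

def n_weighted_mean(loadings):
    """Each item type is weighted by its number of score pairs."""
    total_n = sum(n for n, _ in loadings.values())
    return sum(n * g for n, g in loadings.values()) / total_n

print(round(unweighted_mean(LOADINGS), 2))  # 0.69
print(round(n_weighted_mean(LOADINGS), 2))  # 0.73
```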

National medians for The Sargasso Test

Country	n	median score
United Kingdom	3	38.0
Netherlands	2	37.5
Sweden	5	37.0
Romania	2	36.0
Spain	5	36.0
Canada	2	35.5
France	2	35.5
Germany	3	35.5
India	2	32.8
United States	19	32.0
Finland	4	30.0
South Korea	3	29.0
Greece	2	23.5

For reasons of privacy, only countries with 2 or more candidates are included in this table. Ranking is by median, then alphabetical.

Correlation of The Sargasso Test with national I.Q.'s

Correlation of this test with national average I.Q.'s published by Lynn and Vanhanen:

r = -0.01 (n = 63)

Correlation of The Sargasso Test with personal details

Personalia	n	r
Observed associative horizon	6	0.62
P.S.I.A. Rare - Revision 2007	18	0.59
P.S.I.A. Introverted - Revision 2007	18	0.53
P.S.I.A. Orderly - Revision 2007	18	0.50
P.S.I.A. Deviance factor - Revision 2007	18	0.49
P.S.I.A. True - Revision 2007	18	0.47
P.S.I.A. Ethics factor - Revision 2007	18	0.45
P.S.I.A. System factor - Revision 2007	18	0.45
P.S.I.A. Extreme - Revision 2007	18	0.43
P.S.I.A. Antisocial - Revision 2007	18	0.36
P.S.I.A. Aspergoid - Revision 2007	18	0.31
P.S.I.A. Neurotic - Revision 2007	18	0.31
P.S.I.A. Just - Revision 2007	18	0.29
P.S.I.A. Cold - Revision 2007	18	0.28
Educational level	61	0.26
P.S.I.A. Rational - Revision 2007	18	0.23
Father's educational level	58	0.04
Disorders (parents and siblings)	60	0.02
Sex	64	0.00
Cooijmans Inventory of Neo-Marxist Attitudes	11	-0.02
Observed behaviour	14	-0.03
Year of birth	63	-0.04
Mother's educational level	59	-0.10
Disorders (own)	60	-0.14
Gifted Adult's Inventory of Aspergerisms	14	-0.18
P.S.I.A. Cruel - Revision 2007	18	-0.27

Estimated g factor loadings upward and downward of particular scores

The number in parentheses is the number of score pairs on which each estimated g factor loading is based. The goal is to test the hypothesis that g becomes less important (accounts for a smaller proportion of the variance) at higher I.Q. levels. Restricting the range in this way also depresses the g loading by itself, compared to computing it over the test's full range, so it is normal for both values to be lower than the test's full-range g loading.

Raw score	Upward g (N)	Downward g (N)
0	0.73 (1100)	NaN (0)
29	0.66 (885)	0.60 (253)
33	0.53 (574)	0.63 (544)
37	0.61 (248)	0.71 (885)
41	NaN (0)	0.70 (1012)
65	NaN (0)	0.73 (1100)
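The page does not say how the upward and downward loadings are computed. Range restriction alone depresses a correlation, and the classic Thorndike Case II formula for direct range restriction shows why. This textbook model is applied here by assumption. Treating the 0.73 loading as a correlation and using an SD ratio of 0.6 are hypothetical choices for illustration.

```python
import math

def restricted_r(r_full, u):
    """Thorndike Case II: expected correlation after direct range restriction.

    r_full: correlation in the unrestricted group.
    u: ratio of restricted to unrestricted standard deviation (0 < u <= 1).
    """
    return r_full * u / math.sqrt(1 - r_full**2 + (r_full * u) ** 2)

# Hypothetical illustration: full-range loading 0.73, SD shrunk to 60 %.
print(round(restricted_r(0.73, 0.6), 2))  # 0.54
print(round(restricted_r(0.73, 1.0), 2))  # 0.73 (no restriction, no change)
```

So a drop from 0.73 to the 0.5 to 0.7 range seen in the table is roughly what restriction by itself could produce. That makes the table only a weak test of the hypothesis unless this effect is taken into account.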

Reliability

Split-half (odd-even) = 0.86
Split-half (other division) = 0.82
Cronbach's alpha = 0.84

Remark: These reliability coefficients are low for a stand-alone I.Q. test (.9 or higher is normal), but understandable given that the test consists exclusively of "bad" items previously removed from other tests.
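The page does not publish its formulas or item data. The sketch below shows the textbook definitions for orientation:

- Cronbach's alpha.
- An odd-even split-half correlation, stepped up to full test length with the Spearman-Brown formula.

Whether the published split-half figures include the Spearman-Brown correction is an assumption here. The toy matrix is invented purely for illustration.

```python
import math
import statistics

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(matrix):
    """matrix[c][i] = score of candidate c on item i."""
    k = len(matrix[0])
    item_vars = [statistics.pvariance([row[i] for row in matrix]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def split_half_odd_even(matrix):
    """Correlate odd- and even-numbered item halves, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Hypothetical toy data: 4 candidates x 3 dichotomous items.
TOY = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(TOY), 2))       # 0.75
print(round(split_half_odd_even(TOY), 2))  # 0.69
```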

Error

Standard error = 2.7 raw score points
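The page does not state how the standard error is obtained. The sketch below assumes the classical test theory formula SEM = SD * sqrt(1 - reliability), with Cronbach's alpha (0.84) as the reliability. Both the formula and that choice of coefficient are assumptions. It shows which raw-score SD the published 2.7 would correspond to, and an approximate 95 % band that assumes normally distributed errors. The function names are hypothetical.

```python
import math

def standard_error(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def implied_sd(sem, reliability):
    """Invert the formula: the SD that a given SEM and reliability imply."""
    return sem / math.sqrt(1 - reliability)

SEM = 2.7      # published standard error, raw score points
ALPHA = 0.84   # assumed reliability: Cronbach's alpha from this page

sd = implied_sd(SEM, ALPHA)
print(round(sd, 2))  # 6.75

# Approximate 95 % band around an observed raw score of 33 (the median).
low, high = 33 - 1.96 * SEM, 33 + 1.96 * SEM
print(round(low, 1), round(high, 1))  # 27.7 38.3
```

For comparison, the sample SD of the 64 scores in the histogram is roughly 6.6. With alpha = 0.84, that would give about 2.6, in the same neighbourhood as the published figure. The exact inputs behind 2.7 are not stated.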

Scores by age

Age class	n	median score
65 to 69	2	33.5
60 to 64	2	31.8
50 to 54	4	35.5
45 to 49	9	34.0
40 to 44	6	36.0
35 to 39	7	35.5
30 to 34	5	30.0
25 to 29	12	34.0
22 to 24	8	33.0
20 or 21	3	35.0
18 or 19	2	29.0
17	1	30.0
15	1	28.0
13	1	32.0

N = 63

Scores by year taken

Year taken	n	median score
2007	7	33.0
2008	8	37.0
2009	1	30.0
2010	4	29.0
2011	1	39.0
2012	4	35.5
2013	1	29.0
2014	4	35.0
2015	3	25.0
2016	1	40.5
2017	4	33.0
2018	5	37.0
2019	4	33.3
2020	9	31.0
2021	6	31.0
2022	2	35.0

r_{year taken × median score} = -0.00 (N = 64)

Robustness and overall test quality

Robustness by chronological rank = 0.74
Robustness by month = 0.74 (r_{raw scores × months} = -0.10)
Quality = 0.704

Item analysis

Item statistics are not published, as that would help future candidates. To detect bad items, candidates' answers and comments are studied, along with each item's correlation with total score and the proportion of candidates getting it wrong (the hardness of the item). Possibly bad items are normally removed or revised, resulting in a revised version of the test; in the present test, however, bad items are left in, as the test was purposely constructed out of bad items previously removed from other tests.

Nevertheless, about 60 of the 65 items display normal statistical behaviour in this test environment.
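The two item statistics described above are hardness and the correlation with total score. Both can be computed from a candidates-by-items score matrix. The sketch below uses the "corrected" item-rest variant, which leaves the item out of the total so that it does not correlate with itself. This is a common refinement, and whether it is what is used here is an assumption. The matrix and function names are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation; NaN if either list has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")

def item_report(matrix):
    """For each item: (hardness, corrected item-total correlation).

    hardness = proportion of candidates getting the item wrong.
    The correlation uses the total score without the item itself.
    """
    report = []
    for i in range(len(matrix[0])):
        item = [row[i] for row in matrix]
        rest = [sum(row) - row[i] for row in matrix]
        hardness = 1 - statistics.fmean(item)
        report.append((hardness, pearson(item, rest)))
    return report

# Hypothetical toy data: 5 candidates x 4 items (1 = right, 0 = wrong).
TOY = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
for number, (hardness, r_it) in enumerate(item_report(TOY), start=1):
    print(f"item {number}: hardness {hardness:.2f}, item-rest r {r_it:+.2f}")
```

In this toy matrix, item 4 gets a negative item-rest correlation. That is the kind of pattern that would normally flag an item as possibly bad.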
