Evens - statistics

Introduction

Evens consisted of 15 easy number series that were later included in The Bonsai Test - Revision 2016.

Scores on Evens as of 15 February 2023

Contents type: Numerical. Period: 2002-2003

n: 29
Median: 15.0
Quartile deviation: 0.0
Range: 9
Maximum possible: 15
Male median: 15.0 (n = 27)
Unknown sex median: 11.0 (n = 2)
Proportion of males among candidates: 0.931
Hardness: 0.04
Resolution: 0.00

7	*
13	***
14	**
15	***********************
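
The summary figures above can be re-derived from the score histogram. What follows is a minimal Python sketch, not the report's own procedure. The quartile method (Python's default "exclusive" method) and the reading of hardness as the mean proportion of items missed are assumptions. Note that a plain maximum-minus-minimum gives 8; the reported range of 9 would match an inclusive count of the score values from 7 to 15, which is again only an assumption.

```python
import statistics

# Score histogram from the report: raw score -> number of candidates.
histogram = {7: 1, 13: 3, 14: 2, 15: 23}
MAX_SCORE = 15

# Expand the histogram into a sorted list of individual scores.
scores = sorted(s for s, count in histogram.items() for _ in range(count))

n = len(scores)
median = statistics.median(scores)

# Quartile deviation = half the interquartile range.
# The quartile method is an assumption; the report does not state one.
q1, _, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

plain_range = max(scores) - min(scores)
inclusive_range = plain_range + 1  # assumption: both endpoints are counted

# Assumed reading of "hardness": mean proportion of items missed.
hardness = 1 - statistics.mean(scores) / MAX_SCORE

print(n, median, quartile_deviation, plain_range, inclusive_range, round(hardness, 2))
```

Because 23 of the 29 candidates reached the maximum, both quartiles coincide with the median, which is why the quartile deviation collapses to zero.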

Correlation of Evens with other mental ability tests

Test name	n	r
Qoymans Multiple-Choice #3	5	0.89
Qoymans Automatic Test #2	4	0.87
Tests by Greg Grove (aggregate)	7	0.71
Qoymans Multiple-Choice #2	10	0.67
Sigma Test (Melão Hindemburg)	5	0.63
International High IQ Society tests (aggregate)	5	0.60
Odds	6	0.55
Wechsler Adult Intelligence Scales	4	0.50
Qoymans Multiple-Choice #1	12	0.46
Cattell Culture Fair	5	0.41
Analogies of Long Test For Genius	6	0.41
Unknown and miscellaneous tests	13	0.41
Qoymans Multiple-Choice #4	6	0.40
Non-Verbal Cognitive Performance Examination (Xavier Jouve)	7	0.38
Numbers	8	0.38
Tests by Nicolas Elenas (aggregate)	5	0.36
Association subtest of Long Test For Genius	7	0.28
New York High I.Q. Society tests	4	0.20
Qoymans Multiple-Choice #5	5	0.06
The Final Test	9	0.05
Qoymans Automatic Test #1	4	0.00
Analogies #1	7	-0.04
Space, Time, and Hyperspace	12	-0.10
Reason Behind Multiple-Choice - Revision 2008	5	-0.26
Reason - Revision 2008	5	-0.38
Cartoons of Shock	5	-0.83
Raven's Advanced Progressive Matrices (I.Q.)	4	-0.90

Weighted average of correlations: 0.271 (N = 175)

Estimated g factor loading: 0.52
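
As a check on the two figures above, the following Python sketch recomputes an n-weighted mean of the tabulated correlations. It uses the rounded values shown in the table, whereas the report works from unrounded correlations, so small discrepancies would not be surprising. Taking the square root of the weighted mean is shown only because it happens to match the reported loading; this page does not state how the loading is derived, so treat that step as an assumption.

```python
import math

# (n, r) pairs copied from the correlation table above, in table order.
pairs = [
    (5, 0.89), (4, 0.87), (7, 0.71), (10, 0.67), (5, 0.63), (5, 0.60),
    (6, 0.55), (4, 0.50), (12, 0.46), (5, 0.41), (6, 0.41), (13, 0.41),
    (6, 0.40), (7, 0.38), (8, 0.38), (5, 0.36), (7, 0.28), (4, 0.20),
    (5, 0.06), (9, 0.05), (4, 0.00), (7, -0.04), (12, -0.10), (5, -0.26),
    (5, -0.38), (5, -0.83), (4, -0.90),
]

# Total number of score pairs across all listed tests.
total_n = sum(n for n, _ in pairs)

# Mean correlation, each test weighted by its number of score pairs.
weighted_mean_r = sum(n * r for n, r in pairs) / total_n

# Assumption: the square root of the weighted mean, shown for comparison
# with the reported g loading.
sqrt_of_mean = math.sqrt(weighted_mean_r)

print(total_n, round(weighted_mean_r, 3), round(sqrt_of_mean, 2))
```

Weighting by n keeps correlations based on only 4 or 5 pairs, such as the extreme values at the top and bottom of the table, from dominating the average.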

Ranking in the above table is based on the unrounded correlations. All available data are present in this table; no tests are left out except those with fewer than 4 score pairs. All known pairs are used, including possible floor/ceiling scores or outliers.

The correlations here appear lower than they really are because of the strong ceiling effect.

Estimated loadings of Evens on particular item types

These are estimated g factor loadings, but against homogeneous tests (containing only particular item types) as opposed to non-compound heterogeneous tests. Although tending to surprise the lay person, it is not uncommon for tests to have high loadings on item types they do not actually contain themselves. Such loadings reflect the empirical fact that most tests for mental abilities measure primarily g, regardless of their contents; that the major part of test score variance is caused by g, and only a minor part by factors germane to particular item types. It is of key importance to understand that this is a fact of nature, a natural phenomenon, and not something that was built into the tests by the test constructors.

Type	n	g loading of Evens on that type
Verbal	67	0.60
Numerical	14	0.67
Spatial	12	-0.32
Logical	5	-0.61
Heterogeneous	25	0.46

N = 123

Compound tests have been left out of this table to avoid overlap.

Balanced g loading = 0.16
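
The balanced figure is consistent with a simple unweighted mean of the five item-type loadings, in which each type counts equally regardless of its n. The Python sketch below illustrates this under that assumption (the report does not define "balanced" on this page) and contrasts it with an n-weighted mean.

```python
# Item type -> (n, loading), copied from the table above.
loadings = {
    "Verbal": (67, 0.60),
    "Numerical": (14, 0.67),
    "Spatial": (12, -0.32),
    "Logical": (5, -0.61),
    "Heterogeneous": (25, 0.46),
}

total_n = sum(n for n, _ in loadings.values())

# Assumed definition: every item type contributes equally.
balanced = sum(g for _, g in loadings.values()) / len(loadings)

# For contrast: each type weighted by its number of score pairs.
weighted = sum(n * g for n, g in loadings.values()) / total_n

print(total_n, round(balanced, 2), round(weighted, 2))
```

The two means differ considerably here because the Verbal category holds more than half of the score pairs, so an n-weighted mean is pulled toward its loading.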

National medians for Evens

Country	n	median score
Sweden	3	15.0
United States	3	15.0

For reasons of privacy, only countries with 3 or more candidates are included in this table. Ranking is based on the medians, and then alphabetical.

Correlation of Evens with national I.Q.'s

Correlation of this test with national average I.Q.'s published by Lynn and Vanhanen:

r = 0.60 (n = 19)

Correlation of Evens with personal details

Personalia	n	r
Disorders (parents and siblings)	10	0.33
Disorders (own)	10	0.25
Educational level	10	0.09
Year of birth	23	-0.01
Father's educational level	9	-0.71
Mother's educational level	9	-0.82

Estimated g factor loadings for restricted ranges

The number of score pairs on which each estimated g factor loading is based is given in parentheses. The goal is to test the hypothesis that g becomes less important, accounting for a smaller proportion of the variance, at higher I.Q. levels. Restricting the range in this way also depresses the g loading compared to computing it over the test's full range, so it would be normal for these values to be lower than the test's full-range g loading.

Below 1st quartile	0.46 (193)
Below median	0.46 (193)
Above median	NaN (0)
Above 3rd quartile	NaN (0)

Reliability

Split-half (odd-even) = 0.78
Split-half (1, 2… 5, 6… vs. 3, 4… 7, 8…) = 0.85
Cronbach's alpha = 0.85
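
For readers unfamiliar with these coefficients, the Python sketch below shows how a split-half reliability and Cronbach's alpha are conventionally computed from a matrix of item scores. The item data for Evens are not published, so the small `responses` matrix is entirely hypothetical, and the use of the Spearman-Brown correction for the split-half value is an assumption about the method rather than something stated in this report.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half(rows, half_a, half_b):
    """Correlate two half-test totals, then apply Spearman-Brown."""
    a = [sum(row[i] for i in half_a) for row in rows]
    b = [sum(row[i] for i in half_b) for row in rows]
    r = pearson_r(a, b)
    return 2 * r / (1 + r)

def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(rows[0])
    item_vars = [statistics.pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 6 candidates x 4 items, 1 = right, 0 = wrong.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Odd-even split: items 1 and 3 against items 2 and 4 (0-based indices).
odd_even = split_half(responses, half_a=[0, 2], half_b=[1, 3])
alpha = cronbach_alpha(responses)

print(round(odd_even, 2), round(alpha, 2))
```

Different ways of splitting the items generally give different split-half values, which is why the report lists two splits alongside alpha.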

Error

Standard error = 0.6 raw score points
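
The standard error is consistent with the classical test theory formula SD × sqrt(1 - reliability). The Python sketch below applies that formula to the score histogram from this report, using the reported Cronbach's alpha of 0.85 as the reliability. Which reliability coefficient and which standard deviation (population or sample) the report actually uses is not stated, so both choices here are assumptions.

```python
import math
import statistics

# Score histogram from the report: raw score -> number of candidates.
histogram = {7: 1, 13: 3, 14: 2, 15: 23}
scores = [s for s, count in histogram.items() for _ in range(count)]

# Assumption: reliability taken to be the reported Cronbach's alpha.
reliability = 0.85

# Assumption: population standard deviation of the raw scores.
sd = statistics.pstdev(scores)

# Classical test theory: standard error of measurement.
standard_error = sd * math.sqrt(1 - reliability)

print(round(sd, 2), round(standard_error, 1))
```

Using the sample standard deviation (`statistics.stdev`) instead changes the result only slightly and still rounds to the same single decimal.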

Scores by age

Age class	n	Median score
40 to 44	1	15.0
35 to 39	3	15.0
30 to 34	4	15.0
25 to 29	6	15.0
22 to 24	4	15.0
20 or 21	2	14.0
18 or 19	2	15.0
17	1	15.0

N = 23

Scores by year taken

Year taken	n	median score
2002	27	15.0
2003	2	15.0

r_{year taken × median score} = NaN (N = 29)

Robustness and overall test quality

Robustness by month = 0.67 (r_{raw scores × months} = 0.09)
Quality = 0.495

Item analysis

Item statistics are not published as that would help candidates. To detect bad items, answers and comments from candidates are studied, as well as, for each problem, the correlation with total score on the remaining problems (item-rest correlation) and the proportion of candidates getting it wrong (hardness of the item). Possible bad items are revised, replaced, or removed, possibly resulting in a revised version of the test.
