The Speak Assessment is a computerized, adaptive test of receptive and productive language ability. It reports an overall CEFR level and a composite score from 20 to 120, derived from the CEFR scores of its five scored parameters (Vocabulary, Grammar, Phonology, Fluency, and Cohesion). One of its intended uses is higher education admissions and placement into college or preparatory English language programs, so it is important to ascertain whether the test is valid for this purpose. One way to assess validity is concurrent validity: how well the test's results agree with those of other, established tests that measure similar constructs.
In Israel, the English language exams commonly used for admissions and placement are the Amir, Amiram, and Psychometric tests. All three measure receptive English proficiency, specifically for academic contexts. The Amir and Psychometric tests are paper-and-pencil tests; the Amiram is a computerized, adaptive version of the Amir, and the Psychometric test also has a computerized version. All three tests measure English proficiency through multiple-choice questions of three types: sentence completion, restatement, and reading comprehension. In effect, these tests use reading comprehension and vocabulary as measures of English proficiency. All three are scored on a scale of 50-150, are produced by the Israeli National Institute for Testing and Evaluation (NITE), and are highly correlated with one another (NITE, 2021).
Although the Speak Assessment is primarily a test of productive language ability (the ability to produce language), and the Amir, Amiram, and Psychometric exams test receptive language ability (the ability to understand language), these proficiencies are related. Although most people have stronger receptive than productive skills, the two types of proficiency are correlated: students who are stronger in production also tend to receive higher scores in reception. One would therefore expect that, if a test is valid, it would rank students similarly to another, established test. This study examines the relationship of the Speak Assessment of Spoken English with the Amir, Amiram, and Psychometric tests.
Students applying to an English program at a college in Israel (n=51) took the Speak Assessment as part of their admissions process. The Amir/Amiram/Psychometric scores were compiled by the college and reported to Speak with identifying information removed.
The scores of the two assessments were compared with each other. The Amir/Amiram/Psychometric tests are scored on a scale of 50-150. The Speak Assessment yields an overall score of 20-120, and each test taker also receives an overall CEFR level (A1, A2, B1, B2, C1, C2, from lowest to highest). Pearson correlations were used to compare the overall scores of the two tests, because both scores are continuous. Spearman correlations were used to compare the Amir/Amiram/Psychometric scores with the CEFR level obtained on the Speak Assessment, because the CEFR levels are ordinal.
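The choice of correlation statistic can be sketched as follows. This is a minimal illustration using `scipy.stats`; the score values below are made-up example data, not the study's data (the study's analyses were run in jamovi), and the CEFR-to-rank mapping simply encodes the ordering A1 < A2 < B1 < B2 < C1 < C2.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired scores for six applicants (illustrative only)
speak_overall = [45, 62, 71, 88, 95, 104]     # Speak scale: 20-120
amir_scores = [74, 93, 102, 121, 128, 139]    # Amir/Amiram/Psychometric scale: 50-150

# CEFR levels are ordinal, so encode them as ordered ranks for Spearman
cefr_order = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
cefr_levels = ["A2", "B1", "B1", "B2", "C1", "C1"]
cefr_ranks = [cefr_order[level] for level in cefr_levels]

# Pearson: appropriate for two continuous scales
r, p_r = pearsonr(speak_overall, amir_scores)

# Spearman: appropriate when one variable is ordinal (CEFR level)
rho, p_rho = spearmanr(cefr_ranks, amir_scores)

print(f"Pearson r = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Spearman operates on ranks rather than raw values, so it requires only that the CEFR levels be ordered, not that the distances between adjacent levels be equal.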
Speak Overall Scores
Figure 1 Relationship between Speak Overall Score and Amir/Amiram/Psychometric Score
|                                |             | Speak Overall Score | Amir/Amiram/Psychometric score |
|--------------------------------|-------------|---------------------|--------------------------------|
| Speak Overall Score            | Pearson's r | —                   |                                |
| Amir/Amiram/Psychometric score | Pearson's r | 0.947               | —                              |
Figure 2 Relationship between CEFR level and Amir/Amiram/Psychometric score
|                  |                | Amir/Amiram/Psychometric score | Speak CEFR level |
|------------------|----------------|--------------------------------|------------------|
| Speak CEFR level | Spearman's rho | 0.965                          | —                |
Both the overall score and the CEFR level obtained on the Speak Assessment were highly correlated with the scores of the Amir/Amiram/Psychometric tests (r = 0.947, p < .001, and rho = 0.965, p < .001). This suggests that the Speak Assessment can be used for the same purposes as these exams, namely admissions to and placement in Israeli colleges and universities.
References

Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests. Language Testing, 27(3), 355-377. https://doi.org/10.1177/0265532210364404

Deygers, B. (2018). University entrance language tests: Examining assumed equivalence. In J. McE. Davis (Ed.), Useful assessment and evaluation in language education (pp. 169-184). Georgetown University Press.

Falissard, B. (2012). psy: Various procedures used in psychometry [R package].

Gallucci, M. (2019). GAMLj: General analyses for linear models [jamovi module]. Retrieved from https://gamlj.github.io/

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various coefficients of interrater reliability and agreement [R package].

Kennet-Cohen, T., Bronner, S., & Oren, C. (1999). The predictive validity of the components of the process of selection of candidates for higher education in Israel. Research Reports, National Institute for Testing & Evaluation (nite.org.il).

Seol, H. (2020). seolmatrix: Correlations suite for jamovi [jamovi module]. Retrieved from https://github.com/hyunsooseol/seolmatrix/

Sireci, S. G., & Faulkner-Bond, M. (2015). Promoting validity in the assessment of English learners. Review of Research in Education, 39, 215-252.

The jamovi project (2020). jamovi (Version 1.2) [Computer software]. Retrieved from https://www.jamovi.org