Research Guides

Testing/Assessment



Subject Encyclopedia Articles


Handbook Chapters

  • Chalhoub-Deville, M., & Deville, C. (2005). A look back at and forward to what language testers measure. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 815–831). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Chapelle, C. (2011). Validation in language assessment. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 717–730). New York, NY: Routledge.
  • Davies, A., & Elder, C. (2005). Validity and validation in language testing. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 795–813). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Douglas, D. (2005). Testing languages for specific purposes. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 857–868). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Jamieson, J. (2011). Assessment of classroom language learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 768–785). New York, NY: Routledge.
  • Kunnan, A. J. (2005). Language assessment from a wider context. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 779–794). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Leung, C. (2005). Classroom teacher assessment of second language development: Construct as practice. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 869–888). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Lumley, T., & Brown, A. (2005). Research methods in language testing. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 833–855). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Ross, S. (2011). The social and political tensions of language assessment. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 786–797). New York, NY: Routledge.
  • Schoonen, R. (2011). How language ability is assessed. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 701–716). New York, NY: Routledge.
  • Young, R. (2011). Interactional competence in language learning, teaching, and testing. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 426–443). New York, NY: Routledge.

Review Articles


Introductory Books

  • Davidson, F., & Lynch, B. K. (2008). Testcraft: A teacher’s guide to writing and using language test specifications. London: Yale University Press.
  • Edge, J. (1992). Mistakes and correction. New York, NY: Longman.
  • Ellis, R., & Barkhuizen, G. (2005). Analyzing learner language. Oxford, England: Oxford University Press.
  • Finch, A., & Shin, D. (2005). Integrating teaching and assessment in the EFL classroom: A practical guide for teachers in Korea. Seoul: Sahoipyungnon.
  • Fulcher, G., & Davidson, F. (Eds.). (2012). The Routledge handbook of language testing. Oxon, England: Routledge.
  • Hughes, A. (2007). Testing for language teachers. Cambridge, England: Cambridge University Press.
  • Mackey, A. (2008). Conversational interaction in second language acquisition. Oxford, England: Oxford University Press.
  • Mackey, A. (2012). Input, interaction and corrective feedback in L2 learning. Oxford, England: Oxford University Press.
  • Pawlak, M. (2014). Error correction in the foreign language classroom: Reconsidering the issues. Berlin, Germany: Springer.
  • Sheen, Y. (2011). Corrective feedback, individual differences and second language learning. New York, NY: Springer.
  • Swan, M., & Smith, B. (2001). Learner English: A teacher’s guide to interference and other problems (2nd ed.). Cambridge, England: Cambridge University Press.
  • 이완기 [Lee, W.-K.]. (2012). 영어 평가 방법론 [English assessment methodology] (3rd rev. ed.). Seoul: 문진미디어.

Recent Articles in IGSE Journals


  • Aryadoust, V., & Zhang, L. (2016). Fitting the mixed Rasch model to a reading comprehension test: Exploring individual difference profiles in L2 reading. Language Testing, 33(4), 529–553.
  • Attali, Y. (2016). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99–115.
  • Babaii, E., Taghaddomi, S., & Pashmforoosh, R. (2016). Speaking self-assessment: Mismatches between learners’ and teachers’ criteria. Language Testing, 33(3), 411–437.
  • Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 1–31.
  • Bocci, M. C. (2016). Youth participatory action research in world language classrooms. Foreign Language Annals, 49(3), 455–478.
  • Bridgeman, B., Cho, Y., & DiPietro, S. (2016). Predicting grades from an English language assessment: The importance of peeling the onion. Language Testing, 33(3), 307–318.
  • Byun, J.-H. (2016). Investigating the suitability of vocabulary strength test framework to EFL context. 외국어교육/Foreign Languages Education, 23(1), 133–168.
  • Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52.
  • Chalhoub-Deville, M. (2016). Validity theory: Reform policies, accountability testing, and consequences. Language Testing, 33(4), 453–472.
  • Chen, F., & Chalhoub-Deville, M. (2016). Differential and long-term language impact on math. Language Testing, 33(4), 577–605.
  • Chon, Y. V., & Kim, S. (2016). Adolescent EFL learners’ perceived use of listening test-taking strategies and L2 proficiency. 영어어문교육/English Language & Literature Teaching, 22(2), 1–22.
  • Clifford, R. (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49(2), 224–234.
  • Crossley, S. A., Kyle, K., & McNamara, D. S. (2016). The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality. Journal of Second Language Writing, 32, 1–16.
  • Davin, K. J., & Heineke, A. J. (2016). Preparing teachers for language assessment: A practice-based approach. TESOL Journal, 7(4), 921–938.
  • Davis, J. M. (2016). Toward a capacity framework for useful student learning outcomes assessment in college foreign language programs. The Modern Language Journal, 100(1), 377–399.
  • Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117–135.
  • Dolosic, H. N., Brantmeier, C., Strube, M., & Hogrebe, M. C. (2016). Living language: Self-assessment, oral production, and domestic immersion. Foreign Language Annals, 49(2), 302–316.
  • Elder, C., & McNamara, T. (2016). The hunt for “indigenous criteria” in assessing communication in the physiotherapy workplace. Language Testing, 33(2), 153–174.
  • Foster, P., & Wigglesworth, G. (2016). Capturing accuracy in second language performance: The case for a weighted clause ratio. Annual Review of Applied Linguistics, 36, 98–116.
  • Han, S.-M., & Lee, Y.-W. (2016). Developing a collocation test for Korean EFL learners: Test validation and item format effect. 한국영어학/Korean Journal of English Language and Linguistics, 16(2), 221–244.
  • Harsch, C., & Hartig, J. (2016). Comparing C-tests and Yes/No vocabulary size tests as predictors of receptive language skills. Language Testing, 33(4), 555–575.
  • Hsu, T. H.-L. (2016). Removing bias towards World Englishes: The development of a Rater Attitude Instrument using Indian English as a stimulus. Language Testing, 33(3), 367–389.
  • Huang, H.-T. D. (2016). Exploring strategy use in L2 speaking assessment. System, 63, 13–27.
  • In’nami, Y., & Koizumi, R. (2016). Task and rater effects in L2 speaking and writing: A synthesis of generalizability studies. Language Testing, 33(3), 341–366.
  • Jang, S. Y., & Kim, T. (2016). 래쉬 모형을 사용한 표준화 시험의 타당성 평가: 시·도 교육청 주관 중학생 영어듣기시험을 중심으로 [Exploring validity of the nationwide standardized English listening tests using the Rasch measurement model]. 글로벌영어교육학회/Studies in English Education, 21(2), 115–145.
  • Kang, H. D. (2016). Development of creativity test items for English education in the EFL context. 초등영어교육/Primary English Education, 22(3), 5–33.
  • Kaplan, C. S. (2016). Alignment of world language standards and assessments: A multiple case study. Foreign Language Annals, 49(3), 502–529.
  • Kim, H.-D. (2016). 컴퓨터 기반 그림 묘사하기 시험 문항의 수행평가 실용도 [Practicality of computer-based picture description test items for performance assessment]. 영상영어교육/STEM Journal, 17(1), 221–239.
  • Kim, H., & Lee, H. (2016). 대학수학능력시험 영어 영역 화행 분석: 대화 듣기 자료를 중심으로 [Speech acts in the CSAT English conversational materials]. 글로벌영어교육학회/Studies in English Education, 21(3), 115–141.
  • Kim, H.-J. (2016). Peer assessment in a university writing class. 외국어교육/Foreign Languages Education, 23(1), 47–65.
  • Kim, J. S. (2016). 초등영어 수업에서의 문화 간 의사소통능력 평가 틀과 평가도구 개발 방향 연구 [A study on the direction of developing the assessment framework and tool for intercultural communicative competence in primary English classrooms]. 초등영어교육/Primary English Education, 22(2), 187–214.
  • Kim, N. (2016). Korean test takers’ TOEIC-Speaking and OPIc test preparation. 응용언어학/Korean Journal of Applied Linguistics, 32(3), 51–76.
  • Kim, Y., Tracy-Ventura, N., & Jung, Y. (2016). A measure of proficiency or short-term memory? Validation of an elicited imitation test for SLA research. The Modern Language Journal, 100(3), 655–673.
  • Kissau, S., & Adams, M. J. (2016). Instructional decision making and IPAs: Assessing the modes of communication. Foreign Language Annals, 49(1), 105–123.
  • Ko, H. (2016). Difficulty levels and characteristics of TOEFL listening items for Korean EFL learners. 영어어문교육/English Language & Literature Teaching, 22(3), 97–126.
  • Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33(3), 319–340.
  • Lee, H. (2016). 이분식 영어 쓰기 평가 문항 난이도 및 변별도 분석: 고전검사이론과 문항반응이론을 활용하여 [Analyzing item difficulty and discrimination in a dichotomously scored writing test: Focus on classical test theory and item response theory]. 글로벌영어교육학회/Studies in English Education, 21(3), 235–259.
  • Lee, H. (2016). 기자회견식 영어토론 상황에서 동료간 상호작용 분석: 평가적 관점에서 [Analyzing peer-to-peer interaction features in the media-conference English debate: From the perspective of speaking assessment]. 영어어문교육/English Language & Literature Teaching, 22(2), 237–258.
  • Lee, I. (2016). Putting students at the centre of classroom L2 writing assessment. Canadian Modern Language Review, 72(2), 258–280.
  • Lee, J. W. (2016). The role of vocabulary and grammar in different L2 reading comprehension measures. 영어교육/English Teaching, 71(3), 79–97.
  • Lee, Y. (2016). Investigating the feasibility of generic scoring models of e-rater for TOEFL iBT independent writing tasks. 영어교육연구/English Language Teaching, 28(1), 101–122.
  • Lee, Y.-W., Chodorow, M., & Gentile, C. (2016). Investigating patterns of writing errors for different L1 groups through error-coded ESL learners’ essays. 외국어교육/Foreign Languages Education, 23(1), 169–190.
  • Li, H., Hunter, C. V., & Lei, P.-W. (2016). The selection of cognitive diagnostic models for a reading comprehension test. Language Testing, 33(3), 391–409.
  • Macqueen, S., Pill, J., & Knoch, U. (2016). Language test as boundary object: Perspectives from test users in the healthcare domain. Language Testing, 33(2), 271–288.
  • Manias, E., & McNamara, T. (2016). Standard setting in specific-purpose language testing: What can a qualitative study add? Language Testing, 33(2), 235–249.
  • Mann, W., Roy, P., & Morgan, G. (2016). Adaptation of a vocabulary test from British Sign Language to American Sign Language. Language Testing, 33(1), 3–22.
  • Martel, J., & Bailey, K. M. (2016). Exploring the trajectory of an educational innovation: Instructors’ attitudes toward IPA implementation in a postsecondary intensive summer language program. Foreign Language Annals, 49(3), 530–543.
  • McNamara, T., Van Den Hazelkamp, C., & Verrips, M. (2016). LADO as a language test: Issues of validity. Applied Linguistics, 37(2), 262–283.
  • Nix, J.-M. L. (2016). Measuring latent listening strategies: Development and validation of the EFL listening strategy inventory. System, 57, 79–97.
  • Norris, J. M. (2016). Current uses for task-based language assessment. Annual Review of Applied Linguistics, 36, 230–244.
  • Ockey, G. J., & French, R. (2016). From one to multiple accents on a test of L2 listening comprehension. Applied Linguistics, 37(5), 1–24.
  • O’Hagan, S., Pill, J., & Zhang, Y. (2016). Extending the scope of speaking assessment criteria in a specific-purpose language test: Operationalizing a health professional perspective. Language Testing, 33(2), 195–216.
  • Park, Y.-J., & Kim, S.-Y. (2016). 초등영어 수행평가에서 루브릭과 피드백이 학습자의 성취도, 자신감, 자율성에 미치는 영향 [Scoring rubric and teacher feedback in English performance assessment: Their effects on primary school students’ achievement, confidence, and autonomy]. 초등영어교육/Primary English Education, 22(1), 97–111.
  • Pill, J. (2016). Drawing on indigenous criteria for more authentic assessment in a specific-purpose language test: Health professionals interacting with patients. Language Testing, 33(2), 175–193.
  • Pill, J., & McNamara, T. (2016). How much is enough? Involving occupational experts in setting standards on a specific-purpose language test for health professionals. Language Testing, 33(2), 217–234.
  • Piper, B., & Zuilkowski, S. S. (2016). The role of timing in assessing oral reading fluency and comprehension in Kenya. Language Testing, 33(1), 75–98.
  • Prefontaine, Y., Kormos, J., & Johnson, D. E. (2016). How do utterance measures predict raters’ perceptions of fluency in French as a second language? Language Testing, 33(1), 53–73.
  • Russell, V., & Davidson Devall, K. F. (2016). An examination of the edTPA portfolio assessment and other measures of teacher preparation and readiness. Foreign Language Annals, 49(3), 479–501.
  • Schiefele, U., & Schaffner, E. (2016). Factorial and construct validity of a new instrument for the assessment of reading motivation. Reading Research Quarterly, 51(2), 221–237.
  • Song, B. C., & Lee, H. (2016). Interactional competence in a paired speaking test. 영어교육연구/English Language Teaching, 28(2), 133–152.
  • Thompson, G. L., Cox, T. L., & Knapp, N. (2016). Comparing the OPI and the OPIc: The effect of test method on oral proficiency scores and student preference. Foreign Language Annals, 49(1), 75–92.
  • Weaver, C. (2016). The TOEIC IP test as a placement test: Its potential formative value. JALT Journal, 38(1), 5–25.
  • Wette, R., & Hawken, S. J. (2016). Measuring gains in an EMP course and the perspectives of language and medical educators as assessors. English for Specific Purposes, 42, 38–49.
  • Wicking, P. (2016). The role of formative assessment in global human resource development. JALT Journal, 38(1), 27–43.
  • Woodward-Kron, R., & Elder, C. (2016). A comparative discourse study of simulated clinical roleplays in two assessment contexts: Validating a specific-purpose language test. Language Testing, 33(2), 251–270.
  • Yan, X., Maeda, Y., Lv, J., & Ginther, A. (2016). Elicited imitation as a measure of second language proficiency: A narrative review and meta-analysis. Language Testing, 33(4), 497–528.
  • Yang, J., & Lee, H. (2016). Exploring students’ voices in the development of learner-generated rubrics for EFL debate assessment. 한국영어학/Korean Journal of English Language and Linguistics, 16(3), 537–562.
  • Yook, C., & Chin, C. (2016). Culture teaching, world Englishes, and the college scholastic aptitude test. 영어교육연구/English Language Teaching, 28(2), 153–169.
  • Zapata, G. C. (2016). University students’ perceptions of integrated performance assessment and the connection between classroom learning and assessment. Foreign Language Annals, 49(1), 93–104.