04 Sep 2014

Lessons From Chengdu: The Case For ‘Open-Source’ Learning Metric Methods

By Trey Menefee, Hong Kong Institute of Education.


A few years ago I tried to learn a research technique called Agent-Based Modeling (ABM), which is used to understand complex adaptive systems. The learning curve for ABM was steep because it required researchers to learn how to code. It was made a little less steep by a website dedicated to “improving the way we develop, share, and utilize” these models. The ethos of the website was that people could share the code for their models under a Creative Commons license, describe how the models worked and what the known issues were, and open them to critique and improvement from a knowledgeable user community. Arizona State University and the Santa Fe Institute managed the website, ensuring a degree of quality control.

During my involvement with the Learning Metrics Task Force (LMTF) I came to see that the open-ended consultative style that Brookings and UNESCO UIS employed created something of a Rorschach test. Nearly everyone exposed to the LMTF’s work saw something different in the inkblot. Among other things, what I saw was the possibility of a similar website for learning metric methodologies. It would be a place where discussions continue and methods evolve, rather than having the settled finality of PISA. It would be free and have effectively no barriers to entry. Its users would be people who want to deploy these methodologies, who may need to make trade-offs between cost and rigor, and who may need help seeing the strengths and weaknesses of different assessment methodologies. Users would share both results and methods.

Rebecca Winthrop of Brookings recently wrote that assessments should be seen as a “public good”, and that “measures for the indicators recommended for global tracking must be considered a public good, with tools, documentation and data made freely available.” By definition, a proprietary good for sale on the market isn’t a ‘public good’. The need for a non-proprietary and accessible toolkit like this was recently made apparent to me during a project working with migrant children in Chengdu. To condense a very complex issue: migrant children in Chinese cities are denied the right to an education in urban public schools. Some can gain access, others enroll in low-cost private schools, and more than sixty million simply stay behind with relatives while their parents go off to work. Huizhi Social Work Service Center, an NGO that works with urban migrant populations, wanted to create a program to help some of these urban students. After consulting with stakeholders, they identified English as the subject most in need of reform.

English is required for the middle school entrance exam (the Zhongkao), a make-or-break test that determines who gets to study for the National Entrance Exam (the Gaokao), which in turn determines higher education opportunities. My task of assisting them with baseline observations and analysis was made difficult for two reasons: (a) comparative assessment methodologies for this demographic are in short supply, and (b) the context around students’ performance was political, and the data needed to be approached from this angle. The first issue is discussed in this post and the second will be addressed in a second post on NORRAG Blog.

For the most part, English assessment is a business. While the Common European Framework of Reference (CEFR) provides an open and internationally comparable standard for assessing language proficiency, nearly all the tools for doing the assessment are either context-specific (e.g., for a school district) or proprietary (e.g., TOEFL, TOEIC, IELTS). The internationally comparable assessments are services for sale, not methods for researchers and organizations to use in contexts of poverty and inequality. That these tests have been developed for the market rather than for the needs of educational planning or evaluation has warped the assessments themselves. The market is almost exclusively focused on the upper end and occupied with issues like international university admissions. There are a dozen good options for assessing whether or not a Beijing adult speaks English well enough for an American graduate school, but there was little of relevance to the ten-year-old son of a Chengdu construction worker.

For me, this is the most compelling case for the LMTF: there was nowhere for Huizhi to find a high-quality, internationally comparable assessment for their own staff to deploy. I was stuck in a maze of proprietary vendors. The English assessment experts I asked could think of no alternatives; they usually designed their own assessments for specific purposes. At present, there are few good places to look for the tools for the sort of ‘wake up’ Dr. Banerji called for on NORRAG Blog in an August 2014 post. TIMSS-PIRLS offers one open-ish model, though it’s arguably centralized, ‘one-size-fits-all’, and discourages innovation or adaptation to other learning subjects like English as a Foreign Language. The students already had such a purpose-built assessment in the Zhongkao, though its value in assessing actual English communicative competency is questionable (this is discussed in the next post on this blog). Ultimately I used a sample test for the TOEFL Junior, one of the few proprietary instruments developed for primary and secondary students, provided on the ETS website.

We collected data in ninth-grade classes in five schools in Chengdu. In my interpretation, the results showed that the vast majority of students were almost impossibly far behind in their English. On average, only one in twenty students crossed a threshold that could be considered lower-intermediate level (CEFR B1 equivalent, where they ‘can understand the main points of clear standard input on familiar matters regularly encountered in work, school, or leisure’). In some schools, almost half of the students told us that they found the assessment so difficult that they randomly guessed at answers without trying. These schools were of systemically low quality, with high teacher turnover, crowded classrooms, and occasionally violent classroom management techniques. Using estimates of tertiary enrollment rates for this demographic, I estimated that Huizhi’s plan for increasing scores by 5% might send only six additional children out of 500 to university. The political contexts of what these results mean are explored in the next post on NORRAG Blog, Lessons From Chengdu: The Political Economy of Learning Metrics.

The ease of deploying the TOEFL Junior practice test, despite the ambiguous legal propriety of using ETS’s copyrighted material, allowed one of my students, Li Yixing, to conduct her own survey in Shanghai at a cost of only her time. Her preliminary findings suggested that similar students were doing significantly better than those I met in Chengdu, despite being two years younger. While I have doubts about whether migrant English scores in Chengdu can be raised effectively, these Shanghai metrics offer a new window into understanding what Shanghai is doing differently, how comparable the groupings really are, and ultimately whether there are lessons from either Shanghai or Chengdu that might be useful for groups like Huizhi. Were we to find a better assessment and share its methodology with other organizations and researchers, there is hope that we could create a more nuanced statistical snapshot of educational inequalities within China than anything that has yet been produced.

Trey Menefee is Lecturer at the Hong Kong Institute of Education and is currently working on the primary statistical report for the 19th Conference of Commonwealth Education Ministers, which focuses on educational quality and inequality.
