Big Data/Analytics in Teacher Compensation
In 1790 the Pennsylvania state constitution called for publicly funded education for poor students. The assumption was that wealthier families would fund their own children's education, and that it would be superior to the public offering. A decade before that, Thomas Jefferson had proposed a two-track education system, one that provided separate tracks for the "laborers and the learned" and was designed, in Jefferson's own words, to "rake a few geniuses from the rubbish." Mind you, this was in the time of slavery and indentured servitude, so the assumption that students had finite and largely low-level capabilities, and that this might warrant their station in life, wasn't the least bit controversial. Coincidentally, this aligned with the prejudice that only landowners should be allowed a vote in the new world. It was only with the eventual adoption of publicly funded education across the country and, in the 1940s, the introduction of compulsory year-round secondary education and college scholarships for returning WWII GIs that America's education policies aligned with an economic growth strategy to produce a burgeoning middle class.
In the 1970s government leaders began exploring how to measure the effectiveness of K-12 instructors. Presumably the massive differences in regional and school-by-school student performance were clearly visible by this point. In Kevin Carey's May 19th New York Times article, "The Little-Known Statistician Who Taught Us to Measure Teachers," he introduces William Sanders, a Tennessee statistician with a doctorate in statistics and quantitative genetics, who proposed that standardized testing paired with state-determined student 'growth trajectory' goals could together be harnessed to determine the effectiveness of a student's teachers year by year. The article glosses over how Sanders accounts for talent, wealth, and home life, suggesting that those factors are 'baked in' to a student's expected growth and insinuating that state officials would somehow accurately factor how these variables should affect a student's expected performance into their own standards metrics. This assumption alone raises a number of concerns about the work. Do local officials set lower expectations for student performance in lower-income communities? How does one control for talent in setting performance expectations across the population, or account for outside educational assistance, which might make a teacher appear more successful when in fact a student's family could simply afford to hire outside tutors?
Then, as now, the politics around teacher compensation and employment were hotly debated. Years of experience and the attainment of advanced degrees have been stalwart criteria for determining teacher pay and tenure, a practice long cherished by state and national teacher unions. However, Sanders' algorithms suggested that neither pay nor years of experience ensured strong teacher performance. What he found was that the best teachers far out-performed the lowest performers, but that most fell in the middle. Soon state and local officials began using his study to identify strong performers and to hire or retain more like them, while removing low-rated instructors. In the midst of the educational policy firestorm that ensued, Carey shares that Sanders "made no apologies for the fact that his methods were too complex for most of the teachers whose jobs depended on them to understand." This is a red herring, because most administrators and politicians can be assumed to have no deeper an understanding of the statistical methods than the teachers, and if your leadership cannot tell you how your performance variables are being weighted, how are you to know, or be expected, to improve? Sanders' methodology, known as the value-added approach, suggested that individual teachers alone had the capacity to impact student performance. By the 2000s reform policies focused on standardized testing as a cornerstone criterion of teacher evaluations, driving the federal government and societal power players like Bill Gates to invest millions in furthering the development of teacher-effectiveness measures. Can there be a more difficult task than quantifying the value of a teacher's soft skills and weighing those criteria against the dozens of other variables that impact student performance?
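To make the critique concrete, here is a deliberately simplified sketch of the arithmetic at the heart of a value-added score. This is a toy illustration under my own assumptions, not Sanders' actual mixed-effects model: it treats a teacher's "value added" as nothing more than the average gap between each student's actual score and the state's expected score (prior score plus an expected-growth target).

```python
# Toy value-added calculation (an illustrative simplification,
# NOT Sanders' actual statistical methodology).
# Each student is (prior_score, actual_score, expected_growth);
# the teacher's score is the mean of actual-minus-expected gains.

def value_added(students):
    """Return the mean residual of actual scores over expected scores."""
    residuals = [actual - (prior + expected)
                 for prior, actual, expected in students]
    return sum(residuals) / len(residuals)

# Hypothetical roster of three students, each expected to grow 5 points.
roster = [(70, 78, 5), (85, 88, 5), (60, 67, 5)]
print(value_added(roster))  # prints 1.0 — one point above expectation
```

Even this toy version exposes the objection above: the entire result hinges on the `expected_growth` numbers, and whoever sets those targets silently decides which teachers look effective.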
Sanders' value-added methodology is at the center of the Common Core debate, taking heat from both the left and the right wings of education reform. Notably, in 2014 the American Statistical Association issued a statement that put significant distance between itself and Sanders' value-added methodology, suggesting that the high year-to-year variability in results raised questions about the reliability of the methodology. The practice, however, is already deeply ingrained in high-stakes employment decisions for professional educators, and unlikely to be entirely scrapped.
Aside from my already-stated concerns about gaps in Sanders' methods, he appears to be making numerous erroneous assumptions. First, that testing accurately measures all students' performance. I have read many times over the years that standardized student testing has been slanted for cultural relevance (read: white and affluent), and unfairly discriminates against those outside the bubble. Notably, Carl Brigham, the originator of the SAT, published research in the 1920s purporting to prove that immigrants weren't as intelligent as others; they were, in his words, "feeble-minded." Not exactly the mindset you want steering the grading metrics for college admittance. Second, Sanders and his partners assume that children can be expected to learn at the same pace as one another. If Johnny demonstrates 15% improvement this year, James should be able to as well. Child development experts now suggest the opposite. As with physical growth, some kids will experience intellectual growth spurts that may not align with their peers', nor can they be expected to achieve long-term academic parity simply because they sit in the same class, school, or district. Even within a family, a child whose parents expect them to display the same strengths as a high-performing sibling is likely to experience long-lasting resentment of those expectations. This leads to my third point: children are wired for different expertise, and as the old Harvard joke goes, "The A students rule the library, the C+ students rule the world." Ultimately the goal of education is preparation for life and career, and that's not always easy to test, particularly not with a narrowly tailored standardized test. Fourth, Sanders appears to downplay the importance of outside influence rather than control for it. J. D. Vance, in his book "Hillbilly Elegy," talks about how his home life disastrously influenced his early years of education.
His family's marital troubles, drug and alcohol use, frequent moves, and poverty all played into his inability to focus, or to care about focusing, in school. Despite having found a way to thrive, he cannot offer easy solutions for future generations of children like him. Yet Sanders and company seek to quantify and place blame at the feet of his teachers. Hence my fifth point: roughly a century ago the French Ministry of Education hired Alfred Binet to develop a method of intelligence testing that would allow it to identify and marginalize young people deemed unfit for its classical education system. In essence, they were developing metrics that would let them cop out by saying 'these people' aren't worth investing in: we should not have to invest our educational resources in them because they lack a successful trajectory in the system. Of course, today this would be unheard of; even children who appear to have intellectual deficits oftentimes have unique gifts as well, and should be provided as good an education 'as is possible' and 'within reason,' both criteria we are also constantly debating. Notice, however, that the French Ministry's method favored the state, while our method favors the tax-paying constituent.
In closing, the truth about what influences student performance and where accountability resides lies somewhere between the student-as-flawed theory and Sanders' teacher-as-flawed approach. It is important, as we incorporate big data across various student-centric and teacher-centric metrics, that we use a multidisciplinary approach to interrogate the appropriateness and true measurability of those metrics, so that we don't just do a better job of obfuscating prejudice than we have in past centuries.