Oh no, robots!


How many of these robot images have I used at this point, I wonder?

I just read an article saying that two AIs scored higher than humans in reading comprehension. Granted, their scores weren't much higher, but that still raises several questions.

Were they testing the general population in reading comprehension? What sort of language are the testers using? Are they using more academic language, or colloquial, or what?

Are reference materials allowed?

Is this result because AIs are getting better at reading comprehension, or because humans are getting worse at it?

After further reading, I found the test used was Stanford University's reading comprehension test, with questions based on Wikipedia, so it sounds more like a general population thing. The scores were in the low 80s for all three test takers (or sets of test takers, if the human score was from a group), so I thought at first that the answer was no to the very first question.

As for another part of the first batch of questions, the wording was simple, like that of a trivia quiz. I have no idea who headlined any Super Bowl (that's for mac and cheese, right?), but the question's easy enough to understand.

Some more reading gives me a hypothesis I can test on question 2. If I'm reading this correctly, the task is to understand (comprehend) the test questions rather than to parrot a memorized answer, so maybe the quiz does allow a person to look things up. As I implied earlier, sports don't hold my interest, so any question on that topic is something I'd have to research. Now I just have to find out whether the test permits reference materials.

The reason for question 3 is that I saw a video of some Stateside college kids taking another country's English reading comprehension exam, and the questions shown in the video, while difficult, seemed less difficult than the students made them out to be. Granted, the video may have shown only a portion of the exam questions, and being on camera may have had an effect, but reading that article brought that video to mind.

I wonder how I'd do on that test.
