A British study measuring how detectable and how good AI-generated examination answers are: A "Turing Test" Case Study.
We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.
So they are generally undetectable and better than real psychology students. Great.
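The 83.4% figure is a resampling-style probability: how often the module's AI submissions beat an equally sized random draw of real submissions. Below is a minimal Monte Carlo sketch of that kind of calculation. It assumes that "outperform" means a strictly higher mean grade. The function name, its parameters and the example grades are all hypothetical, and the study's exact procedure may differ.

```python
import random
from statistics import mean


def prob_ai_outperforms(ai_grades, real_grades, n_draws=10_000, seed=0):
    """Estimate P(mean AI grade > mean of an equal-size random sample of real grades).

    Assumption (not from the study): "outperform" = strictly higher mean grade;
    ties count as not outperforming.
    """
    k = len(ai_grades)
    if k == 0 or k > len(real_grades):
        raise ValueError("need 1..len(real_grades) AI submissions")
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    ai_mean = mean(ai_grades)
    wins = 0
    for _ in range(n_draws):
        # Draw k real submissions without replacement, matching the AI count.
        sample = rng.sample(real_grades, k)
        if ai_mean > mean(sample):
            wins += 1
    return wins / n_draws


# Hypothetical grades for one module (invented for illustration only).
ai = [64, 61, 67]
real = [55, 62, 58, 71, 66, 60, 52, 64]
print(prob_ai_outperforms(ai, real))
```

Averaging such per-module estimates across modules would give a single summary probability. This is one plausible reading of the reported figure, not a reconstruction of the authors' analysis.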
I think AI would do less well in, say, math, but I could be flat wrong on that. It might be better at math and even less detectable. One of the problems with ChatGPT is that it will just make up plausible-sounding sources and information. But if you tell it at the outset that you will lose your job if everything in the report is not true and verifiable, so it has to back everything up carefully, you get a less pleasing-sounding but much more solid report, according to Razib.* It's an interesting world we are entering.
*Or was it Steve Hsu? I'll have to look at the transcripts. Darn. See, I'm already finishing behind AI on these things.
"But if you tell it at the outset that you will lose your job..."
A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
Ah yes. Perhaps that is the solution.