The Myth of the AI Sausage
White Paper
By a human, Brent Ristow, PhD, JD
Recent news cycles are replete with articles and commentary about the presence and power of artificial intelligence (AI), including large language (text), multimodal (images and text), audio interpretation, and natural language inference (interpretive) models.
If your news feed is to be believed, AI is ready for prime time, poised to automate large sectors of the workforce and create a new economy; the internet touts everything from image creation and paper writing to handing AI your data protection services (please God, no).
In undergraduate physical chemistry, we learned that the time-independent wave function, the mathematical description of an electron's motion, is not an actual description but an equation built on a long list of simplifying assumptions. Chief among them is that time is ignored, and together those assumptions separate the equation from reality. Time's a thing, though.
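For readers who want to see the equation behind the analogy, here is a minimal sketch of the standard pair: the full time-dependent Schrödinger equation, and the time-independent form you only reach after assuming the Hamiltonian (the system's energy operator) does not change with time.

```latex
% Full, time-dependent Schr\"odinger equation:
\[ i\hbar \frac{\partial}{\partial t}\, \Psi(\mathbf{r}, t) = \hat{H}\, \Psi(\mathbf{r}, t) \]

% Assume \hat{H} has no time dependence and separate variables; the
% time-independent form taught in physical chemistry falls out:
\[ \hat{H}\, \psi(\mathbf{r}) = E\, \psi(\mathbf{r}) \]
```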
In the scientific method, you create a hypothesis based on observations, design a study to test that hypothesis, interpret results, and reassess.
AI uses a similar set of equations or rules (algorithms) that work together to digest vast swaths of human knowledge and generate output in response to your prompt (what you type into ChatGPT). How are those outputs tested? Through a process called data annotation.
Google, OpenAI, and even the United States Department of Defense use data annotation providers like Scale AI and Surge AI to help train their models. On subsidiary platforms such as Outlier AI and Data Annotation, a "tasker" writes a prompt and submits it to the model; the model generates two or three responses; the tasker grades the quality of each response and then writes what he or she believes is an ideal response. These companies require the tasker to complete this process in only a few hours. The ideal response is where the sausage is being made: the AI reviews the ideal response and models its future responses on it. So, who is making the sausage?
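As a rough sketch of that workflow (the names, fields, and grading scale below are hypothetical illustrations, not any platform's actual interface), a single annotation task looks something like this:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class ModelResponse:
    text: str
    grade: Optional[int] = None       # tasker's quality score, e.g. 1 (poor) to 5 (ideal)

@dataclass
class AnnotationTask:
    prompt: str                       # written by the tasker
    responses: List[ModelResponse] = field(default_factory=list)
    ideal_response: str = ""          # the hand-written answer the model is trained toward

def run_task(prompt: str,
             generate: Callable[[str, int], List[str]],
             grade: Callable[[str], int],
             write_ideal: Callable[[str, List[ModelResponse]], str]) -> AnnotationTask:
    """One pass through the annotation loop: prompt, candidates, grades, ideal answer."""
    task = AnnotationTask(prompt=prompt)
    # 1. Submit the prompt; the model returns two or three candidate responses.
    task.responses = [ModelResponse(text=t) for t in generate(prompt, 3)]
    # 2. The tasker grades each candidate.
    for response in task.responses:
        response.grade = grade(response.text)
    # 3. The tasker writes the "ideal" response, the part future model behavior is fit to.
    task.ideal_response = write_ideal(prompt, task.responses)
    return task
```

The `grade` and `write_ideal` steps are the human judgment being purchased by the hour; everything the model later imitates flows through them.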
To limit costs, these platforms engage only 1099 contract workers, typically undergraduate students, recent graduates, or individuals looking for some gap income, and pay a typical wage of roughly $17 per hour, less than what a California fast food worker now makes.
Remember the Google Books Library Project? There, Google worked to scan the University of Michigan's library and make it widely accessible. The scientific information in that collection (a decade or so of litigation aside) resulted from years of study and intense laboratory work, and it became publicly accessible only after a grueling multi-stage peer review process. The product of that process is the foundation of the next generation of human knowledge. AI companies are now trying to condense that rigorous process into tasks they expect to take a couple of hours, while relying on individuals working part-time from home.
In testing, these models not only struggle to identify simple words and use basic grammar; on their own, they are ineffective at something as rudimentary as finding the slope of a line, calculating the area under a curve, or naming a chemical compound.
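For concreteness, these are the textbook formulas in question (the numerical examples are my own, chosen only to show how elementary the tasks are):

```latex
% Slope of the line through two points:
\[ m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad
   \text{e.g. through } (1, 2) \text{ and } (3, 8)\!: \; m = \frac{8 - 2}{3 - 1} = 3. \]

% Area under a curve f(x) on [a, b]:
\[ A = \int_a^b f(x)\, dx, \qquad
   \text{e.g. } \int_0^2 x^2\, dx = \frac{2^3}{3} = \frac{8}{3}. \]
```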
You cannot ignore the reality that AI training, and therefore the eventual product, is not rigorous. Using AI programs to write a blog post? Great. Using them to protect your and your clients' data, draft your briefs and arguments, or prepare your FDA regulatory filings or responses to PTO office actions? Please, God, NO!
-Brent Ristow, PhD, JD
Founder, Brighton Ashford, LLC