Web Content Writing & Model Collapse

There's a popular children's game called pass the message, or telephone, where kids form a line and whisper a secret message down the line, and the last kid compares the final message heard with the original to see if it was preserved. Usually, it isn't. And that's what makes the game fun: no matter how carefully the players try to listen and relay the message, in the end, it comes out distorted.
Many years ago, when I began my career in digital marketing as a web content writer, I didn't know that's essentially what I was doing: I was communicating information as best as I understood it.
And often, I got it right. But sometimes, sadly, I knew I was way off the mark.
We were a team of Filipino writers writing articles about anything under the sun, from the history of ploughs to tips on how to avoid drawing like a 5-year-old. Our company in Pasig provided a host of business process outsourcing (BPO) services, including content writing, to clients around the world. Back in those days, I didn't fully understand search engine optimization (SEO). I was just happy I could withdraw some money from the ATM down the street and have lunch at KFC.
But as years went by and I got deeper into digital marketing, I realized the game I was playing: I pretended to be the expert at ploughs, ancient Egypt, Caribbean offshore banking, spark plugs, you name it – and I got paid.
Before we go any further, I want to stress something: there's absolutely nothing wrong with this. Throughout my work as a web content writer, I did the best possible online research within the timelines given to me to make sure I conveyed the best information I could find. I personally know many web content writers who spend inordinate amounts of time studying the most technical of topics to make sure their articles are as accurate as possible.
Heck, there was a time I wrote a history essay for a college student in the United States just based on free previews from Google Books, and it apparently got such a high grade that they kept requesting me to do more homework for them.
Web content writing was—and still is—a tough job. You have to be smart and a really good writer to keep getting hired.
But like I said, there were times I knew what I was writing was just total BS.
Before AI tools emerged, the fastest way to get information was online search. Of course, you could search Google Books or actually download free e-books for research, but with a deadline of a day or two at most to produce a draft, practically nobody did that unless the client specifically asked (which was the case in the freelance job I mentioned). For some very technical topics, you had no choice but to make sense of what little documentation was available on the web. Sometimes, information was so scarce that you just... winged it.
Your editor—if there was one—didn't know any better, anyway. To make their quota, they would likely approve your draft.
Worse, there were times the only references available to you were clearly just SEO content that other SEO teams, most likely from the Philippines or India, had written. Were they accurate? I'm sure a lot of them were. Others definitely were just winging it, too. Bad grammar was a telltale sign someone else was also just trying to hit their quota to afford lunch at KFC.
Depending on who you ask in the SEO world, this issue may or may not have improved over the years. Google has released numerous core updates and spam-detection systems that crack down on such websites, but some believe search engine results pages are still swamped with poorly written, inaccurate content today.
You'd think this degradation of information would finally end with AI, but no. In fact, it might get worse.
It's called model collapse. At the start, generative AI produces output based on scraped data, much of it from the internet. While a good portion of this output should be reliable, a fraction of it will be far from perfect. Some might be downright hallucinations.
The problem is that as AI output, both satisfactory and slop, proliferates across the web, it becomes input for newer iterations of AI models, and this vicious cycle of garbage in, garbage out causes the generated data to drift away from what is true.
To illustrate, a Redditor kept prompting ChatGPT to create an exact replica of a picture of Dwayne Johnson. Somehow, the AI convinced itself that The Rock had purple lips, and the 101st generated image ended up looking like an abstract painting by an amateur art student.
While this example is not a textbook case of model collapse, it illustrates the principle. If we assume the input is polluted by imperfect AI output, then the next generated content will be even uglier.
It's the game of telephone, just with potentially far-reaching consequences in a wide range of industries, including customer service and even finance.
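The feedback loop can be sketched as a toy simulation. This is a deliberately simplified assumption, not how real generative models work: here each "model" is just a Gaussian distribution fitted to a small sample of the previous model's output, with the real data (mean 0, standard deviation 1) seen exactly once at the start.

```python
import random
import statistics

def collapse_demo(generations=200, sample_size=20, seed=42):
    """Toy model-collapse simulation: each generation's 'model' is a
    Gaussian fitted only to samples drawn from the previous generation's
    model, never to the original data."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" distribution
    history = []
    for _ in range(generations):
        # The new model never sees real data, only the old model's output.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        history.append((mu, sigma))
    return history

history = collapse_demo()
print(f"gen   1: mean={history[0][0]:+.3f}, stdev={history[0][1]:.3f}")
print(f"gen 200: mean={history[-1][0]:+.3f}, stdev={history[-1][1]:.3f}")
```

Because every generation estimates its parameters from a finite sample of the previous generation's output, the estimation errors compound: the fitted mean drifts away from zero and the fitted spread tends to shrink, so later generations lose the diversity of the original distribution. It's the statistical version of purple lips.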
So am I saying human web content writers are still better? You know, that might be a question of whether companies are paying them enough to afford something better than KFC or not.
Kidding aside, while AI models can improve through training, improving human output is a far more complex challenge. Hiring better writers is a solution that has worked for me as a manager, but again, that means offering competitive compensation and looking to educational institutions to continue molding highly skilled writers.
Regardless, whether we're talking about humans or AI, it seems this erosion of the quality of information available to us online will continue. In the game, the distortion is fun. But in the real world, its implications are terrifying.