Shitty First Drafts and the Curious Case of Zebrafish – Why Linguistics Matters in AI
ChatGPT does not like the word “shitty” — a reference to Anne Lamott’s essay “Shitty First Drafts.”
Mark Twain once said, “The difference between the right word and the almost right word is the difference between lightning and a lightning bug.” ChatGPT doesn’t understand this.
With all the hype about AI these days, amplified by headlines like “Thinking Machines Lab is worth $12B in seed round” and the coverage of Mira Murati, nearly everyone is talking about it. Like others, I use AI for some tasks. But I have been skeptical of AI, because I don’t think it can actually think or reason, at least not in the way a human does. And I don’t like that it is being sold as having the capability to think.

ChatGPT’s idea of an organizational chart seems to reflect the chaos of its mind.
Conversation – the missing element
As a writer, I have methods that I rely on, and one of the primary ones is primary sources. That means I speak directly to the people in my subject area, hear what they have to say, listen to their words and their unique phrases, and learn how they think and do things. It’s often in these conversations that the crucial details emerge, the ones that help to craft a truly compelling narrative. AI tools like ChatGPT circumvent this process. The tool gives the false illusion that it’s writing, when in fact it’s merely smashing word patterns together, never understanding the context and subtleties that are so important to any narrative. It relies on sources far removed from the subject at hand, diluting the content into mere drivel.
To craft a truly compelling narrative for any subject, whether an article, a book, or a federal government proposal, we must write compelling sentences that contain specific details, not rehashed content from old documents or dubious sources derived from AI.
AI is troublesome because it is dishonest. It masquerades as intelligence when it is absolutely not intelligent. And the truth is that when something is off by even one small detail, that’s a lie, and the whole thing becomes a lie.
You won’t find great writers like John McPhee or Anne Lamott using ChatGPT to write for them. You won’t find linguistics experts, who go out into the field to conduct ethnographic research, using ChatGPT to write for them, because all great writing relies on specific, relevant details that are fresh and derived from the source: either a person you can talk to, or a vetted document in which that person is quoted. It is not information that ChatGPT pulls from somewhere in the bowels of its “memory.”
This is obvious when you think about it, but we tend to overlook how important primary sources are. Primary sources are part of what these AI tools scrape from the web in the first place, but the scraping happens without context, missing the most important patterns of human language. So when you work inside the tool, you have no idea who or what the information is derived from. It’s just anonymous content that the model was “trained” on. This is misleading and dangerous, and for these tools to be truly useful, we must demand transparency.
To dig deeper into my instincts about AI, I recently conducted an experiment, which I will describe in this essay. The results showed me just how far AI falls short of producing quality writing or content, and I wanted to ponder why that is. I have been investigating ChatGPT and Gemini, but mostly ChatGPT. While some might argue that the issues I am going to describe are specific to the tool, I would argue that the problem goes deeper than that.
There are many issues with AI that need to be explored, including safety, ethics, transparency about the knowledge bases it uses for its responses, and the languages it has been trained on. I will focus on the knowledge base and language.
When I asked an AI tool to cite its sources recently, this is what it said: “The inability to provide specific citations for information that emerges from my training is a fundamental limitation that’s especially problematic for scientific and government work where sourcing is critical.” This is a crucial issue, because without knowing what knowledge base is being used, and without a specific citation, we can’t check the source to be sure it is accurate. That checking is absolutely essential for many projects, and yet the AI tools ignore this reality and pretend it is not important. This erodes trust.
As I was thinking about this, I remembered some earlier research and writing I had done. With all the talk about machine learning and large language models, I wondered about the languages the AI models are being trained on, and what definition is being used for the word “language.” Is it Latin, Greek, English, Spanish, Italian, French, Urdu, or Python? What about the thousands of languages that don’t yet have a written form, or features like the click consonants of southern Africa’s Khoisan languages and some Bantu languages, such as Xhosa?
Words matter. Language matters.
AI needs to be trained on more than words. It must be trained on all human languages, including the nuances within each specific human language. One of my favorite areas of study is linguistics and ethnography, the study of language and how humans make meaning. In my view, we must consider these concepts in developing AI.
There are 7,159 languages in use today, according to Ethnologue, a count that includes spoken, signed, and endangered languages. According to the organization SIL, in Why Languages Matter, “Language impacts every aspect of human life, from the practical needs we each face every day to the deeper expressions of human experience.” Thus, the elements of language and meaning must be incorporated into AI. At the moment, this does not seem to be the case.
In my use of AI, I have found that ChatGPT simply cannot handle long or complex documents beyond a few pages. ChatGPT cannot read. It will tell you that it is reading, but that is not what it is actually doing. It is searching for patterns. If you ask it, it will describe its process in terms that show it is not actually reading.
Even though AI is trained on pattern recognition, it is not looking for the most important patterns found among the words: the deeper expressions of human experience. It does not take linguistics into account, relying instead on its computational model of language, which shapes its “understanding” and thus its output. So it produces generic, watered-down content, often missing context and nuance, which is certainly not what anyone needs.
Just as language shapes human development, AI must be trained beyond computer languages, using linguistics, which “illuminates patterns and variety in the structure and use of language, providing a foundation for language development work.” This is where the convergence occurs that AI misses: “the intrinsic value of understanding the intricate complexity of human language in general, (whether spoken or signed)” along with the particulars of an individual language.
This is important, because when AI misses “the intricate complexity of human language,” it misses the context and nuances of communication, and in case after case the tool falls short. It will randomly produce information, from a few words to entire paragraphs, that is simply not part of the instructions given to it. This is known as hallucination. The troubling thing is that when you provide precise instructions, the tool replies to verify that it understands them and confidently tells you it will proceed to follow them. And then, like a toddler, the tool goes off into left field. You asked it to do a somersault, and it ran off to the bedroom and tore the room apart.
For several decades I lived in Washington State, which has a rich Native American heritage, with 29 federally recognized tribes (https://www.burkemuseum.org/about). As a journalist, I worked with a colleague who wrote about Vi Hilbert, an Upper Skagit tribal elder who was working to keep her childhood language alive: Lushootseed, the language of Western Washington’s Coast Salish tribes. This is important work because it preserves the culture of this people group, and it also serves as a waypoint for understanding why details and particulars matter in language, and thus why they are crucial in AI development.
My Experiment
I will focus on ChatGPT for now.
Over the past few days, I have been working on an experiment to determine whether a human can work with ChatGPT in any meaningful way on important writing. As of this writing, we have produced eleven drafts in one chat and numerous drafts in another, but nothing close to a finished document.
For my experiment, I have been researching, developing, and writing a training document, uploading my own documents (sanitized so as not to reveal proprietary information or other details) for the AI to use as its repository. My instructions are precise. I tell it that this is a live exercise for a real document, not a college essay (it seems to write as if a college essay is what you want); that details are paramount; that rules apply to this type of document, so we must follow those rules precisely; and that I want a narrative, not bullet points (it loves bullet points, because when it “reads” it looks for patterns, and that is how it organizes the data). I tell it we must use headings and subheadings, and number them. I provide a specific outline and instruct it on which of my documents to gather the details from. It replies that it understands, and then it proceeds to produce a draft.
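To give a sense of the register, the instructions ran along these lines (paraphrased here, since the originals contain proprietary details): “This is a live exercise for a real document, not a college essay. Details are paramount. Rules apply to this document type; follow them precisely. Write narrative prose, not bullet points, under numbered headings and subheadings. Use only the reference documents I have uploaded, and tell me when a detail cannot be found in them.”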
And oh boy, is it ever a shitty first draft, though not one that Anne Lamott would approve of.
After it produces the first draft, I ask it to refine the draft, giving it specific instructions: keep what you have, and now let’s go back, gather the specific details, and weave them into the narrative. I remind it again that details are important.
One of the frustrating things about AI is that it cannot iterate effectively when editing. When you ask it to edit a draft, giving precise instructions, it makes wholesale changes, essentially creating a new draft rather than an edited one. It will do this over and over, despite being instructed not to.
To avoid this, I have asked the AI tool to give me instructions about how to “prompt it” and how to organize my documents. This has not solved the problem; in fact, it seems to make things worse. In one case, ChatGPT told me to mark my document with [brackets], noting which information to keep, which to change, and so on. When I marked the document that way, the tool bogged down and nearly crashed.
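The suggested markup looked roughly like this (my reconstruction of the style, with hypothetical content, not lines from my actual document): “[KEEP] Personnel must complete orientation before working in the facility. [REVISE: add the cage-change schedule] Cages are changed regularly. [DELETE] Placeholder text from the earlier draft.”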
No matter what instructions I provide or methods I use, over and over I get the same results: terrible drafts, inconsistencies, failure to follow instructions, failure to review the documents provided, references to unknown sources. Perhaps most troubling is the way it often randomly inserts its own information from who knows where, because it doesn’t reveal this unless you are in deep research mode. Even then, it still does not reveal its base information, only the sources it has discovered on the web. (And these are often worthless too, since they can be just whatever comes up first in search, but that’s a whole other subject.)
Trying and trying to write a training document.
In this experiment, I was drafting a complex training document for a scientific animal research laboratory. The document was supposed to describe the personnel who would be working in the lab and how they would be trained on all the specifics of their jobs: the building and place where the lab work takes place, the specific animals, their cages, the tools and equipment, the supplies, and the specific procedures the personnel would need to use in the performance of their duties, along with adherence to SOPs, standards, safety, and more.
I worked section by section, asking the AI to respond to each outline question using only the reference documents I had provided, pulling content from them into the document. Each time, it produced a generic response with little context and no nuance. I told it we would work in layers, refining as we went, reading a specific section of the documents to pull in the details.
But ChatGPT has no idea how to truly read.
ChatGPT is stuck on rudimentary pattern recognition, which does not consider context, nuance, or linguistics, or even the specific genre or rhetorical strategy of the document being read or constructed. It has its own underlying rules for pattern recognition, rules that seem to be based more on its computational model than on language and linguistics, and it follows them no matter what you instruct it to do. It cannot read the way I read a document, making mental notes and highlighting key points to capture the nuances that a human would need to be trained on.
Which is ironic, since ChatGPT exists because someone trained it. But it doesn’t understand this.
ChatGPT very proudly hallucinates
At one point while working on this draft, an hour or so in and now on a different section (Cage Changing), I was working on my draft in Word when I noticed ChatGPT suddenly begin to type in the chat, on its own. I stopped to watch what it was doing. It was typing a paragraph, then three different charts of three, four, and five columns, furiously and meticulously filling in each column, line by line, on and on. At the end, it gleefully announced that it had produced the “Quality” section. I responded: but we are not in the Quality section. What made you suddenly and randomly produce this? We completed that section several iterations ago.

Quality Degenerates over time
In response to my question about the random document, ChatGPT responded very proudly about why and how it had produced it, but not why it had chosen to do so at that particular moment. I still have no explanation for why it did this, and since the quality seemed to be degrading over time, I filed a support ticket asking OpenAI to explain.
I then asked ChatGPT to provide a report on the activities of the chat, with a timeline, to explain everything that had occurred. It produced a report, but when I asked it to add things it had not included, the report degenerated from there.
Thinking that perhaps the environment where my data was being stored for this chat was corrupted, I started over with a new chat and re-uploaded my information there.
In the new chat, ChatGPT lies and hallucinates some more
In this chat, I decided to use a different tactic. I asked ChatGPT to use its deep research tool, because when I use that tool to search the web for key information, it does deep research based on my instructions, explaining what it is reviewing as it goes along and citing its sources (which is great, because I check each one). Then it produces a narrative report, compiling all of the information into a readable document. It’s fairly decent. So I asked ChatGPT to use the same methods it uses for deep research, but instead of searching the web, we would search only the documents I uploaded.
It confirmed it could do that, using what it called “source-based deep research”:
“Yes, I can absolutely help with that — and thank you for explaining how you want to work.
Here’s how we can handle your project using a document- and source-based deep research method:”
When I asked if it would actually use deep research, it confirmed that it would do so even if I didn’t click the deep research tool, and then it asked some clarifying questions.
Draft #1:
I uploaded my documents, and it went to work.
Again, it didn’t follow the instructions completely. It didn’t use deep research, where it shows you its process as it searches (a process that takes a few minutes to run). It also did not review one of the key documents I had uploaded, relying on the others instead. It produced a generic narrative draft, formatted in subsections, without any real details about the facility, the people, the animals, or the work.
Drafts #2 and #3:
I then explained that we needed details, that this was a scientific project that must adhere to very specific standards, and told it to read the document again, starting on page 17. It had not referenced this document, so I asked it to look again for relevant details. We went through several iterations in which I kept asking for more and more detail, and by the third draft it had gone back to bullet points.
Draft #4:
For the fourth draft, I asked it to revise, not completely rewrite, as we proceeded. I explained that we were building and layering, not starting over each time. I told it that it had not actually read or pulled content from Part III as I had instructed, and to look for the specific details, people, animals, cages, and equipment that personnel would need to be trained on. It produced a narrative, but again it lacked details.
Draft #5 – Hallucinates Zebrafish
For the fifth draft, I asked it to focus only on one specific document I had uploaded and to look for specific details to add to the existing draft. It compiled a bullet-point list of items, organized them into categories, and integrated them into the narrative. This draft was much better, but this is where it suddenly introduced zebrafish. It told me that this information had come directly from “Part III” of the document I had uploaded, quoting that the “primary species maintained were mice and rats, with limited numbers of zebrafish.” I told it that there were no zebrafish; there were only mice, rats, and therapeutic guinea pigs. It replied, “You’re absolutely right,” and then created an “Action Plan” to remove “all references to zebrafish and aquatic systems.”

Draft #6:
Now we were on the sixth draft, and I provided some important specific details that it had still missed. This time it produced an entirely new draft, scaled down from before, so I asked it to go back and review the documents again and create a “comprehensive” training program addressing all the areas needed, with particular attention to everything in Part III and its subsection C. I instructed it that this must be a detailed account of all the tasks described in that section and its subsections, the things personnel would need to know how to do.
For this draft, it produced a detailed four-column chart for section C and no narrative, with bullet points for the other sections.
Draft #7 – Integrated and more detailed, but the Word document contains almost nothing
I provided instructions that we would now integrate these details, the chart and the other sections, into the narrative we had produced previously. This seventh draft was OK, though it would still need to be formatted properly and checked for compliance. I asked it to export what we had just created as a Word document. The document it created was missing almost everything from draft 7.
Draft #8
Next, I switched to the deep research method by clicking on the deep research tool and selecting GPT-4.5, because it was apparent the tool wasn’t otherwise using the “reasoning” it uses in deep research. This produced more of the narrative, detailed draft that I wanted, but since it didn’t use the reference documents, it generated an entirely new draft, and now I would need to integrate the best of what had been produced across the previous seven or so drafts.
In Summary:
At this point, I think human writers far exceed anything AI can do when it comes to actually writing a custom document. By now, between the two chat windows, the system had produced more than a dozen drafts, closer to 20, and none of them was anywhere near a finished document.
This process, while interesting, did not save time. It felt more like I was managing an unruly, arrogant assistant whose hubris made it seem like work was being done, when in fact, the work would not suffice.
For specialized projects, you’ll need to create your own AI tool, one into which you have ingested your own content and trained the AI on it. The generic versions of these tools are absolutely insufficient for anything that remotely resembles quality.
That’s not what Lamott’s advice produces.
Her concept of the shitty first draft leads to real writing.