On 29 January 2025, the Authors Guild (USA) launched its Human Authored Certification program. The certification permits Guild members to register their written works as having been created by a human rather than generated by artificial intelligence (AI). These are some first thoughts I had on reading about the program and reflecting on initiatives with similar goals.
The Human Authored Certification program is an interesting and, I believe, reasonable response to the many generative AI works flooding our world. There have already been dangerous outcomes when people acquire AI-generated texts and rely on their content. For example, AI-generated mushroom-identification guides containing false information about toxicity have been sold, and fabricated research outputs, unreliable historical descriptions, invented biographical details, and more are among the problems with these easy-to-produce outputs. Beyond the potential harms of AI-produced misinformation, critical questions about how we value the unique quality of human creativity must be central to understanding the cultural impact of AI. All of which is to say that if people cannot tell whether a work was produced by AI when they acquire it, we lose a critical ability to understand the work before us.
While I laud this effort from the Authors Guild and believe it is an important start, I do not think it is enough. It has some problems, which may become more visible if people rely too heavily on it. First, it is a national program based in the USA, which is perfectly fine for the Guild, but this is of course a global issue that requires coordinated, collaborative efforts from trusted institutions around the world.
To gain certification, an author must be a member of the Guild and provide information about the work to be certified. The Guild then provides a searchable database that people can access to verify works for themselves. I hope that this database will permit open connectivity and license its content openly. I don’t think such a system could provide absolute certainty that an author hasn’t generated their work with AI, but that is probably not quite the goal; perhaps it would be an impossible task. At a minimum, the system can help people gain confidence that a work they are engaging with was at least claimed to have been written by a real human. That is valuable, but it is a limitation I hope will be made very visible too. I believe there is a risk that people will over-rely on this type of database, treating it as claiming more than it can possibly verify.
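To make "open connectivity" concrete, here is a minimal sketch of how a third party might query such a registry. The endpoint, parameters, and response fields are entirely hypothetical, since the Guild has not, to my knowledge, published an API; the point is only that an open, machine-readable registry would let anyone build verification tools on top of it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: the Authors Guild has not published an API.
# This only illustrates what open connectivity could look like.
REGISTRY_URL = "https://example.org/human-authored/api/v1/works"

def lookup_certification(title: str, author: str) -> list[dict]:
    """Query a (hypothetical) certification registry for matching works."""
    query = urllib.parse.urlencode({"title": title, "author": author})
    with urllib.request.urlopen(f"{REGISTRY_URL}?{query}") as response:
        records = json.load(response)
    # Each record only attests that human authorship was *claimed*;
    # the registry cannot prove how the text was actually produced.
    return [r for r in records if r.get("certification") == "human-authored"]

if __name__ == "__main__":
    for record in lookup_certification("Some Title", "Some Author"):
        print(record["title"], "-", record.get("certified_on", "date unknown"))
```

Note the comment in the sketch: whatever the response format, the strongest statement such a lookup can ever return is that a claim was registered, which is exactly the limitation I hope stays visible.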
The Authors Guild explains the boundary between what it considers AI-generated and what is acceptable in its press release (Authors Guild Launches “Human Authored” Certification to Preserve Authenticity in Literature):
“The use of the Human Authored mark is restricted to books and other works where the text was written by humans, except for a de minimis amount of AI-generated text to accommodate uses of AI-powered grammar and spell-check applications and other minor AI use. Use of AI as a tool, other than to generate text, such as for research or brainstorming, does not disqualify a book, as long as the text was human written.”
I interpret the purpose as not to bar the use of digital techniques but rather to ensure some confidence in the primacy of human-created content.
From another angle, there are people working on systems to identify works produced by AI (or not). For example, there is the Coalition for Content Provenance and Authenticity (C2PA) and the resulting Content Credentials. This open technical standard is an effort to help people trace the history of a piece of digital content; by examining its provenance, people can better determine whether a work was generated by AI. This seems a valuable effort to pursue, though not without risks. For example, the coalition was begun by large companies including Adobe and Microsoft, and I think there is reason to fear that companies of that size could seize control over the verification and trust layers of the digital ecosystem. If only digital systems with their support can provide assurance under such a standard, control over what is considered acceptable would be concentrated in the hands of a few powerful entities, potentially excluding the broader commons. This, however, is a brief note that requires further, fair thought. Another consideration is Google’s SynthID, which is designed to watermark AI-generated works imperceptibly so that people can identify them as synthetic media. But this too is imperfect: nothing forces bad actors to engage with these systems.
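As a brief illustration of the kind of check Content Credentials make possible, here is a sketch that inspects a manifest's action assertions for the IPTC digitalSourceType value that C2PA uses to flag content produced by a generative model. It operates on a simplified, unsigned stand-in for a manifest: a real C2PA manifest is a cryptographically signed structure embedded in the asset, and signature verification, which is the heart of the standard, is omitted here entirely.

```python
# The IPTC digitalSourceType URI that C2PA uses to label content
# produced by a generative model ("trained algorithmic media").
TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)

def appears_ai_generated(manifest: dict) -> bool:
    """Scan a manifest's action assertions for an AI-generation marker."""
    for assertion in manifest.get("assertions", []):
        if assertion.get("label") != "c2pa.actions":
            continue
        for action in assertion.get("data", {}).get("actions", []):
            if action.get("digitalSourceType") == TRAINED_ALGORITHMIC_MEDIA:
                return True
    return False

# A simplified, unsigned stand-in for a manifest; real manifests are
# signed binary structures embedded in the asset itself.
example_manifest = {
    "claim_generator": "SomeImageTool/1.0",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": TRAINED_ALGORITHMIC_MEDIA,
                    }
                ]
            },
        }
    ],
}

print(appears_ai_generated(example_manifest))  # True
```

Even with real signed manifests, a check like this only helps when producers participate honestly: an asset with no manifest at all proves nothing, which is the bad-actor gap noted above.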
I have long felt that we need techniques to verify and gain certainty about the sources of our creative, informational, research, and other works, whether in text or other forms. In my opinion, memory institutions such as libraries, archives, and museums are among the institutions best suited to take up this responsibility. Libraries have long been reliable sources of verification for informational needs, and they devote significant attention to systems of authority; they ought to be central in developing new techniques for demarcating and verifying works produced by humans versus those generated by digital tools such as AI. However, the initiatives we need in this domain from memory institutions are still nascent.
I hope to see, and perhaps participate in, a large coming-together of skills from groups such as the Authors Guild, memory institutions, journalism organizations, open technical groups, and more to coordinate reliable approaches to ensuring trust that a work is human-created.