AI tools in journalism: Local news org reverses block on bots

Artificial intelligence technology is still evolving fast, but it is already changing how people work. It’s having an impact across dozens of industries, including, of course, media.

The tech affects the news industry from two directions: from the inside, in how AI tools are used to make the news, and from the outside, in how AI companies use news products to build their tools.

Technical.ly covers innovation, so experimentation is core to our culture. We encourage our team to test how AI-powered tools, like any other new technology, can help us connect with our community more effectively, efficiently and empathetically.

In 2018, for example, we piloted a machine-learning tool that scraped SEC filings and generated draft news articles on venture capital raises. Over the last two years, the field of artificial intelligence has been dominated by generative AI, a branch of the discipline that can produce human-like speech, writing and other creative forms.

Here are some guidelines we follow when using this tech and publishing material made with it:

All AI-generated material, no matter what kind or at what stage, is reviewed and edited by a human before publication. Nothing is ever published without a human in the loop.
When we publish material that has been created entirely by AI tools — most likely images or videos, and at this point sparingly — we will always clearly disclose that to our audience.
We don’t feel generative AI is producing written material at a high enough level to meet our standard for published articles. We also know the quality is changing rapidly, especially when the tools build on solid source material (such as Technical.ly’s archives, which we have used).
Technical.ly team members regularly use AI tools as part of our reporting and production process, feeding them original material for transcription, summarization, translation, rephrasing or other manipulation.
Some team members use AI tools to save time, to do more impactful work and to make the stories they publish more accessible.
Just as we have long used spell check without noting it, we do not feel the need to indicate when or where we use AI tools in this way. We’re happy, though, to share our techniques and tactics with anyone seeking more info or looking to apply the tools in similar ways.

For this article, for example, I optimized the headline shown to search engines through a back-and-forth conversation with a ChatGPT plugin I built for that specific purpose.

Playing pong with robots.txt

What about the other side of the coin, where generative AI companies are scraping our articles to build and enhance their products?

Gen AI companies are on a continual hunt for more data to feed their large language models, improve their capabilities and keep them filled with current, accurate information. Oh hey! That’s what news organizations create on a regular basis. Technical.ly, for one, has published more than 35,000 articles over 15 years, written and edited by dozens of professional journalists as well as industry leaders with unique business or technical expertise.

Before the tech went mainstream, there was little discussion about whether AI models could or should use online news articles as training data. The AI companies just did it.

Once ChatGPT launched in November 2022 and became one of the fastest-growing products in history, the outcry began: Good journalism costs money, so why should an AI company get to use it for free, for its own profit?

The news industry is divided on how to deal with this issue. Several conglomerates and big brands, including Hearst, Reuters, Condé Nast, Axel Springer, Vox Media and NBC, have negotiated licensing deals with AI companies seeking legitimate content sources. Others are suing, arguing they should have been approached before the LLMs hoovered up their stories, and are adding instructions to a website file called “robots.txt” that tell AI scraper bots they are blocked from pulling in articles. Orgs taking this approach range from nonprofits like the Center for Investigative Reporting to larger outfits like the New York Times, Chicago Tribune and Denver Post.

Where does that leave smaller, independent news orgs like Technical.ly? Neither of the two paths really works.

We don’t have the resources to mount a lawsuit, nor to negotiate a licensing agreement with AI companies. Could an industry association like LION, LMA or ONA handle this? Maybe, and we’re open to joining a coalition if one forms. But none of those advocacy groups have yet made a push.

If we ever were to join a lawsuit, we figured, having preemptively blocked the AI bots would bolster our case. So we tried it.

Instead of lawyering up, we’re sharing the knowledge

For several months, our “robots.txt” file included language telling AI bots to stay away. The method is far from foolproof (crawlers comply voluntarily, under an old gentleman’s agreement), but it makes a statement: We don’t want your platform to use our work without paying us.
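For readers curious about the mechanics: robots.txt is a plain-text file at a site’s root that groups rules by user agent and lists the paths each crawler is asked not to fetch, and well-behaved crawlers check it before requesting pages. Here is a minimal sketch, assuming Python 3.9+ and its standard-library `urllib.robotparser`, that parses two hypothetical versions of such a file (one blocking a few AI crawlers, one with those groups removed) and reports which crawlers would be permitted to fetch an article. The user-agent tokens (`GPTBot` from OpenAI, `ClaudeBot` from Anthropic, `CCBot` from Common Crawl), the file contents, the placeholder URL and the `report` helper are illustrative assumptions, not a copy of Technical.ly’s actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a few AI-related crawlers site-wide.
# The tokens are examples, not a copy of Technical.ly's actual file.
BLOCKED = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical "unblocked" version: the AI-specific groups are removed,
# leaving only the catch-all group that allows everything.
UNBLOCKED = """\
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user agent to whether the file permits it to fetch url.

    This only reports what robots.txt asks for; compliance is up to each
    crawler, and nothing here can stop one that ignores the file.
    """
    parser = build_parser(robots_txt)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://technical.ly/example-article/"  # placeholder URL
    agents = ["GPTBot", "ClaudeBot", "CCBot", "Mozilla/5.0"]
    print("blocked:  ", report(BLOCKED, url, agents))
    print("unblocked:", report(UNBLOCKED, url, agents))
```

Under `BLOCKED`, the three AI crawlers come back `False` while an ordinary browser-style agent comes back `True`; under `UNBLOCKED`, all four are allowed. The parser only reports what the file requests, which is why this kind of blocking works as a statement rather than a safeguard.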

However, blocking also keeps AI models from having the latest, most accurate information, which runs counter to our goal as information providers.

A core part of our ethos is putting out useful, freely accessible information. That’s why we don’t have a paywall and instead patch together other revenue streams to keep our stories accessible to all, which is especially unusual in business reporting. As more and more online search queries are answered by generative AI, we want our reporting to help make those answers more accurate.

Plus, blocking AI bots from our site meant we couldn’t use links to our own stories in the AI tools that regularly help us save time when creating new work. So we had to decide which we wanted more: to be able to use these tools, or to try to wring future profit from them. The bigger deal for us was being able to use the tools.

We unblocked the bots.

Our message to the communities we cover: Financially supporting local news, generally and Technical.ly in particular, is one way to ensure your story is credibly and accurately reflected in the evolving global narrative. We want to inform people, and, at present, we believe AI tools are an important way to do that.

Note that the guidelines outlined here are evolving, and we’re constantly reevaluating our policy as the tools advance. If the technology proves to be more than hype and makes transformative progress, we’ll need to have a new conversation.
