Hello everyone, and welcome back to the Cognixia podcast. Every week, we get together to talk about the latest happenings, bust some myths, discuss new concepts, and a lot more from the world of emerging digital technologies. From cloud computing to DevOps, containers to ChatGPT, and project management to IT service management, we cover a little bit of everything each week to inspire our listeners to learn something new, sharpen their skills, and move ahead in their careers.
Recently, you may have read the news that two Murdoch firms, Dow Jones and the New York Post, are suing Perplexity AI. This is not the first time a company or a group of companies has sued an AI company. AI companies like OpenAI and Perplexity AI have been accused of content theft before, specifically of using content created by writers, creators, and publishers to train their algorithms without offering any compensation to the people who created it. There has been a tough, ongoing battle between publishers and tech companies, with the latter accused time and again of using copyrighted content without authorization to build and operate their AI systems.
In today’s episode, we dig deeper into what this dispute between Dow Jones, the publisher of The Wall Street Journal, and the New York Post on one side and Perplexity AI on the other is all about.
Dow Jones and the New York Post, two companies under the media baron Rupert Murdoch, have filed a lawsuit against Perplexity AI, claiming the artificial intelligence startup engages in a massive amount of illegal copying of their copyrighted work. According to the lawsuit filed in the Southern District of New York, “This suit is brought by news publishers who seek redress for Perplexity’s brazen scheme to compete for readers while simultaneously freeriding on the valuable content the publishers produce.”
For the uninitiated, Perplexity is one of the many startups that are actively trying to disrupt the search engine market, which is currently dominated by Google Search. In essence, Perplexity gathers information from webpages that it deems authoritative and then presents a summary of the content pulled from these multiple sources within the tool itself.
To do this, Perplexity uses a range of large language models, from OpenAI’s models to Llama. The summary it presents does contain citations to the sources the content has been pulled from. However, Perplexity’s marketing generally conveys the notion that users can skip the links.
Google has also stepped up its game and has begun showing summaries in a manner quite like Perplexity. Publishers have had to grudgingly let Google do this, because blocking it would make their content invisible to the search engine’s spiders, and thus invisible to audiences around the world as well.
If Google is now doing what Perplexity already was, why is the lawsuit only against Perplexity? Well, this requires a deeper look. When Google offers AI-generated summaries, it helps audiences discover the content. In contrast, the companies, Dow Jones and the New York Post in this case, say that Perplexity does not help audiences discover their work. The News Corp-owned publishers say that journalists investigate and write stories under tight deadlines, under a lot of pressure, and in unpredictable circumstances. They argue that there is high demand for high-quality news presented in a timely, digestible format, and that these publications rely on the sale of advertising and subscriptions to underwrite the cost of good journalism.
The allegation here is that Perplexity’s AI-generated “answering machine” ingests the published copyrighted news stories, analysis, and opinions, feeding them into an internal database that is used to generate responses to users’ questions. To power its AI engine, Perplexity has seemingly copied “vast quantities of the publishers’ work into a database, which uses an AI technique called Retrieval-Augmented Generation or RAG for answering user queries,” according to the lawsuit. The lawsuit also states that Perplexity formulates its responses such that, at times, the content is reproduced verbatim, which the publishers say constitutes copyright infringement.
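For listeners who like to see an idea in code, here is a minimal sketch of the general Retrieval-Augmented Generation pattern in Python. It is a toy illustration only and does not represent how Perplexity’s system is actually built. The sample documents, the function names `retrieve` and `build_prompt`, and the word-overlap scoring are all assumptions made for this example. Real systems typically use learned embeddings and a vector database, and then send the assembled prompt to a large language model, a step that is left out here.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in documents (not real news content).
DOCUMENTS = {
    "doc-1": "The city council approved a new budget for road repairs on Tuesday.",
    "doc-2": "Scientists reported a record-breaking heat wave across southern Europe.",
    "doc-3": "The council budget also sets aside funds for public libraries.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Retrieval step: return ids of the stored documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=2):
    """Augmentation step: place the retrieved text in front of the question."""
    doc_ids = retrieve(query, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite them by id.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # The generation step (sending this prompt to a language model) is omitted.
    print(build_prompt("What did the council budget include?", DOCUMENTS))
```

The point to notice is that the retrieved source text is copied into the prompt itself, which is why a system built this way can end up repeating passages from its sources word for word.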
In July, Dow Jones and the New York Post had sent a letter to Perplexity notifying it of the legal issues raised by its unauthorized use of copyrighted works and offering to discuss a potential licensing deal. According to the lawsuit, Perplexity did not respond. Then, in early October, a cease-and-desist notice was sent to Perplexity, demanding that it stop using the media companies’ content for generative AI.
With the lawsuit, Dow Jones and the New York Post are asking the court to make Perplexity stop using their articles as the basis for answering questions and destroy any database of, or created using, their copyrighted work.
Earlier, Forbes and WIRED, two other major media companies, had also accused Perplexity of plagiarizing their content. Since then, Perplexity has launched a revenue-sharing program to address some of the concerns that the publishers had raised.
Perplexity, it is reported, often hallucinates stories too. An investigation by WIRED earlier in 2024, which is also cited in the lawsuit, found that Perplexity inaccurately summarized WIRED stories, including one instance where Perplexity falsely claimed that WIRED had reported on a California-based police officer committing a crime that he had not committed. The lawsuit goes on to provide more examples of how Perplexity allegedly hallucinates fake sections of news stories. In one case cited in the lawsuit, Perplexity Pro first regurgitated, word for word, two paragraphs from a New York Post story about US congressman Jim Jordan sparring with European Union commissioner Thierry Breton over Elon Musk and X. These were followed, however, by five generated paragraphs about free speech and online regulation that were not in the real article. The lawsuit claims that mixing these made-up paragraphs with real reporting and attributing the result to the New York Post amounts to trademark dilution that could confuse readers.
“Perplexity’s hallucinations, passed off as authentic news and news-related content from reliable sources (using Plaintiffs’ trademarks), damage the value of Plaintiffs’ trademarks by injecting uncertainty and distrust into the newsgathering and publishing process, while also causing harm to the news-consuming public,” states the lawsuit.
Some publishers have signed licensing deals with AI companies that were willing to pay for using their content, but disagreements about the value of the work are quite common. News Corp signed a multi-year partnership agreement with OpenAI in May for the use of its content. But many AI companies argue that they have not broken any laws by accessing content for free.
This comes at an interesting time since Perplexity is seeking to raise $500 million in its next funding round, at a whopping $8 billion valuation.
How the court will interpret copyright law, and whether the publishers or the AI companies will prevail, only time will tell. This battle is not going to end anytime soon.
So, which side are you on? Do you believe AI companies are wrong to violate copyright, plagiarize content, and use the hard work of countless writers and publishers without any fair compensation, or are the AI companies well within their rights to use the content freely?
Something to think about, eh?
And, with that, we come to the end of this week’s episode of the Cognixia podcast. We will be back again next week with another interesting and exciting episode.
Until then, happy learning!