Senator slams Big Tech's role in 'pirating' copyrighted books for AI training purposes

Sen Hawley questions whether Meta overstepped in pirating authors’ works for AI

Sen. Josh Hawley, R-Mo., questions whether Meta overstepped copyright laws when it pirated works from authors to improve its AI functionality.

Sen. Josh Hawley, R-Mo., called out artificial intelligence companies like Meta for their alleged role in taking more than 200 terabytes of published works from authors without paying them a dime to make AI smarter.

The Senate Judiciary Subcommittee on Crime and Counterterrorism held a hearing Wednesday to examine the AI industry’s ingestion of copyrighted works for AI training.

The hearing was held just weeks after two federal judges in San Francisco ruled that AI companies like Meta and Anthropic may use books without permission to train AI systems.

During Wednesday’s hearing, professor and legal scholar Bhamati Viswanathan explained to Hawley how tech companies acquire large sets of data to train AI systems, adding that not everything obtained is pirated work.

AMAZON CEO SAYS AI WILL REDUCE HIS COMPANY'S WORKFORCE

Sen. Josh Hawley, R-Mo., chairman of the Senate Judiciary Subcommittee on Crime and Counterterrorism Committee, speaks during a hearing April 9 in Washington, D.C. (Getty Images / Getty Images)

The companies do not buy books from authors like David Baldacci, who testified at Wednesday's hearing. Instead, they allegedly steal, or pirate, the licensed material without paying the authors, Viswanathan explained.

Attempts to hold companies criminally accountable have been made, but Viswanathan said, "It's like a game of whack-a-mole — you get one, you knock it down, it pops up again in some jurisdiction that you don’t have control over."

Viswanathan said criminal copyright liability has two prongs. One prong is that you have to do it willfully, and the second prong is that you have to do it for commercial advantage or gain. In the case of Meta, the company is allegedly doing it for commercial advantage or gain, she noted.

But for the first prong of taking the works willfully, Viswanathan said, Meta allegedly knew what it was doing was illegal.

AMAZON ANNOUNCES $20B INVESTMENT IN RURAL PENNSYLVANIA FOR AI DATA CENTERS

Meta CEO Mark Zuckerberg makes a keynote speech during the Meta Connect annual event at the company's headquarters in Menlo Park, California, on Sept. 25, 2024.

Meta CEO Mark Zuckerberg makes a keynote speech during the Meta Connect annual event at the company's headquarters in Menlo Park, Calif., Sept. 25, 2024. (Reuters/Manuel Orbegozo / Reuters)

"They even had to ask all the way up the chain of command to [Meta CEO] Mark Zuckerberg and say, ‘Hey, is this OK?’ And he said, ‘Yes, it’s OK,’" she said. "Not only did he do it knowing it was illegal, he did it knowingly. He did willfully, intentionally. And whether or not he knew what statute it was legal, doesn't matter. For this to be willful, you have to know that what you're doing is wrong, and this meets that wrong. So, this is, fact, amounting to what you might call criminal copyright."

Hawley then began to question Maxwell Pritt, who represents several authors in legal cases related to the alleged theft of copyrighted works.

Pritt claimed Meta had torrented well over 200 terabytes of copyrighted material from multiple "illicit criminal enterprises," what Hawley referred to as "shadow libraries."

The attorney also said Meta paid nothing to the authors for the billions of works and books. When asked if Meta ever explored paying the authors, Pritt said, "No."

JUDGE RULES AGAINST AUTHORS OVER AI COMPANIES TRAINING MODELS ON BOOKS WITHOUT PERMISSION

A Senate subcommittee held a hearing Wednesday regarding AI companies obtaining books without permission to make their AI smarter. (Reuters / Reuters Photos)

"Early on, they explored licensing. They assigned two individuals part-time to attempt to license, and they decided it would take too long, for example, and that’s when they turned to piracy," Pritt testified. "At the time, they had public documents showing that certainly tens of millions, if not hundreds of millions, had been contemplated for licensing at that time."

During the hearing, Hawley showed images of text messages between Meta staffers and AI engineers regarding whether they should move forward with taking the published works.

ARTIFICIAL INTELLIGENCE DRIVES DEMAND FOR ELECTRIC GRID UPDATE

One Meta engineer working on the AI project wrote, "I don’t think we should use pirated material. I really need to draw a line there." They went on to say she felt using pirated material went beyond the team’s ethical threshold.

Another person in the same chat replied, saying, "It’s the piracy (and us knowing and being accomplices) that’s the issue."

And another said, "We want to buy books and be the nice open people here…However, to make it happen and not letting the bad guys win, we need to make a case - fast - and cut some corners here and there."

Sen. Josh Hawley. R-Mo., speaks with members of the media in Washington. (Reuters/Nathan Howard, File / Reuters Photos)

Pritt testified that the "bad guys" were other AI competitors.

"Yes, this is certainly one of the many documents that show that they knew these were pirated websites that contained copyrighted materials, and they were taking them for free," Pritt alleged.

Hawley shared additional messages between Meta staffers.

"Not sure we can use meta’s IPs to load through torrent pirate content, ahah," one wrote. Another replied, "i’m curious to start looking at some samples, but i feel like we should get some clarity on what’s allowed and how <insert smiley face emoji>."

"Ahah, yeah, I think torrenting from a corporate laptop doesn’t feel right <insert cheesy smiley face emoji>."

In another string, the staffers said that they could not use Facebook servers because the downloader would trace back to Facebook.

ARTIFICIAL INTELLIGENCE FUELS BIG TECH PARTNERSHIPS WITH NUCLEAR ENERGY PRODUCERS

AI illustrations on a laptop with books in the background in an illustration photo. (Jaap Arriens/NurPhoto via Getty Images / Getty Images)

"Here we have Meta employees saying they know they're pirating, they think it's ethically wrong, they think it is illegal and they are actively avoiding trying to create a paper trail," Hawley said. "They're trying to hide it. That doesn't sound like fair use to me."

Professor Edward Lee said he agreed with U.S. District Judge Vince Chhabria’s approach in the case of Meta Platforms.

Chhabria told the authors in his ruling late last month they did not present enough evidence that Meta's AI would dilute the market for their work to be sufficient for a copyright infringement case.

"This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria said, according to Reuters. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."

Lee said the distribution claim is still alive in the case and the aspect of torrenting may be infringement and not fair use.

"I’ll say this. If this isn’t infringement, Congress needs to do something," Hawley said. "I mean, if the answer is that the biggest corporation in the world worth trillions of dollars can come take an individual author’s work … lie about it, hide it, profit off of it and there’s nothing our law does about that, we need to change the law."

ELECTRICITY PRICES SPIKE FOR AMERICAN HOUSEHOLDS: HERE'S WHAT'S DRIVING COSTS HIGHER

An AI logo on a circuit board. (iStock / iStock)

In addition to Chhabria’s ruling in favor of Meta, Anthropic's ruling came down late last month. U.S. District Judge William Alsup cited "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model.

But Alsup partially sided with the authors, saying Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge ordered a trial in December to determine how much Anthropic owes for the infringement.

Fair use is a key legal defense for tech companies, and Alsup's decision is the first to address it in the context of generative AI.

AI companies argue their systems make fair use of copyrighted material to create new, transformative content and that being forced to pay copyright holders for their work could hamstring the booming AI industry.

Ticker	Security	Last	Change	Change %
META	META PLATFORMS INC.	702.91	-7.48	-1.05%
Powered By

Anthropic and other prominent AI companies, including OpenAI and Meta Platforms, have been accused of downloading pirated digital copies of millions of books to train their systems.

U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work.

CLICK HERE TO READ MORE ON Gxstocks

Copyright owners say AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods.

But others, like White House AI czar David Sacks, are pushing for a fair-use concept for training AI, because without them, the U.S. could lose the AI race.

"It’s very important that we end up with a sensible fair-use definition like the one the judge has come up with in this Anthropic case, because otherwise we will lose the AI race to China," Sacks said in an interview with The Wall Street Journal July 1.

Gxstocks’ Pilar Arias contributed to this report.

Senator slams Big Tech's role in 'pirating' copyrighted books for AI training purposes

Sen Hawley questions whether Meta overstepped in pirating authors’ works for AI

Olivia Smith

The overlooked obstacle keeping America from building the homes it needs

US, Canada strike deal to open bridge linking Detroit and Windsor after dispute delayed launch

You might be interested in

Trump turns NATO spending fight into win for US defense companies

First Freedom Fuel gas station opens as Trump-backed discounts roll out

Senator slams Big Tech's role in 'pirating' copyrighted books for AI training purposes

Sen Hawley questions whether Meta overstepped in pirating authors’ works for AI

Olivia Smith

Related posts

The overlooked obstacle keeping America from building the homes it needs

US, Canada strike deal to open bridge linking Detroit and Windsor after dispute delayed launch

You might be interested in

Trump turns NATO spending fight into win for US defense companies

First Freedom Fuel gas station opens as Trump-backed discounts roll out