The Copyright Controversies of AI Continue - H+ Weekly - Issue #428

This week - Geoffrey Hinton's plan for friendly AGI; bots are better than humans at CAPTCHA tests; neuroscientists re-create songs from listeners’ brain activity; and more!

Aug 18, 2023

Generative AI is a powerful tool. With a single prompt, it can generate any kind of text, from cover letters to business plans to entire books, or any kind of image. In order to be capable of such feats, generative AIs need to be trained on huge amounts of data. This data, terabytes of text and images scraped from the internet, is now causing problems for AI companies as most of it has been taken without permission from copyright owners. And the copyright owners are not happy with that and are taking action.

A group of authors filed a class-action lawsuit against OpenAI and Meta, accusing the two companies of illegally using copyrighted material to train their AI language models. Although neither company has fully disclosed the datasets used to train its models, the lawyers representing the authors have deduced the likely data sources from clues in statements and papers released by the companies or related researchers. According to the lawsuit, both OpenAI and Meta trained on copyrighted material obtained without the consent of authors or publishers, including works downloaded from some of the largest ebook pirate sites.

The law firm representing the authors previously filed a lawsuit against GitHub for using copyrighted code to train its AI code assistant, Copilot. It also targeted Stability AI, Midjourney, and DeviantArt with a class-action lawsuit, accusing them of using copyrighted material in their training data without permission.

It’s not just individual artists or authors banding together against copyright infringement by AI companies. Big organisations are also taking action to protect their intellectual property. The New York Times is one of the organisations considering legal action against OpenAI. The two companies were in talks to establish a licensing deal (similar to the one OpenAI has with the Associated Press), but the negotiations went nowhere. If the case reaches court and OpenAI is found to have infringed copyright, it could be ordered to remove those materials from its datasets. Federal copyright law also carries stiff financial penalties, with fines of up to $150,000 for each infringement "committed willfully." In the worst-case scenario, that could mean a fine of millions of dollars for OpenAI.

Copyright infringements are not the only way OpenAI got itself into problems. In June, a California law firm filed another class-action lawsuit against OpenAI for using personal data to train ChatGPT. The law firm argues that OpenAI used personal data and "did so in secret, and without registering as a data broker as it was required to do under applicable law."

In July, over 10,000 authors signed an open letter calling on OpenAI, Alphabet, Meta, StabilityAI, IBM and Microsoft to stop using their work without permission. They also call on those companies to “compensate writers fairly for the past and ongoing use of our works in your generative AI programs” and to “compensate writers fairly for the use of our works in AI output, whether or not the outputs are infringing under current law”.

All those lawsuits and open letters are most likely just the beginning of problems for AI companies. Once the EU AI Act is enacted into law, every AI company that wants to operate in the EU will be required to disclose the training datasets. That opens a way for every copyright holder, from large media organisations to individual writers and artists, to check if their work was used without their permission and take appropriate action.

While lawyers are busy dealing with lawsuits, OpenAI has offered a way to opt out of being crawled by GPTBot, OpenAI’s web crawler that gathers data from the internet for training purposes. The New York Times and Reuters have already banned GPTBot from accessing their websites.
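For readers curious how such a ban works in practice: the opt-out is expressed in a site's robots.txt file, using the `GPTBot` user agent. Below is a minimal sketch in Python that checks a robots.txt rule with the standard library's parser. The `ROBOTS_TXT` content, the `is_allowed` helper, the `SomeOtherBot` name and the `example.com` URL are illustrative assumptions, not taken from any of the publishers mentioned.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that blocks GPTBot from the whole site
# (an assumed example, not copied from any real publisher).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that robots.txt is a voluntary convention: it only keeps out crawlers that choose to honour it.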

None of the mentioned lawsuits have been resolved yet, so we don’t have a clear picture of what is allowed from a legal point of view. If any of those lawsuits end up in court, they have the potential to change the nature of copyrighted content on the internet. We are already witnessing the impact of these lawsuits and controversies on the internet. Companies are now more protective of their data, either by moving more content behind paywalls or by limiting access to the API services.

From H+ Weekly

How to build a web without a silk gland

Conrad Gray

August 16, 2023

Today, we have a guest post written by Dylan Wintle. Dylan is a 21-year-old Computer Systems student at Heriot-Watt University where he primarily studies machine learning and uses his spare time to self-study biology. He is also the president of the fledgling


Becoming a paid subscriber now is the best way to support the newsletter.


If you enjoy and find value in what I write about, feel free to hit the like button and share your thoughts in the comments. Share the newsletter with someone who will enjoy it, too. That will help the newsletter grow and reach more people.

You can also buy me a coffee.

🦾 More than a human

Neuroscientists Re-create Pink Floyd Song from Listeners’ Brain Activity
In March this year, scientists showed that it is possible to use fMRI scans and machine learning to decode what a person is looking at using only their brain activity. Now, another group of scientists has reconstructed a Pink Floyd song directly from data gathered by brain implants. Their work improves our understanding of how the brain processes sound and opens the possibility of adding musical elements to brain–computer interfaces. Here is the paper if you are interested in learning more.

Pig kidney keeps working for over a month in brain-dead man’s body
Over a month ago, surgeons in New York successfully transplanted a pig’s kidney into a brain-dead man. So far, everything looks promising, making this the longest a pig kidney has functioned in a person, albeit a deceased one, and it’s not over. Researchers will track the kidney’s performance for a second month. Surgeons hope this experiment will be a stepping stone towards trying the same procedure in living patients.

▶️ Cyborg Armies (31:02)

In this episode of Scifi Sunday, Isaac Arthur examines the concept of cybernetically enhanced soldiers - how plausible they are and what can be done to make this concept more plausible. Isaac spends most of the time discussing ways of powering all those enhancements, from various nuclear-based power sources to batteries to external power sources, without cooking the human inside. Additionally, Isaac ponders what kind of enhancements would be useful on the battlefield, from mental augmentations to extra organs to performance-enhancing drugs.

🧠 Artificial Intelligence

The ‘Godfather of AI’ Has a Hopeful Plan for Keeping Future AI Friendly
Geoffrey Hinton suggests a technological approach that might mitigate an AI power play against humans: analog computing. “The idea is you don't make everything digital. Because every piece of analog hardware is slightly different, you can't transfer weights from one analog model to another. So there's no efficient way of learning in many different copies of the same model. If you do get AGI [via analog computing], it’ll be much more like humans, and it won’t be able to absorb as much information as those digital models can”, says Hinton in this conversation with Wired.

IBM Research's latest analog AI chip for deep learning inference
Researchers at IBM have unveiled a prototype analog chip for AI computing which, according to IBM, achieves 92.81% accuracy on the CIFAR-10 image dataset and is 15 times faster than other chips using similar architecture while achieving comparable energy efficiency. Analog chips have the potential to match current digital chips at AI computation while being smaller and consuming far less energy. For more details on how these chips work and why they might be the future of AI hardware, check this article where I explain analog chips in more detail.

Bots are better at beating ‘are you a robot?’ tests than humans are
We can now add those annoying CAPTCHA tests to the list of things that machines do better than humans. That, however, raises questions about how to keep the internet a place for humans (if that is even possible these days).

The Flemish Scrollers
Here is an example of how AI can be used in politics. The Flemish Scrollers is a project that uses computer vision to watch the livestreams of every meeting of the Flemish government in Belgium and to detect when politicians look at their phones. Once detected, a picture of the politician not paying attention to the session is posted on Instagram and Twitter.

‘Only AI made it possible’: scientists hail breakthrough in tracking British wildlife
Thanks to machine learning, biodiversity researchers in the UK were able to sift through thousands of hours of videos and audio recordings to identify animals and birds and monitor their movements in the wild. Dozens of different birds were recognised from their songs while foxes, deer, hedgehogs and bats were pinpointed and identified by AI analysis. Systems like this one can be used to measure biodiversity and guide efforts to protect animals and their habitats.

🤖 Robotics

Here’s how much funding robotics companies raised in June
The Robot Report published its monthly report on funding in the robotics industry. According to the report, a total of $2.1 billion was invested in various robotics companies in June 2023. 40% of that investment went to companies based in the US, followed by China (16%) and Israel (6%).

Robotic gripper is gentle enough to pick up a drop of water
Inspired by kirigami, researchers created soft robotic grippers that are capable of handling both ultrasoft, ultrathin objects and heavy objects. The gripper is gentle enough to pick up a drop of water, strong enough to pick up a 6.4 kg weight (while itself weighing 0.4 g), dexterous enough to fold a cloth, and precise enough to pick up microfilms that are 20 times thinner than a human hair.

💡 Tangents

Molecular Landscapes by David S. Goodsell
I found these watercolour paintings by David S. Goodsell while I was going through Stanford’s Introduction to Bioengineering course and I immediately fell in love with them. Goodsell beautifully portrays the intricacies of life at the microscopic level, from HIV-infected cells to the COVID virus to blood cells, and the interconnections between proteins, lipids, and other molecules that together build living organisms.

H+ Weekly sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who bought me a coffee on Ko-Fi. Thank you for the support!

You can follow H+ Weekly on Twitter and on LinkedIn.

Thank you for reading and see you next Friday!
