Tag Archives: AI

Apple, Nvidia, Anthropic Used Thousands Of Swiped YouTube Videos To Train AI

Tech companies are turning to controversial tactics to feed their data-hungry artificial intelligence models, vacuuming up books, websites, photos, and social media posts often unbeknownst to the creators, WIRED reported.

AI companies are generally secretive about their sources of training data, but an investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission.

The investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.

The dataset, called YouTube Subtitles, contains video transcripts from educational and online learning channels like Khan Academy, MIT, and Harvard. The Wall Street Journal, NPR, and the BBC also had their videos used to train AI, as did The Late Show with Stephen Colbert, Last Week Tonight With John Oliver, and Jimmy Kimmel Live.

9to5Mac reported that a number of tech giants, including Apple, trained AI models on YouTube videos without the consent of the creators, according to a new report today.

They did this by using subtitle files downloaded by a third party from more than 170,000 videos. Creators affected include tech reviewer Marques Brownlee (MKBHD), MrBeast, PewDiePie, Stephen Colbert, John Oliver, and Jimmy Kimmel.

The subtitle files are effectively transcripts of the video content.

The downloads were reportedly performed by a non-profit called EleutherAI, which says it helps developers train AI models. While the aim appears to have been to provide training materials to small developers and academics, the dataset has also been used by several tech giants, including Apple.

According to 9to5Mac, it’s important to emphasize here that Apple didn’t download the data itself; the downloads were instead performed by EleutherAI. It is this organization which appears to have broken YouTube’s terms and conditions.

The Verge reported that, as part of its investigation, Proof News also released an interactive lookup tool. You can use its search engine feature to see if your content (or your favorite YouTuber’s) appears in the dataset.

The subtitles dataset is part of a larger collection of material from the nonprofit EleutherAI called The Pile, an open-source collection that also contains datasets of books, Wikipedia articles, and more. Last year, an analysis of one dataset called Books3 revealed which authors’ work had been used to train AI systems, and the dataset has been cited in lawsuits by authors against the companies that used it to train AI.

In my opinion, scraping content creators’ works – even if it’s only the audio part of the YouTube video – should be illegal. The work made by humans should not be fed to AI systems without the creator’s consent.

‘Little Tech’ Brings A Big Flex To Sacramento

One of Silicon Valley’s heaviest hitters is wading into the fight over California’s AI regulations, Politico reported.

Y Combinator, the venture capital firm that brought us Airbnb, Dropbox, and DoorDash, today issued its opening salvo against a bill by state Sen. Scott Wiener that would require large AI models to undergo safety testing.

Wiener, a San Francisco Democrat whose district includes YC, says he’s proposing reasonable precautions for a powerful technology. But the tech leaders at Y Combinator disagree, and are joining a chorus of other companies and groups that say it will stifle California’s emerging marquee industry.

“This bill, as it stands, could gravely harm California’s ability to retain its AI talent and remain the location of choice for AI companies,” read the letter, which was signed by more than 140 AI startup founders.

It’s the first time the startup incubator, led by prominent SF tech denizen Garry Tan, has publicly weighed in on the bill. They argue it could hurt the many fledgling companies Y Combinator supports, about half of which are now AI-related.

Adam Thierer posted “Coalition Letter on California SB-1047, ‘The Safe and Secure Innovation for Frontier Artificial Intelligence Systems Act’” on R Street:

Dear Senator Wiener and members of the California State Legislature,

We, the undersigned organizations and individuals, are writing to express our serious concerns about SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Systems Act. We believe that the bill, as currently written, would have severe unintended consequences that could stifle innovation, harm California’s economy, and undermine America’s global leadership on AI.

Our main concerns with SB 1047 are as follows: 

The application of the precautionary principle, codified as a “limited duty exemption,” would require developers to guarantee that their models cannot be misused for various harmful purposes, even before training begins. Given the general-purpose nature of AI technology, this is an unreasonable and impractical standard that could expose developers to criminal and civil liability for actions beyond their control.

The bill’s compliance requirements, including implementing safety guidance from multiple sources and paying fees to fund the Frontier Model Division, would be expensive and time-consuming for many AI companies. This could drive businesses out of California and discourage new startups from forming. Given California’s current budget deficit and the state’s reliance upon capital gains taxation, even a marginal shift of AI startups to other states could be deleterious to the state government’s fiscal position…

Y Combinator also posted a separate letter to Senator Wiener and two people who are on important committees. Here is a small piece from that letter:

Liability and regulation that is unusual in its burdens: The responsibility for the misuse of LLMs should rest with those who abuse these tools, not with the developers who create them. Developers cannot predict all possible applications of their models, and holding them liable for unintended misuse could stifle innovation and discourage investment in AI research. Furthermore, creating a penalty of perjury would mean that AI software developers could go to jail simply for failing to anticipate misuse of their software – a standard product liability no other product in the world suffers from.

In my opinion, it appears that Y Combinator has concerns about California’s rules regarding safety in AI. I’m not sure why the company is so upset about the state requiring safety protocols in their AI.


Meta Pauses AI Models Launch In Europe Due To Irish Request

Meta Platforms will not launch its Meta AI models in Europe for now after the Irish privacy regulator told it to delay its plan to harness data from Facebook and Instagram users, the U.S. social media company said on Friday, Reuters reported.

The move by Meta came after complaints and a call by advocacy group NOYB to data protection authorities in Austria, Belgium, France, Germany, Greece, Italy, Ireland, the Netherlands, Norway, Poland and Spain to act against the company.

At issue is Meta’s plan to use personal data to train its artificial intelligence (AI) models without seeking consent, although the company has said it would use publicly available and licensed online information.

Meta on Friday said the Irish privacy watchdog had asked it to delay training its large language models (LLMs) using public content shared by Facebook and Instagram adult users.

“We’re disappointed by the request from the Irish Data Protection Commission (DPC), our lead regulator, on behalf of the European DPAs … particularly since we incorporated regulatory feedback and the European DPAs have been informed since March,” the company said in an updated blogpost.

The Irish Data Protection Commission wrote:

The DPC’s Engagement with Meta On AI

The DPC welcomes the decision by Meta to pause its plans to train its large language model using public content shared by adults on Facebook and Instagram across the EU/EEA. This decision followed intensive engagement between the DPC and Meta. The DPC, in co-operation with its fellow EU data protection authorities, will continue to engage with Meta on the issue.

The Verge reported Meta is putting plans for its AI assistant on hold in Europe after receiving objections from Ireland’s privacy regulator, the company announced on Friday.

In a blog post, Meta said the Irish Data Protection Commission (DPC) asked the company to delay training its large language models on content that had been publicly posted to Facebook and Instagram profiles.

Meta said it is “disappointed” by the request, “particularly since we incorporated regulatory feedback and the European [Data Protection Authorities] have been informed since March,” per the Irish Independent. Meta had recently begun notifying European users that it would collect their data and offered an opt-out option in an attempt to comply with European privacy laws.

According to The Verge, Meta said it will “continue to work collaboratively with the DPC.” But its blog post says that Google and OpenAI have “already used data from Europeans to train AI” and claims that if regulators don’t let it use users’ information to train its models, Meta can only deliver an inferior product.

“Put simply, without including local information we’d only be able to offer people a second-rate experience. This means we aren’t able to launch Meta AI in Europe at the moment.”

In my opinion, it should not be legal for companies (like Meta and others) to scrape data off of social media platforms and feed it to their AI. It will never be OK to scrape other people’s posts – unless Meta pays a significant amount of money to the users they are stealing from.

Apple Intelligence Is The Company’s New Generative AI Offering

On Monday, at WWDC 2024, Apple unveiled Apple Intelligence, its long-awaited, ecosystem-wide push into generative AI. As earlier rumors suggested, the new feature is called Apple Intelligence (AI, get it?). The company promised the feature will be built with safety at its core, along with highly personalized experiences, TechCrunch reported.

According to TechCrunch, the company has been pushing the feature as integral to all of its various operating system offerings, including iOS, macOS, and the latest, visionOS.

The system is built on large language and intelligence models. Much of that processing is done locally, according to the company, utilizing the latest version of Apple silicon. “Many of these models run entirely on device,” SVP Craig Federighi claimed during the event.

That said, these consumer systems still have limitations. As such, some of the heavy lifting needs to be done off device in the cloud. Apple is adding Private Cloud Compute to the offering. The back end uses services that run on Apple chips, in a bid to increase privacy for this highly personal data.

Apple introduced Apple Intelligence. Here is part of the press release:

Apple today introduced Apple Intelligence, the personal intelligence system for iPhone, iPad, and Mac that combines the power of generative models with personal context to deliver intelligence that’s incredibly useful and relevant. 

Apple Intelligence is deeply integrated into iOS 18, iPadOS 18, and macOS Sequoia. It harnesses the power of Apple silicon to understand and create language and images, take action across apps, and draw from personal context to simplify and accelerate everyday tasks. With Private Cloud Compute, Apple sets a new standard of privacy in AI, with the ability to flex and scale computational capacity between on-device processing and larger, server-based models that run on dedicated Apple silicon servers.

“We’re thrilled to introduce a new chapter in Apple innovation. Apple Intelligence will transform what users can do with our products — and what our products can do for our users,” said Tim Cook, Apple’s CEO. “Our unique approach combines generative AI with a user’s personal context to deliver truly helpful intelligence. And it can access that information in a completely private and secure way to help users do the things that matter most to them. This is AI as only Apple can deliver it, and we can’t wait for users to experience what it can do.”

Engadget reported that Apple Intelligence will be powered by both Apple’s homegrown tech and a partnership with OpenAI, the maker of ChatGPT.

One of Apple’s biggest AI upgrades is coming to Siri. The company’s built-in voice assistant will now be powered by large language models, the tech that underlies all modern-day generative AI. Siri, which has languished over the years, may become more useful now that it can interact more closely with Apple’s operating systems and apps.

Apple Intelligence will also use AI to record, transcribe, and summarize your phone calls, rivaling third-party transcription services like Otter. All participants are automatically notified when you start recording, and a transcript of the conversation’s main points is automatically generated at the end.

In my opinion, I’m not thrilled about any of the AI-generated additions that have suddenly popped up. I’m hoping that Apple will allow me to turn off the AI-generated stuff.

OpenAI, WSJ Owner News Corp Strike Content Deal Valued At $250 Million

Wall Street Journal owner News Corp struck a major content-licensing pact with generative artificial-intelligence company OpenAI, aiming to cash in on a technology that promises to have a profound impact on the news-publishing industry, The Wall Street Journal reported.

The deal could be worth more than $250 million over five years, including compensation in the form of cash and credits for use of OpenAI technology, according to people familiar with the situation. The deal lets OpenAI use content from News Corp’s consumer-facing news publications, including archives, to answer users’ queries and train its technology.

“The pact acknowledges that there is a premium for premium journalism,” News Corp Chief Executive Robert Thomson said in a memo to employees Wednesday. “The digital age has been characterized by the dominance of distributors, often at the expense of creators, and many media companies have been swept away by a remorseless technological tide. The onus is now on us to make the most of this providential opportunity.”

The rise of generative AI tools such as OpenAI’s humanlike chatbot ChatGPT is poised to transform the publishing business. AI companies are hungry for publishers’ content, which can help them refine their models and create new products such as AI-powered search.

CNBC reported that, as part of the deal, OpenAI will be able to display content from News Corp-owned outlets within its ChatGPT chatbot in response to user questions. The startup will also use the content to “enhance its products,” or, likely, to train its artificial intelligence models.

News Corp. will also “share journalistic expertise to help ensure the highest journalism standards are present across OpenAI’s offering” as part of the deal, according to a release.

“We believe a historic agreement will set new standards for veracity, for virtue, and for value in the digital age,” Robert Thomson, CEO of News Corp, said Wednesday in a release. “We are delighted to have found principled partners in Sam Altman and his trusty, talented team who understand the commercial and social significance of journalists and journalism.”

The Hollywood Reporter wrote that OpenAI has cut another major media licensing deal. The artificial intelligence firm has inked a deal with News Corp that will bring content from its stable of media outlets to ChatGPT and other OpenAI products.

“Through this partnership, OpenAI has permission to display content from News Corp mastheads in response to user questions and to enhance its products, with the ultimate objective of providing people the ability to make informed choices based on reliable information and news sources,” the companies said in the announcement.

The News Corp. properties The Wall Street Journal, Barron’s, MarketWatch, Investor’s Business Daily, FN, and New York Post; The Times, The Sunday Times, and The Sun; The Australian, news.com.au, The Daily Telegraph, The Courier Mail, The Advertiser, and Herald Sun are all part of the deal, terms of which were not disclosed.

In my opinion, it seems like many corporations have decided that AI-generated content is the best way to go. My concern is that large corporations will decide that OpenAI is better for their needs, and will begin layoffs of human employees.

Google’s AI Search Results Are Already Getting Ads

Google only just rolled out AI summaries in search results — and now they’re getting ads. In an update on Tuesday, Google says it will soon start testing search and shopping ads within AI Overviews for users in the US, The Verge reported.

According to The Verge, in the example shared by Google, the search engine’s AI overview lists a response to the question: “How do I get wrinkles out of clothes?” Beneath the AI-generated suggestions, there’s a new “Sponsored” section with a carousel showing wrinkle spray you can buy from places like Walmart and Instacart.

Google says it will display ads in AI Overviews when “they’re relevant to both the query and the information in the AI Overview.” Advertisers that already run certain campaigns through Google will automatically become eligible to appear in AI Overviews. “As we move forward, we’ll continue to test and learn from new formats, getting feedback from advertisers and the industry,” Google writes.

Google posted the following on its Ads & Commerce Blog:

An evolution of attention is underway. People have seemingly endless ways to shop, communicate and stay entertained online. For advertising to stand out, it needs to be relevant and helpful — in fact, that’s more important than ever before. Businesses need to be on every surface with creative assets that capture people’s attention.

Until now, this has felt impossible to do at scale — but generative AI is changing that. This technology is helping us better meet advertisers’ needs and unlock new possibilities across the marketing process, from new immersive ads experiences to high-performing creative assets. As we build this next era of marketing together, we’re sharing our latest creative asset generation controls, new ad experiences, visual storytelling features and more at Google Marketing Live (GML).

We’ve been working on making it easier and faster to produce great creative assets for ads across marketing channels. Creative asset variety is crucial to strong ads, and achieving this has gotten easier for more advertisers with generative AI in Performance Max. We found that advertisers who improve their Performance Max Ad Strength to Excellent see 6% more conversions on average. 

Event Tickets Center was one of the earliest beta testers for asset generation in Performance Max, which has helped the team accelerate creative production by 5x with less time and effort.

CNBC reported that Google announced Tuesday that it will be giving advertisers the ability to create immersive visuals in their promotions using generative artificial intelligence, as the company rolls out more AI tools for brands.

Advertisers can take advantage of what Google is calling a visual brand profile in search “that gives richer results” for queries that include the name of a brand or retailer, the company said Tuesday at its annual Google Marketing Live. Brands can also include product videos and summaries.

Last week, Google announced plans to change its search results page to prioritize a feature called “AI Overview,” which uses AI to summarize information at the top of a search results page. The move could push organic content and ads further down the page, resulting in a potential shake-up for publishers and advertisers.

In my opinion, there are a lot of companies who have jumped on artificial intelligence, and then added something of their own to it. This might be great for the big companies, but I think the general public, overall, is not super interested in AI.

Google Is Redesigning Its Search Engine – And It’s AI All The Way Down

A year ago, Google said that it believed AI was the future of search. That future is apparently here: Google is starting to roll out “AI Overviews,” previously known as the Search Generative Experience, or SGE, to users in the US and soon around the world. Pretty soon, billions of Google users will see an AI-generated summary at the top of many of their search results. And that’s only the beginning of how AI is changing search, The Verge reported.

“What we see with generative AI is that Google can do more of the searching for you,” says Liz Reid, Google’s newly installed head of Search, who has been working on all parts of AI search for the last few years. “It can take a bunch of the hard work out of searching, so you can focus on the parts you want to do to get things done, or on the parts of exploring that you find exciting.”

According to The Verge, over most of the last decade, Google has been trying to change the way you search. It started as a box where you type keywords; now, it wants to be an all-knowing being that you can query any way you want and get answers back in whatever way is most helpful to you. 

“You increase the richness, and let people ask the question they naturally would,” Reid says. For Google, that’s the trick to getting even more people to ask even more questions, which makes Google even more money. For users, it could mean a completely different way to interact with the internet: less typing, fewer tabs, and a whole lot more chatting with a search engine.

Google posted “Generative AI in Search: Let Google do the searching for you,” written by Liz Reid, VP, Head of Google Search:

Over the past 25 years, across many technological shifts, we’ve continued to reimagine and expand what Google Search can do. We’ve meticulously honed our core information quality systems to help you find the best of what’s on the web. And we’ve built a knowledge base of billions of facts about people, places and things – all so you can get information you can trust in the blink of an eye.

Now, with generative AI, Search can do more than you ever imagined. So you can ask whatever’s on your mind or whatever you need to get done – from researching to planning to brainstorming – and Google will take care of the legwork.

This is all made possible by a new Gemini model customized for Google Search. It brings together Gemini’s advanced capabilities – including multi-step reasoning, planning and multimodality – with our best in class Search systems.

Ars Technica reported that Search is still important to Google, but it will soon change. At its all-in-one AI Google I/O event Tuesday, the company introduced a host of AI-enabled features coming to Google Search at various points in the near future, which will “do more for you than you ever imagined.”

It’s not AI in every search, but it will seemingly be hard to avoid a lot of offers to help you find, plan, and brainstorm things. “AI Overviews,” the successor to the Search Generative Experience, will provide summary answers to questions, along with links to sources. You can also soon submit a video as a search query, perhaps to identify objects or provide your own prompts by voice.

In my opinion, the new AI-enabled Google search might help some people to complete their projects, plan a trip, or look up their favorite bands. My hope is that Google’s AI feature will be useful to those who need it.