The Secretive Surge in AI Data Acquisition by Big Tech

Big Tech companies are increasingly purchasing vast quantities of AI training data from various sources, including long-forgotten social media platforms, amidst growing legal and ethical concerns.

Main Points:

  • Photobucket, once a leading image-hosting site, is negotiating with tech giants to license its 13 billion photos and videos for AI training, highlighting a booming market for such data.
  • Companies like Google, Meta, and Microsoft have historically scraped the internet for free data but are now seeking to legally acquire content, driving a hidden market for copyrighted and personal data.
  • The surge in AI data purchases raises legal, ethical, and privacy issues, with companies and data brokers creating a new economy around “ethically sourced” content for AI training.

Summary:

A Reuters investigation details the underground race among tech giants to acquire AI training data, marking a shift from freely scraping the internet to purchasing content from various sources. Photobucket, a relic of the early internet, has become a potential goldmine: its vast archive of images and videos is now sought after for training generative AI models. This trend reflects a larger move by companies like Google, Meta, and Microsoft, which face legal and ethical challenges over their data scraping practices and are now quietly paying for access to content behind paywalls and from faded social media platforms.

The burgeoning market for AI training data is not just about acquiring content; it’s also about navigating legal and ethical boundaries. Companies are entering into multimillion-dollar deals with content providers, such as stock image libraries and news organizations, to secure vast amounts of data. This rush is fueled by the need to train sophisticated AI models that require a diversity of data, from photographs to chat logs, under conditions that avoid copyright infringement and respect privacy. An emerging industry of AI data firms and brokers is catering to this demand, securing rights to real-world content and producing custom data sets labeled as “ethically sourced” to assuage concerns over consent and privacy.

However, the race to amass such data raises significant concerns. Reviving old internet archives for AI training carries an inherent risk: it may violate individuals’ privacy, particularly if personal or sensitive data is used without clear consent or authorization. Companies like Photobucket are navigating these murky waters by updating their terms of service and exploring data licensing as a revenue model, but the approach is not without its critics. As the AI data market grows, the tension between the technological drive for data and the ethical considerations of privacy and consent continues to mount, setting the stage for potential regulatory scrutiny and legal challenges.

Source: Inside Big Tech’s underground race to buy AI training data
