
Study Shows AI Could Consume All Human-Generated Data On The Internet As Early As 2026

A recent estimate indicates that AI might deplete all of the Internet's text data within the next few years.

Cover image via Shantanu Kumar/Pexels & Mikechie Esparagoza/Pexels


A new study has warned that artificial intelligence (AI) systems could consume all of the Internet's existing knowledge as early as 2026

Researchers have published a paper indicating that large language models (LLMs) such as ChatGPT could exhaust the freely available text on the Internet as early as 2026.

AI models like GPT-4, which powers ChatGPT, and Claude 3 Opus rely on the vast amounts of text available online to improve.

To develop better models, tech companies will need to find alternative data sources. This might involve creating synthetic data, using lower-quality sources, or, more concerningly, accessing private data such as messages and emails stored on servers.

Here's how the researchers arrived at that conclusion:

Image used for illustration purposes only.

Image via Sankret Mishra/Pexels

To estimate the amount of text available online, researchers turned to Google's web index and found about 250 billion web pages, each with around 7,000 bytes of text.
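
As a rough back-of-the-envelope illustration of what those figures imply (a sketch based only on the numbers quoted above, plus an assumed average word length that is not from the study), the total stock of indexed text works out to a few petabytes, or on the order of hundreds of trillions of words:

# Back-of-the-envelope arithmetic based on the figures quoted above; the
# bytes-per-word value is an illustrative assumption, not from the study.
web_pages = 250e9         # ~250 billion indexed web pages (figure cited above)
bytes_per_page = 7_000    # ~7,000 bytes of text per page (figure cited above)

total_bytes = web_pages * bytes_per_page
print(f"Total indexed text: ~{total_bytes / 1e15:.2f} petabytes")   # ~1.75 PB

bytes_per_word = 6        # rough assumption for English text, including spaces
print(f"Roughly {total_bytes / bytes_per_word / 1e12:.0f} trillion words")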

They then analysed the flow of data across the web through IP traffic and online user activity to predict the growth of this data.

Their findings showed that high-quality information from reliable sources could run out by 2032 at the latest. Low-quality text data might be used up between 2030 and 2050. As for image data, it could be completely consumed between 2030 and 2060.
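
To give a sense of how such a projection works (a minimal sketch with made-up numbers, not the study's actual model), an exhaustion date falls out of comparing two growth curves: the slowly growing stock of online text and the much faster-growing size of training datasets.

# Illustrative sketch only: the starting values and growth rates below are
# assumptions chosen for demonstration, not the researchers' actual figures.
data_stock = 300e12       # assumed stock of usable words on the open web
stock_growth = 1.07       # assumed ~7% annual growth in online text

dataset_size = 5e12       # assumed words used to train a frontier model today
dataset_growth = 2.0      # assumed doubling of training dataset size each year

year = 2024
while dataset_size < data_stock:
    data_stock *= stock_growth
    dataset_size *= dataset_growth
    year += 1

print(f"Under these assumptions, demand overtakes the available stock around {year}")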

AI requires vast amounts of data to keep improving

Training data fuels AI systems' growth, allowing them to identify increasingly complex patterns within their neural networks. ChatGPT was trained on approximately 570 GB of text data, which equates to around 300 billion words sourced from e-books, online articles, Wikipedia, and other online sources.

AI algorithms trained on insufficient or low-quality data tend to produce unreliable outputs. This was common in the early days of LLMs, which often produced incoherent responses. More recently, an AI chatbot went viral for suggesting recipes that included non-edible materials like glue.

Though there are workarounds to this conundrum, they are far from perfect solutions

Image used for illustration purposes only.

Image via Matheus Bertelli/Pexels

One way to address the need for data to train AI models is by using synthetic, artificially generated data. This method has proven effective in training systems used in gaming, coding, and mathematical applications.
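
As a simple illustration of the idea (a hypothetical sketch, not how any particular company builds its datasets), synthetic examples for a mathematical task can be generated programmatically, since the correct answer is known by construction:

import random

# Minimal, hypothetical sketch of synthetic data generation for a maths task.
# Each example is produced programmatically, so the answer is correct by
# construction and no human-written text from the web is needed.
def make_arithmetic_example():
    a, b = random.randint(1, 999), random.randint(1, 999)
    question = f"What is {a} + {b}?"
    answer = str(a + b)
    return {"prompt": question, "completion": answer}

# Generate a small synthetic dataset that could supplement web-scraped text.
synthetic_dataset = [make_arithmetic_example() for _ in range(5)]
for example in synthetic_dataset:
    print(example)

Because examples like these never run out, they can supplement scraped text, although they only cover narrow, well-defined domains, which is part of why such workarounds remain imperfect.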

On the other hand, if companies try to gather intellectual property or private information without permission, legal issues could arise, according to experts.
