Studies Show AI Will Consume All Human-Generated Data On The Internet By 2026

A recent estimate indicates that AI might deplete all of the Internet's text data within the next few years.

By Syazwan Bahri — 25 Jun 2024

Cover image via Shantanu Kumar/Pexels, Mikechie Esparagoza/Pexels

A new study has warned that artificial intelligence (AI) systems could consume all of the Internet's existing knowledge by 2026

Researchers have published a paper suggesting that large language models (LLMs) such as ChatGPT could exhaust the freely available text on the Internet as early as 2026.

AI models like GPT-4, which powers ChatGPT, and Claude 3 Opus rely on the vast amounts of text available online to improve.

To develop better models, tech companies will need to find alternative data sources. These might include synthetic data, lower-quality sources or, more concerningly, private data such as messages and emails stored on servers.

Here's how the researchers reached their conclusion:

Image used for illustration purposes only.

Image via Sankret Mishra/Pexels

To estimate the amount of text available online, researchers turned to Google's web index and found about 250 billion web pages, each with around 7,000 bytes of text.
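As a rough check on the scale involved, the two figures above can simply be multiplied. The short Python sketch below does exactly that, using only the article's two numbers and assuming that "petabyte" means 10^15 bytes (the decimal definition).

```python
def estimate_text_stock(pages: int, bytes_per_page: int) -> int:
    """Total bytes of text = number of pages x average bytes of text per page."""
    return pages * bytes_per_page


PAGES = 250_000_000_000  # ~250 billion pages in Google's web index (article's figure)
BYTES_PER_PAGE = 7_000   # ~7,000 bytes of text per page (article's figure)

total = estimate_text_stock(PAGES, BYTES_PER_PAGE)
# Report both the raw byte count and decimal petabytes (1 PB = 10**15 bytes, an assumption)
print(f"{total:,} bytes, about {total / 10**15:.2f} petabytes")
# prints: 1,750,000,000,000,000 bytes, about 1.75 petabytes
```

Because it multiplies two averages, this gives only an order-of-magnitude figure; real pages vary widely in how much text they contain.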

They then analysed the flow of data across the web through IP traffic and online user activity to predict the growth of this data.

Their findings showed that high-quality information from reliable sources could run out by 2032 at the latest. Low-quality text data might be used up between 2030 and 2050. As for image data, it could be completely consumed between 2030 and 2060.
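One simple way to think about an "exhaustion date" is to project two quantities forward in time, the stock of available text and the size of the datasets used to train new models, and find the first year in which the second catches up with the first. The Python sketch below is a deliberately simplified illustration of that idea, not the researchers' actual model. Apart from the 1.75e15-byte stock (250 billion pages x 7,000 bytes), every input, including the growth rates and the starting dataset size, is a hypothetical placeholder, so the year it prints is not the study's estimate.

```python
from typing import Optional


def exhaustion_year(
    stock: float,
    stock_growth: float,
    dataset: float,
    dataset_growth: float,
    start_year: int,
    max_years: int = 100,
) -> Optional[int]:
    """Return the first year in which the projected training dataset is at least as
    large as the projected stock of text, or None if that does not happen within
    max_years. Growth rates are annual fractions (0.07 means 7% a year)."""
    for offset in range(max_years + 1):
        if dataset >= stock:
            return start_year + offset
        stock *= 1 + stock_growth      # grow the stock of text by its annual rate
        dataset *= 1 + dataset_growth  # grow the training-set size by its annual rate
    return None


# Hypothetical inputs: only the stock figure is derived from the article.
year = exhaustion_year(
    stock=1.75e15,       # bytes of web text (250 billion pages x 7,000 bytes)
    stock_growth=0.07,   # ASSUMED: web text grows 7% a year
    dataset=5e13,        # ASSUMED: a 50 TB training set in the start year
    dataset_growth=1.0,  # ASSUMED: training sets double every year
    start_year=2024,
)
print(year)  # prints: 2030
```

The key point the sketch captures is that when training datasets grow much faster than the web itself, the gap closes within a few years even if the stock is many times larger today; changing the assumed growth rates moves the date considerably.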

AI requires vast amounts of data to keep improving

Training data fuels the growth of AI systems, allowing them to identify increasingly complex patterns within their neural networks. The model behind ChatGPT was reportedly trained on approximately 570 GB of text data, amounting to around 300 billion tokens (words and word fragments) sourced from e-books, online articles, Wikipedia, and other online sources.

AI models trained on insufficient or low-quality data tend to produce unreliable outputs. This was common among early LLMs, which often responded with outright nonsense, and the problem persists: a recent AI chat response went viral for suggesting recipes that included inedible materials such as glue.

Though there are alternatives to this conundrum, they are far from perfect solutions

Image used for illustration purposes only.

Image via Matheus Bertelli/Pexels

One way to meet the demand for training data is to use synthetic, or artificially generated, data. This method has proven effective in training systems used in gaming, coding, and mathematical applications.
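To illustrate what synthetic data can look like in a mathematical setting, the Python sketch below generates arithmetic question-and-answer pairs programmatically. It is a hypothetical toy example; the function name, the prompt format and the `Example` type are made up for illustration and do not reflect how any particular AI company builds its datasets.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    prompt: str
    answer: str


def make_arithmetic_examples(n: int, seed: int = 0, max_operand: int = 999) -> list[Example]:
    """Generate n synthetic question/answer pairs (hypothetical format).

    Answers are computed rather than written by hand, so every pair is
    correct by construction. A fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    examples = []
    for _ in range(n):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        symbol = rng.choice(list(ops))
        result = ops[symbol](a, b)
        examples.append(Example(prompt=f"What is {a} {symbol} {b}?", answer=str(result)))
    return examples


# Usage: print a few generated training pairs
for example in make_arithmetic_examples(3, seed=7):
    print(example.prompt, example.answer)
```

Because each answer is computed by the program, the generated pairs can be checked automatically, which is one plausible reason synthetic data has worked comparatively well for maths and coding; open-ended text offers no such built-in check.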

On the other hand, if companies try to gather intellectual property or private information without permission, legal issues could arise, according to experts.
