Maker Pro

Searching the entire Internet by index

Keep in mind, everyone, this is a theoretical question.

What I was wondering is this: hypothetically, would it be possible to program a computer to crawl the entire Internet, or the entire index of a search engine like Google, to search for a piece of content, say an image?

Here's the concept:

Suppose you have an image (for the sake of discussion, the bitmap icon image from Windows). Would it be possible to program a computer to go through every image in Google's image-search index and compare the icon against each one?

Set aside, of course, the image-comparison algorithms themselves and the means of performing that comparison.
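Just to sketch what "compare the icon against each image" might look like, here is a toy version of one naive technique (average hashing). Everything here is invented for illustration: it assumes images are already decoded into small grayscale grids, whereas real systems decode actual files and use far more robust features.

```python
# Minimal sketch of average hashing, assuming pre-decoded grayscale grids.
# Not how Google does it; just an illustration of the comparison step.

def average_hash(pixels):
    """Turn a 2-D grid of grayscale values (0-255) into a bit string:
    '1' where a pixel is brighter than the grid's mean, '0' otherwise."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming_distance(a, b):
    """Count differing bits; smaller means more similar."""
    return sum(x != y for x, y in zip(a, b))

icon  = [[200, 210], [30, 40]]   # hypothetical 2x2 "query" image
match = [[190, 220], [25, 50]]   # a near-duplicate of the icon
other = [[10, 20], [240, 250]]   # a very different image

print(hamming_distance(average_hash(icon), average_hash(match)))  # low
print(hamming_distance(average_hash(icon), average_hash(other)))  # high
```

The point is only that each candidate image reduces to a tiny fingerprint, so the expensive part of the hypothetical is the crawl, not each individual comparison.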
 
Google already has an image search function that does that: it looks for similar shapes, colors, and so on.

And they have a handle on just about every site that has an external link. You would not be able to create a program that finds more sites than they list without trying every possible web site name, with every possible beginning and end (www / .com).
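To put a rough number on "every possible web site name": even restricting names to lowercase letters and digits (ignoring hyphens and the many TLDs, so this understates the real space), the count is astronomical. The length cap of 12 below is an arbitrary assumption for illustration.

```python
# Rough sketch of why brute-forcing site names is hopeless.
# All bounds here are assumptions chosen for illustration.

alphabet = 26 + 10  # a-z plus 0-9
names_up_to_12 = sum(alphabet ** n for n in range(1, 13))
print(f"{names_up_to_12:.2e} possible names of length <= 12")
```

That is on the order of 10^18 names before you even consider longer names, hyphens, subdomains, or other TLDs.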

Now, as to what you are asking: theoretically it would be possible, but it would require a fair bit of processing power (think supercomputer-class hardware) and would take an immense amount of time for even the simplest of images. Do you have any idea how many images Google indexes? Let alone the web? There are literally trillions of images.

There are also other sites, like Wolfram Alpha, that will do image searches and data searches across all sorts of data types.
 
I don't think that Google would be very thrilled if you tried to download every image in their index. I'm pretty sure that it would be against their terms of service.
 

(*steve*)

¡sǝpodᴉʇuɐ ǝɥʇ ɹɐǝɥd
Moderator
I don't think that Google would be very thrilled if you tried to download every image in their index. I'm pretty sure that it would be against their terms of service.

Unless you had something like a 10 GB/s internet feed, I doubt they'd even notice. At normal speeds (normal even for local networks), the task would take so long as to be practically impossible (and that's notwithstanding the fact that new content is almost certainly being created faster than you could pull down the old).

This puts things into perspective.
 
When I originally posted this I intended it as a "thinking out loud" sort of question; I was really playing on the first part, that is, manually running through IP addresses.

It'd be a computationally insane task to complete nowadays because of the sheer volume, but recently I've been reading up on search-engine theory and it got me wondering. Much of the high-volume Internet crawling that occurs seems similar to key searching in cryptography: you can do it exhaustively, searching every possible location or possibility at maximum cost in time and power, or you can make what amount to educated guesses, the trade-off being completeness.
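That trade-off can be sketched with a toy "web": exhaustive search enumerates the whole address space, while a link-following crawl (the educated-guess approach) only ever reaches what is linked from its seeds. The site names and link graph below are made up for illustration.

```python
# Toy contrast between exhaustive enumeration and frontier-based crawling.
# The graph is invented; a real crawler works from a URL frontier the same way.
from collections import deque

web = {
    "a.com": ["b.com"],
    "b.com": ["c.com"],
    "c.com": [],
    "island.com": [],   # no inbound links: a crawl never finds it
}

def exhaustive(space):
    """Enumerate every address (complete, but infeasibly huge in reality)."""
    return set(space)

def crawl(seed):
    """Breadth-first link-following from a seed (focused, but incomplete)."""
    seen, frontier = set(), deque([seed])
    while frontier:
        site = frontier.popleft()
        if site not in seen:
            seen.add(site)
            frontier.extend(web[site])
    return seen

print(sorted(exhaustive(web)))   # includes island.com
print(sorted(crawl("a.com")))    # misses island.com
```

The unlinked "island" site is exactly the completeness you give up, which is also why a search engine that follows links can't index sites nothing points to.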

Food for thought, that's all.
 

(*steve*)

¡sǝpodᴉʇuɐ ǝɥʇ ɹɐǝɥd
Moderator
You've also got to remember that there could be a million web sites behind a given IP address. I don't mean a web site with a million pages, but a million completely independent web sites. So you can really only search by URL, and that essentially means crawling the web.
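The mechanism behind "a million sites on one IP" is HTTP/1.1 name-based virtual hosting: the server picks which site to serve from the request's Host header. The toy handler below is invented for illustration, but it mirrors what a real virtual-host setup does.

```python
# Sketch of name-based virtual hosting: one server (one IP) routes on the
# Host header. Site names and content here are hypothetical.

virtual_hosts = {
    "alice.example": "Alice's homepage",
    "bob.example":   "Bob's blog",
}

def serve(request_headers):
    """Toy request handler: select the site by Host header."""
    host = request_headers.get("Host", "")
    return virtual_hosts.get(host, "default site")

# Same IP, different sites, depending only on the name requested:
print(serve({"Host": "alice.example"}))
print(serve({"Host": "bob.example"}))
print(serve({}))  # a scanner hitting the bare IP sees only the default
```

So a by-IP scan sees at most one default page per address, while every other site on that IP stays invisible unless you already know its name, which is why URL-based crawling is the only workable approach.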
 