Web-Bot Technologies – Preparing the Data

Using a web crawler to index every bit of information:
1) Divide everything into file types.
2) Using metadata, index keywords, sentences and paragraphs.
3) Link each keyword directly to the information it came from.
4) Your program must be able to represent keywords, sentences and paragraphs in machine-readable form; if it cannot, it has failed. Every piece of information, e.g. an email, a tweet or a link, can be represented this way. A further enhancement is to search across the languages of the world, but that is too complex to discuss here because the technology is not available yet.
5) Write an algorithm that spawns hundreds of web crawlers to increase speed.
6) After indexing all of your information, you need a huge cache to link it to searches: once a keyword search has been run, its results are kept in memory.
7) In effect, everyone is searching your snapshot rather than the Internet directly.
8) The frequency of updates is therefore very important, e.g. every hour.
9) Every piece of information is timestamped so the user can choose the most recent version.
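
The steps above can be sketched roughly in Python. This is only a minimal illustration, not a real crawler: the `FAKE_WEB` dictionary, the `fetch` function and the `Snapshot` class are all made-up names, and `fetch` reads from an in-memory dictionary instead of the Internet. It shows parallel workers, a keyword index linked back to each source, file types, timestamps and an in-memory search cache.

```python
import re
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Internet, so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/a.html": "Web crawlers index keywords",
    "http://example.com/b.txt": "Crawlers run in parallel",
    "http://example.com/c.html": "Fresh keywords need fresh snapshots",
}

def fetch(url):
    """Pretend to download a page; returns (url, text, timestamp)."""
    return url, FAKE_WEB[url], time.time()

def file_type(url):
    """Step 1: classify by file extension (an assumed, very crude rule)."""
    return url.rsplit(".", 1)[-1].lower()

class Snapshot:
    """The indexed copy that users search instead of the live Internet."""

    def __init__(self):
        self.index = defaultdict(set)  # keyword -> URLs containing it
        self.docs = {}                 # URL -> file type, text, timestamp
        self.cache = {}                # keyword -> cached result list

    def add(self, url, text, fetched_at):
        # Steps 2, 3 and 9: store the document with its type and timestamp,
        # then link every keyword to it. Stale keywords of a re-indexed URL
        # are not removed in this sketch.
        self.docs[url] = {"type": file_type(url), "text": text,
                          "fetched_at": fetched_at}
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[word].add(url)
        self.cache.clear()  # cached results may be out of date now

    def search(self, keyword):
        # Step 6: the first search computes the result, later ones reuse it.
        key = keyword.lower()
        if key not in self.cache:
            hits = self.index.get(key, set())
            # Step 9: most recently fetched documents come first.
            self.cache[key] = sorted(
                hits, key=lambda u: self.docs[u]["fetched_at"], reverse=True)
        return self.cache[key]

def crawl(urls, workers=8):
    """Step 5: fetch with many workers, then index in the calling thread."""
    snapshot = Snapshot()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, text, fetched_at in pool.map(fetch, urls):
            snapshot.add(url, text, fetched_at)
    return snapshot
```

Calling `crawl(list(FAKE_WEB))` builds a snapshot, and `snapshot.search("keywords")` then answers from that snapshot. To keep the snapshot fresh (step 8), `crawl` would be re-run on a schedule, e.g. every hour; the scheduling itself is left out here.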

– Contributed by Oogle. 

Author: Gilbert Tan TS

IT expert with more than 20 years' experience in Apple, Android and Windows PC. Interests include hardware and software, Internet and multimedia. An experienced real estate agent, insurance agent and futures trader.
