The Google Goal Of Indexing

by Chander Prabha

The paper 'The Anatomy of a Large-Scale Hypertextual Web Search Engine' makes it evident that Google's goal has always been to be one of the best search engines in terms of the quality of its results. Its authors, Sergey Brin and Lawrence Page, knew that to achieve this, Google needed to store information efficiently and cost-effectively and to have excellent crawling, indexing, and sorting techniques. Google aimed not only to give quality results but also to produce them as fast as possible.

Google started as a high-quality search engine and continues to be the best search engine today. It has stayed true to its original intent: to be a search engine that not only crawls and indexes the web efficiently but also produces more satisfying results than other existing search engines.

To stay true to the goal of providing the best search results, Google knew right from the start that the search engine had to be designed to keep up with the web's growth. According to Brin and Page, "In designing Google we have considered both the rate of growth of the Web and technological changes. Google is designed to scale well to extremely large data sets. It makes efficient use of storage space to store the index". They knew that they would need a great deal of space to store an ever-growing index.

Google's index, which started out at 24 million web pages, was large for its time and has grown to around 25 billion web pages, still keeping Google ahead of its competitors. However, Google is a company that doesn't settle for just beating the competition. It aims to give its users the best possible service, and for a search engine that means giving users access to all, or at least most, of the quality information available on the web.

Google's New System for Indexing More Pages

As mentioned earlier, Google aims to give access to even more information and has devoted much time and effort to this goal. The new patent entitled 'Multiple Index Based Information Retrieval System', filed by Google employee Anna Patterson, might be the answer to the problem. The patent, filed in January 2005 and published in May 2006, suggests that Google might be aiming to expand its index to as many as 100 billion web pages or even more.

According to the patent, conventional information retrieval systems, more commonly known as search engines, are able to index only a small part of the documents available on the Internet. According to estimates, the number of web pages on the Internet as of last year was around 200 billion; however, Patterson claimed that even the best search engine (that is, Google) was able to index only 6 to 8 billion web pages.

The disparity between the number of indexed pages and the number of existing pages clearly signaled a need for a new breed of information retrieval system. Conventional systems simply could not index enough web pages to give users access to a large enough share of the information available on the web.

The Multiple Index Based Information Retrieval System, however, is up to the challenge and is Google's answer to the problem. Two characteristics make the new system stand out from conventional systems. One is its "capability to index an extremely large number of documents, on the order of a hundred billion or more". The other is its capability to "index multiple versions or instances of documents for archiving...enabling a user to search for documents within a specific range of dates, and allowing date or version related relevance information to be used in evaluating documents in response to a search query and in organizing search results."
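The second capability, searching stored versions of a document within a range of dates, can be pictured with a small sketch. This is only an illustration of the idea and not the design described in the patent: the names DocVersion and VersionedIndex, the single in-memory word index, and the example URL and dates are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocVersion:
    """One archived copy of a page (hypothetical structure)."""
    url: str
    captured: date
    text: str


class VersionedIndex:
    """Toy word index that keeps every version of a page, not just the latest."""

    def __init__(self):
        # Maps a lowercase word to every document version containing it.
        self._postings = defaultdict(list)

    def add(self, version):
        # Index each distinct word of this version once.
        for term in set(version.text.lower().split()):
            self._postings[term].append(version)

    def search(self, term, start, end):
        # Keep only versions captured inside the requested date range,
        # then list the most recent version first.
        hits = [
            v for v in self._postings.get(term.lower(), [])
            if start <= v.captured <= end
        ]
        return sorted(hits, key=lambda v: v.captured, reverse=True)


# Example data is invented for illustration.
index = VersionedIndex()
index.add(DocVersion("http://example.com/", date(2004, 3, 1), "Google index size"))
index.add(DocVersion("http://example.com/", date(2005, 9, 26), "Google index grows"))

hits = index.search("index", date(2005, 1, 1), date(2005, 12, 31))
for hit in hits:
    print(hit.captured, hit.url)
```

Because both versions of the page stay in the index, the same query can return different copies depending on the dates requested. A real system at the scale the patent describes would need far more than one in-memory dictionary.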

With the new system developed by Patterson, Google now has the ability to expand its index size to unbelievable proportions as well as improve document analysis and processing, document annotation, and even the process of ranking according to contained and anchor phrases.

History of Google's Index Size

Google started out with an index size of around 24 million web pages in 1996. By August of 2000, Google had managed to grow its index roughly fortyfold, to approximately one billion web pages. In September of 2003, Google's front page boasted an index of 3.3 billion web pages. Microdoc, however, revealed that the actual number of web pages Google had indexed at that time was already more than five billion. In its article 'Google Understates the Size of Its Database', Microdoc emphasized that Google specialized not only in simplicity but also in understating its power and complexity. Google was still managing to stay ahead of its competitors and continued to surprise everyone with what it had up its sleeve.

As Google's index continued to grow, the number on its front page grew impressively large as well before it plateaued at eight billion web pages. This was around the time that Patterson filed the new patent. Then in 2005, with controversies over index size growing, Google decided to stop counting in public and simply claimed that its index was three times larger than that of its nearest competitor. Google also maintained that what mattered was not just the number of indexed pages but how relevant the returned results were.

Then in September of 2005, as part of Google's 7th anniversary, Anna Patterson, the same software engineer who filed the patent on the Multiple Index Based Information Retrieval System, posted an entry on Google's official blog claiming that the index was now 1,000 times larger than the original index. This pegged the index size at around 24 billion web pages, about a fourth of Google's goal of indexing 100 billion web pages. It seems, then, that Google must have started using the new system in mid-2005. With the new system in place, we can only wait and see how fast Google will reach the goal of 100 billion web pages in its index. It is most likely, though, that once Google has reached that goal it will set an even higher one in order to keep providing quality service.
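The figures quoted in this article can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names and the helper function share are made up for this example.

```python
# Figures quoted in the article (all approximate).
ORIGINAL_INDEX = 24_000_000            # pages in the original index
GROWTH_FACTOR = 1_000                  # "1,000 times larger" per the 2005 blog post
TARGET_INDEX = 100_000_000_000         # the "hundred billion or more" in the patent
ESTIMATED_WEB = 200_000_000_000        # estimated number of pages on the web
CONVENTIONAL_RANGE = (6_000_000_000, 8_000_000_000)  # pages indexed by the best engine


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


index_2005 = ORIGINAL_INDEX * GROWTH_FACTOR
progress = share(index_2005, TARGET_INDEX)
coverage = tuple(share(n, ESTIMATED_WEB) for n in CONVENTIONAL_RANGE)

print(f"Implied 2005 index: {index_2005:,} pages")
print(f"Share of the 100 billion goal: {progress:.0f}%")
print(f"Conventional coverage of the web: {coverage[0]:.0f}% to {coverage[1]:.0f}%")
```

The implied index of 24 billion pages is 24% of the 100 billion goal, which is the "about a fourth" mentioned above, while 6 to 8 billion pages out of an estimated 200 billion amounts to only 3% to 4% of the web.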


