The Google Goal Of Indexing

by Chander Prabha

In their paper 'The Anatomy of a Large-Scale Hypertextual Web Search Engine', Sergey Brin and Lawrence Page make it evident that Google's goal has always been to be one of the best search engines in terms of the quality of its results. They knew, however, that to achieve this Google needed to store information efficiently and cost-effectively, and to have excellent crawling, indexing, and sorting techniques. Google aimed not only to give quality results but also to produce them as fast as possible.

Google started as a high-quality search engine and continues to be the best search engine today. It has stayed true to its original intent: to be a search engine that not only crawls and indexes the web efficiently but also produces more satisfying results than other existing search engines.

To stay true to the goal of providing the best search results, Google knew right from the start that it had to be designed to keep pace with the web's growth. According to Brin and Page, "In designing Google we have considered both the rate of growth of the Web and technological changes. Google is designed to scale well to extremely large data sets. It makes efficient use of storage space to store the index." They knew that they would need a great deal of space to store an ever-growing index.

Google's index, which started out at 24 million web pages, was large for its time and has since grown to around 25 billion web pages, keeping Google ahead of its competitors. However, Google doesn't settle for just beating its competitors. It aims to give its users the best service there is, and for a search engine that means giving users access to all, or at least most, of the quality information available on the web.

Google's New System for Indexing More Pages

As mentioned earlier, Google aims to give access to even more information and has devoted much time and effort to this goal. The patent entitled 'Multiple Index Based Information Retrieval System', filed by Google employee Anna Patterson, might be the answer to the problem. Filed in January 2005 and published in May 2006, the patent suggests that Google may be aiming to expand its index to 100 billion web pages or more.

According to the patent, conventional information retrieval systems, more commonly known as search engines, are able to index only a small part of the documents available on the Internet. According to estimates, the number of web pages on the Internet as of last year was around 200 billion; however, Patterson claimed that even the best search engine (that is, Google) was able to index only 6 to 8 billion web pages, or roughly 3 to 4 percent of that estimate.

The disparity between the number of indexed pages and the number of existing pages clearly signaled a need for a new breed of information retrieval system. Conventional systems simply could not index enough web pages to give users access to a large enough share of the information available on the web.

The Multiple Index Based Information Retrieval System, however, is up to the challenge and is Google's answer to the problem. Two characteristics make the new system stand out from conventional systems. One is its "capability to index an extremely large number of documents, on the order of a hundred billion or more". The other is its capability to "index multiple versions or instances of documents for archiving...enabling a user to search for documents within a specific range of dates, and allowing date or version related relevance information to be used in evaluating documents in response to a search query and in organizing search results."
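To make the second capability more concrete, here is a minimal sketch of what searching multiple dated versions of a document could look like. It is only a toy illustration and not the method described in the patent; the class name VersionedIndex, its methods, and the example.com page names are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class VersionedIndex:
    """Toy inverted index that keeps every dated version of a document.

    Hypothetical illustration only; this is not the patent's design.
    """

    def __init__(self):
        # Maps each term to a list of (doc_id, version_date) postings.
        self._postings = defaultdict(list)

    def add(self, doc_id, version_date, text):
        """Index one dated version of a document."""
        for term in set(text.lower().split()):
            self._postings[term].append((doc_id, version_date))

    def search(self, term, start, end):
        """Return versions containing `term` dated within [start, end], newest first."""
        hits = [
            (doc_id, version_date)
            for doc_id, version_date in self._postings.get(term.lower(), [])
            if start <= version_date <= end
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


index = VersionedIndex()
index.add("example.com/a", date(2004, 3, 1), "google index size")
index.add("example.com/a", date(2005, 9, 1), "google index grows")
index.add("example.com/b", date(2005, 6, 15), "patent index")

# Only the versions dated in 2005 are returned; the 2004 version is filtered out.
results = index.search("index", date(2005, 1, 1), date(2005, 12, 31))
print(results)
```

Keeping the version date next to each posting is what lets a query be restricted to a date range, and the same date could also feed into ranking, as the quoted passage suggests.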

With the new system developed by Patterson, Google can now expand its index far beyond its previous size and also improve document analysis and processing, document annotation, and even ranking based on contained phrases and anchor phrases.

History of Google's Index Size

Google started out with an index of around 24 million web pages in 1996. By August of 2000, Google had grown its index roughly fortyfold, to approximately one billion web pages. In September of 2003, Google's front page boasted an index of 3.3 billion web pages. Microdoc, however, reported that the actual number of web pages Google had indexed at that time was already more than five billion. In its article 'Google Understates the Size of Its Database', Microdoc emphasized that Google specialized not only in simplicity but also in understating its power and complexity. Google was still managing to stay ahead of its competitors and continued to surprise everyone with what it had up its sleeve.

As Google's index continued to grow, the number on its front page grew impressively large as well, before it plateaued at eight billion web pages. This was around the time that Patterson filed the new patent. Then in 2005, with controversy over index sizes growing, Google decided to stop counting in public and simply claimed that its index was three times larger than that of its nearest competitor. Google also maintained that what mattered was not just the number of indexed pages but how relevant the returned results were.

Then in September of 2005, as part of Google's 7th anniversary, Anna Patterson, the same software engineer who filed the patent on the Multiple Index Based Information Retrieval System, posted an entry on Google's official blog claiming that the index was now 1,000 times larger than the original index. This pegged the index at around 24 billion web pages, about a fourth of Google's goal of indexing 100 billion web pages. It seems, then, that Google must have started using the new system in mid-2005. With the new system in place, we can only wait and see how fast Google will reach the goal of 100 billion web pages in its index. Most likely, though, once Google has reached that goal it will set an even higher one in order to keep providing quality service.
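The arithmetic behind these figures is easy to check. The short sketch below uses only numbers quoted in this article; the constant and function names are chosen here for illustration, and all page counts are approximate.

```python
# Figures quoted in the article (approximate page counts).
ORIGINAL_INDEX = 24_000_000        # about 24 million pages in 1996
TARGET_INDEX = 100_000_000_000     # the 100 billion page goal


def scaled_index(original: int, factor: int) -> int:
    """Index size after growing the original index by a whole-number factor."""
    return original * factor


def share_of_target(current: int, target: int) -> float:
    """Fraction of the target index size that has been reached."""
    return current / target


# The blog post claimed the index was 1,000 times the original size.
current = scaled_index(ORIGINAL_INDEX, 1000)
share = share_of_target(current, TARGET_INDEX)

print(f"{current:,} pages")        # 24,000,000,000 pages
print(f"{share:.0%} of the goal")  # 24% of the goal
```

A 1,000-fold increase over 24 million gives 24 billion pages, which is 24 percent of 100 billion, hence "about a fourth" of the goal.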


