Despite the Internet's worldwide popularity, many users think of it as an invisible force. In fact it is a very material thing: its operation depends on powerful computers, servers and data centers that exchange information in a split second and are interconnected by kilometers of cables and optical fiber. What are these storage facilities, how are data centers organized, and what do the data centers of the largest companies look like?
What is a data center
Like many other innovations, people owe the invention and spread of the Internet to the military industry. The first network technologies were developed for it: military laboratories were linked into a network (first local, later ubiquitous) to connect military bases, and that network is now used far beyond military conflicts. Today the technology distributes content on any topic and in any direction. Whatever you upload to the Web (a photo, a video, or a "quote of the day" from Jason Statham), it instantly ends up in a data processing center (data center).
A data center is not just a big flash drive: it is a huge fortress-like building filled with servers, optical cables and wires. Operating and maintaining a modern storage facility requires as much electricity as a small town. Data centers solve several tasks at once:
- round-the-clock, uninterrupted operation. Electricity is supplied without interruption: fourth-level (Tier 4) data centers are connected to two power plants at once, for redundancy. And even if there is an accident on the line, powerful generators are always on standby, ready to take over at any moment.
- access protection. There are always third parties who would like to get hold of this or that information, so data centers that accept data for storage ensure its confidentiality.
- safety and integrity. Data centers store information of every kind: from a photo of your favorite pet to secret data.
Data center: ready for any challenge
Data centers are equipped not only with modern servers but also with reliable fire protection. Gas suppression systems flood a room with carbon dioxide, extinguishing fire without damaging the surrounding equipment. Particular attention is paid to maintaining the right climate.
Servers and hard drives generate heat in use. A matchbox-sized cooler is enough for a PC, but that option does not scale to industry. Here, full-fledged air conditioning and ventilation systems protect the server labyrinths from overheating.
The spirit of commerce, or how data centers make money
Large companies such as Facebook and Google have storage facilities of their own, but more modest consumers can rent space in a data center. This can be a dedicated server, a place in a rack (colocation) where you install your own server, or space in network storage. When tenants install their own equipment, data center owners earn not only on the floor space but also on electricity: they resell it to tenants with a small markup.
Another way for data center owners to earn is leasing licensed software. Data centers purchase software, install it on their servers, and then rent it out piecemeal for a fee. In recent years, renting a virtual server, a share of a physical server's resources (VPS, virtual private server), has been gaining popularity.
Where the Internet's information lives: data centers unique in their scale and power
IBM (USA)
The well-known corporation's experiment resulted in a data center on the campus of Syracuse University. The task was to cut electricity consumption in half, and in 2009 they succeeded. Power comes from a dedicated gas-fired power plant.
Citigroup (Germany)
The center, developed by Arup Associates in 2008, is considered one of the greenest complexes in its category, meaning its operation does minimal harm to nature. Everything from lighting to cooling is geared toward sustainable use. The care for nature is visible in the structure itself: one of the gables is covered with a lawn that both adorns the building and collects the water used in its humidifiers.
Ebay (USA)
Ebay's data center was built on the sands of the Arizona desert (no easy task for the engineers who designed its cooling system). The equipment is housed in special containers that not only protect it from overheating but also raise its energy efficiency to 95%.
Digital Beijing (China)
The Beijing data center stands out for its power and bold architectural solutions. Especially for the 2008 Olympics, the architectural company Studio Pei-Zhu built an 11-story building, which became both the data center and the technical support headquarters for the Olympic Games. Now that the sporting events are over, the building houses a museum.
Apple (USA)
Apple cares not only about the continuity and safety of its customers' data but also about the planet's environment, so one of its main goals has been using energy from renewable sources. The data center's operation relies on 400,000 square meters of solar panels, enough to cover 60% of its needs; the rest of the power comes from a biofuel power plant.
Google (Finland)
The giant of the web industry naturally has more than one data center; its complexes are scattered across the planet and almost all of them meet the "green" criterion. The Finnish data center occupies a former paper mill designed by the famous Finnish architect Alvar Aalto, and the cold waters of the Gulf of Finland are ideal for maintaining a suitable indoor climate.
Verne Global (Iceland)
The BMW concern uses this Reykjavik data center for its own needs: calculating the performance of new models, processing test results and more. Powered by hydroelectric plants and geothermal sources, the data center emits no carbon dioxide.
Facebook (USA)
In Prineville, Mark Zuckerberg's company erected a data center covering 28 thousand square meters: imagine a flash drive the size of three football fields. 6.5 thousand kilometers of fiber link the servers, and for cooling a seven-room penthouse level with a modern natural air conditioning system was built.
MINISTRY OF EDUCATION AND SCIENCE OF THE RUSSIAN FEDERATION
State educational institution of higher professional education
"TAGANROG STATE PEDAGOGICAL INSTITUTE named after A.P. Chekhov"
Faculty of Informatics
Department of Informatics and Management
Course work
Organization of storage and retrieval of information on the Internet
4th-year student
Sheverda M.A.
Informatics with additional specialization in Foreign Language
supervisor
Cand. Sc. (Engineering), Assoc. Prof. I.A. Tyushnyakova
Taganrog
Introduction
1. Basic concepts of information retrieval
2. Network infrastructure
3. Search engine history
3.1 History of the creation of the Google search engine
3.2 How the Google search engine works
3.3 Yandex search engine
3.4 Rambler search engine
3.5 Yahoo search engine
3.6 Searching by URLs
Searching for information on the Internet
Saving information on the Internet
Conclusion
List of references
Introduction
The Internet is a global computer network hosting a variety of services (e-mail, the World Wide Web, FTP, Usenet, Telnet, etc.). Computer networks are designed for data transmission, telephone and radio networks for voice, and television networks for images.
Depending on the distance between PCs, local, territorial and corporate computer networks are distinguished. Convergence of telecommunication networks (computer, radio, telephone and television networks) provides the possibility of high-quality transmission of data, voice and images over single (multiservice) new generation networks (Internet networks).
The Internet has long been not only a means of communication but also a field for serious commercial activity. Almost every foreign company has its own representative office on the Internet, a virtual office. The total turnover of companies trading on the Internet reaches billions of dollars. In Russia, a growing number of companies also use the Internet to promote their products and services, as a glance at advertising publications confirms: e-mail and website addresses appear alongside familiar telephone and fax numbers more and more often. Soon the lack of an Internet address will be as much of a handicap as the lack of a fax machine.
More and more people therefore turn to the Internet for the latest information: services and prices, weather, exchange rates, or simply news. A website can be updated several times a day, whereas print advertisements must be ordered at least a week in advance, if not more. On the Internet everything is immediate: a new product or service, a new discount, a new supplier, and customers will know about it tomorrow, with no waiting for the next print ad. The information on a site is always current and fresh; that is what users value, and that is what draws millions of them to the Internet.
The most important condition and the leading factor determining the success of educational activities using computer technology is the readiness of students for productive activities in a didactic computer environment.
Most researchers in the field of pedagogical informatics note the existence of a contradiction between the concepts of the modern humanitarian-personal paradigm of education and the existing teaching system with a narrowly subject orientation, which does not ensure the student's readiness for educational activities using computer methods of obtaining and transforming information. It becomes obvious that the concepts of using information technologies in the educational process are evolving from technocratic paradigms in the direction of strengthening the role of sociocultural factors, taking into account the moral and intellectual potential of the individual.
Mastering effective methods and means of searching, processing and using educational information makes it possible not only to intensify educational processes, but also to develop the cognitive interests of students, the desire for productive, creative activity.
The purpose of the course work:
Explore existing systems and mechanisms for finding information on the network.
Coursework objectives:
1. Study the specialized literature relevant to the topic.
2. Based on the knowledge gained from this literature, find out how information is stored and retrieved in the global network.
3. Identify the similarities and differences between search engines.
1. Basic concepts of information retrieval
An information retrieval system (IRS) is an ordered set of documents (document arrays) and information technologies intended for storing and retrieving information: texts (documents) or data (facts). Information retrieval systems are any storage of information organized in a particular way; they may even be non-automated. What matters is the target function: storing and retrieving information.
Depending on the storage object and the type of query, two types of information retrieval are distinguished, documentary and factographic, and accordingly two types of IRS: documentary and factographic.
Documentary IRS are systems that search an array of documents or texts in response to thematic queries and then provide the user with a subset of those documents or their copies. The concept of a document can vary from system to system. In the general case it is an information object, fixed (usually by means of some sign system) on some material medium (paper, photographic or film stock, etc.) and intended for transmission in space and time within the system of social communications.
Factographic IRS implement the storage, search and delivery of factual data (scientific, technical and economic characteristics and properties of objects, processes and phenomena, addresses, names, quantitative data, etc.).
The main difference between documentary and factographic search lies in the approach to the semantics of documents. In documentary systems, the meaning of a document as a whole is described in terms of its thematic, subject content: the main topics and objects the document covers are identified and named. In factographic systems, objects are described and their attributes and attribute values are recorded. Hence the differences in description languages and in how descriptions are stored in the system; accordingly, each type of search has its own tools.
Factographic systems accumulate and search arrays of documents with a strictly regulated structure. Such a structure is either the result of preliminary intellectual processing of documents when information enters the system, or documents of this kind already exist in finished form in specific areas of human activity, for example accounting forms, reference books, schedules, etc. Some factographic systems accumulate and search only one type of object and answer only one type of query. There are also more developed factographic systems that store and retrieve data diverse in content and structure, though this diversity is always finite.
At the same time, there is no insurmountable difference between documentary and factographic systems. Real IRS are often mixed systems in which factographic information serves as an additional means of documentary search, and vice versa. In documentary systems, texts (documents) can themselves be structured and divided into fragments or fields, and documentary information can be processed and issued at the level of individual fields.
There is also a third type of system, called information-logical. These systems answer queries for which the infobase has no explicit answer. An extra-linguistic knowledge base and information generated algorithmically from the existing (documentary or factographic) information help produce the answer. This new information is either issued as the response to the query or used to guide further searches.
A documentary information retrieval system is an ordered set of documents together with the tools and methods intended for storing, searching and issuing documentary information on request. A documentary IRS returns documents matching the query's topic or subject.
Most working IRS belong to the class of verbal, thesaurus-free systems, in which indexing terms are selected directly from the texts of documents. The avalanche-like growth in the volume of electronic documentary information, and its diversity of types, topics and languages, is both the cause of the crisis in modern information retrieval and the stimulus for its improvement.
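The verbal, thesaurus-free indexing described above can be illustrated with a toy documentary IRS: an inverted index whose terms are taken directly from the document texts, answering a thematic query with the subset of matching documents. This is a minimal sketch for illustration, not a description of any production system.

```python
from collections import defaultdict
import re


class DocumentaryIRS:
    """Toy documentary IRS: stores documents and returns those matching a query."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}
        # Inverted index: term -> set of document ids containing it.
        self.index: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        # Indexing terms are taken directly from the text (thesaurus-free).
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        # A thematic query: return documents containing every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(ids)]
```

Each added document enlarges only the postings sets of its own terms, so a query costs one set intersection rather than a scan of the whole collection.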
The problem of searching for resources on the Internet was recognized soon enough, and various search systems and software tools appeared in response, among them Gopher, Archie, Veronica, WAIS and WHOIS. These tools have since been displaced by the "clients" and "servers" of the World Wide Web.
If we attempt a classification of Internet IRS, the following main types can be distinguished:
- verbal IRS (search engines)
- classification IRS (directories)
- electronic directories ("yellow pages", etc.)
- specialized IRS for particular types of resources
- intelligent agents
Global tracking of all Internet resources is provided by verbal systems and, in part, by classification systems.
2. Network infrastructure (structure and principles of building the Internet)
The Internet is a worldwide information network, a union of many regional computer networks and individual computers that exchange information over public telecommunications channels (dedicated analog and digital telephone lines, optical communication channels, and radio channels including satellite links).
Information on the Internet is stored on servers. Servers have their own addresses and are controlled by specialized programs. They allow you to transfer mail and files, search databases, and perform other tasks.
The exchange of information between the servers of the network is carried out through high-speed communication channels (dedicated telephone lines, fiber-optic and satellite communication channels). Individual users' access to information resources on the Internet is usually carried out through a provider or corporate network.
A provider (network service provider) is a person or organization that provides services for connecting to computer networks. Typically a provider is an organization with a pool of modems for connecting clients and giving them access to the worldwide network.
The main cells of the global network are local area networks. If some local network is directly connected to the global network, then every workstation of this network can be connected to it. There are also computers that are directly connected to the global network. They are called host computers. A host is any computer that is a permanent part of the Internet, i.e. connected via the Internet protocol to another host, which in turn is connected to another, and so on.
Figure 1. The structure of the global Internet
To connect communication lines to computers, special electronic devices are used, which are called network cards, network adapters, modems, etc.
Almost all Internet services are based on the client-server principle. All information on the Internet is stored on servers. The exchange of information between servers is carried out via high-speed communication channels or highways. Servers connected by high-speed backbones make up the basic part of the Internet.
Information transfer across the Internet is possible because every computer on the network has a unique address (an IP address), and network protocols ensure interaction between different types of computers running different operating systems.
The Internet mainly uses the TCP/IP family (stack) of network protocols. At the data link and physical layers, the TCP/IP stack supports Ethernet, FDDI and other technologies. The core of the family is the network layer, represented by the IP protocol and various routing protocols; this layer moves packets through the network and controls their routing. Packet size, transmission parameters and integrity checking are handled at the transport layer by TCP.
The application layer brings together all the services the system provides to the user. The main application protocols are the Telnet remote access protocol, the FTP file transfer protocol, the HTTP hypertext transfer protocol, and the e-mail standards SMTP, POP, IMAP and MIME.
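To make the layering concrete, here is a minimal sketch in Python: HTTP (application layer) is just text sent over a TCP connection (transport layer), while IP routing below is handled entirely by the operating system. The host name is whatever server you choose to query; the example does not assume any particular site.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Compose the application-layer text of an HTTP/1.1 GET request."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def http_get(host: str, path: str = "/") -> bytes:
    """Send the request over TCP and collect the raw response bytes."""
    # create_connection resolves the host to an IP address and opens a
    # reliable TCP byte stream; packetization and integrity checks happen
    # below this API, invisibly to the application.
    with socket.create_connection((host, 80), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
        return b"".join(chunks)
```

Calling `http_get("example.com")` would return the server's status line, headers and body as one byte string; in practice higher-level libraries wrap exactly this exchange.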
3. History of search engines
Comparative review of search engines
As the Internet grew worldwide, the problem of finding information on the network came to the fore. Several large firms, such as Altavista, Lycos and AOL, immediately tried to occupy this niche. Naturally, each developed its own methods: manual cataloguing in directories, and automatic discovery and indexing of sites using specially designed crawlers ("spiders"). Their goal was to index the entire Internet, starting from a few large websites and following the links found on them and in newsgroups. But since it could take a spider a very long time to reach a given site, search engines allowed webmasters to add links to the spider's queue manually, so that a submitted resource could be indexed quickly.
Similar systems appeared in the CIS in much the same way; these include, for example, Russian Express, Rambler, Aport and Yandex, since they also use spiders to find new sites. One peculiarity of CIS search engines is that they index only CIS sites or check the encoding (language) of the text, as Aport does. Here is an excerpt from the Yandex FAQ:
Yandex ranks documents according to the calculated "relevance" parameter. The relevance of a document depends not only on the number of query words found in the document, but also on the frequency characteristics of the search words, the weight of a word or expression, the proximity of the search words in the text of the document to each other, etc.
Titles like "type_Document_Title_here", "Web Page Title Here", "Insert Page Title Here", "Put_Your_Title_Here" or simply "Title" do no credit to a page or its webmasters. Moreover, many search engines, Yandex included, pay special attention to the words in the title. You should not take the ten most searched-for words from Rambler's Top100 and write them into headings, comments, or the body text in white on white. First, it brings the creator no fame and understandably irritates users. Second, search engines, Yandex among them, are beginning to fight this practice. Besides, spam increases the size of a document and thus reduces the contrast of the words in it.
Spam should be avoided for another reason as well: a word repeated more than 30 times on a page significantly reduces the relevance of the page as a whole. Note also that Russian search engines do not support meta tags, so when creating a web page in Russian, make sure that the title contains the relevant phrases and that they also appear near the top of the text.
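The ranking factors quoted from the FAQ (word frequency, word weight, and the proximity of query words in the text) can be sketched as a toy scorer. This illustrates the general idea only; it is not Yandex's actual formula, and it does not model the spam penalty mentioned above.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def relevance(query: str, doc: str, collection: list[str]) -> float:
    """Toy score: term frequency, weighted by term rarity, with a proximity bonus."""
    q_terms = tokenize(query)
    d_tokens = tokenize(doc)
    n = len(collection)
    score = 0.0
    positions = []  # first position of each matched query term
    for term in q_terms:
        tf = d_tokens.count(term)  # how often the query word occurs
        if tf == 0:
            continue
        # "Weight of a word": rarer words across the collection count more.
        df = sum(1 for d in collection if term in tokenize(d))
        idf = math.log((n + 1) / (df + 1)) + 1
        score += tf * idf
        positions.append(d_tokens.index(term))
    # Proximity bonus: query words close together in the text score higher.
    if len(positions) > 1:
        spread = max(positions) - min(positions)
        score *= 1 + 1 / (1 + spread)
    return score
```

A document containing all the query words close together outscores one containing only a common word, which matches the behavior the FAQ describes.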
3.1 History of the Google search engine
In 1995, two PhD students at Stanford University, Larry Page and Sergey Brin, were working on various aspects of data management. Back in 1996, Page began actively using the Internet for his research in data mining; at the time the Web was for him merely a source of randomly selected information. Both students were members of the MIDAS (Mining Data at Stanford) working group. A little later, under the guidance of Rajeev Motwani, an assistant professor in the Computer Science department, Page and Moscow-born Brin began developing their own search engine. Various companies were already offering search services on the Internet market, but for the future doctors of science the project was akin to academic fun: no one thought about quick capitalization or a business plan. The idea behind the search engine was described in several scientific papers and is quite simple to understand.
The web contains a huge amount of information, and most search engines try to determine the relevance of a page by whether its HTML contains the keywords the user typed into the search form. Google, by contrast, also analyzes the links between pages, counting each link to a site as a "vote" that adds weight to the linked site. It is logical to assume that a popular site containing useful information will be linked to more often than a useless and uninteresting resource.
This does not exhaust the definition of relevance, however. The resulting conditional popularity rating of resources can itself be used as information about the sites these high-quality resources point to. Thus a single link to your page from Yahoo! or About.com may be worth more than hundreds of links from unknown homepages: Yahoo! and About.com are regarded as reputable sources, so the pages they link to are presumed to be of high quality.
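The "votes" scheme described above can be sketched as a simplified PageRank iteration: each page spreads its score over its outgoing links, so a vote from a highly ranked page is worth more. The toy graph, damping factor and iteration count below are illustrative defaults, not Google's production values.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iteratively spread each page's score over its outgoing links ('votes')."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a dangling page votes for everyone
            share = rank[page] / len(targets)
            for t in targets:
                new[t] += damping * share
        rank = new
    return rank


# A toy web: two unknown homepages both link to "hub"; "hub" links back to "a".
toy = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
```

Running `pagerank(toy)` ranks "hub" first (it receives two votes) and "a" above "b" (a vote from the high-ranked "hub" outweighs none at all), which is exactly the reputable-source effect described in the text.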
In 1998, Google was launched on a Stanford University server and was available at google.stanford.edu. At a time when other startups received funding before writing a business plan or building a product, Google's founders believed that additional research would only help the search engine; by the time the company was founded, a search engine based on the PageRank technology had already been running for more than two years. Back in 1996, the students noticed that their development often produced more accurate results than other search engines, and in 1997 Google became the internal search engine at Stanford University. That year Page and Brin incurred their first expenses on Google's further development: they bought hard drives with a total capacity of 1 TB, which cost them $15,000. For the time being, all expenses had to be covered by their own credit cards.
In September 1998, it became clear that in order to develop the technology further and to start licensing it to interested parties, a company had to be created. Page and Brin left Stanford six months before their doctoral theses, taking with them Craig Silverstein, who was appointed CTO. At some point the enthusiasts met one of the founders of Sun Microsystems, Andy Bechtolsheim, who, after asking about the venture's plans, immediately wrote the ex-students a check for $100,000. This was the era of dramatic growth of media companies: all the search engines that had previously offered users a way to find information on the Web suddenly decided to provide Internet services (free mail, stock quotes and other portal attributes). When Page met George Bell, the CEO of Excite, Bell showed no interest in unique search technology. "As long as our search engine is in more or less decent condition, we are fine with that," Bell argued, hinting that search itself was no longer of interest to portals.
So Google had to go its own way. Instead of aggressive marketing and promotion, Page and Brin preferred to hire about 150 employees, 20 of them PhDs. The company did not advertise itself with millions of banners, did not invest in branding or market development, and did not intend to make money by displaying banner ads on its own website. Despite such passivity from a marketer's point of view, the search engine's popularity kept growing, and many users accustomed to juggling several search engines settled on Google, each for their own subjective reasons: some liked the discreet interface and ease of use, some the speed and the absence of heavy advertising, others the quality of the results.
The well-known US usability expert Jacob Nielsen, who sits on the Google board of directors, once recalled the search engine: "I consider them my best customers. Their whole company is obsessed with usability." Moreover, convinced that users favor search engines with simple interfaces, Altavista released a new shell for its own engine, announcing Raging Search.

3.2 How the Google search engine works
Figure 2. Search engine optimization affects only the main search results and does not apply to paid links, such as AdWords contextual advertising
Website optimization should be designed for users: they are the site's target audience, and they use search engines to find it. Excessive enthusiasm for tricks aimed at clawing into the top results may fail to deliver. Search engine optimization is just a way to be slightly ahead when it comes to search visibility.
The title of the main page of the site may contain the name of the site or organization, as well as other useful information, such as an address and a short description of the subject or services.
Figure 3. The user sends the query [greeting cards]
Figure 4. The page appears in the search results; its title is the first line (note that the words from the search query are shown in bold)
Figure 5. If the user goes to another page, its title appears in the browser window's title bar
The titles of other pages on the site should also accurately describe their content, and may contain the name of the site or company.
Figure 6. The user sends the query [Happy New Year greeting cards]
Figure 7. The relevant page of our site appears in the search results (its title describes its content)
3.3 Search engine - Yandex
Figure 8. The Yandex search engine
The Yandex.ru search engine was officially announced on September 23, 1997 at the Softool exhibition. Its main distinguishing features at the time were document uniqueness checking and the key properties of Yandex search: accounting for Russian morphology and distance-aware search. A carefully developed relevance algorithm took into account not only the number of query words found in the text, but also a word's "contrast" (its relative frequency in the document), the distance between words, and a word's position in the document. A little later, the "Fairy Tales" section published the first Runet tale, "Web - Humanism or Chernukha?", and the "Numbers" section the first estimate of the Runet's size: 5 thousand servers and 4 GB of texts.
Two months later, in November 1997, natural language queries were implemented. From then on, you could address Yandex.ru simply "in Russian", asking long queries such as "where to buy a computer", "genetically modified products" or "international telephone codes" and receiving accurate answers. The average query length on Yandex.ru is now 2.7 words; in 1997 it was 1.2 words, since users of search engines were then accustomed to the telegraphic style.

In 1998, Yandex.ru introduced the ability to "find a similar document", a list of found servers, search within a given date range, and sorting of results by time of last change. During that year the "volume" of the Russian Internet doubled, which made it necessary to optimize the search engines. Both then and now (with a volume of 200 GB), search on Yandex.ru takes a fraction of a second.

In 1999 the Runet grew by an order of magnitude, both in the volume of texts and in the number of users, and it was a year of rapid development for Yandex.ru as well. A new search robot made it possible to optimize and speed up the crawling of Runet sites; today the Yandex.ru search base is twice as large as its closest competitors'. The new robot also gave users new capabilities: search in different zones of the text (titles, links, annotations, addresses, captions to pictures), restriction of search to a group of sites, search by links and images, and selection of documents in Russian. Search in catalog categories appeared, and for the first time on the Russian Internet the concept of a "citation index", the number of resources that link to a given one, was introduced.
Whatever form of a word you use in the query, the search takes all of its forms into account according to the rules of the Russian language. The search is not limited to words and phrases, however: Yandex will also find a company's web page, or a file with the desired image, by name.
3.4 Search engine - Rambler
Figure: 9. Rambler search engine
In 1991, a group of like-minded people inspired by the newly emerging communication medium, the Internet, came together in the town of Pushchino: Dmitry Kryukov, Sergey Lysakov, Victor Voronkov, Vladimir Samoilov, and Yuri Ershov. The future creators of Rambler initially maintained radio-engineering equipment at the Institute of Biochemistry and Physiology of Microorganisms of the Russian Academy of Sciences, where fast and reliable data exchange was essential to its scientific work. In 1992, the team launched its own FTP and mail servers, and two years later its first WWW server.
1996 was a key year for the development of Russian cyberspace: it was then that Sergey Lysakov and Dmitry Kryukov decided to develop the first Russian search engine for the Internet. At that point, two or three search engines already existed on the Runet, but they did not stand the test of time and quickly disappeared, while Rambler continued to develop and evolve.
Figure: 10. Search engine Rambler Top 100
In the spring of 1997, Rambler Top100 appeared: a unique rating-classifier that not only evaluates the popularity of Russian resources on the basis of objective data but also lets users reach them in a single click. Webmasters began to work more carefully and thoughtfully on their sites, striving for higher positions in the Top 100. Rambler Top100 quickly became the web's universal barometer, the general standard for media measurement.
The search engine holds information on more than 12 million documents located on servers in Russia and the CIS countries. Rambler processes at least 500 thousand search queries every day, scanning 48 thousand web servers with several robot programs working simultaneously.
A query can consist of one or more words separated by spaces; both Russian and English words and phrases can be used. By default, only documents containing all of the entered words are found. To find documents containing at least one word from the query, use the logical operator OR or select "Query words: any" on the detailed query page. To exclude documents containing certain words, indicate on the detailed query page: "Exclude documents containing the following words ...".
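The default all-words behavior and the OR option described above can be sketched as a simple matching function. The function and parameter names below are our own illustrative choices, not Rambler's:

```python
def matches(query_terms, document, mode="all"):
    """Sketch of the query semantics described above: with
    mode="all" (the default) a document matches only if it contains
    every query word; with mode="any" a single word suffices, like
    the OR operator. Names are illustrative, not Rambler's API."""
    doc_words = set(document.lower().split())
    terms = [t.lower() for t in query_terms]
    if mode == "all":
        return all(t in doc_words for t in terms)
    return any(t in doc_words for t in terms)
```

For example, the query ["dog", "cat"] matches "dog and cat" in the default mode, but matches "only dog here" only when mode="any" is selected.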
Rambler can search for a word in all its forms (for example, every grammatical form of "amino acid"). For a word to be matched in all its forms, it must be preceded by the service symbol "#". In the detailed query menu, this mode can be enabled for all words at once: "Query extension: all word forms". The service symbol "@" before a word finds not only the word itself but also words sharing the same root; in the detailed query menu, "@" corresponds to the mode "Query extension: all words with the same root".
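The effect of the "@" (same-root) expansion can be approximated very crudely by treating words that share a common prefix as related. Real morphological analysis, as in Rambler, is far more sophisticated; the function name and the arbitrary root_len cutoff below are our own:

```python
def same_root(word, vocabulary, root_len=5):
    """Crude illustration of a same-root expansion: treat words
    sharing a common prefix as having the same root. This is a
    deliberate simplification of real morphological analysis;
    root_len=5 is an arbitrary, illustrative cutoff."""
    root = word[:root_len].lower()
    return [w for w in vocabulary if w.lower().startswith(root)]
```

So expanding "searching" over a small vocabulary would also pull in "search" and "searched", much as "@" widens a query to related word forms.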
By default, the system looks for query words exactly as you entered them, to reduce "noise" in the found documents. If you are unsure how a word is spelled, or want to broaden your query, you can use the metacharacters "*" and "?" to denote an arbitrary part of a word and an arbitrary single character, respectively.
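These "*" and "?" metacharacters behave like familiar shell wildcards. A sketch using Python's standard fnmatch module, over an invented stand-in vocabulary rather than a real search index:

```python
from fnmatch import fnmatch

def expand_wildcard(pattern, vocabulary):
    """Return the vocabulary words matched by a query pattern, where
    '*' stands for an arbitrary part of a word and '?' for a single
    arbitrary character, as in the syntax described above. The
    vocabulary is an illustrative stand-in for an index."""
    return [w for w in vocabulary if fnmatch(w, pattern)]

vocab = ["color", "colour", "colorful", "colt"]
expand_wildcard("colo?r", vocab)  # matches only "colour"
expand_wildcard("col*", vocab)    # matches all four words
```

The "colo?r" pattern is exactly the spelling-uncertainty case the text mentions: one unknown character in the middle of a word.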
You can limit the search to parts of documents, such as the name of the document, its title, URL, etc., using the "Search in ..." detailed query menu.
You can limit the search to documents only in Russian or only in English. To do this, select the appropriate mode in the "Document language ..." detailed query menu. By default, documents are searched in all languages.
By default, the documents found are sorted by relevance. However, you can request that the freshest documents be placed at the top of the list instead: select the appropriate setting in the "Sort by ..." menu on the detailed query page.
You can also restrict the search to documents created within a certain period by specifying "From date ... to date ..." on the detailed query page. You can require Rambler to return only documents in which the query words occur within a minimum distance of each other; the "Limit distance between words" mode can be enabled in the detailed query. All of the above rules can be combined in any sequence you need. By default, search results are returned in portions of 15 documents; the "Issue by ..." menu on the detailed query page allows you to increase this number to 30 or 50, and the "Output form ..." menu lets you receive document descriptions with more or less detail.
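The "portions" of results described above are ordinary pagination. A minimal sketch, with our own function name and a 1-based page argument:

```python
def paginate(results, page=1, per_page=15):
    """Return one 'portion' of search results, mirroring the default
    of 15 documents per page described above (the menu also allows
    30 or 50). The names and 1-based page numbering are our own
    illustrative choices."""
    start = (page - 1) * per_page
    return results[start:start + per_page]
```

For 40 results, page 1 holds documents 1-15, page 2 holds 16-30, and page 3 holds the remaining 10; switching per_page to 30 reproduces the larger portion sizes the menu offers.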
3.5 Search Engine - Yahoo
Figure: 11. Yahoo search engine
Yahoo! is the most famous search engine. Its sites are grouped into categories and indexed by keywords. Its home page contains useful information, and it can pass queries on to other search engines.
It offers search across Internet resources, news, maps, advertising, sports information, business information, telephone numbers, personal WWW pages, and email addresses.
The main directory contains addresses (URLs) of Internet resources together with a short description for each link. Search: all Yahoo pages offer not only a simple search box but also options for that search, as well as Usenet and email searches. A search can be limited to a certain period of time. Boolean operators (AND, OR) and sequential search are also supported. If Yahoo! cannot connect to AltaVista quickly enough, it provides a page of links to a set of search tools; once one of these links is selected, the keywords are passed to the search engine of your choice.
One feature that makes searching easier is "tip search" (TS), a search using hints. Yahoo! is a hierarchical directory, which means the system holds far fewer pages than crawler-based search engines; however, entering the most general keywords will lead you to the relevant topic on the top-level page (the first page a user sees when visiting a site) of an organization or company.
Links are displayed in the order determined by the specified words and the search sequence, together with their descriptive text and their place in the hierarchy.
3.6 Address Search (URL)
You can search for documents not only across the entire Russian-language Internet but also within a part of it. The simplest case is a search on a specific server. For example: url=www.intel.ru dog.
This query will find all documents on the server www.intel.ru that contain the word "dog". You may wonder what happens if you simply write: url=www.intel.ru.
In that case, you will receive a list of all documents located on the specified server. You can narrow the search even further, to a single directory on the server. For example: url=www.intel.ru/sobaki/ St. Bernard.
For this query, documents containing the word "St. Bernard" will be sought only in the /sobaki directory (and its subdirectories) of the Moscow server of Intel Corporation.
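The url= restriction described above amounts to filtering by URL prefix before matching the query word. A minimal sketch, where the dictionary of URL-to-text documents and all names are our own illustrative stand-ins for a real index:

```python
def url_restricted_search(docs, url_prefix, term=None):
    """Sketch of a url= restriction: keep only documents whose
    address starts with the given prefix and, if a query term is
    given, whose text contains that term (case-insensitively).
    `docs` maps URLs to document text; names are illustrative."""
    hits = []
    for url, text in docs.items():
        if not url.startswith(url_prefix):
            continue
        if term is None or term.lower() in text.lower():
            hits.append(url)
    return hits
```

With the prefix "www.intel.ru" and no term, every document on that server is returned, exactly like the bare url=www.intel.ru query; adding a term or lengthening the prefix to a directory narrows the results the same way the examples above do.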
The main characteristics of Russian search engines