Ways of storing information on the Internet. Organization of storage and retrieval of information on the Internet

Despite the popularity of the Internet across the planet, many users think of the technology as some kind of invisible force. In fact it is very much a material thing: it runs on powerful computers, servers and data centers that exchange information in a split second and are interconnected by kilometers of cables and optical fibers. This article looks at what these storage facilities are, how data centers are arranged, and what the data centers of the largest companies look like.

What is a data center

Like many other innovations, the Internet owes its invention and spread to the military. The first network developments were made for military needs: laboratories and military bases were linked into a network, first local and now ubiquitous, which today is used for far more than resolving military conflicts. It now distributes content on any topic and in any direction. When you upload information to the Web (be it a photo, a video, or a "quote of the day" from Jason Statham), it instantly lands in a data center (DC).

A data center is not just a big flash drive; it is a huge, fortress-like building filled with servers, optical cables and wires. Operating and maintaining a modern storage facility requires about as much electricity as a small town. Data centers solve several tasks at once:

  • Round-the-clock, uninterrupted operation. Electricity is supplied without interruption: Tier 4 data centers are connected to two power plants at once for redundancy. Even if a line fails, powerful generators stand by, ready to take over at any moment.
  • Access protection. There are always third parties who want to get hold of information, so data centers that accept data for storage ensure its confidentiality.
  • Safety and integrity. All kinds of information are stored in data centers, from a photo of your favorite pet to secret data.

Data center: ready for any challenge

Data centers are equipped not only with modern servers but also with reliable fire protection. Gas fire-suppression systems use an extinguishing agent such as carbon dioxide, which puts out a fire without damaging the rest of the equipment. Particular attention is paid to maintaining an appropriate climate.

Servers and hard drives generate heat during use. A cooler the size of a matchbox is enough to cool a PC, but this does not work on an industrial scale. Data centers install full-fledged air conditioning and ventilation systems that protect the server labyrinths from overheating.

The spirit of commerce, or what data centers make money on

Large companies such as Facebook and Google have their own storage facilities, but more modest consumers can rent space in a data center. This can be a single server (dedicated server), a place in a rack where you install your own server (colocation), or space in network storage. When tenants install their own equipment, data center owners earn not only from renting out floor space but also from electricity, which they resell to tenants at a small markup.

Another source of income for data center owners is leasing licensed software. Data centers purchase software, install it on their servers, and then rent it out in parts for a fee. In recent years, renting a virtual server, that is, a share of a physical server's resources (VPS, virtual private server), has been gaining popularity.

Where all information is stored on the Internet: data centers that are unique in their grandeur and power

IBM (USA)

This data center on the campus of Syracuse University was the result of an experiment by the well-known corporation. The goal was to cut electricity consumption in half, and in 2009 they succeeded. A separate gas-fired power station supplies the power.

Citigroup (Germany)

The center, developed by Arup Associates in 2008, is considered one of the greenest complexes in this category, meaning that its operation causes minimal harm to nature. Everything from lighting to cooling is geared towards sustainable use. The care for nature is visible in the structure itself: one of the gables is covered with a lawn that adorns the building and collects the water used in the humidifiers.

eBay (USA)

eBay's data center is built on the sands of the Arizona desert (not an easy task for the engineers working on the cooling system). The equipment in this center is housed in special containers, which not only protect it from overheating but also raise its energy efficiency to 95%.

Digital Beijing (China)

The Beijing data center stands out for its power and bold architectural solutions. Especially for the 2008 Olympics, the architectural company Studio Pei-Zhu built an 11-story building, which became both the data center and the technical support headquarters for the Olympic Games. Now that the sporting events are over, the building houses a museum.

Apple (USA)

Apple cares not only about the continuity and safety of its customers' data, but also about the environmental situation on the planet. One of its main goals was therefore to use energy from renewable sources. The data center relies on 400,000 square meters of solar panels. They cover 60% of the center's energy needs; the rest is supplied by a biofuel power plant.

Google (Finland)

The giant of the web industry certainly has more than one data center. Its complexes are scattered all over the planet, and almost all of them meet the "green" criterion. The Finnish data center occupies a former paper mill designed by Alvar Aalto, one of Finland's best-known architects. The cold waters of the Gulf of Finland are ideal for creating a suitable indoor climate.

Verne Global (Iceland)

The BMW concern uses this data center in Reykjavik for its own needs: calculating the performance of new models, processing test results and more. Because it is powered by hydroelectric and geothermal plants, the data center does not pollute the environment with carbon dioxide.

Facebook (USA)

In Prineville, Mark Zuckerberg's company has built a data center with an area of 28 thousand square meters. Imagine a flash drive the size of three football fields. The servers are linked by 6.5 thousand kilometers of fiber, and a 7-room penthouse with a modern natural air conditioning system was built for cooling.

    MINISTRY OF EDUCATION AND SCIENCE OF THE RUSSIAN FEDERATION

    State educational institution of higher professional education

    "TAGANROG STATE PEDAGOGICAL INSTITUTE named after A.P. Chekhov"

    Faculty of Informatics

    Department of Informatics and Management

    Course work

    Organization of storage and retrieval of information on the Internet

    4th year students

    Sheverda M.A.

    Informatics with the additional specialty Foreign Language

    Supervisor

    Cand. Sc. (Tech.), Assoc. Prof. Tyushnyakova I.A.

    Taganrog

    Introduction

    Basic concepts of information retrieval

    Network infrastructure

    Search engine history

    3.1 History of the creation of the search engine Google

    3.2 How the Google search engine works

    3.3 Yandex search engine

    3.4 Rambler search engine

    3.5 Yahoo search engine

    3.6 Searching URLs

    Searching for information on the Internet

    Saving information on the Internet

    Conclusion

    List of references

    Introduction

    The Internet is a global computer network that hosts various services (e-mail, World Wide Web, FTP, Usenet, Telnet, etc.). Computer networks are designed for data transmission, telephone and radio networks for voice transmission, and television networks for image transmission.

    Depending on the distance between computers, local, territorial and corporate computer networks are distinguished. The convergence of telecommunication networks (computer, radio, telephone and television) makes it possible to transmit data, voice and images with high quality over unified (multiservice) next-generation networks such as the Internet.

    The Internet has long been not only a means of communication but also a field for serious commercial activity. Almost every foreign company has its own presence on the Internet, a virtual office. The total turnover of companies trading on the Internet reaches billions of dollars. In Russia, too, a growing number of companies use the Internet to promote their products and services. This is easy to verify by looking at advertising publications: e-mail and Web site addresses increasingly appear alongside the familiar telephone and fax numbers. Soon, lacking an Internet address will be as much of a handicap as lacking a fax.

    Therefore, more and more people turn to the Internet for the latest information: services and prices, weather, exchange rates, or simply news. Information on a website can be changed several times a day, whereas advertisements in print media must be ordered at least a week in advance. On the Internet everything happens quickly: a new product or service, a new discount or a new supplier, and by tomorrow customers will know about it. There is no need to wait for the next print ad. The information on the site is always up to date. This is what users value, and this is what attracts millions of them to the Internet.

    The most important condition and the leading factor determining the success of educational activities using computer technology is the readiness of students for productive activities in a didactic computer environment.

    Most researchers in the field of pedagogical informatics note the existence of a contradiction between the concepts of the modern humanitarian-personal paradigm of education and the existing teaching system with a narrowly subject orientation, which does not ensure the student's readiness for educational activities using computer methods of obtaining and transforming information. It becomes obvious that the concepts of using information technologies in the educational process are evolving from technocratic paradigms in the direction of strengthening the role of sociocultural factors, taking into account the moral and intellectual potential of the individual.

    Mastering effective methods and means of searching, processing and using educational information makes it possible not only to intensify educational processes, but also to develop the cognitive interests of students, the desire for productive, creative activity.

    The purpose of the course work:

    Explore existing systems and mechanisms for finding information on the network.

    Coursework objectives:

    1. Study the specialized literature relevant to this topic.

    2. Based on the knowledge gained from this literature, find out how the processes of storing and retrieving information in the global network are organized.

    3. Find the similarities and differences between search engines.

    1. Basic concepts of information retrieval

    An information retrieval system (IRS) is an ordered collection of documents (arrays of documents) and information technologies intended for storing and searching for information: texts (documents) or data (facts). Any store of information organized in a certain way can be regarded as an information retrieval system, and such systems need not be automated. What matters is the target function: storage and retrieval of information.

    Depending on the storage object and the type of request, two types of information retrieval are distinguished: documentary and factual - and, accordingly, two types of IRS - documentary and factual.

    Documentary IRS are those that search an array of documents or texts by thematic queries and then provide the user with a subset of these documents or their copies. The concept of a document can vary from system to system. In general, it is an information object fixed (usually by means of some sign system) on a material medium (paper, photographic film, cinema film, etc.) and intended for transmission in space and time within the system of social communications.

    Factographic IRS store, search for and deliver factual data (scientific, technical and economic characteristics and properties of objects, processes and phenomena, as well as addresses, names, quantitative data, etc.).

    The main difference between documentary and factographic search lies in the approach to the semantics of documents. Documentary systems describe the meaning of a document as a whole, in terms of its thematic, subject content; what matters is to identify and name (list) the main topics and objects the document is devoted to. Factographic systems describe objects, recording their attributes and the values of these attributes. Hence the differences in description languages and in the methods of storing descriptions in the system. Accordingly, each type of search has its own search tools.

    Factographic systems accumulate and search arrays of documents with a strictly regulated structure. Such a structure either results from preliminary intellectual processing of documents when information is entered into the system, or already exists in finished form in specific areas of human activity, for example in accounting forms, standard forms, reference books and schedules. Some factographic systems accumulate and search for only one type of object and serve only one type of query. More developed factographic systems store and retrieve data that are diverse in content and structure, but this diversity is always finite.

    At the same time, there is no insurmountable difference between documentary and factual systems. Often, real IRS are an example of mixed systems in which factual information is used as an additional means of documentary search, and vice versa. In documentary systems, texts (documents) can also be structured, divided into fragments or fields, and the processing and issuance of documentary information can be carried out at the level of individual fields.

    There is also a third type of systems, which are called information-logical. These are systems that respond to queries to which there is no explicit answer in the infobase. An extra-linguistic knowledge base and information generated algorithmically from the existing one (documentary or factual) helps to get the answer. This new information is either issued as a response to a request, or additionally used for searches.

    An information retrieval system of a documentary type is an ordered set of documents, as well as a set of tools and methods designed for storing, searching and issuing documentary information upon requests. Documentary IRS issues documents corresponding to the query on the topic, on the subject.

    Most working IRS belong to the class of verbal systems of the thesaurus-free type, in which indexing terms are selected directly from the texts of documents. The avalanche-like growth in the volume of electronic documentary information, together with its diversity of types, topics and languages, is both the cause of the crisis of modern information retrieval and the stimulus for its improvement.

    The problem of searching for resources on the Internet was recognized quite early, and various search systems and software tools appeared in response, among them Gopher, Archie, Veronica, WAIS and WHOIS. More recently, these tools have been replaced by the "clients" and "servers" of the World Wide Web.
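
    As an illustration of thesaurus-free indexing, the sketch below builds an inverted index whose terms are taken directly from the document texts. The tokenizer, the sample documents and the function names are assumptions made for this example; they do not describe any particular IRS.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map every term found in the texts to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample collection.
documents = {
    1: "Search engines index documents on the Internet",
    2: "A directory classifies Internet resources by topic",
    3: "Search engines rank documents by relevance",
}
index = build_index(documents)
print(sorted(search(index, "search documents")))
print(sorted(search(index, "internet")))
```

    No controlled vocabulary is involved here: whatever words occur in the texts become the index terms, which is what makes growth in volume and linguistic diversity hard for such systems.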

    If we try to classify the IRS of the Internet, the following main types can be distinguished:

    Verbal IRS (search engines)

    Classification IRS (directories)

    Electronic directories ("yellow" pages, etc.)

    Specialized information systems for certain types of resources

    Intelligent agents.

    The global accounting of all Internet resources is provided by verbal and, in part, classification systems.

    2. Network infrastructure (structure and principles of building the Internet)

    The Internet is a worldwide information computer network that unites many regional computer networks and computers exchanging information with each other via public telecommunications channels (dedicated analog and digital telephone lines, optical communication channels and radio channels, including satellite links).

    Information on the Internet is stored on servers. Servers have their own addresses and are controlled by specialized programs. They allow you to transfer mail and files, search databases, and perform other tasks.

    The exchange of information between the servers of the network is carried out through high-speed communication channels (dedicated telephone lines, fiber-optic and satellite communication channels). Individual users' access to information resources on the Internet is usually carried out through a provider or corporate network.

    A provider (network service provider) is a person or organization that provides services for connecting to computer networks. A provider typically has a modem pool for connecting clients and access to the worldwide network.

    The basic building blocks of the global network are local area networks. If a local network is directly connected to the global network, then every workstation of that network can be connected to it as well. There are also computers connected directly to the global network; they are called host computers. A host is any computer that is a permanent part of the Internet, i.e. connected via the Internet protocol to another host, which in turn is connected to another, and so on.

    Figure: 1. The structure of the global Internet

    To connect communication lines to computers, special electronic devices are used, which are called network cards, network adapters, modems, etc.

    Almost all Internet services are based on the client-server principle. All information on the Internet is stored on servers. The exchange of information between servers is carried out via high-speed communication channels or highways. Servers connected by high-speed backbones make up the basic part of the Internet.

    Information can be transmitted over the Internet because each computer on the network has a unique address (IP address), and network protocols ensure the interaction of different types of computers running different operating systems.

    The Internet mainly uses the TCP/IP family (stack) of network protocols. At the data link and physical layers, the TCP/IP stack supports Ethernet, FDDI and other technologies. The core of the TCP/IP family is the network layer, represented by the IP protocol and various routing protocols. This layer moves packets across the network and controls their routing. Packet size, transmission parameters and integrity control are handled at the transport layer by TCP.
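
    To make the idea of addressing more concrete, here is a minimal sketch using Python's standard ipaddress module. The addresses are example values chosen for illustration (documentation and private ranges), not addresses mentioned in this work.

```python
import ipaddress

# Example addresses (assumptions for illustration only).
host = ipaddress.ip_address("192.0.2.17")
network = ipaddress.ip_network("192.0.2.0/24")

# An IPv4 address is a 32-bit number usually written as four decimal octets.
print(int(host))
print(host.version)

# Forwarding decisions rest on checking which network
# the destination address belongs to.
print(host in network)
print(ipaddress.ip_address("198.51.100.5") in network)

# Private ranges are reused inside local networks,
# so they are not unique across the whole Internet.
print(ipaddress.ip_address("192.168.1.10").is_private)
```

    The membership test is the basic operation behind routing: a packet is passed along according to the network its destination address falls into.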

    The application layer brings together all the services that the system provides to the user. The main application protocols include: telnet remote access protocol, FTP file transfer protocol, HTTP hypertext transfer protocol, e-mail protocols: SMTP, POP, IMAP, MIME.

    3. History of the appearance of search engines

    Comparative review of search engines

    As the Internet developed worldwide, the problem of finding information on the network came to the fore. Several large firms, such as AltaVista, Lycos and AOL, immediately tried to occupy this niche. Naturally, each developed its own methods of finding information: manual compilation of directories, and automatic discovery and indexing of sites using specially designed "spiders". The spiders' goal was to index the entire Internet, starting with a few large web sites and following the links found on them and in newsgroups. But since it could take a very long time for such a spider to reach a given site, search engines let webmasters submit links to the spider's database manually, so that the spider could index the resource quickly.
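
    The behaviour of a spider can be sketched as a breadth-first traversal of a link graph. The example below works on a small in-memory graph instead of the real Web; the site names and the crawl function are hypothetical and serve only to show why a page that nobody links to is found only after manual submission.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "big-portal.example": ["news.example", "catalog.example"],
    "news.example": ["big-portal.example", "blog.example"],
    "catalog.example": ["shop.example"],
    "blog.example": [],
    "shop.example": ["blog.example"],
    "lonely.example": [],  # no other page links here
}


def crawl(seeds, links):
    """Visit the seed pages, then every page reachable from them by links."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


found = crawl(["big-portal.example"], LINKS)
print(found)
print("lonely.example" in found)

# A manual submission by the webmaster simply adds the page to the seeds.
found_with_submission = crawl(["big-portal.example", "lonely.example"], LINKS)
print("lonely.example" in found_with_submission)
```

    A real spider would download pages and extract links from their HTML; that step is replaced here by a dictionary lookup to keep the sketch self-contained.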

    The emergence of similar systems in the CIS began almost the same way. These include, for example, Russian Express, Rambler, Aport and Yandex - since they also use spiders to find new sites. One of the differences between CIS search engines is that they index only CIS sites, or check the encoding (language) of the text - like Aport. Here is an excerpt from the Yandex FAQ:

    Yandex ranks documents according to the calculated "relevance" parameter. The relevance of a document depends not only on the number of query words found in the document, but also on the frequency characteristics of the search words, the weight of a word or expression, the proximity of the search words in the text of the document to each other, etc.

    Titles like "type_Document_Title_here", "Web Page Title Here", "Insert Page Title Here", "Put_Your_Title_Here" or "Title" do no credit to the page or its webmaster. In addition, many search engines, including Yandex, pay special attention to the words contained in the title. You should not take the 10 most searched words from Rambler's Top100 and write them in headings, comments, or in the body text in white on a white background. Firstly, it does the author no credit and naturally irritates users. Secondly, search engines, Yandex included, are starting to fight this. In addition, spam increases the size of the document and therefore reduces the contrast of the words in it.

    In short, spam should be avoided. A word repeated more than 30 times on a single page will significantly reduce the relevance of the page as a whole. Note also that Russian search engines do not support meta tags, so when creating a web page in Russian, make sure that the title contains relevant phrases and that they also appear near the top of the text.
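
    The relevance factors named in the excerpt (how many query words are found, how frequent they are in the document, and how close they stand to each other) can be combined in a toy scoring function. The formula and its equal weighting are assumptions made purely for illustration; this is not Yandex's actual algorithm.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def relevance(query, document):
    """Toy score: query coverage + relative frequency + proximity bonus."""
    q_terms = set(tokenize(query))
    d_terms = tokenize(document)
    if not q_terms or not d_terms:
        return 0.0

    # Positions of every query term inside the document.
    positions = {t: [i for i, w in enumerate(d_terms) if w == t] for t in q_terms}
    found = [t for t in q_terms if positions[t]]
    if not found:
        return 0.0

    coverage = len(found) / len(q_terms)  # share of query words present
    frequency = sum(len(positions[t]) for t in found) / len(d_terms)

    proximity = 0.0
    if len(found) > 1:
        # Smallest distance between occurrences of two different query words.
        gaps = [
            abs(i - j)
            for a in found
            for b in found
            if a < b
            for i in positions[a]
            for j in positions[b]
        ]
        proximity = 1.0 / min(gaps)

    return coverage + frequency + proximity


query = "cheap computer"
doc_a = "cheap laptop computer for sale"
doc_b = "computer science books and a cheap paper notebook"
doc_c = "weather forecast"
for doc in (doc_a, doc_b, doc_c):
    print(round(relevance(query, doc), 3), "-", doc)
```

    In this sketch the first document scores highest because the two query words are closer together and make up a larger share of the text. Padding a page with repeated words lengthens it and lowers the relative frequency of everything else, which is the "contrast" effect the excerpt warns about.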

    3.1 History of the Google search engine

    In 1995, two PhD students at Stanford University, Larry Page and Sergey Brin, were working on various aspects of data management. In 1996 Page began to actively use the Internet for his research projects in data mining; at that time the Web was for him only a source of randomly selected data for his work. Both students were part of the MIDAS (Mining Data at Stanford) working group. A little later, under the guidance of Rajeev Motwani, an assistant professor in the Department of Computer Science, Page and Moscow-born Brin began developing their own search engine. Various companies were already providing search services on the Internet, but for the future PhDs the project was more of an academic pastime: no one was thinking about quick capitalization or a business plan. The idea behind the search engine has been described in several scientific papers and is quite simple to understand.

    The Web contains a huge amount of information, and most search engines try to determine the relevance of a page by whether the keywords the user entered into the search form are present in its HTML file. Google, on the other hand, indexes the links coming from a page, counting each link to a specific site as a "vote" that adds value to the linked site. It is logical to assume that a site that is popular and contains useful information will be linked to more often than a resource that is useless and uninteresting.

    However, this does not exhaust the definition of site relevance. The resulting popularity rating of resources can itself be used as a source of information about the sites to which these high-quality resources point. Thus, one link to your page from Yahoo! or About.com might be more valuable than hundreds of links from unknown homepages: Yahoo! and About.com are regarded as reputable sources and are therefore assumed to link to high-quality sites.

    In 1998, Google was running on a Stanford University server and could be found at google.stanford.edu. At a time when other startups received funding before writing a business plan or developing a product, Google's founders believed that additional research would do the search engine no harm, and by the time the company was founded, a search engine based on PageRank technology had already been working for more than two years. Back in 1996, the students noted that their development in many cases gave more accurate results than other search engines, and in 1997 Google became the internal search engine of Stanford University. In the same year, Page and Brin incurred the first expenses for Google's further development: they bought hard drives with a total capacity of 1 TB for $15,000. So far, all expenses had to be covered with their own credit cards.

    In September 1998, it became clear that to develop the technology further, and to start licensing it to interested parties, a company had to be created. Page and Brin left Stanford six months before completing their doctoral theses and took with them Craig Silverstein, who was appointed CTO. At some point the enthusiasts met Andy Bechtolsheim, one of the founders of Sun Microsystems, who asked about the venture's plans and immediately wrote the ex-students a check for $100,000. ... dramatic growth of media companies. All the search engines that had previously offered users a way to find information on the Web suddenly decided to provide Internet services: free mail, stock quotes and other attributes of a portal. When Page met with George Bell, CEO of Excite, Bell showed no interest in the unique search technology. "As long as our search engine is in more or less decent condition, we are fine with that," Bell said, hinting that search itself was no longer of interest to portals.

    So Google had to go its own way. Rather than aggressively marketing and promoting their project, Page and Brin preferred to hire about 150 employees, 20 of whom were PhDs. The company did not advertise itself by buying millions of banners, did not concern itself with branding or market development of the project, and did not intend to make money by displaying banner ads on its own website. Despite this passivity from a marketer's point of view, the popularity of the search engine continued to grow, and many users accustomed to consulting several search engines at once chose Google, each for their own subjective reasons. Some liked the discreet interface and ease of use, some the speed and the absence of excessive advertising, and some the quality of the search results.
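
    The idea of links as weighted votes can be sketched with a simplified PageRank-style iteration. This is a textbook simplification under stated assumptions, not Google's production algorithm: the damping factor of 0.85, the number of iterations and the tiny link graph are all chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power iteration over a dict: page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each outgoing link passes on an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "portal" is linked from everywhere and links only to site_a,
# while site_b is linked from three little-known homepages.
LINKS = {
    "portal": ["site_a"],
    "home1": ["site_b", "portal"],
    "home2": ["site_b", "portal"],
    "home3": ["site_b", "portal"],
    "site_a": ["portal"],
    "site_b": ["portal"],
}

ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

    In this graph site_a receives a single link, but from the highly ranked portal, and ends up well above site_b, which has three links from pages that nobody links to. This mirrors the point that one link from a reputable source can outweigh many links from unknown homepages.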

    The well-known US usability expert Jakob Nielsen, who sits on Google's advisory board, said of the search engine: "I consider them my best customers. Their whole company is obsessed with usability." Moreover, convinced that users favor search engines with a simple interface, AltaVista released a new shell for its search engine, Raging Search.

    Figure: 2. Search engine optimization affects only the main search results and does not apply to paid links, such as AdWords contextual advertising

    Website optimization should be designed for users. They are the site's target audience, and they use search engines in order to find it. Excessive enthusiasm for specific tricks aimed at reaching the top of the results may not bring the desired outcome. Search engine optimization is just a way to gain a small edge in search engine visibility.

    The title of the main page of the site may contain the name of the site or organization, as well as other useful information, such as an address and a short description of the subject or services.

    Figure: 3. The user sends a request [greeting cards]

    Figure: 4. The page appears in the search results; its title is shown as the first line (note that the words from the search query are in bold)

    Figure: 5. If the user decides to go to another page, its name will appear in the header of the browser window

    The titles of other pages on the site should also accurately describe their content, and may contain the name of the site or company.

    Figure: 6. The user sends a request [Happy New Year greeting cards]

    Figure: 7. The relevant page of our site appears in the search results (its name describes its content)

    3.3 Search engine - Yandex

    Figure: 8. Yandex search engine

    The Yandex.ru search engine was officially announced on September 23, 1997 at the Soft tool exhibition. The main distinguishing features of Yandex.ru at that time were verification of the uniqueness of documents. Also, the key properties of the Yandex search engine, namely: taking into account the morphology of the Russian language, search taking into account the distance. A carefully developed algorithm for assessing the relevance (response to a query), taking into account not only the number of query words found in the text, but also the "contrast" of the word (its relative frequency for a given document), the distance between words, and the position of the word in the document. A little later, in the "Fairy Tales" section, the first Runet tale appeared - "Web - Humanism or Chernukha?" And in the "Numbers" section - the first estimate of the Runet volume, 5 thousand servers and 4 GB of texts.

    Two months later, in November 1997, natural-language queries were implemented. From then on, users could address Yandex.ru simply "in Russian" and ask long queries, for example "where to buy a computer", "genetically modified products" or "international telephone codes", and receive accurate answers. The average length of a query in Yandex.ru is now 2.7 words. In 1997 it was 1.2 words, because users of search engines were then accustomed to a telegraphic style.

    In 1998, Yandex.ru introduced the ability to "find a similar document", a list of found servers, search within a specified date range, and sorting of search results by the time of the last change. During that year, the "volume" of the Russian Internet doubled, which made it necessary to optimize search engines. Both then and now (with a volume of 200 GB), the search speed on Yandex.ru is a fraction of a second.

    In 1999, the Runet grew by an order of magnitude, both in the volume of texts and in the number of users. It was a year of rapid development for Yandex.ru as well. The new search robot made it possible to optimize and speed up the crawling of Runet sites. Today the Yandex.ru search base is twice as large as that of its closest competitors. The new robot also made it possible to offer users new features: search in different zones of the text (titles, links, annotations, addresses, captions to pictures), restriction of the search to a group of sites, search by links and images, and selection of documents in Russian. Search within catalog categories appeared, and for the first time on the Russian Internet the concept of a "citation index" was introduced: the number of resources that refer to a given one.
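The citation index as defined here (the number of resources that refer to a given one) can be sketched as a count of distinct referrers. The function name, the input format of (source, target) pairs and the sample host names are assumptions for illustration only.

```python
from collections import defaultdict

def citation_index(links):
    """Count the distinct resources that link to each target.

    `links` is an iterable of (source, target) pairs.
    """
    referrers = defaultdict(set)
    for source, target in links:
        if source != target:  # a resource citing itself is not counted
            referrers[target].add(source)
    return {target: len(sources) for target, sources in referrers.items()}

# Hypothetical link graph; repeated links from one source count once.
links = [
    ("a.ru", "c.ru"),
    ("b.ru", "c.ru"),
    ("a.ru", "c.ru"),
    ("c.ru", "c.ru"),
    ("c.ru", "a.ru"),
]
index = citation_index(links)
```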

    Regardless of the form in which you used the word in the query, the search takes into account all its forms according to the rules of the Russian language. However, the search is not limited to just words or phrases. Yandex will find a company's web page or a file with the desired image by name.

    3.4 Search engine - Rambler

    Figure: 9. Rambler search engine

    In 1991, a group of like-minded people, inspired by the newly emerging communication medium, the Internet, formed in the city of Pushchino: Dmitry Kryukov, Sergey Lysakov, Victor Voronkov, Vladimir Samoilov and Yuri Ershov. The future creators of Rambler at first serviced radio engineering devices at the Institute of Biochemistry and Physiology of Microorganisms of the Russian Academy of Sciences. Normal, prompt and efficient data exchange was essential for achieving scientific goals. In 1992, the company launched its own FTP and mail servers, and two years later its first WWW server.

    a key year for the development of Russian cyberspace. It was in this year that Sergey Lysakov and Dmitry Kryukov decided to develop the first Russian search engine for the Internet.

    At that moment, there were already two or three search engines on the Runet - but they could not stand the test of time and quickly disappeared. And Rambler has developed, evolved.

    Figure: 10. Search engine Rambler Top 100

    In the spring of 1997, Rambler Top100 appeared: a unique rating-classifier that not only evaluates the popularity of Russian resources on the basis of objective data, but also lets you reach them in one click. Webmasters began to work more carefully and thoughtfully on their sites, striving for higher positions in the Top100. Rambler Top100 quickly became the web's universal barometer, the general standard for media measurement.

    The search engine contains information about more than 12 million documents located on the servers of Russia and the CIS countries. Rambler processes at least 500 thousand search queries every day, scanning 48 thousand web servers and using several simultaneously working robot programs.

    A query can consist of one or more words separated by spaces. Both Russian and English words and phrases can be used. By default, only documents that contain all the words you entered are found. To find documents containing at least one word from the query, use the logical operator OR or select "Query words: any" on the detailed query page. To exclude documents containing certain words, fill in "Exclude documents containing the following words ..." on the detailed query page.
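The three query modes described above (all words by default, any word, and exclusion of words) can be sketched as a simple filter. The function `matches` and its parameters are hypothetical and only illustrate the logic; they are not Rambler's interface.

```python
def matches(document, words, mode="all", exclude=()):
    """Return True if `document` satisfies the query.

    mode="all": every query word must occur (the default behaviour).
    mode="any": at least one query word must occur (logical OR).
    exclude: words that must not occur in the document.
    """
    tokens = set(document.lower().split())
    if any(word.lower() in tokens for word in exclude):
        return False
    hits = [word.lower() in tokens for word in words]
    return all(hits) if mode == "all" else any(hits)

doc = "cheap laptop sale"
```

For this sample document, the query ["cheap", "phone"] fails in the default mode, succeeds in "any" mode, and fails again once "sale" is excluded.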

    Rambler can search for a word in all its forms (for example, every case form of the Russian word for "amino acid"). To search for all forms of a word, precede it with the service symbol "#". In the detailed query menu, this mode can be enabled for all words: "Query extension: all word forms". The service symbol "@" before a word finds not only the word itself, but also words with the same root. In the detailed query menu, the symbol "@" corresponds to the "Query extension: all single-root" mode.

    By default, our system looks for the words of the query as you entered them to reduce the "noise" in the found documents. If you don't remember how to spell a word, or want to expand your query, you can use the metacharacters "*" and "?" to denote an arbitrary part of a word and an arbitrary character.
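The effect of the metacharacters "*" (an arbitrary part of a word) and "?" (one arbitrary character) can be tried out with Python's standard `fnmatch` module, which uses the same two symbols. This is an illustration of the idea, not of Rambler's implementation; the vocabulary is made up, and `fnmatch` additionally treats square brackets specially, which the text does not mention.

```python
from fnmatch import fnmatchcase

def expand(pattern, vocabulary):
    """Return the words in `vocabulary` that match a pattern with * and ?."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]

# Hypothetical index vocabulary.
vocabulary = ["color", "colour", "colors", "collar", "cooler"]

prefix_matches = expand("colo*", vocabulary)   # any ending after "colo"
single_char = expand("col?r", vocabulary)      # exactly one unknown letter
two_chars = expand("co??er", vocabulary)       # exactly two unknown letters
```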

    You can limit the search to parts of documents, such as the name of the document, its title, URL, etc., using the "Search in ..." detailed query menu.

    You can limit the search to documents only in Russian or only in English. To do this, select the appropriate mode in the "Document language ..." detailed query menu. By default, documents are searched in all languages.

    By default, documents found are sorted by relevance. However, you can request that the freshest be placed at the top of the list instead. To do this, select the appropriate setting in the "Sort by ..." menu on the detailed request page.

    You can also restrict the search to documents created in a certain period of time: for this you need to specify "From date ... to date ..." on the detailed request page. You can require Rambler to return only those documents where the words from the query are at a minimum distance from each other. The "Limit distance between words" mode can be enabled in a detailed query. All of the above rules can be used together with each other in the sequence you need. By default, search results are returned in portions of 15 documents. The "Issue by ..." menu on the detailed request page allows you to increase this number to 30 or 50. The "Output form ..." menu allows you to receive document descriptions with increased or decreased detail.
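The sorting and paging options described above can be sketched as follows. The record layout (dictionaries with "relevance" and "date" keys) and the function name are assumptions; only the page sizes 15, 30 and 50 and the two sort orders come from the text.

```python
def page_of_results(results, page, page_size=15, sort_by="relevance"):
    """Return one page of results, best or freshest first."""
    if page_size not in (15, 30, 50):
        raise ValueError("page size must be 15, 30 or 50")
    if sort_by not in ("relevance", "date"):
        raise ValueError("sort_by must be 'relevance' or 'date'")
    ordered = sorted(results, key=lambda r: r[sort_by], reverse=True)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

# 40 hypothetical documents; "date" is a plain number here for brevity.
results = [{"id": i, "relevance": i % 7, "date": i} for i in range(40)]
first_page = page_of_results(results, page=1, sort_by="date")
last_page = page_of_results(results, page=3, sort_by="date")
```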

    3.5 Search Engine - Yahoo

    Figure: 11. Yahoo search engine

    Yahoo! is the most famous search engine. Its sites are organized into categories and indexed by keywords. It offers useful information on its home page and can connect to other search engines.

    It provides search for Internet resources, news, maps, advertising information, sports information, business, phone numbers, personal WWW pages, and email addresses.

    The main directory contains: addresses (URLs) for Internet resources and a short description for these links. Search: All Yahoo pages offer not only a simple search box, but options for that search, as well as Usenet or Email searches. The search can be limited to specifying a certain period of time. Boolean operators (and, or) and sequential search are also supported. If Yahoo! cannot connect quickly enough with AltaVista, then Yahoo! will provide a link page with a set of search tools. After one of these links is selected, the keywords are passed to a search engine of your choice.

    Search is made easier by “tip search” (TS), that is, search using a “hint”. Yahoo! is a hierarchical directory, which means that the system does not hold as many pages as search engines do; however, setting the most general keywords will allow you to find the necessary topic on the top-level page of an organization or company (the first page a user sees when visiting the site).

    Links are displayed in accordance with the order of the specified words by the search sequence, along with their descriptive text and subordinate hierarchy.

    3.6 Address Search (URL)

    You can search for documents not only throughout the Russian-speaking Internet, but also in a part of it. The simplest case is to search on a specific server. For example: url=www.intel.ru dog.

    This request will find all documents on the server www.intel.ru containing the word "dog". You may be wondering what will happen if you simply write: url=www.intel.ru.

    In this case, you will receive a list of all documents located on the server you specified. You can limit your search even further, to one of the server's directories. For example: url=www.intel.ru/sobaki/ St. Bernard.

    For this request, documents containing the word "St. Bernard" will be searched only in the /sobaki directory (and its subdirectories) of the Moscow server of Intel Corporation.
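The three cases above (server plus word, server only, server directory plus word) can be sketched as a filter over (URL, text) pairs. The function `url_search`, the sample documents and the host www.example.ru are hypothetical; the sketch only shows the restriction logic, not the search engine's real query parser.

```python
from urllib.parse import urlsplit

def url_search(documents, host, path_prefix="/", word=None):
    """Return URLs on `host` under `path_prefix`, optionally containing `word`."""
    found = []
    for url, text in documents:
        parts = urlsplit(url)
        path = parts.path or "/"  # a bare host has an empty path
        if parts.hostname != host or not path.startswith(path_prefix):
            continue
        if word is not None and word.lower() not in text.lower().split():
            continue
        found.append(url)
    return found

documents = [
    ("http://www.intel.ru/index.html", "processors and a dog"),
    ("http://www.intel.ru/sobaki/breeds.html", "the bernard is a large dog"),
    ("http://www.example.ru/sobaki/a.html", "dog"),
]
on_server_with_word = url_search(documents, "www.intel.ru", word="dog")
whole_server = url_search(documents, "www.intel.ru")
in_directory = url_search(documents, "www.intel.ru", "/sobaki/", "bernard")
```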

    The main characteristics of Russian search engines



    Fig. 13. Google search engine

    First, you need to decide exactly what you want to find. For example, a search for the words [felt boots] returns 131,000 thousand pages, while the query [buy wholesale boots in Suzdal] returns only 259 pages. If you are looking for a phrase or quote, put it in quotation marks. You do not have to type the whole query; you can select it from the suggestions that appear.

    Figure: 14.

    To see the answer directly in the search results, phrase the query the way the answer itself would begin. For example, [population of St. Petersburg]:

    Or Catherine the Great was born:


    You can search not only texts, but also pictures:

    You can also search for videos, maps, news and more. In the advanced search menu, you can restrict the search to a specific language, to a certain format (for example, only presentations), or to a specific site.

    5. Saving information on the Internet

    The Internet is like a huge library. It contains many Internet sites that are made up of pages.

    With the help of a computer and programs installed on it, it is possible to connect to the Internet in order to view the information stored in it: texts, pictures, photographs, music, films, and also save them to your disk.

    Internet pages are not stored on your computer. It is just a "window" through which you browse sites.

    If errors occur when entering information, it's not a big deal. It is impossible to mess up or change anything on the internet from your computer. If you close the desired page, you can always open it again in its previous form by clicking on the "Back" button or retyping its address.

    You can navigate from one page to others using links - usually links are underlined and highlighted in color.

    When the mouse pointer changes from an arrow to a hand icon, it means that you have hovered it over a link. Sometimes the link is a picture. It is enough to click once on the link with the left mouse button, and a new page will open.

    Some sites can also send emails and instant messages, post photos, and keep diaries.

    The Internet is the easiest way to connect with friends and colleagues anywhere in the world.

    The Internet contains many sites on a wide variety of topics.

    Conclusion

    With the development of the Internet, it has become possible to search for the necessary documentary information quickly and conveniently. You no longer need to select and study a huge amount of literature in bookstores and libraries.

    Information can be obtained without leaving your home or office. To do this, you only need a computer connected to the Internet with a special program installed: a browser designed to view the content of Web pages.

    Thanks to the variety of search engines designed for the average user, anyone can easily filter out the obviously unnecessary flow of information simply by formulating the purpose of the search correctly.

    Completing the course work, one can come to the conclusion that a very large amount of educational information on various topics is stored on the Internet in the form of articles in electronic newspapers, reports, reference books, graphic images, audio and video files, and much more.

    There are different methods of searching for educational information on the Internet: searching using hypertext links, using search engines, searching using special tools, analyzing new resources.

    The search engines I have reviewed are far from perfect. It is believed that the ideal search engine should meet the following requirements:

    Ease of use

    A well-organized and updated index.

    Fast database search and fast response.

    The reliability and accuracy of search results.

    The scale of information resources and their number are constantly expanding. It becomes clear that the database is not perfect. Intelligent agents are a new direction at the heart of a new generation of search engines that can filter information and get more accurate results. The Internet continues to evolve with unrelenting intensity, essentially erasing the restrictions on the distribution and receipt of information in the world. However, in this information ocean it is not very easy to find the necessary document; it should also be borne in mind that new servers appear on the network along with long-standing servers.

    Information systems in which information is stored and processed using computer technology are called automated. They are used in various fields of activity and are among the most rapidly developing branches of the information technology industry.

    It is information that drives all modern business and is currently considered the most valuable strategic asset of any enterprise. The volume of information is growing exponentially along with the growth of global networks and the development of e-commerce. Success in information warfare requires an effective strategy for storing, protecting, sharing and managing your most important digital asset - data - both today and in the near future.

    Managing storage resources has become one of the most pressing strategic challenges facing IT staff. Due to the development of the Internet and fundamental changes in business processes, information is accumulating at an unprecedented rate. According to Strategic Research, at least 200 petabytes of information are currently stored on open systems alone, and this volume doubles every 18 months. Many companies have entered a kind of competition to transform their internal business systems in order to use the Internet to grow their business. They are globalizing their IT systems to better support e-commerce applications running 24 hours a day, 7 days a week, 365 days a year.
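The growth rate quoted above (200 petabytes, doubling every 18 months) corresponds to the formula V(t) = V0 * 2^(t / 18), with t in months. A small sketch of that arithmetic follows; the function name is an assumption, and the projection simply extrapolates the quoted rate rather than predicting anything.

```python
def projected_volume(initial_pb, months, doubling_months=18):
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return initial_pb * 2 ** (months / doubling_months)

after_18_months = projected_volume(200, 18)   # one doubling period
after_3_years = projected_volume(200, 36)     # two doubling periods
```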

    Networked storage of data allows you to solve many of the current problems in business related to the storage of information, namely:

    • universal and shared access to resources;
    • supporting the unpredictable, explosive growth of the IT system;
    • ensuring continuous availability while maintaining cost efficiency;
    • ensuring scalability and the highest speed of data storage;
    • creating the necessary conditions for new applications, such as backup applications, without the participation of the server and LAN;
    • simplification of resource management associated with their centralization;
    • increasing the level of information protection and fault tolerance.

    Until now, networked storage products have been divided into Network Attached Storage (NAS) and Storage Area Network (SAN) devices. NAS products are rooted in Ethernet and are designed around the file server concept. SAN products continue SCSI storage technology and include several device classes designed to provide I/O functionality; these include system I/O controllers and storage devices and subsystems. The best-known SAN products are those that have replaced the parallel SCSI bus with switches and hubs.

    SAN products entered the market several years after NAS products. When both technologies appeared on the market, experts raised the question about their future. This situation resulted in a number of interesting solutions, including attempts to split them into two different architectures. Although SAN and NAS are different in structure, they are much the same and have the potential for different kinds of integration.

    Data storage technologies

    Networked storage is built on three fundamental components: switching, storage, and files. All storage products can be represented as a combination of the functions of these components. This can be confusing at first: because storage products have been developed in very different directions, features often overlap.

    Quite a few experts have spent many hours at work trying to figure out how to write the best application to attract customers to NAS and how to make storage technology clearer based on their successful application. Of course, there are many ways to do this, but in this article, we assume that storage is an application itself. There are many client / server applications and various kinds of distributed applications running on a network, but storage is a unique and specialized type of application that can function in multiple network environments.

    Since storage processes are tightly integrated with networks, it is appropriate to recall that NAS are system applications. The services provided by networked storage applications can be consumed by sophisticated enterprise programs and custom applications. As with many technologies, some types of systems are better suited to the demands of complex high-level applications.

    Switching

    The term "switching" applies to all software, hardware and services that transport and manage storage traffic on a NAS. This includes elements as diverse as cabling, network I/O controllers, switches, hubs, address selection, data communications control, transport protocols, security, and resource reservation. SCSI and ATA data bus technologies are still widely used in NAS and are likely to be in use for a long time to come. In fact, SCSI and ATA products are much more common in NAS technology today.

    There are two important differences between SANs and regular LANs. SANs automatically synchronize data between individual systems and storage. Networked storage requires highly accurate components to provide a reliable and predictable environment. Despite distance limitations, parallel SCSI is an extremely reliable and predictable technology. If new switching technologies such as Fiber Channel, Ethernet and InfiniBand replace SCSI, they will have to demonstrate similar or better levels of reliability and predictability. There is also a point of view that considers switching as a storage channel. The term “channel” itself, which originates in the environment of large computers, implies high reliability and performance.

    Storage

    Storage mainly concerns address space block operations, including the creation of a virtual environment where the addresses of a logical storage unit are mapped from one address space to another. Generally speaking, the storage function has remained almost unchanged in NAS, apart from two notable differences.

    The first is that device virtualization technologies, such as device management, can now be found within network storage hardware. This kind of function is sometimes referred to as a storage domain controller or LUN virtualization.

    The second major storage difference is scalability. Storage products such as storage subsystems have significantly more controllers / interfaces than previous generations of bus technology, as well as much more storage capacity.
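The mapping of logical block addresses from one address space to another, which the text calls LUN virtualization, can be sketched as a virtual volume assembled from extents on several devices. The class name, the extent format and the device names are assumptions for illustration; real storage controllers are far more elaborate.

```python
class VirtualVolume:
    """Toy virtual volume that concatenates extents from physical devices."""

    def __init__(self, extents):
        # Each extent is (device_name, first_physical_block, block_count).
        self.extents = extents

    def resolve(self, logical_block):
        """Map a logical block number to (device_name, physical_block)."""
        if logical_block < 0:
            raise IndexError("logical block must be non-negative")
        offset = logical_block
        for device, start, count in self.extents:
            if offset < count:
                return device, start + offset
            offset -= count  # skip this extent and keep looking
        raise IndexError("logical block outside the virtual volume")

# An 80-block volume: 50 blocks from disk0, then 30 blocks from disk1.
volume = VirtualVolume([("disk0", 100, 50), ("disk1", 0, 30)])
```

The server sees one contiguous range of 80 blocks, while the mapping decides which device and physical block each one lives on.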

    Files

    The file organizing feature exposes an abstract object to the end user and applications, and organizes the markup of data on real or virtual storage devices. Most of the functionality of files in network storage is provided by file systems and databases; they are complemented by storage management applications such as backups, which are also file applications.

    Networked storage has barely changed file functionality to date, with the exception of the development of NAS file systems, in particular Network Appliance's WAFL file system.

    In addition to the above-mentioned NAS and SAN data storage technologies, focused on large and global networks, in small local networks the dominant position is occupied by DAS technology (Direct Attached Storage - Fig. 1), in accordance with which the storage is located inside the server, which provides both the storage volume and the necessary computing power.

    The simplest example of a DAS is a hard disk drive inside a personal computer or a tape drive connected to a single server. I/O requests (also called commands or data transfer protocols) directly address these devices. However, such systems do not scale well, and companies are forced to purchase additional servers to expand storage capacity. This architecture is very expensive and can only be used to create small data warehouses.

    Storage area network

    The SAN storage system (Fig. 2) is implemented in a dedicated local area network. As with DAS, I/O requests directly address storage devices. Most modern SANs use high-performance Fiber Channel, which provides an arbitrary (any-to-any) connection between processors and storage devices on the network.

    SAN storage systems can solve the following tasks: software switching, creation of remote storage, storage consolidation, creation of heterogeneous storage, and direct backup.

    Software switching. The need to solve this problem arose on the basis of situations when the information system has a sufficiently large set of disk systems and it is required from time to time to connect sets of disks to various servers. In the case of conventional SCSI disks, this requires a physical re-connection, often a system shutdown. However, using the Fiber Channel protocol, FC hubs, and FC switches, you can use the software method. It is important to note that this leaves each disk connected to only one server. These solutions are being successfully applied today, and their further development will lead to support for more hosts and to increase the flexibility of switching.

    Remote storage. Improvements in technology have made it possible to place disk arrays up to 10 km from the server, thereby protecting data from disasters.

    Consolidation of repositories. Above all, storage consolidation provides significant operational savings and greater system reliability.

    Heterogeneous storages. Storage consolidation leads to heterogeneous connections to the disk array, since there are always different software and hardware platforms in the information system.

    Direct backup. The idea behind direct backup is to copy data from disk to tape directly, bypassing the local network. Thus, the processing power of the servers will be loaded to a minimum.
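The software switching task described above, in which disks are reassigned between servers without physical re-cabling while each disk stays connected to only one server, can be sketched as a simple ownership table. The class and the disk and server names are hypothetical and do not correspond to any real Fiber Channel management interface.

```python
class DiskSwitch:
    """Toy model of software switching: a disk belongs to at most one server."""

    def __init__(self):
        self._owner = {}  # disk name -> server name

    def connect(self, disk, server):
        """Assign `disk` to `server` and return the previous owner, if any."""
        previous = self._owner.get(disk)
        self._owner[disk] = server  # the previous owner loses the disk
        return previous

    def disks_of(self, server):
        """List the disks currently connected to `server`."""
        return sorted(d for d, s in self._owner.items() if s == server)

switch = DiskSwitch()
switch.connect("disk-1", "server-a")
switch.connect("disk-2", "server-a")
moved_from = switch.connect("disk-2", "server-b")  # reassigned in software
```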

    Network Attached Storage

    A NAS storage device (Figure 3) is an appliance that typically contains a server processor and disk storage and connects to a TCP/IP network (LAN or WAN). NAS devices are accessed using special file access and file sharing protocols. File requests received by the NAS are translated by the internal processor into I/O requests to the storage device. The most common file access protocols are CIFS (Common Internet File System), used on Windows platforms, and NFS (Network File System), used on UNIX platforms. These protocols run on top of the IP protocol used on Ethernet networks and the Internet. Their purpose is to exchange files between computers, so that Windows, Macintosh and UNIX clients have full access to the disk array.
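The translation of a file-level request into block-level I/O can be illustrated with a small calculation: given a byte offset and length within a file, work out which disk blocks must be read. The sketch assumes, for simplicity, that the file is stored contiguously starting at a known block and that blocks are 4096 bytes; both are assumptions, and real file systems use more complex layouts.

```python
def blocks_for_request(first_block, offset, length, block_size=4096):
    """Return the block numbers that hold bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = first_block + offset // block_size
    last = first_block + (offset + length - 1) // block_size
    return list(range(first, last + 1))

# A hypothetical file whose data starts at block 1000.
one_block = blocks_for_request(1000, 0, 4096)
straddling = blocks_for_request(1000, 4095, 2)   # crosses a block boundary
third_block = blocks_for_request(1000, 8192, 1)
```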

    One of the attractive key features of NAS is to ease the administration of the overall network solution by installing a thin operating system on the NAS.

    Switching technologies

    Fiber Channel

    The main advantage of Fiber Channel technology is that it is a high-speed, low-latency network with modern flow control that handles bursty traffic such as storage I/O. This is precisely where Ethernet is weak. The Fiber Channel industry, however, is far smaller than the Ethernet industry and therefore offers a limited choice of technologies and relatively limited implementation and management experience.

    Fiber Channel technology was the first legitimate development for general-purpose switching. However, as has been proven more than once, technology alone does not solve anything. The Fiber Channel industry was not interested in the potential that was presented. This technology started out as the de facto standard for SANs, but it is unlikely that Fiber Channel will be used in NAS or will enter the client/server market.

    Ethernet

    Ethernet is the most widely used networking technology in the world; there are many specialists and many methods for implementing and managing Ethernet networks. Although 10/100 Mbps Ethernet flavors are sufficient for NAS, they are not suitable for SAN support due to bandwidth limitations and lack of flow control. Therefore, the basis for building a SAN will probably be Gigabit Ethernet.

    Ethernet will no doubt be used as a general switching function for both files and storage applications, but before it can be widely used as a corporate industrial network, its relevance for storage must be proven.

    InfiniBand

    InfiniBand is a serial data bus that replaces the system PCI I / O bus. InfiniBand was spearheaded by Intel in collaboration with Compaq, Hewlett-Packard, IBM, Sun, and others. As a core system component expected to be used on both PC and UNIX platforms, InfiniBand is likely to be deployed on a significant scale. ...

    With regard to network attached storages, the following questions arise. Will file and storage applications run directly on the InfiniBand bus, or will they require any InfiniBand network adapters? And when will this happen - immediately, soon, in a few years, or never at all? Apparently, this technology must establish itself as a common system I / O bus before it can effectively conquer new markets such as the network storage market. However, InfiniBand has clear potential to become the main switching function in the future.

    Corporate storage is a great but problematic idea. How can a self-managing storage subsystem be smart enough to provide governing and monitoring services for the data it stores? The storage subsystems' support for storage-tier functions allows them to act as "supervirtual" devices, but this does not give them the ability to manipulate data objects (such as files), as IT managers would like.

    This solution is much more complex than simply placing microprocessors in storage subsystems. Self-managing storage subsystems need to be able to determine which blocks correspond to specific data objects (that is, files, database tables, and metadata) if they are going to manage them. It seems that the "missing link" is some built-in file functionality that should allow data objects to be associated with their storage location. This is entirely the responsibility of the data structure layer of the I/O stack. This layer can be thought of as the "bottom layer" of the file system, which controls the placement of data objects in real or virtual storage.

    An architectural problem with NAS and SAN is that storage subsystems with built-in file technology are generally considered NAS products. What, then, should you call a storage subsystem with half a file system? This is why analyzing networked storage in terms of SAN or NAS gives nothing. NAS and SAN are independent entities; switching, storage and files are also independent.

    NAS technology, firstly, provides a service that allows applications and users to find data in the form of objects on the network, and secondly, it supplies data to the system for storage in storage devices or subsystems. And SAN technology provides network storage functions; in general, it applies to logical blocks of addresses, but it could potentially use other methods of addressing and identifying stored data.

    SAN switching must be extremely fast and reliable. Until now, Fiber Channel has played this role, but in the future Gigabit Ethernet and InfiniBand should enter the market. The development of a common communications infrastructure for both file-based (NAS) and storage (SAN) applications seems inevitable and will eventually become a key technology.

    In the near future, networked storage technologies such as SAN and NAS will be ubiquitous - simply because the amount of information on Earth is doubling every year.

    ComputerPress 2'2002

    Here is a list of the services I know for storing information on the Internet:

    A folder can be synchronized between the server and several computers, which is convenient if you use more than one machine. To synchronize, install the Yandex.Disk program on your computer; it creates a folder that will be kept in sync. Installing the program also increases the initial volume of the Disk; afterwards the program can be removed, and the Disk becomes your external file storage. By the way, if you create a Disk using my links, you will get an additional 1 GB and I will get 0.5 GB (provided that the program is installed on your computer): this is how Yandex promotes its storage. I already use this service; it is very convenient, and I recommend it!

    3. QIP files - http://file.qip.ru. You get 2 GB of free space for files of absolutely any type for a period of 30 days. To extend the term, the file must be downloaded at least once. To increase the storage time and volume, there are paid premium accounts that give you 100 GB of space for $25. To access the service, you need a QIP account, and to download directly from your QIP client, you need to install QIP-Infium. It is also a convenient option for remote access to information from anywhere in the world. The service has stopped working.

    4. Files ex.ua - Ukrainian FREE service for storing an UNLIMITED VOLUME of information. Files can be of ANY size and ANY format. To get access, simply go to the service page and click the Create button, upload your data, and write down the key and the link for accessing your data. The system remembers your computer, and later you can edit or delete your information only from it. With the link, you can access the files from anywhere. The shelf life is limited to 30 days. To renew it, just download your files once. A very simple and convenient service!

    9. is also a way of storing information, and not only free, but also profitable. Since all file hosting accounts are yours, you always have access to your information, in any volume and from anywhere. Limitations on the volume, terms and conditions in file-sharing services are different, so you can familiarize yourself with them in more detail in each of them.
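The retention rule mentioned for several of the services above (files are kept for 30 days, and each download extends the term) can be sketched like this. The class and method names are hypothetical, and the exact behaviour of the real services may differ; the sketch assumes each download restarts the full 30-day period.

```python
from datetime import datetime, timedelta

class StoredFile:
    """Toy model of a file with a shelf life that is renewed on download."""

    def __init__(self, uploaded_at, retention_days=30):
        self.retention = timedelta(days=retention_days)
        self.expires_at = uploaded_at + self.retention

    def is_expired(self, now):
        return now >= self.expires_at

    def download(self, now):
        """Serve the file and restart the retention period."""
        if self.is_expired(now):
            raise LookupError("file has already been deleted")
        self.expires_at = now + self.retention

stored = StoredFile(datetime(2021, 1, 1))
stored.download(datetime(2021, 1, 20))  # term now runs until 19 February
```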

    This is how, using the vastness of the Web, you can store files on the Internet, keeping your hard drive free and light.

    P.S. Read more and you will have 100 25 GB of absolutely free space for storing files on the Internet from the mail.ru service

    If this article helped you, then help the author too - get a blog and tell your friends about the blog with social buttons, suddenly, you will help them too!

    Best regards, Alexey Goncharov.

    © 2021 hecc.ru - News of computer technologies