BL 20B - Biological Data Analysis

The Internet and the World Wide Web - Ver. 99.1

What is the Internet?

A network is a collection of computers that can communicate with each other. An internet is a collection of inter-communicating networks. The Internet (with a capital I) is the collection of computers worldwide that can all communicate with each other.

The Internet is not an information store - but some computers on the Internet are.

The Internet is like the telephone system, which is just a way of connecting telephones to each other. The Internet is also like an established bureaucracy - there are rigid systems, called protocols, that define how individuals work with each other.

What can the Internet do?

The Internet allows you to retrieve information from different computers around the world and to supply information to others. The information can be in any form that computers know about - text, pictures, sound.

How does the communication work?

A single computer can have numerous communication lines active at one time. On the Internet, each established communication line operates between a pair of computers. Each of the two computers must have a unique number (its IP number). There is also a basic protocol that enables two computers to connect to each other. This is TCP/IP. TCP is Transmission Control Protocol and IP is Internet Protocol. Both computers must have TCP/IP software.

Internet numbers and names

TCP/IP requires that every computer has a unique IP number and knows what it is. An IP number consists of four numbers, each between 0 and 255, separated by dots. E.g.

205.214.198.221
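
The rule for the format of an IP number can be checked with a few lines of Python. This is only a minimal sketch of the "four numbers between 0 and 255 separated by dots" rule described above; the function name is_valid_ip_number is made up for this illustration and is not part of any TCP/IP software.

```python
def is_valid_ip_number(text):
    """Return True if text is four dot-separated whole numbers, each 0-255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must consist of plain decimal digits only
        # (no signs, spaces or letters, and not empty).
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


print(is_valid_ip_number("205.214.198.221"))  # True
print(is_valid_ip_number("205.214.198.256"))  # False: 256 is out of range
print(is_valid_ip_number("205.214.198"))      # False: only three numbers
```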

Computers need to use IP numbers for communicating but people prefer names. Special computers called name servers translate numbers to names and names to numbers. The IP address above is that of www.uwichill.edu.bb (the Cave Hill campus of the University of the West Indies). 

205.214.198.222 = scitec.uwichill.edu.bb; 207.25.71.5 = CNN.com and 209.185.151.128 = Hotbot.com.
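
The two-way translation done by a name server can be pictured as a pair of lookup tables. The sketch below is a toy illustration only: it uses the four name/number pairs quoted above (1999 values, which may no longer be current), and the function names are made up for this example. A real name server is a separate computer that is asked over the network; it does not work from a small fixed table like this one.

```python
# Name/number pairs quoted in the text above (1999 values; assumed here
# purely for illustration).
NAME_TO_NUMBER = {
    "www.uwichill.edu.bb": "205.214.198.221",
    "scitec.uwichill.edu.bb": "205.214.198.222",
    "cnn.com": "207.25.71.5",
    "hotbot.com": "209.185.151.128",
}

# Build the reverse table so that numbers can be translated back to names.
NUMBER_TO_NAME = {number: name for name, number in NAME_TO_NUMBER.items()}


def name_to_number(name):
    """Translate a name to its IP number, or None if the name is unknown."""
    # Internet names are not case sensitive, so compare in lower case.
    return NAME_TO_NUMBER.get(name.lower())


def number_to_name(number):
    """Translate an IP number to its name, or None if the number is unknown."""
    return NUMBER_TO_NAME.get(number)


print(name_to_number("CNN.com"))           # 207.25.71.5
print(number_to_name("205.214.198.222"))   # scitec.uwichill.edu.bb
```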

The name of the computer need only be unique in the local network. The local network, called the Internet Domain, must have a unique name. The full name of the computer combines the computer name and the network (Domain) name. E.g.

scitec.uwichill.edu.bb

Host computer = scitec; Domain = uwichill.edu.bb; edu = educational institution (ac means the same); bb = Barbados (beautiful B’dos?).
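
The way a full name breaks down into host and domain can be sketched in Python as follows. It assumes, as in the example above, that the first part of the name is the host computer and everything after the first dot is the domain; the function name split_full_name is made up for this illustration.

```python
def split_full_name(full_name):
    """Split a full computer name into (host, domain) at the first dot."""
    host, _, domain = full_name.partition(".")
    return host, domain


host, domain = split_full_name("scitec.uwichill.edu.bb")
print(host)     # scitec
print(domain)   # uwichill.edu.bb

# The parts of the domain, read from right to left, go from general to
# specific: country, type of institution, institution.
parts = domain.split(".")
print(parts[-1])   # bb  (Barbados)
print(parts[-2])   # edu (educational institution)
```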

N.B. The host computer can be real or virtual. That is, one physical computer can be partitioned, with each partition having its own IP address.

Internet services

These are specific services which one computer may provide to other computers. Some services are as old as the Internet but others have been invented with passing time.

In addition to TCP/IP, each Internet service has a corresponding service protocol. These include: File transfer (FTP), Hypertext (HTTP), Electronic mail delivery (SMTP), and Terminal emulation via network (TELNET).

What is WWW?

The World Wide Web (WWW) was invented in 1989 and implemented by 1992 at the European Laboratory for Particle Physics, known as CERN.

Internet users give you access to files, the files contain references to other files, and the sum total of all these interlinked files around the world is called the World Wide Web. Nobody knows its size but it is vast and increasing at a phenomenal rate.

Hypertext

The text in a book is usually one dimensional (linear) - it is read sequentially from start to finish. Hypertext is multidimensional because there can be links from any part of a document to any part of any other document. Links are used to navigate through information. Links can take you across the world.

HTML

HyperText Mark-up Language is a standard format for documents on the WWW. Software for viewing HTML documents, called WWW browsers, exists for most types of computers. Richly formatted text can be displayed in HTML. HTML files can contain links to other files on Internet computers.
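
To see what "links to other files" look like inside an HTML file, the sketch below uses Python's standard html.parser module to list the links in a small page. The sample page and the file name notes.htm are invented for this illustration, and the class name LinkCollector is made up; only HTMLParser itself comes from the Python library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> link in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Links are written as anchor tags: <a href="where-to-go">text</a>
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A made-up page for illustration only.
SAMPLE_PAGE = """
<html><body>
<h1>Course pages</h1>
<p>Visit the <a href="http://scitec.uwichill.edu.bb/">faculty site</a>
or read the <a href="notes.htm">course notes</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_PAGE)
print(collector.links)
# ['http://scitec.uwichill.edu.bb/', 'notes.htm']
```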

HTTP

This Internet protocol is by far the most common method for fetching HTML files from remote computers. HyperText Transfer Protocol can also be used to fetch other file types.
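
An HTTP request is itself just a short piece of text sent to the remote computer. The sketch below builds such a request without sending it. It is a simplified illustration: the function name build_get_request and the path /index.html are assumptions made for this example, and real browsers send several additional header lines.

```python
def build_get_request(host, path):
    """Build the text of a minimal HTTP GET request (it is not sent)."""
    lines = [
        f"GET {path} HTTP/1.0",   # what we want and which protocol version
        f"Host: {host}",          # which named computer we are asking
        "",                       # an empty line marks the end of the request
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines)


# Hypothetical path, used only for illustration.
request = build_get_request("scitec.uwichill.edu.bb", "/index.html")
print(request)
```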

URL

The Uniform Resource Locator of a file is a string of letters and symbols that uniquely identifies a file on a computer on the Internet. The URL must provide several pieces of information. E.g. http://scitec.uwichill.edu.bb/bcs/staff/lec/Bdosfern.htm.

The page (file) Bdosfern.htm (written in HTML) can be fetched using the HTTP protocol from the host computer scitec in the domain uwichill.edu.bb, where it is in the sub-directory (folder) lec, within the sub-directory (folder) staff, within the directory (folder) bcs.
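
The same breakdown can be done with Python's standard urllib.parse module. This is a minimal sketch; the function name describe_url is made up for this illustration, and it assumes (as in the example above) that the first part of the computer name is the host and that the last part of the path is the page.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into protocol, host, domain, folders and page."""
    parts = urlsplit(url)
    # The computer name is host + domain, split at the first dot.
    host, _, domain = parts.hostname.partition(".")
    # The path lists the folders, outermost first, and ends with the page.
    *folders, page = parts.path.strip("/").split("/")
    return {
        "protocol": parts.scheme,
        "host": host,
        "domain": domain,
        "folders": folders,
        "page": page,
    }


info = describe_url("http://scitec.uwichill.edu.bb/bcs/staff/lec/Bdosfern.htm")
for key, value in info.items():
    print(key, "=", value)
```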

The web provides an excellent way to browse information because items of information can themselves direct you to other items of information (hypermedia).

WWW Browsers

A WWW browser is a program that can use Internet protocols to fetch HTML pages, can display them, and, when you select highlighted text or graphics using the mouse or keyboard, it will fetch and load the referenced document. Microsoft Internet Explorer, Netscape Communicator, Netscape Navigator and NCSA Mosaic are web browsers.

How do I find what I want?

Unlike books that you find in a library, there is no standard classification system for the information on the Internet and no librarians doing the cataloguing. But, there are things like card catalogues. Well not quite. And, unlike books, about 58% of the WWW is not indexed (catalogued).

Search Engines, Search Directories and Metacrawlers

Search engines use computer programs called bots or spiders to scour the net for sites, and store data from them in giant searchable databases.
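
The "giant searchable database" can be pictured as a table that lists, for every word, the pages on which it appears. The sketch below builds such a table for three invented pages. It is a toy illustration only: the page addresses and text are made up, the function names are assumptions, and real search engines are far more elaborate.

```python
import re


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = {}
    for address, text in pages.items():
        # Split the text into lower-case words, ignoring punctuation.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, set()).add(address)
    return index


def search(index, word):
    """Return the addresses of all pages containing the word, sorted."""
    return sorted(index.get(word.lower(), set()))


# Invented pages standing in for what a spider might have collected.
PAGES = {
    "ferns.htm": "Ferns of Barbados",
    "news.htm": "Barbados weather and news",
    "mosses.htm": "Mosses and ferns of the Caribbean",
}

index = build_index(PAGES)
print(search(index, "Barbados"))   # ['ferns.htm', 'news.htm']
print(search(index, "ferns"))      # ['ferns.htm', 'mosses.htm']
print(search(index, "orchids"))    # []
```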

Search directories are compiled by real people. As with a search engine, the end product is a database which can be searched, but it can also be explored by browsing categories. However, a directory will always be less up to date than a database generated by a search engine. Despite this, directories are good places to start a search if you don't know much about the subject.

Metacrawlers compile the results of multiple search engines and directories, delivering a "comprehensive" list of web links.
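
In outline, a metacrawler's job is to combine several result lists into one without repeating any link. The sketch below shows one simple way this could be done; the result lists and addresses are invented for illustration, the function name merge_results is made up, and real metacrawlers also rank and filter what they collect.

```python
def merge_results(*result_lists):
    """Combine several lists of links into one, dropping repeats."""
    merged = []
    seen = set()
    for results in result_lists:
        for link in results:
            # Keep only the first occurrence of each link.
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged


# Invented results, as if returned by two engines and one directory.
engine_a = ["http://example.org/ferns.htm", "http://example.org/mosses.htm"]
engine_b = ["http://example.org/mosses.htm", "http://example.org/algae.htm"]
directory = ["http://example.org/ferns.htm"]

for link in merge_results(engine_a, engine_b, directory):
    print(link)
```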

Suggested examples of each of these are listed below.

Search Engines          Search Directories     Metacrawlers

HotBot                  Yahoo!                 Metacrawler
Northern Light Search   UK Yahoo!              Dogpile
AltaVista               LookSmart
Google                  Open Directory
                        Subjectguide.com*

    *not searchable

Search Tips

Listen carefully ....

Uppercase? The White House and the white house.

Boolean Operators - AND, OR, NOT and NEAR (also + and - )

Complete Phrases - "Your name"

Missing sites? Saw it, didn’t save it, can’t find it. Saw it, added to favorites/bookmarked, not there anymore - 404 error.
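
What the Boolean operators and complete phrases in the tips above ask for can be sketched in Python. This is a toy illustration of the ideas only: the function names are made up, the matching ignores upper and lower case, and each search engine has its own rules for how these operators are actually written and applied.

```python
import re


def words_of(text):
    """Return the words of a text in order, in lower case."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_all(text, *terms):
    """AND: every term must appear somewhere in the text."""
    found = set(words_of(text))
    return all(term.lower() in found for term in terms)


def has_any(text, *terms):
    """OR: at least one term must appear in the text."""
    found = set(words_of(text))
    return any(term.lower() in found for term in terms)


def has_but_not(text, wanted, unwanted):
    """NOT: the wanted term appears and the unwanted term does not."""
    found = set(words_of(text))
    return wanted.lower() in found and unwanted.lower() not in found


def has_phrase(text, phrase):
    """Complete phrase: the words must appear together, in this order."""
    words = words_of(text)
    target = words_of(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


page_1 = "The White House is in Washington"
page_2 = "A house painted white"

print(has_all(page_2, "white", "house"))     # True: both words are present
print(has_phrase(page_2, "white house"))     # False: not together, in order
print(has_phrase(page_1, "white house"))     # True
print(has_but_not(page_1, "house", "painted"))  # True
```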

For tips on searching this site and more on some of the above, see this link.

 

Portals

Internet portals or gateways are where you start to crawl around the Web. Initially, these were sites of search engines and directories. Today, a typical portal offers access to news, weather, sports results, shopping, free e-mail etc. in addition to the ability to start searching.

 

For more information

Internet 101 - a little dated but still contains some good stuff.

The W3C - World Wide Web Consortium - was founded in October 1994 to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium, jointly hosted by the Massachusetts Institute of Technology Laboratory for Computer Science [MIT/LCS] in the United States; the Institut National de Recherche en Informatique et en Automatique [INRIA] in Europe; and the Keio University Shonan Fujisawa Campus in Japan. Services provided by the Consortium include: a repository of information about the World Wide Web for developers and users; reference code implementations to embody and promote standards; and various prototype and sample applications to demonstrate use of new technology. Initially, the W3C was established in collaboration with CERN, where the Web originated, with support from DARPA and the European Commission. For details on the joint initiative and the contributions of CERN, INRIA, and MIT, see the statement on the joint World Wide Web Initiative.
The Consortium is led by Tim Berners-Lee, Director and creator of the World Wide Web, and Jean-François Abramatic, Chairman. W3C is funded by Member organizations, and is vendor neutral, working with the global community to produce specifications and reference software that is made freely available throughout the world.

A tutorial on TCP/IP can be found at - http://olinc.vuse.vanderbilt.edu/federation/welcome.html-ssi

For more on Domain names see this at webmonkey.

If you really need to know more about Simple Mail Transfer Protocol try - http://www.ietf.org/internet-drafts/draft-ietf-drums-smtpupd-10.txt

For more search engines etc. go to W3 Search Engines or this annotated list - http://www.ub2.lu.se/nav_menu.html.

How search engines work - http://searchenginewatch.com/webmasters/work.html.

The online version of Yahoo! Internet Life magazine (including back issues) - http://www.zdnet.com/yil/.

"Searching the Internet" - a 1997 article from Scientific American - http://www.sciam.com/0397issue/0397lynch.html.

Go to http://searchenginewatch.com/facts/index.html - for a collection of Web searching tips.

 

L.E. Chinnery, October 27, 1999.