The purpose of this Computer Network project is to implement a simplified web system. The system consists of three programs: the DNS server program, the Web server program, and the client program. The client program consists of two parts: a browser simulator and a client program for system testing.
Domain Name Server
In the Internet, packets are routed based on 32-bit destination IP addresses. However, these numerical addresses are inconvenient for users and applications. Instead, hostnames that consist of strings separated by periods are used. Using hostnames instead of IP addresses has another advantage, namely, transparency. For example, a Web site's IP address may change if a different company hosts the content, but the hostname can remain the same. The translation of hostnames into IP addresses and vice versa is coordinated by the domain name system (DNS).
DNS is a hierarchical name space that can be represented by a tree (see the following Figure). The root of the tree is an unnamed node. The first layer of the tree contains the top-level domains. The second-level domain names are given to individual companies, institutions, and/or organizations. Various levels of subdomains further divide a domain. DNS is a distributed database that consists of a hierarchical set of DNS servers. More specifically, there is a DNS server associated with each node at the root, top-level, and second-level domains. The DNS servers for the second-level domains serve both iterative and recursive DNS requests.
For example, a client in a local domain may query its local DNS server (say S) for the hostname “www.yahoo.com” using a recursive request. S searches its cache (not the mapping table) for a match. If no match exists, S sends a query to the root DNS server to get the IP address of the DNS server of the “com” domain. Then S queries the “com” domain DNS server to get the IP address of the DNS server of the “yahoo.com” domain. Finally, S queries the “yahoo.com” domain DNS server to get the IP address of “www.yahoo.com”. After getting the mapping, S adds the entry to its cache for future accesses. If S already has a cached entry for the “com” domain DNS server, there is no need to go to the root DNS server.
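The lookup sequence performed by S can be sketched as follows. This is only a toy in-memory model: the server names (`root`, `dns-com`, `dns-yahoo`), the addresses, and the `LocalDNS` class are made-up illustrations, and real queries would of course go over the network.

```python
# Toy model of recursive resolution at a local DNS server S.
# All server names, addresses, and table contents are hypothetical examples.

# Mapping tables of the upstream servers, keyed by server name.
TABLES = {
    "root": {"com": "dns-com"},
    "dns-com": {"yahoo.com": "dns-yahoo"},
    "dns-yahoo": {"www.yahoo.com": "10.0.0.7", "mail.yahoo.com": "10.0.0.8"},
}


class LocalDNS:
    def __init__(self, tables):
        self.tables = tables
        self.cache = {}      # name -> answer learned from earlier lookups
        self.queries = []    # (server, name) pairs sent upstream, for inspection

    def _ask(self, server, name):
        """Send one iterative query to `server` and cache the answer."""
        self.queries.append((server, name))
        answer = self.tables[server][name]
        self.cache[name] = answer
        return answer

    def resolve(self, hostname):
        """Resolve `hostname`, walking root -> top level -> second level."""
        if hostname in self.cache:
            return self.cache[hostname]
        labels = hostname.split(".")
        server = "root"
        # Try suffixes from the shortest ("com") up to the full hostname.
        for i in range(len(labels) - 1, -1, -1):
            name = ".".join(labels[i:])
            if name in self.cache:
                server = self.cache[name]   # cached, so no query is needed
            else:
                server = self._ask(server, name)
        return server
```

Resolving “www.yahoo.com” on a fresh `LocalDNS(TABLES)` sends three upstream queries; a later lookup of “mail.yahoo.com” sends only one, because the entries for “com” and “yahoo.com” are already cached.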
Web System
In the Web system, a client (browser) sends a Web access request to a Web server to access a web page. The Web server, in turn, receives the request, parses it to identify the file to be accessed, and transfers the file to the client (or sends back an error message if the request is not correct or the requested file does not exist). A Web access request is specified by a URL, and the HTTP protocol is used for the handshake between the client and the server. A URL consists of two parts, a host name and a file name. The host name is the name of the Web server and the file name is the Web page to be accessed. When a browser gets a URL, it separates the URL into host name and file name. It sends the host name to a DNS server to obtain the corresponding IP address of the Web server. Then, it sends the file name to the Web server at the IP address returned by the DNS server. Subsequently, it receives the response from the Web server and displays the returned content or error message.
In an older version of a web server program, a TCP socket is opened to accept client connection requests. When a connection request is accepted, the server spawns a thread to receive the subsequent HTTP requests, process them, and send back the responses. This approach incurs high overhead for thread creation and disposal. Thus, newer Web server programs use thread pools to handle client requests. A thread pool consists of multiple threads that are created up front. When a user requests a connection, an idle thread is chosen to establish the connection with the user and process subsequent requests. The number of threads in the thread pool can expand and shrink, depending on the load of the system.
For a Web site with a high hit rate, one Web server may not be sufficient to handle all the client accesses. Multiple Web servers are frequently used to share the load. However, a mechanism is needed to allow the client to transparently connect to a Web server with low load. In other words, the same URL should be used for the access regardless of whether the Web server is replicated. There are several methods that are commonly used for Web server load sharing. Here we introduce DNS-based load sharing, in which the DNS server is used to direct the client to different Web servers. In this approach, the DNS server can map one hostname to multiple IP addresses, where each IP address corresponds to one of the replicated Web servers. The DNS server can use a round-robin policy to select the IP address to return to the client. Some DNS servers can probe the Web servers to obtain load information and, based on that information, select an appropriate IP address to return to the client.
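A minimal sketch of the two selection policies mentioned above, assuming the replicated servers are kept in a simple in-memory table. The class name `RoundRobinDNS`, the helper `least_loaded`, and the sample addresses are hypothetical and not part of the project specification.

```python
import itertools


class RoundRobinDNS:
    """Maps one hostname to several addresses and hands them out in turn."""

    def __init__(self, table):
        # One independent cycling iterator per hostname.
        self._cycles = {host: itertools.cycle(addrs) for host, addrs in table.items()}

    def lookup(self, hostname):
        """Return the next address for `hostname` in round-robin order."""
        return next(self._cycles[hostname])


def least_loaded(loads):
    """Load-based alternative: pick the address with the smallest reported load.

    `loads` maps address -> load figure; how the load is probed is left open.
    """
    return min(loads, key=loads.get)
```

With three replicas registered for one hostname, four consecutive `lookup` calls return the first, second, and third address and then wrap around to the first again.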
Tasks
- The client program simulates a browser.
- repeat
- read in a URL r;
- get the hostname h and file name f from r;
- if h is not the same as the hostname of the previous request then
- send a message to close the previous connection;
- send a query to its local DNS server to get the IP address for h;
- establish a connection to h;
- endif;
- send a request to h to get file f;
- print the file f;
- until being killed;
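The loop above might look roughly like this in Python. It is a sketch under several assumptions: the URL is taken to be `hostname/filename` with a single "/" separator, and DNS lookup and connection setup are passed in as the hypothetical callables `resolve` and `connect`, so that the control flow can be shown without real sockets. The pages are returned rather than printed, and the loop ends when the input is exhausted instead of running until killed.

```python
def browse(urls, resolve, connect):
    """Run the browser loop over an iterable of URL strings.

    resolve(host) returns an address and connect(address) returns a
    connection object offering fetch(file_name) and close(). Both are
    supplied by the caller; their names are assumptions of this sketch.
    """
    prev_host = None
    conn = None
    pages = []
    for url in urls:
        host, _, file_name = url.partition("/")
        if host != prev_host:
            if conn is not None:
                conn.close()              # close the previous connection
            conn = connect(resolve(host)) # DNS query, then new connection
            prev_host = host
        pages.append(conn.fetch(file_name))
    if conn is not None:
        conn.close()
    return pages
```

Consecutive requests for the same host reuse one connection; only a change of host triggers a close, a DNS query, and a new connection.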
Your client program should provide a browser-like interface. The interface should allow users to input the URL r from standard input. Each input line consists of a URL and a sleep time. To simplify the parsing process, we define fixed formats for hostnames and file names. We assume that each hostname always consists of 3 segments separated by periods: the name of the host, the second-level domain name, and the top-level domain name. Each of these three segments is a string of 3 letters. We also assume a flat file system, i.e., no directories. Each file name consists of a string of 4 letters followed by the “.htm” extension. The entire URL, thus, is a string of 20 characters.
After receiving the requested file, the client program (browser) should display the HTML file (similar to a browser). The client program may send multiple requests along the same connection if consecutive requests are for the same host (e.g., fetching multiple objects in a web page). When a request needs to be sent to a different host, the client should send a closing message to close the connection established with the previous host. The message formats for the DNS requests and the Web requests will be discussed in later sections.
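The fixed formats make input parsing easy to check with a regular expression. The sketch below assumes that the hostname (11 characters) and the file name (8 characters) are joined by a single "/" to give the 20-character URL, that the URL and the sleep time are separated by whitespace, and that the sleep time is a number of seconds. None of these three details is stated explicitly above, and `parse_input_line` is a hypothetical helper name.

```python
import re

# 3 letters . 3 letters . 3 letters / 4 letters .htm  (20 characters in total)
URL_RE = re.compile(r"([a-z]{3}\.[a-z]{3}\.[a-z]{3})/([a-z]{4}\.htm)", re.IGNORECASE)


def parse_input_line(line):
    """Split 'URL sleep_time' into (hostname, file_name, sleep_seconds)."""
    url, sleep_text = line.split()
    match = URL_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"malformed URL: {url!r}")
    return match.group(1), match.group(2), float(sleep_text)
```

For example, the line `www.abc.com/home.htm 2` yields the hostname `www.abc.com`, the file name `home.htm`, and a sleep time of 2.0.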
Your client program needs to know the IP address and port number of its local DNS server. These are given in an input file which will be discussed in the next subsection.
The basic DNS server simply receives DNS queries from clients, performs name resolution, and responds to the clients. You need to implement the DNS servers to process DNS requests. In your implementation, you only need to consider mapping hostnames to IP addresses, not vice versa. Though actual DNS uses a fixed port number, we have to use different port numbers for different DNS servers so that multiple DNS servers can be simulated on a single processor. Thus, the mapping should include the port number as well. Also, we only consider three levels in the domain hierarchy: the root, top level, and second level. A mapping table should be maintained for each DNS server in order to process iterative DNS requests. For the second-level domain DNS servers, you also need to maintain a cache of name resolutions for recursive DNS queries.
UDP should be used for DNS communication. Each DNS server creates a UDP socket to receive requests from clients and another UDP socket to send responses to clients. Each DNS request contains the full host/domain name string. The string should contain 11 bytes. We add a blank at the end to pad the string to 12 bytes. Thus, the request message is 12 characters long. The DNS server at each level simply extracts the partial string that represents the domain name it can serve and finds the mapping. Each DNS response contains 4 fields:
- <host/domain name (12 bytes)>
- <DNS level (4 bytes)>
- <IP address (16 bytes)>
- <port number (8 bytes)>
The DNS level specifies the level in the hierarchy of the DNS server that the response is from. It contains 4 characters: the first 3 characters are blanks and the 4th character is the actual level, where 0 represents the root level, 1 represents the top level, and 2 represents the second level. The IP address is always a 16-byte character string. If the actual IP address string is shorter than 16 characters, the remaining bytes are filled with blanks. The port number is converted to text format with the corresponding size. In total, each DNS response, no matter which level it is from, should contain 52 characters.
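The fixed-width DNS messages can be packed and unpacked with plain string padding, as in the sketch below. It encodes only the four listed fields, whose widths add up to 40 bytes, while the text gives a total of 52, so the overall length should be checked against the full specification. Right-aligning the port number and the rule used in `name_served_at` (root serves the top-level name, top level serves the second-level domain, second level serves the full hostname) are assumptions of this sketch, and all function names are hypothetical.

```python
def encode_request(name):
    """Pad an 11-character host/domain string with a blank to 12 bytes."""
    return name.ljust(12).encode("ascii")


def encode_response(name, level, ip, port):
    """Build the four fixed-width fields: name(12) level(4) ip(16) port(8)."""
    text = name.ljust(12) + str(level).rjust(4) + ip.ljust(16) + str(port).rjust(8)
    return text.encode("ascii")


def decode_response(data):
    """Split a response back into (name, level, ip, port)."""
    text = data.decode("ascii")
    return (
        text[0:12].strip(),
        int(text[12:16]),
        text[16:32].strip(),
        int(text[32:40]),
    )


def name_served_at(hostname, level):
    """Partial name looked up at a given level (0 root, 1 top, 2 second).

    Assumed rule: level 0 -> 'com', level 1 -> 'abc.com', level 2 -> 'www.abc.com'.
    """
    labels = hostname.strip().split(".")
    return ".".join(labels[2 - level:])
```

A response for `www.abc.com` at level 2 with address `127.0.0.1` and port 8080 decodes back to exactly those four values.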
We use the thread pool concept to implement the Web server program. When a Web server starts, it creates N threads. Then the server listens on a TCP port. When a connection request arrives, the server selects an idle thread from the thread pool and lets the selected thread accept the connection and process the requests. Since the load of the system changes, the thread pool size should adapt accordingly. We create another thread to perform thread pool maintenance.
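One possible shape for such a pool is sketched below. It differs from the description above in one respect, stated plainly: instead of the selected thread calling accept itself, the listening thread is assumed to accept the connection and hand it to an idle worker through a shared queue. The class `ThreadPool` and its methods are hypothetical; `add_worker` and `remove_worker` stand in for what the maintenance thread might call when the load changes.

```python
import queue
import threading


class ThreadPool:
    """Worker threads that take accepted connections from a shared queue."""

    def __init__(self, handler, size):
        self._handler = handler      # called once per connection
        self._jobs = queue.Queue()
        self.size = 0
        for _ in range(size):
            self.add_worker()

    def add_worker(self):
        """Grow the pool by one thread."""
        threading.Thread(target=self._work, daemon=True).start()
        self.size += 1

    def remove_worker(self):
        """Shrink the pool by one thread once the work queued ahead is done."""
        self._jobs.put(None)         # sentinel consumed by exactly one worker
        self.size -= 1

    def submit(self, conn):
        """Hand a connection to whichever worker becomes idle first."""
        self._jobs.put(conn)

    def join(self):
        """Block until every submitted item has been processed."""
        self._jobs.join()

    def _work(self):
        while True:
            conn = self._jobs.get()
            try:
                if conn is None:
                    return           # sentinel: this worker exits
                self._handler(conn)
            except Exception:
                pass                 # a failed request must not kill the worker
            finally:
                self._jobs.task_done()
```

In a server, the main loop would call `pool.submit(conn)` for each accepted connection, and a separate maintenance thread could compare the queue backlog with `pool.size` to decide when to grow or shrink the pool.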
The request message from the client contains the client id (4 characters, converted from the integer value) followed by a file name. As discussed previously, the file name is a string of 8 characters, including 4 letters and the “.htm” extension. Overall, each web server request contains 12 characters.
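A small sketch of packing and unpacking this 12-character request. The text does not say how a client id shorter than 4 digits is padded; right-aligning it with blanks is an assumption here, and both function names are hypothetical.

```python
def encode_web_request(client_id, file_name):
    """Build the 12-byte request: client id (4 chars) + file name (8 chars)."""
    if not 0 <= client_id <= 9999:
        raise ValueError("client id must fit in 4 characters")
    if len(file_name) != 8:
        raise ValueError("file name must be exactly 8 characters")
    return (str(client_id).rjust(4) + file_name).encode("ascii")


def decode_web_request(data):
    """Split a 12-byte request into (client_id, file_name)."""
    text = data.decode("ascii")
    return int(text[:4]), text[4:12]
```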
At the Web server site, the Web files are located in one directory. The directory name will be discussed later. When the Web server receives a request message containing the file name, it prepends the directory name and fetches the corresponding file. It then sends the file content through the same connection to the client. The response message starts with the file size, followed by the file content. The file size is an integer in text representation and it uses 12 bytes. If the file does not exist, then the value in the response should be all 0’s, indicating an error. The connection stays open until the client closes it.
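The response framing can be sketched as follows. Zero-padding the size field is an assumption chosen to match the all-zeros error value; note that under this scheme an existing but empty file is indistinguishable from a missing one. The function names `build_response` and `parse_response` are hypothetical.

```python
import os


def build_response(directory, file_name):
    """Return the 12-byte size field followed by the file content.

    A missing or unreadable file yields twelve '0' characters and no content.
    """
    path = os.path.join(directory, file_name)
    try:
        with open(path, "rb") as f:
            content = f.read()
    except OSError:
        return b"0" * 12
    return str(len(content)).zfill(12).encode("ascii") + content


def parse_response(data):
    """Client side: return the file content, or None for the error value."""
    size = int(data[:12].decode("ascii"))
    if size == 0:
        return None
    return data[12:12 + size]
```

On a real TCP connection the client would first read exactly 12 bytes, then keep reading until it has received the announced number of content bytes, since a single receive call may return only part of the message.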