Saturday, April 11, 2015

RAID (Redundant Array of Independent Disks) - Basic

RAID (Redundant Array of Independent Disks) is a disk subsystem that increases performance, provides fault tolerance, or both. RAID uses two or more physical disk drives and a RAID controller, which is plugged into motherboards that do not have RAID circuits. Today, most motherboards have built-in RAID but not necessarily every RAID configuration (see below). In the past, RAID was also accomplished by software only but was much slower. In the late 1980s, the "I" in RAID stood for "inexpensive" but was later changed to "independent."

In large storage area networks (SANs), floor-standing RAID units are common with terabytes of storage and huge amounts of cache memory. RAID is also used in desktop computers by gamers for speed and by business users for reliability. Following are the various RAID configurations.

How you configure fault tolerance depends on the RAID level you set up. The right RAID level depends on how many disks you have in a storage device, how critical drive failover and recovery are to your data needs, and how important it is to maximize performance. A business will generally find it more urgent to keep data intact in case of hardware failure than, for example, a home user will. Different RAID levels represent different configurations aimed at providing different balances between performance optimization and data protection.


 RAID Levels

In most situations you will be using one of the following four popular RAID levels.
  • RAID 0
  • RAID 1
  • RAID 5
  • RAID 10 (also known as RAID 1+0)
However, there are several other RAID levels that are used only in rare situations. It is good to know what they are.

This article explains the main differences between these RAID levels, including RAID 2, RAID 3, RAID 4, RAID 6, RAID 50 and RAID 60, along with easy-to-understand diagrams.

In all the diagrams mentioned below:
  • A, B, C, D, E and F – represent blocks
  • p1, p2, and p3 – represent parity


RAID Level 0 - Disk Striping for Performance

 

RAID 0 is used to boost a server's performance. It's also known as "disk striping." With RAID 0, data is written across multiple disks. This means the work that the computer is doing is handled by multiple disks rather than just one, increasing performance because multiple drives are reading and writing data, improving disk I/O. A minimum of two disks is required. Both software and hardware RAID support RAID 0, as do most controllers. The downside is that there is no fault tolerance. If one disk fails, the entire array is affected and its data is typically lost.

 
Following are the key points to remember for RAID level 0.
  • Minimum 2 disks.
  • RAID 0 implements a striped disk array: the data is broken down into blocks and each block is written to a separate disk drive
  • I/O performance is greatly improved by spreading the I/O load across many channels and drives
  • Best performance is achieved when data is striped across multiple controllers with only one drive per controller
  • No parity calculation overhead is involved
  • Very simple design
  • Easy to implement
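
The striping described above can be pictured with a small sketch. It is only an illustration, assuming blocks are handed out to the disks in simple round-robin order; the function names stripe_blocks and read_back are made up for this example and do not come from any RAID tool.

```python
def stripe_blocks(blocks, num_disks):
    """Distribute blocks round-robin across num_disks disks (RAID 0 style)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least 2 disks")
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        # Block 0 goes to disk 0, block 1 to disk 1, and so on, wrapping around.
        disks[index % num_disks].append(block)
    return disks


def read_back(disks):
    """Reassemble the original block order from the striped disks."""
    total = sum(len(disk) for disk in disks)
    return [disks[i % len(disks)][i // len(disks)] for i in range(total)]


layout = stripe_blocks(["A", "B", "C", "D", "E", "F"], 2)
print(layout)             # [['A', 'C', 'E'], ['B', 'D', 'F']]
print(read_back(layout))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Note that read_back needs every disk: if one list is missing, the original sequence cannot be rebuilt, which is why RAID 0 has no fault tolerance.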

Disadvantages

  • Not a "True" RAID because it is NOT fault-tolerant
  • The failure of just one drive will result in all data in an array being lost
  • Should never be used in mission critical environments

Recommended Applications


  • Video Production and Editing
  • Image Editing
  • Pre-Press Applications
  • Any application requiring high bandwidth 

 

RAID Level 1 - Mirroring for Fault Tolerance

 

 This is the first mode which actually has redundancy. RAID 1 is a fault-tolerance configuration known as "disk mirroring." With RAID 1, data is copied seamlessly and simultaneously, from one disk to another, creating a replica, or mirror. If one disk gets fried, the other can keep working. It's the simplest way to implement fault tolerance and it's relatively low cost.

The downside is that RAID 1 causes a slight drag on write performance, because every write has to go to both disks. RAID 1 can be implemented through either software or hardware. A minimum of two disks is required for RAID 1 hardware implementations. With software RAID 1, data can also be mirrored between volumes on a single disk instead of two physical disks, although that offers no protection if that one disk fails. One additional point to remember is that RAID 1 cuts total disk capacity in half: if a server with two 1TB drives is configured with RAID 1, then total storage capacity will be 1TB, not 2TB.
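
A toy model may make the mirroring idea and the capacity arithmetic more concrete. This is a sketch under simple assumptions: the two "disks" are just in-memory dictionaries, and the names MirroredPair and mirror_usable_tb are invented for illustration.

```python
class MirroredPair:
    """Toy RAID 1: every write goes to both disks; reads work from either."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block_no, data):
        # The same data is written to every disk that is still healthy.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def read(self, block_no):
        # Any healthy disk can serve the read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both mirrors have failed")

    def fail(self, i):
        self.failed[i] = True
        self.disks[i] = {}


def mirror_usable_tb(disk_sizes_tb):
    """Usable capacity of a mirror is limited by its smallest member disk."""
    return min(disk_sizes_tb)


pair = MirroredPair()
pair.write(0, "payroll records")
pair.fail(0)                        # one disk "gets fried"
print(pair.read(0))                 # payroll records
print(mirror_usable_tb([1, 1]))     # 1
```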


 Following are the key points to remember for RAID level 1.
  • Minimum 2 disks.
  • Twice the Read transaction rate of single disks, same Write transaction rate as single disks
  • 100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk
  • Transfer rate per block is equal to that of a single disk
  • Under certain circumstances, RAID 1 can sustain multiple simultaneous drive failures
  • Simplest RAID storage subsystem design

Disadvantages

  • Highest disk overhead of all RAID types (100%) - inefficient
  • Typically the RAID function is done by system software, loading the CPU/Server and possibly degrading throughput at high activity levels. Hardware implementation is strongly recommended
  • May not support hot swap of failed disk when implemented in "software"

Recommended Applications


  • Accounting
  • Payroll
  • Financial
  • Any application requiring very high availability

RAID Level 5 - Speed and Fault Tolerance


RAID 5 is by far the most useful RAID mode when one wishes to combine a larger number of physical disks, and still maintain some redundancy for business servers and enterprise NAS devices.
RAID-5 can be (usefully) used on three or more disks, with zero or more spare-disks.  

This RAID level provides better performance than mirroring as well as fault tolerance. With RAID 5, data and parity (which is additional data used for recovery) are striped across three or more disks. If a disk gets an error or starts to fail, data is recreated from this distributed data and parity, seamlessly and automatically. Essentially, the system keeps running even when one disk kicks the bucket, until you can replace the failed drive.
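
To see how parity lets the array recreate lost data, here is a minimal sketch. It assumes the parity block is the bytewise XOR of the data blocks in a stripe, which is how RAID 5 parity is normally computed; the helper name xor_blocks and the sample data are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding the second data block fails.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

XOR-ing the surviving blocks with the parity cancels out everything except the missing block, so any single lost block in a stripe can be rebuilt.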

 Another benefit of RAID 5 is that it allows many NAS and server drives to be "hot-swappable" meaning in case a drive in the array fails, that drive can be swapped with a new drive without shutting down the server or NAS and without having to interrupt users who may be accessing the server or NAS. It's a great solution for fault tolerance because as drives fail (and they eventually will), the data can be rebuilt to new disks as failing disks are replaced.

Both read and write performance usually increase, but it can be hard to predict by how much. Reads are similar to RAID-0 reads. The write efficiency depends heavily on the amount of memory in the machine and the usage pattern of the array. Heavily scattered writes are bound to be more expensive.


 Following are the key points to remember for RAID level 5.
  • Minimum 3 disks.
  • Highest Read data transaction rate
  • Medium Write data transaction rate
  • Low ratio of ECC (Parity) disks to data disks means high efficiency
  • Good aggregate transfer rate
  • Most cost-effective option providing both performance and redundancy. Use this for databases that are heavily read oriented, since write operations will be slow.

Disadvantages

  • Disk failure has a medium impact on throughput
  • Most complex controller design
  • Difficult to rebuild in the event of a disk failure (as compared to RAID level 1)
  • Individual block data transfer rate same as single disk

Recommended Applications


  • File and Application servers
  • Database servers
  • Web, E-mail, and News servers
  • Intranet servers
  • Most versatile RAID level

 RAID Level 10 

RAID 10 is a combination of RAID 1 and 0 and is often denoted as RAID 1+0. It combines the mirroring of RAID 1 with the striping of RAID 0. It's the RAID level that gives the best performance, but it is also costly, requiring twice as many disks as other RAID levels, for a minimum of four.

This is the RAID level ideal for highly utilized database servers or any server that's performing many write operations. RAID 10 can be implemented as hardware or software, but the general consensus is that many of the performance advantages are lost when you use software RAID 10.


 Following are the key points to remember for RAID level 10.
  • Minimum 4 disks.
  • RAID 10 is implemented as a striped array whose segments are RAID 1 arrays
  • RAID 10 has the same fault tolerance as RAID level 1
  • RAID 10 has the same overhead for fault-tolerance as mirroring alone
  • High I/O rates are achieved by striping RAID 1 segments
  • Under certain circumstances, RAID 10 array can sustain multiple simultaneous drive failures
  • Excellent solution for sites that would have otherwise gone with RAID 1 but need some additional performance boost.
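
The "certain circumstances" under which RAID 10 survives multiple drive failures can be sketched as a small check. This is only an illustration, assuming disks 0 and 1 form the first mirrored pair, disks 2 and 3 the second, and so on; the function name raid10_survives is hypothetical.

```python
def raid10_survives(num_disks, failed):
    """Return True if every mirrored pair still has at least one working disk.

    Assumption for this sketch: disks (0, 1) are one mirror, (2, 3) the next, etc.
    """
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least 4")
    failed = set(failed)
    for first in range(0, num_disks, 2):
        # The stripe breaks only if both halves of the same mirror are gone.
        if first in failed and first + 1 in failed:
            return False
    return True


print(raid10_survives(4, [0, 2]))  # True: one disk lost from each mirror
print(raid10_survives(4, [0, 1]))  # False: a whole mirror is lost
```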

Disadvantages

  • Very expensive / High overhead
  • All drives must move in parallel to the proper track, lowering sustained performance
  • Very limited scalability at a very high inherent cost

Recommended Applications


  • Database server requiring high performance and fault tolerance

Other RAID Levels: These RAID levels are used for specific cases. Here are some short descriptions of each.

RAID Level 2




  • RAID 2 is similar to RAID 5, but instead of striping blocks and protecting them with parity, it stripes at the bit level and protects the data with error correction codes.
  • In the above diagram b1, b2, b3 are bits. E1, E2, E3 are error correction codes.
  • You need two groups of disks. One group of disks are used to write the data, another group is used to write the error correction codes.
  • This uses Hamming error correction code (ECC), and stores this information in the redundancy disks.
  • When data is written to the disks, it calculates the ECC code for the data on the fly, and stripes the data bits to the data-disks, and writes the ECC code to the redundancy disks.
  • When data is read from the disks, it also reads the corresponding ECC code from the redundancy disks, and checks whether the data is consistent. If required, it makes appropriate corrections on the fly.
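
The write and read steps above can be sketched with a Hamming(7,4) code, where 4 data bits are protected by 3 ECC bits. This is an assumption chosen for illustration (think of 4 data disks and 3 redundancy disks, one bit per disk); the function names are made up and the actual word size of a RAID 2 array may differ.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, ECC at 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    e1 = d1 ^ d2 ^ d4
    e2 = d1 ^ d3 ^ d4
    e3 = d2 ^ d3 ^ d4
    return [e1, e2, d1, e3, d2, d3, d4]


def hamming_decode(codeword):
    """Correct a single flipped bit, if any, and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    bad_position = s1 + 2 * s2 + 4 * s3   # 0 means the codeword is consistent
    if bad_position:
        c[bad_position - 1] ^= 1          # fix the bit on the fly
    return [c[2], c[4], c[5], c[6]]


word = hamming_encode([1, 0, 1, 1])
word[4] ^= 1                              # simulate one bad bit on one disk
print(hamming_decode(word))               # [1, 0, 1, 1]
```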

Disadvantages

  • Very high ratio of ECC disks to data disks with smaller word sizes - inefficient
  • Entry level cost very high - requires a very high transfer rate requirement to justify
  • Transaction rate is equal to that of a single disk at best (with spindle synchronization)
  • No commercial implementations exist / not commercially viable

 

RAID Level 3 


 

  • RAID 3 is also similar to RAID 5. This uses byte level striping, i.e. instead of striping the blocks across the disks, it stripes the bytes across the disks.
  • In the above diagram B1, B2, B3 are bytes. p1, p2, p3 are parities.
  • Uses multiple data disks, and a dedicated disk to store parity.
  • The disks have to spin in sync to get to the data.
  • Sequential read and write will have good performance.
  • Random read and write will have poor performance.
  • Controller design is fairly complex
  • Very difficult and resource intensive to do as a "software" RAID

Recommended Applications


  • Video Production and live streaming
  • Image Editing
  • Video Editing
  • Prepress Applications
  • Any application requiring high throughput

 

RAID Level 4

 

  • This uses block level striping.
  • In the above diagram B1, B2, B3 are blocks. p1, p2, p3 are parities.
  • Uses multiple data disks, and a dedicated disk to store parity.
  • Minimum of 3 disks (2 disks for data and 1 for parity)
  • Good random reads, as the data blocks are striped.
  • Bad random writes, as for every write, it has to write to the single parity disk.
  • It is somewhat similar to RAID 3 and 5, but a little different.
  • This is just like RAID 3 in having the dedicated parity disk, but this stripes blocks.
  • This is just like RAID 5 in striping the blocks across the data disks, but this keeps all parity on one dedicated disk instead of distributing it.
  • This is not commonly used.
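
The difference between a dedicated parity disk (RAID 4) and distributed parity (RAID 5) can be shown with a short sketch. The rotation order used for RAID 5 here is just one possible choice and is an assumption of this example, as is the function name parity_disk.

```python
def parity_disk(stripe, num_disks, level):
    """Return the index of the disk holding parity for a given stripe number."""
    if level == 4:
        return num_disks - 1                         # always the same dedicated disk
    if level == 5:
        return (num_disks - 1 - stripe) % num_disks  # moves to another disk each stripe
    raise ValueError("only RAID 4 and RAID 5 are sketched here")


print([parity_disk(s, 3, 4) for s in range(4)])  # [2, 2, 2, 2]
print([parity_disk(s, 3, 5) for s in range(4)])  # [2, 1, 0, 2]
```

With RAID 4, every write has to touch disk 2 in this example, which is why random writes are bad; with RAID 5 the parity writes are spread over all disks.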

 

RAID Level 6

 

  • RAID 6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity)
  • In the above diagram A, B, C are blocks. p1, p2, p3 are parities.
  • This creates two parity blocks for each stripe of data blocks.
  • Data is striped on a block level across a set of drives, just like in RAID 5, and a second set of parity is calculated and written across all the drives; RAID 6 provides for an extremely high data fault tolerance and can sustain multiple simultaneous drive failures
  • RAID 6 protects against multiple bad block failures while non-degraded
  • RAID 6 protects against a single bad block failure while operating in a degraded mode
  • Perfect solution for mission critical applications

Disadvantages

  • More complex controller design
  • Controller overhead to compute parity addresses is extremely high
  • Write performance can be brought on par with RAID Level 5 by using a custom ASIC for computing Reed-Solomon parity
  • Requires N+2 drives to implement because of dual parity scheme
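
The disk overhead of the main levels discussed so far can be compared with a bit of arithmetic. This is a rough sketch assuming all disks are the same size; the function name usable_disks is invented for this example.

```python
def usable_disks(level, n):
    """How many disks' worth of space is usable with n equal-size disks."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError("level not covered by this sketch")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return n          # no redundancy at all
    if level == 1:
        return 1          # every disk holds the same copy
    if level == 5:
        return n - 1      # one disk's worth of parity
    if level == 6:
        return n - 2      # two disks' worth of parity (N+2)
    if n % 2:
        raise ValueError("RAID 10 in this sketch needs an even number of disks")
    return n // 2         # half the disks are mirrors


for level, n in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
    print(f"RAID {level} with {n} disks: {usable_disks(level, n)} usable")
```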

Recommended Applications


  • File and Application servers
  • Database servers
  • Web and E-mail servers
  • Intranet servers
  • Excellent fault-tolerance with the lowest overhead

 

 RAID Level 50: High I/O Rates & Data Transfer Performance

 

 

  • RAID 50 should have been called "RAID 03" because it was implemented as a striped (RAID level 0) array whose segments were RAID 3 arrays (during mid-90s)
  • The most common current RAID 50 implementation is illustrated above
  • RAID 50 is more fault tolerant than RAID 5 but has twice the parity overhead
  • High data transfer rates are achieved thanks to its RAID 5 array segments
  • High I/O rates for small requests are achieved thanks to its RAID 0 striping
  • May be a good solution for sites that would otherwise have gone with RAID 5 but need some additional performance boost.

Disadvantages

  • Very expensive to implement
  • All disk spindles must be synchronized, which limits the choice of drives
  • Failure of two drives in one of the RAID 5 segments renders the whole array unusable

RAID Level 60

RAID 60 is two (or more) RAID 6 groups striped together. The Sun Fire X4150 we tested had eight drives on two channels, so it nicely split into two RAID 6 groups striped together to make one RAID 60. This further reduces the risk of drive loss. With RAID 60 built from two groups, you can lose up to four drives (two in each RAID 6 group), compared with the two-drive limit in RAID 6.
However, this added redundancy comes at the cost of additional parity disks. Furthermore, because RAID 60 is a striped collection of RAID 6 arrays, each RAID 6 array must remain consistent. Otherwise, data will be lost.
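
The rule that each RAID 6 group must remain consistent can be written as a small check. This is a simplified sketch, assuming the only thing that matters is how many drives have failed in each group; the function name raid60_survives is hypothetical.

```python
def raid60_survives(failed_per_group):
    """Return True if no RAID 6 group has lost more than two drives.

    failed_per_group is a list with one failure count per RAID 6 group.
    """
    if len(failed_per_group) < 2:
        raise ValueError("RAID 60 stripes across at least two RAID 6 groups")
    # Each RAID 6 group tolerates two failures; a third one breaks the stripe.
    return all(count <= 2 for count in failed_per_group)


print(raid60_survives([2, 2]))  # True: four failures, two per group
print(raid60_survives([3, 0]))  # False: three failures in the same group
```

So the four-drive figure only holds when the failures are spread evenly; three failures in one group are enough to lose the data.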

RAID Level 7

RAID 7 is a proprietary level of RAID owned by the now-defunct Storage Computer Corporation.


Here I have tried to provide concise and important basic information about all RAID levels. In the future I'll try to provide detailed information about how each RAID level works and about the RAID calculation method. So, stay in touch :)
And thanks for reading.




Tuesday, April 7, 2015

Proxy Server

A proxy server is a computer that functions as an intermediary between a web browser (such as Internet Explorer) and the Internet. Proxy servers help improve web performance by storing a copy of frequently used webpages. When a browser requests a webpage stored in the proxy server's collection (its cache), it is provided by the proxy server, which is faster than going to the web. Proxy servers also help improve security by filtering out some web content and malicious software.

More generally, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server, and the proxy server evaluates the request as a way to simplify and control its complexity. Proxies were invented to add structure and encapsulation to distributed systems.



Purpose of Using Proxy Server:


Improve Performance: Proxy servers can dramatically improve performance for groups of users. This is because they save the results of all requests for a certain amount of time. Consider the case where both user X and user Y access the World Wide Web through a proxy server. First user X requests a certain Web page, which we'll call Page 1. Sometime later, user Y requests the same page. Instead of forwarding the request to the Web server where Page 1 resides, which can be a time-consuming operation, the proxy server simply returns the Page 1 that it already fetched for user X. Since the proxy server is often on the same network as the user, this is a much faster operation. Real proxy servers support hundreds or thousands of users. The major online services such as America Online, MSN and Yahoo, for example, employ an array of proxy servers.
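
The user X / user Y scenario can be sketched in a few lines. This is a minimal illustration, not how any particular proxy product works: the class name CachingProxy, the 60 second lifetime and the fetch callback are all assumptions made for this example.

```python
import time


class CachingProxy:
    """Toy caching proxy: remembers each fetched page for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60, clock=time.monotonic):
        self.fetch = fetch        # function that really contacts the web server
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}           # url -> (expiry time, page body)

    def get(self, url):
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]       # served from the cache, no trip to the web
        body = self.fetch(url)
        self.cache[url] = (now + self.ttl, body)
        return body


origin_hits = []

def slow_origin(url):
    origin_hits.append(url)       # record every real request to the web server
    return "contents of " + url

proxy = CachingProxy(slow_origin)
proxy.get("http://example.com/page1")   # user X: fetched from the web server
proxy.get("http://example.com/page1")   # user Y: answered from the cache
print(len(origin_hits))                 # 1
```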

Filter Requests: Proxy servers can also be used to filter requests. For example, a company might use a proxy server to prevent its employees from accessing a specific set of Web sites.
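
A request filter of this kind can be sketched as a simple host blocklist. The host names below are placeholders and the function name is_allowed is made up for this example; a real proxy would normally load its list from configuration.

```python
from urllib.parse import urlsplit

# Hypothetical list of sites the company does not want employees to reach.
BLOCKED_HOSTS = {"games.example.com", "video.example.net"}


def is_allowed(url, blocked=BLOCKED_HOSTS):
    """Return False if the URL's host is a blocked host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not any(host == b or host.endswith("." + b) for b in blocked)


print(is_allowed("http://games.example.com/play"))   # False
print(is_allowed("http://intranet.example.org/"))    # True
```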

Translation: A translation proxy is a proxy server that is used to localize a website experience for different markets. Traffic from global audiences is routed through the translation proxy to the source website. As visitors browse the proxy site, requests go back to the source site where pages are rendered. Original language content in the response is replaced by translated content as it passes back through the proxy. The translations used in a translation proxy can be either machine translation, human translation, or a combination of machine and human translation. Different translation proxy implementations have different capabilities. Some allow further customization of the source site for local audiences such as excluding source content or substituting source content with original local content.

Accessing Services Anonymously: An anonymous proxy server (sometimes called a web proxy) generally attempts to anonymize web surfing. There are different varieties of anonymizers. The destination server (the server that ultimately satisfies the web request) receives requests from the anonymizing proxy server, and thus does not receive information about the end user's address. The requests are not anonymous to the anonymizing proxy server, however, and so a degree of trust is present between the proxy server and the user. Many proxy servers are funded through a continued advertising link to the user.

Security: A proxy can keep the internal network structure of a company secret by using network address translation, which can help the security of the internal network. This makes requests from machines and users on the local network anonymous. Proxies can also be combined with firewalls. 

  
Types of Proxy

A proxy server may reside on the user's local computer, or at various points between the user's computer and destination servers on the Internet.
  1. A proxy server that passes requests and responses unmodified is usually called a gateway or sometimes a tunneling proxy.
  2. A forward proxy is an Internet-facing proxy used to retrieve data from a wide range of sources (in most cases anywhere on the Internet).
  3. A reverse proxy is usually an Internet-facing proxy used as a front-end to control and protect access to a server on a private network. A reverse proxy commonly also performs tasks such as load-balancing, authentication, decryption or caching.

Open Proxies:

 An open proxy is a forwarding proxy server that is accessible by any Internet user. An anonymous open proxy allows users to conceal their IP address while browsing the Web or using other Internet services. There are varying degrees of anonymity, however, as well as a number of methods of 'tricking' the client into revealing itself regardless of the proxy being used.


Reverse Proxies:

A reverse proxy (or surrogate) is a proxy server that appears to clients to be an ordinary server. Requests are forwarded to one or more origin servers which handle the request. The response from the proxy server is returned as if it came directly from the origin server, leaving the client with no knowledge of the origin servers. Reverse proxies are installed in the neighborhood of one or more web servers. All traffic coming from the Internet and with a destination of one of the neighborhood's web servers goes through the proxy server. The use of "reverse" originates in its counterpart "forward proxy" since the reverse proxy sits closer to the web server and serves only a restricted set of websites.



There are several reasons for installing reverse proxy servers:
  • Encryption/SSL Acceleration: When secure web sites are created, the SSL encryption is often not done by the web server itself, but by a reverse proxy that is equipped with SSL acceleration hardware. Furthermore, a host can provide a single "SSL proxy" to provide SSL encryption for an arbitrary number of hosts, removing the need for a separate SSL Server Certificate for each host, with the downside that all hosts behind the SSL proxy have to share a common DNS name or IP address for SSL connections. This problem can partly be overcome by using the SubjectAltName feature of X.509 certificates.
  • Load Balancing: The reverse proxy can distribute the load to several web servers, each web server serving its own application area. In such a case, the reverse proxy may need to rewrite the URLs in each web page (translation from externally known URLs to the internal locations).
  • Serve/Cache Static Content: A reverse proxy can offload the web servers by caching static content like pictures and other static graphical content.
  • Compression: The proxy server can optimize and compress the content to speed up the load time.
  • Spoon Feeding: Reduces resource usage caused by slow clients on the web servers by caching the content the web server sent and slowly "spoon feeding" it to the client. This especially benefits dynamically generated pages.
  • Security: The proxy server is an additional layer of defense and can protect against some OS and Web Server specific attacks. However, it does not provide any protection from attacks against the web application or service itself, which is generally considered the larger threat.
  • Extranet Publishing: A reverse proxy server facing the Internet can be used to communicate to a firewall server internal to an organization, providing extranet access to some functions while keeping the servers behind the firewalls. If used in this way, security measures should be considered to protect the rest of your infrastructure in case this server is compromised, as its web application is exposed to attack from the Internet.
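
The load balancing case above, where each web server serves its own application area and externally known URLs are translated to internal locations, can be sketched as a routing table. The paths, internal addresses and the function name route are all placeholders invented for this example.

```python
# Hypothetical mapping from externally known URL prefixes to internal servers.
ROUTES = {
    "/shop/": "http://10.0.0.11:8080/",
    "/blog/": "http://10.0.0.12:8080/",
}


def route(path, routes=ROUTES):
    """Translate an external path to an internal URL, or None if nothing matches."""
    for prefix, backend in routes.items():
        if path.startswith(prefix):
            # Strip the external prefix and append the rest to the backend URL.
            return backend + path[len(prefix):]
    return None


print(route("/shop/cart"))   # http://10.0.0.11:8080/cart
print(route("/unknown"))     # None
```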