02-FEB-2011: Why is there packet loss?

Is the Internets dying?

Bob,

Here's a little justification, explanation, and plan of action for our experiment with Quality of Service.

Late last week, when the SAN team began using spare bandwidth (without exceeding our link capacity), we experienced packet loss between the datacenters. Packet loss is caused by one of two things:

  1. A device or transit (e.g. a cable or repeater) malfunctioning; or
  2. A queue being full (or nearly full):
    1. Either an interface's outbound queue;
    2. A device's global queue; or
    3. Random Early Detection (RED) dropping packets early to signal that a queue (one of the above) is nearing fullness.

Given the general reliability of modern network devices, and the fact that the packet loss stopped once we reduced the amount of traffic we were sending across the network, I think we can rule out a device malfunction as the cause we need to address.
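
That leaves the queue-related causes. To show how a queue can drop packets even when average traffic stays under link capacity, here is a minimal Python sketch. Everything in it is an assumption for illustration: the drain rate, queue limit, and traffic patterns are made-up numbers, not measurements from our link, and the RED function is the simplified textbook curve (no averaging or per-packet count adjustment), not any vendor's implementation.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED curve: drop probability for a given average queue depth.

    Below min_th nothing is dropped, at or above max_th everything is dropped,
    and in between the probability rises linearly from 0 to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def simulate_tail_drop(arrivals, drain_per_tick, queue_limit):
    """Feed per-tick packet arrivals into a FIFO queue; return (sent, dropped)."""
    depth = 0
    sent = 0
    dropped = 0
    for arriving in arrivals:
        # Enqueue what fits; anything beyond the queue limit is tail-dropped.
        accepted = min(arriving, queue_limit - depth)
        dropped += arriving - accepted
        depth += accepted
        # The link transmits a fixed number of packets each tick.
        out = min(depth, drain_per_tick)
        depth -= out
        sent += out
    return sent, dropped


# Hypothetical link: drains 10 packets per tick, queue holds 20 packets.
# Both patterns offer 80 packets over 10 ticks, under the 100-packet capacity.
smooth = [8] * 10
bursty = [40, 0, 0, 0, 0, 40, 0, 0, 0, 0]

print(simulate_tail_drop(smooth, drain_per_tick=10, queue_limit=20))
print(simulate_tail_drop(bursty, drain_per_tick=10, queue_limit=20))
```

In this toy model the smooth pattern loses nothing, while the bursty pattern overflows the queue on each burst and loses packets, even though both offer the same total load. That is the kind of behavior I suspect we saw: bursts filling a queue faster than the link can drain it.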