
 	Some less-widely known details of TCP connections.

 	Properly closing the connection.

 After this code sequence:

     sock = socket(AF_INET, SOCK_STREAM, 0);
     connect(sock, (struct sockaddr *)&remote, sizeof(remote));
     write(sock, buffer, 1000000);

 a large block of data is only buffered by the kernel; it can't be sent all at once.
 What will happen if we close the socket?

 RFC 1122, section 4.2.2.13, says:

 "A host MAY implement a 'half-duplex' TCP close sequence, so that
  an application that has called close() cannot continue to read
  data from the connection. If such a host issues a close() call
  while received data is still pending in TCP, or if new data is
  received after close() is called, its TCP SHOULD send a RST
  to show that data was lost."

 IOW: if we just close(sock) now while there is unread data from the peer,
 or if the peer sends more data after close(), the kernel can reset the
 TCP connection (send an RST packet).

 This is problematic for two reasons: it discards not-yet-sent data, and
 the peer may see it as an error (ECONNRESET) rather than as EOF.
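
 On Linux this is easy to observe over loopback. The sketch below is an
 illustration under those assumptions (Linux, 127.0.0.1);
 tcp_loopback_pair() and demo_close_with_unread_data() are hypothetical
 helpers, not existing APIs. The receiving side closes its socket while
 the bytes it was sent are still unread, and the function returns the
 errno that the sending side then gets from recv(). On Linux it is
 expected to be ECONNRESET rather than a clean EOF.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect two TCP sockets over 127.0.0.1.
 * Returns 0 on success, -1 on failure. */
static int tcp_loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lsn = socket(AF_INET, SOCK_STREAM, 0);

    if (lsn < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lsn, (struct sockaddr *)&addr, sizeof(addr)) < 0
     || listen(lsn, 1) < 0
     || getsockname(lsn, (struct sockaddr *)&addr, &len) < 0)
        goto fail;
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0)
        goto fail;
    if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(*client);
        goto fail;
    }
    *server = accept(lsn, NULL, NULL);
    close(lsn);
    if (*server < 0) {
        close(*client);
        return -1;
    }
    return 0;
fail:
    close(lsn);
    return -1;
}

/* Returns the errno from the client's recv() after the server closes a
 * socket that still holds unread data, 0 if recv() reported no error,
 * or -1 if the setup failed. */
static int demo_close_with_unread_data(void)
{
    int client, server, result;
    char buf[16];
    struct pollfd pfd;

    if (tcp_loopback_pair(&client, &server) < 0)
        return -1;
    if (write(client, "unread", 6) != 6) {
        close(client);
        close(server);
        return -1;
    }
    /* Wait until the bytes sit in the server's receive queue. */
    pfd.fd = server;
    pfd.events = POLLIN;
    poll(&pfd, 1, 1000);
    close(server); /* unread data pending: Linux sends RST, not FIN */

    /* Wait for the RST to arrive, then see what recv() reports. */
    pfd.fd = client;
    pfd.events = POLLIN;
    poll(&pfd, 1, 1000);
    result = recv(client, buf, sizeof(buf), 0) < 0 ? errno : 0;
    close(client);
    return result;
}
```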

 What can be done about it?

 Solution #1: block until sending is done:

     /* When enabled, a close(2) or shutdown(2) will not return until
      * all queued messages for the socket have been successfully sent
      * or the linger timeout has been reached.
      */
     /* struct linger is already declared by <sys/socket.h>; redefining
      * it does not compile. Fields: l_onoff (linger active),
      * l_linger (how many seconds to linger for).
      */
     struct linger linger;
     linger.l_onoff = 1;
     linger.l_linger = SOME_NUM;
     setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
     close(sock);
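
 Wrapped into functions, Solution #1 might look like the sketch below.
 set_linger() and close_lingering() are hypothetical names; the only real
 API involved is setsockopt(SO_LINGER) with the struct linger that
 <sys/socket.h> already declares.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable lingering close on fd: a later close(fd) blocks for up to
 * `seconds` seconds while queued data is still being sent.
 * Returns 0 on success, -1 (errno set) on failure. */
static int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Enable lingering, then close. A setsockopt() failure is reported but
 * the socket is still closed. Returns close()'s result. */
static int close_lingering(int fd, int seconds)
{
    if (set_linger(fd, seconds) < 0)
        perror("setsockopt(SO_LINGER)");
    return close(fd);
}
```

 Note that l_linger = 0 means something quite different: close() then
 discards queued data and resets the connection with RST, which is
 exactly what this section is trying to avoid.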

 Solution #2: tell the kernel that you are done sending.
 This makes the kernel send a FIN after all queued data is sent:

     shutdown(sock, SHUT_WR);
     close(sock);

 However, experiments on Linux 3.9.4 show that the kernel can return from
 shutdown() and from close() before all data is sent,
 and if the peer sends any data to us after this, the kernel still responds
 with an RST even though not all of our data has been sent yet.

 In practice, the protocol in use often does not allow the peer to send
 such data to us, in which case this solution is acceptable.

 Solution #3: if you know that the peer is going to close its end after it
 sees our FIN (as EOF), it might be a good idea to perform a read after
 shutdown(). When read() returns 0, we conclude that the peer received all
 the data, saw EOF, and closed its end.

 However, this incurs a small performance penalty (we run for a longer
 time) and requires safeguards (nonblocking reads, timeouts, etc.) against
 malicious peers that never close the connection.

 Solutions #1 and #2 can be combined:

     /* ...set up struct linger... then: */
     setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
     shutdown(sock, SHUT_WR);
     /* At this point the kernel has sent a FIN packet, not an RST, */
     /* to the peer, even if there is buffered read data from the peer. */
     close(sock);

 	Defeating Nagle.

 Method #1: manually control whether partial sends are allowed:

 This prevents partially filled packets from being sent (TCP_CORK is
 Linux-specific):

     int state = 1;
     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));

 and this forces the last, partially filled packet (if any) to be sent:

     int state = 0;
     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
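
 A typical use of Method #1 is to cork the socket, issue several small
 writes (say, a header and a body), and uncork so the kernel can pack them
 into full-sized packets. A minimal sketch, assuming Linux; set_cork() and
 send_corked() are hypothetical names, and a short write() is simply
 treated as an error to keep it brief.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK on fd. */
static int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Send hdr and body as two writes, letting the kernel coalesce them.
 * Returns 0 if both were written in full, -1 otherwise. */
static int send_corked(int fd, const void *hdr, size_t hlen,
                       const void *body, size_t blen)
{
    int rc = -1;

    if (set_cork(fd, 1) < 0)
        return -1;
    if (write(fd, hdr, hlen) == (ssize_t)hlen
     && write(fd, body, blen) == (ssize_t)blen)
        rc = 0;
    /* Uncork even on error, so a partial packet is not held back. */
    if (set_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

 Uncorking on the error path matters: otherwise whatever was already
 written could wait in a partial packet until the kernel's cork ceiling
 (200 ms, according to tcp(7)) flushes it.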

 Method #2: make every write send data immediately, even if it fills only
 a partial packet:

     int state = 1;
     setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &state, sizeof(state));
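
 Method #2 as a pair of small helpers; set_nodelay() and nodelay_enabled()
 are hypothetical names. Unlike TCP_CORK, TCP_NODELAY is specified by POSIX
 in <netinet/tcp.h>, so this part is portable.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on fd so small writes go out at once.
 * Returns 0 on success, -1 (errno set) on failure. */
static int set_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if Nagle is disabled on fd, 0 if it is enabled, -1 on error. */
static int nodelay_enabled(int fd)
{
    int v = 0;
    socklen_t len = sizeof(v);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &v, &len) < 0)
        return -1;
    return v != 0;
}
```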