Issues related to Peer Presence
Peer Connection Lease
Common complaint: "When peer B gets disconnected from RDV1, I expect the listeners of rendezvous events in RDV1 to receive a RendezvousEvent.CLIENTDISCONNECT or similar event, but no event is triggered at all!"
First, a correction: a peer fail or disconnect event will be generated, but not for many minutes. This is because if a peer is disconnected rather than shut down, the RDV does not detect a failure until the RDV connection lease expires.
The issue stems from the "don't ask, don't tell" policy of connections to the RDV.
First, assume that there is no real IP connection: you are only loosely connected to the RDV. In other words, the RDV knows about a peer because the peer sends it messages, not because a physical connection is held open. Without a hard connection, there are no hard failure events. Remember that peers connect via endpoints that may be TCP, HTTP, or even relayed.
Now, given the lack of hard disconnects, what is happening? Simply put, the RDV is watching for activity. (The JXTA engineers can probably say exactly which activities refresh the peer timeout.) The upshot is that if the peer's plug is pulled, the RDV will not report a fail or disconnect for several minutes, or until another peer tries to reach the downed peer and the RDV realizes that the peer is off the network. If you disconnect from an RDV cleanly, you should see a proper disconnect event.
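To make the lease behavior concrete, here is a minimal sketch of activity-based lease tracking on the RDV side. It is NOT the JXTA API: the class and method names (LeaseTable, onActivity, onCleanDisconnect, expire) are hypothetical, time is passed in explicitly so the example is deterministic, and the 20-minute lease is an assumption chosen to match the 15/20 minutes mentioned below. Any message from a peer renews its lease; a clean disconnect produces an event immediately; a pulled plug only shows up as a failure once the lease lapses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class LeaseDemo {
    public static void main(String[] args) {
        long minute = 60_000L;
        LeaseTable rdv = new LeaseTable(20 * minute); // assumed 20-minute lease
        rdv.onActivity("B", 0);
        rdv.onActivity("C", 0);
        System.out.println(rdv.onCleanDisconnect("C")); // DISCONNECT C (immediate)
        // B's plug is pulled at t=0; nothing is reported while its lease is still valid.
        System.out.println(rdv.expire(10 * minute));    // []
        System.out.println(rdv.expire(20 * minute));    // [FAILED B]
    }
}

// Hypothetical RDV-side lease bookkeeping (illustration only, not JXTA code).
class LeaseTable {
    private final long leaseMillis;
    private final Map<String, Long> lastActivity = new HashMap<>();

    LeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Any message from the peer counts as activity and renews its lease.
    void onActivity(String peerId, long nowMillis) {
        lastActivity.put(peerId, nowMillis);
    }

    // Clean disconnect: the peer announced it is leaving, so report it at once.
    String onCleanDisconnect(String peerId) {
        return lastActivity.remove(peerId) != null ? "DISCONNECT " + peerId : null;
    }

    // Run periodically: peers that have been silent for a full lease are reported as failed.
    List<String> expire(long nowMillis) {
        List<String> events = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastActivity.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= leaseMillis) {
                events.add("FAILED " + e.getKey());
                it.remove();
            }
        }
        return events;
    }

    boolean isKnown(String peerId) {
        return lastActivity.containsKey(peerId);
    }
}
```

Note that the RDV never sends anything to the peer here; it only notices silence, which is exactly why an unclean disconnect takes a full lease period to surface.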
The reason for this behavior is to limit the number of messages. Imagine millions of machines: as the network grows, the message count grows with it. Millions of keep-alive messages waste a lot of bandwidth, and if most computers pass through relays, the problem is doubled.
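To put rough numbers on this, suppose N peers each send one keep-alive every T seconds, so the receiving side handles R = N/T messages per second. The figures below (one million peers, a 60-second interval, and a factor of two for relaying, on the assumption that each relayed message makes one extra hop) are illustrative assumptions, not measurements:

```latex
\begin{aligned}
R &= \frac{N}{T} = \frac{10^{6}}{60\ \text{s}} \approx 16\,667\ \text{messages/s} \\
R_{\text{relayed}} &= 2R \approx 33\,333\ \text{messages/s}
\end{aligned}
```

Halving the interval to get faster failure detection doubles R, which is the trade-off the lease-based design avoids.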
The next realization is that the RDV is only part of the equation for knowing whether a peer is online. It does quite well at detecting a peer connect, and it scales reasonably well. Disconnect is another story: if the peer cannot disconnect cleanly, you will have to wait 15 to 20 minutes for the peer fail message. The final realization, then, is that you need to either live with this, reduce the timeout (lease), or add a keep-alive. A keep-alive is dangerous to your bandwidth, but it will scale well if it is confined to relatively small peer groups, sent over a propagation pipe, and kept to an interval of a few minutes.
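A group-level keep-alive can be sketched as a simple heartbeat failure detector. This is a hypothetical illustration, not JXTA code: the names KeepAliveMonitor, onHeartbeat, and suspects are made up, the transport (for example, the propagation pipe mentioned above) is left out so the class only models the bookkeeping, and the 2-minute interval, 2 missed beats, and 10-member group are assumed values. It shows the two numbers that matter: roughly how long a silent peer goes unnoticed, and how many heartbeats each member receives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeepAliveDemo {
    public static void main(String[] args) {
        long minute = 60_000L;
        KeepAliveMonitor m = new KeepAliveMonitor(2 * minute, 2); // assumed settings
        m.onHeartbeat("A", 0);
        m.onHeartbeat("B", 0);
        m.onHeartbeat("A", 4 * minute); // A keeps beating; B went silent after t=0
        System.out.println(m.detectionBoundMillis() / minute); // 4
        System.out.println(m.suspects(5 * minute));            // [B]
        System.out.println(KeepAliveMonitor.heartbeatsPerMemberPerMinute(10, 2 * minute)); // 4.5
    }
}

// Hypothetical keep-alive bookkeeping for a small peer group (illustration only).
class KeepAliveMonitor {
    private final long intervalMillis; // how often each member sends a heartbeat
    private final int maxMissed;       // missed beats tolerated before suspecting a peer
    private final Map<String, Long> lastHeard = new HashMap<>();

    KeepAliveMonitor(long intervalMillis, int maxMissed) {
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
    }

    void onHeartbeat(String peerId, long nowMillis) {
        lastHeard.put(peerId, nowMillis);
    }

    // Silence longer than this marks a peer as suspected down.
    long detectionBoundMillis() {
        return intervalMillis * maxMissed;
    }

    // Heartbeats each member receives per minute if every beat reaches every other member.
    static double heartbeatsPerMemberPerMinute(int groupSize, long intervalMillis) {
        return (groupSize - 1) * 60_000.0 / intervalMillis;
    }

    // Peers that have been silent for longer than the detection bound.
    List<String> suspects(long nowMillis) {
        List<String> down = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > detectionBoundMillis()) {
                down.add(e.getKey());
            }
        }
        return down;
    }
}
```

With these assumed settings a pulled plug is noticed in about 4 minutes instead of 15 to 20, at a cost of 4.5 heartbeats per member per minute. Because the traffic grows with group size and shrinks with the interval, keeping the group small and the interval at a few minutes is what keeps this affordable.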
-- DanielBrookshier - 07 Oct 2005