The lightweight soft-rejoin path (jSess.Rejoin) skips Jicofo focus
allocation and joins the MUC with a bare presence. After the client
leaves and Jicofo idle-terminates the now-empty conference
(session-terminate <expired/>), the room/focus is torn down. A bare
rejoin presence is then rejected by Prosody with
<presence type='error'><not-allowed/>, and the library's JoinMUC matches
a stale status-110 left in its stanza buffer and falsely reports success.
The engine then waits forever for a session-initiate that never arrives
while actually being outside the room, so the client can never reconnect.
Re-establish the session from scratch via j.JoinMUC instead, which runs
dial -> focus allocation -> MUC join in the correct order (focus first,
so Jicofo recreates the room), exactly like the initial Connect, but
WITHOUT blocking on session-initiate. The fresh session-initiate is
awaited separately via WaitJingleReinitiate once a peer rejoins, so the
non-blocking reconnect contract is preserved.
Verified on a live deployment: two consecutive reconnect cycles now
complete (bridge open sctp -> reconnected -> session opened) where the
old path hung after "waiting for session-initiate".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
j.Join() blocks on WaitJingle until Jicofo sends session-initiate,
which only happens when a second participant joins the room. Because
the transport-level connect timeout (30-60s) can expire first, the
server crashes if no one joins in time.
Switch to j.JoinMUC() which returns immediately after joining the MUC.
A background goroutine (waitForJingle) waits for session-initiate and
then opens the bridge + negotiates the PeerConnection. This way Connect
succeeds as soon as the XMPP connection is established, regardless of
whether another participant is present.
When the JVB bridge disconnects but the session is not yet closed
(e.g. during a pending reconnect), rtcpKeepalive spins indefinitely,
logging "rtcp keepalive write: io: read/write on closed pipe" every
5 seconds. The process appears alive but is functionally dead: it
never exits, so systemd Restart=always never triggers and the instance
becomes permanently wedged.
Add an error counter that triggers requestReconnect after 3
consecutive WriteRTCP failures, allowing the supervisor to
tear down and re-establish the bridge connection. Reset the
counter on any successful write.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the client latched onto the first epoch it saw, which could
belong to another client in the room. Now it accepts frames from all
epochs until onData delivers the first KCP message (handshake welcome),
then locks to that epoch and ignores all others.
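
A minimal sketch of the locking rule, assuming epochs are uint32 values
and using a hypothetical epochGate type (neither is taken from the
actual code). Frames are accepted from any epoch until Lock is called,
which here represents onData delivering the first KCP message.

```go
package main

import "fmt"

// epochGate is a hypothetical filter; the uint32 epoch type is an
// assumption for illustration.
type epochGate struct {
	locked bool
	epoch  uint32
}

// Accept reports whether a frame from the given epoch should be
// processed: everything before the lock, only the locked epoch after.
func (g *epochGate) Accept(epoch uint32) bool {
	return !g.locked || g.epoch == epoch
}

// Lock pins the gate to the epoch that produced the first KCP message.
// Later calls are ignored so the lock cannot be stolen.
func (g *epochGate) Lock(epoch uint32) {
	if !g.locked {
		g.locked = true
		g.epoch = epoch
	}
}

func main() {
	var g epochGate
	fmt.Println(g.Accept(7), g.Accept(9)) // before the handshake
	g.Lock(9)                             // first KCP message came from epoch 9
	fmt.Println(g.Accept(7), g.Accept(9)) // after the lock
}
```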
Fixes #67
In single-peer mode (client), frames from unknown epochs are now
silently dropped instead of triggering a reconnect loop. This
prevents the client from mistaking another client's VP8 track
for a server restart.
Part of #67
Implement PeerTransport interface (SendTo/SupportsPeerRouting) so the
server can route KCP traffic to individual peers by their epoch.
When OnPeerData is set (server mode), each remote epoch gets its own
KCP runtime instead of triggering a reconnect loop.
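
The per-epoch routing can be sketched as a map from epoch to runtime.
The router and runtime types, the uint32 epoch and the inbox slice are
hypothetical placeholders; the real KCP runtime carries far more state.

```go
package main

import (
	"fmt"
	"sync"
)

// runtime is a hypothetical placeholder for a per-peer KCP runtime.
type runtime struct {
	epoch uint32
	inbox [][]byte
}

// router keeps one runtime per remote epoch (server mode).
type router struct {
	mu    sync.Mutex
	peers map[uint32]*runtime
}

func newRouter() *router {
	return &router{peers: make(map[uint32]*runtime)}
}

// deliver hands a frame to the runtime for its epoch, creating the
// runtime on first contact instead of treating the epoch as a restart.
func (r *router) deliver(epoch uint32, frame []byte) *runtime {
	r.mu.Lock()
	defer r.mu.Unlock()
	rt, ok := r.peers[epoch]
	if !ok {
		rt = &runtime{epoch: epoch}
		r.peers[epoch] = rt
	}
	rt.inbox = append(rt.inbox, frame)
	return rt
}

func main() {
	r := newRouter()
	r.deliver(1, []byte("a"))
	r.deliver(2, []byte("b"))
	r.deliver(1, []byte("c"))
	fmt.Println("peers:", len(r.peers), "frames for epoch 1:", len(r.peers[1].inbox))
}
```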
Also add DNS retry in protect.NewHTTPClient to handle transient
resolver failures.
Fixes #67