The lightweight soft-rejoin path (jSess.Rejoin) skips Jicofo focus
allocation and joins the MUC with a bare presence. After the client
leaves and Jicofo idle-terminates the now-empty conference
(session-terminate <expired/>), the room/focus is torn down. A bare
rejoin presence is then rejected by Prosody with
<presence type='error'><not-allowed/>, and the library's JoinMUC matches
a stale status-110 left in its stanza buffer and falsely reports success.
The engine then waits forever for a session-initiate that never arrives
while actually being outside the room, so the client can never reconnect.
Re-establish the session from scratch via j.JoinMUC instead, which runs
dial -> focus allocation -> MUC join in the correct order (focus first,
so Jicofo recreates the room), exactly like the initial Connect, but
WITHOUT blocking on session-initiate. The fresh session-initiate is
awaited separately via WaitJingleReinitiate once a peer rejoins, so the
non-blocking reconnect contract is preserved.
Verified on a live deployment: two consecutive reconnect cycles now
complete (bridge open sctp -> reconnected -> session opened) where the
old path hung after "waiting for session-initiate".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
j.Join() blocks on WaitJingle until Jicofo sends session-initiate,
which only happens when a second participant joins the room. If no one
joins before the transport-level connect timeout (30-60s) expires, the
server crashes.
Switch to j.JoinMUC(), which returns immediately after joining the MUC.
A background goroutine (waitForJingle) waits for session-initiate and
then opens the bridge and negotiates the PeerConnection. This way Connect
succeeds as soon as the XMPP connection is established and the MUC is
joined, regardless of whether another participant is present.
When the JVB bridge disconnects but the session is not yet closed
(e.g. during a pending reconnect), rtcpKeepalive spins indefinitely,
logging "rtcp keepalive write: io: read/write on closed pipe" every
5 seconds. The process appears alive but is functionally dead:
because it never exits, systemd Restart=always never triggers and the
instance becomes permanently wedged.
Add an error counter that triggers requestReconnect after 3
consecutive WriteRTCP failures, allowing the supervisor to
tear down and re-establish the bridge connection. Reset the
counter on any successful write.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>