Synchronous Ethernet Switch – IEEE 1588 PTP Protocol
The Stagebox system is optimized for synchronous Ethernet using the IEEE 1588 Precision Time Protocol (PTP), which distributes a common clock to all nodes on the network. The system also supports the Internet Group Management Protocol (IGMP). This gives the system the ability to multicast video streams, that is, to send a single video stream to multiple destinations.
This is particularly useful in the distribution of confidence video to multiple cameras. It is also useful for distributing video for live broadcast, editing, and recording simultaneously from the same multicast video stream.
PTP is also used for AVB synchronous low-latency audio and video streaming services. It provides a distributed clock across the entire AVB domain with an accuracy of about 1 µs.
Ideally, Stagebox endpoints should be connected on a common synchronous Ethernet network utilizing synchronous Ethernet switches. In many cases this is not possible. If cameras require synchronization and genlock over long distances via telecom circuits, a synchronous Ethernet connection is typically not available. In this case, synchronous Ethernet switches can be used with an external GPS timing reference. Each group of cameras can operate on an island utilizing synchronous Ethernet switches, and the cameras in the different locations remain synchronous with each other via the common reference provided by the GPS precision timing information.
Switches with layer 2 IEEE 1588 version 2 support are recommended, as shown in figure 10. This maintains the integrity of camera sync over Ethernet networks. Ethernet propagation times always vary during transmission, and IEEE 1588 is a comprehensive solution for precisely synchronizing the timing of an Ethernet network.
Figure 10, ARG Stagebox Switch with layer 2 IEEE 1588 version 2
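The way IEEE 1588 compensates for changing propagation times can be illustrated with the standard delay request-response exchange. The following is a minimal sketch, assuming a symmetric network path; the function name and the sample timestamps are hypothetical and are not taken from the Stagebox implementation.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay.

    t1: master sends Sync           (read from the master clock)
    t2: slave receives Sync         (read from the slave clock)
    t3: slave sends Delay_Req       (read from the slave clock)
    t4: master receives Delay_Req   (read from the master clock)

    All four values use the same unit (for example nanoseconds).
    Assumes the path delay is identical in both directions.
    """
    master_to_slave = t2 - t1   # equals delay + offset
    slave_to_master = t4 - t3   # equals delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Hypothetical exchange: slave clock runs 500 ns ahead, path delay is 100 ns.
offset, delay = ptp_offset_and_delay(t1=1000, t2=1600, t3=2000, t4=1600)
print(offset, delay)
```

The slave subtracts the estimated offset from its local clock. Any asymmetry between the two directions appears directly as an offset error, which is one reason PTP-aware switches are preferred.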
In a Stagebox-enabled production environment, high-quality 100 Mbit/sec AVCI-100 program content is made instantaneously available across a local or wide area network. AVCI-100 is compatible with most nonlinear editing systems on the market, which minimizes or eliminates ingest times during live productions. It also integrates production seamlessly into the post-production workflow, in a format that allows direct editing or storage without the quality loss caused by transcoding. Add-on software utilities are available for synchronous multichannel live ingest.
The Stagebox utilizes “I-frame” or intra-frame only compression, which drastically limits system latency, emulating a lower bit rate version of JPEG 2000. A 4:2:2 color space is utilized with 10-bit sampling. The nominal bit rate is 100 Mbit/sec.
The Stagebox supports SD SDI and HD SDI and performs automatic format recognition on the input. It extracts the video, all 16 embedded digital audio channels and the embedded timecode from the incoming HD SDI signal. The video is then encoded with AVCI-100 and transmitted onto the IP network. AVCI-100 was chosen after discussions with program makers because it achieves better quality for a given bit rate than alternatives such as JPEG 2000 and it imports easily into existing editing systems. AVCI-100 is a subset of the H.264 video coding standard and is fully compliant with the standard. It uses 10-bit, 4:2:2, intra-frame only compression. This means that each frame is coded independently of all other frames, which makes the video very easy to edit. It is also very fast to encode and decode, which matters because minimizing latency is very important in multi-camera setups or when using bi-directional links, such as during a two-way interview. Although the network link latency will dominate, it remains preferable to keep the encode/decode latency small.
In AVCI-100, each frame is specified to be a fixed size which corresponds to a bit rate of about 100 Mbit/sec to preserve maximum video quality. If the coded length of the frame is less than the required length of the AVCI-100 intra frame, padding is added to the frame to bring it up to the required length. The Stagebox does not transmit the padding bytes over the network, but these are reinserted at the receiving end before the frames are ingested into an editing program such as Avid Media Composer or Final Cut Pro.
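The padding behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the padding consists of trailing zero bytes, and the function names and the tiny frame size used in the example are hypothetical. The actual padding format and frame sizes used by Stagebox are not given here.

```python
def strip_padding(frame: bytes, pad_byte: int = 0x00) -> bytes:
    """Sender side: drop trailing padding bytes before transmission.

    Assumption: padding is a run of identical bytes at the end of the frame.
    """
    return frame.rstrip(bytes([pad_byte]))


def restore_padding(payload: bytes, frame_size: int, pad_byte: int = 0x00) -> bytes:
    """Receiver side: pad the payload back up to the fixed frame size."""
    if len(payload) > frame_size:
        raise ValueError("payload is longer than the fixed frame size")
    return payload.ljust(frame_size, bytes([pad_byte]))


# Hypothetical 8-byte "frame" carrying 3 bytes of coded data.
original = b"\x01\x02\x03" + b"\x00" * 5
sent = strip_padding(original)
assert restore_padding(sent, frame_size=len(original)) == original
```

Because the receiver pads with the same byte value that the sender stripped, the round trip reproduces the fixed-size frame exactly, even if the coded data itself happened to end in that byte value.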
AVCI-100 specifies that the H.264 coding must be the High 4:2:2 Intra Profile, Level 4.1 and that it must use CAVLC entropy coding. (The other form of entropy coding permitted by the H.264 standard is CABAC, which will produce better results but at the expense of a significant increase in hardware complexity.) Although the default bit rate for the video encoding is 100 Mbit/sec, you can set an arbitrary bit rate for the video (with a resolution of 1 bit/sec). In difficult network conditions, a small reduction in video bit rate can make a big difference to the performance of the network. At a receiving Stagebox, the video will be decoded and output on the SDI/HD SDI output. The output will be fully compliant with the relevant SMPTE standard, for example SMPTE 274 when using the 1920×1080 HD mode. Up to 16 channels of audio can be embedded into the HD SDI output according to SMPTE 299, together with the timecode (SMPTE 12-2) for the video frame and a SMPTE 352 VPID packet which gives information about the video format, such as the frame rate. This ensures seamless end-to-end transmission of the video, audio and timecode.
Stagebox can also superimpose text and a Stagebox logo onto the output video. Because the text is arbitrary, it can be very useful for identifying a video source. The source name can be programmed into a remote Stagebox. The local Stagebox can then interrogate the remote Stagebox to find out the name of the source and display that name on the video output. If you have multiple video sources, this greatly simplifies identifying each one. The Stagebox can also superimpose the timecode on each output frame, which is useful for editing or creating a first draft of an edit decision list.
Stagebox will carry the 16 embedded audio channels from the HD SDI input together with two additional analog channels. The two analog channels can carry any standard balanced line level audio. They can be used for carrying production talk back. The audio is sampled at 48 kHz with 24-bit resolution and is all transmitted uncompressed over the network. If all 18 audio channels were active, then the bandwidth to carry the audio would be about 21 Mbit/sec (on top of the 100Mbit/sec to carry the video). The number of active audio channels carried by default is therefore limited to 3 stereo pairs.
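The audio bandwidth figure above follows directly from the channel count, sample rate and sample size. The short calculation below reproduces it; it covers the raw audio payload only and ignores packet header overhead, and the function name is illustrative.

```python
SAMPLE_RATE_HZ = 48_000   # from the text: 48 kHz sampling
BITS_PER_SAMPLE = 24      # from the text: 24-bit resolution


def audio_bitrate_bps(channels: int) -> int:
    """Raw uncompressed audio bit rate in bits per second (payload only)."""
    return channels * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


print(audio_bitrate_bps(18) / 1e6)  # all 18 channels: 20.736 Mbit/sec
print(audio_bitrate_bps(6) / 1e6)   # default 3 stereo pairs: 6.912 Mbit/sec
```

With the default of 3 stereo pairs, the audio payload is roughly one third of the all-channels figure.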
The Stagebox uses open Internet standards for the transport of video, audio, data, timecode, genlock and camera control signals. The network connection uses an industry-standard SFP interface. SFPs can be used with a copper or fiber optic 1 Gbit/sec Ethernet network connection.
Stagebox utilizes the IEEE 802.3 specification for wired Ethernet networks. To maximize compatibility with existing equipment, the system does not use jumbo frames. The MAC addresses needed to create an Ethernet frame are acquired using the Address Resolution Protocol (ARP) or, in the case of multicast addresses, by direct calculation.
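The direct calculation for multicast addresses can be sketched as below. It uses the standard IPv4 mapping, in which the low 23 bits of the group address are placed after the fixed prefix 01:00:5e. The function name and the example group addresses are hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF           # keep the low 23 bits of the IP
    mac = (0x01005E << 24) | low23         # fixed prefix 01:00:5e
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("239.1.2.3"))     # 01:00:5e:01:02:03
print(multicast_mac("224.129.2.3"))   # 01:00:5e:01:02:03 (same MAC)
```

Because only 23 of the 28 variable address bits are carried over, 32 different group addresses share each MAC address, as the two example groups show.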
The second standard that Stagebox utilizes is the Internet Protocol (IP), specified in RFC 791. Stagebox currently uses IPv4, with future provision for IPv6. In the IPv4 layer, a source and a destination IP address are specified. The destination can be either a unicast IP address, establishing a point-to-point connection, or a multicast IP address, which allows multiple PCs or other Stageboxes to receive the signal transmitted from a single Stagebox. The standard IPv4 multicast address range of 224.0.0.0 to 239.255.255.255 is utilized, although some addresses are reserved for other network functions. In Stagebox, source-specific multicast (SSM) is used. This means that a receiving device must, in addition to subscribing to the multicast group, check the source address of the data it is receiving. If you intend to use multicast, it is important that the network is properly configured. Some switches simply send multicast traffic as broadcast traffic, which is likely to have an adverse effect on the performance of the network and any connected devices. Stagebox uses IGMPv3 (Internet Group Management Protocol version 3), specified in RFC 3376, to join and leave the desired multicast groups.
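The receiver-side check that source-specific multicast implies can be sketched as a simple filter. This is an illustrative sketch only: the function name is hypothetical and the addresses are example values, not Stagebox defaults.

```python
import ipaddress


def accept_packet(src_ip: str, dst_ip: str,
                  subscribed_group: str, expected_source: str) -> bool:
    """Accept a packet only if both the group and the source match."""
    dst = ipaddress.IPv4Address(dst_ip)
    return (
        dst.is_multicast
        and dst == ipaddress.IPv4Address(subscribed_group)
        and ipaddress.IPv4Address(src_ip) == ipaddress.IPv4Address(expected_source)
    )


# Hypothetical subscription: group 232.1.1.1 from sender 192.0.2.10.
print(accept_packet("192.0.2.10", "232.1.1.1", "232.1.1.1", "192.0.2.10"))  # True
print(accept_packet("192.0.2.99", "232.1.1.1", "232.1.1.1", "192.0.2.10"))  # False
```

The second call is rejected even though the group matches, because the traffic comes from a different sender.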
In the layer above this, the audio and video data is sent using UDP and the control data is sent using TCP. TCP, which is specified in RFC 793, provides a robust transmission environment and is used to carry the important control data to and from a Stagebox. UDP, which is specified in RFC 768, is often referred to as “fire-and-forget”, as any packets that are lost during transmission are lost forever. A receiving Stagebox will do its best to conceal the effects of any lost packets.
In the layer above this, the video and audio data is carried using the Real-time Transport Protocol (RTP). The main RFC for RTP is RFC 3550. In addition, RFC 6184 specifically covers carrying H.264 video over RTP, and RFC 3190 specifically covers carrying 24-bit linear sampled audio. It is essential that these RFCs are followed precisely, as receiving equipment will have unexpected problems decoding the stream if there are any errors. The RTP header includes a timestamp which indicates the time at which the first sample of the video frame should be displayed and the first sample of the audio should be played. These timestamps use different (but locked) clocks: the video timestamps are locked to a 90 kHz clock and the audio timestamps to a 48 kHz clock. Information about how to relate these two timestamps is carried in RTCP (RTP Control Protocol) packets, which are also specified in RFC 3550. On the first RTP packet of each frame, Stagebox also includes some optional RTP headers to assist in receiver synchronization. These include the SMPTE timecode value of the current frame and the PTP timestamp (Precision Time Protocol, described later) of the current frame. RTP packets are carried on even-numbered UDP ports and the associated RTCP packets are carried on the next higher odd port. For example, if the video RTP packets are carried on port 5004, then the associated RTCP packets will be carried on port 5005.
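The RTP details above can be made concrete with a small sketch that parses the 12-byte fixed RTP header defined in RFC 3550, derives the RTCP port from the RTP port, and converts a frame period into timestamp ticks. The function names, the payload type and the 25 frames/sec rate are illustrative assumptions, not Stagebox settings.

```python
import struct

VIDEO_CLOCK_HZ = 90_000   # video RTP timestamp clock, from the text
AUDIO_CLOCK_HZ = 48_000   # audio RTP timestamp clock, from the text


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),   # set when a header extension follows
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def rtcp_port(rtp_port: int) -> int:
    """RTCP uses the next higher odd port above an even RTP port."""
    if rtp_port % 2:
        raise ValueError("RTP port must be even")
    return rtp_port + 1


def ticks_per_frame(clock_hz: int, frames_per_second: float) -> float:
    """Timestamp increment between consecutive video frames."""
    return clock_hz / frames_per_second


print(rtcp_port(5004))                      # 5005
print(ticks_per_frame(VIDEO_CLOCK_HZ, 25))  # 3600.0 video ticks per frame
print(ticks_per_frame(AUDIO_CLOCK_HZ, 25))  # 1920.0 audio samples per frame
```

At the assumed 25 frames/sec, one frame period therefore corresponds to 3600 ticks of the video clock and 1920 ticks of the audio clock.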
Finally, at the end of each video frame, a checksum packet is sent which includes a checksum for all the RTP video and audio data that was sent during the previous frame. This enables the receiver to determine whether all the information it received for the previous frame was complete and correct. It is expected that a future version of Stagebox will carry FEC data to allow missing packets to be recovered.
The camera control packets, which carry RS232/RS422/RS485 or LANC data, are sent in separate UDP packets that include a sequence number so that the receiver can tell if any have gone missing. It is possible to carry RS232 and LANC data simultaneously.
Administrative and configuration access to the Stagebox requires port 8080 to be open on any firewall in the network path. If port 8080 is not open, some command-and-control functions of the Stagebox will not pass through the firewall.
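Detecting missing packets from a sequence number can be sketched as follows. The width of the Stagebox sequence number is not stated here, so this sketch assumes a 16-bit counter that wraps around; the function name is hypothetical, and reordered or duplicated packets are not handled.

```python
def missing_count(prev_seq: int, seq: int, modulo: int = 1 << 16) -> int:
    """Number of packets lost between two consecutively received packets.

    Assumption: the sequence number is a counter that wraps at `modulo`.
    """
    return (seq - prev_seq - 1) % modulo


print(missing_count(10, 11))      # 0: consecutive packets, nothing lost
print(missing_count(10, 13))      # 2: packets 11 and 12 are missing
print(missing_count(65535, 0))    # 0: normal wraparound
```

The modulo arithmetic keeps the count correct across the wraparound from the highest sequence number back to zero.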
Genlock and Precision Timing Protocol
The Stagebox can derive a PTP timing reference from one of its HD SDI inputs. The Stagebox system then distributes this PTP reference, taken from that master HD SDI input, across the network.
In a typical configuration, two Stagebox systems are interconnected. Stagebox 1 generates a PTP sync clock referenced from its SDI 1 input. Stagebox 2 is designated as a slave to Stagebox 1 and uses PTP to synchronize its SDI 1 video output.
A potential issue with this configuration is that the SDI 2 input on Stagebox 2 is asynchronous. Stagebox 1 cannot sync the SDI 2 output. Please see figure 11 below.
Figure 11, PTP Sync with two Stageboxes End-to-End
The Solution using PTP
The solution is to use the genlock output of Stagebox 2 to sync the SDI 2 input. This assumes that your video source has a genlock input such as your typical broadcast camera. Video sources that do not have a genlock input will require an external frame synchronizer as the Stagebox does not include a frame sync.
In this scenario the SDI 1 output can be looped into the SDI 2 input on the Stagebox 2 side since they are both in sync. Looping the signal through a switch or other equipment will introduce delay which will cause the synchronization to be lost. Please see figure 12 below.
Figure 12, Genlock using PTP Sync with two Stageboxes End-to-End
The Solution Using Source Synchronous Mode
Another option is to disable the PTP mode. In this scenario multicast is not supported, so a point-to-point topology is required. This is called “source synchronous mode” to avoid confusion between PTP and P2P. In this configuration, the transmission in each direction is synchronized using the source SDI 1 input, as shown in figure 13 below. Note that source synchronous mode does not generate a phase-accurate genlock: it is frequency locked, but the timing will require alignment.
Figure 13, Solution using “Source Synchronous” Mode