OpenTok is Vonage’s (formerly TokBox’s) PaaS (Platform as a Service) that enables developers to easily build custom video experiences within any mobile, web, or desktop application, on top of a WebRTC stack.

One of the customer projects that I am working on at Igalia requires publishing streams to and subscribing to streams from OpenTok sessions. The main application of this project needs to run on a Linux box, and Vonage already provides a nice OpenTok C++ SDK for Linux. However, the entire application for this customer project is written in Rust, so together with my colleague Philippe Normand we decided to write Rust bindings for the OpenTok C++ SDK.

opentok-rs contains the result of this work. There you can find the FFI bindings, mostly generated with bindgen, and the safe wrapper API.

We recently published a first version on crates.io.

There is not much documentation yet, apart from the rustdoc published here, which is mostly a copy and paste of the C++ documentation. However, there are a few examples that demonstrate how easily and quickly you can write your own custom video experiences.

Basic video chat application

With opentok-rs you can write a very basic video chat application like this one in only a few dozen lines of code.

If you are not familiar with the basic concepts of OpenTok, I recommend reading the official documentation at Vonage’s developer site.

In a nutshell, all OpenTok activity occurs within a session, which is somewhat like a “room” where clients interact with one another in real-time. Each participant in a session can publish streams to the session or subscribe to other participants’ streams.

To connect to an OpenTok session you need its identifier and a token. For testing purposes, you can obtain a session ID and a token from the project page in your Vonage Video API account. However, in a production application, you will need to dynamically obtain the session ID and token from a web service that uses one of the Vonage Video API server SDKs.
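
In the snippets below, the credentials are assumed to live in a small helper struct. This struct is purely illustrative and not part of opentok-rs; here it is filled from environment variables, but you could just as well take the values from command line arguments or a configuration file.

use std::env;

// Illustrative helper, not part of opentok-rs: the session credentials
// used throughout the snippets below.
struct Credentials {
    api_key: String,
    session_id: String,
    token: String,
}

fn credentials_from_env() -> Credentials {
    Credentials {
        api_key: env::var("OPENTOK_API_KEY").expect("OPENTOK_API_KEY is not set"),
        session_id: env::var("OPENTOK_SESSION_ID").expect("OPENTOK_SESSION_ID is not set"),
        token: env::var("OPENTOK_TOKEN").expect("OPENTOK_TOKEN is not set"),
    }
}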

For a basic chat application you need to create a Publisher instance, to publish your video stream, and a Subscriber instance, likely in a different thread, to subscribe to the rest of the streams in the session. Each entity may connect to the session separately.

Publisher

The OpenTok SDK is heavily based on callbacks. Starting with the session, you need to provide a SessionCallbacks instance to the Session constructor. For the sake of simplicity, we only care about the on_connected and on_error callbacks in this case.

You also need to provide the session credentials: the Vonage API key, the session ID, and its token.

let session_callbacks = SessionCallbacks::builder()
    .on_connected(move |session| {
        // At this point, we can start publishing
        session.publish(&*publisher.lock().unwrap())
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
let session = Session::new(
    &credentials.api_key,
    &credentials.session_id,
    session_callbacks,
)?;
session.connect(&credentials.token)?;

The Publisher constructor gets a PublisherCallbacks instance and optionally a VideoCapturer instance. If you do not provide a custom video capturer, the default one capturing audio and video from your local mic and webcam will be used.

let publisher_callbacks = PublisherCallbacks::builder()
    .on_stream_created(move |_, stream| {
        println!("Publishing stream with ID {}", stream.id());
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();
let publisher = Arc::new(Mutex::new(Publisher::new(
    "publisher" /* Publisher name */,
    None, /* Use WebRTC's video capturer */
    publisher_callbacks,
)));

The basic video chat example demonstrates how to add a custom video capturer. In this case, it uses a GStreamer videotestsrc element to produce test video data. You can use whatever mechanism you prefer to produce video, though.
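
As a rough idea of what the GStreamer side of such a capturer can look like, the sketch below pulls raw I420 frames out of a videotestsrc pipeline with an appsink (using the gstreamer and gstreamer-app crates, API names as in recent gstreamer-rs releases). How the frames are then handed over to the opentok-rs video capturer is not shown here; the basic video chat example covers that part.

use gstreamer as gst;
use gstreamer_app as gst_app;
use gst::prelude::*;

// Sketch: produce raw I420 test frames with videotestsrc. The returned
// pipeline must be kept alive for as long as frames are needed.
fn start_test_video_source() -> Result<gst::Pipeline, Box<dyn std::error::Error>> {
    gst::init()?;

    let pipeline = gst::parse_launch(
        "videotestsrc is-live=true ! video/x-raw,format=I420,width=640,height=480,framerate=30/1 ! appsink name=sink",
    )?
    .downcast::<gst::Pipeline>()
    .expect("not a pipeline");

    let appsink = pipeline
        .by_name("sink")
        .expect("appsink not found")
        .downcast::<gst_app::AppSink>()
        .expect("not an appsink");

    appsink.set_callbacks(
        gst_app::AppSinkCallbacks::builder()
            .new_sample(|sink| {
                let sample = sink.pull_sample().map_err(|_| gst::FlowError::Eos)?;
                let buffer = sample.buffer().ok_or(gst::FlowError::Error)?;
                let map = buffer.map_readable().map_err(|_| gst::FlowError::Error)?;
                // One raw I420 frame: this is where the bytes would be handed
                // over to the OpenTok video capturer.
                let _i420_frame: &[u8] = map.as_slice();
                Ok(gst::FlowSuccess::Ok)
            })
            .build(),
    );

    pipeline.set_state(gst::State::Playing)?;
    Ok(pipeline)
}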

Subscriber

The subscriber part is somewhat similar. It needs to connect to the session, providing the credentials and the session callbacks. In this case, the callback that we care about the most is the on_stream_received callback. Within this callback, you can set the stream on your Subscriber instance and instruct the session to use it.

let session_callbacks = SessionCallbacks::builder()
    .on_stream_received(move |session, stream| {
        if subscriber.set_stream(stream).is_ok() {
            if let Err(e) = session.subscribe(&subscriber) {
                eprintln!("Could not subscribe to session {:?}", e);
            }
        }
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();

The Subscriber gets the video frames through repeated calls to the on_render_frame callback.

let subscriber_callbacks = SubscriberCallbacks::builder()
    .on_render_frame(move |_, frame| {
        let width = frame.get_width().unwrap() as u32;
        let height = frame.get_height().unwrap() as u32;

        let get_plane_size = |format, width: u32, height: u32| match format {
            FramePlane::Y => width * height,
            FramePlane::U | FramePlane::V => {
                let pw = (width + 1) >> 1;
                let ph = (height + 1) >> 1;
                pw * ph
            }
            _ => unimplemented!(),
        };

        let offset = [
            0,
            get_plane_size(FramePlane::Y, width, height) as usize,
            get_plane_size(FramePlane::Y, width, height) as usize
                + get_plane_size(FramePlane::U, width, height) as usize,
        ];

        let stride = [
            frame.get_plane_stride(FramePlane::Y).unwrap(),
            frame.get_plane_stride(FramePlane::U).unwrap(),
            frame.get_plane_stride(FramePlane::V).unwrap(),
        ];
        renderer_
            .lock()
            .unwrap()
            .as_ref()
            .unwrap()
            .push_video_buffer(
                frame.get_buffer().unwrap(),
                frame.get_format().unwrap(),
                width,
                height,
                &offset,
                &stride,
            );
    })
    .on_error(|_, error, _| {
        eprintln!("on_error {:?}", error);
    })
    .build();

The snippet above uses a video renderer based on the GStreamer autovideosink element. But just like with the custom video capturer, you can use whatever you like to render your video frames.
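
Tying the subscriber side together then looks roughly like this. The Subscriber constructor signature is an assumption on my side (check the rustdoc for the exact API), and note that the subscriber has to exist before the session callbacks above are built, since their on_stream_received closure captures it.

// Assumed constructor: the subscriber is created from the callbacks above
// and captured by the session callbacks' on_stream_received closure.
let subscriber = Subscriber::new(subscriber_callbacks);

let session = Session::new(
    &credentials.api_key,
    &credentials.session_id,
    session_callbacks,
)?;
session.connect(&credentials.token)?;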

Audio

The OpenTok SDK handles audio and video in different ways. While video streams are independently tied to each publisher and each subscriber in a session, audio is tied to a global audio device that is shared by all publishers and subscribers.

This design imposes two hard limitations:

  • There is no way to obtain an independent audio stream for each participant. OpenTok provides a single audio stream which is a mix of every participant’s audio, so there is no way to do things like speech-to-text, moderation or any other kind of per-participant audio processing, unless you create a somewhat complex workaround where you run each audio subscriber in its own dedicated process (see the sketch after this list).

  • It is not possible to run two instances of the OpenTok SDK in the same process. A second instance of the OpenTok SDK overwrites the audio callbacks set from the previous instance.
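
The per-process workaround mentioned above could be sketched like this, spawning one child process per subscribed audio stream. The helper binary name and its flags are made up for illustration; each child would run its own OpenTok instance and subscribe to a single stream.

use std::process::{Child, Command};

// Hypothetical workaround sketch: one dedicated process per audio
// subscriber, so each one gets its own global OpenTok audio device.
// The "audio-subscriber-helper" binary and its flags are made up.
fn spawn_audio_subscriber(
    api_key: &str,
    session_id: &str,
    token: &str,
    stream_id: &str,
) -> std::io::Result<Child> {
    Command::new("audio-subscriber-helper")
        .args(["--api-key", api_key])
        .args(["--session-id", session_id])
        .args(["--token", token])
        .args(["--stream-id", stream_id])
        .spawn()
}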

Vonage claimed to be working on improving this design.

There is more

Everything in opentok-rs is meant to run in client applications, but as mentioned before, Vonage also provides server-side OpenTok SDKs.

opentok-server-rs wraps a minimal subset of the OpenTok REST API. It lets developers securely create sessions and generate tokens for their OpenTok applications.

I started it only to be able to write automated tests for opentok-rs, so its functionality is limited and will hopefully be extended soon.

Acknowledgements