Documentation
Complete reference guide for Virtual Video Director — from initial setup through advanced broadcast integration. VVD automates multi-camera switching by detecting who’s speaking and triggering your production infrastructure in real time.
Getting Started
VVD operates on a simple principle: one mic = one camera. Each VVD Channel represents one speaker or audio source. When VVD detects that a speaker’s microphone is active, it fires the triggers assigned to that channel — switching cameras, recalling PTZ presets, or sending commands to your production infrastructure.
Core Concepts
- Channels: Each channel corresponds to one audio source (microphone). You can have 4 to 128 channels depending on your licence tier.
- Trigger Slots: Each channel has up to 8 trigger slots (1, 4, or 8 depending on your licence tier) that fire together when that channel’s speaker is detected. This allows you to switch a video source, recall a PTZ preset, and send a tally command all at once.
- Special Channels:
  - Channel 129 (Overview): Fires when multiple speakers are talking at the same time. Typically assigned to a wide shot.
  - Channel 130 (Silence): Fires when no one is speaking. Typically assigned to a fallback or beauty shot.
  - Channel 131 (Fixed Timer): Fires at a set interval regardless of audio activity. Typically assigned to periodic establishing shots.
Basic Workflow
- Configure audio interfaces — Add one or more audio sources (WASAPI, ASIO, NDI, etc.) in the Audio Setup panel.
- Map channels — Assign physical audio inputs to VVD channels so each microphone feeds its own channel.
- Set up triggers — For each channel, configure what should happen when that speaker is detected (switch a camera input, recall a PTZ preset, send an HTTP command, etc.).
- Adjust detection settings — Set attack, decay, gain, and gate for each channel to match your environment.
- Test — Use the built-in meters and trigger indicators to verify everything is working correctly.
- Go live — Lock the control panel (padlock icon) to prevent accidental changes, then let VVD handle the switching.
Audio Setup
VVD supports 8 audio interface types, and you can mix several types in a single project.
Interface Types
- WASAPI — Standard Windows audio devices. No extra drivers required. Best for simple setups and getting started quickly.
- ASIO — Low-latency professional audio drivers. Recommended for studio and broadcast environments where minimal delay matters.
- NDI — NewTek/Vizrt network audio over IP. Receive audio from any NDI source on the network. Available from Standard tier.
- OMT (Open Media Transport) — Open-source NDI alternative by vMix for network audio. Available from Standard tier.
- Wheatstone Blade — Direct integration with Wheatstone broadcast consoles via ACI protocol. Professional tier and above.
- ClearOne Converge Pro 2 — Conference room DSP integration for meeting and corporate AV environments. Professional tier and above.
- vMix Direct Connect — Audio levels received directly via the vMix TCP API without needing a separate audio feed. Professional tier and above.
- Televic Conference — Conference microphone on/off status integration for parliamentary and council setups. Professional tier and above.
Channel Mapping
After adding audio interfaces, assign physical inputs to VVD channels. Each channel number corresponds to a speaker position. Channel mapping is done in the Audio Setup panel, where you can drag or assign inputs to channels.
Per-Channel Audio Controls
- Gain: Adjustable from 0% to 400%. Use this to balance input levels between channels without touching the physical mixer.
- Gate Threshold: Sets the minimum level below which audio is ignored. Helps suppress background noise, air conditioning hum, and other low-level interference.
- 120 Hz High-Pass Filter: Removes low-frequency rumble, handling noise, and HVAC interference. Enable per channel as needed.
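The three controls above can be pictured as a small per-channel processing chain. The following is a minimal sketch only: it assumes a first-order high-pass filter, an RMS-based gate, and that 100% means unity gain. None of these details are confirmed for VVD, and all function names are illustrative.

```python
import math

def high_pass(samples, sample_rate=16000, cutoff_hz=120.0):
    """First-order RC high-pass filter (an assumed design, not VVD's)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def apply_gain(samples, gain_percent):
    """Scale samples; 100 is assumed to be unity gain, range 0 to 400."""
    if not 0 <= gain_percent <= 400:
        raise ValueError("gain must be between 0 and 400 percent")
    factor = gain_percent / 100.0
    return [s * factor for s in samples]

def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def passes_gate(samples, gate_threshold):
    """True when the block's RMS level reaches the gate threshold."""
    return rms(samples) >= gate_threshold
```

In this sketch a constant (DC) input decays to zero after filtering, which is the behaviour you would expect from a filter meant to remove rumble and hum.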
Speech Detection
VVD uses two proprietary detection engines. You can select the mode per channel by right-clicking the channel’s meter display.
Mills Smart Sensor (Recommended)
An AI-powered neural network voice activity detector that runs entirely on-device. It uses the Silero VAD v5 model in ONNX format to distinguish human speech from background noise, music, and other non-speech audio. No cloud connection required — all processing happens locally.
- Processes audio at 16 kHz in 32 ms chunks
- Maintains per-channel state tracking for consistent detection
- Supports DirectML GPU acceleration (NVIDIA, AMD, Intel) with automatic CPU fallback
- Scales to 40+ channels via a multi-threaded worker architecture (6 channels per ONNX session)
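The figures above imply some simple arithmetic. As a rough sketch, assuming sessions are filled with 6 channels each and the last one may be partly filled (the actual allocation policy is not documented here):

```python
import math

SAMPLE_RATE_HZ = 16_000      # from the list above
CHUNK_MS = 32                # from the list above
CHANNELS_PER_SESSION = 6     # from the list above

def samples_per_chunk(sample_rate_hz=SAMPLE_RATE_HZ, chunk_ms=CHUNK_MS):
    """Number of audio samples in one processing chunk."""
    return sample_rate_hz * chunk_ms // 1000

def sessions_needed(channel_count, per_session=CHANNELS_PER_SESSION):
    """ONNX sessions required if each one serves up to per_session channels."""
    return math.ceil(channel_count / per_session)

print(samples_per_chunk())   # 512 samples per 32 ms chunk
print(sessions_needed(40))   # 7 sessions for 40 channels
```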
Mills Level Sensor (Fallback)
A volume-based detection algorithm that analyses relative differences between all audio signals to determine who is speaking. No calibration required — it adapts automatically to any voice, microphone, and room.
The Mills Level Sensor is required for meter-based audio sources (Wheatstone Blade, ClearOne, vMix Direct) where raw audio samples are not available.
Detection Parameters
- Attack (0–1, default 0.5): Controls how quickly detection responds when speech begins. Lower values make detection more responsive; higher values require more sustained speech before triggering.
- Decay (0–1, default 0.3): Controls how long detection holds after speech stops. Lower values release quickly (snappy cuts); higher values hold longer (smoother, more natural switching).
Right-click a channel’s meter to switch between Mills Smart Sensor and Mills Level Sensor for that channel.
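To build intuition for Attack and Decay, the sketch below treats them as frame counts: Attack sets how many consecutive speech frames are required before a channel becomes active, and Decay sets how many silent frames are tolerated before it releases. This mapping, the class name, and the `max_attack_frames` and `max_hold_frames` scaling values are all assumptions for illustration; VVD's internal algorithm is not described in this guide.

```python
class DetectionSmoother:
    """Illustrative attack/decay smoothing over per-frame speech decisions."""

    def __init__(self, attack=0.5, decay=0.3,
                 max_attack_frames=10, max_hold_frames=50):
        if not (0.0 <= attack <= 1.0 and 0.0 <= decay <= 1.0):
            raise ValueError("attack and decay must be in the range 0 to 1")
        # Hypothetical mapping from the 0-1 settings to frame counts.
        self.attack_frames = 1 + round(attack * max_attack_frames)
        self.hold_frames = round(decay * max_hold_frames)
        self.speech_run = 0
        self.silence_run = 0
        self.active = False

    def update(self, frame_has_speech):
        """Feed one frame decision; returns the smoothed active state."""
        if frame_has_speech:
            self.speech_run += 1
            self.silence_run = 0
            if self.speech_run >= self.attack_frames:
                self.active = True
        else:
            self.silence_run += 1
            self.speech_run = 0
            if self.active and self.silence_run > self.hold_frames:
                self.active = False
        return self.active
```

With the defaults, this sketch needs 6 consecutive speech frames to activate and holds through 15 silent frames before releasing. Lower Attack shortens the first number; lower Decay shortens the second.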
Triggers
Each channel has up to 8 trigger slots, depending on licence tier. All slots fire when the channel’s speaker is detected, each after its own optional delay of 0 to 10,000 ms. This allows you to sequence actions: for example, recall a PTZ preset 500 ms before switching the video source to give the camera time to move.
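The PTZ example can be modelled as a list of slots sorted by delay. This is a minimal sketch; the `TriggerSlot` class and the action labels are hypothetical and do not reflect VVD's configuration format.

```python
from dataclasses import dataclass

MAX_DELAY_MS = 10_000  # upper bound stated in this guide

@dataclass(frozen=True)
class TriggerSlot:
    action: str        # free-text label, for illustration only
    delay_ms: int = 0  # delay after speaker detection

def firing_schedule(slots):
    """Return (delay_ms, action) pairs in the order they would fire."""
    for slot in slots:
        if not 0 <= slot.delay_ms <= MAX_DELAY_MS:
            raise ValueError(f"delay out of range: {slot.delay_ms} ms")
    ordered = sorted(slots, key=lambda s: s.delay_ms)  # stable sort
    return [(s.delay_ms, s.action) for s in ordered]

slots = [
    TriggerSlot("switch video source", delay_ms=500),
    TriggerSlot("recall PTZ preset", delay_ms=0),
]
schedule = firing_schedule(slots)
print(schedule)
```

The preset recall fires immediately and the video switch follows 500 ms later, so the camera is already moving towards its position when it goes to programme.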
Video Switchers
- vMix — Fade, Cut, Preview, Merge, Overlay, and Mix outputs via the vMix TCP API.
- Blackmagic ATEM — Program, Preview, Cut, Auto, AUX, and multi-M/E support.
- OBS Studio — Scene and source switching via WebSocket v5.
- TriCaster — Fade, Cut, Macro execution, and PTZ preset recall.
- mimoLive — Layer activation and control.
- Roland Pro AV — V600UHD, VR400UHD, V160HD, and other Roland switchers.
PTZ Cameras
- Panasonic AW Series — HTTP-based preset recall and camera control.
- PTZOptics — HTTP preset recall for PTZOptics cameras.
- NDI PTZ — PTZ control embedded in the NDI protocol.
- VISCA / Sony — Serial and IP-based PTZ control (VISCA over IP).
- Canon XC — Canon remote camera preset recall.
Network Protocols
- HTTP / Webhook — Send GET or POST requests to any URL.
- Custom HTTP — Full control over method, headers, and body.
- TCP — Send raw text or binary data over persistent TCP connections.
- UDP — Fire-and-forget datagrams for low-latency control.
- OSC (Open Sound Control) — Standard show-control and lighting desk messaging.
- MIDI — Note On/Off and Control Change messages.
- Art-Net (DMX512) — Lighting and stage automation control.
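When debugging a network trigger it can help to know what the receiving device expects on the wire. As one example, the sketch below encodes an OSC 1.0 message by hand: strings are null-terminated and padded to a multiple of 4 bytes, and numbers are big-endian. The address `/cue/1` is a made-up example, and this is not necessarily how VVD builds its OSC packets.

```python
import struct

def _osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    padding = (-len(data)) % 4
    return data + b"\x00" * padding

def osc_message(address, *args):
    """Build an OSC message with int32, float32, and string arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            type_tags += "s"
            payload += _osc_string(arg)
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return _osc_string(address) + _osc_string(type_tags) + payload

packet = osc_message("/cue/1", 1)  # hypothetical address
print(len(packet))  # 16 bytes
```

The resulting bytes can be sent as a single UDP datagram to the receiving desk or application.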
Scripting & AI
- JavaScript — Custom scripts executed via the built-in Jint engine.
- PowerShell — Windows automation and system-level scripting.
- Super Triggers (AI) — Describe your automation in plain English and VVD’s AI generates compiled C# code that runs as a native trigger. Professional tier and above.
Broadcast Infrastructure
- Wheatstone Blade ACI — Salvos, utility mixers, SLIO GPIO, and audio ducking.
- Wheatstone Mixer — Channel control, fader automation, and TAKEPROG commands.
- Pathfinder Core Pro — Telos Alliance routing and logic control.
- Televic Camera Control — Direct conference camera integration.
- Lawo VSM (Ember+) — Broadcast control system integration via Ember+ protocol.
Shared Instances
Many trigger types use shared instances — a single network connection that serves all channels. For example, one vMix connection is shared across all channels rather than opening a new connection per channel. This reduces overhead and ensures consistent behaviour.
Control Panel
The control panel provides global switching parameters that affect how VVD decides when and how to switch between speakers.
Minimum Duration
The lockout period after each switch. During this time, VVD will not switch to a new speaker even if one is detected. This prevents rapid, jarring cuts.
- 1–2 seconds: Fast-paced debates and panel discussions where quick cuts feel natural.
- 2–4 seconds: Normal conversation, interviews, and general use.
- 4–6 seconds: Presentations, lectures, and slower-paced events where stability is preferred.
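The lockout behaviour can be sketched as a small gate that rejects switch requests arriving too soon after the previous switch. The `SwitchLimiter` class is illustrative and assumes timestamps in seconds; it is not VVD's implementation.

```python
class SwitchLimiter:
    """Reject switches that arrive within the minimum duration."""

    def __init__(self, min_duration_s):
        self.min_duration_s = min_duration_s
        self.current = None       # channel currently on programme
        self.last_switch = None   # time of the last accepted switch

    def request(self, channel, now):
        """Return True if the switch to `channel` is accepted at time `now`."""
        if channel == self.current:
            return False  # already showing this channel
        if (self.last_switch is not None
                and now - self.last_switch < self.min_duration_s):
            return False  # still inside the lockout period
        self.current = channel
        self.last_switch = now
        return True

limiter = SwitchLimiter(min_duration_s=2.0)
print(limiter.request(1, now=0.0))  # True: first switch
print(limiter.request(2, now=1.0))  # False: inside the 2 s lockout
print(limiter.request(2, now=2.0))  # True: lockout has elapsed
```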
Overview Mode
Automatically switches to a wide shot (Channel 129) when two or more speakers are talking simultaneously. Configurable parameters include:
- Threshold: How many simultaneous speakers trigger the overview shot.
- Duration: How long the overview shot holds before returning to single-speaker switching.
Silence Detection
Switches to a fallback camera (Channel 130) when no one is speaking. Useful for beauty shots, wide angles, or graphics during pauses.
- Threshold: How long silence must persist before triggering (avoids false triggers during natural speech pauses).
- On Power Off: Option to fire silence triggers when VVD is powered off or in standby.
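Overview Mode and Silence Detection both redirect switching to a special channel. A simplified sketch of that decision follows, assuming the set of currently active channels and the elapsed silence are known. The default thresholds are placeholders, and the Overview Duration and On Power Off settings are ignored here.

```python
OVERVIEW_CHANNEL = 129
SILENCE_CHANNEL = 130

def select_target(active, silent_for_s, current=None,
                  overview_threshold=2, silence_threshold_s=3.0):
    """Pick the channel whose triggers should fire next."""
    active = set(active)
    if len(active) >= overview_threshold:
        return OVERVIEW_CHANNEL            # enough simultaneous speakers
    if active:
        # Stay on the current speaker if still talking (an assumption),
        # otherwise take the lowest-numbered active channel.
        return current if current in active else min(active)
    if silent_for_s >= silence_threshold_s:
        return SILENCE_CHANNEL             # silence has persisted long enough
    return current                         # short pause: hold the shot

print(select_target({1, 2}, 0.0))            # 129
print(select_target(set(), 5.0, current=3))  # 130
print(select_target(set(), 1.0, current=3))  # 3
```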
Realism Simulator
Adds reaction shots by occasionally cutting to recent speakers who are not currently talking. This creates a more natural, broadcast-style feel by showing listener reactions.
- Window: How far back in time to look for recent speakers to use as reaction shot candidates.
- Frequency: How often reaction shots are inserted into the switching sequence.
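The Window setting can be thought of as a filter over recent speakers. A minimal sketch, assuming VVD tracks when each channel last spoke (the `last_spoke_at` mapping and function name are hypothetical; the Frequency setting is not modelled):

```python
def reaction_candidates(last_spoke_at, current_speaker, now, window_s):
    """Channels that spoke within the window, most recent first."""
    recent = [
        ch for ch, t in last_spoke_at.items()
        if ch != current_speaker and 0 <= now - t <= window_s
    ]
    return sorted(recent, key=lambda ch: last_spoke_at[ch], reverse=True)

# Channel -> time (seconds) at which that channel last spoke.
last_spoke_at = {1: 100.0, 2: 95.0, 3: 40.0, 4: 80.0}
print(reaction_candidates(last_spoke_at, current_speaker=1,
                          now=100.0, window_s=30.0))  # [2, 4]
```

Channel 3 is excluded because it last spoke 60 seconds ago, outside the 30-second window.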
Fixed Timer
Fires Channel 131’s triggers at a set interval, regardless of audio activity. Useful for periodic establishing shots, cutaway graphics, or timed sponsor bumpers. Intervals can be set up to 1 hour.
Lock (Padlock)
Locks the control panel to prevent accidental changes during live events. When locked, all sliders, buttons, and configuration options are disabled. Click the padlock icon again to unlock.
Shows and Scenes
Shows
A Show is a complete VVD configuration saved as a .vvd file. It contains all channel mappings, trigger configurations, control panel settings, and audio setup. You can create multiple show files for different events or venues and load them as needed.
Scenes
A Scene is a snapshot within a show that captures a specific configuration state. Use scenes for different segments of the same event — for example, a “Panel Discussion” scene with 4 active channels and an “Interview” scene with 2 channels and different PTZ presets.
Settings Manager
The Settings Manager provides an interface for organising, duplicating, and switching between scenes. You can also rename channels, reorder scenes, and manage your show file from a single panel.
Import VVD 5 Settings
If you are upgrading from VVD 5, you can import your existing settings files. VVD 6 will convert the configuration to the new format, preserving your channel mappings and trigger setups.
API Reference
VVD exposes three network APIs for external control and automation. All APIs are available whenever VVD is running.
HTTP API (Port 8088)
RESTful API with JSON request and response format. CORS is enabled for browser-based integrations. Two API versions are available:
- V1 (Legacy): Original API endpoints maintained for backward compatibility.
- V2 (Full): Complete API with access to all VVD features including channel state, trigger control, mute/unmute, scene management, and configuration queries.
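This guide does not list the endpoint paths, so the sketch below only shows how a JSON request to port 8088 might be assembled with Python's standard library. The path and payload are placeholders supplied by the caller; substitute the real values from the API reference for your version.

```python
import json
import urllib.request

def build_request(path, payload=None, host="127.0.0.1", port=8088):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = f"http://{host}:{port}{path}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# "/hypothetical/path" and the payload are placeholders, not real endpoints.
req = build_request("/hypothetical/path", {"channel": 3})
print(req.get_method(), req.full_url)
```

Once the real path is known, the request can be sent with `urllib.request.urlopen(req)` and the JSON response decoded with `json.load`.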
TCP API (Port 8089)
Persistent connection-based API using plain text commands. Ideal for applications that need to maintain a continuous link to VVD for real-time bidirectional communication. Supports event subscriptions for state change notifications.
UDP API (Port 8090)
Fire-and-forget, low-latency command interface. Best for time-critical trigger firing where connection establishment overhead is unacceptable. No acknowledgement is returned.
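Sending a UDP command needs nothing more than a datagram socket. In the sketch below the command text is left to the caller because the command syntax is not covered in this guide; the function name and UTF-8 encoding are assumptions.

```python
import socket

def send_udp_command(command, host="127.0.0.1", port=8090):
    """Send one command as a single datagram; returns the bytes sent.

    No reply is awaited, matching the fire-and-forget behaviour.
    """
    data = command.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(data, (host, port))
```

Because no acknowledgement is returned, a successful call only means the datagram left the sender. If you need confirmation, query the channel state over the HTTP or TCP API afterwards.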
Common Operations
- Trigger a channel: Force-fire all triggers for a specific channel.
- Get channel status: Query which channel is currently active, audio levels, and detection state.
- Mute/Unmute: Temporarily disable or re-enable specific channels.
- Scene switching: Load a different scene via API command.
- Power on/off: Start or stop automatic switching.
Broadcast Integration
VVD integrates with broadcast control systems and GPIO hardware for professional studio and OB van environments.
Control Hardware
- Skaarhoj ETH-GPI Link — Raw Panel TCP protocol with 8 HWC (Hardware Component) inputs for physical button-based channel selection.
- ControlByWeb — Support for X-400, X-440, X-500, and X-600M web-enabled I/O modules.
- Lawo VSM / Ember+ — Full Ember+ provider protocol for integration into Lawo VSM broadcast control systems.
Outbound Connections
Outbound connections map VVD channel states to physical outputs for tally lights, on-air indicators, and external equipment control. When a channel becomes active, VVD sets the corresponding output; when it becomes inactive, the output is cleared.
Inbound Triggers
Inbound triggers accept external button presses and GPIO inputs to manually fire VVD channels. This allows operators to override automatic switching from a physical control surface.
Broadcast States
VVD supports four broadcast states that control switching behaviour:
- On Air — Normal automatic switching is active.
- Off Air — Switching is paused; triggers do not fire.
- Music Break — Custom behaviour during music segments (e.g., fixed camera, slower switching).
- Ad Break — Custom behaviour during commercial breaks.
Colour Convention
VVD supports two colour conventions for tally and status indicators. Choose the convention that matches your industry:
Broadcast (TV / Radio)
RED = On Air, GREEN = Standby. This is the universal broadcast standard used in television studios and radio stations worldwide. A red tally light means the camera or microphone is live.
Live Events (Corporate AV)
GREEN = Active, RED = Off. Common in corporate AV, conference, and live event environments where green intuitively means “go” and red means “stop.”
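The two conventions are mirror images of each other, which the following sketch makes explicit. The enum and function names are illustrative only.

```python
from enum import Enum

class Convention(Enum):
    BROADCAST = "broadcast"      # TV / radio
    LIVE_EVENTS = "live_events"  # corporate AV

def tally_colour(active, convention):
    """Indicator colour for a channel under the chosen convention."""
    if convention is Convention.BROADCAST:
        return "red" if active else "green"   # RED = On Air, GREEN = Standby
    return "green" if active else "red"       # GREEN = Active, RED = Off

print(tally_colour(True, Convention.BROADCAST))    # red
print(tally_colour(True, Convention.LIVE_EVENTS))  # green
```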
Licence Tiers
VVD is available in four licence tiers. All tiers include the core switching engine, Mills Level Sensor, and Mills Smart Sensor.
| Feature | Lite | Standard | Professional | Enterprise |
|---|---|---|---|---|
| Channels | 4 | 8 | 64 | 128 |
| Triggers per Channel | 1 | 4 | 8 | 8 |
| Audio Sources | WASAPI, ASIO | + NDI, OMT | + Wheatstone, ClearOne, vMix Direct, Televic | All sources |
| Super Triggers (AI) | — | — | ✓ | ✓ |
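To find the smallest tier for a given setup, the channel and trigger limits from the table can be checked in order. This is a small planning sketch that ignores audio source and Super Trigger requirements; the function name is illustrative.

```python
# (tier, max channels, max triggers per channel), taken from the table above.
TIERS = [
    ("Lite", 4, 1),
    ("Standard", 8, 4),
    ("Professional", 64, 8),
    ("Enterprise", 128, 8),
]

def minimum_tier(channels, triggers_per_channel):
    """Smallest tier that fits the requested limits, or None if none does."""
    for name, max_channels, max_triggers in TIERS:
        if channels <= max_channels and triggers_per_channel <= max_triggers:
            return name
    return None

print(minimum_tier(6, 2))    # Standard
print(minimum_tier(2, 8))    # Professional
print(minimum_tier(100, 8))  # Enterprise
```

Note that the trigger count can push a small setup into a higher tier: two channels with eight triggers each already require Professional.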
System Requirements
- Operating System: Windows 10 or Windows 11 (64-bit)
- Runtime: .NET 8.0 Runtime
- CPU: Multi-core recommended; 4+ cores for 40+ channels
- GPU: Optional — NVIDIA, AMD, or Intel GPU for DirectML acceleration of AI voice detection
- RAM: 4 GB or more recommended
- Audio: At least one supported audio source (WASAPI device, ASIO driver, NDI source, or compatible console/DSP)
Need More Help?
Visit our support portal to submit a ticket, book a one-on-one setup session, or explore community discussions.