Signals API

Detect what transcripts can't capture

Send a video, get structured JSON — 12 social signals with explanations, engagement level, and conversation quality metrics. One endpoint.

{
  "type": "Frustration",
  "start": 134.0,
  "end": 139.1,
  "probability": "high",
  "rationale": "The speaker's voice becomes sharper and louder as he speaks, and his tone is firm and accusatory. He uses direct, confrontational language such as 'You have no idea what you've done,' indicating annoyance. His facial expression appears tense and his brows are furrowed."
}
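Signal objects like the one above are easy to work with once parsed. A minimal Python sketch that filters a list of them down to the high-probability detections; the field names match the example, but the surrounding list structure is an assumption:

```python
# Filter detected signals down to the high-probability ones.
# Field names (type, start, end, probability, rationale) come from the
# example above; wrapping them in a plain list is an assumption.

def high_probability_signals(signals):
    """Return (type, start, end) tuples for signals flagged 'high'."""
    return [
        (s["type"], s["start"], s["end"])
        for s in signals
        if s["probability"] == "high"
    ]

signals = [
    {"type": "Frustration", "start": 134.0, "end": 139.1,
     "probability": "high", "rationale": "Sharper, louder voice ..."},
    {"type": "Hesitation", "start": 152.3, "end": 154.0,
     "probability": "low", "rationale": "Brief pause ..."},
]

print(high_probability_signals(signals))  # [('Frustration', 134.0, 139.1)]
```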
Response structure

One call, three layers of intelligence

Every response contains all three layers: social signals, an engagement level, and actionable conversation quality scores — from a single endpoint.

Layer 1: Social Signals

What notable moments happened?

Agreement
Interest
Hesitation
Confidence
Layer 2: Engagement Level

How attentive was the person?

Engagement status
Layer 3: Conversation Quality

How good was this interaction?

CQI
Clarity
Energy
Rapport
Social Signals

Every signal your AI needs to read the room

Each signal includes a probability level and a human-readable rationale explaining what triggered it.

Agreement

Alignment with another person's position, intent, or understanding

Confidence

How firmly and assuredly someone communicates their position or decision

Confusion

A breakdown or gap in understanding during an interaction

Disagreement

Active divergence from another person's viewpoint or proposal

Disengagement

Reduction in attention, involvement, or investment in the interaction

Engagement

Sustained focus and active participation in the interaction

Frustration

Mounting tension or irritation when progress toward a goal feels blocked

Hesitation

Uncertainty or delay before committing to a response or action

Interest

Attention or curiosity toward something unexpected or stimulating

Skepticism

Questioning or doubtful stance toward a claim, proposal, or explanation

Stress

Heightened tension or unease during an interaction

Uncertainty

A temporary disruption in speaking or responding to a question

Conversation Quality Index

Interaction quality, scored automatically

The API computes a Conversation Quality Index (CQI) from the detected signals — a single 0–100 score, broken down into 5 behavioural dimensions. Updated throughout the conversation.

How it works

The API returns a single 0–100 score summarising interaction quality, computed from the detected signals. Provided as a snapshot and a rolling timeline.

Score bands
80–100  Excellent
65–80   Good
50–65   Moderate
30–50   Below average
0–30    Weak
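The bands map onto scores straightforwardly. A sketch, with boundaries taken from the table above and each upper edge assigned to the higher band (an assumption, since the published ranges share endpoints):

```python
def cqi_band(score):
    """Map a 0-100 CQI score to its band label (bands from the table above)."""
    if score >= 80:
        return "Excellent"
    if score >= 65:
        return "Good"
    if score >= 50:
        return "Moderate"
    if score >= 30:
        return "Below average"
    return "Weak"
```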
Quality timeline
[Chart: CQI plus the five dimensions (Clarity, Authority, Energy, Rapport, Learning), scored 40–100, plotted over the conversation from 0s to 40s.]

5 behavioural dimensions

Each dimension measures the balance between supportive and undermining behaviours detected in the conversation. Higher is always better.

Clarity & Structure

How easy the speaker is to follow, reflecting the organization, concision, and coherence of their ideas.

Authority & Credibility

How confident, decisive, and credible the speaker comes across in their delivery.

Energy & Presence

The speaker's level of vitality and active engagement throughout the interaction.

Rapport & Safety

The warmth and emotional quality of the interaction, reflecting how acknowledged and at ease the other person feels.

Learning & Exploration

How openly the speaker reflects, tests ideas, and adapts, capturing their curiosity and growth orientation.

The model

Powered by Inter-1

An omni-modal model purpose-built for understanding human social signals. It processes video, audio, and text together — in temporal alignment.

Purpose-built

Inter-1 is trained specifically for social signal detection, leveraging a proprietary dataset that combines a behavioural-science-anchored ontology with expert-led labeling.

Outperforms frontier models

Evaluated against a wide variety of commercial and open-source frontier omni and vision LLMs, Inter-1 came out ahead on both accuracy and speed.

Vetted by experts

In blind A/B evaluation, behavioural science experts chose Inter-1's rationale over competitor output 83% of the time (76% on evidential grounding, 91% on clarity).

Not sentiment analysis

Sentiment gives you positive/negative. We give you 12 specific signals with timestamps and rationale.

Not DIY multimodal

Stitching together face, voice, and body language models is months of work. This is one endpoint.

Not a black box

Every signal includes the observable cues that triggered it — which modalities, which behaviours. Auditable and actionable.

Capabilities

What you get from every API call

12 social signals

Agreement, confusion, frustration, hesitation, and 8 more — each with a probability level and a rationale explaining the observable cues that triggered it.

Multimodal analysis

Video, audio, and text processed together in temporal alignment. Gaze direction, vocal prosody, posture, gestures, and speech — not just transcripts.

Engagement tracking

Continuous attention monitoring with three levels — engaged, neutral, or disengaged — indicating how oriented a person is to the interaction at every moment.
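A common way to consume the engagement stream is to aggregate it into a share of time per level. A sketch assuming the timeline arrives as (duration, status) segments; the segment format itself is an assumption, though the three level names are documented above:

```python
from collections import defaultdict

def engagement_breakdown(segments):
    """Fraction of total time per level, from (duration_seconds, status) pairs."""
    totals = defaultdict(float)
    for duration, status in segments:
        totals[status] += duration
    total = sum(totals.values())
    return {status: t / total for status, t in totals.items()}

segments = [(30.0, "engaged"), (10.0, "neutral"), (10.0, "disengaged")]
print(engagement_breakdown(segments))
# {'engaged': 0.6, 'neutral': 0.2, 'disengaged': 0.2}
```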

Conversation Quality Index

A 0–100 score derived from social signals across five behavioural dimensions — Clarity, Authority, Energy, Rapport, and Learning — provided as both a snapshot and a timeline.

Explainable by default

Every signal includes its rationale, with the modalities and behaviours involved. Auditable, actionable, and ready to forward to your LLM.

REST

Upload a recording or connect live video. Token-based auth. One response shape for both. Any format ffmpeg supports.
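A minimal upload sketch using only the standard library. The endpoint URL and the Bearer-token header are hypothetical placeholders for illustration, not the API's documented values:

```python
import urllib.request

# Hypothetical endpoint; substitute the real URL from the API reference.
API_URL = "https://api.example.com/v1/analyze"

def build_upload_request(video_bytes, token):
    """Build (but don't send) an authenticated POST carrying the recording.

    The Bearer scheme and octet-stream content type are assumptions.
    """
    return urllib.request.Request(
        API_URL,
        data=video_bytes,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )

req = build_upload_request(b"...", "YOUR_TOKEN")
# urllib.request.urlopen(req) would then return the structured JSON response.
```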

Where developers are using this

Sales Coaching
AI Tutoring
User Research
Meeting Copilots
AI Interviews
Communication Training
Virtual Assistants
Healthcare Support