Send a video, get structured JSON — 12 social signals with explanations, engagement level, and conversation quality metrics. One endpoint.
{
  "type": "Frustration",
  "start": 134.0,
  "end": 139.1,
  "probability": "high",
  "rationale": "The speaker's voice becomes sharper and louder as he speaks, and his tone is firm and accusatory. He uses direct, confrontational language such as 'You have no idea what you've done,' indicating annoyance. His facial expression appears tense and his brows are furrowed."
}
Every response contains all three layers: social signals, an engagement level, and actionable conversation quality scores — from a single endpoint.
Each signal includes a probability level and a human-readable rationale explaining what triggered it.
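A minimal sketch of what a call could look like from Python, assuming a requests-style upload. The endpoint URL, the INTER_API_KEY variable, and every response field name below are illustrative placeholders, not the documented API surface:

import os
import requests

# Hypothetical endpoint and bearer-token auth; consult the API reference
# for the real URL, field names, and token format.
API_URL = "https://api.example.com/v1/analyze"
headers = {"Authorization": f"Bearer {os.environ['INTER_API_KEY']}"}

with open("sales_call.mp4", "rb") as video:
    response = requests.post(API_URL, headers=headers, files={"video": video})
response.raise_for_status()
result = response.json()

# All three layers arrive in the one response (field names assumed):
for signal in result["signals"]:
    print(f"{signal['type']} [{signal['start']}-{signal['end']}s] "
          f"({signal['probability']}): {signal['rationale']}")
print("Engagement:", result["engagement"]["level"])
print("CQI:", result["cqi"]["score"])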
Alignment with another person's position, intent, or understanding
How firmly and assuredly someone communicates their position or decision
A breakdown or gap in understanding during an interaction
Active divergence from another person's viewpoint or proposal
Reduction in attention, involvement, or investment in the interaction
Sustained focus and active participation in the interaction
Mounting tension or irritation when progress toward a goal feels blocked
Uncertainty or delay before committing to a response or action
Attention or curiosity toward something unexpected or stimulating
Questioning or doubtful stance toward a claim, proposal, or explanation
Heightened tension or unease during an interaction
A temporary disruption in speaking or responding to a question
Agreement: Alignment with another person's position, intent, or understanding
{ "type": "Agreement", "start": 1.2, "end": 8.7, "probability": "high", "rationale": "The speaker explicitly states, "Yeah, that was great," providing clear verbal confirmation of his stance. He also nods his head while speaking and maintains steady eye contact with the camera, demonstrating active participation and alignment with the topic." }
Detecting buy-in during sales calls, confirming understanding in training, measuring alignment in negotiations
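As a sketch of the sales-call use case, the snippet below filters the response for high-probability Agreement spans. It reuses the result object and the assumed field names from the request sketch above; none of these names come from the documented API:

# Collect high-probability Agreement spans as candidate buy-in moments.
# "signals", "type", and "probability" are assumed field names, not
# confirmed API fields.
buy_in_moments = [
    (s["start"], s["end"], s["rationale"])
    for s in result["signals"]
    if s["type"] == "Agreement" and s["probability"] == "high"
]
for start, end, why in buy_in_moments:
    print(f"Possible buy-in at {start:.1f}-{end:.1f}s: {why}")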
The API computes a Conversation Quality Index (CQI) from the detected signals: a single 0–100 score, broken down into five behavioural dimensions and provided as both a snapshot and a rolling timeline that updates throughout the conversation.
Each dimension measures the balance between supportive and undermining behaviours detected in the conversation. Higher is always better.
Clarity: How easy the speaker is to follow, reflecting the organization, concision, and coherence of their ideas.
Authority: How confident, decisive, and credible the speaker comes across in their delivery.
Energy: The speaker's level of vitality and active engagement throughout the interaction.
Rapport: The warmth and emotional quality of the interaction, reflecting how acknowledged and at ease the other person feels.
Learning: How openly the speaker reflects, tests ideas, and adapts, capturing their curiosity and growth orientation.
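To make the shape concrete, here is a minimal sketch of reading the score, its per-dimension breakdown, and the rolling timeline. The cqi, dimensions, and timeline field names are assumptions carried over from the request sketch above, not confirmed schema:

# Overall snapshot plus the five-dimension breakdown (field names assumed).
cqi = result["cqi"]
print(f"Overall CQI: {cqi['score']}/100")
for dimension, score in cqi["dimensions"].items():
    print(f"  {dimension}: {score}/100")

# The rolling timeline supports charting quality over the conversation.
for point in cqi["timeline"]:
    print(f"  t={point['time']:.0f}s  score={point['score']}")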
An omni-modal model purpose-built for understanding human social signals. It processes video, audio, and text together — in temporal alignment.
Inter-1 is trained specifically for social signal detection, leveraging a proprietary dataset that combines a behavioural-science-anchored ontology with expert-led labeling.
Evaluated against a wide variety of commercial and open-source frontier omni and vision LLMs, Inter-1 came out ahead on both accuracy and speed.
In blind A/B evaluations, behavioural science experts preferred Inter-1's rationales over competitor output 83% of the time (76% on evidential grounding, 91% on clarity).
Sentiment gives you positive/negative. We give you 12 specific signals with timestamps and rationale.
Stitching together face, voice, and body language models is months of work. This is one endpoint.
Every signal includes the observable cues that triggered it — which modalities, which behaviours. Auditable and actionable.
Agreement, confusion, frustration, hesitation, and 8 more — each with a probability level and a rationale explaining the observable cues that triggered it.
Video, audio, and text processed together in temporal alignment. Gaze direction, vocal prosody, posture, gestures, and speech — not just transcripts.
Continuous attention monitoring with three levels — engaged, neutral, or disengaged — indicating how oriented a person is to the interaction at every moment.
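As one illustration of consuming this track, the sketch below tallies how much of the conversation was spent at each level. The engagement_timeline field and its span shape are assumptions, not documented fields:

from collections import defaultdict

# Tally time spent at each engagement level (engaged / neutral / disengaged),
# reusing the result object from the request sketch above.
durations = defaultdict(float)
for span in result["engagement_timeline"]:
    durations[span["level"]] += span["end"] - span["start"]

total = sum(durations.values()) or 1.0
for level, seconds in sorted(durations.items()):
    print(f"{level}: {seconds:.0f}s ({100 * seconds / total:.0f}%)")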
A 0–100 score derived from social signals across five behavioural dimensions — Clarity, Authority, Energy, Rapport, and Learning — provided as both a snapshot and a timeline.
Every signal includes its rationale, with the modalities and behaviours involved. Auditable, actionable, and ready to forward to your LLM.
Upload a recording or connect live video. Token-based auth. One response shape for both. Any format ffmpeg supports.
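Under the same assumptions as the earlier sketches, the two ingestion paths might look like this; the base URL, endpoint path, and stream_url field are placeholders, and only the bearer-token pattern comes from the text above:

import os
import requests

BASE = "https://api.example.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['INTER_API_KEY']}"}

# Path 1: upload a recording (any container/codec ffmpeg can decode).
with open("standup.webm", "rb") as f:
    recorded = requests.post(f"{BASE}/analyze", headers=headers,
                             files={"video": f}).json()

# Path 2: point the API at a live video source (the stream_url field is assumed).
live = requests.post(f"{BASE}/analyze", headers=headers,
                     json={"stream_url": "rtmp://example.com/live/room-1"}).json()

# One response shape for both: the same parsing code works unchanged.
for result in (recorded, live):
    print(result["cqi"]["score"], result["engagement"]["level"])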