The video-call frame is the emerging pattern for voice and video AI agents: present the agent in a familiar call layout, with the agent as the main character, the user as a small self-view in the corner, and the controls in their usual places. Borrowing the muscle memory of FaceTime and Zoom makes talking to an AI feel natural. An abstract orb avoids the uncanny valley; a realistic avatar adds presence but raises expectations.

Why voice agents borrow the video-call layout

Talking to a machine is still strange. Put it inside the layout everyone knows from FaceTime and Zoom (agent as the main character, you as the small self-view) and the strangeness gives way to muscle memory.

Voice is the most natural interface humans have, yet talking to a voice assistant has always felt like the least natural thing in tech. Part of the problem is that there is nowhere to look. You speak into a void and a disembodied voice answers. There is no presence, no focal point, no sense of who or what you are talking to.

The fix turned out to be a layout we already had. The video call. For fifteen years we have been training ourselves to talk to a screen with a face in the main frame and a little thumbnail of ourselves in the corner. Drop an AI agent into that exact shell and the interaction inherits all of that comfort for free.

The clever part is that none of it is new. The agent is the main character, the user is the self-view, the controls sit where they always sit. The familiarity is the feature. A brand-new way of interacting arrives wearing clothes the user has worn a thousand times.

Voice with no frame feels like talking to a wall: there is nowhere to look, no sense of presence, and no familiar shell to absorb the novelty of speaking to a machine. Inside the call frame, an abstract orb sidesteps the uncanny valley and reads as honestly artificial, while still giving the eye a focal point.

What the pattern looks like done well

Six rules for wrapping a voice or video agent in a frame people trust.

1. Borrow a layout the user already knows

The video-call frame is not decoration. It imports years of muscle memory from FaceTime and Zoom, so a brand-new interaction (talking to an AI) arrives inside a familiar shell.

2. Make the agent the main character

The agent takes the main frame; the user sits in a small self-view. This mirrors a call with a person and quietly frames the agent as a presence you are meeting, not a tool you are operating.

3. Keep the self-view, even though it is just you

The little corner tile of yourself is doing real work: it confirms the system can see and hear you, and it completes the call metaphor. Removing it makes the interaction feel one-sided and uncertain.

4. Choose orb or avatar deliberately

An abstract orb is honest about being artificial and dodges the uncanny valley. A realistic avatar adds presence but raises the bar: it must be responsive and lifelike, or it unsettles. Pick for the context, not the novelty.

5. Show presence and state, not just audio

Speaking animations, a connection indicator, and a listening state give the agent visible aliveness. Silence with no visual signal reads as broken, even when the audio is fine.

6. Keep the controls where calls keep them

Mute and end-call belong at the bottom centre, where every video app has trained users to find them. Familiar control placement lowers the cognitive cost of a strange new medium.
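
The fifth rule (show presence and state, not just audio) can be sketched as a small lookup from agent state to what the frame shows. This is an illustrative sketch, not how any particular product implements it: the state names, the PresenceSignal shape, and the animation and indicator values are all assumptions made for the example. The point it demonstrates is that every state, including silence while listening, maps to something visible.

```typescript
// Hypothetical names throughout: AgentState, PresenceSignal, PRESENCE,
// presenceFor and hasVisibleSignal are illustrative, not a real API.
type AgentState = "connecting" | "listening" | "speaking" | "disconnected";

interface PresenceSignal {
  statusLabel: string; // short text shown beside the agent frame
  orbAnimation: "pulse" | "ripple" | "waveform" | "none";
  connectionDot: "green" | "amber" | "red"; // the connection indicator
}

// Every state gets an explicit visual treatment, so silence never
// looks like a broken call.
const PRESENCE: Record<AgentState, PresenceSignal> = {
  connecting: { statusLabel: "Connecting", orbAnimation: "pulse", connectionDot: "amber" },
  listening: { statusLabel: "Listening", orbAnimation: "ripple", connectionDot: "green" },
  speaking: { statusLabel: "Speaking", orbAnimation: "waveform", connectionDot: "green" },
  disconnected: { statusLabel: "Call ended", orbAnimation: "none", connectionDot: "red" },
};

// Look up what the frame should display for a given agent state.
function presenceFor(state: AgentState): PresenceSignal {
  return PRESENCE[state];
}

// True when the state shows a non-empty status label.
function hasVisibleSignal(state: AgentState): boolean {
  return presenceFor(state).statusLabel.length > 0;
}
```

Using Record<AgentState, PresenceSignal> means the compiler rejects a new state that has no visual treatment, which is one way to keep the "silence reads as broken" failure out of the interface.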

Take it with you

Don’t just read this. Put it to work.

The whole series is distilled into one Markdown file: every pattern, the do and don’t rules, and how well each is evidenced. Download it into your project, or paste the link into any chat with your agent and tell it to improve your agent UX. It’s free, no sign-up, no attribution required.

Paste this into your agent

Use these Agent UX principles to review and improve our agent's interface: https://p0stman.com/agent-ux/agent-ux-principles.md

Part of the Agent UX series

We have already shipped this one

The Zee video agent on p0stman.com runs the exact pattern in this article: a real-time voice agent in a familiar call frame, agent as the main character, user self-view in the corner. If you want a voice or video agent that people are actually comfortable talking to, that is the job we do.
