Virtual Agents Transcript and Call Summary

Retrieves Virtual Agent transcripts and the Virtual Agent transfer summary when a call is transferred from a virtual agent to a human agent. For more information, please refer to this guide. Find the proto definition here.

Services

AiInsight

Service to subscribe to in order to get AI Insights

  • SERVER

    rpc StreamingInsightServing(StreamingInsightServingRequest) returns (stream StreamingInsightServingResponse)

    Server-side streaming gRPC call that takes a conversation ID and agent details as input and returns streaming insights for that conversation (a client sketch follows this list).

  • UNARY

    rpc InsightServing(InsightsServingRequest) returns (InsightsServingResponse)

    Unary gRPC call that takes a conversation ID and agent details as input and returns insights for that conversation.
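
Below is a minimal Python (grpcio) sketch of the server-streaming call. The endpoint host, the generated stub module names (ai_insight_pb2 / ai_insight_pb2_grpc), and the authorization metadata key are illustrative assumptions; only the service, rpc, and message names are taken from this reference.

    import grpc

    # Assumed module names, as generated from the AiInsight proto with grpcio-tools.
    import ai_insight_pb2 as pb
    import ai_insight_pb2_grpc as pb_grpc

    # Assumed endpoint; use the host published for your region.
    channel = grpc.secure_channel("insights.example.com:443", grpc.ssl_channel_credentials())
    stub = pb_grpc.AiInsightStub(channel)

    request = pb.StreamingInsightServingRequest(
        insightServingRequest=pb.InsightServingRequest(
            conversationId="<conversation-guid>",
            orgId="<control-hub-org-id>",
            historicalTranscripts=True,  # replay transcripts from the start of the conversation
            agentDetails=pb.AgentDetails(agentId="<agent-id>"),
        )
    )

    # The access token must be authorized for the org given in orgId.
    metadata = [("authorization", "Bearer <access-token>")]

    # Server-side streaming: one request in, a stream of insight messages out.
    for response in stub.StreamingInsightServing(request, metadata=metadata):
        insight = response.insightServingResponse
        print(insight.insightType, insight.isFinal, insight.utteranceId)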

Messages

AgentDetails

  • agentId
    string

AgentTransfer

Call Transferred to Human Agent

  • metadata
    google.protobuf.Struct

    Call Transfer Metadata.

CallInsightsResult

Call Insights Object for VA Call Summary

  • content
    string

    Content.

  • callInsightType
    CallInsightType

    Call Insight Type.

Duration

Represents the Duration object denoting seconds and nanos

  • seconds
    int64

    Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years.

  • nanos
    int32

    Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
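
Word timings and result_end_time later in this reference are expressed with this Duration shape, so a small conversion helper can be handy. A minimal Python sketch, assuming a message object exposing seconds and nanos:

    def duration_to_seconds(d) -> float:
        # For durations under one second, seconds is 0 and nanos carries the sign;
        # otherwise nanos shares the sign of seconds, so plain addition is correct.
        return d.seconds + d.nanos / 1_000_000_000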

EndVirtualAgent

Represents the Virtual Agent End Indication

  • metadata
    google.protobuf.Struct

    Call Transfer Metadata.

ExitEvent

Event received from the Virtual Agent

  • event_type
    ExitEvent.EventType

    Event Type.

  • name
    string

    Optional: To be used for the custom event.

  • metadata
    google.protobuf.Struct

    Optional: map used to pass custom parameters.

InsightServingRequest

Represents the request content for retrieving AI Insights in the streaming gRPC call

  • conversationId
    string

    Required. Conversation ID for which insights are needed. The subscription will start listening to any insights for this conversation across multiple legs (IVR, Caller, Agent) and services (Transcription, Agent Assist).

  • orgId
    string

    Required. Control Hub Org ID for the org this conversation belongs to. The access token should have authorization for this org.

  • historicalTranscripts
    bool

    Whether historical transcripts from the start of the conversation are required. Default: false.

  • historicalVirtualAgent
    bool

    Whether virtual agent insights from the start of the conversation are required. Default: false.

  • agentDetails
    AgentDetails

    Required. AgentDetails from where the call is initiated.

  • messageId
    string

    Sets the message ID for the request; this uniquely identifies each request.

InsightServingResponse

Represents the content of the insight serving response used in the streaming gRPC call

  • orgId
    string

    Org Identifier (control hub) for which the insights need to be delivered.

  • conversationId
    string

    Identifier for the Conversation. Equivalent to Call ID, CallGUID etc.

  • roleId
    string

    Identifier for the individual leg, based on the party. GUID.

  • utteranceId
    string

    Identifier for a given utterance. The same utterance ID will be published for the transcript utterance and the insights generated from it.

  • role
    InsightServingResponse.Role

    Role specifying IVR, Caller or Agent.

  • insightType
    InsightServingResponse.ServiceType

    Type of insight: ASR, Agent Assist, etc.

  • insightProvider
    InsightServingResponse.ServiceProvider

    Service Provider who produced this insight.

  • publishTimestamp
    int64

    Epoch timestamp when this insight record was created/published. This field is always available and can be used for sorting messages by time.

  • startTimestamp
    int64

    Start time and end time correspond to the speech interval to which this insight belongs. Epoch timestamp. These are optional fields and not always available.

  • endTimestamp
    int64
  • isFinal
    bool

    Whether the insight is final or intermediate. Intermediate results will be overridden by the final result that follows them.

  • messageId
    string

    Message ID.

  • configId
    string
  • languageCode
    string
  • responseContent
    ResponseContent

    Content of the insight. This will vary based on the type of insight.
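
A sketch of how a consumer might process these responses is shown below; skipping intermediate results and treating the message-typed fields of responseContent as mutually exclusive are assumptions based on the per-service-type notes under ResponseContent.

    def handle_insight(insight) -> None:
        # Intermediate results are superseded by the final result that follows them.
        if not insight.isFinal:
            return

        content = insight.responseContent
        if content.HasField("recognitionResult"):        # Service Type = TRANSCRIPTION
            top = content.recognitionResult.alternatives[0]
            print("transcript:", top.transcript)
        elif content.HasField("virtualAgentResult"):     # Service Type = VIRTUAL_AGENT
            nlu = content.virtualAgentResult
            print("VA reply:", nlu.reply_text, "intent:", nlu.intent.display_name)
        elif content.HasField("callInsightsResult"):     # Service Type = CALL_INSIGHTS
            print("VA call summary:", content.callInsightsResult.content)
        elif content.rawContent:                         # placeholder for any other types
            print("raw:", content.rawContent)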

InsightsServingRequest

Represents the request content for the unary gRPC call retrieving AI Insights

  • conversationId
    string

    Required. Conversation ID (in combination with the messageId, if provided) for which insights are needed. The subscription will start listening to any insights for this conversation (along with messageId, if provided) across multiple legs (IVR, Caller, Agent) and services (Transcription, Agent Assist).

  • messageId
    string

    Optional. If messageId is provided, the insights are fetched using the combination of messageId and conversationId. The subscription will start listening to any insights for this messageId along with the conversationId field across multiple legs (IVR, Caller, Agent) and services (Transcription, Agent Assist).

  • orgId
    string

    Required. Control Hub Org ID for the org this conversation belongs to. The access token should have authorization for this org.

  • insightType
    InsightsServingRequest.InsightType
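
A minimal Python sketch of the unary InsightServing call, reusing the pb module, stub, and metadata from the streaming sketch above (those names remain assumptions); the CALL_INSIGHTS value is listed under InsightsServingRequest.InsightType in the Enums section.

    # Fetch insights (for example the VA call summary) for a conversation.
    request = pb.InsightsServingRequest(
        conversationId="<conversation-guid>",
        orgId="<control-hub-org-id>",
        insightType=pb.InsightsServingRequest.InsightType.CALL_INSIGHTS,
    )

    response = stub.InsightServing(request, metadata=metadata)
    if response.responseContent.HasField("callInsightsResult"):
        print(response.responseContent.callInsightsResult.content)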

InsightsServingResponse

Represents the response content for the unary gRPC call retrieving AI Insights

  • conversationId
    string

    Required. Conversation ID (in combination with the messageId, if provided) for which insights are needed. The subscription will start listening to any insights for this conversation (along with messageId, if provided) across multiple legs (IVR, Caller, Agent) and services (Transcription, Agent Assist).

  • messageId
    string

    Optional. If messageId is provided, the insights are fetched using the combination of messageId and conversationId. The subscription will start listening to any insights for this messageId along with the conversationId field across multiple legs (IVR, Caller, Agent) and services (Transcription, Agent Assist).

  • orgId
    string

    Required. Control Hub Org ID for the org this conversation belongs to. The access token should have authorization for this org.

  • startTimestamp
    int64

    Start time and end time correspond to the speech interval to which this insight belongs. Epoch timestamp. These are optional fields and not always available.

  • endTimestamp
    int64
  • configId
    string
  • languageCode
    string
  • insightProvider
    InsightsServingResponse.ServiceProvider

    Service Provider who produced this insight.

  • responseContent
    ResponseContent

    Content of the insight. This will vary based on the type of insight.

Intent

Represents the Intent detected from the user utterance

  • name
    string

    Name of the Intent.

  • display_name
    string

    Display name of the Intent.

  • parameters
    google.protobuf.Struct

    Parameters of an Intent, filled or not filled.

  • match_confidence
    float

    Match Confidence.

NLU

NLU Object generated from User Utterance.

  • reply_text
    string

    Response in text. This will be used for Virtual Agent Transcript.

  • intent
    Intent

    Intent detected from the last utterance.

  • agent_transfer
    AgentTransfer

    Sent when the call is transferred to an agent.

  • end_virtual_agent
    EndVirtualAgent

    Call Ended.

  • input_text
    string

    User input uttered by the caller.

  • exit_event
    ExitEvent

    Exit Event to return the control back to the calling flow.
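
The sketch below shows one way to interpret an NLU result when following a virtual agent conversation; treating agent_transfer, end_virtual_agent, and exit_event as mutually exclusive outcomes is an assumption for illustration.

    def classify_va_turn(nlu) -> str:
        # Outcome fields are message-typed, so presence can be tested with HasField.
        if nlu.HasField("agent_transfer"):
            return "call transferred to a human agent"
        if nlu.HasField("end_virtual_agent"):
            return "call ended by the virtual agent"
        if nlu.HasField("exit_event"):
            return f"exit event: {nlu.exit_event.name or nlu.exit_event.event_type}"
        return f"VA replied: {nlu.reply_text}"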

ResponseContent

Represents the response content message

  • rawContent
    string

    Placeholder for any other types. Not returned unless stated.

  • recognitionResult
    StreamingRecognitionResult

    For Service Type = TRANSCRIPTION.

  • virtualAgentResult
    NLU

    For Service Type = VIRTUAL_AGENT.

  • callInsightsResult
    CallInsightsResult

    For Service Type = CALL_INSIGHTS.

SpeechRecognitionAlternative

Represents the Alternative hypotheses (a.k.a. n-best list).

  • transcript
    string

    Output only. Transcript text representing the words that the user spoke.

  • confidence
    float

    Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result or of a streaming result where is_final=true. Not yet supported.

  • words
    WordInfo

    Output only. A list of word-specific information for each recognized word. Note: When enable_speaker_diarization is true, you will see all the words from the beginning of the audio.

StreamingInsightServingRequest

Represents the request for retrieving insights for a given conversation ID in the streaming gRPC call

  • insightServingRequest
    InsightServingRequest

StreamingInsightServingResponse

Response returned with Insights. There may be multiple messages in the stream. Each service type may have zero or more messages.

  • insightServingResponse
    InsightServingResponse

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

  • alternatives
    SpeechRecognitionAlternative

    Output only. May contain one or more recognition hypotheses (up to the maximum specified in max_alternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

  • is_final
    bool

    Output only. If false, this StreamingRecognitionResult represents an interim result that may change. If true, this is the final time the speech service will return this particular StreamingRecognitionResult; the recognizer will not return any further hypotheses for this portion of the transcript and corresponding audio.

  • result_end_time
    Duration

    Output only. Time offset of the end of this result relative to the beginning of the audio.

  • channel_tag
    int32

    For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from '1' to 'N'.

  • language_code
    string

    Output only. The BCP-47 language tag of the language in this result. This language code was detected to have the most likelihood of being spoken in the audio.

  • has_applied_recording_offsets
    bool

    Whether or not recording offsets have been applied to the word alignment values. Otherwise the word alignment start and end times are only relative within the utterance.

  • speaker_ids
    uint32

    Zero or more integers representing the speaker ID of this result. This is usually derived from the speaker integers that are passed in the streaming request.

  • last_packet_metrics_unix_timestamp_ms
    int64

    The unix time in milliseconds which was received from the client for the StreamingRecognizeRequest that was last used to complete this utterance.

  • message_type
    string

    Message type.

  • response_event
    StreamingRecognitionResult.OutputEvent

    Event based on user utterances.

  • role
    StreamingRecognitionResult.Role

WordInfo

Represents the Word-specific information for recognized words.

  • start_time
    Duration

    Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

  • end_time
    Duration

    Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

  • word
    string

    Output only. The word corresponding to this set of information.
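
A short sketch that pulls per-word timing out of the top hypothesis, reusing duration_to_seconds from the Duration section above; it assumes word time offsets were enabled so start_time and end_time are populated (they are set only in the top hypothesis).

    def print_word_timings(result) -> None:
        # Only final results are stable; interim results may still change.
        if not result.is_final:
            return
        top = result.alternatives[0]  # alternatives are ordered most probable first
        for info in top.words:
            start = duration_to_seconds(info.start_time)
            end = duration_to_seconds(info.end_time)
            print(f"{info.word}: {start:.2f}s - {end:.2f}s")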

Enums

CallInsightType

List of Call Insight Types

  • UNSPECIFIED
    0
  • VA_CALL_SUMMARY
    1

ExitEvent.EventType

  • UNSPECIFIED
    0
  • VA_CALL_END
    1
  • AGENT_TRANSFER
    2
  • CUSTOM
    3

InsightServingResponse.Role

Identifier for the party.

  • IVR
    0
  • CALLER
    1

InsightServingResponse.ServiceProvider

Provider List for Services

  • DEFAULT
    0
  • CISCO
    1
  • GOOGLE
    2
  • NUANCE
    3

InsightServingResponse.ServiceType

Type of service this Insight belongs to

  • DEFAULT_TRANSCRIPTION
    0
  • CALL_INSIGHTS
    5

InsightsServingRequest.InsightType

Type of service this Insight request belongs to

  • DEFAULT_TRANSCRIPTION
    0
  • CALL_INSIGHTS
    5

InsightsServingResponse.ServiceProvider

Provider List for Services

  • DEFAULT
    0
  • CISCO
    1
  • GOOGLE
    2
  • NUANCE
    3

StreamingRecognitionResult.OutputEvent

Returns the event based on the user input. Similar events will be returned for voice- and DTMF-based inputs.

  • EVENT_UNSPECIFIED
    0
  • EVENT_START_OF_INPUT
    1

    Triggered when the user utters the first utterance in voice input mode or the first DTMF is pressed in DTMF input mode. This event is used to barge in on the prompt, based on the prompt's barge-in flag. The event will be sent only if the current prompt being played is barge-in enabled or prompt playback is complete.

  • EVENT_END_OF_INPUT
    2

    Sent when the user's voice or DTMF utterance is complete.

  • EVENT_NO_MATCH
    3

    Sent when the utterance did not match any of the accepted inputs.

  • EVENT_NO_INPUT
    4

    Sent when no audio is received within the expected timeframe.

StreamingRecognitionResult.Role

  • UNDEFINED
    0

    Role - Undefined.

  • CALLER
    1

    Role - Caller.
