
Emotional AI.

Towards More Emotionally Intelligent Voice Interaction Design

As part of Xiaomi's AI department, I mainly worked on enhancing the emotional intelligence of the voice user interface system across multimodal devices (e.g. smart displays, TVs, smartphones). By responding with positive reinforcement and empathy statements, XiaoAI's child voice assistant provides interventions that support child users' real-time emotional needs.

Role | UX, user research, data analysis, product vision & strategy, product management, conversation design

Launched XiaoAI for Kids features by collaborating with the research, engineering, hardware and marketing teams.

Duration | 2020


The XiaoAI voice assistant is among the top 3 voice assistants in China. It is widely integrated into Xiaomi's large IoT and smart home ecosystem. We observed that children account for a significant proportion of our user base. However, both children and their parents reported lower scores for their overall experience in the user survey.

-6% NPS (Net Promoter Score, Q2 2020), where NPS = Promoters% - Detractors%
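
The NPS formula above can be computed directly from raw survey ratings. The sketch below assumes the standard 0-10 NPS scale, on which ratings of 9-10 count as promoters and 0-6 as detractors. The function name and the sample ratings are hypothetical and are not taken from the actual survey data.

```python
def net_promoter_score(ratings):
    """Return NPS = Promoters% - Detractors% for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    # Standard NPS buckets: 9-10 are promoters, 0-6 are detractors,
    # and 7-8 (passives) only count toward the total.
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical sample: 3 promoters, 2 passives, 5 detractors.
sample_ratings = [10, 9, 8, 7, 6, 5, 3, 9, 2, 6]
sample_nps = net_promoter_score(sample_ratings)
```

A negative score simply means that detractors outnumber promoters in the sample.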


  • Find out why family (parent & child) users were not very happy with our voice assistants. 

  • Enhance the experience for children with our multimodal intelligence systems


To improve child users' experience with the XiaoAI voice assistant, we started with two initiatives:

  • Launch of XiaoAI's first child assistant voice, Puff (Chinese: 泡芙)

  • Enhancement of the emotional intelligence of conversations by providing positive reinforcement and empathy statements




total users

(by the end of Oct, 2020)

Top 1

the most popular voice option after the default one

Q4 2020

features launched



video views of the "Call Me Dad" scenario on TikTok China


XiaoAI for Kids features highlighted at Xiaomi's 10th Anniversary Conference



Product Strategy



  • Usage data

  • NPS survey

  • User interviews

  • Workshops

  • Competitive analysis

  •  Research on emotional AI for children

  • Voice patterns and queries analysis

  • Resource domains' tag analysis

  • Children behavior analysis

  • Product vision

  • Scalable solutions

  • Roadmap

  • Requirement documents  

  • Emotional AI

  • Intelligent conversation design

  • Implementation, integration & testing with engineering & NLP teams


Understanding the Current Issues

User Groups & Research Methods:

  • Current customers: Net Promoter Score (NPS) survey, customer feedback, and data analysis

  • Potential markets and users: competitor analysis, workshops, and offline interviews


Why do we want to improve the kids' experiences? 

  • Children account for a significant proportion of our user base, and their retention rate is higher than the overall user retention rate. 29.7%-40.6% of current users have children, and 44.3%-57.7% of new users (who consumed content with children's resource tags in the past 21 days) have children, indicating a growing market among families with children. The child user retention rate is around 7%-10% higher than the overall user retention rate.


  • Poor experiences reported by both children and parents indicate substantial room for improvement.

Three Directions for Improvements

After conducting several kinds of user research and analyzing the usage data, I identified three key aspects that can help improve children's and parents' experiences with our VUI systems.

01 Child Protection

Parents believe the most basic function should be Child Protection. They want the product to provide a safer environment and stimulation that can improve children's mental health.

02 Resource Development

Parents strongly requested Kids features for study and entertainment. They hope the products can help their children improve concentration while providing rich, high-quality resources.

03 Emotional Intelligence

Parents hope that the XiaoAI assistant can become a reliable partner that can accompany and guide their children.


I focused on improving the emotional intelligence of the products for three main reasons:

01 Marketing Value

Strengthen the emotional brand image of the product.

02 Product Differentiation

No competitor product in China offers a similar feature. XiaoAI would be the first to take this step.

03 Improve UX

Meet parent users' needs and increase their satisfaction.

Target Users.

Pain Points


  • feel somewhat alienated by the adult voices

  • eager to express emotions and make friends with our voice assistants


  • wonder if the voice assistants can really be a good friend or a role model for children

  • worry about the unhealthy content for young kids

User Goals


  • feel closer and get along with voice assistants

  • eager to express emotions and make friends with our voice assistants


  • hope to foster good habits and positive behaviors in their kids 

  • hope the assistant limits itself to child-friendly information and activities

Product Strategy.

To help children build a stronger relationship with our assistants and to have a positive impact on children's behavior, I focused on two things:

01 Child Voice

  • Provide a child voice option so that children feel closer to our voice assistants.

02 Behavior Guide & Emotional Support

  • Provide feedback with positive reinforcement.

  • Detect cues in voice data to recognize children's negative emotions, and provide interventions to support their real-time emotional needs.

Designing Child Voice.

Surveys on Child Voices

How did we find the best voice for children?

  • Select & Compare: Selected 9 voices from 19 options based on tone, speed and rhythm.

  • Narrow down to 6 options: Our UX and product team conducted a blind listening test and selected 6 of the 9 voices.

  • Questionnaire: Conducted two surveys - one for parents, and one for children in the 3-6 and 7-12 year-old groups (50+ total samples).

  • Data Analysis: Analyzed the traits of the voices that were widely accepted by users, and found out which voice users liked most and why.

  • Conclusion: Weighed the opinions of both children and parents, the universality of each voice, and any unacceptable voice traits to select the best voice.

Key Findings

  • Key traits of the best voice: clear, crisp, cute, sweet, friendly, neutral and slightly girl-like, simple, moderate speed, less childish

  • Gender and age group had a slight effect on voice preferences. Therefore, we needed to consider the universality of the selected voice.

  • Both parents and children preferred a girl-like voice. The traits that both parents and children preferred were clear, crisp, friendly, cute and less childish.

  • Children liked a voice that sounds like an older sister but perceived the voice assistants as their peers or younger friends. This suggests that 3-12 year-old children are at a stage where self-esteem is developing - they want to be great, and at the same time they like to have great peers.

Designing Better Child Voice

Only a few competitor products provide child voice options, and all of those options are completely synthetic and artificial, sounding neither realistic nor attractive.

Therefore, we decided to record the voice of a real child and trained an AI model to clone it. The resulting voice sounds very realistic and emotionally expressive.

Child Voice - Puff

Designing Emotionally Intelligent Conversations.

Conversation Design Process

[Figure: conversation design process]

Three Strategies to Design Emotional Conversations

By analyzing anonymized high-frequency queries from child users, we observed that children often tried to express their emotions to the assistants. This gave us a good opportunity to guide children's behavior during the conversations:


Reinforce Positive Behaviors


Discourage Negative Behaviors


Provide emotional support for children who are in extreme anxiety or sadness

Designing Conversations

I categorized children's emotional conversations with our assistants into four types. Depending on the user's query, the assistant responds with positive reinforcement, discouragement, or emotional support and guidance.

[Figure: conversation design]
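
As a rough illustration of how a query could be mapped to one of the three response strategies, here is a minimal sketch. It does not reproduce the four conversation types from the design, and it is not the production logic: the `Strategy` enum, the `choose_strategy` function, and every cue phrase are hypothetical, and simple phrase matching stands in for real emotion recognition.

```python
from enum import Enum


class Strategy(Enum):
    REINFORCE = "reinforce positive behavior"
    DISCOURAGE = "discourage negative behavior"
    SUPPORT = "provide emotional support"
    NEUTRAL = "no emotional intervention"


# Hypothetical cue phrases, checked in order. Emotional support comes first
# so that signs of distress take priority over the other strategies.
CUES = [
    (Strategy.SUPPORT, ("i'm sad", "i am sad", "i'm scared", "i am scared")),
    (Strategy.DISCOURAGE, ("stupid", "i hate you")),
    (Strategy.REINFORCE, ("thank you", "i finished my homework")),
]


def choose_strategy(utterance: str) -> Strategy:
    """Pick a response strategy by matching cue phrases in the utterance."""
    lowered = utterance.lower()
    for strategy, phrases in CUES:
        if any(phrase in lowered for phrase in phrases):
            return strategy
    return Strategy.NEUTRAL
```

For example, `choose_strategy("Thank you, Puff!")` selects `Strategy.REINFORCE`, while an utterance with no matching cue falls back to `Strategy.NEUTRAL`.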


After completing the design of conversations, I collaborated with the NLG teams to test and configure the conversations on our NLP platform.

  • Technical Platform: via NLG Generation and Configuration Platform [Internal Tool]

  • Implementation Logic: TTS responses for queries that contain commands are configured in the specific domains (e.g. music, station, videos), while TTS responses for queries with no request are configured in the Chat domain.
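
The implementation logic above can be sketched as a small routing function. This is an illustrative sketch only, not the internal platform's API: the `Query` type, both function names, and the keyword table are hypothetical, and a real system would rely on intent classification rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table standing in for real intent detection.
# The domain names follow the examples in the text (music, station, videos).
COMMAND_KEYWORDS = {
    "music": ("play a song", "sing"),
    "station": ("radio", "station"),
    "videos": ("watch", "cartoon"),
}

CHAT_DOMAIN = "chat"


@dataclass
class Query:
    text: str


def detect_command_domain(query: Query) -> Optional[str]:
    """Return the domain whose keywords appear in the query, if any."""
    lowered = query.text.lower()
    for domain, keywords in COMMAND_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return domain
    return None


def route_tts_domain(query: Query) -> str:
    """Queries with a command go to that domain; all others go to Chat."""
    domain = detect_command_domain(query)
    return domain if domain is not None else CHAT_DOMAIN
```

With this table, `route_tts_domain(Query("I want to watch a cartoon"))` resolves to the videos domain, and a query with no request, such as "I feel sad today", falls through to the Chat domain.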


  • Identifying a good product direction is crucial. Sometimes it is more important than finding the product strategy and solutions.

    • Real-world cases are usually driven by business value. In the tech and internet industry, most products undergo frequent iterations, but each iteration requires a lot of resources and effort from different teams and departments. Therefore, it is critical to decide which way to go and prioritize the goals accordingly.

    • The greatest impact comes from finding the point that maximizes both commercial success and the power of user-centric design while staying within our current resource and technical limits.

  • Informed by research and data, we make better decisions.

    • We now have efficient enterprise tools for drawing valuable conclusions from big datasets. If we analyze the data properly, we can reach rational and reliable inferences more efficiently.

  • Pushing the boundary - it's difficult but worthwhile.

    • At Xiaomi, each team has its own goals (i.e. OKRs) and limited resources. Projects that require a lot of experimentation are carefully evaluated. I had to provide more supporting data and findings to explain why we should start building these innovations (rather than taking care of the most basic UX problems first) and to motivate other teams to prioritize these requests. I am immensely grateful that other teams eventually agreed to support this direction even though it was the first of its kind in our department.

Next Steps.

  • Add visual and voiceprint recognition to accurately identify individual children's interactions with our assistants.

  • Integrate analysis of children's content consumption on the XiaoAI app in order to provide better conversation feedback for children.

  • Enrich the emotional intelligence-driven conversations by developing multi-turn dialogs to build stronger user-agent connections.

  • Apply these emotional features to more devices (i.e. new product models)
