Artificial Intelligence “AllyAssist”

Designing and building an AI to help users navigate their sleep concerns through coaching and conversation


The ultimate objective of this experiment was to validate that users want and will use a chatbot for something as personal as sleep, that they will find it useful, and that they would invest in coaching. User opinion was the most important source of feedback. We also wanted to hear which extra features users might find valuable in an application like this, so we could set direction for the future. Getting this user validation was the most important part of the project build.


•  Determine user preference for conversing with an artificial intelligence.
•  Learn whether users will talk or text openly with a chatbot about their sleep issues in order to receive tips.
•  Assess whether we can create a meaningful and satisfying conversation that improves sleep.
•  Gather user preferences on how insights and coaching are presented (future features).
•  Exercise the application under controlled test conditions with representative users; use the data to assess whether user goals regarding an effective, efficient, and well-received user interface have been achieved.
•  Establish baseline user-performance and user-satisfaction levels for the interface as a reference for future usability evaluations.

Multiple personalities were developed to hold this conversation with our users, and formal testing was used to validate the AI's performance.

Despite the small sample, we gained considerable insight into both the digital preferences of our target user groups and how we should approach them to build a meaningful AI.

In three months of work we were able to quickly build and test an experiment that will help shape the digital strategy for Philips Sleep for years to come. We also proved that a small team with dedication and vision can gain speed, creating and testing fast.

All 20 of our users completed the conversation. As is expected with new and experimental applications and concepts, only 3 users encountered similar critical errors, ranging from a frozen application to recommendations not loading. Sentiment throughout was predominantly positive: in first impressions, in how good and natural the conversation felt, and in how well understood users felt.

The average app rating was 6.7 out of 10, with most users scoring it a 7 or 8.


Role: Creative Lead
Timeframe: Alpha – 3 months
Beta – 6 months
Team: 6
Budget: ~ 1M EUR

The Messy Work