OK, so Amazon is the largest online shopping platform in the world, and even though its recent launch in Australia was a bit underwhelming given the restrictions placed on it by legislation, it’s still growing fast. We wondered, though: could Amazon technology enhance the offline retail shopping experience for consumers in Australia?


We recently had the opportunity to find out by developing a proof of concept application. The question: how could we use Amazon Web Services technology to help customers navigate a shopping centre and make their offline shopping experience more enjoyable? The brief was an in-centre kiosk where a customer could converse with a virtual concierge that would provide map route information to help them find specific stores or specific items, like donuts. We would enhance this experience by having the kiosk display a QR code that a customer could scan with a mobile device to receive the map information, effectively turning their mobile phone into a donut finding machine.

Fascinated by the recent launch of Amazon’s AR/VR offering, we dug into Sumerian to see if it could deliver our vision. Within a day we had created a scene in which Christine, our Sumerian Host, was integrated with the existing Amazon Lex and Amazon Polly services to assume the role of a hotel concierge, just like the online tutorial had prophesied. Our host could receive voice commands processed by Lex and reply with hard-coded voice responses synthesised by Polly. We even threw in some of the provided 3D assets to give our host a roof over her head and a desk to stand behind (we couldn’t figure out how to get the host to sit). Still, this looked very promising.

Creating the solution

Q.1 - So, how do we get Sumerian to interact with a customer?

To create the conversational interface for potential shoppers to interact with, we built upon our hotel booking application - which was basically a glorified hello world app - with a more targeted chatbot experience geared toward customers in a shopping centre. Using Lex ‘slots’ we were able to store key pieces of information throughout a customer’s conversation with the Sumerian host for further processing.
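To make the idea of slots concrete, here is a minimal sketch of pulling the filled slots out of an event shaped like the ones Lex hands to a fulfilment function. The intent name `FindStore` and the slot names `StoreName` and `Item` are invented for illustration; they are not the names used in our bot.

```python
from typing import Dict

# Hypothetical Lex-style event. The intent and slot names are assumptions
# made up for this example, not taken from the real chatbot.
SAMPLE_EVENT = {
    "currentIntent": {
        "name": "FindStore",
        "slots": {"StoreName": None, "Item": "donuts"},
    },
    "sessionAttributes": {},
}


def filled_slots(event: dict) -> Dict[str, str]:
    """Return only the slots the customer has supplied so far."""
    slots = event.get("currentIntent", {}).get("slots") or {}
    # Slots the customer has not filled yet arrive as None, so drop them.
    return {name: value for name, value in slots.items() if value}


if __name__ == "__main__":
    print(filled_slots(SAMPLE_EVENT))
```

Unfilled slots come through as empty values, which is why the helper filters them out before any further processing.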

Q.2 - How do we process the information coming in from Lex and derive the map routing logic?

To answer this second question, we set up a Lambda function that would receive the slot information from Lex. From within the Lambda function we called an external mapping API to translate the slot values into a destination, derive our map routing logic, and return the result to Lex.
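A rough sketch of what such a fulfilment function could look like is below. It assumes the Lex (V1-style) event and response shapes, and `lookup_route` is a hypothetical stand-in for the external mapping API, with a hard-coded directory instead of a real network call.

```python
from typing import List


def lookup_route(destination: str) -> List[str]:
    """Hypothetical stand-in for the external mapping API."""
    # Invented waypoints, purely for illustration.
    directory = {"donuts": ["kiosk", "escalator-2", "food-court", "donut-shop"]}
    return directory.get(destination.lower(), [])


def lambda_handler(event: dict, context: object = None) -> dict:
    """Read the slots from Lex, derive a route, and close the conversation."""
    slots = event.get("currentIntent", {}).get("slots") or {}
    item = slots.get("Item") or ""  # "Item" is an assumed slot name
    route = lookup_route(item)

    if route:
        text = f"I found {item}. Follow the map to get there."
    else:
        text = "Sorry, I could not find that in this centre."

    return {
        # Session attributes are string-to-string, so the route is joined.
        "sessionAttributes": {"route": ",".join(route)},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled" if route else "Failed",
            "message": {"contentType": "PlainText", "content": text},
        },
    }
```

Keeping the route lookup behind its own function means the mapping provider can be swapped without touching the Lex-facing code.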

The challenge we faced next was getting the slot values and routing logic back out of the Lex chatbot and into the scene, because Sumerian doesn’t yet support this. We had to find a workaround. After looking through various AWS docs and online resources, unsure of the best way forward, I turned to my colleagues. After discussing the problem with developers of varying backgrounds and skill sets, it was the IoT devs that put me on the right track. We extended the Lambda function that was already receiving slot information from Lex so that it also posted to an IoT device shadow within AWS. This gave us a place to store information that the Sumerian application could read. We then had Sumerian poll the IoT device shadow state at set intervals to receive the slot information as well as the map routing logic and some basic speech patterns. We would later add other pieces of stateful data that would allow Sumerian to respond to different events and even external applications.
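The workaround can be sketched in two halves: the Lambda side builds a shadow update document, and the kiosk side polls and reacts only when the shadow has changed. This is an illustration under assumptions, not our production code. The field names (`slots`, `route`, `speech`) and the choice of the `desired` section are made up for the example, and the shadow fetch is injected as a plain function so the sketch runs without AWS credentials.

```python
import json
from typing import Callable, Optional


def build_shadow_update(slots: dict, route: list, speech: str) -> bytes:
    """Build a shadow update document (field names are assumptions)."""
    document = {
        "state": {"desired": {"slots": slots, "route": route, "speech": speech}}
    }
    # From a Python Lambda, one way to send these bytes would be boto3's
    # "iot-data" client: update_thing_shadow(thingName=..., payload=...).
    return json.dumps(document).encode("utf-8")


class ShadowPoller:
    """Polls a shadow and reports its state only when the version changes."""

    def __init__(self, fetch_shadow: Callable[[], dict]) -> None:
        self._fetch = fetch_shadow
        self._last_version: Optional[int] = None

    def poll_once(self) -> Optional[dict]:
        shadow = self._fetch()
        version = shadow.get("version")
        if version == self._last_version:
            return None  # nothing new since the previous poll
        self._last_version = version
        return shadow.get("state", {}).get("desired", {})
```

In the kiosk, `poll_once` would be called on a timer. Comparing the shadow's version number keeps the scene from re-triggering the same speech or map on every poll.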

Finally, we added an HTML element to Sumerian, which enabled us to display a map with the routing information generated by the chatbot/Lambda combo. This meant that we could take the input from Lex, derive the map information, and have it displayed in the Sumerian scene while the host spoke using Polly in a “kiosk” type environment. What was displayed on the screen at any given time depended on the state of the IoT device shadow, which triggered various state machines within Sumerian.

Q.3 - How did we get the map onto the customer’s phone to make it useful?

The last and possibly most important piece of the puzzle was to figure out how we could take the map generated using the mapping source API and Sumerian, and put it in the hands of the customers.

To do this we built a simple React JS web application designed for mobile and set it up to display the map. We then allowed the application to accept query parameters that would be used to pass the routing information from Sumerian and display the route on the mobile map. Back in Sumerian, we generated and displayed a QR code in the HTML view, which the customers could then scan with their phone, opening up a map in their browser with a path routed for them.
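The hand-off through the QR code boils down to encoding the route into a link and decoding it again on the phone. The sketch below shows that round trip in Python for brevity, even though the real pieces were the Sumerian scene and a React app. The URL and the parameter names `from`, `to` and `via` are placeholders invented for the example.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder address, not the real deployment.
MOBILE_MAP_URL = "https://example.com/map"


def build_map_link(start: str, destination: str, waypoints: List[str]) -> str:
    """Kiosk side: build the link that gets rendered as a QR code."""
    query = urlencode(
        {"from": start, "to": destination, "via": ",".join(waypoints)}
    )
    return f"{MOBILE_MAP_URL}?{query}"


def read_route(link: str) -> dict:
    """Mobile side: recover the route from the query parameters."""
    params = parse_qs(urlsplit(link).query)
    via = params.get("via", [""])[0]
    return {
        "from": params.get("from", [""])[0],
        "to": params.get("to", [""])[0],
        "via": [stop for stop in via.split(",") if stop],
    }
```

Carrying the route in the link itself means the phone needs no session with the kiosk: scanning the code is enough to draw the path.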

Given that we had already done the work of setting up an IoT device shadow within AWS, we took our React application a step further and enabled it to update the device shadow state so that the Sumerian application was aware that the map had successfully loaded on a mobile device and our Sumerian host could wave and say “goodbye”.
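One way to picture how the shadow drives the kiosk is a small function that maps the current shadow state to what the scene should do next. The view names and the `route` and `mapLoadedOnMobile` fields are hypothetical; in Sumerian this decision lived in the scene's state machines rather than in Python.

```python
from enum import Enum


class KioskView(Enum):
    IDLE = "idle"          # host waits for a customer
    MAP = "map"            # map and QR code are on screen
    FAREWELL = "farewell"  # host waves and says goodbye


def select_view(shadow_state: dict) -> KioskView:
    """Pick the kiosk view from the (assumed) shadow fields."""
    # The mobile app sets this flag once the map has loaded on the phone.
    if shadow_state.get("mapLoadedOnMobile"):
        return KioskView.FAREWELL
    # A non-empty route means the chatbot has found a destination.
    if shadow_state.get("route"):
        return KioskView.MAP
    return KioskView.IDLE
```

Checking the mobile flag first means the farewell takes priority even while a route is still stored in the shadow.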

What we learned about Sumerian

Using Sumerian as a state machine:

What we found through our process was that Sumerian works really well as a collection of state machines that can call APIs, converse with a customer, and toggle the display of HTML views and QR codes. However, the lack of source control made it difficult to house the more complicated code that determined what our states should be. This was amplified by the inability to wire Sumerian up to Continuous Integration and Continuous Delivery pipelines, which could have streamlined the development process and added further benefits such as automated code quality checks and automated testing. From what we saw, basic testing of the behaviour within a Sumerian scene was only possible when performed manually.

Initially we had all our code in Sumerian tied into the different state machines. However, this became problematic when more than one developer was tweaking the project. Firstly, only one developer can have the Sumerian scene open at any given time. That wasn’t a deal breaker, as we could always pair program or coordinate our development efforts. But the real struggle was with version control and code deployment. Sumerian only offers the ability to create labelled snapshots of the scene as a whole (3D objects, code and state machines), which was somewhat helpful, but not nearly as effective as dedicated source control tools like Git. Moving our code out of Sumerian and into source-controlled mini libraries that Sumerian could reference proved to be the best way forward. Even so, we still had to manually integrate those libraries within the Sumerian scene whenever changes were made.

These issues are offset by the features Sumerian delivers, and the value added by the technology is certainly worth exploring. The new types of data that can be collected during a customer’s conversation with a Sumerian host open the door for the virtual concierge to evolve into a smarter solution that uses Artificial Intelligence or Machine Learning to further improve the shopping experience. In fact, we were able to very quickly add basic facial detection and sentiment analysis using Amazon Rekognition, enabling us to tailor the experience for individual customers without the clumsiness of traditional login interactions.

We’re learning that Amazon could indeed disrupt its own online retail shopping model, using an offline model supported by its own technology… theoretically anyway. Sumerian, combined with existing AWS technologies, gave us a low barrier to entry for a web-based virtual shopping centre concierge and enabled us to build a working prototype in a short period of time. If you’re interested in exploring Sumerian yourself, head on over to the website and check out the documentation and handy tutorials, which have already grown in number since we started playing with one of the latest offerings from AWS.