Toyota Financial Services required a website for the Norwegian market which would improve conversions of car purchases within their region.

A design of the site already existed; however, it had not been designed for a particular cultural audience. This non-adapted version is referred to as version B. Guidance was provided to the design team on how to design a new version of the website that would provide a better user experience, drawing on best-practice guidance from scholars including Geert Hofstede and Aaron Marcus. This new, culturally adapted version is referred to as version A.

As two versions of the website were to be compared, a split-testing methodology was adopted, specifically an A/B test, with the primary research question being “Which version of the ‘purchase plan review’ page do Norwegians prefer: version A or B?”. The hypothesis was that version A would be preferred, based on cultural factors [1], [2], [3].

Rather than rely on the think-aloud protocol in isolation, eye-tracking was also used in a supplementary fashion, primarily because: (a) it allowed the retrospective think-aloud protocol to be used; (b) real-time gaze tracking could support the concurrent think-aloud protocol more effectively if there were a decision to switch approaches; and (c) heat maps and gaze plots could be analysed with descriptive statistics.

Why prefer the retrospective think-aloud protocol in the first instance? Why not just use the concurrent think-aloud protocol and dispense with eye-tracking? Using the retrospective think-aloud protocol with eye-tracking, in place of the concurrent think-aloud protocol, reduces the cognitive load on end users while they are testing a website. It also makes for a more natural interaction, as most users do not think aloud when they are using a website outside of a research setting.

One key aspect of user interface design affected by culture was whether information on a given page should be visible initially or shown in summary form with the option to expand it through a method such as progressive disclosure. The two dimensions within Hofstede’s cultural dimensions that were relevant to Norwegian users in this regard were power distance and long-term orientation. Marcus equates low power distance with less structured information (i.e. not collapsed) and high power distance with more highly structured information. Note how similar Norway and Germany are in this regard. China is included as a contrasting culture; there, a summary view with progressive disclosure would likely be more popular.

Low long-term orientation equates to content focussed on truth and certainty (i.e. displaying information up front in an expanded form, not collapsed in an ambiguous form).

The user trials were conducted on site in Norway under controlled conditions. Although the sessions only lasted for approximately 60 minutes, ergonomic standards were observed with regard to the participants’ use of the facilities (ISO 9241-5:1998; ISO 9241-6:1999; ISO 11226:2000). The testing was conducted using a laptop PC with an Intel i7 processor and a built-in LCD panel (17 inch, resolution 1400×600, 60 Hz refresh rate, 32-bit true colour), running Microsoft Windows 7. The input devices used were a standard UK keyboard and mouse. The computer was connected to the Internet through a wireless local area network, and the site was viewed in Internet Explorer (version 10.0) and Chrome (version 64.0.3282.167). An adjacent observation screen was used to monitor the user trials. A live feed from the participant’s computer was captured using an external video capture card (Elgato HD 60S). The sessions were recorded using Tobii Studio on the main usability testing machine, and were also streamed and recorded to YouTube via a second, high-spec laptop which consumed and broadcast the feed from the video capture card. This second laptop had an Intel i7 Kaby Lake processor, 16 GB RAM, and an Nvidia GTX 1050 graphics card.

Participants were customers looking to buy a new car within the next 6 months or customers who had bought a car in the last 6 months. The key characteristics of these participants were as follows:

  • Able to speak, read, and write in English (CEFR level C1 or C2 i.e. advanced to proficient).
  • 25 to 50 years old.
  • Able to use a desktop computer, tablet, and smartphone proficiently.
  • Experienced online shopper. Spends at least 500 to 2,000 NOK (approximately 50 to 200 GBP) online each month.
  • Holds a driving licence and drives at least once a week.
  • DOES NOT work in the fields of design, UX, product development, marketing, psychology, or any related field.

From the eye tracking data we are able to generate descriptive statistics by defining areas of interest within the webpages.

The results on the left show the mean fixation count for each participant in the defined areas of interest. The areas of interest are defined as the expanded page on version A and the collapsed version with accordions on version B. The software automatically calculates the number of fixations within those defined areas. We can also generate other descriptive statistics within the software, such as time to first fixation.
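To illustrate how area-of-interest statistics of this kind can be derived, here is a minimal Python sketch. It is not how Tobii Studio computes its metrics internally; the `Fixation` and `AreaOfInterest` types, the rectangle coordinates, and the fixation data are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    """One fixation exported from an eye tracker (hypothetical format)."""
    x: float            # horizontal gaze position in screen pixels
    y: float            # vertical gaze position in screen pixels
    start_ms: float     # onset time relative to the page being shown
    duration_ms: float  # how long the gaze stayed in place


@dataclass
class AreaOfInterest:
    """A rectangular area of interest drawn over the page (hypothetical)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        # A fixation belongs to the area if its centre lies inside the rectangle.
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def fixation_count(fixations: List[Fixation], aoi: AreaOfInterest) -> int:
    """Number of fixations that landed inside the area of interest."""
    return sum(1 for f in fixations if aoi.contains(f))


def time_to_first_fixation(fixations: List[Fixation],
                           aoi: AreaOfInterest) -> Optional[float]:
    """Onset time of the earliest fixation inside the area, or None if none."""
    onsets = [f.start_ms for f in fixations if aoi.contains(f)]
    return min(onsets) if onsets else None


# Made-up example data: four fixations, two of which fall inside the area.
plan_details = AreaOfInterest("plan_details", left=100, top=300, right=900, bottom=700)
fixations = [
    Fixation(x=50, y=40, start_ms=0, duration_ms=180),
    Fixation(x=400, y=350, start_ms=220, duration_ms=240),
    Fixation(x=650, y=500, start_ms=500, duration_ms=310),
    Fixation(x=950, y=720, start_ms=860, duration_ms=150),
]

print(fixation_count(fixations, plan_details))          # 2
print(time_to_first_fixation(fixations, plan_details))  # 220
```

Averaging `fixation_count` across participants for each version would give a per-version mean fixation count comparable to the one reported here.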

A lower fixation count is better in this context: it means reduced cognitive load for the user and indicates that they have found what they are looking for, rather than having to keep scanning the page for relevant information.

Analysing a time segment of the saccadic activity on the actual page for version A shows a typical number of fixations for this type of page. The user scanned the page from left to right and down in a typical Z pattern, then did an upward rescan to ensure they had not missed anything.

On version B this user started in a typical way, moving into a Z pattern, but then began repeatedly rescanning back and forth, trying to make sense of the accordions, to understand why no information was presented at the top level, and to work out whether the accordions could be expanded or not. There was a lot of retracing.

If we look at the mobile version, the descriptive statistics again support the hypothesis that version A is objectively the more usable version.

The interesting thing here is that one of the users, the one indicated in purple, was overly talkative and analysed the page despite being advised to just use the site without verbalising their actions. Though we try to filter out participants from UX and design fields to avoid this, inevitably you will come across someone who slightly skews your means, particularly with low numbers of participants (only 3 participants could be tested for this version). Even so, the data still support the hypothesis.
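The effect of a single atypical participant on a small sample can be sketched in a few lines of Python. The fixation counts below are invented for illustration and are not the study's data; the `summarise` helper is likewise hypothetical. Reporting the median alongside the mean is one common way to show how much an outlier is pulling the average.

```python
from statistics import mean, median


def summarise(counts):
    """Return the mean and median of a list of per-participant fixation counts."""
    return {"mean": mean(counts), "median": median(counts)}


# Invented fixation counts for three participants; the third behaves
# atypically and produces far more fixations than the other two.
with_outlier = [40, 44, 96]
without_outlier = [40, 44]

print(summarise(with_outlier))     # mean 60, median 44
print(summarise(without_outlier))  # mean 42, median 42.0
```

With only three participants, one unusual session moves the mean from 42 to 60 in this made-up example, while the median barely changes.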

Analysing a time segment of the saccadic activity on the actual page for versions A and B shows, as can be seen, many more fixations for version B.

A second problem, and point of contention, which this research sought to address related to the header of the webpage. There were multiple design iterations and discussions as to how the header should look and whether it would be disregarded, perhaps as an advert, or whether the user would in fact fixate on this element and find some value in seeing their chosen car model. Hypothesis B stated that ‘The header image will be of low importance, and not fixated upon, in Norway’. Again, this hypothesis was based on research on cultural factors by previous scholars [1], [2], [3]. The main driving factor behind this hypothesis was derived from the ‘masculinity’ cultural dimension. Norway scores just 8 on this dimension. A low score on this dimension equates to low importance being placed on visual aesthetics and on appealing to unifying values through graphics. In comparison, Germany and China, for example, have a high score on this dimension so, in theory, users there would pay more attention to the header graphic. ‘Masculinity’ is a somewhat outdated term nowadays, and a low score on this dimension does not imply any negative connotations with regard to gender. Hofstede uses the term ‘masculinity’ to refer to the level of competitiveness in a society.

The eye-tracking cumulative heat map (a combination of all 10 users’ saccadic activity) showed, interestingly, that the image was passed over and not fixated upon by any significant number of users.

Viewing as an opacity map, which only shows the areas that a user looked at, confirms this further.

In addition to eye-tracking data and descriptive statistics for quantitative data, and the think-aloud protocol for qualitative data, the System Usability Scale was used to obtain a usability score for the website, providing further quantitative data. The System Usability Scale, often referred to as the SUS, is an inexpensive and proven way of measuring the overall usability of a given service or product. The scale was created by John Brooke in 1986 and has stood up to a battery of tests over the years, proving itself a reliable measure. In our testing the improved version of the site scored a B grade, where most sites commonly score below this, further underlining the effectiveness of the culturally adapted version of the site.
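For readers unfamiliar with how a SUS score is obtained, the sketch below follows the standard scoring procedure for the ten-item questionnaire: odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` and the example responses are hypothetical and are not data from this study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    `responses` lists the answers to items 1-10 in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    # Scale the 0-40 raw total to the 0-100 SUS range.
    return total * 2.5


# Invented responses from a single participant, for illustration only.
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))  # 75.0
```

A study-level score is normally the mean of the individual participants' scores, which is then compared against a benchmark or grading scale.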

References

  1. Hofstede, Geert. Culture’s consequences: Comparing values, behaviors, institutions and organizations across nations. Sage Publications, 2003.
  2. Marcus, Aaron, and Valentina Johanna Baumgartner. “A visible language analysis of user-interface design components and culture dimensions.” Visible Language 38.1 (2004): 2.
  3. Cyr, Dianne. “Modeling web site design across cultures: relationships to trust, satisfaction, and e-loyalty.” Journal of Management Information Systems 24.4 (2008): 47-72.