
Identify a model that can predict consumer preferences.

A Data Innovation Challenge hosted by UNILEVER based on formulation data and customer survey results.


overall opinion score prediction challenge

On any given day, two billion people use Unilever products to look good, feel good and get more out of life. Through more than 400 brands, 14 of which generate sales in excess of €1 billion a year, Unilever supports many long-standing social missions, including Lifebuoy’s drive to promote hygiene through handwashing with soap and Dove’s campaign for real beauty.

This challenge requires participants to identify a model that can predict consumer preferences (the Overall Opinion Score) based on formulation data and consumer survey results. To support the challenge, Unilever will provide participants from DEXTRA’s data innovation community with detailed, comprehensive datasets that have never before been made public.

why you should join

As one of the largest consumer goods companies in the world, Unilever constantly strives to provide better products. Getting new products to market involves multiple stages, some of which can be both time-consuming and resource-intensive. One example is consumer research, including the activities that lead to a customer’s Overall Opinion Score, an indicator of how much consumers like the product.


Being able to predict the Overall Opinion Score will enable Unilever to reduce both the number of survey questions and the number of testing iterations a product has to undergo, saving considerable time and human capital in bringing new products to market.


If you or any of your immediate relatives are employees and/or affiliates of Procter & Gamble, you are not eligible to join the Unilever Overall Opinion Score Prediction Challenge, access the website pages and data relevant for that purpose, and/or sign this confidentiality agreement. If you have questions regarding this policy, please contact mail@dextra.sg.


Challenge Statement

Given the formulation and consumer survey data, can you predict the Overall Opinion Score of a product? Can you also identify the most important attributes and suggest improvements to the survey that would reduce the number of questions or the survey headcount?
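One simple starting point for the second question, shown here only as an illustration and not as the approach Unilever expects, is to rank attribute questions by the strength of their linear correlation with the Overall Opinion Score. The column names Q1.Attribute and Q2.Attribute and all values below are hypothetical; the real columns are described in the challenge's data documentation. Correlation ignores redundancy between attributes, so treat the ranking as a first look rather than a final answer.

```r
# Hypothetical toy survey data: two attribute questions plus the target score.
survey <- data.frame(
  Q1.Attribute = c(1, 2, 3, 4, 5, 6, 7),
  Q2.Attribute = c(4, 4, 3, 5, 4, 3, 4),
  Overall.Opinion = c(1, 2, 3, 4, 5, 6, 7)
)

# Rank attributes by absolute Pearson correlation with the target column.
rank_attributes <- function(df, target = "Overall.Opinion") {
  attrs <- setdiff(names(df), target)
  scores <- sapply(attrs, function(a) abs(cor(df[[a]], df[[target]])))
  sort(scores, decreasing = TRUE)
}

importance <- rank_attributes(survey)
print(importance)
```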

Evaluation Metric

The evaluation metric for this challenge is Mean Squared Error (MSE).
Note: Submissions are scored in real time against the evaluation dataset. Participants can resubmit their entries until the closing date.
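MSE is the average of the squared differences between actual and predicted scores: MSE = (1/n) × Σ (actual_i − predicted_i)². A minimal R sketch of the metric, using made-up scores on the 1 to 7 Overall Opinion scale, looks like this:

```r
# Mean Squared Error: average of squared differences between actual and predicted scores.
mse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean((actual - predicted)^2)
}

# Toy example with made-up scores.
actual <- c(5, 6, 3, 7)
predicted <- c(4.5, 6, 4, 6)
print(mse(actual, predicted))  # (0.25 + 0 + 1 + 1) / 4 = 0.5625
```

Because errors are squared, one prediction that is off by 2 points costs as much as four predictions that are each off by 1.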


Period: 60 days
Start: Thursday, 11 December 2014
End: Sunday, 8 February 2015


1st Prize: SG$10,000
2nd Prize (4 winners): SG$2,000 each


Related News

Unilever Challenge Presentation Finale
31st March 2015, DEXTRA Blog Post

Unilever Challenge Results
12th February 2015, DEXTRA Blog Post

Unilever Overall Opinion Score Prediction Challenge
4th December 2014, DEXTRA Blog Post

about the host



The Unilever brand, found in consumer products in millions of homes across 150 countries, is a trusted name in nutrition, hygiene and personal care. Throughout its history, Unilever has been adding vitality to the lives of consumers. From comforting soups to warm a winter’s day, to sensuous soaps that make you feel fabulous, our products help people get more out of life.

We’re constantly enhancing our brands to deliver more intense, rewarding product experiences. We invest nearly €1 billion every year in cutting-edge research and development, and have five laboratories around the world that explore new thinking and techniques to help develop our products.

Today, many Unilever brands are household names that have become part of daily life in Singapore homes.

The company has eagerly met the challenge of building brand awareness, evolving its marketing and promotional techniques to respond sensitively to the complex cultural nuances of Singapore society. Unilever has become renowned for the role it plays in the marketing of consumer goods in Singapore.


Efforts in understanding local consumers have brought about some of the top brands in the market today, such as Clear, Dove, Lifebuoy, Rexona, Sunsilk, Ben & Jerry’s, Knorr, Lipton and Wall’s, which add vitality to consumers’ lives.

A leader in research and development, Unilever has kept up with the pace of change by continuously striving to reconnect with consumers, focus on its brand portfolio and pioneer new channels, drawing on strong roots in local markets and first-hand knowledge of culture.

As a responsible corporate citizen, Unilever actively participates in local community activities, enabling education for underprivileged children, raising environmental awareness and using the special connection with various brands to attract public attention and support for charity initiatives.

The company strives for excellence within its many roles in Singapore today, as an employer, marketer, development partner and innovator. Unilever will continue to embrace new consumer expectations, adding vitality to life!

data and resources



Formulation data. The percentage weight of each ingredient in a product. There are 122 rows (one per product), each with a product code used to merge with the consumer survey data.


Consumer survey results (including attribute questions and the Overall Opinion Score) for all products. Note that there are about 200 consumer survey tests for each product. There are 25,721 rows (survey results), each with a product code used to merge with the formulation data.


Evaluation data. A total of 8,024 rows of consumer survey results with formulation data, but without the Overall Opinion Score. You are to predict the score and submit this file.


Sample submission file format. Note that you need to submit only the ID and Overall.Opinion columns.


Documentation for data attributes. A description of each column in the provided data.


Zipped folder of CSV exports from Formulation.Rdata, Training.Rdata and Submission.Rdata for those who want to work with tools other than R.


Report template for participants.
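As a rough sketch of how these files fit together, the R code below attaches formulation data to each survey row via the product code, fits a model, and writes a submission containing only the ID and Overall.Opinion columns. The data frames are tiny toy stand-ins; the names Product.Code, Ingredient.A, Ingredient.B and Q1.Attribute are hypothetical placeholders for the columns described in the data documentation, and the linear model is only a placeholder for your own.

```r
# Toy stand-ins for the real datasets (hypothetical column names and values).
formulation <- data.frame(
  Product.Code = c("P1", "P2"),
  Ingredient.A = c(12.5, 8.0),  # percentage weight of an ingredient
  Ingredient.B = c(3.0, 4.5)
)
training <- data.frame(
  Product.Code = c("P1", "P1", "P2"),
  Q1.Attribute = c(5, 6, 3),
  Overall.Opinion = c(6, 7, 4)
)
evaluation <- data.frame(
  ID = c(101, 102),
  Product.Code = c("P2", "P1"),
  Q1.Attribute = c(4, 5)
)

# Attach the formulation columns to every survey row via the shared product code.
train_full <- merge(training, formulation, by = "Product.Code")
eval_full <- merge(evaluation, formulation, by = "Product.Code")

# Placeholder model: substitute your own.
fit <- lm(Overall.Opinion ~ Q1.Attribute, data = train_full)
pred <- as.numeric(predict(fit, newdata = eval_full))

# merge() sorts rows by the key, so reorder by ID before writing.
submission <- data.frame(ID = eval_full$ID, Overall.Opinion = pred)
submission <- submission[order(submission$ID), ]

# Only the ID and Overall.Opinion columns are submitted.
write.csv(submission, "submission.csv", row.names = FALSE)
print(submission)
```

By default merge() performs an inner join, so any survey row whose product code has no match in the formulation data is silently dropped; checking row counts before and after the merge is a cheap safeguard.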



What is the difference between Public Evaluation Score and Private Evaluation Score?

The accuracy of your submission is decided by your predictions’ performance on the provided test dataset. We have the answers for this dataset but are withholding them to compare with your predictions.


The Public Evaluation Score is what you receive after each submission. It is calculated using a statistical evaluation metric, which is Mean Squared Error in this challenge. However, your Public Evaluation Score is determined by only a fraction of the test dataset, usually between 25% and 33%. The scores shown on the leaderboard are Public Evaluation Scores and indicate relative performance during the challenge.


The Private Evaluation Score is calculated by comparing your predictions with all (100%) of the test dataset. You will never receive this score. Finalists are selected based on the Private Leaderboard.
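To make the distinction concrete, the R sketch below scores one set of simulated predictions twice: once on a subset of rows (the public score) and once on every row (the private score). How DEXTRA actually chooses the public rows is not stated here; a random 30% sample is an assumption, and all data is simulated.

```r
set.seed(42)  # reproducible toy example

# Simulated "true" scores and noisy predictions for a hypothetical 1,000-row test set.
n <- 1000
actual <- sample(1:7, n, replace = TRUE)
predicted <- pmin(pmax(actual + rnorm(n, sd = 1), 1), 7)

mse <- function(a, p) mean((a - p)^2)

# Public score: an assumed random 30% of rows.
public_idx <- sample(n, size = round(0.3 * n))
public_score <- mse(actual[public_idx], predicted[public_idx])

# Private score: all 100% of rows.
private_score <- mse(actual, predicted)

cat("Public:", public_score, " Private:", private_score, "\n")
```

The two numbers differ only because the public subset is a sample; a model tuned to squeeze the public number down can see that gap widen.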

Why do we need to split Public and Private Evaluation Scores?

The test dataset is split into public and private portions to ensure that the winning model is the most accurate one that also generalizes. If you tune your model solely on the data that gives you constant feedback, you risk a model that overfits to the specific noise in that data. One of the hard challenges in data science is avoiding overfitting, keeping your model flexible enough to perform well on out-of-sample data.

Can I submit more than one report for more than one model?

We will consider your latest submission when selecting finalists. However, if you have more than one model that you think is suitable, you can submit them separately; please email us to let us know which models you want to submit. There is no limit to the number of models you can submit, but you can make only 5 submissions a day. Finalists will be selected based on the quality of your report and your prediction performance.

Can I use additional or external data?

You can use additional or external data. However, please cite the source of that data in your submission report.

How can I export the Rdata file to csv?

Since many participants in our data science community use R as their primary tool, we decided to release the data in Rdata format. To export an Rdata file to CSV, you first need to install R and load the data.


Loading data into memory (Windows): load("path to file\\Submission.Rdata")


Loading data into memory (Mac/Linux): load("path to file/Submission.Rdata")


Exporting data to CSV: write.csv(NameOfDataObject, "NameOfCSV.csv", quote = FALSE, row.names = FALSE), where NameOfDataObject is the name of an object created by load(). load() returns these names, and ls() also lists them.


For more options, see the write.csv help page in R (?write.csv).
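If you would rather not look up object names by hand, the R sketch below exports every data frame stored in an .Rdata file to its own CSV, named after the object. It relies on load() returning the names of the objects it creates. The function name export_rdata_to_csv is hypothetical, and unlike the one-liner above it keeps write.csv's default quoting, which is safer if any text column contains commas. The demo builds a small .Rdata file in a temporary folder so the example runs on its own.

```r
# Export every data frame in an .Rdata file to a CSV named after the object.
export_rdata_to_csv <- function(rdata_path, out_dir = dirname(rdata_path)) {
  env <- new.env()
  object_names <- load(rdata_path, envir = env)  # load() returns object names
  written <- character(0)
  for (nm in object_names) {
    obj <- get(nm, envir = env)
    if (is.data.frame(obj)) {
      csv_path <- file.path(out_dir, paste0(nm, ".csv"))
      write.csv(obj, csv_path, row.names = FALSE)
      written <- c(written, csv_path)
    }
  }
  written  # paths of the CSV files created
}

# Self-contained demo: build a small .Rdata file, then export it.
demo <- data.frame(ID = 1:3, Overall.Opinion = c(5, 6, 4))
demo_path <- file.path(tempdir(), "Demo.Rdata")
save(demo, file = demo_path)
csv_files <- export_rdata_to_csv(demo_path)
print(read.csv(csv_files[1]))
```

For the challenge files you would call, for example, export_rdata_to_csv("path to file/Submission.Rdata").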

What is 'Overall.Opinion' score?

During the last stage of product testing, consumers give the product this “Overall Opinion Score”. Note that this is asked as the first question in the survey. In most cases, the question is as follows:


Taking everything into consideration, which of these phrases describes your overall opinion of the product?


Very poor (1), Poor (2), Neither poor nor fair (3), Fair (4), Good (5), Very good (6), Excellent (7)
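Since every true score lies between 1 and 7, moving a prediction that falls outside that range to the nearest endpoint can only reduce its squared error. The R sketch below applies this clipping to some made-up raw model outputs; the function name clip_to_scale is hypothetical, and mapping to the text labels is shown only for readability, not as a required submission format.

```r
# Text labels for the 1-7 Overall Opinion scale, in score order.
opinion_labels <- c("Very poor", "Poor", "Neither poor nor fair", "Fair",
                    "Good", "Very good", "Excellent")

# Clip raw model output into the valid score range.
clip_to_scale <- function(pred, lower = 1, upper = 7) {
  pmin(pmax(pred, lower), upper)
}

raw <- c(0.4, 3.2, 7.9)  # made-up raw predictions
clipped <- clip_to_scale(raw)
print(clipped)                          # 1.0 3.2 7.0
print(opinion_labels[round(clipped)])   # nearest label, for display only
```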

Are we allowed to merge with another team before 8 Feb?

Yes, you are allowed to merge with other teams before 8 Feb (the deadline). However, please be aware that once you or your team join another team, only the submissions from the newly formed team’s leader will be taken into consideration. Remember that accuracy (Evaluation Score) is not the only selection criterion: innovative ideas, ease of implementation and insights from the data are also considered.

What is the 'Comment' section at submission for, if we are providing a report in PDF?

We have provided the report template so that all participants’ reports are consistent and cover the same areas. However, each participant can be creative and provide more information in the Appendix section of the report. We assume participants will test multiple iterations of their algorithms and predictions before the deadline and before writing the final report, so you can use the “Comment” section to write a brief note about your current iteration for your own reference. Please note that only CSV and PDF files are accepted.



Launchpad@one-north, #04-03/04 79 Ayer Rajah Crescent Singapore, 139955