Rakuten Viki

Build a model to recommend TV drama episodes to viewers.

A Data Challenge hosted by Rakuten Institute of Technology and Rakuten-Viki based on the online TV viewing data

Go to Challenge     Download Case Study

Rakuten-Viki Global TV Recommender Challenge

Viki – a play on the words “video” and “wiki” – is a global TV site powered by fans who translate their favourite foreign videos, ranging from Korean and Turkish dramas to Japanese anime, into over 200 languages. Acquired by Rakuten in 2013, Viki continues to break down language barriers to great entertainment, contributing to Rakuten’s borderless digital ecosystem. Can you recommend to our viewers videos that will catch their interest?

The Challenge requires participants to predict, for each user, a set of TV drama episodes that the user would watch with interest. The winning models will be accurate, easy to implement, and innovative – scoring high on the Expected Weighted Average Precision (EWAP) metric, being well documented, and featuring key drivers that are novel and insightful.

details

Challenge Statement

Can you build a personalized recommender system for Viki fans worldwide? Based on user and video attributes and historical viewing patterns, your task is to predict the 3 TV dramas each user will watch next with the highest engagement.
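Before modelling, it helps to fix a baseline. A minimal sketch of a popularity baseline is shown below: recommend the most-watched titles a user has not yet seen. The viewing pairs and titles here are invented for illustration; the real log comes from Behavior_training.csv with its own field names.

```python
from collections import Counter

# Hypothetical viewing log: (user_id, video_id) pairs, invented for this sketch.
views = [
    ("u1", "drama_a"), ("u1", "drama_b"),
    ("u2", "drama_a"), ("u2", "drama_c"),
    ("u3", "drama_a"), ("u3", "drama_b"), ("u3", "drama_c"),
]

def popularity_top3(views, user_id):
    """Recommend up to 3 of the most-watched dramas the user has not seen yet."""
    seen = {v for u, v in views if u == user_id}
    counts = Counter(v for _, v in views)
    ranked = [v for v, _ in counts.most_common() if v not in seen]
    return ranked[:3]

print(popularity_top3(views, "u2"))  # → ['drama_b'] (the only title u2 has not seen)
```

Any personalized model you build should at least beat this kind of non-personalized baseline on your own validation data.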

Evaluation Metric

The evaluation metric is:
Expected Weighted Average Precision (EWAP).
Note: Submissions will be evaluated in real time against the evaluation datasets. Participants are limited to 10 submissions a day until the closing date.

Schedule

Period 39 Days
Start Wednesday 22nd July 2015
End Monday 31st August 2015 23:59

Prizes

Prize pool: SG$ 12,000

partners

about the host

 

Rakuten, Inc.

Rakuten, Inc. is a Japanese Internet services company headquartered in Tokyo, founded in 1997. In addition to its flagship online marketplaces, the Rakuten Group comprises 40+ online businesses, including travel booking, internet finance, e-reading, digital marketing, and professional sports. Rakuten integrates all of its services in Japan through a membership loyalty program, Rakuten Super Points, which is the foundation of the Rakuten Ecosystem and the core of the firm’s global expansion strategy.

 


Rakuten Institute of Technology

Rakuten Institute of Technology is the strategic R&D arm of Rakuten, Inc. It is the in-house think-tank and accelerator on a mission to improve existing services and to find new solutions to business challenges. Rakuten Institute of Technology supports businesses with predictive analytics, reduces costs via automation, and introduces innovative technologies that contribute towards Rakuten’s broader corporate goal of empowering businesses and consumers.

Rakuten Institute of Technology has established research centres in Tokyo, Paris, and New York. In 2015, the Institute opened offices in Singapore and Boston.

The Singapore branch of the Rakuten Institute of Technology will lead the research and technology development to deepen Rakuten’s understanding of consumer behaviours in Asia and beyond, enabling innovation in growth markets.

 


Viki, Inc.

Viki, Inc. is a global TV site offering TV shows, movies and other premium content, translated into more than 200 languages by a community of avid fans. With 35 million viewers each month, 26 million mobile app installs and over 800 million words translated, Viki uniquely brings global prime-time entertainment to new audiences and unlocks new markets and revenue opportunities for content owners. Viki was acquired by Rakuten in 2013 and has offices in San Francisco, Singapore, Seoul and Tokyo.

 

 


Rakuten Singapore

In January 2014, Rakuten launched Internet shopping mall Rakuten.com.sg in Singapore, through its group company Rakuten Asia Pte. Ltd.

 

data

Data is provided for a sample of users and should not be considered complete. All data is anonymized to protect users’ privacy.

 

Behavior_training.csv

This is the users’ viewing behaviour data on Viki from 1 October 2014 to 31 January 2015, containing more than 4.9 million rows. Each record contains information about a particular user watching a particular video.

User_attributes.csv

This is the user attributes data, specifying gender and country of origin for over 880,000 users.

Video_attributes.csv

This is the video attribute data, including each video’s country of origin, language, genre, and more, for over 600 titles.

Video_casts.csv

This is data on nearly 2,000 actors and actresses featured in Viki TV dramas. The country of origin and gender of the cast members are provided without masking; names are replaced with actor IDs.

SampleSubmission.csv

This is the list of user IDs to be submitted with your recommended videos. The data has more than 1.8 million rows and covers user behaviour from February 2015 to March 2015. We will share details on the submission format soon.

Report_Template.docx

This is the Report and Algorithm Summary template. Please complete your report according to the template, save it as a PDF, and submit it together with your prediction scores and zipped, well-documented code via the “Upload” section of the submission page in your final submission.

VikiDataChallengeDataSchema.xlsx

This is the data schema for the CSV datasets. It explains the features and their values, and includes basic statistics for selected features. Please read it carefully so you do not miss any information.
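As a starting point, the CSV files can be loaded and joined with pandas so each viewing record carries its user and video attributes. The column names below are assumptions for illustration only (the real field names are defined in VikiDataChallengeDataSchema.xlsx), and in-memory strings stand in for the actual files:

```python
import io
import pandas as pd

# Toy stand-ins for the real files; in practice: pd.read_csv("Behavior_training.csv"), etc.
behavior_csv = io.StringIO("user_id,video_id,watch_seconds\n1,101,1200\n1,102,300\n2,101,900\n")
users_csv = io.StringIO("user_id,gender,country\n1,f,SG\n2,m,JP\n")
videos_csv = io.StringIO("video_id,origin_country,genre\n101,KR,romance\n102,JP,anime\n")

behavior = pd.read_csv(behavior_csv)
users = pd.read_csv(users_csv)
videos = pd.read_csv(videos_csv)

# Join user and video attributes onto each viewing record.
full = behavior.merge(users, on="user_id").merge(videos, on="video_id")
print(full.shape)  # → (3, 7): 3 viewing records, each enriched with 4 attribute columns
```

With the attributes attached, per-user or per-genre aggregates become one-liners with `groupby`.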

Resources

We handpicked a selection of materials for you to quickly become familiar with the core principles of recommendation systems.

 

Introduction to recommendation systems:

Introductory lesson on Coursera: Week 16

Introduction to Recommender System on Coursera by University of Minnesota

Recommender system tutorial from the Technical University of Dortmund

Recommendation Engine Introduction

Collaborative filtering implementation in python
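The collaborative-filtering idea behind these materials can be sketched in a few lines: score each unseen video by its cosine similarity to the videos a user has already watched. The toy user-item matrix below is invented for illustration; a real implementation would build it from Behavior_training.csv.

```python
import numpy as np

# Toy user-item matrix (rows = users, cols = videos); 1 = watched.
R = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

def recommend(user_idx, k=3):
    """Score unseen items by similarity to the user's watched items."""
    scores = sim @ R[user_idx]
    scores[R[user_idx] > 0] = -np.inf  # exclude already-watched items
    order = np.argsort(scores)[::-1]
    return [i for i in order if np.isfinite(scores[i])][:k]

print(recommend(0))  # → [2, 3]: item 2 is closer to user 0's history than item 3
```

This item-based variant scales better than user-based filtering when there are far more users (880,000+) than videos (600+), as in this challenge.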

 

More advanced materials:

Quora discussions on Netflix movie recommendation algorithm

Slides on recommendation system in KDD 2014 by recommendation system director in Netflix

Real-time movie recommendation

Collaborative Deep Learning for Recommender Systems

Research paper on collaborative filtering recommendation system


FAQ

 

What is the evaluation metric, and why did we choose it?

The evaluation metric is Expected Weighted Average Precision (EWAP), derived from the commonly used Mean Average Precision (MAP) metric and customized for this challenge by DEXTRA and Rakuten data scientists. The details, including an example of how the metric is calculated, can be found in the “details” section of the challenge statement.

 

Typically, three types of metrics are used to evaluate an offline recommendation system: accuracy metrics (such as RMSE), decision-support metrics (such as recall and precision), and rank metrics (such as DCG). From a business point of view, decision-support and rank metrics are usually more important, because they relate directly to increasing sales and improving the user experience. Therefore, we crafted a new metric incorporating the characteristics of those two metric types.
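To make the MAP family concrete, here is a minimal sketch of plain average precision at k, the quantity EWAP builds on. Note this is only the standard AP@k; the challenge-specific weighting that turns it into EWAP is defined in the “details” section and is not reproduced here.

```python
def average_precision_at_k(recommended, relevant, k=3):
    """Standard AP@k: average the precision@i at each rank i where a hit occurs."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at rank i
    return score / min(len(relevant), k) if relevant else 0.0

# Hits at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision_at_k(["a", "b", "c"], {"a", "c"}))  # → 0.8333...
```

The key property, shared by rank metrics generally, is that a correct item at rank 1 is worth more than the same item at rank 3.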

Do you provide any resources for beginners in recommendation systems?

We strongly encourage everyone to participate in this challenge. We handpicked a selection of materials for you to quickly become familiar with the core principles of recommendation systems. Please click here to access these materials.

Why is the demographic data of users limited?

Viki collects limited information from registered users. If you have an idea for improving the recommendation system by collecting more demographic data, please share it with us in your report.

Can I submit more than one algorithm?

Yes, you can submit more than one algorithm; add a one-line summary of your algorithm in the comments section with each submission. There is no limit to the number of models you can submit overall. However, please note that you can make only 10 submissions a day, and we will evaluate your final performance based on your latest submission.

What if I submit the same algorithm more than once (as I had to make some changes)?

We understand that participants will test multiple iterations of their algorithms before the deadline and prior to writing the final report. Feel free to use the “Comment” section to annotate your submissions for easy reference.

Which techniques should I employ?

You are encouraged to explore different techniques, or even a combination of multiple algorithms. It is a dual challenge for your creativity and knowledge. Work hard but have fun! As long as your output is in CSV format so our system can evaluate your submissions, there is no limit on the approaches, software, or technology you use.

What is the difference between Public Evaluation Score and Private Evaluation Score?

The accuracy of your submission is determined by your recommendation sets’ performance on the provided test dataset. We have the answers for this dataset but are withholding them to compare against your predictions.

The Public Evaluation Score is what you receive back upon each submission (calculated using a statistical evaluation metric, which in this challenge is EWAP). However, your Public Evaluation Score is determined by only a fraction of the test dataset. The scores shown on the leaderboard are Public Evaluation Scores and indicate relative performance during the challenge.

The Private Evaluation Score is computed by comparing your prediction with the rest of the test dataset. In this challenge, the public score is calculated from users’ activities in February 2015 (about 50% of the total test dataset), and the private score from users’ activities in March 2015. You will not see the private score until the challenge ends. Finalists are selected based on the Private Leaderboard.

Why do we need to split Public and Private Evaluation Scores?

The test dataset is split into public and private portions to ensure that the winning model is both accurate and generalizable. If you base your model solely on the data that gives you constant feedback, you risk building a model that overfits to the specific noise in that data. One of the hard challenges in predictive analytics is avoiding overfitting by keeping your model flexible enough to handle out-of-sample data.
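One practical safeguard is to mimic the challenge's own time-based split locally: train on earlier activity and validate on the most recent period, rather than tuning against the public leaderboard alone. The tiny timestamped log below is invented for illustration:

```python
# Hypothetical timestamped log entries: (user_id, video_id, date_str).
log = [
    ("u1", "v1", "2014-10-05"), ("u1", "v2", "2014-12-20"),
    ("u1", "v3", "2015-01-15"), ("u2", "v1", "2014-11-01"),
    ("u2", "v4", "2015-01-20"),
]

# Time-based holdout: fit on records before January 2015, validate on the rest.
# ISO date strings compare correctly as plain strings.
train = [r for r in log if r[2] < "2015-01-01"]
valid = [r for r in log if r[2] >= "2015-01-01"]
print(len(train), len(valid))  # → 3 2
```

A model whose local validation score tracks its public leaderboard score is far less likely to collapse on the private (March 2015) portion.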


RECENT POSTS

  • Rakuten-Viki Final Presentation Event

The Rakuten-Viki Global TV Recommender Challenge came to a successful close on 16 September 2015. Six teams (Team Merlion, Team GM, Team Haipt, Team Pritish, Team Gbenedek & Team Lenguyenthedat) were invited to present publicly in front of an audience and the judges.

  • Rakuten Viki Challenge Results

    Finalist teams are announced! We would like you to join us for the final presentation event where shortlisted teams will present their algorithms and insights to you.



CONTACT US

Launchpad@one-north, #04-03/04 79 Ayer Rajah Crescent Singapore, 139955

contact@dextra.sg