Rakuten-Viki Final Presentation Event

PHOTO: Paul Harris (Newton Circus)

The Rakuten-Viki Global TV Recommender Challenge came to a successful close on 16 September 2015. Six teams (Team Merlion, Team GM, Team Haipt, Team Pritish, Team Gbenedek and Team Lenguyenthedat) were invited to present publicly to an audience and the judges. Rakuten generously opened the presentation to the public, a first in DEXTRA history. We were also honored to have Mr Kiren Kumar, Director, Infocomms & Media, Singapore Economic Development Board, give the welcome speech that evening.

 

After long deliberation, the judges voted unanimously for Team Merlion (Duan Rubing, Liu Yong, Yang Xulei, Rick Goh, Hu Nan, Tan Yong Kiam, Wu Zhen Zhou, Yang Feng) as the winners of the challenge. Team Merlion employed an elegant algorithm that performed outstandingly on both the public and private leaderboards, identified and engineered very useful features, and discovered and visualised key insights from the data provided.

 

PHOTO: Paul Harris (Newton Circus)

 

PRIZE BREAKDOWN FOR RAKUTEN-VIKI GLOBAL RECOMMENDER DATA CHALLENGE

Top 3 prizes:
Top Winner, Team Merlion, won 8,000 SGD (cash)
1st Runner-up, Team Haipt, received 2,500 SGD (cash)
2nd Runner-up, Team GM, received 1,500 SGD (cash)

Bonus prizes:
- 3rd Runner-up, Team Pritish, walked away with 1,000 SGD worth of Rakuten Super Points
- For Best Visualisation and approach to code testing, Team Le Nguyen The Dat received 600 SGD worth of Rakuten Super Points
- For its unique application of graph theory to content popularity prediction, Team Gbenedek received 500 SGD worth of Rakuten Super Points

All members of the top 6 teams received:
- Viki Pass: 12 months of premium access to Viki, presented by Rohit Dewan, CTO at Viki. Viki Pass is Viki's subscription service that lets users watch videos ad-free and in HD.
- 100 SGD worth of Rakuten Super Points redeemable on Rakuten Singapore shopping site


Congratulations to all six finalist teams. DEXTRA would like to extend our biggest thanks to the Rakuten Institute of Technology (RIT) team and the participants from the DEXTRA community for the tremendous time and effort they put into making this challenge such a successful and eventful one.

 

Featured News

Winner of Rakuten-Viki Global TV Recommender Challenge Announced
17 September 2015, The Tech Revolutionist

Rakuten Viki Challenge Results

The Rakuten Viki Global TV Recommender Challenge came to an end on 31 August. We received a total of 567 submissions from 132 participants, making it one of our most highly anticipated and compelling challenges thus far! Participants who wanted to shine put immense effort into the challenge. The ranking on the Public Leaderboard continually evolved as participants kept improving their predictions and outdoing each other.

 

We have created a leaderboard activity graph showing participants' positions from the challenge launch day onward. From the visualization, we can see participants battling it out for first place on the leaderboard.

 

The accuracy of the participants' models was validated against an unseen segment of the Rakuten Viki dataset, which generated a score on the Private Leaderboard (the higher the score, the better).

Team Name | Public Leaderboard | Private Leaderboard
haipt | 0.1733 | 0.1576
Team Merlion | 0.2374 | 0.1560
GM | 0.2267 | 0.1477
PpRedicts | 0.2204 | 0.1312
gbenedek | 0.2300 | 0.1309
lenguyenthedat | 0.2149 | 0.1112
 

Rakuten Viki Challenge Final Presentation

 

Following all the hard work put in by the participants, we would like to invite everyone to our grand finale on 16 September, where shortlisted participants will present their algorithms and findings to a panel of judges and compete for the top prize (S$8,000). Our guest of honor, Kiren Kumar, Director, Infocomms & Media, Singapore Economic Development Board, will give a welcome message. This is an opportunity not to be missed for the DEXTRA community, as the presentation will be open to the public for the first time in DEXTRA history.

 

The finalists are prominent data scientists, including PhDs and researchers from leading MNCs, SMEs, start-ups and research institutes. Here is a brief preview of some of the highlights each team will share during the Final Presentations:

 
  • Team haipt, a co-founder and CTO of Teevers, used a recommender to build each user's preference vector. The most popular videos were recommended to infrequent users.
  • Team Merlion, made up of 7 A*Star data scientists, adopted a 3-step approach (classify > rank > filter) to give personalised recommendations. They dived deep into the data and revealed many insights.
  • Team GM, a research scientist at the A*Star Data Analytics Department, combined popularity and classification methods to make recommendations for infrequent and frequent users respectively.
  • Team PpRedicts, a data analyst at Facebook, crafted a custom solution combining movie metadata, viewing behaviour and user preferences to generate predictions.
  • Team gbenedek, an Associate Professor at Corvinus University of Budapest and founder of an analytics company, visualised the connections among videos using graph theory, which allowed popular and similar videos to be identified for recommendation.
  • Team lenguyenthedat, a Senior Data Technologist at Commercialize TV, blended several models into a framework designed to let Rakuten-Viki generate recommendations suited to different use cases.
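Teams haipt and GM both describe a common pattern: serve infrequent users the most popular videos and personalise only for frequent users. The sketch below illustrates that split under stated assumptions: a watch log of (user, video) pairs, a hypothetical `min_history` threshold, and a simple co-occurrence score standing in for a personalised model. It is a generic illustration, not either team's actual solution.

```python
from collections import Counter, defaultdict


def recommend(watch_log, user, k=3, min_history=3):
    """Recommend up to k unseen video ids for `user` (illustrative sketch).

    watch_log   -- iterable of (user_id, video_id) pairs (hypothetical format)
    min_history -- hypothetical threshold separating infrequent from frequent users
    """
    history = defaultdict(set)
    for u, v in watch_log:
        history[u].add(v)
    seen = history.get(user, set())

    # Popularity: number of distinct users who watched each video.
    popularity = Counter(v for vids in history.values() for v in vids)
    unseen = [v for v in popularity if v not in seen]

    if len(seen) < min_history:
        # Infrequent (or new) user: fall back to the most popular unseen videos,
        # breaking ties by video id so the output is deterministic.
        return sorted(unseen, key=lambda v: (-popularity[v], v))[:k]

    # Frequent user: score each unseen video by co-occurrence with the user's
    # history, weighting every other viewer by how many videos they share.
    scores = Counter()
    for other, vids in history.items():
        if other == user:
            continue
        overlap = len(vids & seen)
        for v in vids - seen:
            scores[v] += overlap

    # Rank by co-occurrence score, then popularity as a backfill, then id.
    ranked = sorted(unseen, key=lambda v: (-scores[v], -popularity[v], v))
    return ranked[:k]
```

With `min_history=3`, a user who has watched only one video receives the most popular titles, while a heavier viewer receives titles watched by viewers with overlapping histories.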
 

Please click the Register Now button below to reserve a seat for the event!

 

MINDEF Data Analytics Challenge – Hackathon Wrap-up

 

Over the weekend of 29 and 30 August 2015, ten teams participated in the Ministry of Defence (MINDEF) Data Analytics Challenge Hackathon, organised by Padang & Co and DEXTRA. This was MINDEF’s very first data analytics and innovation competition open to the public.

 

This hackathon marked the end of the Challenge, which began with a technical challenge on DEXTRA. The technical challenge was to analyse a MINDEF Human Resource dataset and identify the key factors contributing to MINDEF staff attrition. The top ten teams from this challenge were then invited to the hackathon.

 

At the Pre-Hackathon Workshop, these ten teams heard MINDEF teams from the various units that had put up problem statements pitch their respective challenges, ranging from understanding public sentiment to manpower planning, training and procurement. The two sides were then matched to form the ten hackathon teams.

 

The hackathon provided a fantastic opportunity for the data teams to co-create solutions with the MINDEF teams. Given the exceptionally high quality of the teams, competition was naturally very keen! The teams were mentored by:

  • Claudia Zeisberger, Senior Affiliate Professor of Decision Sciences and Entrepreneurship & Family Enterprise, INSEAD
  • Rajaraman Kanagasabai, Lab Head, Semantic Computing Lab, Data Analytics Department, Institute for Infocomm Research (I2R)
  • Thomas Holleczek, Senior Data Scientist, Singtel DataSpark
  • Johnson Poh, Head Data Science/Principal Data Scientist, Defence Management Group, MINDEF
 

The hackathon concluded with a closed-door pitching session in front of a senior MINDEF panel. All hackathon teams are exhibiting at the CIO Seminar, held in conjunction with the annual MINDEF PRIDE Day exhibition (2-4 September), to share their work and further socialise the power of data analytics and open innovation in the MINDEF community.

 

We would like to thank everyone who participated in the MINDEF Data Analytics Challenge and made it a big success! We hope we’ll have the chance to take this ground-breaking MINDEF initiative even further next year and organise similar activities with other organisations.

 

PHOTO: Paul Harris (Newton Circus)

MINDEF Data Analytics Challenge Results

 

The first phase of the Ministry of Defence (MINDEF) Data Analytics Challenge, the technical challenge, came to an end on 23 August. We received a total of 2,347 submissions from 158 participants, by far the largest number in any DEXTRA challenge! The submissions show that participants put tremendous effort into excelling in the challenge. The ranking on the Public Leaderboard continually evolved as participants kept improving their predictions and outdoing each other.

 
 

The accuracy of the participants' models was validated against an unseen segment of the MINDEF dataset, which generated a score on the Private Leaderboard (the lower the score, the better).

 
Team Name | Public Leaderboard | Private Leaderboard
Bun Thit Nuong | 0.0169 | 0.0141351
ZD | 0.0168 | 0.014161
YesWeCan | 0.0172 | 0.0142127
Metakey | 0.0160 | 0.014294
sadz2201 | 0.0168 | 0.0143151
MSD_HIJ | 0.0170 | 0.0144094
MSD Data Science | 0.0161 | 0.0144216
Little Apple | 0.0179 | 0.014479
The Full Suite | 0.0187 | 0.014649
Liangyj | 0.0180 | 0.0146925
 

On to the second phase of the Challenge - we've shortlisted these ten teams based on their private scores (accuracy of their prediction model) and a further two wildcard teams based on the ideas, insights and recommendations on the data in the reports they submitted. They will have the opportunity to work with teams representing various MINDEF business units that have put up problem statements at the upcoming hackathon (29 to 30 August), and stand the chance to win the top prize of S$5,000 from a total prize pool of S$17,000.

 

All the hackathon teams will also have the opportunity to share their solutions with the whole MINDEF / SAF community at the CIO Seminar, held in conjunction with MINDEF PRIDE Day (2 to 4 September).

 

Stay tuned for more updates on these solutions!

Successful Rakuten Viki Challenge Workshop

We had a successful Rakuten Viki workshop on Wednesday, 29 July. The workshop had an outstanding lineup of speakers: Alex Chan (VP of Product at Viki), Ewa Szymanska (Head of Research at RIT Singapore), Carol Hargreaves (Head of Business Analytics, National University of Singapore, Institute of Systems Science), Robin Swezey (Data Scientist at RIT Tokyo) and Kai Xin (Data Scientist Senior Manager at Lazada).

 

Photo caption: top left: Alex shared about Viki's business; top middle: Prof Carol shared useful considerations for recommender systems; top right: Ewa talked about the Rakuten Viki Challenge and data; bottom left: audience; bottom middle: Robin explained the steps of the evaluation metric; bottom right: Kai Xin shared his experience of DEXTRA challenges. (PHOTO: Paul Harris (Newton Circus))

 

The data scientists, business leaders and academic experts imparted valuable insights into the global TV business and video recommender systems. The speakers shared their knowledge and cleared up participants' doubts about the challenge. It came as little surprise that the workshop was a full house.

 

We would like to take this opportunity to thank the participants and speakers who took time out for the sharing session. We hope everyone enjoyed the evening as much as we did.

 

Strong interest and great participation from the community

 

As of 19 August 2015, we have received more than 330 submissions from over 110 participants. With such a high-energy start, this is bound to be one of the most successful challenges we have organized to date. The top scores on our leaderboard are very close, and it is clear we are in for an amazing challenge! If you are a data enthusiast with experience in customer behavior data or a curious data analyst, do not miss this chance to win a share of the S$12,000 prize pool (top prize S$8,000). If you have yet to join the challenge, register at DEXTRA and start exploring the data!

 
 

Featured news

 

We would also like to share that Rakuten Institute of Technology and the Rakuten-Viki Challenge have been featured in The Straits Times, AsiaOne and other channels! Below are links to the coverage of Rakuten Institute of Technology and the Rakuten-Viki Challenge.

 

Rakuten opens research centre in S'pore

1st August 2015, The Straits Times (Online)

Rakuten opens research centre in S'pore

1st August 2015, Asia One

Rakuten Institute of Technology Launches in Singapore to Drive Innovation, Empower Businesses and Consumers

1st August 2015, The Tech Revolutionist

Rakuten : Establishes New Technology Research Centers in Singapore and Boston

29th July 2015, Singapore News Net

#RakutenInstituteofTechnology , Rakuten’s New Research Centre in #Singapore & #Boston

30th July 2015, The Neo Dimension

Ministry of Defence Data Analytics Challenge

 

As Singapore celebrates SG50 this year, we are reminded of how far we have come in the last 50 years, especially in national defence, where we have transformed into the formidable fighting force of today. The progress does not stop here, as MINDEF continues to look out for technologies to enhance our national defence.

 

This technical challenge is a lead-up to the actual Data Analytics Challenge conducted by the MINDEF Information Systems Division (MISD) in partnership with DSTA, to leverage data analytics towards the intended outcomes of evidence-based decision making, service innovation, and proactive risk management.

 

In this challenge, you are provided with a dataset containing Human Resource (HR) data. The goal is to analyse and manage employee retention by identifying the key factors that contribute to people staying in or leaving the organisation. To do this effectively, we want a model that depicts how employees make turnover decisions, so that retention strategies can be better formulated.

 

Excelling in this technical challenge gives you a chance to participate in the Data Analytics Challenge hackathon (29-30 August). At least 10 teams will be selected to work with MINDEF business units on their respective problem statements and co-create new solutions with them. It’s an opportunity not to be missed!

 
Challenge Launch Workshop
 

Join us at the workshop on Monday, 3 August to learn more about the challenge and the datasets for the technical challenge from MINDEF representatives. They will share more context around the challenge and do a Q&A.

 
 

For more information about the challenge, please visit the Ministry of Defence Data Analytics Challenge page. For any clarifications regarding the challenge, please email the DEXTRA team at mail@dextra.sg.

Expected Weighted Average Precision

The evaluation metric is EWAP@3, the Expected Weighted Average Precision for a recommendation list of size 3. Its value ranges from 0 to 1, and higher is better: 1 is the best possible score.

EWAP@3 is the expectation of WAP@3 over all users.

WAP@3 is the Weighted Average Precision, calculated for each user and ranging from 0 to 1; it is derived from the commonly used Average Precision (AP) metric.

The list size of 3 is chosen to suit the context of the problem: recommending items to users on a home page with limited space. The EWAP metric is built as follows.

 

Step 1. We define Weighted Average Precision - WAP@3 as:

[Equation: definition of [WAP@3]_k for user k]

where

- n = min(3, number of videos actually watched);

- [p_j] @ user k are the scores of the videos predicted for user k by participants;

- [y_j] @ user k are the scores of the videos actually watched by user k, sorted from highest to lowest as j increases.

Table 1. Examples of how to calculate WAP@3:

Table 1

Note that [WAP@3]_k

- measures the quality of the videos retrieved by participants from the video pool; see u1 and u3.

- measures the ordering of retrieved videos; see u1 and u2, and u7 and u8.

- takes values in [0,1] for any user, but cannot discriminate between the predictions for u4 and u6.

 

Step 2. We define the importance of each user w(k) as:

$$w(k) = \frac{S(k)}{\sum_{i=1}^{N} S(i)}$$

where

$$S(k) = \sum_{j} [y_j]@\text{user } k$$

- N is the total number of users.

- w(k) measures the relative importance of user k, obtained by weighting user k against all users.

- S(k) measures the absolute importance of user k and is the sum of all y_j for user k.

Table 2. Examples of how to calculate S(k) and w(k):


Note that w(k)

- measures the relative importance of user k, so the predictions for u4 and u6 can be discriminated.

 

Step 3. We then define EWAP@3 as:

$$\mathrm{EWAP@3} = \sum_{k=1}^{N} w(k)\,[\mathrm{WAP@3}]_k$$

Table 3. Examples of how to calculate EWAP@3:

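Steps 2 and 3 can be sketched in Python as follows, reading EWAP@3 as the w(k)-weighted average of per-user WAP@3 scores (the weights sum to 1, so the result stays in [0, 1]). Because the WAP@3 formula itself is not reproduced here, the sketch takes each user's [WAP@3]_k as an already-computed input. The dictionary layout and the function names `user_weights` and `ewap` are illustrative assumptions; the official sample R & Python implementation listed under Sample implementation remains the reference.

```python
def user_weights(watched_scores):
    """Step 2: w(k) = S(k) / (S(1) + ... + S(N)).

    watched_scores -- dict mapping user id -> list of [y_j] scores for the
                      videos that user actually watched (hypothetical layout).
    """
    # S(k): absolute importance of user k, the sum of all y_j for that user.
    s = {user: sum(ys) for user, ys in watched_scores.items()}
    total = sum(s.values())
    if total == 0:
        raise ValueError("at least one user must have a positive watched score")
    # w(k): relative importance; the weights sum to 1 over all N users.
    return {user: s_k / total for user, s_k in s.items()}


def ewap(wap_scores, watched_scores):
    """Step 3: EWAP@3 as the w(k)-weighted sum of per-user [WAP@3]_k.

    wap_scores -- dict mapping user id -> precomputed [WAP@3]_k in [0, 1].
    """
    w = user_weights(watched_scores)
    return sum(w[user] * wap_scores[user] for user in w)
```

For example, if one user's watched scores sum to 6 and another's to 1, the first user's WAP@3 counts six times as much; this weighting is what lets EWAP@3 distinguish users whose WAP@3 alone cannot, such as u4 and u6.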
 

Sample implementation

  • Our sample R & Python implementation with sample submission files

    Rakuten-Viki Global TV Recommender Challenge

     

    Viki, a play on the words “video” and “wiki”, is a Global TV site powered by fans who have been translating their favourite foreign-language videos, ranging from Korean and Turkish dramas to Japanese anime, into over 200 languages. Acquired by Rakuten in 2013, Viki continues to bring down language barriers to great entertainment, contributing to Rakuten’s borderless digital ecosystem.

     

    Viki’s Global TV offering is ever expanding, and with thousands of hours of licensed content, it becomes increasingly important to help each fan discover new gems in Viki’s digital library.

     

    Are you up for the challenge of helping Viki personalize its content for global audiences?

     
    Rakuten-Viki Global TV Recommender Challenge
     

    The Rakuten Institute of Technology is excited to partner with Viki on bringing the Rakuten-Viki Global TV Recommender Challenge to the data science community of Singapore. The event will build on Viki’s strength in crowdsourcing and co-creation, and celebrate the launch of the Rakuten Institute of Technology hub in Singapore. Rakuten-Viki will share with participants over 7 million lines of rich, anonymised data, giving a taste of viewers’ preferences, popular content features, and TV fan demographics. The goal is to build a personalized recommender system for Viki fans world-wide, following a set of user and business considerations.

     

    With a prize pool of S$12,000 (top prize S$8,000), the challenge will be launched with full details of the data, evaluation metric and timeline on 22 July 2015 on our DEXTRA challenge platform. Together with DEXTRA, the top team may have the opportunity to deploy and test the winning model in real time in collaboration with the Rakuten Institute of Technology and Viki’s engineering talent.

     

    Join us at the workshop on 29 July 2015 for a primer on the challenge! It is a not-to-be-missed opportunity to gain valuable insights into the global TV business and video recommender systems from Rakuten-Viki data scientists, business leaders and academic experts. Presenters will share their knowledge and respond to any questions you might have about the challenge. Attendance is highly recommended even if you have prior knowledge of data analytics and recommendation systems. Please join us!

     
     

    For more information about the challenge, please visit the Rakuten-Viki Global TV Recommender Challenge page. For any clarifications regarding the challenge, please email the DEXTRA team at mail@dextra.sg.

    Titanic Survival Prediction Challenge

    Exploring the Legacy through Machine Learning

     
     

    Introduction to this Challenge

    The sinking of the Titanic was a tragedy: of the 2,208 passengers and crew on board, 1,496 lost their lives. Numerous books, films and songs have been created in its memory. For our community, we present a unique way to explore the Titanic's legacy: a data challenge for learning and practice.

     

    This challenge asks you to predict a given passenger’s survival probability based on his/her demographic data and boarding information. The data frame does not contain information for the crew, but it does contain actual and estimated ages for almost 80% of the passengers. The data is publicly available online, so please do not cheat.

     

    Knowledge and Practice to be Gained from this Challenge

    This challenge offers good learning and practice in the following aspects:

  • The dataset is quite simple. It contains only 1309 rows of 11 features, so it is easy for anyone to get started.

  • It is a typical binary classification problem. You can use it to practise basic machine learning methods for classification, such as Naive Bayes, decision trees, logistic regression, support vector machines (SVM) and k-nearest neighbours (k-NN). You may also go beyond these basics to employ simple ensembling methods, e.g., Random Forest, Bagging and Gradient Boosted Regression Trees (GBRT), and more complex ensembling techniques, e.g., bucket of models and stacking.

  • Getting familiar with evaluation metrics for classification problems. We use Cross Entropy Loss (also known as Logarithmic Loss) as the evaluation metric; it is widely used for classification problems. In the future, we will craft more classification challenges with evaluation metrics such as precision, recall, area under the curve (AUC) and F-measure.

  • Learning how to interpret machine learning results. Machine learning is praised for its excellent predictive performance but criticised because its results are hard to interpret, the so-called black box problem.

    However, the results of simple algorithms like decision trees can usually be interpreted well, and you can learn this from this challenge. For example, you may find that the top node in the decision tree is sex, which suggests that sex is the most important feature. This makes sense because females usually survived while males usually did not.

    On the other hand, some advanced algorithms are not entirely black boxes. For example, GBRT can provide good insights into the data: you can find out how a single feature affects the results, as well as the effects of combined features and their interactions. This is important and useful in real business, and the winner of the Unilever challenge won their prize largely thanks to this technique.

  • Resources of high quality and quantity are provided for this challenge. We have spent a lot of effort collecting high-quality free machine learning materials online, including lecture notes and videos from Harvard University, Stanford University, Coursera courses and YouTube, ranging from tools like R and Python to algorithms like Decision Tree, Naive Bayes and Random Forest. We will keep adding more materials and are more than happy to receive suggestions from you.

  • With all these benefits, why not give it a try? It is never too late to start learning or to practise more. Let's get started right now.
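The Cross Entropy (Logarithmic) Loss mentioned in the list above is, for binary labels y and predicted survival probabilities p, the negative mean of y·log(p) + (1 - y)·log(1 - p) over all passengers, so lower is better. A minimal Python sketch follows; the clipping constant `eps` is a common convention assumed here, not a documented detail of the challenge's scorer.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross entropy (logarithmic loss); lower is better.

    y_true -- 0/1 survival labels
    p_pred -- predicted survival probabilities
    eps    -- assumed clipping bound so log(0) is never evaluated
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip into (0, 1) so confident mistakes are heavily but finitely penalised.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Predicting 0.5 for every passenger gives a loss of ln 2 (about 0.693), a useful baseline; a confident wrong prediction such as p = 0 for a survivor costs about 34.5 for that passenger after clipping, which is why well-calibrated probabilities matter more than hard 0/1 guesses under this metric.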

    Mashing up different datasets from DEX API with unified API key

    DEX API to make data more easily accessible

     

    Data is being generated at an unprecedented speed, and people are realising its enormous potential. Aware of this trend, the Singapore government is actively pushing to build a Smart Nation by improving data harvesting and data analytics. Our two platforms, DEX and DEXTRA, work in coordination with the government's aim to spur innovation and applications.

     

    As a data platform and marketplace, DEX is dedicated to actively harvesting data from various sources and making data consumption much easier for developers and companies. DEX provides access to data collected from 200+ companies across various industries and local government agencies. To improve the efficiency and convenience of data consumption, DEX provides its own DEX API with a unified API key.

     

    LTA and NEA APIs

     

    DEX has made APIs available for about 40 organisations, among which we would like to highlight two real-time APIs from the Land Transport Authority (LTA) and the National Environment Agency (NEA). LTA has made 17 API endpoints available for different datasets, such as bus arrival times, traffic incidents, traffic images, road conditions, ERP rates and many more. NEA has made 8 API endpoints available for different datasets, such as the 3-hour weather nowcast, heavy rain warnings and hourly PM2.5 updates. The following example shows how the community can create useful applications and visualisations when an organisation like LTA makes its data available through an API.

     

    Use Case: A cool visualisation from the data community

     

    Using LTA's API directly, Calixto Tay and his team developed SG BusLeh, a mobile application that provides estimated arrival times for up to 3 buses. LTA's API provides arrival times for up to two buses; building on this API, SG BusLeh estimates the arrival time of the 3rd bus using predictive data analytics. The application has collected a large amount of data, such as bus stop locations, requests and timestamps, and the team has made the data available for research and academic purposes. Using the released data, Fu Hua Shih created an impressive visualisation that clearly shows request trends over time and location, revealing the stories the data tells.

     

    Snapshot taken from the visualisation created by Fu Hua Shih.

     

    Mashing up for more creativity and applications

     

    This example already shows the potential of using data from a single organisation, but mashing up data from two or more organisations can unlock even more. For example, by combining LTA's bus arrival time data with NEA's weather nowcast data, we can analyse how weather affects bus arrival times, predict arrival times more accurately, and even help bus operators dynamically optimise bus schedules based on the weather nowcast.
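As a minimal sketch of the weather analysis described above, assume the two feeds have already been reduced to simple Python records: bus arrivals as (timestamp, area, delay in minutes) tuples and nowcasts as a mapping from (area, start of hour) to a weather condition. These fields, the hourly bucketing and the area keys are illustrative assumptions, not the actual LTA or NEA API response formats; the point is the join on a shared time window and area.

```python
from collections import defaultdict
from datetime import datetime


def mean_delay_by_weather(arrivals, nowcasts):
    """Join bus arrival records to weather nowcasts and average delay per condition.

    arrivals -- list of (timestamp, area, delay_minutes) tuples; delay_minutes is a
                hypothetical field, e.g. actual minus estimated arrival time
    nowcasts -- dict mapping (area, hour_start) -> condition string, e.g. "Rain"
    Arrival records with no matching nowcast are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, area, delay in arrivals:
        # Bucket each arrival into the hour its nowcast is assumed to cover.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        condition = nowcasts.get((area, hour_start))
        if condition is None:
            continue
        totals[condition] += delay
        counts[condition] += 1
    return {c: totals[c] / counts[c] for c in totals}
```

A consistently higher mean delay under "Rain" than under "Fair" would be the kind of signal that could feed a better arrival-time prediction or a weather-aware schedule.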

     

    These demonstrations of mashing up different datasets give a sneak preview of the promising future of data mash-ups. As a data platform, DEX welcomes any exploration of its datasets and is more than happy to help users pilot and develop various use cases. DEX would also like to encourage organisations to provide their datasets in the DEX marketplace to explore the usage of their data. By connecting data consumers and data providers, DEX creates a win-win situation for both parties. Let's take one step forward to build a Smart Nation with data!

     
     

    Related resources:
    1. DEX API documentation
    2. Land Transport Authority API on DEX
    3. National Environment Agency API on DEX
    4. Fu Hua Shih's Visualisation
    5. SG BusLeh iOS Application

    Unilever Overall Opinion Score Prediction Phase II – Finale!


    Photo caption: From left: Zhenzhou Wu (Team Jg), Lucas Tan (Team Jg), Thia Kai Xin (Team Fox Hole), Marcus Lim (Team Fox Hole), Vijay Raj (VP CMI Research Innovation, Media & Shopper Insights, Unilever), Dipayan Sarkar (Team Hexponent), Kathleen Yung (Unilever), Aung Myint Thein (Project Director of DEXTRA, Newton Circus), Pritish Kakodkar, Timothy Lin, Qifang Zhao (Data Analyst of DEXTRA, Newton Circus) and Daryl Arnold (CEO of Newton Circus) celebrated the close of the Unilever Overall Opinion Score Prediction Challenge on 23 March 2015. (PHOTO: Timothy Ley (Newton Circus))

     

    The Unilever Overall Opinion Score Prediction Challenge came to a successful conclusion on 23 March 2015. In Phase II of this challenge, 5 finalist teams (Team Fox Hole, Team Jg, Team Hexponent, Timothy Lin and Pritish Kakodkar) were invited to participate in 2 private challenges: further predicting the overall opinion scores, and predicting the ranking of products based on ingredient data only.

     

    Compared with Phase I, the dataset available for Phase II was smaller and more limited, which made it harder to stand out from other participants. However, our finalist teams were not fazed by the difficulties.

     

    Vijay Raj of Unilever congratulated finalists Thia Kai Xin and Marcus Lim of Team Fox Hole (left picture below) for identifying ingredients that had positive, negative and neutral effects on the overall opinion score. The way Fox Hole demonstrated how overall opinion scores varied with the proportions of the ingredients was very impressive. Finally, Fox Hole demonstrated an enhanced formulation for one product and showed which attributes (such as Fragrance and Dissolution) of the new product improved or declined. They took second place in the challenge.

     

    Photo caption: Left photo: Vijay Raj (VP CMI Research Innovation, Media & Shopper Insights, Unilever) presented the finalist prize to Thia Kai Xin and Marcus Lim (Team Fox Hole). Right photo: Vijay Raj (VP CMI Research Innovation, Media & Shopper Insights, Unilever) presented the winning prize to Lucas Tan and Zhenzhou Wu (Team Jg). (PHOTO: Timothy Ley (Newton Circus))

     

    After much deliberation, Vijay and his colleague Supriya decided that the winners were Team Jg, which consisted of Lucas Tan and Wu Zhen Zhou (right picture above). They did a very deep dive into the data and illustrated how individual ingredients affect different product attributes, such as fragrance and dissolution. Their detailed analysis revealed ingredient-level insights that had never been explored or shared before. This original thinking really excited Unilever, who are now eager to further test the model in a key market. Congratulations to Team Jg, who took SGD 10,000 home!

     

    Lastly, congratulations to all 5 finalist teams. The remaining teams were awarded SGD 2,000 each for their outstanding performances. All members of the judging sessions were very impressed with the findings and the quality of the models presented by all the teams. Supriya, based in India, praised each team for their outstanding efforts.

     

    Everyone put an enormous amount of time and effort into making this a very successful challenge. Many, many thanks from the DEXTRA team.

    Awesome participation for Unilever Overall Opinion Score Prediction Challenge

     

    The Unilever Overall Opinion Score Prediction Challenge came to an end on 8 Feb 2015, and we received a total of 621 submissions from 133 participants, by far the largest number we have received, making it the most successful DEXTRA challenge to date! We have seen many wonderful models and brilliant ideas, and it has been a great honor for us to review all of them.

     

    To celebrate everyone's participation, we have created a graph of participants' relative positions from the challenge launch day onward. From the visualization, we can see how hard the community tried to out-predict each other on the Overall Opinion Score, with positions jumping up and down.

     
     

    The accuracy of the teams’ models was validated against an unseen segment of the Unilever dataset, which generated a score on the Private Leaderboard (the lower the score, the better).

     
    Team | Public Leaderboard | Private Leaderboard
    Lucastan and team | 0.2061 | 0.19519
    timothy0336 | 0.2082 | 0.19719
    thiakx and team | 0.2109 | 0.19964
    pritish.kakodkar | 0.2121 | 0.20213
    praveenbysani | 0.2132 | 0.20400
     

    Choosing the finalists is always a challenge, as every submission is worthy of consideration. After careful deliberation, we shortlisted the five best teams based on prediction accuracy and two teams based on their interesting ideas, insights and recommendations on the data. The seven teams have been invited to present their models and findings to the judging panel from Unilever on Monday 23rd February.

     

    Please stay tuned for more updates. We will publish a post about the presentations soon after the date.

    Unilever Overall Opinion Score Prediction Challenge

     

    On any given day, two billion people use Unilever products to look good, feel good and get more out of life. Through more than 400 brands, 14 of which generate sales in excess of €1 billion a year, Unilever supports many long-standing social missions, including Lifebuoy’s drive to promote hygiene through handwashing with soap, and Dove’s campaign for real beauty.

     

    Unilever’s portfolio ranges from nutritionally balanced foods to indulgent ice creams, affordable soaps, luxurious shampoos and everyday household care products. Their products are world-leading brands including Lipton, Knorr, Dove, Axe, Hellmann’s and Omo, alongside trusted local names such as Blue Band, Pureit and Suave.

     

    As one of the largest consumer goods companies in the world, Unilever constantly strives to provide better products, and it is in helping to improve this process that the next Data Innovation Challenge comes in.

     

    Getting new products to market involves multiple stages, some of which can be both time-consuming and resource-intensive. One example is consumer research and the activities that lead to generating a customer’s Overall Opinion Score, an indicator of whether consumers like the product.

     
    Overall Opinion Score Prediction Challenge
     

    With many thanks to Unilever, we are pleased to announce the Overall Opinion Score Prediction Challenge, which seeks a model that predicts consumer preferences. To support the challenge, Unilever will provide participants from the DEXTRA data innovation community with very detailed and comprehensive datasets that have never before been publicly available.

     

    With a prize pool of S$15,000 (top prize S$10,000), the challenge will be launched on the dextra.sg challenge platform on 11 December 2014, with full details of the data, evaluation metric and timelines. A briefing and Q&A workshop will be held at 237 South Bridge Road at 8pm on 11 December 2014, following the launch of the IDA Data Discovery Challenge. Please join us!

     
     

    For any clarifications regarding the challenge, please feel free to email the DEXTRA team at mail@dextra.sg. This challenge is brought to you by IDA and DEXTRA.

    IDA Data Discovery has launched!

    IDA Data Discovery Challenge

     

    Our UPSingapore friends are proud to announce the Data Discovery Challenge! IDA is looking for innovative solutions that will help open up Singapore’s wealth of data to discover new insights!

     

    This Challenge seeks to encourage the use of a wide range of private and public datasets to discover new value and benefits, enabling smarter enterprises and improving how we live, work, learn and interact in Singapore.

     

    To learn more about the challenge, please check here.

    BellaDati Helps to Enable IDA Data Discovery Challenge

     

    BellaDati is bringing its pure cloud BI platform to IDA’s challenge. Participants will only need a web browser or a mobile device. The solution offers dashboards, reports, analytics, a datastore, and an embedded analytics platform with an SDK/API.

     

    Easy for business users. Powerful for analysts. Trusted by retailers, digital agencies, telcos, insurers, securities firms and governments in Asia and beyond.

     

    Visit their blog for Smart Nation and challenge ideas. The blog includes tutorials, hints and tips on cloud data analytics for the challenge, as well as links to register for a free competition account.

     

    Participants can sign up for free, time-limited access to BellaDati’s cloud here.

    How to Insure Fresh Insights

    Great Eastern Challenge

     

    A scoring algorithm for prioritising insurance claims, a user-based recommender system to help grow the business, and the use of social media data to predict customer churn: these were the winning ideas to emerge from the final showdown of the Great Eastern Data Innovation Challenge.

     

    The Data Innovation Challenge is an IDA initiative which brings together user enterprises, data providers, data scientists, research institutes, institutes of higher learning and ICT companies to develop proof-of-concepts and test out the prototypes or working models to address the user enterprises’ business challenges.

     

    The goal of the Great Eastern Data Innovation Challenge was to come up with a model to improve Great Eastern’s insurance business. “With the vast data stored in Great Eastern’s enterprise data warehouse (EDW), we wanted to find new ideas in using existing data to improve Great Eastern’s business and services,” said Mr Yeo Joon Foong, Head of Customer Analytics at Great Eastern Life Assurance. “Hence we decided to throw an open challenge and invite analytics professionals to submit data-driven solutions that can bring value to Great Eastern’s business.”

     

    The main judging criteria for the competition included the business value of the proposal, ease of implementation and originality. Competition was keen – 67 people signed up, 12 proposals were submitted, five ideas were shortlisted and by the end of the final presentation round on 16 October, only three teams were left standing.

     

    Team Wee Kim Wee Miners – comprising Research Associate Johnkhan Sathik Basha and doctoral student Aravind Sesagiri Raamkumar, both from the Wee Kim Wee School of Communication and Information, Nanyang Technological University – came up tops with their ideas surrounding the use of social media data.

     

    Despite being first-time participants in a data analytics challenge, the two found themselves in familiar territory. “Since our research projects are related to data mining techniques, we are naturally interested in data analytics,” said Mr Basha. “It is an exciting field as it combines data modelling, user modelling, machine learning and natural language processing which are all interesting areas in themselves.”

     

    However, there were a few interesting hurdles they had to overcome, such as the absence of datasets in what was supposed to be a data analytics challenge.

     

    Explaining how the challenge worked, Ms Huang Xiaoling, Senior Lead Analyst, Customer Analytics, Great Eastern Life, said that instead of actual data, data fields from the EDW were released. More than 300 relevant fields were identified and broadly categorised into policies, benefits, parties, claims and transactions. To give participants a clearer picture of the data fields, entity-relationship diagrams were also drawn to show the linkages between the tables.

     

    Mr Basha noted that traditional data challenges were usually data mining challenges where participants were provided with a dataset based on which they devise their predictive solutions. “In the Great Eastern challenge, the input was a data model of Great Eastern’s data warehouse. The expected solution had to be a conceptual idea. Therefore, we found the lack of a dataset the most challenging aspect since we had to learn Great Eastern’s model through entity-relationship diagrams.”

     

    Second runner-up Mr Azhagarasan Annadorai, a veteran of at least five data challenges, shared the same view. “Unlike other challenges in the past, Great Eastern did not provide masked data, which made us to think creatively and at the same time it was an additional work to be done,” he said.

     

    Another interesting aspect of the competition was that participants had to narrow down the problem statement themselves. “It was a learning expedition to explore the industry problems,” said Mr Azhagarasan, a 20-year IT industry veteran. Through the process, he gained interesting insights into how modelling was done in an insurance set-up.

     

    Mr Basha also found the overall experience of participating in the innovation challenge very productive. “We got to network with other participants and learned about how they approached this Great Eastern challenge. Also, we got an idea of how industry personnel evaluate the implementation prospects of ideas.”

     

    First runner-up in the challenge was the Big BA Czar Team, who came up with the idea of growing Great Eastern’s insurance business through a user-based collaborative filtering recommender system. The team comprised Mr Kenneth Koh, who graduated from the Singapore Management University School of Business Management, and Mr Ong Wei Keong, who holds a Masters in Mathematics from the National University of Singapore.
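    The article describes the team's idea only at a high level. As a rough illustration of what "user-based collaborative filtering" means in an insurance setting, here is a minimal Python sketch, assuming a hypothetical table of which products each customer holds; the customer IDs, product names and scoring scheme are illustrative and are not Great Eastern's data or the team's actual model. Customers are compared by cosine similarity of their product sets, and products held by the most similar customers are recommended.

```python
import math

# Hypothetical holdings: customer -> set of products held (illustrative only).
HOLDINGS = {
    "c1": {"life", "health", "travel"},
    "c2": {"life", "health"},
    "c3": {"health", "travel", "personal_accident"},
    "c4": {"life"},
}


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary item sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(customer: str, holdings: dict, k: int = 2) -> list:
    """Rank products the customer does not yet hold, scored by
    similarity-weighted votes from the k most similar other customers."""
    target = holdings[customer]
    neighbours = sorted(
        ((cosine(target, items), other)
         for other, items in holdings.items() if other != customer),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        for product in holdings[other] - target:
            scores[product] = scores.get(product, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


print(recommend("c4", HOLDINGS))
```

    For customer "c4", who holds only a life policy, the two nearest neighbours both hold health cover, so health ranks ahead of travel. A production recommender would also need to handle cold-start customers and weight products by suitability, which this sketch ignores.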

     

    From Great Eastern Life’s perspective, the innovation challenge has been an interesting experience as well. It has enabled the insurance company to tap on a wider audience of analysts and data scientists to use data beyond marketing, and has given rise to fresh ideas on how to use insurance data and other data beyond traditional analytics, said Mr Yeo. This paves the way for superior analytical models that will help improve predictability through the use of data, he added.

     

    The original version of this article appeared in Infocomm News from Singapore and is available here.

    Winners of Great Eastern Insurance Innovation Challenge

     

    The Great Eastern Insurance Innovation Challenge came to a close on the evening of 16 October 2014. With 4 teams and 5 ideas shortlisted, only 3 teams walked away with the top 3 prizes at the end of the presentations.

     

    All the finalists did really well and the judges were very impressed with the work produced by our participants.

     

    The third prize went to our regular participant Azhagarasan Annadorai, who presented on prioritising claims for processing using a claim scoring algorithm.


    Azhagarasan Annadorai (right) with Great Eastern Life Assurance’s Head of Customer Analytics, Yeo Joon Foong (left).

     

    Next, The Big BA Czar Team, comprising Kenneth Koh and Ong Wei Keong, earned the 2nd prize with their idea of growing GE’s insurance business through a user-based collaborative filtering recommender system.


    (From left to right) Kenneth Koh, Yeo Joon Foong (Head, Customer Analytics) and Ong Wei Keong

     

    Lastly, the top prize went to the brilliant team WeeKimWee Miners, who came up with 2 proposals, both of which were shortlisted by Great Eastern Life Assurance Pte Ltd.


    (From left to right) Yeo Joon Foong (Head, Customer Analytics), Aravind Sesagiri Raamkumar & Johnkhan Sathik Basha

     

    Congratulations to all the shortlisted teams and winners of the Great Eastern Insurance Innovation Challenge!

    Marketing Analytics & Risk Predictions Workshop


     

    We had a great workshop on Tuesday, 16 September, where speakers touched on Marketing Analytics and Risk Prediction. The highlight of the event was definitely the panel interview, where participants had the chance to learn about Great Eastern’s involvement with data and its plans to innovate the insurance business through the use of data analytics.

     

    With a fantastic line up of speakers such as Rahul Budhraja (Executive Director of Nielsen Consulting Analytics), Mike Anderson (Partner, Datacraft Sciences) and Xavier Conort (Chief Data Scientist, DataRobot) as part of the panel alongside Collin Chan (Chief Marketing Officer, Great Eastern Life Assurance) and Yeo Joon Foong (Head of Customer Analytics), it was little surprise that the event was sold out!

     

    For those of you who missed it, here’s a recap of the events that night:

     

    Welcome message


     

    DEXTRA’s Great Eastern Insurance Innovation Challenge – Welcome Address by Colin Chan from Newton Circus on Vimeo.

     

     
     

    Marketing Mix Optimization Models


     

    DEXTRA’s Great Eastern Insurance Innovation Challenge – Rahul Budhraja “Marketing Mix Optimization Models” from Newton Circus on Vimeo.

     

     
     

    Risk Prediction


    DEXTRA’s Great Eastern Insurance Innovation Challenge – Mike Anderson “Risk Prediction” from Newton Circus on Vimeo.

     


     
    We hope that you enjoyed the event, and if you have not signed up for the challenge, it’s not too late to do so! Sign up now before the challenge ends on Sunday, 28 September!

    [Challenge] Great Eastern Insurance Innovation Challenge FAQs

    Find out more about the Great Eastern Insurance Innovation Challenge

     

    GENERAL QUESTIONS

     

    1. Why can’t I verify my account? I followed the tutorial and my account is still showing ‘Account is not verified’.

     
    You may wish to follow the steps in this Account Creation Tutorial or in Account Verification Tutorial.
     
    If this still doesn’t work for you, do email us at mail@dextra.sg with the following:
     
    1. Subject title: Technical Issues: Great Eastern Insurance Innovation Challenge
    2. Error messages (if any and including screenshots will be helpful)
    3. Steps you went through, if you recall (screenshots will be helpful)
    4. Operating System (Windows, Mac or Linux)
    5. Operating System Version
    6. Browser type (Internet Explorer, Firefox or Google Chrome)
    7. Browser version
     

    2. Can I form my own team to take part in the challenge?

     
    For those who wish to form a team, please email us at mail@dextra.sg with the following:
     
    1. Subject title: Team Formation: Great Eastern Insurance Innovation Challenge
    2. Team name
    3. Team members’ names (to indicate team leader)
    4. Team members emails
    5. All members of the team are to register on the platform, join the challenge, agree to the terms & conditions and non-disclosure agreement individually*.

    * Note: Failure to comply with the above will lead to disqualification.

     

    3. May I suggest the use of external datasets for this challenge proposal?

     
    A: Yes, you may. Please explain your rationale for the proposed data.
     

    4. Why can’t I upload and submit my proposals?

     
    Please email us the following details at mail@dextra.sg:
     
    1. Subject title: Technical Issues: Great Eastern Insurance Innovation Challenge
    2. Error messages (if any and including screenshots will be helpful)
    3. Steps you went through, if you recall (screenshots will be helpful)
    4. Operating System (Windows, Mac or Linux)
    5. Operating System Version
    6. Browser type (Internet Explorer, Firefox or Google Chrome)
    7. Browser version
     

    Register for the Great Eastern Insurance Innovation Challenge

    [CHALLENGE] Great Eastern Insurance Innovation Challenge


    Launched on Wednesday, 03 September 2014

     

    INNOVATING THE INSURANCE BUSINESS


     

    Great Eastern Life Assurance Pte Ltd, founded in 1908, has been providing insurance protection to individuals, families and businesses for over 100 years. The company has grown over the years and now offers a wide range of products, including life insurance, health insurance, personal accident, general insurance, travel insurance, investment-linked plans and employee insurance.

     

    Ever wondered why some people buy insurance and some don’t? Is it because of cost, or because some do not see the need? More often than not, the economic grief that not having an insurance policy can cause an entire family far outweighs the cost of being a policy holder. With rising healthcare costs and inflation, having an insurance policy is more necessary than ever.

     

    How, then, can Great Eastern Life Assurance Company Limited improve its insurance business while also ensuring the benefit of its customers? Would giving customers the confidence to purchase a policy at a cost-effective rate generate more sales, or would hiring and retaining the right talent anchor its business in a stronger position? Could anomaly detection also help identify and mitigate risks more efficiently than the way it is currently done?

     

    (1) Your Task:

    With the changes in the global economy and population demographics, can you provide an innovative idea on how insurance businesses can be improved to provide better services to consumers and in turn, grow its revenue? Select the fields from the data schema provided and propose how a model can be built to innovate Great Eastern’s insurance business.

     

    (2) Data:

    a. You will be provided with a data schema of the available fields captured in their data warehouse. Each field is accompanied by a description and its value type (text, numbers, etc.).

    b. Any other supporting data you suggest will be a plus point for your proposal.

    c. Data simulations (with stated assumptions) projecting possible models may be provided in an appendix (optional).

     

    (3) Timeframe:

    3.5 weeks (03 September to 28 September)

     

    (4) Cash Prizes: 

    • 1st Prize: SG$4,000
    • 2nd Prize: SG$2,000
    • 3rd Prize: SG$1,000
     

    (5) Submission requirements:

    a. A 2-5 page report on your proposal, covering the points stated in the template provided

    b. A PowerPoint presentation of no more than 10 slides (excluding the cover page, thank-you page and appendix)

    c. Maximum file size limit of 20MB in a single zipped file

    Note: Participants can submit their entries until the closing date. Challenge will end at 23:59 on Sunday, 28 September 2014.

     

    (6) Selection criteria (weighting in %). At least 8 teams will be shortlisted to present to the Great Eastern panel based on the following:

    a. Effectiveness in generating value to Great Eastern’s business (30%)

    - What is the value of the idea?
    - Who will benefit from the idea and how can it bring value to Great Eastern?

    b. Ease of implementation in next Phase, with output that is actionable by Great Eastern (30%)

    - Can the idea be implemented?
    - How easily can it be implemented?
    - Are there any risks in implementation, and how can they be overcome?

    c. Originality of the idea (20%)

    d. Introduction of suitable external data to complement existing data (10%)

    e. Recommendations to current data collection (and how) for Great Eastern’s considerations (10%)

    Note: An announcement on the shortlisted participants will be made on Tuesday, 07 October 2014 via our eNewsletter, Challenge Platform and Blog. The presentations and prize awards will be held on the evening of Thursday, 16 October 2014.

     

    (7) Team formation: 

    For those who wish to form a team, please email us at mail@dextra.sg with the following:

    1. Subject title: Team Formation: Great Eastern Insurance Innovation Challenge
    2. Team name
    3. Team members' names (to indicate team leader)
    4. Team members' emails
    5. All members of the team are to register on the platform, join the challenge, agree to the terms & conditions and non-disclosure agreement individually*.

    * Note: Failure to comply with the above will lead to disqualification.

     

    There is no limit to the number of members you can have on your team, but please note that after the team is formed, only submissions from the team leader will be acknowledged.

     

    For any clarifications regarding the challenge, please feel free to email the DEXTRA team at mail@dextra.sg

     

    Register for the Great Eastern Insurance Innovation Challenge 

     

    Please note that staff employed by Great Eastern Life Assurance Company Limited are not eligible to enter the competition.

     

    This Challenge is brought to you by IDA & DEXTRA.

     

    Last updated on 07 October 2014, 16:22.

    Team Magellan – Winner of Best Use of Data

    Over the last weekend, our UPSingapore friends concluded the maritime-themed Smart Port Hackathon with a record 25 presentations on 20 July 2014. With over 32 million rows of data from 8 different organizing partners, the hackathon included datasets from MPA and its supporting partners such as PSA Corporation Limited, Jurong Port, YCH Group, DHI, Orbcomm and the National Environment Agency.

     

    The datasets covered vessel movement and position data, ship registries, cargo information, berthing schedules, container trucking data, bunkering transactions, hydrographic maps, air quality, and tidal and weather information. It was the first time this never-before-released MPA data, representing different sectors of the maritime industry, had been unlocked.

     

    With such attractive prizes on offer, more than 120 participants took part, but only one team could walk away with the ‘Best Use of Data’ prize, worth $1,000. After a 48-hour, data-intensive hackathon, Team Magellan emerged as the winner of the ‘Best Use of Data’ prize.

     

    Three of the four members of Team Magellan (team member Alex was absent due to other commitments), from L to R: Yang Shun, Wei Kuan & Yee Sian, wearing their happy faces.

     

    The team built an interactive tool for the visualization of maritime data hosted on DEX and released during the hackathon. They had wanted to pick up visualization libraries (such as d3, crossfilter, and dc), and the hackathon provided just the opportunity!

     

    In the spirit of a good hack (and good data science!), they also documented how they worked with the original data, as well as the JavaScript code behind the data binding, so that others who come along can replicate what they have done. Talk about being humble and having a sharing attitude!

     

    Given that the hackathon was about the vision of a “Smart Port” and an “Intelligent Nation”, the team thought about the “openness of data” in advance. The ability to interact with data to build understanding and transparency was an important point the team took note of.

     

    Here’s what the team had to say about data, design and technology:

     

    We believe in the utility of algebraic languages for reasoning about complex systems, but we also believe in the responsible communication of abstract ideas to managers, policy-makers, and affected parties.

    And often, in building complex prediction and decision-making models, the cost of producing plots and diagrams (writing SQL queries, generating graphs) is high enough that we prefer to “fly blind” in making guesses and anticipating the behavior of our systems. We hope to motivate the development and adoption of visualization libraries that mitigates this friction.

    Lastly, to quote Bret Victor: “Software is for people. To derive what software should do, we have to start with what people do.” Software that invites people to work with data is challenging to create, and the role of design in making that a reality is immeasurable

     

    Such gracious and conscious thoughts on data, truly winners! Here is the video of their presentation for those who have missed it:

     

     

    Congratulations to Team Magellan!

    Over 32 million rows of maritime data on DEX

    We told you about 6+ million rows of data on ship types, movement, country of registration, location and cargo information last week. Now there’s so much more!

     

    Our dedicated data exchange platform (DEX) now houses over 32 million rows of data and 71 different datasets from 7 different partners in the maritime industry! The data team has been preparing the data descriptions for the upcoming Smart Port hackathon and has learnt so much about the industry.

     

    Are you interested in these datasets? Why not register for the hackathon and see what you can do with this never-before-released data? The prizes are pretty attractive too, but we know the data is your kryptonite.

     

    We are so excited and can’t wait for you to discover what great ideas will be born from this.

     

    PS: The data is housed under this link but if you’re not a participant of the hackathon, you won’t be able to view it. You know what to do, don’t you?

    Best Use of Data Prize at UP Singapore Geo-hackathon 2014

    The UP Singapore Geohackathon 2014, in partnership with Singapore Land Authority, data.gov.sg, and VWOs including Red Cross Singapore, NVPC, SGEnable and Food from the Heart was held over the weekend of 06 – 08 June 2014. Many ideas, prototypes and use cases of data in the Geospatial arena were showcased and the well-deserved prize for the Best Use of Data went to Team TBG.

     
    Jeffrey Lau from Team TBG, posing with our very own makerbot


     

    Team TBG talked about perceived overcrowding and its impact on healthcare facilities, public transport and housing. They believe, however, that policy planning can be improved through good use of visualized geospatial data.

     

    With that idea in mind, they came up with a responsive app that lets you view population density, housing, schools and transport on a heatmap simply by keying in your postal code. Their prototype also shows the implications for the average capital appreciation of your housing. Overall, it is a great and useful tool that made good use of geospatial data.

     

    Here’s the video of their presentation. Congratulations to Team TBG!

    Prudential Healthcare Challenge Finale

    The Prudential Healthcare Challenge came to an exciting end on 18 April 2014 with 265 submissions from 120 different participants. Eight teams were shortlisted (the top 5 most accurate entries and 3 “wildcard” entries with interesting insights) to present their findings to the judging panel, which consisted of Prudential’s CEO, Thomas Urbanec; Chief Marketing Officer, David Ng; and Appointed Actuary, Marcus Ho.

     

    From left to right: Marcus Ho, David Ng and Thomas Urbanec

     

    The accuracy of the teams’ models was validated against an unseen segment of the Prudential dataset, covering hospital bill sizes for January to December 2013, which generated a score on the Private Leaderboard (lower is better).

     
    Team Name | Public Leaderboard | Private Leaderboard
    Antoine Veillard | 0.916596 | 0.924996
    Xminer | 1.34047 | 1.35118
    Rubing Duan | 1.33878 | 1.3562
    Aung Myint Thein | 1.39516 | 1.41366
    Troy J. Lee | 1.52528 | 1.55218
     

    The teams’ submissions were judged overall against the following criteria:

    • Accuracy of forecast model
    • Analysis process, including findings from the data
    • Technology applied and innovation introduced
    • Value to Prudential’s business
     

    Wildcard Award ($2,000) – Team led by Hanif Samad

     
    Third from left: Hanif Samad & team mate Azhagarasan Annadorai


     

    Hanif and Azhagarasan also pointed out that a statistical model would need to respect the hierarchy naturally present in the data: for example, hospitalization bills attributed to the same hospital would be expected to be correlated, followed by those attributed to the same patient, and so on. The team also included an inflation variable, taken as the number of years since 2010, which can be tuned according to expectations of the rate of inflation in future years.

     

    Tech Award ($4,000) – Team led by Antoine Veillard

     
    Third from left: Olivier Morere, Goh HanLin & Antoine Veillard


     

    The team used semantic descriptors constructed from the existing ICD classifications, together with Natural Language Processing (NLP) on top of a Bag of Words (BoW) approach. The advantage of the semantic descriptors is that they can be computed in the early stages of a patient's admission, using the first elements of the diagnosis. The NLP step also normalizes the text so that different derivations of the same word are identified as the same concept (for example, “irritant” and “irritation”). Entropy-based filtering and empirical selection were used to select predictors before building the predictive model with random forest regression (RFR).
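    The post does not give the team's actual pipeline, so the following is only a minimal Python sketch of two of the steps it mentions, under stated assumptions. A crude, hypothetical suffix-stripping rule stands in for the team's NLP normalization (so that “irritant” and “irritation” map to the same stem), bag-of-words counts are built on the stems, and a simple presence-entropy score stands in for their entropy-based filter: terms that appear in every document, or in none, score zero and are dropped. The suffix list, threshold and example diagnosis strings are all illustrative, and the random forest step is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list, longest first; a real pipeline would use a proper
# stemmer or lemmatiser rather than this crude rule set.
SUFFIXES = ("ation", "ant", "ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Lower-case, tokenise on letters, stem each token, and count."""
    return Counter(stem(tok) for tok in re.findall(r"[a-z]+", text.lower()))


def presence_entropy(term: str, docs: list) -> float:
    """Binary entropy (bits) of whether a term appears in a document.
    Terms present in all documents, or in none, score 0."""
    p = sum(term in doc for doc in docs) / len(docs)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_terms(docs: list, threshold: float = 0.5) -> list:
    """Keep vocabulary terms whose presence entropy meets the threshold."""
    vocab = set().union(*docs)
    return sorted(t for t in vocab if presence_entropy(t, docs) >= threshold)


# Illustrative diagnosis snippets, not Prudential data.
docs = [bag_of_words(t) for t in [
    "contact dermatitis irritant",
    "dermatitis skin irritation",
    "dermatitis wrist",
    "dermatitis hand",
]]
print(select_terms(docs))
```

    Here “dermatitis” appears in every example, so it carries no discriminating information and is filtered out, while the merged stem “irrit” (shared by two of the four examples) gets the maximum entropy of 1 bit. A supervised alternative, such as information gain against binned bill sizes, would likely be closer to what a competition entry used.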

     

    An insight from the team was that predicting bill sizes at the early stages of the medical process would have the most impact on decision making, and that this depends on the effectiveness of the NLP. Hence, they proposed estimating the likelihood of a person being admitted to a healthcare institution with a given diagnosis, and also suggested predicting the overall cost of healthcare for a specific group of people over a specific time frame.

     

    Overall Winner ($9,000) – Troy J. Lee

     
    Third from left: Troy J. Lee


     

    His best submission used a family of trees, where each tree branches on feature values to partition the data into clusters and makes the same prediction for every item in a cluster; the final prediction is a weighted average of the predictions of the individual trees. Troy intuited that bill items with a ward type present are much more expensive, as they are associated with operations or hospital stays. Cheaper bill items for prescription drugs, on the other hand, are less often labeled with a ward type. In addition, he observed that the most important predictive variables (in decreasing order) are diagnosis code, hospital, ward type and duration of stay.
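    To make the “family of trees” idea concrete, here is a minimal Python sketch, assuming depth-1 regression trees (stumps) whose split thresholds are fixed by hand rather than learned, and hand-picked ensemble weights. The feature names (`has_ward_type`, `stay_days`), bill amounts and weights are hypothetical and are not taken from Troy's submission. Each stump partitions the rows into two clusters and predicts the mean bill of its cluster, and the ensemble returns a weighted average of the stumps' predictions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Stump:
    """A depth-1 regression tree: splits on one feature and predicts the
    mean target of the training rows on the same side of the threshold.
    The threshold is supplied by hand here, not learned."""
    feature: str
    threshold: float
    left_value: float = 0.0
    right_value: float = 0.0

    def fit(self, rows, targets):
        left = [y for r, y in zip(rows, targets) if r[self.feature] <= self.threshold]
        right = [y for r, y in zip(rows, targets) if r[self.feature] > self.threshold]
        overall = mean(targets)  # fallback if one side is empty
        self.left_value = mean(left) if left else overall
        self.right_value = mean(right) if right else overall
        return self

    def predict(self, row):
        return self.left_value if row[self.feature] <= self.threshold else self.right_value


def ensemble_predict(trees, weights, row):
    """Weighted average of the individual trees' predictions."""
    total = sum(weights)
    return sum(w * t.predict(row) for t, w in zip(trees, weights)) / total


# Hypothetical bill items: items with a ward type are the expensive ones.
rows = [
    {"has_ward_type": 0, "stay_days": 0},
    {"has_ward_type": 0, "stay_days": 0},
    {"has_ward_type": 1, "stay_days": 2},
    {"has_ward_type": 1, "stay_days": 5},
]
targets = [50.0, 70.0, 2000.0, 4000.0]

trees = [
    Stump("has_ward_type", 0.5).fit(rows, targets),
    Stump("stay_days", 3).fit(rows, targets),
]
print(ensemble_predict(trees, [3, 1], {"has_ward_type": 1, "stay_days": 5}))
```

    With weights of 3 and 1, a ward-type item with a five-day stay is predicted at (3 × 3000 + 1 × 4000) / 4 = 3250. Methods such as random forests or gradient boosting learn both the splits and the combination; this sketch only shows the averaging structure.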

     

    We are now working closely with Prudential and the winning data scientists to see how their models and findings can be integrated back into the business. In the meantime, watch out for our data science teams sharing their techniques and insights at the upcoming Good Health hackathon!

    [INTERVIEW] Winners of the Silverline Mobile Challenge

    Paul Meinshausen & Hanif Samad emerged tops in the Silverline Mobile Predictive Behaviour Challenge which came to a conclusion in March 2014. Their submission is a proposal for using the Silverline Mobile Eco-System to Detect, Support, and Improve the routines and sleep patterns of the elderly.

     

    Here’s a synopsis of their winning idea:
    Sleep patterns and daily routines are two critically important aspects of seniors’ lives. Using the data from Silverline’s sensors, they demonstrated how hidden Markov models can be used to make inferences about those patterns from the specific actions and conditions observed by the sensors. They suggested that if seniors with sensors in their households were also to use the Silverline phone app, the data from those seniors’ app profiles could be used to validate the preliminary models they presented. They then proposed that, using these models, Silverline Mobile could help caretakers better understand and support the routines and sleep that are so critical to their parents’ and seniors’ lives and well-being.
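    The synopsis does not describe the winners' model in detail, so here is a generic, minimal Python sketch of how a hidden Markov model can infer an unobserved state such as sleep from sensor readings. It assumes two hidden states (“asleep”, “awake”), a single hypothetical motion sensor emitting “motion” or “no_motion”, and made-up start, transition and emission probabilities; none of these come from Silverline's data. The Viterbi algorithm, computed in log space to avoid underflow on long sequences, recovers the most likely sequence of hidden states.

```python
import math

STATES = ("asleep", "awake")
# Hypothetical probabilities, chosen only for illustration.
START = {"asleep": 0.5, "awake": 0.5}
TRANSITION = {
    "asleep": {"asleep": 0.9, "awake": 0.1},
    "awake": {"asleep": 0.2, "awake": 0.8},
}
EMISSION = {
    "asleep": {"no_motion": 0.9, "motion": 0.1},
    "awake": {"no_motion": 0.3, "motion": 0.7},
}


def viterbi(observations):
    """Return the most likely hidden-state sequence (log-space Viterbi)."""
    # best[s] = log-probability of the best path ending in state s
    best = {s: math.log(START[s]) + math.log(EMISSION[s][observations[0]])
            for s in STATES}
    back = []  # back[t][s] = best previous state for state s at step t+1
    for obs in observations[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev, score = max(
                ((p, best[p] + math.log(TRANSITION[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            pointers[s] = prev
            new_best[s] = score + math.log(EMISSION[s][obs])
        back.append(pointers)
        best = new_best
    # Trace back from the most likely final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


readings = ["motion", "motion", "no_motion", "no_motion",
            "no_motion", "motion", "motion"]
print(viterbi(readings))
```

    On this toy sequence, the decoder labels the quiet middle stretch as sleep and the active ends as awake. In practice the probabilities would be estimated from the sensor data (for example with the Baum-Welch algorithm) and validated against ground truth such as the app profiles mentioned above.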

     

    DEXTRA Silveline Challenge – Hanif Samad and Paul Meinshausen from Newton Circus on Vimeo.

     

    We decided to catch up with the winners themselves to find out more.

     

    DEXTRA: Tell us a little more about yourself.

     


     

    Paul Meinshausen: I have an academic research background in anthropology and social psychology. I just moved to Singapore in October of 2013 and I spend most of my professional time as a data science consultant. The data science world in Southeast Asia is a fascinating place to be in at the moment and I’m really enjoying exploring it. Just recently I spent a couple of weeks working with Playbasis, a gamification startup (based in Bangkok) that’s doing some very innovative work in behavioral change and consumer engagement. I also like to work on projects that are oriented to helping people achieve improvements in their lives and that have a larger public benefit. This past summer I was a fellow with the Data Science for Social Good program in Chicago. After working with an amazing program like that, it’d be hard to imagine not regularly contributing to projects that try to use data science to tackle some of the many challenges that affect our broader society and communities.

     


     

    Hanif Samad: I work as a statistics officer for Sengkang Health, the upcoming public hospital in the Northeast. I am helping to develop the hospital’s analytics capabilities as a strategic resource, so that it can iterate on data-driven solutions across its value chain. Otherwise, I am an opportunistic data sleuth. I am curious about many things and will poke my finger into any interesting pie, with charts and commentary. I dabbled in aggregating leading forecasts of the 2012 US presidential race, and modelled missing positions in malaria parasite DNA during my Medical Statistics master’s at the London School of Hygiene and Tropical Medicine last year. The Silverline challenge was my first try at a data innovation competition proper.

     

    DEXTRA: What motivated you to participate in the Silverline Mobile Challenge?

     

    Paul & Hanif: We learned about DEXTRA at the UP Singapore New Year’s party in January. The Silverline challenge began about a week after and since the topic seemed interesting to both of us, we decided to team up. One of the issues we grappled with at the beginning of the challenge was the open-ended nature of the challenge’s objective and evaluation criteria. Data science competitions often provide a specific and fixed standard (like an accuracy score or an error rate) that contestants’ proposed models will be evaluated against. That kind of technical specificity allows you to quickly get to the really fun statistical and machine learning work of data science, but it can also allow data scientists to avoid grappling with the conceptual and practical problems behind the challenge.

     

    With the Silverline challenge we were provided with sensor and app data and asked to derive an algorithm that could help achieve positive behavioral change in seniors. There was no specific metric that our models would be scored against. So we had to start by clearly defining the problem we would model, and only after that did we narrow our options and decide on a modeling approach. This kind of challenge can be very difficult and sometimes takes us outside our comfort zone, but we both think it’s a critically important part of data science that is too often forgotten behind more technical details. In that sense, the Silverline challenge wasn’t necessarily representative of most data science challenges, but it was a pretty good example of what data science in the real world is like.

    Silverline Mobile Predictive Behaviour Challenge: A Round-up


     

    The Silverline Mobile Predictive Behaviour Challenge came to a close on 17 March, with over 40 members of our community participating and 3 valuable submissions received. We are in the midst of evaluating these submissions, and the teams will present their ideas and key findings to the Silverline Mobile panel on the evening of Thursday, 27 March.

     

    As with each challenge, the DEXTRA team has gathered a few learning points.

     

    The DEXTRA and Silverline Mobile teams understand that there were some shortfalls in the datasets provided to support the challenge. Feedback we received included that the behavioural data was collected over too short a period of time, and that the constant updating of the test datasets was too disruptive.

     

    The good news is that Silverline Mobile has secured a group of long-term trial users of the programme. This means that the datasets collected will become increasingly comprehensive, and there is a good chance of a second challenge going live in the second half of the year. Not only that, the data will be housed in our very own data sandbox at DEX, so that participants can make more robust use of it.

     

    While the submissions are being evaluated, we intend to explore incorporating the results of this Challenge so that the progress can be showcased in the next challenge. The teams at DEXTRA and Silverline Mobile are working hard to prepare more interesting datasets, to be collected over the long term from a larger group of users, so as to gain more insights and contribute more effectively to helping our seniors lead healthier and more active lifestyles.

     

    If you have any feedback for us regarding the challenge, we would be glad to hear from you! Please take part in this survey; your responses will be much appreciated.

     

    -

     

    The Silverline Challenge presentation will take place on Thursday, 27 March, and winners will be announced the week after.

    [CHALLENGE] Prudential Healthcare Challenge FAQs


     

    Register for the Prudential Healthcare Challenge

     

    GENERAL QUESTIONS

     

    1. Why can’t I verify my account? I followed the tutorial and my account is still showing ‘unverified’.

     

    Follow the steps in this Account Creation Tutorial. For some accounts we require you to save each piece of information separately. For instance, after entering your Passport/IC number, click save changes. Then go back to enter another piece of information like your phone number and verify it.

     

    If this still doesn’t work for you, please email us at mail@dextra.sg. We will delete your account on our end so that you can create a new account by following the steps above.

     

    2. Can I form my own team to take part in the challenge?

     

    Yes, you can. Team formation is allowed. Please ask your team leader to email us at mail@dextra.sg with the following:

     
    1. Team name & team leader
    2. Team members’ names
    3. Team members’ emails
     

    There is no limit to the number of members you can have on your team, but please note that after the team is formed, only submissions from the team leader will be acknowledged. All other submissions from individual team members will be rejected.

     

    3. May I use my own external datasets to build my predictive model?

     

    Yes, you may.

     

    4. Are there any restrictions on the programming language and tools I use to build my predictive model?

     

    There are no restrictions on the programming language or the tool(s) you choose to use when building your predictive model. However, we do require you to submit a short write-up with your ‘submissions.csv’ file, detailing the methods and resources you used to build your model.

     

    5. Why can’t I upload and submit my predictions?

     

    The evaluation metric will be made publicly available midway through the challenge. In the meantime, please email your submissions to mail@dextra.sg and we will reply with your accuracy scores.

     

    *Update* The evaluation metric is now available for submissions. After logging in to the challenge platform, click “Dashboard”, then click the “Prudential Healthcare Challenge” link. On that landing page, click the red ‘Submit’ button in the top right corner. Enter a description of your methodology and click the green ‘Submit’ button. Next, click the grey box that says “Drop files to upload (or click)”, select your files and click ‘Done’.

     

    The page should refresh, and you can then view your submissions under ‘My Submissions’.

     

    6. Some hospital bill items (with the same hospital bill ID) have the exact same costs, are they duplicate rows or amounts made in equal installments?

     

    Duplicated hospital bill amounts are equal installments. They are not errors.
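
    Because these repeated amounts are genuine installments, a step that drops duplicate rows would understate a bill's total. A minimal Python sketch for exploratory analysis, assuming each record has been reduced to a hypothetical (bill_id, amount) pair, totals the items per bill while keeping every row:

```python
from collections import defaultdict


def bill_totals(records):
    """Sum item amounts per hospital bill ID, keeping repeated amounts.

    `records` is an iterable of (bill_id, amount) pairs; this layout is a
    hypothetical simplification, not the challenge's actual file schema.
    """
    totals = defaultdict(float)
    for bill_id, amount in records:
        # Identical amounts under one bill ID are equal installments,
        # so every row is added rather than de-duplicated.
        totals[bill_id] += amount
    return dict(totals)


items = [("B1", 250.0), ("B1", 250.0), ("B2", 1200.0)]
print(bill_totals(items))  # {'B1': 500.0, 'B2': 1200.0}
```

    De-duplicating first, for example with set(items), would report B1 as 250.0 instead of 500.0.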

     

    7. Different hospital bill items (with the same hospital bill ID) have different prices, and there are no other variables to differentiate them. In the final submission dataset, is it possible to predict the cost of each aggregated bill instead of the cost of each component of each bill?

     

    The data set and the submission template will not be aggregated for prediction. The purpose of the challenge is to predict the medical bills at different time points.

     

    8. Diagnosis text is truncated. Can Prudential resend the exported data without truncation?

     

    Unfortunately, the data is truncated at source. However, we will be adding ICD9 and ICD10 codes (International Statistical Classification of Diseases and Related Health Problems) to the dataset to help identify the diagnosis where its description is truncated.

     

    9. How do I differentiate between ICD9 and ICD10 codes?

     

    ICD9 codes were used before August 2012 and ICD10 codes after August 2012. However, the data is subject to inconsistencies; one way to tell is that if the diagnosis description is in capital letters, it is an ICD9 code.
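
    The rule of thumb above can be sketched as a small Python helper. This is one possible reading, not Prudential's method: it assumes the capital-letters check takes priority over the date, treats "after August 2012" as on or after 1 August 2012, and leaves open which date field (a hypothetical `record_date`) best reflects when a record was coded.

```python
from datetime import date

# Assumption: the ICD10 cut-over is taken as 1 August 2012.
ICD10_START = date(2012, 8, 1)


def guess_icd_version(record_date, description):
    """Guess a record's coding scheme from the FAQ's rules of thumb.

    An all-capitals diagnosis description is taken to indicate ICD9;
    otherwise the record date decides (ICD9 before the cut-over, ICD10 from it).
    """
    # str.isupper() is True only if all cased characters are uppercase
    # and there is at least one cased character.
    if description.isupper():
        return "ICD9"
    return "ICD9" if record_date < ICD10_START else "ICD10"


print(guess_icd_version(date(2013, 1, 5), "ACUTE APPENDICITIS"))  # ICD9
print(guess_icd_version(date(2013, 1, 5), "Acute appendicitis"))  # ICD10
```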

     

    10. Sometimes the date of event is after the date of discharge. What does ‘date of event’ mean?

     

    Date of event refers to the day the medical condition occurred. If the date of event is after the date of discharge, the entry can be ignored. At the time of writing, only 13 such occurrences have been observed.
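
    Ignoring those entries amounts to a simple filter. A minimal Python sketch, assuming each record is a dict with hypothetical keys 'event_date' and 'discharge_date' holding datetime.date values:

```python
from datetime import date


def drop_inconsistent_events(records):
    """Keep only records whose date of event is on or before the discharge date.

    The dict keys used here are hypothetical, not the dataset's column names.
    """
    return [r for r in records if r["event_date"] <= r["discharge_date"]]


rows = [
    {"id": 1, "event_date": date(2012, 5, 1), "discharge_date": date(2012, 5, 4)},
    {"id": 2, "event_date": date(2012, 6, 9), "discharge_date": date(2012, 6, 2)},
]
print([r["id"] for r in drop_inconsistent_events(rows)])  # [1]
```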

     

    Register for the Prudential Healthcare Challenge

    [CHALLENGE] Prudential Healthcare Challenge – LAUNCHED!

    *UPDATE: Challenge open for submissions till 18 April (extended by 1 week). Over 100 participants registered so far!


    Register for the Prudential Healthcare Challenge

     

    Prudential Singapore is one of the top life insurance companies in Singapore, with a rich history spanning more than 80 years. It is one of the market leaders in healthcare, offering a Medisave-approved integrated medical insurance plan that provides comprehensive medical coverage. This helps ensure that you have the financial means for the best possible treatment and can enjoy greater peace of mind without worrying about high medical expenses for you and your family.

     

    PREDICTING HEALTHCARE COSTS

    Healthcare cost is a growing concern in Singapore, where patients co-pay their hospital bills. There is always the fear of affordability, especially in the event of a financially catastrophic illness. To enable consumers to make better healthcare and financial planning decisions, Prudential Singapore invites you to predict the cost of seeking treatment for individual consumers. Can you predict which consumers were admitted to a hospital in 2013, and the size of their hospital bills?

     

     

    The Challenge requires participants to predict the cost of seeking treatment for individual consumers who were admitted to a hospital in 2013. Forecast accuracy, measured using Root Mean Square Logarithmic Error (RMSLE), will be one of the evaluation criteria. The winning model should also be innovative, identify key drivers that are insightful and unprecedented, and highlight insights gleaned from external data sources.
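
    RMSLE is commonly defined as sqrt(mean((log(p + 1) - log(a + 1))^2)), where p is a predicted value and a the actual value. The post does not spell out the platform's exact formula, so the following is a minimal Python sketch of that common definition, assuming non-negative bill amounts:

```python
import math


def rmsle(predicted, actual):
    """Root Mean Square Logarithmic Error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes log(x + 1) accurately, including for small x.
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


print(round(rmsle([2000], [1000]), 2), round(rmsle([200], [100]), 2))  # 0.69 0.69
```

    Because errors are measured on a log scale, over-predicting a 1,000 bill by a factor of two is penalised about as much as over-predicting a 100 bill by the same factor, so large bills do not dominate the score.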

     

    (1) Prize: Up to SGD$15,000 (allocation of prize fund to the winning teams will be determined by Prudential)

     

    (2) Timeframe: 5 weeks (14 March – 18 April)

     

    (3) Data:

     

    The hospital bills data shows the year, gender, a unique hospitalisation ID for each admission, hospital name, diagnosis code description, date of event, date of admission, discharge date, admission type, ward type and the actual charges of the bill amount. There are 10,000 unique IDs with 38,000 records. Each record contains one item in a hospital bill, and one hospital bill can contain more than one item. Most hospital bills have more than one item and diagnosis, and each ID may have more than one unique hospitalisation ID. The dates of admission and discharge are provided, together with the type of hospitalisation and ward.

     

    You will also find supporting data as follows:

     
    • Beds Occupancy Rate (BOR)
    • Average Hospital Inpatient Bill Size Tables
    • Public Hospitals – Medical Specialties
    • Private Hospitals – Medical Specialties
    • Private Hospitals – Surgical Specialties
    • Public Hospitals – Surgical Specialties
    • Health Manpower
    • Health Facilities
    • Consumer Price Indices (CPI) & Household Healthcare Expenditure
    • Hospital Admission Rates by Age and Gender (2010, 2011, 2012)
    • Attendances at Emergency Medicine Departments
    • Waiting Time for Admission to Ward
    • Top 10 Conditions of Hospitalisation
    • Population And Vital Statistics
     

    (4) Your Task:

     

    Can you predict which consumers were admitted to a hospital in 2013? Using your predictions together with the healthcare cost data, predict each consumer’s hospitalisation cost for the same period.

     

    (5) Submission requirements:

     

    1) Short write-up/synopsis (1-2 pages) on your analysis methodology, insights from the data, suggestions for implementation, and the external research data that would make your model more applicable
    2) Predicted hospital bill size for Jan to Dec 2013

     

    Note: The accuracy of your predictions will be evaluated in real time against the RMSLE evaluation metric. Participants can resubmit their entries until the closing date.

     

    (6) Selection for interview with the Prudential judging panel on 7 May 2014:

     

    At least 5 teams will be shortlisted to present to the Prudential panel based on the following:
    a) Most accurate forecast (evaluated using RMSLE)
    b) Predictive model that is compelling and innovative
    c) Identification of key drivers that are insightful and unprecedented
    d) Other interesting insights gleaned from external data sources

     

    Register for the Prudential Challenge | Find out more about the Prudential Challenge (VIDEO)

     

    -

     

    ANY QUESTIONS?
    Check out our Prudential FAQ Page!

    [INTERVIEW] Azhagarasan Annadorai – 2nd Runner Up of DSM Challenge

     

    Azhagarasan Annadorai (affectionately known as Aza) from Exilant Technologies won the 2nd Runner Up prize in our inaugural Data Innovation Challenge, hosted by DSM Engineering Plastics. Learn more about the techniques and tools he used in this second part of our interview series!

     

    1. What was your background prior to entering this Challenge?

     

    I have 17+ years of experience in the IT industry, primarily as a data warehousing and BI consultant. I’ve also been an entrepreneur and have built two businesses. One was Sellers Inc, a consultancy service provider, where I used data warehousing techniques to identify and match highly skilled consultants. In the process, I found out that only MNCs could afford data warehousing solutions, as the technology was so expensive. This finding led me to start another company, Kaizentric, which built data warehousing products for SMEs. The product included features such as ETL (Extract-Transform-Load) on the cloud. I am currently employed at Exilant Technologies, which specializes in Business Intelligence (BI) and Mobile Technologies (Mobility).

     

    2. What made you decide to enter?

     

    I am passionate about technology. I love data. When the DEXTRA Challenge came along, I had to take it. It rekindled my passion and gave me real control over the models I built. On top of that, I have a startup entrepreneur’s mentality; I like to build things and work on mini projects in my free time. What really surprised me was how much interesting data DEXTRA managed to liberate. It is difficult to establish the kind of trust that leads a client organization to open up its data to anyone outside the company. When the Challenge came, I saw it as low-hanging fruit. It is amazing how DEXTRA is able to play the role of a neutral intermediary and get data from enterprises like DSM (albeit anonymised). I am impressed by the work DEXTRA has been doing. They have done a great job of constantly nurturing talent and facilitating the challenge.

     

    3. What kind of obstacles did you face?

     

    As a solutions consultant, I had to put myself in the shoes of the business to understand the current business process, and to find out what sort of people influence the data captured, beyond just looking at the raw data. For instance: how a typical customer’s need for a material is captured by a salesperson, how a salesperson will behave in a given scenario (a forecasting decision), and how management will react to sales figures. Once I had figured all this out, it was easier to build an algorithm that identifies and closes the gaps in the business process. After all, any software is a computerization of the manual processes and workflows that humans carry out. An intelligent one includes a set of instructions that simulates the decision process.

     

    4. What preprocessing and supervised learning methods did you use?

     

    I started off with data profiling to find basic statistics of the data, such as the min, max and range of values. I also tried to understand how to standardize the data. Profiling helped me understand the conversion rules. For instance, I initially thought that the sales number was the revenue, but it was in fact the quantity. After that I started correlating the data and plotting it in charts. I generated many charts to try to understand the data and identify the gaps. There were some gaps, and I had to use other datasets to patch the information missing from some datasets. I tried to understand the shortcomings of each dataset. I found that the budget data for this challenge was not really relevant, and also observed that the budget was far off from the forecasts. I used a data warehousing technique called ETL (Extract-Transform-Load), which is normally applied to integrate and standardize data from multiple sources for better business insights. As for the supervised learning technique, I used a sort of neural network embedded in the logic I constructed. I believe in continuous improvement, and my technique allows the model to be corrected through a feedback loop, so my forecast will get better every time we churn the data.
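
    The interview does not show how the profiling was implemented. As a tool-neutral illustration only, a minimal Python sketch of the same first step (min, max and range per numeric column) over a few hypothetical sales rows:

```python
def profile(rows):
    """Return min, max and range for every numeric column in a list of dicts."""
    stats = {}
    for row in rows:
        for column, value in row.items():
            # Skip text columns; bool is excluded because it subclasses int.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                lo, hi = stats.get(column, (value, value))
                stats[column] = (min(lo, value), max(hi, value))
    return {c: {"min": lo, "max": hi, "range": hi - lo} for c, (lo, hi) in stats.items()}


# Hypothetical rows; column names are illustrative, not DSM's schema.
sales = [
    {"product": "A", "quantity": 120, "month": 1},
    {"product": "B", "quantity": 45, "month": 2},
    {"product": "A", "quantity": 300, "month": 3},
]
print(profile(sales)["quantity"])  # {'min': 45, 'max': 300, 'range': 255}
```

    A quick pass like this is where a surprise such as a "sales" figure turning out to be a quantity rather than revenue tends to show up, since the value ranges do not match expectations.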

     

    5. Which tools did you use?

     

    There are innumerable BA tools on the market. We really just have to pick one and go with it. I picked Lavastorm Analytics over other tools such as Tableau, QlikView and RapidMiner, because I like the way Lavastorm Analytics integrates with the open source R programming language. Tableau has recently come out with a similar integration, and I might use it for the next phase.

     

    6. What have you taken away from this competition?

     

    I am a socially active person and love to be surrounded by entrepreneurs and business experts. DEXTRA gave me an opportunity to meet like-minded people and make valuable connections with business leaders. That’s my takeaway!

    [INTERVIEW] David Low & Richard Oentaryo – Winners of DSM Challenge

     

    David Low & Richard Oentaryo from SMU Living Analytics Research Centre won our inaugural Data Innovation Challenge, hosted by DSM Engineering Plastics. Learn more about the techniques and tools they used in this first part of our interview series!

     

    1. What was your background prior to entering this Challenge?

     

    We are both researchers at the Living Analytics Research Centre, a new joint research initiative between Singapore Management University (SMU) and Carnegie Mellon University (CMU) to conduct research on consumer and social network analytics and behavioural experiments. Our goal is to discover and harness the laws of information network evolution for networks of people, organisations and businesses. For example, one of our key projects is about real-time analytics on Singapore Twitter users, with applications such as real-life event detection, news tracking, sentiment analysis, etc. (see e.g. http://research.larc.smu.edu.sg/palanteert/).

     

    Richard: Before joining SMU, I was a Research Fellow at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU).

     

    David: I was previously a Research Engineer in NTU, partnering with LTA and MIT in an urban mobility research project.

     

    2. What made you decide to enter?

     

    Richard: This is probably the first online data mining challenge ever hosted by a multi-national company (MNC) based in Singapore. Given our research expertise and work on data mining and machine learning, joining a DEXTRA Challenge seemed like a natural thing to do. When I was a PhD student at NTU, I did some research on stock market prediction, which combines technical analysis and machine learning methods. The problem is kind of similar, so we wanted to see how far we could go with this.

     

    David: The datasets offered on DEXTRA were really interesting as well. It is generally difficult to get such proprietary data from any MNC.

     

    3. What kind of obstacles did you face?

     

    Unlike stock market prediction, where the predictor variables are numerical (e.g., past stock prices), the information in the DSM dataset largely comprises categorical variables. We also found that some categorical variables that appeared in the training set did not appear in the test set, a situation we refer to as the cold start problem. Our methods had to address these issues. Also, the evaluation criterion (i.e., RMSLE) is quite unusual, whereas most regression methods in machine learning are optimized for least-square error (i.e., RMSE). Last but not least, we did not find the CRM data particularly useful, as our experimental results actually showed a degradation in model performance when it was included.

     

    4. What preprocessing and supervised learning methods did you use?

     

    We used binary (one-hot) encoding for the categorical variables. For example, we had 4 market segments M1, M2, M3 and M4, and we encoded them as ‘1 0 0 0’, ‘0 1 0 0’, ‘0 0 1 0’ and ‘0 0 0 1’ respectively. We also gave more weight to the more recent instances when training our models; that is, we assumed that more recent data is more relevant. Moreover, instead of using the raw sales price, we used log(sales price + 1) as our target variable. This allows us to transform the task into a least-square error (i.e., RMSE) minimization problem, which is what most regression methods are typically optimized for. We employed a number of supervised learning techniques. As our baseline, we used two single models: nearest neighbours and linear regression. Through our experiments, we found that nearest neighbours performed better than linear regression. However, the performance was still generally unsatisfactory. For our final approach, therefore, we applied ensemble models that combine multiple learning models instead of one. Specifically, we used random forests and extremely randomised trees (Extra Trees). Extra Trees is similar to random forests, except that it adds a further randomization step by drawing the split thresholds for each candidate feature at random. This extra randomization step can help deal with local optima and thus improve the overall prediction. We had used these two models in a previous project on fraud detection in advertising. In most cases, we found that Extra Trees performed substantially better than random forests.
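
    The interview does not publish the team's code. A minimal, dependency-free Python sketch of the three preprocessing ideas described above (one-hot encoding of categories, heavier weights for recent rows, and a log(price + 1) target) follows; the exponential decay rate and the all-zeros treatment of unseen categories are assumptions for illustration, not the team's documented choices.

```python
import math


def one_hot(value, categories):
    """Encode `value` as a 0/1 vector over the known `categories`.

    A value never seen in training (e.g. "M5") maps to all zeros, one simple
    way to handle unseen categories, assumed here for illustration.
    """
    return [1 if value == c else 0 for c in categories]


def recency_weights(periods, decay=0.9):
    """Weight each training row by recency: the newest period gets 1.0 and each
    step back in time multiplies the weight by `decay` (a hypothetical choice)."""
    latest = max(periods)
    return [decay ** (latest - p) for p in periods]


def to_log_target(prices):
    """Map prices to log(price + 1); RMSE on this scale equals RMSLE on the original."""
    return [math.log1p(p) for p in prices]


def from_log_target(log_predictions):
    """Invert log(price + 1) to get predictions back in the original units."""
    return [math.expm1(z) for z in log_predictions]


segments = ["M1", "M2", "M3", "M4"]
print(one_hot("M3", segments))  # [0, 0, 1, 0]
print(one_hot("M5", segments))  # [0, 0, 0, 0]
print(recency_weights([1, 2, 3]))
```

    The encoded features, weights and log targets could then be passed to a tree ensemble such as scikit-learn's ExtraTreesRegressor, whose fit method accepts a sample_weight argument; its predictions would be mapped back to prices with from_log_target.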

     

    5. Which tools did you use?

     

    We used open source programming languages and tools. For data preprocessing and visualization, we used the R programming language, specifically the ggplot2 and plyr packages. For the regression algorithms, we used Python with the scikit-learn library. We chose these languages mainly because we find them easy to implement with and extend. Also, since we had less than a month to work on the challenge, we needed a more agile development plan. Using these languages and existing libraries, we find that development time is much shorter than with other languages such as Java or C++.

     

    6. What have you taken away from this competition?

     

    We got to know business experts from DSM and to network with other expert data scientists. For example, we got to talk to Xavier Conort, a top-ranked and well-known data scientist on Kaggle. Not to mention that this is the first challenge of its kind organized by an MNC, so we learned a lot of new business insights and concepts.

     

    [WORKSHOP] Data Governance & Visual Analytics

    A big thank you to all of you who have registered for last Thursday’s Data Innovation Challenge workshop! Thanks for joining us either at the event itself or online via our live webcast. For those of you who missed our workshop, here’s a quick round-up of what took place on the evening of 27 February.

     

    (1) Tools for the Data Enthusiasts – Pat Hanrahan, Tableau

     

    Pat Hanrahan, Tableau Software’s Co-founder and Chief Scientist, spoke about the tools available to data enthusiasts and the value of effective data visualisation. Pat has received three Academy Awards for Science and Technology, and drew on his experience of working at Pixar when speaking about the power of visualised data.

     

     

    Tools for Data Enthusiasts – Pat Hanrahan

     

    (2) Data Innovation Challenge Announcement – David Ng, Prudential Singapore

     

    Healthcare cost is a growing concern in Singapore, where patients co-pay their hospital bills. There is always the fear of affordability, especially in the event of a financially catastrophic illness. The challenge Prudential Singapore will be launching on DEXTRA is to build a model that can predict the average cost of seeking treatment, thereby enabling consumers to make better decisions about the cost of insurance coverage. The challenge will go live in the next 1-2 weeks, so stay tuned for more information on this exciting challenge!

     

     

    Data Innovation Challenge – David Ng

     

    (3) Data Laws in Asia Today – Elle Todd, Olswang

     

    Elle Todd, Head of the Asia media and technology practice at international law firm Olswang, closed the session by speaking about the latest trends in data legislation and common myths about the position in Asia on the upcoming personal data protection act.

     

     

    Data Law in Asia Today – Elle Todd

     

    Photos from the event are also up on Facebook, so do check them out and tag away when you spot yourself (and your friends)!

     

    P.S. If you missed out on our previous workshops, catch up on the presentations here.

    Data Innovation Challenge Prize Winners @ MICE coLAB

     

    In the recently concluded MICE coLAB, organized by UP Singapore and co-sponsored by the Singapore Tourism Board, Team Beeve emerged as clear winners of the DATA INNOVATION CHALLENGE PRIZE. This prize is awarded to the team that demonstrates compelling analytics and insights from the data, and leverages them to build its prototype.

     
     

    Over the weekend, Team Beeve built an app to help business visitors schedule their Exhibitions and Conventions, supported by a data dashboard.

     

    By correlating visitors’ length of stay with the length of the conference, the team observed that visitors have minimal time for leisure. And since more time is spent on mobile devices for work on the move, there is a greater demand for easy-to-use, multitasking apps that solve the kinds of problems typical for a business visitor. This inspired Team Beeve to develop an app focused on smart scheduling for the business traveller, with simplicity as the key.

     

    Team Beeve approached the challenge with a three-pronged approach of “Collect. Store. Maximise.” This meant that the Beeve app was designed to (1) collect data about visitors via their mobile app, (2) store the organisers’ and MICE attendees’ data via a Content Management System, and (3) maximise the value of the data gathered to understand the visitors via data visualisation.

     

    They demonstrated cool mash-ups of various datasets to drive analysis, and the consensus was that with expanded datasets and more development, the Beeve app will indeed be a valuable tool!

     

    Check out their presentation here

    Join us at the Data Governance & Visual Analytics workshop!

     

    We are excited to present two of the industry’s experts: One is a three-time Oscar winner. The other is a domain expert in the area of data legislation.

     

    Pat Hanrahan, Tableau Software’s Co-founder and Chief Scientist, will be speaking on the value of effective data visualisation. Pat has also worked at Pixar, where he developed volume rendering software and was the chief architect of the RenderMan Interface, a protocol that allows modeling programs to describe scenes to high-quality rendering programs. Pat has received three Academy Awards for Science and Technology, the Spirit of America Creativity Award, the SIGGRAPH Computer Graphics Achievement Award, the SIGGRAPH Stephen A. Coons Award, and the IEEE Visualization Career Award. His current research covers visualization, image synthesis, and graphics systems and architectures, so come ready to learn from this three-time Academy Award recipient about the power of visualised data.

     

    Elle Todd is Head of the Asia media and technology practice at international law firm Olswang and lives in Singapore. Educated at Cambridge University, she worked in London and Brussels before moving to Asia. Elle advises suppliers and customers, from large corporations and household names to tech startups and entrepreneurs, with particular expertise in digital media, data and the cloud. With the upcoming personal data protection act, she will be speaking on “Data Law in Asia Today: myths, realities and what you need to know”, covering the latest trends in data legislation and common myths about the position in Asia on the new Singapore Act.

     

    WORKSHOP SCHEDULE

    6:00PM: Registration & Refreshments
    7:00PM: Value of Visualisation, Pat Hanrahan, Chief Scientist at Tableau
    7:30PM: Q&A Session with Pat Hanrahan
    7:45PM: Data Innovation Challenge Updates
    8:00PM: Data Law in Asia Today, Elle Todd, Olswang
    8:30PM: Q&A Session with Olswang
    8:45PM: Networking

     

    EVENT DETAILS

    Date: 27 Feb, Thursday
    Time: 6:00 to 9:00PM
    Venue: Olswang Asia LLP Office, #05-01, 10 Collyer Quay, Ocean Financial Centre

     

    PLEASE NOTE!

    Due to limited space, please register your interest to attend by providing us with your details. We will inform you by Tuesday, 25 February whether your request is successful. In the event that there is too great a demand, don’t worry – we will be livestreaming the event online, so that you can still join us! Also, your name will automatically enter a draw where you stand a chance to win a Nexus 7 tablet!

     

    Thanks for your understanding and we look forward to seeing you!
    REGISTER YOUR INTEREST HERE

     

    Subscribe to our newsletter to be updated on our upcoming workshops and events. Stay tuned!

    MICE coLAB Data Innovation Challenge, 21 – 23 Feb

     

    We’re giving away a $500 cash prize for Data Innovation!

     

    The Singapore Tourism Board (STB) is bringing you MICE coLAB 2014, a 2-day hackathon in partnership with UP Singapore from 21-23 February. Many of you might not be familiar with the MICE (Meetings, Incentives, Conventions & Exhibitions) industry, but it is big business. In Singapore alone, Business Travel and MICE visitors numbered 3.35 million in 2012 and spent an estimated $5.73 billion!

     

    MICE coLAB 2014 is a hackathon to encourage collaboration between the MICE industry, the technology and data science community, and anyone with a great idea. The aim is to develop solutions that use technology to track and collect visitors’ preferences, movements and transactions in real time. Can you use historical hotel occupancy data to accurately predict hotel occupancy rates for upcoming events like the World Cities Summit 2014 or the annual Singapore Grand Prix? Or how do we measure the impact of currency exchange rates on visitor arrivals? Data are provided, so just show up!

     

    Participants will have access to complimentary data from DEXTRA, UP Singapore and various public agencies to create their prototypes.

     

    Visitor Arrival Data

    There will be visitor arrival data by residence length, gender and age from 2005 to 2013, as well as a separate set of tourist arrival data by land, sea or air from 2003 to 2013 (excluding Malaysian land arrivals, to avoid skewing the numbers). This dataset contains over 20,000 data points across 56 countries!

     

    Points of Interest Data

    SLA (Singapore Land Authority) and LTA (Land Transport Authority) are providing island-wide points of interest data, including address points, bus stop locations and SMRT routes.

     

    Exchange Rate Data

    Historical exchange rates of Singapore Dollar (SGD) against various currencies from 2009 to 2013 will also be provided.

     

    Other interesting Data

    In addition, DEXTRA & UP Singapore have partnered to trawl the web for interesting datasets and process them into a machine-readable format for you, including:

    • Hotel occupancy rates from 2003 to 2013
    • Geotagged, island-wide event spaces available for public booking
     

    MICE coLAB Data Innovation Challenge
    The Challenge: To create solutions that maintain Singapore’s position as the top MICE destination through data analytics! (Find out more)

     

    The Data Innovation Challenge, enabled by DEXTRA, will be sponsoring a $500 cash prize. This award will go to teams who demonstrate compelling analytics and insights from the data, and leverage them to build their prototypes.

     

    This is the third time we are awarding the Data Innovation Challenge Prize, following successful ideas at the E3 Hackathon and the Clean & Green Hackathon 2! Find out who won our previous Data Innovation Challenge Prizes, and their winning ideas, HERE.

     

    EVENT DETAILS
    MICE coLAB
    Dates: Fri, 21 Feb – Sun, 23 Feb 2014
    Venue: Suntec Singapore Convention & Exhibition Centre, 1 Raffles Boulevard

     

    EXCLUSIVE FOR DEXTRA MEMBERS
    Use the discount code ilovedextra to have the $15 deposit waived!

     

    [CHALLENGE] Silverline Predictive Behaviour Challenge FAQs

    GENERAL QUESTIONS

     

    1.    You could possibly explore sound and the effect it has on the elderly. Does getting a call from a loved one help the seniors? Does it give them a better mood?

     

    Currently we are looking into sound only in terms of distress sounds (e.g. screams, falls), not how mood is affected by voice calls from family members. It is something we will look into, but at this point, sound data will not be available for the challenge.

     

    2.    How would you ensure that the number of glasses of water taken is the actual amount and not a false reading?

     

    Our aim is to automate the collection of this data through the use of sensors and tracking in the home. At the moment, we rely on user input data to gauge how much water the senior is drinking, but we intend to automate this process in future. We are focusing on positive behavioral change: the reminder to drink water, together with logging your water intake daily, will hopefully help ensure that enough water is drunk each day.

     

    3.    What kind of outcome are you looking for with the Apps and the Sensors you are working with?

     

    As we collect more data and more seniors use the sensors in their homes, we want to be able to predict incidents before they happen, as a preemptive measure. Using data analysis and your potential prediction models, we want to alert companion app users that there might be a problem, or that something may happen soon, so that they can take the necessary actions. We are looking to build a personal algorithm for every senior to effectively monitor them and predict incidents. These algorithms will pick up on each user's trends and hopefully flag any large variation from those usual trends as a potential focus point.
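
    One simple way to prototype the "personal algorithm" idea is a per-user baseline: compare each day's reading with that senior's own history and flag days that stray far from it. This is a minimal sketch of our own, not Silverline's method; the function name `flag_unusual_days`, the daily (day, value) input format and the 2-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_unusual_days(history, threshold=2.0, min_history=3):
    """Flag days whose value deviates strongly from one user's own baseline.

    history: chronologically ordered list of (day, value) pairs for ONE user,
    e.g. daily glasses of water logged (a hypothetical input format). Each day
    is compared with the mean and sample standard deviation of all earlier
    days; at least `min_history` earlier days are needed before flagging.
    """
    flagged = []
    # stdev needs at least two values, so never start before index 2.
    for i in range(max(min_history, 2), len(history)):
        past = [value for _, value in history[:i]]
        mu, sigma = mean(past), stdev(past)
        day, value = history[i]
        if sigma == 0:
            # Perfectly steady history: any change at all counts as unusual.
            if value != mu:
                flagged.append(day)
        elif abs(value - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Hypothetical week of water-intake logs for one senior.
week = [("Mon", 8), ("Tue", 7), ("Wed", 8), ("Thu", 7), ("Fri", 2)]
print(flag_unusual_days(week))  # ['Fri']
```

    Because the baseline is computed per senior, 2 glasses on a given day is flagged for someone who usually drinks 8, but not for someone who always drinks 2. A real model would also need to handle missing days and changing routines.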

     

    4.    How would you get seniors to use smartphones in the first place?

     

    The care pack is built to be as intuitive and friendly as possible to entice first-time smartphone users to try the app. The incentive to use the smartphone comes from being able to interact directly with their sons, daughters and grandchildren through direct messages and picture exchange between the companion app and the care pack. It is still definitely an issue, but we are confident we have the solution and package to get seniors using smartphones more and more. An important aspect of Silverline as a whole is to ensure that old smartphones continue to be used, even after you upgrade to a new phone. These devices are powerful and can still serve an important purpose: helping seniors join the smartphone revolution by making previously complicated phones easier to use and more accessible.

     

    5.    Are we limited to only the datasets given?

     

    Not at all. We provide all the data we have to give you a good starting point, but we encourage you to go as broad as you need to show us that your concept and idea work. You can pull data and results from your own sources to back up your model and show that it would work for our eco-system. We want to keep this challenge as open as possible, and we are more than happy to receive suggestions on what sort of data would make your models work better.

    [CHALLENGE] Silverline Mobile Predictive Behaviour Challenge

    Register for the Silverline Challenge

    Launching: Wednesday, 29 January

     

    Silverline Mobile, a Singapore-based company, is dedicated to creating the world’s first senior mobile eco-system to help seniors live more wholesome and healthy lives. Combining Silverline Mobile apps on powerful smartphones, low-energy sensor systems and data analytics, they are working to create an Internet-connected system that lets Silverline Mobile support effective behavioral change in seniors, helping them achieve a healthier and happier lifestyle.

     

    Silverline provides a suite of apps that are made specifically to meet the needs of seniors and help them lead more connected, productive and healthier lives. You can now be part of the eco-system by taking part in the Silverline Predictive Behaviour Challenge.

     

    (1) Challenge Statement:

    1. One of the aims of Silverline is to implement effective behavioural change in its users for them to achieve a healthier and happier lifestyle.
    2. Highlight an observation, and propose an algorithm or model that helps Silverline achieve this, using the applications and datasets shared with you.
     

    Find out more about the challenge announced at our New Year Party on 09 January:

    Silverline Challenge Announcement from Newton Circus on Vimeo.

     

    (2) Prize: SGD$5,000

     

    (3) Timeframe: 4 weeks (29 January – 26 February)

     

    (4) Data: Data is captured from multiple sources, including App User Input, Summary Statistics of App performance and various sensor data. We are in the midst of capturing more data, so do look out for richer and more comprehensive data in the weeks ahead. Currently we are releasing:

     

    1.    The User Input data (in JSON format) includes

     

    (a)    User data with information like Date of Birth, Gender, Unique Object ID, etc.
    (b)    Water Consumption data with information like timestamp and linked to Unique User Objects
    (c)    Exercise data with information like timestamp, binary value of ‘Exercise: Yes’ or ‘Exercise: No’ and linked to Unique User Objects
    (d)    Mood data with information like timestamp, binary value of ‘Happy’ or ‘Unhappy’ and linked to Unique User Objects
    (e)    Medication Reminder data with information like timestamp of reminder notification and linked to Unique User Objects

     

    2.    The Summary Statistics of App performance (in CSV format) is an aggregated dataset showing user engagement over a 4-month timespan, capturing the number of Screen Views, Average Time spent on screen, etc.

     

    3.    The Sensors data (in CSV format & timestamped) includes

     

    (a)    Motion sensor data indicating whether a senior has passed through the area of the house where the sensor is placed, for instance the kitchen
    (b)    Power consumption monitor data measuring the power usage of certain household appliances
    (c)    Accelerometer data measuring the opening and closing of a door
    (d)    Bed pressure data indicating the time a senior gets out of bed

     

    4.    Datasets still to be released include more accelerometer data, as well as data from Bluetooth tags placed at more locations around the house, measuring pill box usage and, with tags on cups, water consumption.
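
    The user input records are timestamped and linked to unique user objects, so a common first step is to roll them up into per-user daily counts (for example, glasses of water logged per day). This is a minimal sketch under loud assumptions: the key names `userId` and `timestamp`, the ISO 8601 timestamp format and the flat JSON-array layout are hypothetical placeholders, since the actual schema of the released files is not described here.

```python
import json
from collections import Counter
from datetime import datetime


def daily_counts(json_text, user_field="userId", time_field="timestamp"):
    """Count log entries per (user, calendar day) from a JSON array of records.

    user_field and time_field are hypothetical key names; replace them with
    the keys used in the released files. Timestamps are assumed to be ISO 8601
    strings such as "2014-01-20T08:15:00" (a trailing "Z" is only accepted by
    datetime.fromisoformat from Python 3.11 onward).
    """
    counts = Counter()
    for record in json.loads(json_text):
        day = datetime.fromisoformat(record[time_field]).date()
        counts[(record[user_field], day)] += 1
    return counts


# Hypothetical water-consumption log entries.
sample = """[
  {"userId": "u1", "timestamp": "2014-01-20T08:15:00"},
  {"userId": "u1", "timestamp": "2014-01-20T12:30:00"},
  {"userId": "u2", "timestamp": "2014-01-21T09:00:00"}
]"""
for (user, day), n in sorted(daily_counts(sample).items()):
    print(user, day, n)
```

    Keying on (user, day) keeps every senior's data separate, which suits the per-user modelling described in the FAQ.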

     

    (5) Your Task:

     
    • Submit a 1-2 page write-up of interesting insights you have gained from the data, and propose an algorithm or model that helps Silverline implement positive behavioral change in its users.
    • Your write-up has to include a working prototype model or algorithm that can be tested and implemented in Silverline.
     

    (6) Submission requirements:

    Short proposal (1-2 pages) on the insights from the data, analysis methodology, prototype data models, suggestions on app development as well as external research data applicable to Silverline

     

    Note: You can submit it in HTML or PDF format. If you have built working prototypes, place them together in a zip file before you submit.

     

    (7) Selection for interview by the Silverline judging panel

     

    1.    Top 5 most compelling proposals will be invited for a presentation before a judging panel on 14 March.

     

    2.    Top 5 entries will be selected based on:

     

    (a)    Displayed one key in-depth analysis of the data, presenting previously unseen insights
    (b)    Proposed an innovative way of modelling the data, which can be tested with validation data and metrics
    (c)    Presented any other relevant models backed up with research & external datasets
    (d)    Presented a list of suggestions on how to set up and gather the data environment for the models to work and be tested
    (e)    Proposed how to improve the app by incorporating the proposed model, in order to achieve the Challenge objective

     

    Register for the Silverline Challenge | For more information on the Challenge (PDF)

    [WORKSHOP] DEXTRA New Year Party

     

    The DEXTRA New Year Party, held on Thursday, 09 January at The Co, had an amazing turnout of over 100 guests, and we want to say a big thank you for taking the time to join us for our first Data Innovation Challenge event of 2014! The crowd was a great mix of data professionals and enthusiasts, business leaders and academics from across different industries: healthcare, financial services, advertising and media, to name a few! In case you weren’t able to make it that evening, here is a quick run-through of what took place:

     

    (1) Data Innovation Challenge Milestones

     

    By Daryl Arnold, CEO, Newton Circus
    2013 has been an amazing year: We had our launch party in June, organised 2 workshops, and rolled out our first challenge by DSM. Look out for more challenges and activities to come!

     

    Data Innovation Challenge Milestones – Daryl Arnold from Newton Circus on Vimeo.

     

    (2) DSM’s Success Story as a Challenge Host

     

    By Clara Lee, Business Information Global, DSM
    Clara shared her experience and the memorable moments of running a Data Innovation Challenge from the challenge host’s perspective.

     

    DSM Challenge Business Value – Clara Lee from Newton Circus on Vimeo.

     

    (3) Silverline’s Predictive Behaviour Challenge

     

    By Mats Lundgren, CEO & Naing Maw, Project Manager, Silverline Mobile
    Silverline creates the first Mobile Senior Ecosystem with the Smartphone as the Hub, and aims to define the future of mobile media, community, security and health for Seniors.

     
     

    Challenge launching on Wednesday, 29 January: Be part of this amazing revolution of bringing Smartphones to the Seniors by participating in this challenge where you can use data innovation to come up with creative ideas to help shape the development of Silverline!

     

    Find out more about the Silverline Predictive Behaviour Challenge.

     

    (4) Workshop 1: Structural Equation Modeling

     

    By Dr. Chan Siew Pang
    Dr. Chan is a practicing medical statistician with formal education in statistics, decision sciences and industrial engineering. He founded the world’s only state-approved Stata User Group in Singapore in 2008, while serving as the programme head of business analytics at SIM University. Keen to apply structural equation models in urban and epidemiological studies, he has published over 90 articles in peer-reviewed academic journals.

     

     

    Structural Equation Modeling – Dr. Chan Siew Pang from Newton Circus on Vimeo. 

     

    (5) Workshop 2: Data Visualisation

     

    By Yin Shanyang (“Yang”), Swarm Studio
    Yang is the man behind cool projects like Observing Taxi Behaviour, Observing Train Disruptions, and even the Singapore Elections Tracker for GE2011.

     

     

    A Crash Course in Data Visualization – Yin Shanyang from Newton Circus on Vimeo.

     

    We will be having more Data Innovation Challenge workshops and events, so do sign up for our newsletter to keep yourself updated on what’s happening at DEXTRA! Also, you can catch up on what was shared at our previous workshops here.

    Join us at our New Year Party!

     

    2013 has been an amazing year, and thank YOU so much for being part of the DEXTRA journey with us. We had our launch party in June, organised 2 workshops, and rolled out our first challenge by DSM. Come celebrate with us as we usher in the new year!

     

    Here is what we have in store for you:

     

    (1) DSM Sales Forecasting Challenge Round-up:

     

    Our winning teams from the DSM Challenge will be giving you lightning presentations on their winning models and ideas. Ms Clara Lee, Director of Business Information Global for DSM, will also be sharing on the experience and memorable moments of running a Data Innovation Challenge from the challenge host’s perspective.

     

    (2) Silverline Challenge Announcement:

     

    Our next Data Innovation Challenge host, Silverline, will also be announced at the workshop. Silverline creates the first Mobile Senior Ecosystem with the Smartphone as the Hub, and aims to define the future of mobile media, community, security and health for Seniors. Be part of this amazing revolution of bringing Smartphones to the Seniors by participating in this challenge where you can use data innovation to come up with creative ideas to help shape the development of Silverline!

     

    (3) DEXTRA Workshop:

     

    Talk #1: Structural Equation Modeling by Dr Chan Siew Pang, a practicing medical statistician with formal education in statistics, decision sciences and industrial engineering. He founded the world’s only state-approved Stata User Group in Singapore in 2008, while serving as the programme head of business analytics at SIM University. Keen to apply structural equation models in urban and epidemiological studies, he has published over 90 articles in peer-reviewed academic journals.

     

    Talk #2: Data Visualization by Yin Shanyang (“Yang”), who tinkers with technology through the entity Swarm. Yang is the man behind cool projects like Observing Taxi Behaviour, Observing Train Disruptions, and even the Singapore Elections Tracker for GE2011.

     

    So come and have a few beers with us, chat with our challenge host and exchange ideas with other data specialists and enthusiasts in the DEXTRA community. See you there!

     

    EVENT DETAILS
    Date: Thursday, 09 Jan
    Time: 6:00-9:00PM
    Venue: The Co, 75 High Street, (S)179435
    REGISTER HERE

    DSM Phase 1 Challenge – Finale!

    The DSM Predictive Sales Forecasting Challenge came to an exciting conclusion on the evening of 13 November 2013. Over the course of this 4-week online Challenge, we invited the DEXTRA community of data specialists to accurately project DSM’s 2013 APAC sales volume in the Electronics segment based on the past 2-3 years of sales volume. Hundreds of submissions were received, and the top-ranked participants demonstrated significant improvements over existing forecasts!

     

    The finale took the form of presentations to DSM’s business leaders by 6 DEXTRA teams, shortlisted after our online challenge closed on 5 November 2013. We are also proud to share that DEXTRA has gone regional! A Vietnamese team led by Sebastian Nguyen connected with us and made their presentation via a live teleconference.

     

    The winning teams came up with models that not only accurately predicted sales figures for 2013, but also used a compelling and innovative methodology that can give a good indication for 2014 and beyond. At the end of the evening, 3 winning teams were selected. They are:

     

    First Place – Team led by David Low (Singapore)

     

    David and his team member, Richard, are both researchers with the Living Analytics Research Centre at SMU. Together they have expertise in the areas of data mining, machine learning, and network mining.

     

    The team explored the data and methods in depth, first identifying important predictors before deploying an ensemble model that included random forests and extra trees (extremely randomized trees). This team also did something interesting: they used the most recent 3 months’ data as a gauge to strengthen their predictions, since validation in our system is based on this year’s data. The team also noted that further investigation could be carried out to verify whether these features make sense.

     

    Second Place – Team led by Dr. Rubing Duan

     

    Dr Rubing Duan and his team members are all data scientists in A*STAR’s Institute of High Performance Computing. Together they have expertise in big data analytics, performance analysis and prediction as well as resource scheduling for parallel and cloud computing.

     

    Rubing’s team started by analysing each country’s sales trends, followed by a breakdown of trends by material. Their models used linear regression, radial basis function (RBF) neural networks and support vector machines, and also evaluated which variables to use for prediction. The team observed and highlighted trends in DSM’s product lines and materials over the past two years.

     

    Third Place – Azhagarasan Annadorai (Singapore)

     

    Azhag is from Exilant Technologies, which specialises in Business Intelligence & Mobility. He has more than 17 years of experience in the IT sector, including Project & Program Management, and his expertise lies in data warehousing, business intelligence & analytics.

     

    Azhagarasan’s predictive model is built on customer profiles, with its accuracy determined by the number of products a particular customer orders. He took into consideration the scalability of the model as well as the technology he chose to use. He also considered the role of the salesperson, and factors to consider from salespeople’s perspectives.

     

    -

     

    We are now working closely with DSM and the winning data scientists to implement the model back into the business. In the meantime, do look out for more Data Innovation Challenges coming your way. Also, expect to see more exciting challenges hosted by DSM!

     

    P.S. Data Science Superstar (and great friend of DEXTRA) Xavier Conort was also present at the finale as a technical advisor to our judges. He gained experience in Machine Learning by running Gear Analytics, a Singapore-based consultancy, and by competing on Kaggle (the leading platform for predictive analytics competitions), where he was the #1 ranked data scientist overall for almost a year in 2012-2013. No wonder everyone was asking for a photograph with him!

    Clean & Green Data Innovation Challenge, 8 – 10 November

     

    The National Environment Agency is bringing you the Clean & Green Hackathon in partnership with UP Singapore from 08-10 November. How can we use dengue-related data to encourage lifestyle changes that counter the increasing threat of dengue in Singapore? Do you have fresh ideas for reducing air pollution produced by citizens? The Clean and Green Hackathon will also showcase the brand new all-electric Tesla Roadster and the eco-friendly YikeBike, plus the chance to build a giant mosquito from recyclable materials! And of course, EXCITING NEW DATA.

     

    Participants will be treated to brand new datasets from NEA in addition to updated data released for the inaugural Clean & Green Hackathon. You will also have access to complimentary data from other agencies to create your prototypes.

     

    (1) AIR QUALITY

     

    The National Environment Agency is releasing daily data on 24-hour PSI levels and 24-hour sulphur dioxide, nitrogen dioxide and carbon monoxide levels across 5 regions in Singapore, spanning January to September 2013.

     

    (2) ENVIRONMENTAL HEALTH DATA

     

    Participants will also get access to environmental health data including Food Hygiene feedback data.

     

    Other datasets in this category have been released for the first time, and include:

     

    - Cleanliness feedback
    - Dengue cluster by block
    - Anonymised Dengue Offences
    - Findings from sociological study on littering

     

    (3) ENERGY DATA

     

    In addition, the Energy Market Authority will be releasing anonymised gas and electricity consumption data covering over 1.5 million anonymised accounts. Participants will have access to more than 24 million electricity records and 8 million gas records, spanning 36 months, as well as over 12 million electricity records at half-hourly resolution over a period of 8 months.

     

    -

     

    Clean & Green Data Innovation Challenge

     

    The Data Innovation Challenge, enabled by DEXTRA, will be sponsoring up to 2 awards of $500 in cash each. This award will go to teams who demonstrate compelling analytics and insights from the data, and leverage that to build their prototypes.

     

    This is the second time we are awarding the Data Innovation Challenge Prize, following successful ideas at the E3 Hackathon! Find out who won the EMA/Singapore Power Data Innovation Challenge Prize, and their winning ideas, HERE.

     

    The Clean & Green Challenge: Create solutions to better the environment under the four themes of Air Quality, Dengue Prevention, Public Cleanliness & Recycling and stand to win great prizes! (Find out more)

     

    You will also have the chance to win cash prizes, vouchers and seed funding up to $5000 if you create a winning prototype under the challenges proposed by OneMap, StarHub and IDA.

     

    Come join us at the Clean & Green Hackathon NEXT weekend, 8th – 10th November, to generate prototype applications to protect and conserve Singapore’s environmental resources. See you soon!

     

    EVENT DETAILS
    Clean & Green Hackathon
    Dates: Fri, 08 Nov – Sun, 10 Nov 2013
    Venue: NUS University Town, 8 College Ave West, Singapore 138608

     

    EXCLUSIVE FOR DEXTRA MEMBERS
    Sign up here to have the $15 deposit waived!

     

    Data Innovation Challenge Prize Winners @ E3 Hackathon

     

    In the recently concluded E3 (Energy Efficiency for Everyone) Hackathon, organized by UP Singapore and co-sponsored by the Energy Market Authority and Singapore Power, 2 teams emerged as clear winners of the DATA INNOVATION CHALLENGE PRIZE. This is a prize awarded to teams who demonstrate compelling analytics and insights from the data, and leverage that to build their prototypes.

     

    Here is a round-up of the EMA/Singapore Power Data Innovation Challenge Prize winners and their cool ideas:

     

    (1) Team Power Heroes

     
     

    They developed an energy-saving app that uses real-time visualization and gamification to give users a real understanding of their electricity usage, and the motivation to change their usage behavior.

     

    Not only does their app boast interactive maps of your home (with draggable electrical appliances to boot), they also came up with the Power Heroes smart socket to measure the energy usage of any appliance in real time. The social aspect of behavioural change is also incorporated: by sharing data, you get to see how you stack up against your friends and the rest of Singapore in terms of energy usage. The happy outcome: you know how much money you are saving by using energy more wisely, and you have some fun in the process.

     

    Pitch by Team Power Heroes (VIDEO)

     

    (2) Team Feedback

     
     

    They developed an app that visualizes energy bills and consumption, showing how your energy consumption for the month stacks up against previous months, and raising awareness of how you are doing against the rest of Singapore, by area and type of home.

     

    By using clever data visualisation, you are able to dive deeper into the way you use energy and gather insights from the trends and patterns in your energy consumption. Constant feedback is also key here, where mid-term check-points are provided to give you an idea of how you are faring, so that you are still able to tweak the way you use energy before you get a rude shock at the end of the month when your utilities bill reaches you.

     

    Pitch by Team Feedback (VIDEO)

     

    Look out for the Data Innovation Challenge at the CLEAN & GREEN HACKATHON, taking place 8 to 10 November at NUS UTOWN! We look forward to your participation and to seeing you there!

    [CHALLENGE] DSM Sales Forecasting Challenge FAQs

     

    Here are some questions that came up frequently.

     

    GENERAL QUESTIONS

     

    1. When will the challenge close?

     

    A: The challenge will officially close at 11:59PM on 06 November 2013 (Wednesday). Submissions made after this date and time will not be accepted.

     

    2. Why can’t I verify my account? I followed the tutorial and my account is still showing ‘unverified’.

     

    A: Follow the steps in this tutorial. For some accounts we require you to save each piece of information separately. For instance, after entering your Passport/IC number, click save changes. Then go back to enter another piece of information like your phone number and verify it.

     

    3. Can I form my own team to take part in the challenge?

     

    A: Yes, you can. Team formation is allowed. We are in the midst of building online team formation; in the meantime, please have your team leader email us at mail@dextra.sg with the following: 1. Team name 2. Team members’ names 3. Team members’ emails. Please note that once the team is formed, only submissions from the team leader will be acknowledged. All other submissions from individual team members will be ignored and rejected.

     

    4. May I use my own external datasets to build my predictive model?

     

    A: Yes, you may.

     

    5. Are there any restrictions on the programming language and tools I use to build my predictive model?

     

    A: There are no restrictions on the programming language or the tool(s) you choose to use when building your predictive model. However, we do require you to submit a short write-up with your ‘submissions.csv’ file, detailing the methods and resources you used to build your model.

     

    DATA-SPECIFIC QUESTIONS

     

    1. Across the various datasets, are the values in the ‘Material#’ column decimal numbers, or just material codes separated by a period?

     
     

    A: Yes, these are decimal numbers; the period is a decimal point, not just a separator. We used various formulas to hash the data, which resulted in these decimal values.

     

    2. In the actual sales dataset ‘DSM_Data Set 1 Sales.csv’, there are quite a number of conflicting rows with different sales figures but the same values in all other columns. Did some errors occur during dataset preparation?

     
     

    A: We omitted an even more granular classification of materials, which is why some rows appear to be exact duplicates with different Sales# values. For this challenge, feel free to combine these repeated entries into one by adding their sales together. For instance, you can treat the two highlighted rows in your image as one single row with a Sales# of 870.
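
    Combining the repeated rows amounts to grouping on every column except Sales# and summing. The sketch below assumes the file has been read into a list of dicts (for example with Python's `csv.DictReader`); the function name `collapse_duplicates` and the sample rows are hypothetical. Negative values (returns) are summed as-is here; whether to zero them before or after combining is a modelling choice this answer does not settle.

```python
def collapse_duplicates(rows, value_col="Sales#"):
    """Sum value_col over rows whose other columns are all identical.

    rows: list of dicts, e.g. from csv.DictReader on the sales file.
    Returns one dict per distinct combination of the other columns, in order
    of first appearance. Other values stay as strings, so hashed Material#
    codes are not altered by float rounding.
    """
    totals = {}
    for row in rows:
        # Group on every column except the value column.
        key = tuple(sorted((col, val) for col, val in row.items() if col != value_col))
        totals[key] = totals.get(key, 0.0) + float(row[value_col])
    merged = []
    for key, total in totals.items():
        record = dict(key)
        record[value_col] = total
        merged.append(record)
    return merged


# Hypothetical rows: the first two differ only in Sales#.
rows = [
    {"Cust#": "C1", "Material#": "1.5", "Sales#": "100"},
    {"Cust#": "C1", "Material#": "1.5", "Sales#": "50"},
    {"Cust#": "C2", "Material#": "1.5", "Sales#": "20"},
]
print(collapse_duplicates(rows))
```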

     

    3. In the actual sales dataset ‘DSM_Data Set 1 Sales.csv’, why are there negative sales quantities in the ‘Sales#’ column?

     
     

    A: Negative sales numbers denote return orders by the customers.

     

    ***NOTE*** Please treat all negative sales numbers in the sales data as zero. Negative values in your ‘submissions.csv’ file will adversely affect your score and rank.
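
    Treating negatives as zero is a one-line clip, but counting how many rows it touches is a useful sanity check. A minimal sketch, assuming the rows are dicts with a ‘Sales#’ key; the function name `clip_negative_sales` is hypothetical.

```python
def clip_negative_sales(rows, value_col="Sales#"):
    """Return copies of rows with negative values in value_col set to zero.

    Negative Sales# values are customer returns; the FAQ asks for them to be
    treated as zero. Also returns how many rows were clipped.
    """
    clipped = 0
    cleaned = []
    for row in rows:
        new_row = dict(row)
        value = float(row[value_col])
        if value < 0:
            value = 0.0
            clipped += 1
        new_row[value_col] = value
        cleaned.append(new_row)
    return cleaned, clipped


cleaned, n_clipped = clip_negative_sales([{"Sales#": "-30"}, {"Sales#": "12"}])
print(cleaned, n_clipped)  # [{'Sales#': 0.0}, {'Sales#': 12.0}] 1
```

    The same clip is worth applying to your predictions before writing ‘submissions.csv’. With RMSLE scoring and a non-negative actual value, a negative prediction always lands further from the actual on the log(x + 1) scale than a prediction of zero, and a prediction of -1 or below makes the logarithm undefined.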

     

     

    We are still adding more questions to this list so feel free to email us your questions at mail@dextra.sg. Thanks!

    [CHALLENGE] DSM Sales Forecasting Challenge – CLOSED

     

    Note: The DSM Challenge ended on 5 November 2013.

     

    Register for the DSM Challenge

     

    DSM, a Fortune 500 MNC, recently announced its Sales Forecasting Challenge for Engineering Plastics, a focused product portfolio in which it has achieved global leadership. DSM is the global number 3 in the overall market for semi-crystalline engineering plastics and the global market leader in high-temperature polyamides. (More information on DSM)

     

    The DSM Challenge is NOW LIVE, and will be split into 2 key phases:

     
    1. Predictive Sales Forecasting (Phase 1):
      Accurate projection of 2013 DSM’s APAC sales volume in Electronics segment based on past 2-3 years sales volume. The winning model should take into consideration 2011 & 2012 results, and should give good indication for 2013 and beyond (2014 onwards). This competition requires participants to develop models accurately predicting actual monthly sales for 2013 and beyond (2014 onwards), based on 4 given data sets. Accuracy of the forecasts will be one of evaluation criteria, tested using Root Mean Square Logarithmic Error (RMSLE). Phase 1 will allow participants to have deeper understanding and insights into DSM’s business in Electronics.
       
      Find out more about the DSM Challenge (VIDEO)
    2. Strategic Sales Forecasting (Phase 2):
      Based on the outcomes of Phase 1, DSM will host a Data Innovation Challenge Hackathon – a physical weekend event focused on developing growth strategies for the Electronics segment. Participate in the Predictive Sales Forecasting Challenge for the opportunity to be invited to this challenge!
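RMSLE compares the logarithms of predicted and actual values: RMSLE = sqrt((1/n) * Σ (log(p_i + 1) - log(a_i + 1))²), where p_i is a predicted value, a_i the corresponding actual value and n the number of values. The minimal Python sketch below computes it. How the leaderboard averaged over materials and months is not stated here, so this assumes a flat average over all submitted values.

```python
import math


def rmsle(predicted, actual):
    """Root Mean Square Logarithmic Error between two equal-length sequences.

    math.log1p(x) computes log(1 + x) and is undefined for x <= -1,
    which is one reason negative forecasts should not be submitted.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one value is required")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


print(rmsle([100, 200], [100, 200]))  # 0.0 for a perfect forecast
print(rmsle([10], [20]))              # same 10-unit miss on a small volume...
print(rmsle([1010], [1020]))          # ...scores far worse than on a large one
```

Because errors are measured on a log scale, RMSLE penalises relative rather than absolute misses, so low-volume materials matter as much as high-volume ones.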

    (1) Prize:  SGD$10,000

    (2) Timeframe: 4 weeks (8 October – 5 November)

    (3) Data: Data has been captured progressively each year and becomes richer in more recent years. Hence, sales forecast data is available only for 2012, and budget data only for 2011 to 2012.

     
    • The forecast data shows forecasted sales numbers for 2012, with material code and customer code details similar to those in the actual sales data. “Sales Area”, “Cust#”, “PdtLine#” and “Material#” are common keys that can be cross-referenced with all the other data sets.
    • The budget data contains budget figures for 2011 to 2012. “Market”, “Sales Area” and “PdtLine#” can be cross-referenced with all the other data sets.
    • The actual sales data provides sales records of products from 2011 to 2013 in 6 different geographical regions. It includes sales quantities of each product, encoded in “Material#”, and customers under “Cust#”. Participants will also be able to tell how the products are eventually used from “Application Code”, “Product Market Combination” and “Market”. “Market”, “Product Market Combination”, “Application Code”, “Sales Area”, “Material#” and “Cust#” are fields that can be cross-referenced with all the other data sets.
    • The CRM data is a supplementary data set captured by account managers on potential opportunities. It gives the anonymized project number, when a particular project was commercialized and the status of the project (Status 6 is the Commercialization Stage). The set also captures the projected volume of each material each year from the year the project is commercialized. “Market” and “PdtLine#” are fields that can be cross-referenced in all other data sets.
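Because these data sets share key fields, they can be lined up against each other, for example to see how the 2012 forecasts compared with 2012 actual sales. The minimal standard-library sketch below does an inner join on the fields listed for both the forecast and actual sales data (“Sales Area”, “Cust#”, “Material#”). The function names, the forecast quantity column name "Forecast Qty" and the sample rows are hypothetical.

```python
def total_by_key(rows, key_fields, value_field):
    """Sum value_field per key tuple, combining any repeated rows."""
    totals = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] = totals.get(key, 0.0) + float(row[value_field])
    return totals


def join_on_keys(left_rows, right_rows, key_fields, left_value, right_value):
    """Inner join: {key: (left total, right total)} for keys found in both sets."""
    left = total_by_key(left_rows, key_fields, left_value)
    right = total_by_key(right_rows, key_fields, right_value)
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Fields listed as cross-referenceable in both the forecast and actual sales data.
KEYS = ("Sales Area", "Cust#", "Material#")

# Hypothetical 2012 rows; "Forecast Qty" stands in for the forecast file's
# quantity column, whose real name is not given here.
forecast_2012 = [
    {"Sales Area": "SA1", "Cust#": "C1", "Material#": "M1", "Forecast Qty": "300"},
]
actual_2012 = [
    {"Sales Area": "SA1", "Cust#": "C1", "Material#": "M1", "Sales#": "280"},
    {"Sales Area": "SA1", "Cust#": "C9", "Material#": "M1", "Sales#": "50"},
]
joined = join_on_keys(forecast_2012, actual_2012, KEYS, "Forecast Qty", "Sales#")
print(joined)  # {('SA1', 'C1', 'M1'): (300.0, 280.0)}
```

Keys that appear in only one file (customer C9 above) are dropped by the inner join; an outer join would be needed to study customers that were forecast but never bought, or vice versa.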
     

    (4) Your Task: You will be given 4 different data sets: 3 years of Electronics-segment sales data from 2010 to 2012, broad-level sales budgets, a year of forecast data from 2012, and 3 years of CRM pipeline data from 2010 to 2013. Use them to determine the sales forecast by material for January 2013 to December 2013.
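A common first benchmark for a task like this is a seasonal naive forecast: predict each month of 2013 as the average of the same month in the past years. The sketch below is only such a baseline, not DSM's method or any entrant's model. It uses sales history alone and ignores the budget, forecast and CRM data as well as any trend; the function name and data layout are assumptions.

```python
from statistics import mean


def seasonal_naive_forecast(history):
    """Forecast each month of the next year as the mean of that month in past years.

    history: {material: {year: [12 monthly quantities, Jan..Dec]}}
    Returns {material: [12 monthly forecasts]}. Negative quantities
    (returns) are treated as zero, as the challenge FAQ instructs.
    """
    forecasts = {}
    for material, by_year in history.items():
        for year, months in by_year.items():
            if len(months) != 12:
                raise ValueError(f"{material} {year}: expected 12 monthly values")
        forecasts[material] = [
            mean([max(0.0, float(months[m])) for months in by_year.values()])
            for m in range(12)
        ]
    return forecasts


# Hypothetical 2011 and 2012 monthly quantities for one material.
history = {
    "M1": {
        2011: [10, 20, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
        2012: [30, 40, -10, 50, 50, 50, 50, 50, 50, 50, 50, 50],
    }
}
print(seasonal_naive_forecast(history)["M1"][:3])  # [20.0, 30.0, 15.0]
```

Any more elaborate model can then be judged by whether it beats this baseline's RMSLE on held-out months.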

     

    (5) Submission requirements:

     

    1)    Predicted actual sales figures by materials for Jan to Dec 2013
    2)    Short write-up/synopsis (1-2 pages) on your forecast methodology, findings and models

     

    Note: The accuracy of submissions will be evaluated in real time against the evaluation metric. Participants can resubmit their entries until the closing date.

     

    (6) Selection for interview by the DSM judging panel

     

    1)    The top 20% most accurate models will have their 1-2 page synopses reviewed after the competition closes on Tuesday, 5 November.

     

    2)    The best 5 entries will be selected based on:

    • Most accurate forecast (evaluated using RMSLE);
    • Most compelling forecast method;
    • Most innovative forecast process.
     

    3)    Judging criteria (b) and (c), the forecast method and the forecast process, will take the following into consideration:

    • What year-to-year changes the participants observe;
    • How they capture the elements of change that keep the model forecasting accurately from year to year, so that it works for 2014 and beyond.
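A natural starting point for the first question is to tabulate year-to-year changes per material (or per market, customer or sales area). The minimal sketch below computes percentage changes from annual totals, which could come from summing the clipped monthly sales; the function name and sample figures are hypothetical.

```python
def year_over_year_change(annual_totals):
    """Percentage change in total sales between consecutive years in the data.

    annual_totals: {year: total quantity}. Returns {year: % change vs the
    previous year present}; a year is skipped when the previous total is
    zero, since the percentage change is undefined there.
    """
    years = sorted(annual_totals)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        base = annual_totals[prev]
        if base == 0:
            continue
        changes[curr] = 100.0 * (annual_totals[curr] - base) / base
    return changes


# Hypothetical annual totals for one material.
print(year_over_year_change({2010: 0, 2011: 1000, 2012: 1200}))  # {2012: 20.0}
```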
     

    Register for the DSM Challenge | Find out more about the DSM Challenge (VIDEO)

    [WORKSHOP] Forecasting The Future: A Round-up

     

    A big thank you to those of you who made it to our 2nd Data Innovation Challenge workshop, Forecasting The Future, last Thursday, despite the F1 madness! For those of you who missed it, here’s a quick round-up of the excitement that took place on the evening of 19 September 2013.

     
     

    There was much laughter when Mike Anderson, UP Singapore’s champion hackathoner and a McKinsey alumnus, presented on the Art and Science of Sales Forecasting. Mike shared common pitfalls and anecdotes while walking through several sales forecasting techniques.

     
    Find out if Sales Forecasting is an Art or a Science (VIDEO)
     
     

    To the excitement of our participants, DSM Engineering Plastics also shared their Predictive Sales Forecasting Challenge for the Electronics segment, which will be hosted on the DEXTRA platform. The DSM Challenge will launch in early October, and DEXTRA participants will be competing not only for a cash prize of SGD$10,000 but also for the opportunity to take part in the Strategic Sales Forecasting Challenge hackathon – Phase 2 of this exciting DSM Challenge!

     
    More details on the DSM Sales Forecasting Challenge (VIDEO)
     

    To close the workshop, Elixir Technology presented its Ambience software, which combines Business Intelligence with cloud computing to scale with any business's needs. They showcased visualisation widgets from 3rd-party packages such as D3.js and other JavaScript/HTML components.

     
    Intermediate level: Ambience software workshop (VIDEO)
     

    Photos from the event are also up on Facebook, so do check them out and tag away when you spot yourself (and your friends)!

    Forecasting The Future Workshop on Thurs, 19 Sep

     

    DATA INNOVATION CHALLENGE WORKSHOP 2
    6:00-9:00PM, Thursday, 19 Sep
    Amazon Web Services Office
    Register for the event

     

    The 2nd Data Innovation Challenge workshop is taking place on Thursday, 19 September! Our first workshop in August was a great success, with a turn-out of over 60 data specialists learning from the best in our community. With a theme like Forecasting The Future, you can be sure that this second workshop will pack a punch.

     

    Opening the session is Mike Anderson: a McKinsey alumnus, he holds a Double First in Maths and Economics from Cambridge University and represented the UK at the International Olympiad in Informatics! Mike also has an MBA from INSEAD, complementing his significant experience in Healthcare and IT architecture. For the last 2 years, Mike has been developing significant IP in the neural network space, and today his company Nuroko specialises in machine learning software for real-time pattern recognition and predictive analytics. Mike is a big contributor to the Open Data and Innovation community in Singapore and publishes his open source work under the moniker mikera.

     

    After Mike, we serve up our main course with our Challenge Host, DSM. DSM is a Fortune 500 MNC and is announcing its Sales Forecasting Challenge for Engineering Plastics, a focused portfolio of products in which it has achieved global leadership. DSM is the global number 3 in the overall market for semi-crystalline engineering plastics and the global market leader in high-temperature polyamides. More information on DSM.

     

    To close, we have a workshop by Elixir Technology. This is a 60-minute Intermediate workshop covering their Ambience business intelligence suite, where you will learn about Visual Analytics & Data Mining in the cloud.

     

    -

     

    WORKSHOP SCHEDULE

     

    - 6:45PM: Mike Anderson, The ‘Why’ & ‘How’ of Forecasting: Strategy vs. Operations & Forecasting Techniques

     

    - 7:00PM: DSM, Challenge Announcement by business leaders from DSM

     

    - 7:30PM: Q & A session (with DSM & Mike Anderson)

     

    - 7:45PM: Elixir Technologies, Ambience BI software Hands-on Workshop (Intermediate Level)

     

    -

     

    E V E N T   D E T A I L S
    Data Innovation Challenge Workshop
    Forecasting The Future

     

    Date: Thursday, 19 Sep 2013
    Venue: Amazon Web Services Office, Level 10, 23 Church Street, Singapore 049481
    Admission: FREE

     
    Register for the event

    [CHALLENGE] RedMart Routing & Scheduling Optimisation Challenge

     

    As you can gather from our previous blog post, we have finally announced our first Challenge! We are very excited to have RedMart, Singapore’s leading e-grocer, on board as our first Challenge Host. Tim Klem, the VP of Transport Logistics at RedMart, announced the Challenge at our first capacity-building workshop on August 1st.

     

    The most interesting aspect of this challenge is that it centers on the broad theme of Urban Planning but also has a narrower, more specific focus on capacity optimization for RedMart. As an e-grocer, one of RedMart’s main concerns is managing its transport systems and logistics by optimizing routes and schedules. A pressing challenge they face is: with a limited number of trucks, how can RedMart best allocate its resources sustainably to ensure timely deliveries to its customers? RedMart believes that cleverly deploying its resources around its current limitations presents many opportunities for the business.

     

    This brings us back to the first DEXTRA Challenge – a hybrid that combines the elements of both customer persuadability and routing/scheduling optimisation. The Challenge proposes the creation of a model that generates the most efficient X changes to customer delivery times, given a data set consisting of customer orders and a fixed routing algorithm where:

     
    1. X is the number of changes on any given day, e.g. 5 changes allowed per day
    2. Efficiency is measured by routing algorithm’s calculation of driving minutes and driving distance for the whole fleet
    3. Delivery time changes must be within the same day, and all other order details (basket contents, volume) must remain unchanged
    4. Routing algorithm can be any technology chosen by RedMart (Viamente, Quantum Inventions)
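Since the routing algorithm is fixed, one simple way to search for the X most efficient changes is a greedy loop that calls the routing engine as a black-box cost function. The sketch below is an illustration under stated assumptions, not RedMart's or DEXTRA's method: `route_cost` stands in for whatever the chosen engine reports (for example total driving minutes for the fleet), `candidate_slots` are hypothetical same-day time slots, each order is assumed to be changed at most once, and `toy_route_cost` is invented purely so the example runs. Only the time slot changes; basket contents and volume are untouched. Greedy search is not guaranteed to find the best combination of X changes.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Schedule = Dict[str, int]  # order id -> delivery time slot on the same day


def greedy_time_changes(
    schedule: Schedule,
    candidate_slots: Sequence[int],
    route_cost: Callable[[Schedule], float],
    max_changes: int,
) -> Tuple[Schedule, List[Tuple[str, int, int]]]:
    """Greedily apply up to max_changes delivery-time changes.

    Each round tries moving every not-yet-changed order to every other
    slot, keeps the single move that lowers route_cost the most, and stops
    early when no move helps. Returns the new schedule and the moves made
    as (order, old_slot, new_slot).
    """
    current = dict(schedule)
    best_cost = route_cost(current)
    moves: List[Tuple[str, int, int]] = []
    changed = set()
    for _ in range(max_changes):
        best_move = None
        for order, slot in current.items():
            if order in changed:
                continue  # assumption: each order's time is changed at most once
            for new_slot in candidate_slots:
                if new_slot == slot:
                    continue
                trial = dict(current)
                trial[order] = new_slot
                cost = route_cost(trial)
                if cost < best_cost:
                    best_cost, best_move = cost, (order, slot, new_slot)
        if best_move is None:
            break  # no remaining change improves the routing cost
        order, _old_slot, new_slot = best_move
        current[order] = new_slot
        changed.add(order)
        moves.append(best_move)
    return current, moves


# Toy stand-in for the routing engine: each distinct slot costs one truck run.
def toy_route_cost(schedule: Schedule) -> float:
    return 10.0 * len(set(schedule.values()))


new_schedule, moves = greedy_time_changes(
    {"A": 1, "B": 2, "C": 1}, candidate_slots=[1, 2, 3],
    route_cost=toy_route_cost, max_changes=5,
)
print(moves)  # [('B', 2, 1)]
```

With a real engine, each `route_cost` call is a full routing run, so the number of calls per round (orders × slots) is the main cost of this approach.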
     

    For this challenge, slated to go live on the DEXTRA platform in mid-September, RedMart will be providing some very interesting and insightful data sets on delivery times, customer orders, etc., to which we will add our own data on transport, weather and urban planning.

     

    We will be letting out more information on this challenge so do check our blog for more updates!

     

    Catch the presentation of the RedMart Challenge announcement

    [WORKSHOP] Urban Planning: RedMart Challenge Announcement & Splunk Hands-on

     

    We are thrilled to share that the first in our series of DEXTRA Workshops, on Urban Planning, was a great success! We are looking to organise a thematic workshop for you every month, so keep your eyes peeled for our updates.

     

    The workshop took place on Thursday, 1 August, and was held at the Amazon Web Services office, one of DEXTRA’s key Industry Partners. Essentially, our workshops are meant to facilitate interactions between the members of our community – Data Specialists, Lead Users and Industry Partners. They also serve as a platform for our data specialists to keep abreast of the latest tools and technologies and to get familiar with upcoming competitions.

     

    Our inaugural DEXTRA workshop saw over 60 data specialists from across different industries coming together for an evening to witness the announcement of the first-ever DEXTRA Challenge by RedMart. There was also an insightful sharing session by Mr. Ng See Kiong, Project Director of the Urban Systems Initiative at A*STAR, and also hands-on training by the NYP-Splunk Operational Intelligence Lab on effective usage of the Splunk Enterprise software.

     

    The theme of this workshop was Urban Planning, and Mr. Ng See Kiong, the Program Director of the Urban Systems Initiative at A*STAR, kick-started the session by highlighting the key role of big data in transforming our cities. His talk, titled “Bright Light, Big City to Smart City”, discussed the growing need to establish key tools and technologies that enable solutions for today’s hyperconnected and growing cities. He shared how integrated city planning and urban transportation challenges can be tackled by developing predictive and real-time analysis based on historical data. He also emphasised the multi-agency, peer-to-peer approach involving collaboration between different public and private entities.

     

    You can watch the presentation by Dr. Ng See Kiong here.

     

    Keeping in line with the evening’s theme of Urban Planning, RedMart, our first Lead User, announced its Challenge, which will go live on the DEXTRA Platform this coming September. The RedMart Challenge centers on routing and scheduling optimisation and will be discussed in more detail in our next blog post.

     

    The second half of the workshop was a 90-minute hands-on training session on Splunk, a turnkey solution that makes machine data accessible and valuable to everyone. This extremely effective crash course in Splunk was led by Prof. James Goh from the Nanyang Polytechnic-Splunk Operational Intelligence Lab, where we also saw projects by NYP students that showcased innovative uses of Splunk’s software.

     

    All in all, we are extremely happy with our first workshop – we saw a highly informative and insightful session and received some great feedback! We are very excited about our next workshop and hope to see you there again – stay tuned for more details!

    Welcome to the Official DEXTRA Blog!

     

    Hello and welcome to the official DEXTRA blog!

     

    For those of you who are new to our community, or who stumbled upon this blog by chance, here’s a little bit about us: DEXTRA is a new initiative bringing together data specialists and enterprises to leverage tools and technologies for crowdsourcing data-driven solutions to business problems. We run challenges to help find the best and most innovative predictive models for businesses that are limited by the resources of their in-house data teams, or that want to minimize the expense of hiring external consultants.

     

    Despite having started only a couple of months ago, we are proud to say that we have grown the DEXTRA community of lead users and data scientists from just a handful to several hundred! We are extremely excited that both the public and private sectors, in Singapore and around the world, have started to recognise the value and countless possibilities of big data innovation, and we hope to contribute to this space in a big way.

     

    We have created this blog so as to share our journey with all of you, and enlist your help and expertise from time to time. Each week, we will update you on the progress of DEXTRA – our successes, roadblocks, questions and ideas. This blog will also serve as a channel for discussions and sharing of information between members of the DEXTRA Community. We are also looking to keep you up-to-date and engaged with what’s going on in the Data Science and Business Analytics space in Singapore and the rest of the world.

     

    So whether you are a company trying to unlock the value of your data reserves, or an aspiring data specialist, we are glad to have you on board. We welcome you to join us on the DEXTRA journey, and invite you to contribute to our story.

    CONTACT US

    Launchpad@one-north, #04-03/04 79 Ayer Rajah Crescent Singapore, 139955

    contact@dextra.sg