Craft a model that improves Human Resource retention.

A Data Innovation Challenge hosted by the Ministry of Defence, based on Human Resource data.


Data Analytics Challenge on Human Resource Retention

As Singapore celebrates SG50 this year, we are reminded of how far we have come in the last 50 years, especially in the area of national defence, which has been transformed into the formidable fighting force of today. The progress does not stop here, as MINDEF continues to look for new capabilities and technologies that enhance our national defence.

In this challenge, you are provided with a dataset containing Human Resource (HR) data. The goal is to analyse and manage employee retention by identifying the key factors contributing to people staying in and leaving the organisation. To do this effectively, we want a model that depicts how employees make turnover decisions, so that retention strategies can be better formulated.

Why You Should Join

This technical challenge is a lead-up to the actual Data Analytics Challenge conducted by the MINDEF Information Systems Division (MISD) in partnership with DSTA, to leverage data analytics towards the intended outcomes of evidence-based decision making, service innovation, and proactive risk management.


Participating in this technical challenge gives you a chance of being shortlisted to participate in the actual MINDEF Data Analytics Challenge. At least 10 of the best teams will get to solve challenging problem statements posed by the various lines of business within MINDEF/SAF. We invite you to co-create solutions to our defence needs with us!


Challenge Statement

Given the Human Resource (HR) data, can you analyse and manage employee retention by identifying the key factors contributing to people staying and leaving the organisation? Can your model depict how employees make turnover decisions so as to better formulate retention strategies?

Evaluation Metric

The evaluation metric for this challenge is Cross-Entropy Loss (Logarithmic Loss).
Note: The accuracy of your submission will be evaluated in real time against the evaluation datasets. Participants can resubmit their entries until the closing date.
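For reference, logarithmic loss can be sketched in a few lines of Python. This is a minimal stand-alone implementation for intuition, not the official scorer:

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy (logarithmic loss), averaged over samples.

    Predicted probabilities are clipped to (eps, 1 - eps) so that
    log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident, correct predictions give a low loss (~0.14 here)...
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))
# ...while confident, wrong predictions are penalised heavily (~2.30 here).
print(log_loss([1, 0], [0.1, 0.9]))
```

Note that the metric rewards well-calibrated probabilities, not just correct class labels, so submitting hard 0/1 predictions is risky.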


Period 21 Days
Start Monday – 03 August 2015
End Sunday – 23 August 2015


At least 10 teams will be shortlisted to participate in the actual MINDEF Data Analytics Hackathon, and at least S$1,000 is guaranteed for each team.


Related News

MINDEF Data Analytics Challenge – Hackathon Wrap-Up
3rd September 2015, DEXTRA Blog Post

Ministry of Defence Data Analytics Challenge Results
24th August 2015, DEXTRA Blog Post

Ministry of Defence Data Analytics Challenge
24th July 2015, DEXTRA Blog Post

About the Host



The Ministry of Defence (MINDEF) oversees the national defence needs of Singapore, to enhance Singapore’s peace and security.

MINDEF has implemented Data Analytics as a key capability to achieve organisational effectiveness through evidence-based decision making, service innovation and proactive risk management.


The MINDEF Data Analytics Challenge aims to promote collaboration between MINDEF and passionate analytics experts from the public. Through the challenge, MINDEF hopes to generate new solutions, identify new datasets, and identify candidates and industry partners for upcoming data analytics projects.




The Defence Science and Technology Agency (DSTA) implements defence technology plans, acquires defence equipment and supplies, and develops defence infrastructure for the Ministry of Defence (MINDEF).

DSTA provides leading-edge technological solutions to the Singapore Armed Forces (SAF) so that it continues to be a formidable fighting force for the defence and security of Singapore. To this end, DSTA taps the best technologies, thus fostering an environment of creativity and innovation for defence applications.

DSTA also actively helps to build up a strong community of scientists and engineers from the universities, research institutes, government and industry to meet the defence and security needs of the nation.




These are MINDEF’s Human Resource records, containing 15,000 rows. Each record contains information about a particular individual, such as age, education, vocation, etc.


These are MINDEF’s Human Resource records, containing 15,000 rows. This CSV file excludes the 6 resignation-related columns.


This is the list of user IDs to be submitted with your predictions.
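A submission pairing each provided user ID with a predicted resignation probability might be assembled as below. This is a hypothetical sketch: the column names and file name are assumptions, not the official template, so check the challenge page for the required format:

```python
import csv

# The ID list would come from the provided file; these values are invented.
user_ids = [101, 102, 103]

# Model output keyed by user ID (here, hard-coded stand-in probabilities).
predictions = {101: 0.12, 102: 0.87, 103: 0.45}

# Write one row per user ID, in the order given by the ID list.
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user_id", "probability"])
    for uid in user_ids:
        writer.writerow([uid, predictions[uid]])
```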


This is the Report and Algorithm Summary template. Please complete your report according to the template.


This is the data schema for the data, in xlsx format. It includes explanations of the features and their values in the datasets, and some basic statistics of selected features. Please read it carefully so that you do not miss any information.


We handpicked a selection of materials for you to quickly become familiar with the core principles of data science.


Getting started with Python for Data Science, from Harvard’s free online course CS 109:

Lecture Videos

Lecture Notes

Getting started with R for Data Science, from free Johns Hopkins University courses on Coursera:

The Data Scientist’s Toolbox

R Programming

Suggested Algorithms

We suggest some common algorithms for classification problems.


Top suggestions:

Gradient Boosting Classifier video and Python implementation

Random Forest in video and Wiki

Support Vector Machine on Coursera by Andrew Ng and Wiki

Ensemble Learning in video and Wiki


Medium suggestions:

Neural Network on Coursera by Andrew Ng and Wiki

Logistic Regression on Coursera by Andrew Ng and Wiki


Other suggestions:

Naïve Bayes in video and Wiki

Decision Tree in video and Wiki
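To make the top suggestion concrete, here is a minimal scikit-learn sketch of a gradient-boosted classifier evaluated with log loss on a hold-out set. The features and the "resigned" label are synthetic stand-ins for the real HR columns, and the hyperparameters are illustrative defaults, not tuned values:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data: 4 numeric features standing in for e.g. age, tenure,
# salary band and appraisal score, with a label driven by the first two.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_tr, y_tr)

# Log loss is computed on predicted probabilities, not hard labels.
proba = clf.predict_proba(X_te)[:, 1]
print("hold-out log loss:", log_loss(y_te, proba))
```

The `feature_importances_` attribute of the fitted model is a convenient starting point for identifying which factors drive turnover, which is the substantive question posed by the challenge.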



Do I need to form a team, and how?

This is a two-stage challenge; the 1st stage is the technical challenge on DEXTRA. The top 10 teams from the 1st stage will be given a chance to participate in the 2nd-stage Data Analytics Hackathon (29–30 August), in which each team will be paired with 2 domain experts from MINDEF. To perform well in the Hackathon, various skills are required, such as visualisation, design and application development. Therefore, we strongly suggest forming a team of members equipped with different skills.

To find suitable team members, we recommend making use of the forum for this challenge. The team formation guidelines can be found in Section 7 of the CRITERIA section on the challenge page.

Do you have any recommendations on data science tools to do the challenge?

R and Python are among the most widely used tools in data challenges. They are free and come with many good packages for data cleaning, visualisation and machine learning. Each has its advantages and disadvantages; see here for comparisons. Some data scientists also use the commercial software RapidMiner.

What is the difference between Public Evaluation Score and Private Evaluation Score?

The accuracy of your submission is decided by your prediction’s performance on the provided test dataset. We have the answers for this dataset, but are withholding them to compare with your predictions.


The Public Evaluation Score is what you receive back upon each submission (that score is calculated using a statistical evaluation metric, which is Logarithmic Loss in this challenge). However, your Public Evaluation Score is determined by only a fraction of the test dataset, usually between 35% and 50%. The scores shown on the leaderboard are Public Evaluation Scores, which indicate relative performance during the challenge.


The Private Evaluation Score is created when your prediction is compared with the remainder of the test dataset (the other 50–65%). You will never receive this score. Finalists are selected based on the Private Leaderboard.

Why do we need to split Public and Private Evaluation Scores?

The separation of the test dataset into public and private portions ensures that the most accurate but generalised model is the winning model. If you base your model solely on the data that gives you constant feedback, you run the danger of building a model that overfits to the specific noise in that data. One of the hard challenges in data science is avoiding overfitting, by keeping your model robust to out-of-sample data.
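In the same spirit, you can keep a local hold-out split of the training data to estimate out-of-sample performance without relying on leaderboard feedback. A minimal sketch, using row indices as stand-ins for the actual training records:

```python
import random

# Stand-in for the row indices of the training data.
records = list(range(100))

# Shuffle with a fixed seed so the split is reproducible.
random.seed(42)
random.shuffle(records)

# 80/20 split: fit on train_idx, evaluate only on valid_idx.
cut = int(0.8 * len(records))
train_idx, valid_idx = records[:cut], records[cut:]

print(len(train_idx), len(valid_idx))  # 80 20
```

Scoring your model on `valid_idx` (which it never sees during fitting) gives an honest estimate of how it will fare on the private portion of the test set.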

Can I use additional or external data?

You can use additional or external data. However, please cite the source of the data in your submission report.


  • Rakuten-Viki Final Presentation Event

    The Rakuten-Viki Global TV Recommender Challenge finally came to a successful close on 16 September 2015. Six teams (Team Merlion, Team GM, Team Haipt, Team Pritish, Team Gbenedek and Team Lenguyenthedat) were invited to present publicly in front of an audience and the judges.

  • Rakuten Viki Challenge Results

    The finalist teams have been announced! We would like you to join us for the final presentation event, where the shortlisted teams will present their algorithms and insights.


  • We’re excited to launch the UK Health & Wellness Challenge for Data City | Data Nation and invite the DEXTRA... https://t.co/oqjDygoc67




Launchpad@one-north, #04-03/04, 79 Ayer Rajah Crescent, Singapore 139955