The Rakuten-Viki Global TV Recommender Challenge came to a successful close on 16 September 2015. Six teams (Team Merlion, Team GM, Team Haipt, Team Pritish, Team Gbenedek and Team Lenguyenthedat) were invited to present publicly in front of an audience and the judges.
Finalist teams are announced! We would like you to join us for the final presentation event where shortlisted teams will present their algorithms and insights to you.
Identify an algorithm to predict healthcare costs in Singapore.
A Data Innovation Challenge hosted by Prudential, based on comprehensive data on hospital bills and consumer profiles.
prudential healthcare challenge
Healthcare cost is a growing concern in Singapore, where there is co-payment by patients on hospital bills. There is always the fear of affordability, especially in the event of a financially catastrophic illness. To enable consumers to make better healthcare & financial planning decisions, Prudential Singapore invites you to predict the cost of seeking treatment for individual consumers. Can you predict the consumers who have been admitted to a hospital in 2013 and their hospital bill size?
The Challenge requires participants to predict the cost of seeking treatment for individual consumers who have been admitted to a hospital in 2013. Accuracy of the forecasts will be one of the evaluation criteria, tested using Root Mean Squared Logarithmic Error (RMSLE). The winning model should also be innovative, identify key drivers that are insightful and unprecedented, and highlight insights gleaned from external data sources.
Can you predict the consumers who have been admitted to a hospital in 2013? From your prediction, take the healthcare costs data into consideration to further predict each consumer’s hospitalization cost for the same time period.
The evaluation metric is RMSLE (root mean squared logarithmic error).
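For participants who want to score a model locally before submitting, the metric can be sketched as below. This is a minimal illustration that assumes the commonly used log(1 + x) form of RMSLE; the page does not spell out the exact formula the evaluation server uses, and the function name `rmsle` is only illustrative.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, assuming the log(1 + x) form."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")

    total = 0.0
    for a, p in zip(actual, predicted):
        if a < 0 or p < 0:
            # log1p is undefined at or below -1; bill amounts should not be negative
            raise ValueError("values must be non-negative")
        total += (math.log1p(p) - math.log1p(a)) ** 2

    return math.sqrt(total / len(actual))
```

Because the error is computed on logarithms, under-predicting a large bill by a given ratio is penalised about as much as under-predicting a small bill by the same ratio, which suits bill amounts that span several orders of magnitude.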
Note: The accuracy of submissions will be evaluated in real time (against evaluation datasets). Participants can resubmit their entries until the closing date.
Period 36 Days
Start Friday – 14 March 2014
End Friday – 18 April 2014
Prize pool: SG$ 15,000
about the host
Prudential Singapore is one of the top life insurance companies in Singapore with a rich history spanning more than 80 years. We are one of the market leaders in Healthcare, offering a Medisave-approved integrated medical insurance plan that provides comprehensive medical coverage. This helps ensure that you have the financial means for the best possible treatment and enjoy greater peace of mind without worrying about high medical expenses for you and your family.
data and resources
Prudential Data v2.zip
The hospital bills data shows year, gender, a unique hospitalisation ID for each admission, hospital name, description of diagnosis code, event date, date of admission, discharge date, admission type, ward type and actual charges of the bill amount. There are 10,000 unique IDs with 38,000 records from 2009 to 2013. Each record contains one item in a hospital bill, and one hospital bill can contain more than one item. Most hospital bills have more than one item and diagnosis, and each ID might have more than one unique hospitalization ID. The dates of admission and discharge are provided, together with the type of hospitalization and ward.
Supporting data such as Beds Occupancy Rate (BOR), Average Hospital Inpatient Bill Size Tables, Public Hospitals – Medical Specialties, Private Hospitals – Medical Specialties, Private Hospitals – Surgical Specialties, Public Hospitals – Surgical Specialties, Health Manpower and Health Facilities will be provided.
The submission data contains an empty column (‘Predicted sales volumn’) for participants to fill in with their predictions and submit.
May I use my own external datasets to build my predictive model?
Yes, you may.
Are there any restrictions on the programming language and tools I use to build my predictive model?
There are no restrictions on the programming language or the tool(s) you choose to use when building your predictive model. However, we do require you to submit a short write-up, with your ‘submissions.csv’ file, detailing the methods and resources you used when building your model.
Some hospital bill items (with the same hospital bill ID) have exactly the same cost. Are they duplicate rows or amounts paid in equal installments?
The hospital bills amounts that are duplicated are in equal installments. They are not errors.
Different hospital bill items (with the same hospital bill ID) have different prices, and there are no other variables to differentiate them. In the final submission dataset, is it possible to predict the cost of each aggregated bill instead of predicting the cost of each component of each bill?
The data set and the submission template will not be aggregated for prediction. The purpose of the challenge is to predict the medical bills at different time points.
Diagnosis text is truncated. Can Prudential resend the exported data without truncation?
Unfortunately, the data is truncated at source. However, we will be adding ICD9 and ICD10 codes (International Statistical Classification of Diseases and Related Health Problems) to the data set to assist identification where the diagnosis is truncated.
How do I differentiate between ICD9 and ICD10 codes?
ICD9 codes were used before August 2012 and ICD10 codes were used after August 2012. However, the data is subject to inconsistencies. One way to tell is that if the diagnosis description is in capital letters, it will be an ICD9 code.
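The rule of thumb in this answer can be sketched as a small helper. This is only an illustration under stated assumptions: the cut-over is taken as 1 August 2012, the admission date is used as the reference date, and a description that is not in capitals falls back to the date rule. None of these details are specified by the host, and the names `ICD10_CUTOVER` and `guess_icd_version` are hypothetical.

```python
from datetime import date

# Assumption: "August 2012" is interpreted as 1 August 2012.
ICD10_CUTOVER = date(2012, 8, 1)


def guess_icd_version(admission_date, diagnosis_description):
    """Guess whether a record uses ICD9 or ICD10 coding."""
    # Per the host's hint, an all-capitals description indicates ICD9,
    # even when the date suggests otherwise.
    if diagnosis_description.isupper():
        return "ICD9"
    # Assumed fallback: use the date rule when the description is not in capitals.
    return "ICD9" if admission_date < ICD10_CUTOVER else "ICD10"
```

For example, `guess_icd_version(date(2013, 1, 5), "ACUTE APPENDICITIS")` returns `"ICD9"` because the capitalised description overrides the date.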
Sometimes the date of event is actually after the date of discharge, what does the 'date of event' mean?
Date of event refers to the day the medical condition occurred. Should the date of event be after the date of discharge, the entry can be ignored. At the time of writing, just 13 such occurrences have been observed.
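One possible reading of this answer is to set aside records whose date of event falls after the date of discharge before modelling. The sketch below assumes each record is a dictionary with hypothetical keys `event_date` and `discharge_date` holding `datetime.date` values; the actual column names in the data set may differ.

```python
from datetime import date


def split_inconsistent_events(records):
    """Separate records whose event date is after the discharge date."""
    kept, ignored = [], []
    for record in records:
        # Hypothetical field names; adjust to the real column headers.
        if record["event_date"] > record["discharge_date"]:
            ignored.append(record)
        else:
            kept.append(record)
    return kept, ignored
```

Returning the ignored records alongside the kept ones makes it easy to check how many entries are affected before discarding them.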