Iowa State student team wins international data mining competition
A team of Iowa State University graduate students topped 98 other universities from 28 countries to capture first place in the 15th annual Data Mining Cup. Prudsys AG, a leading European data mining company, sponsors the intelligent data analysis competition for universities. According to Prudsys, the competition is meant to be a “bridge between university and industry to identify the best up-and-coming data miners.”
Teams had six weeks to develop a solution for a data mining problem about optimal return prognosis. This year, teams had to use an unidentified online store’s historical purchase data to create a model for new orders that predicts the probability of a purchase being returned.
“The motivation for this contest data is that some online retailers offering free return shipping have almost half of their orders returned,” said Iowa State’s team leader and statistics Ph.D. candidate Cory Lanker. “We could advance our ideas to create an application that helps online retailers reduce returned shipments and increase profit margins,” he said.
Between April 2 and May 14, teams worked at their respective universities to develop their probability predictions. “Teams submitted return probabilities for approximately 50,000 purchases made in one month using data from approximately 481,000 orders from the previous 12 months,” Lanker said. “They used 12 variables that characterize the customer information—such as age, location and purchase history—and information about ordered items—such as size, color, price, etc.” Lanker said that the basis of Iowa State’s technical solution was “to fully characterize customer behavior, which we did using advanced statistical learning concepts on the provided history of purchases. Once we successfully characterized customer behavior, we could then best predict whether a new purchase would be returned.”
“This was specifically a student contest,” said Steve Vardeman, University Professor of statistics and industrial engineering. “The team had no direct faculty input on the problem. They organized and executed their solution entirely on their own.”
A jury scored the 57 submitted solutions (not every team submitted one) and invited the top ten teams to Berlin to present their methods at the Prudsys User Days conference. Each team gave a ten-minute presentation.
Iowa State team members and their departments are Guillermo Basulto-Elias (statistics), Fan Cao (statistics), Xiaoyue Cheng (statistics), Marius Dragomiroiu (computer science), Jessica Hicks (bioinformatics and computational biology), Cory Lanker (statistics), Ian Mouzon (statistics), Lanfeng Pan (statistics) and Xin Yin (bioinformatics and computational biology/statistics).
Basulto-Elias, Yin and Lanker went to Berlin for the presentation and announcement. Final team rankings were announced beginning with tenth place.
“Before long, fifth place was announced and it wasn’t us, so I knew we did better this year,” Lanker said. “When it was down to two teams, [Prudsys organizer] Jens Scholz said, ‘The United States lost in the World Cup last night,’ and I thought, ‘Well, this is us, we finished second,’ but then he added, ‘But a United States team has won the 2014 Data Mining Cup!’”
Lanker said the shock has not worn off yet. He attributes the team’s success to meeting several times a week, with meetings that were well attended at the end of the semester, demonstrating the “dedication we all had to our team’s success.”
“As a leader, I stressed sticking to a schedule so we didn’t run out of time, and involving everyone in discussions about making the many important statistical decisions,” Lanker said. “The level of teamwork was extraordinary … with many large contributions from all members.”
Left to right: IMS members Guillermo Basulto-Elias and Xin Yin, together with Iowa State’s team leader Cory Lanker, proudly hold their first prize in the international Data Mining Cup, having beaten 124 other teams from 99 universities in 28 countries. See http://www.data-mining-cup.de/en/dmc-competition/winner/