credit card dataset:
the 6000 credit card records were randomly selected from 25,000 real-life credit card records of a major US bank. each record has 113 variables, with 38 original variables and 65 derived variables. the 38 original variables are balance, purchase, payment, cash advance, and related variables; each of the first 5 items has six variables that hold the raw data of six consecutive months, and the last item includes interest charges, date of last payment, number of cash advances, account open date, and so on. the 65 derived variables (char01 – char65) are computed from the 38 original variables using simple arithmetic methods to give a clearer picture of cardholders’ behaviors.
vip e-mail dataset
with the rapid growth of the internet and continuing innovation in web technology, people rely more and more on e-mail to communicate with each other. nowadays, e-mail is not only a tool of communication but also a business. how to retain easily lost customer accounts has become the key question for executive officers of e-mail hosting companies.
our partner company’s vip e-mail data are mainly stored in two kinds of repository systems: one is databases recorded manually by employees, which were initially created to meet the needs of every kind of business service; the other is log files recorded automatically by machine, which contain information about customer logins, e-mail transactions, and so on. after integrating all these data using the keyword ssn, we finally obtain the large table for data mining.
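
the integration step described above can be pictured as a join on ssn between the manually recorded database rows and per-customer counts aggregated from the log files. the following is only a minimal sketch under stated assumptions: the sample rows, the field names `service_type` and `event`, and the helper `build_mining_table` are hypothetical and are not taken from the actual dataset or from the partner company's systems.

```python
from collections import defaultdict

# Hypothetical sample rows; every field except "ssn" is an illustrative assumption.
manual_records = [
    {"ssn": "A001", "service_type": "premium"},
    {"ssn": "A002", "service_type": "basic"},
]
log_records = [
    {"ssn": "A001", "event": "login"},
    {"ssn": "A001", "event": "send"},
    {"ssn": "A002", "event": "login"},
]


def build_mining_table(manual_rows, log_rows):
    """Join database rows with per-customer log event counts, keyed on ssn."""
    # Count log events per customer and per event type.
    counts = defaultdict(lambda: defaultdict(int))
    for row in log_rows:
        counts[row["ssn"]][row["event"]] += 1

    # Use one count column per event type seen anywhere in the logs.
    event_types = sorted({row["event"] for row in log_rows})

    table = []
    for row in manual_rows:
        merged = dict(row)  # copy so the input rows are not modified
        for event in event_types:
            # Customers with no matching log entries get a count of 0.
            merged[f"{event}_count"] = counts[row["ssn"]][event]
        table.append(merged)
    return table


mining_table = build_mining_table(manual_records, log_records)
```

each row of `mining_table` then holds one customer, combining the manually recorded fields with the aggregated log counts. this is the general shape of a single wide table for data mining, though the real table's columns are not specified here.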