A Practical Guide to Data Mining for Business and Industry
Hardcover, English, 2014, ISBN 9781119977131
Summary
Data mining is well on its way to becoming a recognized discipline in the overlapping areas of IT, statistics, machine learning, and AI. A Practical Guide to Data Mining for Business and Industry presents a user-friendly approach to data mining methods and covers their typical applications. The methodology is complemented by case studies to create a versatile reference book, allowing readers to look up specific methods as well as specific applications. The book is formatted to allow statisticians, computer scientists, and economists to cross-reference from a particular application or method to sectors of interest.
Table of Contents
<p>Part I Data Mining Concept</p>
<p>1 Introduction</p>
<p>1.1 Aims of the Book</p>
<p>1.2 Data Mining Context</p>
<p>1.2.1 Domain Knowledge</p>
<p>1.2.2 Words to Remember</p>
<p>1.2.3 Associated Concepts</p>
<p>1.3 Global Appeal</p>
<p>1.4 Example Datasets Used in This Book</p>
<p>1.5 Recipe Structure</p>
<p>1.6 Further Reading and Resources</p>
<p>2 Data Mining Definition</p>
<p>2.1 Types of Data Mining Questions</p>
<p>2.1.1 Population and Sample</p>
<p>2.1.2 Data Preparation</p>
<p>2.1.3 Supervised and Unsupervised Methods</p>
<p>2.1.4 Knowledge-Discovery Techniques</p>
<p>2.2 Data Mining Process</p>
<p>2.3 Business Task: Clarification of the Business Question behind the Problem</p>
<p>2.4 Data: Provision and Processing of the Required Data</p>
<p>2.4.1 Fixing the Analysis Period</p>
<p>2.4.2 Basic Unit of Interest</p>
<p>2.4.3 Target Variables</p>
<p>2.4.4 Input Variables/Explanatory Variables</p>
<p>2.5 Modelling: Analysis of the Data</p>
<p>2.6 Evaluation and Validation during the Analysis Stage</p>
<p>2.7 Application of Data Mining Results and Learning from the Experience</p>
<p>Part II Data Mining Practicalities</p>
<p>3 All about data</p>
<p>3.1 Some Basics</p>
<p>3.1.1 Data, Information, Knowledge and Wisdom</p>
<p>3.1.2 Sources and Quality of Data</p>
<p>3.1.3 Measurement Level and Types of Data</p>
<p>3.1.4 Measures of Magnitude and Dispersion</p>
<p>3.1.5 Data Distributions</p>
<p>3.2 Data Partition: Random Samples for Training, Testing and Validation</p>
<p>3.3 Types of Business Information Systems</p>
<p>3.3.1 Operational Systems Supporting Business Processes</p>
<p>3.3.2 Analysis-Based Information Systems</p>
<p>3.3.3 Importance of Information</p>
<p>3.4 Data Warehouses</p>
<p>3.4.1 Topic Orientation</p>
<p>3.4.2 Logical Integration and Homogenisation</p>
<p>3.4.3 Reference Period</p>
<p>3.4.4 Low Volatility</p>
<p>3.4.5 Using the Data Warehouse</p>
<p>3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS</p>
<p>3.5.1 Database Management System (DBMS)</p>
<p>3.5.2 Database (DB)</p>
<p>3.5.3 Database Communication Systems (DBCS)</p>
<p>3.6 Data Marts</p>
<p>3.6.1 Regularly Filled Data Marts</p>
<p>3.6.2 Comparison between Data Marts and Data Warehouses</p>
<p>3.7 A Typical Example from the Online Marketing Area</p>
<p>3.8 Unique Data Marts</p>
<p>3.8.1 Permanent Data Marts</p>
<p>3.8.2 Data Marts Resulting from Complex Analysis</p>
<p>3.9 Data Mart: Dos and Don'ts</p>
<p>3.9.1 Dos and Don'ts for Processes</p>
<p>3.9.2 Dos and Don'ts for Handling</p>
<p>3.9.3 Dos and Don'ts for Coding/Programming</p>
<p>4 Data Preparation</p>
<p>4.1 Necessity of Data Preparation</p>
<p>4.2 From Small and Long to Short and Wide</p>
<p>4.3 Transformation of Variables</p>
<p>4.4 Missing Data and Imputation Strategies</p>
<p>4.5 Outliers</p>
<p>4.6 Dealing with the Vagaries of Data</p>
<p>4.6.1 Distributions</p>
<p>4.6.2 Tests for Normality</p>
<p>4.6.3 Data with Totally Different Scales</p>
<p>4.7 Adjusting the Data Distributions</p>
<p>4.7.1 Standardisation and Normalisation</p>
<p>4.7.2 Ranking</p>
<p>4.7.3 Box-Cox Transformation</p>
<p>4.8 Binning</p>
<p>4.8.1 Bucket Method</p>
<p>4.8.2 Analytical Binning for Nominal Variables</p>
<p>4.8.3 Quantiles</p>
<p>4.8.4 Binning in Practice</p>
<p>4.9 Timing Considerations</p>
<p>4.10 Operational Issues</p>
<p>5 Analytics</p>
<p>5.1 Introduction</p>
<p>5.2 Basis of Statistical Tests</p>
<p>5.2.1 Hypothesis Tests and P Values</p>
<p>5.2.2 Tolerance Intervals</p>
<p>5.2.3 Standard Errors and Confidence Intervals</p>
<p>5.3 Sampling</p>
<p>5.3.1 Methods</p>
<p>5.3.2 Sample Sizes</p>
<p>5.3.3 Sample Quality and Stability</p>
<p>5.4 Basic Statistics for Pre-analytics</p>
<p>5.4.1 Frequencies</p>
<p>5.4.2 Comparative Tests</p>
<p>5.4.3 Cross Tabulation and Contingency Tables</p>
<p>5.4.4 Correlations</p>
<p>5.4.5 Association Measures for Nominal Variables</p>
<p>5.4.6 Examples of Output from Comparative and Cross Tabulation Tests</p>
<p>5.5 Feature Selection/Reduction of Variables</p>
<p>5.5.1 Feature Reduction Using Domain Knowledge</p>
<p>5.5.2 Feature Selection Using Chi-Square</p>
<p>5.5.3 Principal Components Analysis and Factor Analysis</p>
<p>5.5.4 Canonical Correlation, PLS and SEM</p>
<p>5.5.5 Decision Trees</p>
<p>5.5.6 Random Forests</p>
<p>5.6 Time Series Analysis</p>
<p>6 Methods</p>
<p>6.1 Methods Overview</p>
<p>6.2 Supervised Learning</p>
<p>6.2.1 Introduction and Process Steps</p>
<p>6.2.2 Business Task</p>
<p>6.2.3 Provision and Processing of the Required Data</p>
<p>6.2.4 Analysis of the Data</p>
<p>6.2.5 Evaluation and Validation of the Results (during the Analysis)</p>
<p>6.2.6 Application of the Results</p>
<p>6.3 Multiple Linear Regression for use when Target is Continuous</p>
<p>6.3.1 Rationale of Multiple Linear Regression Modelling</p>
<p>6.3.2 Regression Coefficients</p>
<p>6.3.3 Assessment of the Quality of the Model</p>
<p>6.3.4 Example of Linear Regression in Practice</p>
<p>6.4 Regression when the Target is not Continuous</p>
<p>6.4.1 Logistic Regression</p>
<p>6.4.2 Example of Logistic Regression in Practice</p>
<p>6.4.3 Discriminant Analysis</p>
<p>6.4.4 Log-Linear Models and Poisson Regression</p>
<p>6.5 Decision Trees</p>
<p>6.5.1 Overview</p>
<p>6.5.2 Selection Procedures of the Relevant Input Variables</p>
<p>6.5.3 Splitting Criteria</p>
<p>6.5.4 Number of Splits (Branches of the Tree)</p>
<p>6.5.5 Symmetry/Asymmetry</p>
<p>6.5.6 Pruning</p>
<p>6.6 Neural Networks</p>
<p>6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks</p>
<p>6.8 Unsupervised Learning</p>
<p>6.8.1 Introduction and Process Steps</p>
<p>6.8.2 Business Task</p>
<p>6.8.3 Provision and Processing of the Required Data</p>
<p>6.8.4 Analysis of the Data</p>
<p>6.8.5 Evaluation and Validation of the Results (during the Analysis)</p>
<p>6.8.6 Application of the Results</p>
<p>6.9 Cluster Analysis</p>
<p>6.9.1 Introduction</p>
<p>6.9.2 Hierarchical Cluster Analysis</p>
<p>6.9.3 K-Means Method of Cluster Analysis</p>
<p>6.9.4 Example of Cluster Analysis in Practice</p>
<p>6.10 Kohonen Networks and Self-Organising Maps</p>
<p>6.10.1 Description</p>
<p>6.10.2 Example of SOMs in Practice</p>
<p>6.11 Group Purchase Methods: Association and Sequence Analysis</p>
<p>6.11.1 Introduction</p>
<p>6.11.2 Analysis of the Data</p>
<p>6.11.3 Group Purchase Methods</p>
<p>6.11.4 Examples of Group Purchase Methods in Practice</p>
<p>7 Validation and Application</p>
<p>7.1 Introduction to Methods for Validation</p>
<p>7.2 Lift and Gain Charts</p>
<p>7.3 Model Stability</p>
<p>7.4 Sensitivity Analysis</p>
<p>7.5 Threshold Analytics and Confusion Matrix</p>
<p>7.6 ROC Curves</p>
<p>7.7 Cross-Validation and Robustness</p>
<p>7.8 Model Complexity</p>
<p>Part III Data Mining in Action</p>
<p>8 Marketing: Prediction</p>
<p>8.1 Recipe 1: Response Optimisation: to Find and Address the Right Number of Customers</p>
<p>8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer</p>
<p>8.3 Recipe 3: To Find the Right Number of Customers to Ignore</p>
<p>8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer</p>
<p>8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy</p>
<p>8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy</p>
<p>8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase</p>
<p>8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas</p>
<p>8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas</p>
<p>9 Intra-Customer Analysis</p>
<p>9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer</p>
<p>9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer</p>
<p>9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products</p>
<p>9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage</p>
<p>9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups</p>
<p>9.6 Recipe 15: Product Set Combination</p>
<p>9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer</p>
<p>10 Learning from a Small Testing Sample and Prediction</p>
<p>10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income)</p>
<p>10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases</p>
<p>10.3 Recipe 19: To Understand Operational Features and General Business Forecasting</p>
<p>11 Miscellaneous</p>
<p>11.1 Recipe 20: To Find Customers Who Will Potentially Churn</p>
<p>11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract</p>
<p>11.3 Recipe 22: Social Media Target Group Descriptions</p>
<p>11.4 Recipe 23: Web Monitoring</p>
<p>11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner</p>
<p>12 Software and Tools: A Quick Guide</p>
<p>12.1 List of Requirements When Choosing a Data Mining Tool</p>
<p>12.2 Introduction to the Idea of Fully Automated Modelling (FAM)</p>
<p>12.2.1 Predictive Behavioural Targeting</p>
<p>12.2.2 Fully Automatic Predictive Targeting and Modelling Real-Time Online Behaviour</p>
<p>12.3 FAM Function</p>
<p>12.4 FAM Architecture</p>
<p>12.5 FAM Data Flows and Databases</p>
<p>12.6 FAM Modelling Aspects</p>
<p>12.7 FAM Challenges and Critical Success Factors</p>
<p>12.8 FAM Summary</p>
<p>13 Overviews</p>
<p>13.1 To Make Use of Official Statistics</p>
<p>13.2 How to Use Simple Maths to Make an Impression</p>
<p>13.2.1 Approximations</p>
<p>13.2.2 Absolute and Relative Values</p>
<p>13.2.3 % Change</p>
<p>13.2.4 Values in Context</p>
<p>13.2.5 Confidence Intervals</p>
<p>13.2.6 Rounding</p>
<p>13.2.7 Tables</p>
<p>13.2.8 Figures</p>
<p>13.3 Differences between Statistical Analysis and Data Mining</p>
<p>13.3.1 Assumptions</p>
<p>13.3.2 Values Missing Because Nothing Happened</p>
<p>13.3.3 Sample Sizes</p>
<p>13.3.4 Goodness-of-Fit Tests</p>
<p>13.3.5 Model Complexity</p>
<p>13.4 How to Use Data Mining in Different Industries</p>
<p>13.5 Future Views</p>
<p>Bibliography</p>
<p>Index</p>