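Step 1: Understand the business
Step 2: Understand the data
Step 3: Prepare the data
Step 4: Build the model
Step 5: Evaluate the results
Step 6: Implement changes and monitor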
COMP3425 Data Mining Assignment Help Example
A company that has large sums of money flushing through its hands is under pressure from regulators, knows that stock exchanges run real-time fraud detection schemes, and accepts at face value the upbeat claims made by the proponents of big data analytics. It combines fraud-detection heuristics with inferences drawn from its large transaction database, and generates suspects. It assigns its own limited internal investigation resources to these suspect cases, and refers some of them to law enforcement agencies.
The large majority of the cases investigated internally are found to be spurious. Little is heard back from law enforcement agencies. Some of the suspects discover that they are being investigated, and threaten to take their business elsewhere and to initiate defamation actions. The investigators return to their tried-and-true methods of locating and prioritising suspicious cases.
You must answer the following questions, clearly indicating which question you are answering within your submission. The page lengths suggested for each question here are for guidance only; the given page length limit for the overall assignment is mandatory.
Question 1. (1 page) Consider the ACS code of conduct. For each of the six values, taking account of any relevant sub-parts, discuss whether the value was demonstrated in the scenario and to what extent. If you assess any value as largely irrelevant to the scenario, then a very brief reason for this assessment is sufficient.
Question 2. (1/2 page) Consider the 7 US ACM Principles. Looking closely at Principle 1, Awareness, discuss how this principle is applied (or not) in the scenario and identify any “potential harm” that might have ensued.
Question 3. (2 pages) Consider the numbered guidelines in Table 2 of Clarke’s Guidelines for the responsible application of data analytics. From each segment (1 General, 2 Data Acquisition, 3 Data Analysis, and 4 Use of the Inferences) choose one guideline that you consider most relevant and important to the scenario and explain its role in the scenario. Justify why it is more relevant than every one of the others in the same segment. Be careful to consider the intention of the guidelines rather than an overly literal interpretation; you may rephrase the chosen guideline for the scenario context where beneficial. For further explanation of this point, see Section 3 in Clarke’s paper.
Question 4. (1 page) (a) Choose one numbered guideline (e.g. guideline 3.3) in Table 2 of the Guidelines that you consider to have been disregarded in the scenario. You may choose any guideline that you did not choose for Question 3. Discuss how the failure to consider the guideline could have contributed to the negative outcome of the scenario.
(b) In addition, identify any other potential consequences that could have occurred due to the failure to consider that same guideline. For this purpose, the consequences you identify are not necessarily explicit within the scenario description. You might find it helpful to think of this activity as contributing to a risk assessment process prior to your hypothetical involvement in the analysis work of the scenario.
Question 5. (1 page) Consider the paper by Du et al., Techniques for Interpretable Machine Learning. Discuss whether and how intrinsic and post-hoc interpretability techniques could be applied to the scenario and what benefits could ensue.