Your assignment is to use PySpark to analyze a dataset of your choice from one of the open dataset repositories. You can perform one or any combination of data analysis tasks (regression, clustering, classification, etc.).

Module Title: Big Data Management and Data Visualization
Individual
Cohort: Sept
Module Code: 7082CEM
Coursework Title: Dataset Analysis and Visualization Using Big Data Programs
Hand out date: 25 Sep 2020
Lecturer: Dr. Marwan Fuad
Due date and time: Date: 23 Oct 2020; Online: 18:00:00
Estimated Time (hrs): 25 h
Word Limit: about 15 pages apart from references and screenshots
Coursework type: Written report
66.66 % of Module Mark
Submission arrangement: online via Aula
File types and method of recording: PDF using the “Assignments” link in 7082CEM
Mark and Feedback date (DD/MM/YY): 2 weeks after submission
Mark and Feedback method (e.g. in lecture, electronic via Aula): electronic via Aula

Module Learning Outcomes Assessed:
1. Demonstrate sound knowledge of different data analytical techniques for different structured and unstructured big data sets to support decision-making.
2. Critically identify and select appropriate analytical technique for big data analysis using examples from case studies
3. Critically evaluate and apply appropriate methods that are suitable for visualising big data

Task:
1. Your assignment is to use PySpark to analyze a dataset of your choice from one of the open dataset repositories. You can perform one or any combination of data analysis tasks (regression, clustering, classification, etc.).
2. You should also use visualization tools to show the results of your analysis
3. You should critically analyze your findings: the results of the data analysis and the performance of the program
4. Using another Big Data program from the Hadoop Ecosystem, instead of PySpark, is a plus.
5. Coding the task you are performing yourself is a plus.
6. You can use any operating system that you prefer to install your program.
7. For the visualization part, you can use either Tableau or another program of your choice.
8. Your report should typically have:
   - A title
   - An introduction in which you briefly (circa 2 pages) describe your project.
   - An implementation part, in which you should introduce the program you are using (PySpark or another - the description should be more detailed if you use another program from the Hadoop Ecosystem), how it is installed, how it is configured, how it works, the dataset you are applying your program to/the data analysis task you are performing.
   - A discussion of your findings
   - A conclusion
   - References

This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third parties or posted on any website. Any infringements of this rule should be reported to

Mark distribution:
Technical quality (45 Marks): This aspect concerns the depth of the information presented in the report
Difficulty (15 Marks): This aspect concerns the difficulty of the program used or the analysis applied/the complexity of the dataset.
Visualization (20 Marks): This aspect concerns the quality of visualization produced
Reproducibility (10 Marks): This aspect concerns using screen shots/providing codes used/ clear explanation of the steps taken
Style and format (10 Marks)

Notes:
1. Students are advised to inform the module leader by email of the dataset they have decided to work on, and get approval.
2. Given the nature of this module and the task, you should document everything you do.
3. Everything you do should be reproducible. The link to the dataset should be clear (direct link to the dataset not the site where it is hosted). If you use a code from an external source, the link should be clear and direct. If the code is not too long, it is better to include it in the report, or submit it separately with your submission. If you modify a code, the modification should be very clearly indicated (meaning you should show the original part that you modified, and the modification you made).
4. You are expected to use the Coventry University APA style for referencing. For support and advice on this students can contact Centre for Academic Writing (CAW).
5. Please notify your registry course support team and module leader for disability support.
6. Any student requiring an extension or deferral should follow the university process as outlined here.
7. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops or personal computer. Students should therefore regularly back-up any work and are advised to save it on the University system.
8. If there are technical or performance issues that prevent students submitting coursework through the online coursework submission system on the day of a coursework deadline, an appropriate extension to the coursework submission deadline will be agreed. This extension will normally be 24 hours or the next working day if the deadline falls on a Friday or over the weekend period. This will be communicated via your Module Leader.
9. You are encouraged to check the originality of your work by using the draft Turnitin links on Aula.
10. Collusion between students (where sections of your work are similar to the work submitted by other students in this or previous module cohorts) is taken extremely seriously and will be reported to the academic conduct panel. This applies to both courseworks and exam answers.
11. A marked difference between your writing style, knowledge and skill level demonstrated in class discussion, any test conditions and that demonstrated in a coursework assignment may result in you having to undertake a Viva Voce in order to prove the coursework assignment is entirely your own work.
12. If you make use of the services of a proof reader in your work you must keep your original version and make it available as a demonstration of your written efforts.
13. You must not submit work for assessment that you have already submitted (partially or in full), either for your current course or for another qualification of this university, with the exception of resits, where for the coursework, you may be asked to rework and improve a previous attempt. This requirement will be specifically detailed in your assignment brief or specific course or module information. Where earlier work by you is citable, i.e. it has already been published/submitted, you must reference it clearly. Identical pieces of work submitted concurrently may also be considered to be self-plagiarism.

Mark allocation guidelines to students
0-39: Work mainly incomplete and/or weaknesses in most areas
40-49: Most elements completed; weaknesses outweigh strengths
50-59: Most elements are strong, minor weaknesses
60-69: Strengths in all elements
70+: Most work exceeds the standard expected
80+: All work substantially exceeds the standard expected

Marking Rubric

GRADE: First ≥70
ANSWER RELEVANCE: Innovative response, answers the question fully, addressing the learning objectives of the assessment task. Evidence of critical analysis, synthesis and evaluation.
ARGUMENT & COHERENCE: A clear, consistent in-depth critical and evaluative argument, displaying the ability to develop original ideas from a range of sources. Engagement with theoretical and conceptual analysis.
EVIDENCE: Wide range of appropriately supporting evidence provided, going beyond the recommended texts. Correctly referenced.
SUMMARY: An outstanding, well-structured and appropriately referenced answer, demonstrating a high degree of understanding and critical analytic skills.

GRADE: Upper Second 60-69
ANSWER RELEVANCE: A very good attempt to address the objectives of the assessment task with an emphasis on those elements requiring critical review.
ARGUMENT & COHERENCE: A generally clear line of critical and evaluative argument is presented. Relationships between statements and sections are easy to follow, and there is a sound, coherent structure.
EVIDENCE: A very good range of relevant sources is used in a largely consistent way as supporting evidence. There is use of some sources beyond recommended texts. Correctly referenced in the main.
SUMMARY: The answer demonstrates a very good understanding of theories, concepts and issues, with evidence of reading beyond the recommended minimum. Well organised and clearly written.

GRADE: Lower Second 50-59
ANSWER RELEVANCE: Competently addresses objectives, but may contain errors or omissions and critical discussion of issues may be superficial or limited in places.
ARGUMENT & COHERENCE: Some critical discussion, but the argument is not always convincing, and the work is descriptive in places, with over-reliance on the work of others.
EVIDENCE: A range of relevant sources is used, but the critical evaluation aspect is not fully presented. There is limited use of sources beyond the standard recommended materials. Referencing is not always correctly presented.
SUMMARY: The answer demonstrates a good understanding of some relevant theories, concepts and issues, but there are some errors and irrelevant material included. The structure lacks clarity.

GRADE: Third 40-49
ANSWER RELEVANCE: Addresses most objectives of the assessment task, with some notable omissions. The structure is unclear in parts, and there is limited analysis.
ARGUMENT & COHERENCE: The work is descriptive with minimal critical discussion and limited theoretical engagement.
EVIDENCE: A limited range of relevant sources used without appropriate presentation as supporting or conflicting evidence coupled with very limited critical analysis. Referencing has some errors.
SUMMARY: Some understanding is demonstrated but is incomplete, and there is evidence of limited research on the topic. Poor structure and presentation, with few and/or poorly presented references.

GRADE: Fail <40
ANSWER RELEVANCE: Some deviation from the objectives of the assessment task. May not consistently address the assignment brief. At the lower end fails to answer the question set or address the learning outcomes. There is minimal evidence of analysis or evaluation.
ARGUMENT & COHERENCE: Descriptive with no evidence of theoretical engagement or critical discussion. At the lower end displays a minimal level of understanding.
EVIDENCE: Very limited use and application of relevant sources as supporting evidence. At the lower end demonstrates a lack of real understanding. Poor presentation of references.
SUMMARY: Whilst some relevant material is present, the level of understanding is poor with limited evidence of wider reading. Poor structure and poor presentation, including referencing. At the lower end there is evidence of a lack of comprehension, resulting in an assignment that is well below the required standard.

GRADE: Late submission
ANSWER RELEVANCE: 0
ARGUMENT & COHERENCE: 0
EVIDENCE: 0
SUMMARY: 0
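
Illustrative sketch (not part of the assessed requirements): the task list says that coding the analysis task yourself is a plus, and names clustering as one possible task. The plain-Python sketch below shows what a hand-coded k-means clustering step might look like. It is a minimal example under stated assumptions, not a prescribed method: the function names (squared_distance, closest_index, mean_point, kmeans) and the six toy points are hypothetical and made up for illustration, and a real submission would load an approved open dataset and run the analysis in PySpark or another Big Data program.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_index(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid nearest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def mean_point(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points sampled from the data.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[closest_index(p, centroids)].append(p)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # no centroid moved, so we have converged
            break
        centroids = new_centroids
    labels = [closest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The assignment and update steps are written as separate passes over the data so that each could be expressed as a per-record transformation followed by a per-cluster aggregation if the same logic were moved into a distributed program.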

