Goal: This project integrates concepts developed across the assignments in the second half of this class. You will identify a data-driven business problem that requires preparation of the data. This preparation involves Extracting data (from 3 or more sources) and Transforming (or cleaning) the data before Loading it into a database for analysis. In other words, you will experience, first-hand, the ETL process of data management: preparing the data for further analyses.
Options: You can take this project in one of two directions: (1) identify a large file, clean the data, and normalize it into three or more tables, OR (2) identify three or more large data sources, clean the data, and merge them into a denormalized table for analysis. In either case, you will need to identify what you plan to learn from the cleaned and loaded data. BOTTOM LINE: Can you do the analyses WITHOUT going through this ETL process? If so, what’s the point?!
Resource: This article (links to an external site).
In preparation for your project this term, I need you to do some digging to identify sources and ideas for a decent project.
There are a couple of decisions that have to be made, so I am making part of the project a “deliverable” so you can begin mulling it over. Most ETL tasks involve cleaning and integration. For integration, it is vital that you have an attribute that is common across all three data sets.
Cleaning
Cleaning is one of the most important steps as it ensures the quality of the data in the data warehouse. Cleaning should perform basic data unification rules, such as:
Transform
The transform step applies a set of rules to transform the data from the source to the target. This includes
Data Integration
It is at this stage that you get the most value for the project. This typically means you are adding some attribute from a related set that adds ‘Color’ to the data. For example, you might add Census data to labor data or other demographic data. The challenge is to locate data that are relatable.
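As a rough illustration of adding ‘Color’ from a related set, the sketch below attaches a census-style attribute to labor records through a shared county code. The field names (`county_fips`, `unemployment_rate`, `median_income`), the helper `enrich`, and all values are made up for illustration; they do not come from any real dataset.

```python
# Hypothetical labor records and census-style records; all values are invented.
labor_rows = [
    {"county_fips": "01001", "unemployment_rate": 3.1},
    {"county_fips": "01003", "unemployment_rate": 4.0},
    {"county_fips": "01005", "unemployment_rate": 5.2},
]
census_rows = [
    {"county_fips": "01001", "median_income": 58000},
    {"county_fips": "01003", "median_income": 61000},
]


def enrich(base_rows, extra_rows, key):
    """Left-join extra_rows onto base_rows using the shared key column."""
    lookup = {row[key]: row for row in extra_rows}
    extra_columns = sorted({col for row in extra_rows for col in row if col != key})
    enriched = []
    for row in base_rows:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in extra_columns:
            # Keep unmatched rows, but mark the added attribute as missing.
            merged[col] = match.get(col)
        enriched.append(merged)
    return enriched


enriched_rows = enrich(labor_rows, census_rows, key="county_fips")
for row in enriched_rows:
    print(row)
```

Rows without a match are kept with the new attribute set to `None`, which makes gaps in the related data visible instead of silently dropping records.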
Project direction: You will need to complete a data mart with significant pre-processing (ETL) activities.
Requirements:
Data sources: You are welcome to use datasets from work that have been sufficiently “anonymized” (links to an external site). In fact, this is itself a valuable transformation task that you can then use to protect your data and make it available for additional analysis/exploration. There are many public data sets that can be used (see “data sources” tab).
Goal: Explore various datasets (see below) to see what is missing in any of the data and how you can enhance it by combining info from other seemingly unconnected data (industry, education, poverty and liquor shops?). The links below serve as a starting point for your exploration. Get started!
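One possible way to approach this transformation, sketched under assumptions: replace direct identifiers with keyed hashes so that records can still be linked without exposing the original values. Strictly speaking this is pseudonymization rather than full anonymization, and the column name `employee_id`, the key handling, the token length, and the sample rows are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a keyed SHA-256 token for value (same input gives same token)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability; an assumption


def anonymize_rows(rows, id_column, secret_key):
    """Copy rows, replacing the identifier column with its pseudonym."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        new_row[id_column] = pseudonymize(row[id_column], secret_key)
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data; the key would be kept private in practice.
SECRET_KEY = b"replace-with-a-private-key"
work_rows = [
    {"employee_id": "E1001", "department": "Sales"},
    {"employee_id": "E1002", "department": "Finance"},
    {"employee_id": "E1001", "department": "Sales"},
]
safe_rows = anonymize_rows(work_rows, "employee_id", SECRET_KEY)
```

Because the same identifier always maps to the same token, the pseudonymized column can still serve as a join key across files processed with the same key.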
Expectation: You can take this project in one of two directions: (1) identify three or more large data sources, clean the data, and merge them into a denormalized table for analysis, OR (2) identify a large file, clean the data, and normalize it into three or more tables so that when you rejoin them, you get more accurate answers to your questions. Sometimes this process may require you to get “reference sources” so your dimension tables (destinations in Model Y above) are more complete/accurate.
In either case, you will need to identify what you plan to learn from the cleaned and loaded data.
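To make the normalization direction concrete, here is a minimal sketch using Python's built-in `sqlite3` module: a flat file is split into two dimension tables and one fact table, then rejoined to answer a question. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical flat file: store and product details repeat on every sale row.
flat_rows = [
    ("Store A", "Boston", "Widget", "Hardware", 3),
    ("Store A", "Boston", "Gadget", "Electronics", 1),
    ("Store B", "Denver", "Widget", "Hardware", 5),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (
        store_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        city TEXT
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        category TEXT
    );
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        store_id INTEGER REFERENCES store(store_id),
        product_id INTEGER REFERENCES product(product_id),
        quantity INTEGER
    );
""")

for store, city, product, category, quantity in flat_rows:
    # Insert each dimension value once; repeats are ignored via UNIQUE.
    conn.execute("INSERT OR IGNORE INTO store (name, city) VALUES (?, ?)", (store, city))
    conn.execute(
        "INSERT OR IGNORE INTO product (name, category) VALUES (?, ?)",
        (product, category),
    )
    # Look up the surrogate keys and store only those in the fact table.
    conn.execute(
        """INSERT INTO sale (store_id, product_id, quantity)
           SELECT s.store_id, p.product_id, ?
           FROM store s, product p
           WHERE s.name = ? AND p.name = ?""",
        (quantity, store, product),
    )

# Rejoin the tables to answer a question: units sold per category.
units_by_category = conn.execute(
    """SELECT p.category, SUM(sa.quantity)
       FROM sale sa JOIN product p ON p.product_id = sa.product_id
       GROUP BY p.category
       ORDER BY p.category"""
).fetchall()
print(units_by_category)
```

The denormalized direction is roughly the reverse: the final `JOIN` query would be materialized as one wide table for analysis.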
There are two main ideas to keep in mind: (1) Cleaning badly prepared data and (2) integrating data from multiple sources. An ETL project usually involves BOTH of these.
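A minimal cleaning sketch, assuming a few common problems: stray whitespace, inconsistent capitalization, placeholder strings for missing values, and mixed date formats. The specific rules, the list of missing-value tokens, the date formats, and the record fields are assumptions for illustration, not requirements of the assignment.

```python
from datetime import datetime

MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-"}  # assumed placeholders
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")      # assumed input formats


def clean_text(value):
    """Trim and collapse whitespace; return None for missing-value placeholders."""
    if value is None:
        return None
    text = " ".join(value.split())
    return None if text.lower() in MISSING_TOKENS else text


def clean_date(value):
    """Convert a date in any assumed format to ISO 8601, or None if unparseable."""
    text = clean_text(value)
    if text is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Apply the rules above to one hypothetical record."""
    state = clean_text(row["state"])
    city = clean_text(row["city"])
    return {
        "state": state.upper() if state else None,
        "city": city.title() if city else None,
        "opened": clean_date(row["opened"]),
    }


raw = {"state": " ma ", "city": "  new   bedford", "opened": "03/15/2020"}
print(clean_row(raw))
```

Unparseable dates become `None` rather than raising an error, so a later step can count and review them instead of the load failing partway through.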
When integrating data from more than one source, you need to make sure that they can be linked in the first place. In other words, is there something in common between the two data sets? Some kind of identifier like we use as PK and FK? If not, can you create it?
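One way to test whether two data sets can be linked before committing to them is to measure how many key values they share. The sketch below also shows one possible way to create a key when none exists, by normalizing a name field. The function names, the normalization rule, the `county` field, and the sample values are hypothetical.

```python
def make_key(name):
    """Build a surrogate join key from a free-text name (an assumed rule)."""
    kept = [ch for ch in name.casefold() if ch.isalnum()]
    return "".join(kept)


def key_overlap(left_rows, right_rows, field):
    """Return the fraction of distinct left keys that also appear on the right."""
    left_keys = {make_key(row[field]) for row in left_rows}
    right_keys = {make_key(row[field]) for row in right_rows}
    if not left_keys:
        return 0.0
    return len(left_keys & right_keys) / len(left_keys)


# Hypothetical samples: the same counties spelled inconsistently.
poverty_rows = [{"county": "St. Louis"}, {"county": "Cook"}, {"county": "Kings "}]
education_rows = [{"county": "ST LOUIS"}, {"county": "cook"}, {"county": "Harris"}]

share = key_overlap(poverty_rows, education_rows, "county")
print(f"{share:.0%} of poverty counties have a match in the education data")
```

A low overlap is a warning sign that the sources cannot serve as PK/FK partners without a better key or a reference table. Name-based keys like this one can also collide (two different places with the same name), so a coded identifier is preferable when one is available.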
As you review the following sources for ideas, look for files that can be linked. Otherwise, all you have is data!
Note: You don’t have to get ALL your data from a single source. As long as they are related, you can draw from multiple sources.
I ALREADY HAVE THE DATA SOURCES AND PROJECT BACKGROUND TO WORK WITH. YOU JUST HAVE TO DO THE PROJECT ETL AND PRESENTATION. FOR THIS PROJECT YOU NEED TO USE VISUAL STUDIO 2019 (PREFERRED), POWER BI, OR TABLEAU. I NEED IT IN A SHORT TIME, SO PLEASE BID ONLY IF YOU ARE SURE YOU CAN DO IT. I WILL BE UPLOADING THE DATA FILES HERE.