Exploring Udemy Programs Developments Utilizing Google Massive Question

Introduction
Google Massive Question is a safe, accessible, fully-manage, pay-as-you-go, server-less, multi-cloud knowledge warehouse Platform as a Service (PaaS) service supplied by Google Cloud Platform that helps to generate helpful insights from massive knowledge that may assist enterprise stakeholders in efficient decision-making. Google Massive Question supplies built-in machine studying functionality and SQL question engine to put in writing SQL, which can be utilized for analyzing giant datasets. We will develop a safe and extremely accessible knowledge warehouse utilizing Google Massive Question.
Udemy is among the hottest on-line studying platforms. Udemy supplies high-quality studying content material in design, advertising and marketing, improvement, finance & accounting, IT & software program, images & video, well being & wellness, workplace productiveness, and so forth. in several languages. Udemy is a crucial supply of knowledge for a lot of college students, freelancers, and dealing professionals. Udemy is among the greatest platforms to study Python and React and to organize for AWS and Azure certification. Nonetheless, learners could be involved in taking programs from instructors extra aligned to their job titles, programs taken by many customers, and authorized builders like AWS licensed, Salesforce licensed, and so forth. To handle this drawback, we’ll construct an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question.
Virtually all main cloud service suppliers, like Google, Amazon, Microsoft, and so forth., immediately present knowledge warehouse instruments. Cloud-based knowledge warehouse instruments are extremely scalable and supply catastrophe restoration. Utilizing a knowledge warehouse we are able to retailer and analyze a considerable amount of knowledge and produce helpful knowledge insights with the assistance of information visualizations and stories. Nicely-designed knowledge warehouses ship high-quality knowledge and enhance question efficiency by correctly defining the kind of knowledge, utilizing knowledge mining, synthetic intelligence, and so forth., and serving to in making smarter selections.
This text will talk about the method of constructing an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question which is able to assist us to establish issues akin to classifying programs primarily based on teacher job titles, the common score of all of the programs of an teacher, and so forth.
Studying Goals
On this article, we’ll study:
- How you can construct an information warehouse utilizing Google Massive Question
- How you can use Google Massive Question Sandbox
- Acquire data about creating datasets and tables in Massive Question
- Querying Udemy knowledge in Massive Question SQL question engine
This text was printed as part of the Data Science Blogathon.
Desk of Contents
Mission Description
Now, we’ll create the desk contained in the dataset within the Google Cloud Platform SQL question engine from the downloaded knowledge. After creating the desk, we’ll format the desk schema and carry out knowledge cleansing. We will carry out querying on imported knowledge to generate helpful insights akin to classifying programs primarily based on teacher job titles, figuring out programs having most scores, instructors whose programs have good scores, and so forth.
Presently, we have now knowledge from just one supply, and we’re importing CSV format knowledge by way of batch ingestion utilizing the Google Cloud Platform UI interface. We will additionally import knowledge from a number of sources akin to Cloud Storage, Azure Storage Account, and so forth. Other than importing knowledge by way of the Google Cloud Platform UI interface, customers may also import knowledge utilizing CLI, and REST APIs, utilizing knowledge pipeline choices akin to Cloud Dataflow, Cloud Dataproc, and so forth. Google Massive Question additionally helps file codecs akin to Parquet, Avro, and so forth., for knowledge loading and processing. Builders may also save, share and run queries within the SQL question engine on the scheduled time.

By querying Udemy knowledge, customers can decide which programs they need to buy primarily based on target period, course scores, teacher job titles, course reputation, and so forth. Customers can save and share these queries. Customers may also save the outcomes of those queries to create dashboards utilizing Energy BI, Looker Studio, Tableau, and so forth. Customers may also extract extra knowledge from Udemy utilizing internet scraping methods and ingest it in Google Massive Question SQL question engine to maintain the information up to date in order that customers can get extra correct outcomes.
Downside Assertion
On this article, we can be utilizing Udemy Programs Information 2023 dataset from Kaggle to develop an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question, which is able to assist us to establish issues akin to classifying programs primarily based on teacher job titles, the common score of all of the programs of an teacher, classifying programs primarily based on the variety of lectures within the course, figuring out lately printed and modified programs on Udemy, and so forth.
As already mentioned, we are able to extract extra knowledge from Udemy utilizing internet scraping methods as new programs and instructors carry on rising on the Udemy platform. We are going to create tables contained in the dataset within the Google Cloud Platform SQL question engine to import the programs and teacher knowledge downloaded from Kaggle. After desk creation, we’ll carry out knowledge cleansing and desk schema formatting.

We will save, share and run queries within the SQL question engine on the scheduled time. Other than this, we are able to additionally save the outcomes of the question execution in order that it may be utilized queries to create dashboards utilizing Energy BI, Looker Studio, Tableau, and so forth. This venture goals to develop an information warehouse utilizing Udemy knowledge, querying which customers can establish lately printed and modified programs on Udemy, classify programs primarily based on target period and course scores, establish common scores of all of the programs of an teacher, classify programs primarily based on the variety of lectures within the course, and so forth.
Stipulations
Beneath are some stipulations to undertake this venture:
- Understanding of Information Warehouse: On this venture, we’ll construct an information warehouse to discover Udemy course tendencies and insights utilizing Google Massive Question. Subsequently, understanding what an information warehouse is, why an information warehouse is beneficial, and what the information warehouse supplies by numerous cloud distributors, and so forth., are necessary.
- Expertise with Google Cloud Platform: We are going to use Google Massive Question, an information warehouse service accessible contained in the Google Cloud Platform. So, expertise with the Google Cloud Platform is necessary to simply navigate the platform and perceive the useful resource creation course of, roles & entry permissions, and so forth.
- Expertise with SQL queries: We can be writing queries within the SQL question engine to generate helpful insights, akin to classifying programs primarily based on teacher job titles, figuring out programs having most scores, instructors whose programs have good scores, and so forth.
- Familiarity with Udemy and Kaggle: Understanding what Kaggle is, how it’s helpful for downloading datasets, and primary familiarity with the web studying platform Udemy can be useful whereas growing the venture.
- Understanding of Google Massive Question: As this venture makes use of Google Massive Question for creating an information warehouse, it will be useful to have an understanding of Google Massive Question’s widespread knowledge operations, ideas, and methods.
Realizing in regards to the Dataset
On this article, we can be utilizing Udemy Programs Information 2023 dataset from Kaggle. The dataset will be downloaded by visiting https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023. The aim behind utilizing this dataset is to establish lately printed and modified programs on Udemy, classify programs primarily based on target period and course scores, establish common scores of all of the programs of an teacher, classify programs primarily based on the variety of lectures within the course, and so forth.
The Udemy Programs Information 2023 dataset has two recordsdata named programs.csv and instructors.csv. The programs.csv accommodates info associated to the Udemy programs. The instructors.csv accommodates the knowledge associated to the Udemy instructors. The programs.csv accommodates 11 columns and 83,105 rows. The instructors.csv accommodates 10 columns and 32,234 rows. The programs.csv accommodates the instructors_id column, which provides the id of the teacher of the course. The instructors_id column is used to kind the relation between programs.csv and instructors.csv.

The programs.csv accommodates the distinctive id of the course, the course title, course score, course period, the variety of lectures within the Udemy course, the URL of the course, the creation date of the course, the date on which the course was final modified, variety of critiques of the course and id of the course teacher. The instructors.csv accommodates the distinctive id of the teacher, the identify of the course teacher, the show identify of the course teacher, the title of the course teacher, the job title of the course teacher, the teacher class, the URL of the teacher, initials of the course teacher, 50 X 50 picture of the teacher and 100 X 100 picture of the teacher. To study extra in regards to the dataset, go to https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023.
Method to the Mission
On this venture, we can be utilizing Udemy Programs Information 2023 dataset from Kaggle to develop an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question, which is able to assist us to establish issues akin to classifying programs primarily based on teacher job titles, the common score of all of the programs of an teacher, classifying programs primarily based on the variety of lectures within the course, figuring out lately printed and modified programs on Udemy, and so forth.
Comply with the beneath steps to create an information warehouse utilizing Udemy Programs Information 2023 dataset from Kaggle:
Step 1: Create a New Mission utilizing Massive Question Sandbox
To work with Google Massive Question, builders can both create an account on the Google Cloud Platform or make the most of the Google Massive Question Sandbox. I’ll use Google Massive Question Sandbox on this article to create an information warehouse. The venture is used for organizing all of the Google cloud sources in GCP. Utilizing Identification and Entry Administration, we are able to specify which consumer is allowed to entry which sources in a venture.
Go to the beneath hyperlink to make use of the Google Massive Question Sandbox: https://console.cloud.google.com/bigquery
Now, observe the steps described beneath:
1. Click on on NEW PROJECT, then Present the Mission Title as Udemy-Mission and Location on the following display screen. Click on CREATE.


2. Udemy-Mission is efficiently created. Choose the Udemy-Mission to view the venture and handle consumer permissions and sources contained in the venture.

Step 2: Obtain the Dataset from Kaggle and Reserve it on the Native Machine
Go to https://www.kaggle.com/datasets/ankushbisht005/udemy-courses-data-2023 and click on Obtain. After unzipping the downloaded zip file, you will discover two CSV recordsdata named programs.csv and instructors.csv. The programs.csv accommodates info associated to the Udemy programs. The instructors.csv accommodates the knowledge associated to the Udemy instructors. The programs.csv accommodates 11 columns and 83,105 rows. The instructors.csv accommodates 10 columns and 32,234 rows. The instructors_id column is used to kind the relation between programs.csv and instructors.csv.

Step 3: Creating Dataset Inside Google Massive Question Useful resource
Comply with the steps described beneath to create a dataset inside Google Massive Question:
1. Choose the identify of the Mission -> Massive Question within the sources card -> Click on Create dataset.

2. Present Udemy_dataset as Dataset ID, select Area in Location Kind, select Asia-south1 (Mumbai) as Area, and allow desk expiration.

3. Click on CREATE DATASET

Step 4: Create Tables within the Dataset Inside Google Massive Question Useful resource
Comply with the steps described beneath to create tables within the dataset inside Google Massive Question:
1. Choose Udemy_dataset dataset -> Create desk

2. Select to create desk from add, choose the programs.csv file downloaded from Kaggle, choose file format as CSV, present programs as desk identify, Native desk as a desk kind, select Auto to detect within the schema, and partition and cluster settings as per our necessities. Within the Advance choices, present 1 within the header rows to skip and select Encryption appropriate as per the requirement. Click on CREATE TABLE.

3. Now, once more choose the Udemy_dataset dataset
-> Create desk. Select to create a desk from add, choose the instructors.csv file downloaded from Kaggle, choose file format as CSV, present instructors as desk identify, Native desk as a desk kind, select Auto to detect within the schema, and partition and cluster settings as per our necessities. Within the Advance choices, present 1 within the header rows to skip and select Encryption appropriate as per the requirement. Click on CREATE TABLE.

Step 5: Verifying Tables Schema and Previewing Information
Go to the programs desk, and cross-verify the sector identify, kind, and mode within the schema tab. View the row entry insurance policies of the programs desk and edit desk schema, if required. View the desk data within the DETAILS tab and edit the main points in case of corrections. We will additionally preview, copy, refresh, and share the information. Equally, go to the instructors’ desk, and cross-verify the sector identify, kind, and mode within the schema tab. View the row entry insurance policies of the instructors’ desk and edit the desk schema if required.


Step 6: Exploring Udemy Course Developments and Insights by Querying the Information
To see 5000 data from the programs desk, execute the beneath question within the SQL question engine:
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs` LIMIT 5000

To see 5000 data from the instructors’ desk, execute the beneath question within the SQL question engine:
SELECT * FROM `udemy-project-381211.Udemy_dataset.instructors` LIMIT 5000

A. Discover the title of all programs whose scores are higher than 4.5 and greater than 10000 individuals has given the score for these programs. Show these programs in reducing order after all scores and creation date.
SELECT title AS course_title FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE score>4.5 and num_reviews>10000
ORDER BY score DESC, created DESC

B. Discover the main points of the ten newly created Udemy programs.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
ORDER BY created DESC
LIMIT 10

C. Discover the main points of the ten lately modified Udemy programs.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
ORDER BY last_update_date DESC
LIMIT 10

D. Discover the main points of the JavaScript programs whose scores are higher than 4 and greater than 20000 individuals have given the score for these programs.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE title LIKE '%JavaScript%' AND
score>4 AND num_reviews>20000

E. Show the title, score, and variety of lectures of the Udemy React programs which has higher than 50-course lectures.
SELECT title AS course_title, score AS course_rating, num_published_lectures as course_lectures
FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE title LIKE '%React%' AND
num_published_lectures>50

F. Discover the variety of programs, and course teacher identify developed by the course instructors with course scores higher than common scores of the programs.
SELECT COUNT(programs.id), instructors.identify
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE programs.instructors_id IN
(SELECT instructors_id FROM `Udemy_dataset.programs`
WHERE score >(SELECT AVG(score) FROM `Udemy_dataset.programs`))
GROUP BY instructors.identify

G. Show the course teacher identify and title of the Udemy programs created by individuals whose job title is an internet developer
and whose course scores are higher than 4.2.
SELECT instructors.display_name, programs.title as course_title
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE instructors.job_title LIKE '%Internet developer%' and programs.score>4.2

H. Show the course title, course teacher identify, scores, and course period of the Udemy programs the place the course period is bigger than 40 minutes, 40 hours, or 40 questions.
SELECT programs.title as course_title,
instructors.display_name as course_instructor, programs.score, programs.period
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE
CASE WHEN programs.period LIKE '%.%'
THEN CAST(LEFT(programs.period, STRPOS(programs.period,'.')-1) AS FLOAT64)>40
WHEN programs.period LIKE '%whole%'
THEN CAST(LEFT(programs.period, STRPOS(programs.period,'t')-1) AS FLOAT64)>40
WHEN programs.period LIKE '%ques%'
THEN CAST(LEFT(programs.period, STRPOS(programs.period,'q')-1) AS FLOAT64)>40
END

I. Show the course teacher identify and title of the Udemy programs created by licensed builders.
SELECT programs.title as course_title, instructors.display_name as course_instructor
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE instructors.job_title LIKE '%licensed%'

J. Discover all of the distinct job titles of Udemy course instructors.
SELECT DISTINCT instructors.job_title
FROM `Udemy_dataset.instructors` instructors

Okay. Discover the title, scores, and teacher of all programs whose scores are higher than 4 and greater than 17000 individuals have given the score for these programs. Show these programs in reducing order after all scores.
SELECT programs.title as course_title, instructors.display_name as course_instructor, programs.score
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE programs.score > 4 and programs.num_reviews > 17000
ORDER BY programs.score DESC

L. Discover the main points of the 20 newly created Azure Udemy programs.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE title LIKE '%Azure%'
ORDER BY created DESC
LIMIT 20

M. Discover the main points of the 15 newly created AWS Udemy programs.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE title LIKE '%AWS%'
ORDER BY created DESC
LIMIT 15

N. Show all the main points of the Udemy SAS programs which have course lectures between 112 and 156 in rising order after all title.
SELECT * FROM `udemy-project-381211.Udemy_dataset.programs`
WHERE title LIKE '%SAS %' AND
num_published_lectures BETWEEN 112 AND 156
ORDER BY title

O. Show the course teacher identify, title, scores, and the course critiques of the highest two Udemy Azure Information Manufacturing facility programs primarily based on target scores and the variety of course critiques.
SELECT programs.title as course_title,
instructors.display_name as course_instructor, programs.score, programs.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE programs.title LIKE '%Azure Information Manufacturing facility %'
ORDER BY programs.num_reviews DESC, programs.score DESC
LIMIT 2

P. Show the course teacher identify, title, scores, and course critiques of the perfect Udemy Salesforce course primarily based on target scores and the variety of course critiques.
SELECT programs.title as course_title, instructors.display_name as course_instructor,
programs.score, programs.num_reviews
FROM `Udemy_dataset.instructors` instructors
LEFT JOIN `Udemy_dataset.programs` programs
ON instructors.id = programs.instructors_id
WHERE programs.title LIKE '%Salesforce %'
ORDER BY programs.num_reviews DESC, programs.score DESC
LIMIT 1

Key Developments and Insights Found Whereas Exploring the Udemy Programs Information
From the above, we all know learn how to construct an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question. Beneath are some key tendencies and insights found whereas exploring the Udemy programs knowledge:
1. The preferred JavaScript programs have a median score higher than 4.6.
2. Solely 34 Udemy programs are created by instructors whose job title is an internet developer and whose course scores are higher than 4.2.
3. Virtually 150 Udemy programs are created by AWS, Azure, GCP, or Salesforce-certified builders.
4. Ramesh Retnasamy creates the preferred Azure Information Manufacturing facility course on Udemy.
5. Lately created Azure and AWS programs are highly regarded on Udemy.
6. Udemy customers want to enroll in SAS programs with about 100-150 lectures with good scores.
Conclusion
On this article, we have now seen learn how to construct an information warehouse for exploring Udemy course tendencies and insights utilizing Google Massive Question. An information warehouse shops and analyze a considerable amount of knowledge and produce helpful knowledge insights with the assistance of information visualizations and stories. We’ve got seen learn how to create a desk by importing knowledge from Kaggle in Google Massive Question. We additionally perceive learn how to create relationships between tables to grasp knowledge higher. We checked out learn how to analyze the information with the assistance of queries to get significant perception from the information. Beneath are the main takeaways from the above article:
- We’ve got seen how we are able to create tables in Google Massive Question.
- We understood learn how to question knowledge within the Massive Question SQL question engine.
- We’ve got additionally recognized particulars of the Udemy programs created by individuals whose job title is an internet developer and whose course scores are higher than 4.2.
- We’ve got additionally seen what number of programs on Udemy are created by licensed builders.
- We’ve got discovered the newly created Azure and AWS programs on Udemy foundation the tendencies.
- Other than that, we have now additionally seen different course tendencies on Udemy by exploring Udemy knowledge contained in the SQL question engine.
The media proven on this article just isn’t owned by Analytics Vidhya and is used on the Creator’s discretion.