ChatGPT Code Interpreter: How It Saved Me Hours of Work | by Soner Yıldırım | Jul, 2023

Data cleaning and preprocessing

The next step after reading the data is to clean and preprocess it. I’m sure you’ve heard many times that this step usually takes the most time.

There are a few issues that need to be handled before creating the world map visualizations. ChatGPT detected them and provided a solution in less than a minute. Pretty impressive!

(image by author)

The code in the above snippet:

# code generated by Code Interpreter
# Check if the country names are consistent across both dataframes
population_countries = set(population_data['Country Name'])
coordinates_countries = set(country_coordinates['name'])

# Countries in the population data but not in the coordinates data
missing_in_coordinates = population_countries - coordinates_countries

# Countries in the coordinates data but not in the population data
missing_in_population = coordinates_countries - population_countries

There are some countries missing in the coordinates dataset and also some in the population dataset.

What ChatGPT did in the code snippet above:

  • Create Python sets of country names in both datasets
  • Compare them in each direction to find the difference as a set
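
To make the set logic concrete, here is a minimal sketch on toy data. The two small dataframes and all of their values are made up for illustration; only the column names `Country Name` and `name` come from the snippet above.

```python
import pandas as pd

# Toy stand-ins for the two datasets (placeholder values, not the real data)
population_data = pd.DataFrame({
    "Country Name": ["Turkiye", "France", "World"],
    "2022": [100, 200, 300],
})
country_coordinates = pd.DataFrame({
    "name": ["Turkey", "France", "Antarctica"],
    "latitude": [1.0, 2.0, 3.0],
    "longitude": [4.0, 5.0, 6.0],
})

population_countries = set(population_data["Country Name"])
coordinates_countries = set(country_coordinates["name"])

# Set difference is directional: A - B keeps the items of A that are not in B
missing_in_coordinates = population_countries - coordinates_countries
missing_in_population = coordinates_countries - population_countries

print(sorted(missing_in_coordinates))  # ['Turkiye', 'World']
print(sorted(missing_in_population))   # ['Antarctica', 'Turkey']
```

Running the difference in both directions matters: a spelling mismatch such as "Turkiye" versus "Turkey" shows up once in each result, while an entry that exists in only one dataset shows up in just one of them.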

ChatGPT solved it in less than a minute:

The code in the above snippet:

# code generated by Code Interpreter
# Mapping of country names from the population dataset to the coordinates dataset
country_name_mapping = {
'Bahamas, The': 'Bahamas',
'Brunei Darussalam': 'Brunei',
'Cabo Verde': 'Cape Verde',
'Congo, Dem. Rep.': 'Congo [DRC]',
'Congo, Rep.': 'Congo [Republic]',
"Cote d'Ivoire": "Côte d'Ivoire",
'Czechia': 'Czech Republic',
'Egypt, Arab Rep.': 'Egypt',
'Gambia, The': 'Gambia',
'Hong Kong SAR, China': 'Hong Kong',
'Iran, Islamic Rep.': 'Iran',
"Korea, Dem. People's Rep.": 'North Korea',
'Korea, Rep.': 'South Korea',
'Kyrgyz Republic': 'Kyrgyzstan',
'Lao PDR': 'Laos',
'Macao SAR, China': 'Macau',
'Micronesia, Fed. Sts.': 'Micronesia',
'Myanmar': 'Myanmar [Burma]',
'North Macedonia': 'Macedonia [FYROM]',
'Russian Federation': 'Russia',
'Sao Tome and Principe': 'São Tomé and Príncipe',
'Slovak Republic': 'Slovakia',
'Syrian Arab Republic': 'Syria',
'Turkiye': 'Turkey',
'Venezuela, RB': 'Venezuela',
'Yemen, Rep.': 'Yemen',
'Eswatini': 'Swaziland'
}

# Clean the population dataset
population_data_clean = population_data.replace({"Country Name": country_name_mapping})

# Drop the rows that don't exist in the coordinates dataset
population_data_clean = population_data_clean[population_data_clean['Country Name'].isin(coordinates_countries)]

# Merge the population data with the country coordinates data
merged_data = pd.merge(population_data_clean, country_coordinates, left_on='Country Name', right_on='name')

# Keep only the required columns
merged_data = merged_data[['Country Name', '2022', 'latitude', 'longitude']]

# Rename the columns for readability
merged_data.columns = ['Country', 'Population', 'Latitude', 'Longitude']


Cleaned and preprocessed data (image by author)

We now have the data in a nice and clean format. This step would normally take a lot of time and manual effort. ChatGPT did it in about a minute.
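
If you want to see the rename, filter, and merge steps work end to end, the sketch below runs them on a tiny made-up example. The dataframes, the one-entry mapping, and every number in it are placeholders I chose for illustration, not values from the real datasets.

```python
import pandas as pd

# Toy stand-ins; the numbers are placeholders, not real populations or coordinates
population_data = pd.DataFrame({
    "Country Name": ["Turkiye", "France", "World"],
    "2022": [100, 200, 300],
})
country_coordinates = pd.DataFrame({
    "name": ["Turkey", "France"],
    "latitude": [1.0, 2.0],
    "longitude": [3.0, 4.0],
})
country_name_mapping = {"Turkiye": "Turkey"}

# Nested dict form of replace: {column: {old_value: new_value}}
cleaned = population_data.replace({"Country Name": country_name_mapping})

# Keep only the rows whose country exists in the coordinates data
cleaned = cleaned[cleaned["Country Name"].isin(country_coordinates["name"])]

# Inner merge on the country name, then keep and rename the needed columns
merged = pd.merge(cleaned, country_coordinates, left_on="Country Name", right_on="name")
merged = merged[["Country Name", "2022", "latitude", "longitude"]]
merged.columns = ["Country", "Population", "Latitude", "Longitude"]

print(merged)
```

In this toy run the "World" row is dropped because it has no match in the coordinates data, and "Turkiye" survives only because the mapping renames it first. That is why the rename has to happen before the filter.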
