How to Create Synthetic Data. Go from Nothing to a Complete Dataframe… | by Kurt Klingensmith | Feb, 2024


Go from nothing to a complete dataframe with Python

Photo by Joshua Sortino on Unsplash.

After submitting a recent article to Towards Data Science’s editorial team, I received a message back with a simple inquiry: are the datasets licensed for commercial use? It was a great question. The datasets in my draft came from Seaborn, a common Python library that comes complete with 17 sample datasets [1]. The datasets certainly appeared open source and, sure enough, many had easily discoverable licenses authorizing commercial use. Unfortunately for me, I happened to pick one of the few datasets that I couldn’t find a license for. But instead of switching to a different Seaborn dataset, I decided to make my own synthetic data.

What is Synthetic Data?

IBM’s Kim Martineau defines synthetic data as “information that’s been generated on a computer to augment or replace real data to improve AI models, protect sensitive data, and mitigate bias” [2].

Synthetic data may look like information from a real-world event, but it is not. Because no real records are involved, it avoids licensing issues, hides proprietary data, and protects personal information.

Synthetic data differs from anonymized or masked data, which takes real data from actual events and alters certain fields to make the data non-attributional. If you’re looking to anonymize names in data, you can read a how-to on name anonymization here.

Synthetic data doesn’t need to be perfect. In my previous article’s use case, I was writing a guide on how to use the pandas groupby() function. All I needed was a dataset that had numeric data, categorical data, and a domain (in this case, student test scores and grades) understandable to the reader to help me deliver the message. Based on the work for that article, below I’ll provide a guide on building a synthetic dataset of your own.


The Jupyter notebook with the full Python code used in this walkthrough is available at the linked GitHub page. Download or clone the repository to follow along!

The code requires the following libraries:

# Data handling
import pandas as pd
import numpy as np

# Data visualization
import plotly.express as px

# Anonymizer
from faker import Faker
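
As a rough sketch of where these imports lead, the snippet below builds a small student-scores dataframe with one numeric column, one categorical column, and a derived letter grade. It uses only numpy and pandas. The column names, subject list, score distribution, sample size, and grade cutoffs are all illustrative assumptions, not the values from the linked notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch produces the same data on every run
rng = np.random.default_rng(seed=42)
n_students = 100  # hypothetical sample size

# Numeric column: scores drawn from a normal distribution (assumed mean 75,
# standard deviation 10), rounded and clipped to the 0-100 range
scores = rng.normal(loc=75, scale=10, size=n_students).round().clip(0, 100).astype(int)

# Categorical column: subject picked uniformly from a hypothetical list
subjects = rng.choice(["Math", "Science", "History"], size=n_students)

df = pd.DataFrame({
    "student_id": np.arange(1, n_students + 1),
    "subject": subjects,
    "score": scores,
})

# Derived categorical column: letter grade from assumed score cutoffs
df["grade"] = pd.cut(
    df["score"],
    bins=[-1, 59, 69, 79, 89, 100],
    labels=["F", "D", "C", "B", "A"],
)

# A quick aggregation of the kind a groupby tutorial would need
print(df.groupby("subject")["score"].mean())
```

Seeding the generator keeps the data identical between runs, which helps when a tutorial quotes specific values. Realistic student names could be added with Faker, and the score distribution inspected with a plotly histogram, but both are left out here to keep the sketch short.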

