5 Steps to Transform Messy Functions into Production-Ready Code | by Khuyen Tran | Jan, 2024



Functions are essential in a data science project because they make the code more modular, reusable, readable, and testable. However, a messy function that tries to do too much is hard to maintain and hard to read.

In the following code, the function impute_missing_values is long, messy, and tries to do many things. Because it contains many hard-coded values, it would be impossible for someone else to reuse this function on a DataFrame with different column names.

def impute_missing_values(df):
    # Fill missing values with group statistics
    df["MSZoning"] = df.groupby("MSSubClass")["MSZoning"].transform(
        lambda x: x.fillna(x.mode()[0])
    )
    df["LotFrontage"] = df.groupby("Neighborhood")["LotFrontage"].transform(
        lambda x: x.fillna(x.median())
    )

    # Fill missing values with constant
    df["Functional"] = df["Functional"].fillna("Typ")

    df["Alley"] = df["Alley"].fillna("Missing")
    for col in ["GarageType", "GarageFinish", "GarageQual", "GarageCond"]:
        df[col] = df[col].fillna("Missing")

    for col in ("BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinType2"):
        df[col] = df[col].fillna("Missing")

    df["FireplaceQu"] = df["FireplaceQu"].fillna("Missing")

    df["PoolQC"] = df["PoolQC"].fillna("Missing")

    df["Fence"] = df["Fence"].fillna("Missing")

    df["MiscFeature"] = df["MiscFeature"].fillna("Missing")

    numeric_dtypes = ["int16", "int32", "int64", "float16", "float32", "float64"]
    for i in df.columns:
        if df[i].dtype in numeric_dtypes:
            df[i] = df[i].fillna(0)

    # Fill missing values with mode
    df["Electrical"] = df["Electrical"].fillna("SBrkr")
    df["KitchenQual"] = df["KitchenQual"].fillna("TA")
    df["Exterior1st"] = df["Exterior1st"].fillna(df["Exterior1st"].mode()[0])
    df["Exterior2nd"] = df["Exterior2nd"].fillna(df["Exterior2nd"].mode()[0])
    df["SaleType"] = df["SaleType"].fillna(df["SaleType"].mode()[0])
    for i in df.columns:
        if df[i].dtype == object:
            df[i] = df[i].fillna(df[i].mode()[0])
    return df

This example is adapted from the notebook titled How I Achieved Top 0.3% in a Kaggle Competition, with a few alterations.
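To illustrate the reuse problem described above, here is a minimal sketch of what passing column names in as arguments could look like. The function names fill_missing_with_constant and fill_missing_with_group_median, their parameters, and the small houses DataFrame are hypothetical and chosen only for illustration; this is one possible direction, not necessarily the refactoring the author arrives at.

```python
import pandas as pd


def fill_missing_with_constant(
    df: pd.DataFrame, columns: list[str], value: str = "Missing"
) -> pd.DataFrame:
    """Return a copy of df with NaNs in the given columns replaced by value."""
    result = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        result[col] = result[col].fillna(value)
    return result


def fill_missing_with_group_median(
    df: pd.DataFrame, group_col: str, target_col: str
) -> pd.DataFrame:
    """Return a copy of df with NaNs in target_col replaced by the group median."""
    result = df.copy()
    result[target_col] = result.groupby(group_col)[target_col].transform(
        lambda x: x.fillna(x.median())
    )
    return result


# Hypothetical toy data; the column names are arguments, not hard-coded.
houses = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "B", "B"],
        "LotFrontage": [60.0, None, 80.0, 50.0, None],
        "Alley": ["Grvl", None, None, "Pave", None],
    }
)
cleaned = fill_missing_with_constant(houses, ["Alley"])
cleaned = fill_missing_with_group_median(cleaned, "Neighborhood", "LotFrontage")
print(cleaned)
```

Because each helper does one thing and takes the column names as parameters, the same code can be applied to a DataFrame with entirely different columns, and each piece can be tested on its own.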

