Restaurant Survey Exploratory Data Analysis in Python
This analysis aims to:
- Find out the average customer age and which age group gave us the best ratings.
- Discover any relationships that exist between customer habits and customer ratings.
- Assess the effect of customers' marital status, age and habits on their budget.
KPIs
- Overall Rating
- Service Rating
- Food Rating
Insights
- These are the averages for all rating attributes:
df_all_average_ratings=df[['Overall Rating','Service Rating','Food Rating']].mean()
print(df_all_average_ratings)
- Average overall rating 3.225
- Average service rating 3.230
- Average food rating 3.220
- The average customer age was 40.17. Most surveyed customers were in the ‘twenties’ age group. The distribution is shown below.

- The teens and fifties age groups consistently gave the highest ratings of all age groups, while customers in the thirties age group consistently rated lower.



- Customers who said that they smoke often gave considerably higher ratings than other customers. Their overall rating was 18% higher on average.



- Divorced customers had a 28% lower budget on average.
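
The group comparisons in the insights above can be sketched with `groupby` and a small helper. The rows below are made up for illustration and are not the survey data. The column names `Smoker`, `Marital Status` and `Budget`, the category value `Often`, and the helper `pct_diff_vs_rest` are assumptions, as is the choice to compare each group against all remaining respondents.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT the survey results.
# 'Smoker', 'Marital Status' and 'Budget' are assumed column names.
df = pd.DataFrame({
    "Age Group": ["Teens", "Twenties", "Twenties", "Thirties", "Fifties", "Thirties"],
    "Smoker": ["Often", "Never", "Socially", "Never", "Often", "Never"],
    "Marital Status": ["Single", "Single", "Married", "Divorced", "Married", "Divorced"],
    "Budget": [3, 4, 5, 3, 5, 3],
    "Overall Rating": [4.0, 3.0, 3.5, 2.5, 4.5, 3.0],
})


def pct_diff_vs_rest(frame, group_col, group_value, value_col):
    """Percent difference between one group's mean and the mean of all other rows."""
    in_group = frame[group_col] == group_value
    group_mean = frame.loc[in_group, value_col].mean()
    rest_mean = frame.loc[~in_group, value_col].mean()
    return (group_mean - rest_mean) / rest_mean * 100


# Number of respondents and mean overall rating per age group
age_counts = df["Age Group"].value_counts()
rating_by_age = df.groupby("Age Group")["Overall Rating"].mean()

# Relative gaps, each measured against everyone outside the group
smoker_uplift = pct_diff_vs_rest(df, "Smoker", "Often", "Overall Rating")
divorced_budget_gap = pct_diff_vs_rest(df, "Marital Status", "Divorced", "Budget")

print(age_counts)
print(rating_by_age)
print(f"Frequent smokers vs. others: {smoker_uplift:+.1f}% overall rating")
print(f"Divorced vs. others: {divorced_budget_gap:+.1f}% budget")
```

Comparing against the remaining respondents rather than the overall mean keeps the group itself out of its own baseline. If the percentages in this analysis were computed against a different baseline, the helper would need to change accordingly.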

Recommendations
- Improving ratings among core customer age groups like twenties and thirties will push average ratings up.
- Investigate the reason for high ratings among the smoker demographic.
Code Snippets
import pandas as pd

def read_n_clean(url):
    """Read the survey CSV and return a cleaned DataFrame with 'Age' and 'Age Group' columns."""
    df = pd.read_csv(url)
    # Strip whitespace from headers (e.g. 'Alcohol ' becomes 'Alcohol')
    print("Removing whitespace from headers...")
    df = df.rename(columns=lambda x: x.strip())
    # Normalise inconsistent location spellings
    df['Location'] = df['Location'].replace('Central Park,ny', 'Central Park,NY')
    df['Location'] = df['Location'].replace('Market City, MY', 'Market City, NY')
    print("Creating new columns...")
    # Age relative to the hard-coded reference year 2025
    df['Age'] = 2025 - df['YOB']
    labels = ['Teens', 'Twenties', 'Thirties', 'Forties', 'Fifties', 'Sixties']
    bin_edges = [15, 20, 30, 40, 50, 60, 70]
    # right=False makes the bins [15, 20), [20, 30), ... so a 20-year-old lands in 'Twenties'
    df['Age Group'] = pd.cut(df['Age'], bins=bin_edges, labels=labels, right=False)
    return df
The function above reads and cleans the data: it resolves inconsistencies such as whitespace in column headers and inconsistent location names, and adds an age column and an age-group column.
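
Which side of each bin is closed decides where boundary ages such as 20 or 30 land, so it is worth checking when building the age groups. The sketch below compares the two settings of `pd.cut` on a handful of made-up ages; the ages and the variable names are illustrative assumptions, not values from the survey.

```python
import pandas as pd

labels = ["Teens", "Twenties", "Thirties", "Forties", "Fifties", "Sixties"]
edges = [15, 20, 30, 40, 50, 60, 70]

# Made-up ages chosen to sit on or near bin boundaries
ages = pd.Series([19, 20, 29, 30, 69])

# Default (right=True): bins are (15, 20], (20, 30], ... so age 20 falls in 'Teens'
right_closed = pd.cut(ages, bins=edges, labels=labels)

# right=False: bins are [15, 20), [20, 30), ... so age 20 falls in 'Twenties'
left_closed = pd.cut(ages, bins=edges, labels=labels, right=False)

comparison = pd.DataFrame({
    "age": ages,
    "right=True": right_closed,
    "right=False": left_closed,
})
print(comparison)
```

With left-closed bins each label matches the decade of the age, which is usually what group names like these imply. Ages outside the outer edges become missing values under either setting.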
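import io

def validate_data(df):
    """Collect summary information about the dataset in a dictionary."""
    # df.info() prints and returns None, so capture its text output in a buffer
    buffer = io.StringIO()
    df.info(buf=buffer)
    dataframe_info = {
        "info": buffer.getvalue(),
        "shape": df.shape,
        "describe": df.describe(),
        "null_count": df.isnull().sum(),
    }
    return dataframe_info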
This function extracts summary information from the DataFrame and stores it in a dictionary.
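
One way such a dictionary might be put to use is to flag columns with many missing values before analysis. The sketch below is only an illustration: the frame is made up, the helper `columns_over_null_threshold` and its 25% default threshold are hypothetical, and only the `shape` and `null_count` entries are rebuilt inline so the example runs on its own.

```python
import pandas as pd

# Made-up frame with a few gaps; the values are illustrative, not survey data
df = pd.DataFrame({
    "YOB": [1990, None, 1985, 2001],
    "Alcohol": ["Never", "Often", None, None],
    "Overall Rating": [3, 4, 2, 5],
})

# The same kind of entries the validation dictionary holds
summary = {"shape": df.shape, "null_count": df.isnull().sum()}


def columns_over_null_threshold(summary, max_null_share=0.25):
    """Hypothetical helper: list columns whose share of nulls exceeds the threshold."""
    n_rows = summary["shape"][0]
    null_share = summary["null_count"] / n_rows
    return null_share[null_share > max_null_share].index.tolist()


flagged = columns_over_null_threshold(summary)
print(flagged)
```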