Restaurant Survey Exploratory Data Analysis in Python

Link to code

This analysis aims to:

  1. Find out the average customer age and which age group gave us the best ratings.
  2. Discover any relationships that exist between customer habits and customer ratings.
  3. Assess the effect of customers' marital status, age and habits on their budget.

KPIs

  • Overall Rating
  • Service Rating
  • Food Rating

Insights

  • These are the averages for all ratings attributes:
df_all_average_ratings=df[['Overall Rating','Service Rating','Food Rating']].mean()
print(df_all_average_ratings)
  1. Average overall rating: 3.225
  2. Average service rating: 3.230
  3. Average food rating: 3.220
  • The average customer age was 40.17. Most customers surveyed were in the ‘twenties’ age group. The distribution is shown below.
  • The teens and fifties age groups consistently gave the highest ratings of all age groups, while customers in the thirties age group consistently gave lower ratings.
  • Customers who said that they smoke often gave considerably higher ratings than other customers. Their average overall rating was 18% higher.
  • Divorced customers had a 28% lower budget on average.
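
Comparisons like the ones above can be computed with a groupby and a percentage difference between one group's mean and everyone else's. The sketch below is illustrative only: the toy data, the 'Smoker' column name and the helper pct_diff_vs_rest are assumptions, not the survey's actual schema or the code used for the figures above.

```python
import pandas as pd

def pct_diff_vs_rest(df, group_col, group_value, value_col):
    """Percent difference between one group's mean and the mean of all other rows."""
    in_group = df[group_col] == group_value
    group_mean = df.loc[in_group, value_col].mean()
    rest_mean = df.loc[~in_group, value_col].mean()
    return (group_mean - rest_mean) / rest_mean * 100

# Toy data for illustration only (hypothetical values and 'Smoker' column).
toy = pd.DataFrame({
    "Age Group": ["Teens", "Twenties", "Twenties", "Thirties"],
    "Smoker": ["Often", "Never", "Never", "Often"],
    "Overall Rating": [5, 3, 4, 4],
})

# Mean rating per age group.
mean_by_age_group = toy.groupby("Age Group")["Overall Rating"].mean()
print(mean_by_age_group)

# How much higher (or lower) frequent smokers rate compared with everyone else.
smoker_diff = pct_diff_vs_rest(toy, "Smoker", "Often", "Overall Rating")
print(round(smoker_diff, 1))
```

The same helper could be pointed at a budget column and a marital-status value to express a budget gap as a percentage.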

Recommendations

  1. Improving ratings among core customer age groups, such as the twenties and thirties, will push average ratings up.
  2. Investigate the reason for high ratings among the smoker demographic.

Code Snippets

import pandas as pd

def read_n_clean(url):
    df = pd.read_csv(url)
    # Strip trailing whitespace from headers (e.g. 'Alcohol ' becomes 'Alcohol').
    print("Removing whitespace from headers...")
    df = df.rename(columns=lambda x: x.rstrip())
    # Standardise inconsistent location spellings.
    df['Location'] = df['Location'].replace('Central Park,ny', 'Central Park,NY')
    df['Location'] = df['Location'].replace('Market City, MY', 'Market City, NY')
    print("Creating new columns...")
    # Age is measured relative to the year 2025.
    df['Age'] = 2025 - df['YOB']
    labels = ['Teens', 'Twenties', 'Thirties', 'Forties', 'Fifties', 'Sixties']
    bin_edges = [15, 20, 30, 40, 50, 60, 70]
    # pd.cut intervals are right-inclusive by default, so age 20 falls in 'Teens'.
    df['Age Group'] = pd.cut(df['Age'], bins=bin_edges, labels=labels)
    return df

The function above reads and cleans the data. It resolves inconsistencies such as whitespace in column headers and inconsistent location names, and it creates a column of age bins.
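
One detail worth checking is how pd.cut treats ages that sit exactly on a bin edge. The sketch below uses made-up ages with the same bin edges as the function above. The right=False variant is shown only for comparison and is not what the function uses.

```python
import pandas as pd

# Made-up ages, chosen to include values on bin edges (20 and 30).
ages = pd.Series([18, 20, 25, 30, 41])
labels = ['Teens', 'Twenties', 'Thirties', 'Forties', 'Fifties', 'Sixties']
bin_edges = [15, 20, 30, 40, 50, 60, 70]

# Default (right=True): intervals are (15, 20], (20, 30], ...
right_closed = pd.cut(ages, bins=bin_edges, labels=labels)

# right=False: intervals are [15, 20), [20, 30), ...
left_closed = pd.cut(ages, bins=bin_edges, labels=labels, right=False)

print(pd.DataFrame({
    "Age": ages,
    "right=True": right_closed,
    "right=False": left_closed,
}))
```

With the default setting, a 20-year-old is labelled 'Teens' and a 30-year-old 'Twenties'. Whether that is the intended grouping is a judgment call for the analysis.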

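import io

def validate_data(df):
    # df.info() prints its report and returns None, so capture the text in a buffer.
    buffer = io.StringIO()
    df.info(buf=buffer)
    # Collect summary information about the dataset.
    dataframe_info = {
        "info": buffer.getvalue(),
        "shape": df.shape,
        "describe": df.describe(),
        "null_count": df.isnull().sum(),
    }
    return dataframe_info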

This function extracts summary information from the DataFrame and stores it in a dictionary.
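
As a sketch of how the "null_count" entry might be used, the snippet below narrows it down to the columns that actually contain missing values. The toy DataFrame is made up for illustration and only borrows two of the survey's rating column names.

```python
import pandas as pd

# Toy data for illustration only; one rating is deliberately missing.
toy = pd.DataFrame({
    "Overall Rating": [4.0, None, 3.0],
    "Food Rating": [5.0, 4.0, 3.0],
})

# Same expression as the "null_count" entry: missing values per column.
null_count = toy.isnull().sum()

# Keep only the columns that have at least one missing value.
columns_with_nulls = null_count[null_count > 0]
print(columns_with_nulls)
```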
