Data Visualization Web App Using Streamlit: Step-by-Step Guide

Hi there! In this article, we will create an interactive data visualization web app by using the Python library streamlit.

What I love about Streamlit is that you can create a web application straight in Python without writing any HTML, CSS, or JavaScript.

So guys, are you ready?

Let’s start!

Let’s take a look at our final app

First of all, let’s take a look at the data visualization web app we are going to build today.

Data Visualization web app using streamlit gif

The data comes directly from a CSV file. For this project, I have created dummy survey results with gender, age, income, education, occupation, and satisfaction columns.

I have created these results using a simple Python script:

import csv
import random

# Set up the possible values for each column
genders = ['Male', 'Female']
ages = list(range(18, 66))
incomes = list(range(20000, 100001, 1000))
educations = ["High School Diploma", "Associate's Degree", "Bachelor's Degree", "Master's Degree", "Doctorate"]
occupations = ["Engineer", "Teacher", "Doctor", "IT Professional", "Writer", "Manager", "Lawyer", "Journalist", "Designer", "Marketing Professional"]
satisfactions = list(range(1, 11))

# Generate 500 random survey responses
responses = []
for i in range(1, 501):
    gender = random.choice(genders)
    age = random.choice(ages)
    income = random.choice(incomes)
    education = random.choice(educations)
    occupation = random.choice(occupations)
    satisfaction = random.choice(satisfactions)
    
    response = (i, gender, age, income, education, occupation, satisfaction)
    responses.append(response)

# Write the responses to a CSV file
with open('survey_results.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['id', 'gender', 'age', 'income', 'education', 'occupation', 'satisfaction'])
    writer.writerows(responses)

This Python script will generate a CSV file with random data in the same directory.

I will not explain this code in this tutorial because that will be out of the scope of this article. But if you want me to explain this code then please let me know in the comment section below. I will write another article explaining this code.
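If you want the same dummy data on every run, one option is to seed the random generator. The sketch below is a trimmed-down variant of the script above, not a replacement for it: the generate_responses helper, its seed parameter, and the shortened value lists are assumptions made for illustration, and it writes to an in-memory buffer rather than to survey_results.csv.

```python
import csv
import io
import random


def generate_responses(n, seed=42):
    """Return n dummy survey rows; the same seed gives the same rows (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    genders = ['Male', 'Female']
    educations = ["High School Diploma", "Bachelor's Degree", "Master's Degree"]
    rows = []
    for i in range(1, n + 1):
        rows.append((i,
                     rng.choice(genders),
                     rng.randint(18, 65),                  # same ages as range(18, 66)
                     rng.randrange(20000, 100001, 1000),   # income in steps of 1000
                     rng.choice(educations),
                     rng.randint(1, 10)))                  # satisfaction from 1 to 10
    return rows


rows = generate_responses(5)

# Write to an in-memory buffer instead of a file to keep the example side-effect free
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['id', 'gender', 'age', 'income', 'education', 'satisfaction'])
writer.writerows(rows)
print(buffer.getvalue())
```

Calling generate_responses() twice with the same seed returns identical rows, which makes screenshots and bug reports easier to reproduce.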

We have two sections on our website: a sidebar and a main section.

Web app image highlighting the sidebar and main section

At the top of the sidebar, we have an “Education:” filter selection where we can narrow down the results by choosing different education levels.

After that, we have Occupation and Gender filters where we can further narrow down the results. Then we have Age and Income sliders.

In the main section, we have our page title at the top. Then we have our data frame showing the data based on the options we select in the sidebar.

After the data frame, we have the Available Results heading which shows the number of available results. Then we have some bar charts and in the end, we have some pie charts to show the ratio of total participants by occupation and by education.

Let’s build it

Now that you have seen what we are going to build, let’s start our project.

First, open your favorite code editor and make a project directory. Name it whatever you want. I am using VS Code for this project. It’s lightweight and fast and I love it.

Anyways, let’s move toward the first step.

Step 1: Install dependencies

First of all, install the necessary libraries that we are going to use in this project. We’ll use three libraries for this project: Streamlit, pandas, and Plotly.

So, open the command prompt and install the libraries using this command.

pip install streamlit pandas plotly

Step 2: Import the libraries

Let’s begin with importing the libraries. We are using streamlit to build the interface of our web app, pandas for data manipulation, and plotly to plot charts.

import streamlit as st
import pandas as pd
import plotly.express as px

Step 3: Set the page title

Now, we will set the page title and the header title of our application by using st.set_page_config() and st.title(). I have set the page title as “Excel Data Visualization” and the heading as “Survey Results”.

st.set_page_config(page_title="Excel Data Visualization", page_icon=":heart:")

st.title(":bar_chart: Survey Results")

You may have noticed I have also used emojis.

Yes! Streamlit supports emojis. You can add one by typing its shortcode between two colons, for example :bar_chart:. You can find all the emojis’ names on this website.

To check that everything is working properly, run the app by typing the following command in the terminal. Make sure you are in the directory where you created your Python file (named app.py here).

streamlit run app.py

By now, your page should look like this:

front end showing the heading of our web app

Step 4: Load and display the data frame

Now, it’s time to load your CSV file using the pandas read_csv() function.

df = pd.read_csv('survey_results.csv')

After loading the data, you can show your data frame on the front end using Streamlit’s dataframe() method.

st.dataframe(df)

Now, you should be able to see your data on the front end. Streamlit also supports hot reloading: whenever you change and save the Python file, the changes can be shown on the website without restarting the server.

You just need to click the Rerun button that appears in the top-right corner of the website. You can also click the Always rerun button, so that the front end updates as soon as you save your changes, without clicking Rerun each time.

Isn’t that cool?

Front end of our web app showing the dataframe

Perfect!

Everything is working fine so far.

Step 5: Create the sidebar

To create the sidebar, use Streamlit’s st.sidebar object in a with block.

with st.sidebar:
    education = df['education'].unique().tolist()

    education_selection = st.multiselect('Education:',
                education,
                default=education)

I have created the sidebar using st.sidebar. Inside it, I created a multi-select box for filtering the data using the multiselect() method.

I collected all the unique education values with the pandas unique() method and converted them to a list with the tolist() method.

In the multiselect() method I have passed three arguments. The first one is for the label of the box. The second one is for the available options and the third one is for the default values. In this case, I have set the education list as the default value.

Whatever the user selects in the box is stored in the variable education_selection, which we will use later to filter the data.
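One detail worth knowing: unique() returns values in order of first appearance, so the order of the options in the box depends on the row order of the CSV. Here is a minimal sketch, assuming a tiny hand-made data frame in place of survey_results.csv. Sorting the options is an optional tweak, not something the app requires.

```python
import pandas as pd

# Tiny stand-in for the survey data (made-up rows, for illustration only)
df = pd.DataFrame({
    'education': ["Master's Degree", "Doctorate", "Master's Degree", "Bachelor's Degree"],
})

# unique() keeps the order of first appearance; tolist() converts to a plain Python list
education = df['education'].unique().tolist()
print(education)

# Optional: sort the options so the multiselect order does not depend on row order
education_sorted = sorted(education)
print(education_sorted)
```

Either list can be passed to st.multiselect() as both the options and the default.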

You can create the multiselect boxes for occupation and gender using the same procedure.

Here is what the code looks like:

with st.sidebar:
    education = df['education'].unique().tolist()

    education_selection = st.multiselect('Education:',
                education,
                default=education)
                
    occupation = df['occupation'].unique().tolist()

    occupation_selection = st.multiselect('Occupation:',
                occupation,
                default=occupation)

    gender = df['gender'].unique().tolist()

    gender_selection = st.multiselect('Gender:',
                gender,
                default=gender)

Now, the boxes should be visible in the sidebar.

Interface of our web app

To create the age and income sliders, I have used the slider() method.

with st.sidebar:
    education = df['education'].unique().tolist()

    education_selection = st.multiselect('Education:',
                education,
                default=education)

    occupation = df['occupation'].unique().tolist()

    occupation_selection = st.multiselect('Occupation:',
                occupation,
                default=occupation)

    gender = df['gender'].unique().tolist()

    gender_selection = st.multiselect('Gender:',
                gender,
                default=gender)

    age = df['age'].unique().tolist()

    age_selection = st.slider('Age:',
                              min_value=min(age),
                              max_value=max(age),
                              value=(min(age), max(age))
                              )
    
    income = df['income'].unique().tolist()

    income_selection = st.slider('Income:',
                                 min_value=min(income),
                                 max_value=max(income),
                                 value=(min(income), max(income)))

For the slider minimum and maximum values, you can get the values from min() and max() functions.

The value from the slider is stored in the form of a tuple representing the range that will be used for the data filtration.
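The minimum and maximum can also be read straight from the column, without building a list of unique values first. Here is a small sketch with made-up rows. The name slider_bounds is a hypothetical helper, and the int() conversion is my own precaution: pandas returns NumPy integers, and plain Python ints are the safest thing to hand to a widget.

```python
import pandas as pd


def slider_bounds(series):
    """Return (lowest, highest) of a numeric column as plain Python ints (hypothetical helper)."""
    return int(series.min()), int(series.max())


# Made-up rows standing in for survey_results.csv
df = pd.DataFrame({'age': [34, 18, 65, 41], 'income': [52000, 20000, 98000, 61000]})

age_bounds = slider_bounds(df['age'])
income_bounds = slider_bounds(df['income'])
print(age_bounds)
print(income_bounds)
```

The tuple could then be passed to st.slider() as min_value=age_bounds[0], max_value=age_bounds[1], and value=age_bounds.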

Let’s see how our application looks so far:

Front end with sidebar and main section containing the dataframe

Step 6: Filter the results

You are doing great so far!

Now, let’s filter the result.

mask = (df['income'].between(*income_selection) & df['age'].between(*age_selection) & df['education'].isin(education_selection) & df['occupation'].isin(occupation_selection) & df['gender'].isin(gender_selection))

st.dataframe(df[mask])

Don’t be scared! I will explain this line of code.

Here I have created a mask using the pandas between() method for numeric values and isin() method for non-numeric values.

You may have noticed I have combined the filters using the & operator and stored them in the mask variable.

This time I displayed the data frame with the mask applied. Now, you can see your data being filtered by the selection boxes and sliders.
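If the mask still feels abstract, it helps to try it on a few rows outside Streamlit. This is only a sketch: the four rows are made up, and the two selection variables are hard-coded stand-ins for what the slider and the multiselect box would return.

```python
import pandas as pd

# Made-up rows standing in for survey_results.csv
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'gender': ['Male', 'Female', 'Female', 'Male'],
    'age': [25, 40, 65, 30],
    'income': [30000, 55000, 90000, 45000],
})

# Hard-coded stand-ins for the values the widgets would return
age_selection = (30, 65)        # a range slider returns a (low, high) tuple
gender_selection = ['Female']   # a multiselect returns a list

# between(*age_selection) is the same as between(30, 65); both ends are included by default
age_ok = df['age'].between(*age_selection)
gender_ok = df['gender'].isin(gender_selection)

# & combines the two boolean Series row by row
mask = age_ok & gender_ok
print(df[mask])
```

Only the rows where every condition is True survive, which is why deselecting everything in one box leaves the table empty.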

You can also show the number of results based on the filter:

results = df[mask].shape[0]
st.markdown(f'### Available Results: {results}')

Here is how it looks:

Front end of our web app

Step 7: Group the data and plot the bar charts

Now that we have our filter mask we can step forward and group the data frame.

df_grouped = df[mask].groupby(by='satisfaction').count()[['age']]
df_grouped = df_grouped.rename(columns={'age': 'votes'})
df_grouped = df_grouped.reset_index()

For our first bar chart, we will group the data by satisfaction rating, counting all values and only returning the age column.

Remember! We will also need to apply the mask on the data frame.

Next, rename the age column to votes and reset the index.
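To make the grouping step concrete, here is the same sequence on five made-up rows. The size() variant at the end is an alternative I am adding for comparison, not something the app needs. count() counts non-empty values in the chosen column, while size() counts rows, so the two agree whenever that column has no missing values.

```python
import pandas as pd

# Made-up rows standing in for the filtered data frame df[mask]
df = pd.DataFrame({
    'age': [25, 40, 65, 30, 52],
    'satisfaction': [7, 9, 7, 7, 9],
})

# Same three steps as above: count per rating, rename, reset the index
df_grouped = df.groupby(by='satisfaction').count()[['age']]
df_grouped = df_grouped.rename(columns={'age': 'votes'})
df_grouped = df_grouped.reset_index()
print(df_grouped)

# Alternative for comparison: size() counts rows directly
df_sized = df.groupby('satisfaction').size().reset_index(name='votes')
print(df_sized)
```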

Now we can use our grouped data frame and plot the bar chart.

bar_chart = px.bar(df_grouped,
                   x='satisfaction',
                   y='votes',
                   text='votes',
                   title='<b>Votes</b>',
                   color_discrete_sequence=['#00FF66'],
                   template='plotly_white')
                   
st.plotly_chart(bar_chart)

You can also change the color of the bars using the color_discrete_sequence parameter. I have stored the chart in the bar_chart variable and finally displayed it using st.plotly_chart().

Next, you can also plot the chart for average income by occupation.

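# Select the numeric income column before calling mean(); averaging the text columns raises an error in recent pandas versions
average_income = df.groupby(by='occupation')[['income']].mean().sort_values(by='income')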

average_income_chart = px.bar(average_income,
                              x='income',
                              y= average_income.index,
                              orientation='h',
                              title='<b>Average Income By Occupation</b>',
                              color_discrete_sequence=['#00FF66'],
                              template='plotly_white')

st.plotly_chart(average_income_chart)

You can use the same approach to plot the bar chart for average income by occupation. But this time you need to group the data by occupation and take the mean of the income column by using the mean() method.

You can plot the horizontal bar chart by setting the orientation parameter to 'h'.
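To see what the grouped data looks like before it reaches px.bar(), here is a small sketch with made-up rows. It selects the income column before calling mean(), which avoids averaging text columns. The second part shows an optional variation (an assumption, not part of the app above) where the grouping starts from a filtered frame so that the averages follow a filter.

```python
import pandas as pd

# Made-up rows standing in for survey_results.csv
df = pd.DataFrame({
    'occupation': ['Engineer', 'Teacher', 'Engineer', 'Doctor', 'Teacher'],
    'gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'income': [80000, 40000, 60000, 90000, 50000],
})

# Pick the numeric column first, then average it per occupation and sort ascending
average_income = (
    df.groupby(by='occupation')[['income']]
      .mean()
      .sort_values(by='income')
)
print(average_income)

# Optional variation: start from a filtered frame so the averages follow a filter
mask = df['gender'].isin(['Male'])
average_income_filtered = df[mask].groupby(by='occupation')[['income']].mean()
print(average_income_filtered)
```

The occupations end up in the index, which is why the chart code passes average_income.index as the y values.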

Let’s see how our charts look on the front end:

bar chart in our data visualization web app

Amazing!

You can also create columns in streamlit.

col1, col2 = st.columns(2)

Now, let’s create a pie chart to display the ratio of participants.

pie_occupation = df.groupby(by='occupation').count()[['id']]
pie_occupation = pie_occupation.rename(columns={'id': 'participants'})
col1.dataframe(pie_occupation)

Here I have grouped the data by occupation and counted the number of participants in each occupation using the count() method. We will use this grouped data to plot our pie chart.

I have displayed the data frame in column 1.

pie_chart_occupation = px.pie(pie_occupation,
                   title='Total No. of Participants by Occupation',
                   values='participants',
                   names=pie_occupation.index)

col2.plotly_chart(pie_chart_occupation)

Here, I have plotted the pie chart right next to the data frame in column 2.

Here’s how it looks.

pie chart for participants' ratio by occupation
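We never calculate the percentages on the slices ourselves; as far as I can tell, Plotly derives them from the participants column. If you want to check the numbers, the sketch below does the same arithmetic by hand on four made-up rows. The share_percent column is a name I made up for this example.

```python
import pandas as pd

# Made-up rows standing in for survey_results.csv
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'occupation': ['Engineer', 'Teacher', 'Engineer', 'Doctor'],
})

# Same grouping as above: number of participants per occupation
pie_occupation = df.groupby(by='occupation').count()[['id']]
pie_occupation = pie_occupation.rename(columns={'id': 'participants'})

# Each slice of the pie is that occupation's count divided by the total
pie_occupation['share_percent'] = (
    pie_occupation['participants'] / pie_occupation['participants'].sum() * 100
)
print(pie_occupation)
```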

Similarly, you can plot the participants’ ratio by education.

pie_education = df.groupby(by='education').count()[['id']]
pie_education = pie_education.rename(columns={'id': 'participants'})
col1.dataframe(pie_education)

pie_chart_education = px.pie(pie_education,
                   title='Total No. of Participants by Education',
                   values='participants',
                   names=pie_education.index)

col2.plotly_chart(pie_chart_education)

Here’s how both charts look.

Pie chart for data visualization web app

So, that’s it. We have successfully created our data visualization web app using streamlit.

Here’s the final code:

import pandas as pd
import streamlit as st
import plotly.express as px


#---------- Set the page title ----------
st.set_page_config(page_title="Excel Data Visualization", page_icon=":heart:")
st.title(":bar_chart: Survey Results")

#---------- Load the dataframe ----------
df = pd.read_csv('survey_results.csv')


#---------- Create the Sidebar ----------
with st.sidebar:
    education = df['education'].unique().tolist()

    education_selection = st.multiselect('Education:',
                education,
                default=education)

    occupation = df['occupation'].unique().tolist()

    occupation_selection = st.multiselect('Occupation:',
                occupation,
                default=occupation)

    gender = df['gender'].unique().tolist()

    gender_selection = st.multiselect('Gender:',
                gender,
                default=gender)

    age = df['age'].unique().tolist()

    age_selection = st.slider('Age:',
                              min_value=min(age),
                              max_value=max(age),
                              value=(min(age), max(age))
                              )
    
    income = df['income'].unique().tolist()

    income_selection = st.slider('Income:',
                                 min_value=min(income),
                                 max_value=max(income),
                                 value=(min(income), max(income)))


#---------- Filter the data ----------
mask = (df['income'].between(*income_selection)) & (df['age'].between(*age_selection)) & (df['education'].isin(education_selection)) & (df['occupation'].isin(occupation_selection)) & (df['gender'].isin(gender_selection))

st.dataframe(df[mask])

results = df[mask].shape[0]
st.markdown(f'### Available Results: {results}')

#---------- Group the data for bar chart by satisfaction ----------
df_grouped = df[mask].groupby(by='satisfaction').count()[['age']]
df_grouped = df_grouped.rename(columns={'age': 'votes'})
df_grouped = df_grouped.reset_index()

#--------- Plot the bar chart ----------
bar_chart = px.bar(df_grouped,
                   x='satisfaction',
                   y='votes',
                   text='votes',
                   title='<b>Votes</b>',
                   color_discrete_sequence=['#00FF66'],
                   template='plotly_white')

st.plotly_chart(bar_chart)


#---------- Group the data for bar chart by occupation ----------
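# Select the numeric income column before calling mean(); averaging the text columns raises an error in recent pandas versions
average_income = df.groupby(by='occupation')[['income']].mean().sort_values(by='income')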

#--------- Plot the bar chart ----------
average_income_chart = px.bar(average_income,
                              x='income',
                              y= average_income.index,
                              orientation='h',
                              title='<b>Average Income By Occupation</b>',
                              color_discrete_sequence=['#00FF66'],
                              template='plotly_white')

st.plotly_chart(average_income_chart)

#---------- Create the columns ----------
col1, col2 = st.columns(2)

#---------- Group the data for pie chart by occupation ----------
pie_occupation = df.groupby(by='occupation').count()[['id']]
pie_occupation = pie_occupation.rename(columns={'id': 'participants'})
col1.dataframe(pie_occupation)

#---------- Plot the pie chart ----------
pie_chart_occupation = px.pie(pie_occupation,
                   title='Total No. of Participants by Occupation',
                   values='participants',
                   names=pie_occupation.index)

col2.plotly_chart(pie_chart_occupation)

#---------- Group the data for pie chart by education ----------
pie_education = df.groupby(by='education').count()[['id']]
pie_education = pie_education.rename(columns={'id': 'participants'})
col1.dataframe(pie_education)

#---------- Plot the pie chart ----------
pie_chart_education = px.pie(pie_education,
                   title='Total No. of Participants by Education',
                   values='participants',
                   names=pie_education.index)

col2.plotly_chart(pie_chart_education)

Enjoy!

Final Thoughts

Hopefully, you can see how easy it is to create a data visualization web app directly in Python by using Streamlit. This is just a demo site that shows only a fraction of the possibilities that Streamlit offers.

You must check out the Streamlit documentation where you can find all the features of this amazing web framework.

I hope you enjoyed creating a web application using Streamlit. If you liked this blog post, please share it with your friends. It will really help me a lot.

You may also enjoy my previous article: 10 Easy Web Apps You Can Build with Python Today

Thanks, and see you next time. Happy Coding!
