Exploring Boston through Airbnb

An analysis of Airbnb’s listing data

Shanghui Li
Jan 30, 2021

Introduction

This post presents insights I have gleaned from analysing Airbnb listing data. The data was scraped from Airbnb listings in Boston and includes price, property details, and host and neighbourhood information, among other fields.

I identified three questions to answer using the data and address each in the sections below.

1. What are the best locations to stay at in Boston?

Guests rate Airbnb properties on several aspects, including location. Chart 1 shows that, on average, properties in North End, Beacon Hill and Back Bay are rated highest for location, suggesting that guests find these neighbourhoods the most convenient places to stay. A quick check on Google Maps shows that these neighbourhoods are close to Downtown Boston, where more attractions are located. A plausible explanation for the high ratings is therefore their proximity to Boston’s attractions.

Chart 1: Average location ratings by neighbourhood
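An average like the one in Chart 1 can be computed with a simple group-by. The snippet below is a minimal sketch using pandas on a tiny made-up table. The column names `neighbourhood` and `review_scores_location` are assumptions for illustration and may not match the actual dataset, and the ratings are invented, not taken from the analysis.

```python
import pandas as pd

# Toy stand-in for the listings table. Column names and values are
# assumptions for illustration, not the real data.
listings = pd.DataFrame({
    "neighbourhood": ["North End", "North End", "Back Bay",
                      "Allston-Brighton", "Allston-Brighton"],
    "review_scores_location": [10.0, 9.0, 10.0, 8.0, None],
})


def mean_location_rating(df):
    """Average location rating per neighbourhood, highest first."""
    return (
        df.dropna(subset=["review_scores_location"])  # skip unrated listings
          .groupby("neighbourhood")["review_scores_location"]
          .mean()
          .sort_values(ascending=False)
    )


ranking = mean_location_rating(listings)
print(ranking)
```

Listings without a location rating are dropped before averaging so that they do not distort the neighbourhood means.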

Location is likely an important factor for most people choosing a property to stay at, so we would expect the location rating to be positively correlated with price. This appears to hold in this dataset: Chart 2 below plots each neighbourhood’s average listing price against the average location rating given by guests.

Chart 2: Property listing price is positively correlated with location rating
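The relationship in Chart 2 can be summarised with a Pearson correlation between the two neighbourhood-level averages. The sketch below assumes that prices are stored as strings such as `$1,100.00` and that the columns are called `neighbourhood`, `price` and `review_scores_location`. Both are assumptions, and the numbers are made up.

```python
import pandas as pd

# Toy data. Column names, the price format and all values are assumptions.
listings = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", "C", "C"],
    "price": ["$100.00", "$120.00", "$200.00", "$1,100.00", "$80.00", "$90.00"],
    "review_scores_location": [8.0, 9.0, 10.0, 10.0, 7.0, 8.0],
})


def price_to_float(series):
    """Convert strings such as '$1,100.00' to floats."""
    return series.str.replace(r"[$,]", "", regex=True).astype(float)


def neighbourhood_correlation(df):
    """Pearson correlation between neighbourhood-average price and rating."""
    df = df.assign(price=price_to_float(df["price"]))
    by_hood = df.groupby("neighbourhood")[["price", "review_scores_location"]].mean()
    return by_hood["price"].corr(by_hood["review_scores_location"])


r = neighbourhood_correlation(listings)
print(f"correlation between average price and average location rating: {r:.2f}")
```

Note that a correlation computed on neighbourhood averages describes neighbourhoods, not individual listings, so it can look stronger than the listing-level relationship.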

2. How can we describe each neighbourhood in Boston?

An Airbnb listing sometimes contains a description, written by the host, of the neighbourhood that the property is located in. We can get a sense of what each neighbourhood is like by identifying the words hosts use most often to describe it. We do this by generating a word cloud for each neighbourhood from all the neighbourhood descriptions of properties located there.

Chart 3 shows the word cloud for the neighbourhood of Allston-Brighton. Several words stand out immediately: Harvard, walk, restaurant, student and safe, among others. The visualisation suggests that Allston-Brighton is close to Harvard University and that hosts describe it as a safe area that is conducive to walking. As further examples, Charts 4 and 5 show the word clouds for the neighbourhoods of Charlestown and North End respectively.

Chart 3: Word cloud based on hosts’ description of the Allston-Brighton neighbourhood
Chart 4: Word cloud for Charlestown
Chart 5: Word cloud for North End
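A word cloud sizes each word by how often it appears, so the underlying computation is a word-frequency count. The sketch below shows that counting step with the Python standard library only. The example descriptions and the small stopword list are made up for illustration, and this is not necessarily how the charts above were produced.

```python
import re
from collections import Counter

# A deliberately small stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in",
             "from", "with", "for", "it", "are"}


def top_words(descriptions, n=5):
    """Return the n most frequent non-stopword words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical host descriptions for one neighbourhood.
descriptions = [
    "A safe area, a short walk from Harvard.",
    "Walk to Harvard and many restaurants.",
    "Quiet and safe, popular with students.",
]

for word, count in top_words(descriptions):
    print(word, count)
```

Running the same function once per neighbourhood gives the frequencies that a word cloud library would then turn into font sizes.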

3. What are the key predictors of a property’s listing price?

For the final question, I attempt to build a simple linear model to predict a property’s price based on a subset of the listing attributes in the dataset. The independent variables include important property details such as capacity, location and type of accommodation, host details, ease of booking/cancellation, as well as guest ratings.

As part of the exploratory data analysis, I first examined the correlation between each quantitative independent variable and the dependent variable (price) using the correlation heatmap shown in Chart 6.

Chart 6: Correlation heatmap of several quantitative features in the Airbnb dataset
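The numbers behind a heatmap like Chart 6 are a pairwise correlation matrix. Below is a minimal sketch with pandas. The column names (`price`, `accommodates`, `number_of_reviews`) and the values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy numeric features. Names and values are assumptions, not the real data.
listings = pd.DataFrame({
    "price": [100.0, 150.0, 200.0, 250.0],
    "accommodates": [1, 2, 3, 4],
    "number_of_reviews": [40, 30, 20, 10],
})

# Pairwise Pearson correlations between all numeric columns.
corr = listings.corr()

# Correlation of each feature with price, strongest positive first.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

The matrix `corr` could then be passed to a plotting function such as `seaborn.heatmap` to draw the chart itself.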

The resulting linear model explains about 40% of the variation in Boston Airbnb prices. In line with intuition, the following factors had a significant positive impact on prices: property capacity, number of rooms/beds, customer rating and host quality. However, there were also some counter-intuitive findings. For instance, having fewer reviews and having an unverified host identity were both correlated with higher prices. More details on the regression can be found here.

The root-mean-square error (RMSE) for predicting prices out-of-sample is $165, which is large relative to the average listing price of $175 (about 94% of it). Hence, while the model gives an indication of which factors may matter more for price prediction, it is at best a blunt tool for predicting prices to a high degree of accuracy.
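To make the two metrics above concrete, here is a minimal sketch of fitting a linear model by ordinary least squares and computing R-squared and out-of-sample RMSE with NumPy. The data is synthetic, the two features (`accommodates`, `rating`) and the 80/20 split are assumptions, and this is not the regression behind the figures reported above.

```python
import numpy as np

# Synthetic data for illustration only; not the Airbnb dataset.
rng = np.random.default_rng(0)
n = 200
accommodates = rng.integers(1, 7, size=n).astype(float)
rating = rng.uniform(3.0, 5.0, size=n)
price = 40.0 * accommodates + 20.0 * rating + rng.normal(0.0, 50.0, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), accommodates, rating])

# Hold out the last 20% of rows for out-of-sample evaluation.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = price[:split], price[split:]

# Ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


train_r2 = r_squared(y_train, X_train @ coef)
test_rmse = rmse(y_test, X_test @ coef)
print(f"in-sample R^2: {train_r2:.2f}")
print(f"out-of-sample RMSE: {test_rmse:.1f}")
print(f"RMSE relative to mean price: {test_rmse / np.mean(y_test):.2f}")
```

Dividing the RMSE by the mean price, as in the last line, is the same comparison made above between $165 and $175.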
