The dataset consists of over 600K Craigslist listings for housing rentals throughout California. Each observation includes text features (i.e., title and description), numeric features such as price and number of bedrooms and bathrooms, and categorical features such as city and whether or not pets are allowed. This particular visualization focuses on the 78K listings in San Francisco.
This visualization is a geographic symbol map in which each point represents a rental listing. The color of each point indicates which pricing bucket the listing falls into. When you hover over a point, the tooltip at the top left populates with information about that listing. You can also hover over the pricing buckets on the right-hand side to see the geographical distribution of listings in each price range. Moreover, clicking on a pricing bucket removes all points that do not correspond to that price range, allowing you to then hover over a smaller set of listings.
The first step in data preparation was to filter for San Francisco listings. Next, I only wanted to include listings that had a valid neighborhood so that I could display it in the top-left tooltip. Neighborhood information was contained within the title, typically at the end and wrapped in parentheses. Since this information is user-entered, many neighborhoods were entered incorrectly. Therefore, I chose to include only listings whose neighborhood appeared at least five times. The next step was to create pricing buckets. The buckets were created using price quartiles, so each bucket contains roughly the same number of observations.
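
The preparation steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code actually used for the visualization. The field names (`city`, `title`, `price`), the function names, the lowercase normalization of neighborhoods, and the choice to only match a parenthetical at the very end of the title are all assumptions made for the example.

```python
import re
from collections import Counter
from statistics import quantiles

# Assumption: the neighborhood is the last parenthetical in the title, e.g. "Sunny 2BR (Mission)".
NEIGHBORHOOD_RE = re.compile(r"\(([^()]+)\)\s*$")


def extract_neighborhood(title):
    """Return the text inside a trailing parenthetical, lowercased, or None."""
    match = NEIGHBORHOOD_RE.search(title)
    return match.group(1).strip().lower() if match else None


def prepare(listings, city="San Francisco", min_count=5):
    """Filter listings and assign quartile-based price buckets (1 = cheapest)."""
    # Step 1: keep only listings for the chosen city (copy so inputs are not mutated).
    rows = [dict(r) for r in listings if r["city"] == city]

    # Step 2: keep only listings whose title yields a neighborhood.
    for r in rows:
        r["neighborhood"] = extract_neighborhood(r["title"])
    rows = [r for r in rows if r["neighborhood"]]

    # Step 3: drop neighborhoods seen fewer than min_count times (likely typos).
    counts = Counter(r["neighborhood"] for r in rows)
    rows = [r for r in rows if counts[r["neighborhood"]] >= min_count]

    # Step 4: compute the three quartile cut points and label each listing.
    # statistics.quantiles needs at least two prices to work.
    q1, q2, q3 = quantiles([r["price"] for r in rows], n=4)
    for r in rows:
        p = r["price"]
        r["price_bucket"] = 1 if p <= q1 else 2 if p <= q2 else 3 if p <= q3 else 4
    return rows, (q1, q2, q3)
```

Calling `prepare(listings)` on a list of dictionaries returns the surviving listings, each with `neighborhood` and `price_bucket` fields added, along with the three price cut points. Because the cut points are quartiles, the four buckets hold approximately equal numbers of listings, though ties at a cut point can make the counts differ slightly.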