Metabase On Heroku



How do you begin combining data from cloud applications with your internal databases to gain insight into your business? Maybe your organization has already standardized on Metabase as your analytics tool, but you're still learning about using it with multiple data sources.

The final piece of our system is the ability to use SQL to query and visualize data in Redshift; for this we can use Metabase, a free open-source product analytics tool that you can deploy to Heroku using Metabase's Heroku Button. Once deployed, you'll need to configure Metabase with the RedShift cluster URL, database name, username, and password. The button below will deploy the branch where this README.md lives onto Heroku. Metabase developers use it to deploy branches of Metabase to test our PRs, etc. We’ve run Metabase on Heroku and it works just fine, but it’s not hardened for production use just yet. If you’re up for it then give it a shot and let us know how we can make it better! Running on Debian as a service Community support only at this time, but learn how to deploy Metabase as a service on Debian (and Debian-based) systems.

Metabase Heroku Update

To analyze data from diverse sources, you need a data warehouse that consolidates all of your data in a single location. Most businesses take advantage of cloud data warehouses such as Amazon Redshift, Google BigQuery, or Snowflake.

You can extract data that you have stored in SaaS applications and databases and load it into the data warehouse using an ETL (extract, transform, load) tool. Once the data is available, your analysts can use it to create reports. In this post, we'll look at how to start from scratch and create a report using Metabase, an open source business intelligence (BI) tool that's free to download and use.

Metabase's visual query builder lets you generate simple charts and dashboards, or you can use SQL to create more complex visualizations, as we'll do here. Each query starts by clicking a button to 'ask a question.' Metabase is simpler than tools like Tableau and Power BI, offering fewer features, but it's correspondingly simpler to learn.

Reporting, data warehouses, and ETL

Metabase can run in the environment of your choice via a Docker image, on AWS Elastic Beanstalk, or on Heroku, or you can run it as a native application on macOS or as a Java jar file. I used macOS version 0.31.1.

Per its FAQ, 'Metabase is primarily meant to work with actual databases.' If you want to analyze data in SaaS platforms, the developers 'suggest that you use other tools to build a data warehouse with the data you need.'

That's what we'll do with Stitch, a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. It can move data from dozens of data sources. Sign up and you can be moving data to a data warehouse in five minutes. Choose at least one of Stitch's integrations as a data source. Most businesses get more value out of their data as they integrate more data sources; for our example, we'll look at data from two separate sources.

Setting up a data warehouse

I used some of Stitch's real data to build a data visualization for this post. Specifically, I was curious to see how many support conversations came from each of Stitch's pricing tiers. To find out, I'd need to use billing data stored in Salesforce and join it with support conversation data from Intercom to create a visualization of support conversations.

My first step was to set up a BigQuery data warehouse to store the data from the two SaaS platforms. Following our documentation, I added to my Google user the permissions that Stitch would need to load data to BigQuery.

Using Stitch makes extracting data from a source and loading it into a data warehouse easy. I logged in to Stitch and added new integrations for Salesforce and Intercom, following the instructions in our documentation. From Salesforce I selected the Account table to replicate. Stitch's Intercom integration automatically extracts data from all available tables. From both, I chose the period from which to replicate data.

Once I'd set up my integrations I added BigQuery as my destination and specified a BigQuery project name. Within a few minutes, Stitch extracted the data I specified and loaded it into BigQuery.

Building a sample report

Next I turned to Metabase.

To start, I clicked on Metabase's settings icon, then Admin. I chose Add a Database and entered the names of my BigQuery project ID and other information, which I obtained from the BigQuery dashboard. As part of that process I had to generate an OAuth 2.0 Client ID and Client Secret — a simple process that involved entering some simple information on a screen connected to a Click here link. Similarly, I generated an Auth Code by providing my Google authentication information.

Once you have a database to work with, you can 'ask a question' in one of three ways:

If you choose Metrics, Metabase directs you to create segments and metrics from its admin panel. You can choose Custom for slightly more complex queries on a single table. I wanted to join data from two tables, but a Metabase blog post says:

Our goal with Metabase has always been to provide a way for non-technical users to answer their own questions in a self-serve manner. While joins are a great tool that a skilled analyst or programmer might reach for, we will be trying to add features that expose a highly specific, easily understood operation that someone who isn't SQL fluent would understand.

Fortunately, we're SQL-fluent around here. I chose Native Query, which lets you paste in SQL code.

A look at the Intercom schema showed me that to associate the number of conversations with the Stitch customer ID, I'd need to join the Intercom Users and Conversations tables. The Users table contains a field called companies that contains information about the companies a given user is associated with. That field is a list that could contain multiple values – a nested data structure, in other words. Many data warehouses don't support nested data structures, but BigQuery does. It stores each list in an array.

Heroku

Metabase On Heroku Roblox

Retrieving data from a nested data structure is tricky if you're used to working with fully normalized data. My colleague Erin, Stitch's senior technical documentation manager, set me on the right path with this query:

The SQL UNNEST operator takes an array and returns a table, with one row for each element in the array.

To get the plan tier information from our Salesforce Account table, I created another query:

I used those those queries, along with data from Intercom's Conversations table, to retrieve the data I wanted, joining Intercom and Salesforce data. I used a case clause to consolidate data from monthly and annual billing plan into a single bar for each pricing tier:

Running this query in the BigData console gave me the data I wanted. Pasting it into Metabase gave me a table of data. To turn it into a visualization, I selected Bar from the Visualization drop-down and chose tier as the X axis. The result was a useful visualization.

Metabase lets you customize bar colors and axis labels. When you're satisfied with your visualization, you can save it in a collection and publish it to a dashboard that other users can share.

So there you have it – a quick walk through the process of using an ETL tool like Stitch to move data from multiple sources into a data warehouse, then report on it using Metabase. Sign up for a free trial of Stitch and start creating your own.