As business evolves, the necessity for companies to leverage data has gone from a nice-to-have to a must-have.
The good news is a data analytics dashboard is not as complicated as it may sound for businesses to implement. Applications already generate immense amounts of data within their databases every day, and we can take advantage of these existing data to create an analytics dashboard.
In this article, I will present three core metrics we used to generate an analytics dashboard that would provide the quickest value for one of our client’s projects. Next, I’ll discuss some of the details of our technical implementation. I’ll conclude with a recommendation for development teams and business owners on the next steps for integrating existing data into an analytics dashboard.
We have been working on a document management application where our client’s users can upload files to collections of documents called work packets. As one of the features, we were tasked to build an analytics dashboard. The goal of this dashboard implementation was to allow users and stakeholders to easily visualize and analyze how many work packets were added and files uploaded to the app over time. We wanted the analytic dashboard to display the total numbers of users, work packets, and files added to the app during a given time frame (e.g., last week, last calendar month, from the app’s production launch date until now, etc.) and a chart that visualizes a trend of work packets per category that were added each month.
Our approach was driven by three goals:
- Keep it simple and maintainable (simple durable features instead of exquisite and complex brittle ones)
- Keep it performant (low latency, quickly refreshed, and available on demand)
- Keep it flexible to extend to new data types (flexibility to add new data types as business needs change)
Note: We used ReactJS in the frontend and Spring Boot in the backend for our analytics dashboard. However, you can use a tech stack of your own choosing to implement your own dashboard.
Goal One: Keep It Simple and Maintainable
We set up a single endpoint in the backend for our analytics dashboard, which collects and returns all the analytics data to the frontend for display to the user. We started with just a few data points to keep the architecture simple, and so that all the data points are immediately available for viewing in the frontend. This simple architecture makes the code easier to maintain at the outset, compared to setting up multiple endpoints before such complexity was truly necessary.
This simple approach can be extended to many different types of applications, beyond the document management application we built. As an example of leveraging existing data from a database to create an analytics dashboard, think of an e-commerce app. We could count the total numbers of users, purchased products, and purchased products by category and by user group, and expose them as data points in a single endpoint to get started. If you have a schema for it in your database and can access it, you are only limited by your imagination and the business problem you are trying to solve.
Goal Two: Keep It Performant
Caching of Historical Calculations
Performance can become an issue if the application needs to query the database and perform each calculation anew every time the statistics are requested from the client side. Therefore, we calculated each data point per month and saved the historical monthly data into a new database table called monthly_statistics. We used the auto-generated
modifiedAt timestamps on each database entry to calculate total monthly metrics for each data point (for example, the number of new users that were added in the month of January 2022).
These monthly totals for each data point are cached into the monthly_statistics table, starting from when the app was launched into production all the way to one month before the current month. This architectural decision allowed us to only calculate a statistic for a particular month once and store that data into a table, and from then on, every time we need to pull the data, we just need to query the table. That way, only the current (partial) month’s data are calculated fresh after each request, improving performance.
Here is an example of how this data might be exposed to the frontend in JSON:
An important thing to note is that the cached monthly_statistics totals are based on the calendar month (i.e., August 1st to August 31st). This limits the number of historical data points to cache, compared to calculating totals on a rolling basis for the “last 30 days” (i.e., counting from July 16th to August 15th). If a rolling “last x days’’ is what stakeholders request, it may be worth discussing with them the potential benefits of using calendar months instead, in order to potentially improve performance as the database scales to thousands of rows.
Ensuring the Cached Data Is Present
The analytics endpoint is hit only when a user goes to the analytics page. Because our cached data is only updated on demand, this means that the monthly_statistics table will not get updated if a user doesn’t visit the analytics page for over a month. For this reason, and to ensure that data is cached correctly in each environment, we loop through the months from
currentMonth on each request to ensure that each month’s data is stored in the database in memory. That way, there is no need to set up a scheduled job to calculate new historical data.
Here is how we implemented this caching logic when the analytics endpoint is hit:
- Grab the current month and year (
currentMonth) and the month and year when the first user was created in that environment (
- Loop through the months between
firstMonthand a month before
currentMonth, and check if each month’s data is stored in the monthly_statistics table. If any are missing, calculate the totals for that month and cache the data.
- Exit the loop when the month and year matches
As a result, monthly_statistics will have monthly totals from when the first user was created in the environment all the way to a month before the current month, regardless of when users happen to visit the analytics dashboard and hit this backend endpoint, and regardless of when an environment was stood-up.
Goal Three: Keep it flexible to extend to new data types
Based on our chart designs, we also wanted to make sure that the chart is responsive to the data based on the different slices of time users would want to see. Chartjs offers animations when changing displayed data, which makes the dynamically changing chart smooth and fun to look at!
Your business is already creating data everyday in your application’s database, and now is the time to turn that data into insights! First, ask yourself: what business questions do I have and what insights do I need? Once you know the business questions, next ask yourself: do I already have the data I need? If so, focus on goal one and begin by generating analytics data in a simple and maintainable way. Once your analytics data is prepared for display, focus on creating a performant solution to keep your users happy and help them quickly answer those business questions. Lastly, stay flexible. Don’t get locked into today’s business paradigm or visualization as the market changes. Leverage open source and highly fluid visualizations that scale towards the future.