The Future of Method Tracing in Mobile Apps? Android Profiler vs Specto

Production profiling makes its mobile debut.

Nathanael
Specto

--

The Android Profiler is a tool built into Android Studio to help you profile your app in real time. Specto is a cloud-based performance management tool meant to be deployed in production as part of your app. Both have essentially the same goal: to produce insights that will make your app faster and more resource-efficient. But their approaches differ—if you’ve only ever used the Android Profiler, Specto may provide fresh insights and help you keep track of performance on an ongoing basis with minimal effort.

Here we’ll specifically take a look at the method tracing feature, which monitors Java method calls—and Kotlin functions—over a period of time, essentially telling us which methods were called, in what order, and for how long. This is arguably the most important tool in the performance toolbox! With it, you can identify long-running methods, operations that block the main thread, and much more. Both the Android Profiler and Specto also measure things like memory usage and network activity, but those will be topics for another day.

Local Profiling vs Production Profiling

This is the foundational difference between the two tools. The Android Profiler is used locally. You, the developer, launch your app on your device with the profiler turned on—you navigate around, performing the operations you want to profile, and voila! You get a method trace back. This is a great first step, and you will find performance improvements this way. I did and still do. But it has significant limitations.

One Device

Perhaps the most obvious shortcoming is that a single device is profiled at a time. Because of the manual nature of the process, it quickly becomes tedious to run multiple profiles. Unfortunately, devices will perform very differently depending on their hardware, OS, network conditions, etc. In 2015 there were already 24,000 different Android devices. Profiling one device a few times can reveal some issues, but many more may remain hidden.

One Flow

Similarly, only one flow is profiled at a time. Apps can get pretty complicated, and the number of flow permutations tends to increase exponentially 🤯. Even if you had the time to manually profile the critical ones, would you remember them all? I didn’t—not for the apps I’ve worked on.

One Time

Completing the trio is the fact that each manual profile is run once, presumably after performance has noticeably degraded, or maybe before launching a new feature. But what about the rest of the time? For all apps, but especially large ones, performance regressions can be introduced at any point. Maybe a third-party library does something funky in the version you just upgraded to. Or one of your callbacks unexpectedly runs on the main thread. Again, because of how varied devices and app usages are, you may never experience the problem yourself, but some of your users will.

vs Production Profiling

I know exactly what you’re thinking. Can I automatically profile my app a bunch of times, on a bunch of flows, on a bunch of devices? Yes! This is what Specto does. It’s called “production profiling” because it is done continuously in production—that means real devices and real users doing whatever it is they do. Sending Yos, ordering food, catching Pokémon, etc.

As the developer, you still get to decide which parts of the app should be monitored by defining start and end points in your codebase. For example, you may want to trace the transition from one screen to another. Simply call:

Specto.startTrace(SCREEN_A_TO_SCREEN_B)

— before launching the next screen and:

Specto.endTrace(SCREEN_A_TO_SCREEN_B)

—once the new screen is fully loaded. That’s it. Now this flow’s performance is monitored across devices and conditions at all times. The traces can be sorted, filtered, and analyzed in aggregate or individually via Specto’s web dashboard.
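Before moving on to the dashboard, here’s a minimal sketch of those two calls in context. Only Specto.startTrace/endTrace and the SCREEN_A_TO_SCREEN_B identifier come from above; the activity names, the layout, and the choice of “fully loaded” signal are hypothetical stand-ins.

import android.content.Intent
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.core.view.doOnPreDraw

class ScreenAActivity : AppCompatActivity() {
    // Start the trace just before kicking off the transition we care about.
    private fun openScreenB() {
        Specto.startTrace(SCREEN_A_TO_SCREEN_B)
        startActivity(Intent(this, ScreenBActivity::class.java))
    }
}

class ScreenBActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.screen_b) // hypothetical layout
        // End the trace once the new screen has drawn its first frame.
        window.decorView.doOnPreDraw {
            Specto.endTrace(SCREEN_A_TO_SCREEN_B)
        }
    }
}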

The Specto traces dashboard.

Here we can see a list of traces updated in near real-time. They correspond to the app’s startup, which is always a critical flow. The graph helps us find outliers—each dot is a trace, color-coded based on the device class. Pink corresponds to high-end devices, and we can see all those traces have a short duration. The startup duration for mid and low-end devices (in violet and blue) varies greatly, from 200 milliseconds to 6+ seconds. We can go straight to problematic runs, rather than hoping to produce them locally.

Visualizing and Analyzing Traces

Both tools offer similar ways to visualize traces. For some of its views, Specto uses the excellent, open-source viewer speedscope. Here is a high-level comparison of the main features.

Time Order

Method calls are displayed per thread, in the order in which they are called. This view is particularly useful to get the full context around suspicious method calls: what happened before, what happened during the call, etc.

Android Profiler, time order view
Specto and speedscope, time order view

Flame Chart / Left Heavy

Methods are displayed left to right in decreasing order of their total run time. I find this view to be great at highlighting the most valuable methods to optimize.

Android Profiler, flame chart view
Specto and speedscope, left heavy view

Top Down / Bottom Up

Top down means each method in the list can be recursively expanded to show its callees. Bottom up is the reverse: each method can be expanded to show its callers. In both cases, we are presented with a list, rather than a graph, highlighting the duration of each method.

Android Profiler, bottom up view
Specto, bottom up view
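To build intuition for these list views, here’s a toy Kotlin sketch (not Specto’s or Android Studio’s actual implementation) that folds sampled call stacks into a top down tree. Reversing each stack before folding would give you the bottom up view instead.

class CallNode(val name: String) {
    val children = mutableMapOf<String, CallNode>()
    var sampleCount = 0 // samples in which this frame appeared at this depth
}

// Each sample lists a stack from outermost caller to innermost callee.
fun buildTopDown(samples: List<List<String>>): CallNode {
    val root = CallNode("<root>")
    for (stack in samples) {
        var node = root
        for (frame in stack) {
            node = node.children.getOrPut(frame) { CallNode(frame) }
            node.sampleCount++
        }
    }
    return root
}

// Estimated duration = samples observed x sampling interval (3.3 ms at 300 Hz).
fun printTree(node: CallNode, intervalMs: Double = 3.3, indent: String = "") {
    println("$indent${node.name}: ~${"%.1f".format(node.sampleCount * intervalMs)} ms")
    node.children.values
        .sortedByDescending { it.sampleCount }
        .forEach { printTree(it, intervalMs, indent + "  ") }
}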

Aggregates

Each of the previous views is great for digging into individual traces, but one benefit of Specto is that it also aggregates data from multiple traces. Here we can see functions (a.k.a. methods) sorted by their 75th percentile duration.

The Specto functions dashboard.

Profiling Overhead

Both local and production profiling have overhead, meaning the performance monitoring has an impact on the performance of your app. There are two things to consider here:

  1. Some of the measurements may be inflated by the profiling itself. As long as the overhead is small or predictable, we can still get a lot of value from the data gathered, so it’s good to have a rough idea of what the overhead is and to reduce it whenever possible. For example, you’ll want to avoid profiling debuggable apps (see the sketch after this list).
  2. When profiling in production, the performance overhead can have an impact on the user experience. This is a non-issue with local profiling since the app developer is the only person experiencing it.
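As promised, a small sketch of that last point. This isn’t Specto’s internal logic, just one straightforward way to keep profiling away from debuggable builds, where disabled optimizations inflate every measurement:

import android.content.Context
import android.content.pm.ApplicationInfo

// Returns true when the app is a release (non-debuggable) build, i.e. when
// profiling measurements are worth collecting. The function name is ours.
fun shouldProfile(context: Context): Boolean =
    (context.applicationInfo.flags and ApplicationInfo.FLAG_DEBUGGABLE) == 0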

The overhead of the Android Profiler will vary a lot depending on several configuration options. Assuming you are using a release build, which is what production profiling would run on, it will mostly come down to your choice of what Android Studio calls sampling vs tracing.

The Android Profiler CPU profiling mode selector.

Sampling records the method call stack at frequent intervals whereas tracing records the precise start and end of each method call. Sampling has less overhead, but the recorded times are less precise and short-running methods may be completely absent from the resulting data. Tracing captures everything but in my experience has a very noticeable overhead and may even cause apps to become unresponsive.

Sampling is a good compromise because profiling is most often used to identify long-running methods, which are apparent using either technique. The sampling frequency determines how precise the measurements are and how short a method run would have to be to escape detection.

Note: in the rest of this article, “tracing” is used in its generic sense of recording information about an app’s execution; it includes sampling.
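For intuition, here’s a toy Kotlin sketch of the sampling approach. Real profilers, Specto’s included, work at a much lower level, but the principle of recording stacks on a fixed interval is the same.

import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Records the target thread's stack every sampling period. Any method whose
// entire run fits between two samples escapes detection, which is exactly
// the sampling trade-off described above.
fun sampleThread(target: Thread, hz: Int = 300, durationMs: Long = 1_000): List<Array<StackTraceElement>> {
    val samples = Collections.synchronizedList(mutableListOf<Array<StackTraceElement>>())
    val executor = Executors.newSingleThreadScheduledExecutor()
    val periodMicros = 1_000_000L / hz // ~3333 microseconds at 300 Hz
    executor.scheduleAtFixedRate({ samples.add(target.stackTrace) }, 0, periodMicros, TimeUnit.MICROSECONDS)
    executor.schedule({ executor.shutdown() }, durationMs, TimeUnit.MILLISECONDS)
    executor.awaitTermination(durationMs + 1_000, TimeUnit.MILLISECONDS)
    return samples
}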

The Android Profiler takes 1,000 samples per second by default but can be configured to any frequency. Specto, on the other hand, takes 300 samples per second. This means that method times are precise to about 3.3 milliseconds, rather than 1 millisecond for the Android Profiler. Is that enough? I think so. Methods that are worth optimizing usually run in tens or hundreds of milliseconds, so 3.3ms doesn’t make much of a difference. In practice, I’ve seen apps get a lot of value out of even 100 samples per second. Why pick 300? We found it to be a sweet spot when it comes to balancing value and overhead.

Overhead in production profiling is a tricky thing. It must be low enough so that the whole endeavor is worth it. You don’t want to noticeably worsen your app’s performance in order to improve it. Here’s how I think about this:

  • The overhead should be low enough that most users won’t notice it. For example, it doesn’t make the app janky or interactions noticeably longer.
  • It should also be low enough that a single, small performance improvement can fully offset it.

To measure this I created a benchmark to compare the performance of an app with and without Specto. I wanted to create a scenario that would be strenuous, particularly for the UI thread—if the overhead was low when the app was doing a ton of stuff, it would undoubtedly be low enough when it wasn’t doing much. And I wanted a measure that took the user perception into account.

So I wrote an instrumented test that flings through a list of items, where each item contains a randomly generated bitmap, with no view recycling and non-stop calculations happening on a background thread. Intense! I opted to measure CPU time, wall time, and the number of frames dropped.

The Specto benchmark in action. (The GIF frame rate is not representative of the original test.)
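The actual benchmark code isn’t shown here, but a hedged sketch of its frame-drop half might look like the following. The names and the harness around it (the list, the bitmaps, the Espresso flings) are assumptions, not Specto’s code, and the Choreographer calls must run on the main thread.

import android.os.Process
import android.os.SystemClock
import android.view.Choreographer

// Counts dropped frames by watching the gap between vsync callbacks: every
// full frame budget (~16.7 ms at 60 Hz) missed beyond the first counts as
// one dropped frame.
class FrameDropCounter(private val frameBudgetNanos: Long = 16_666_667L) :
    Choreographer.FrameCallback {

    private var lastFrameNanos = 0L
    var droppedFrames = 0
        private set

    override fun doFrame(frameTimeNanos: Long) {
        if (lastFrameNanos != 0L) {
            val gap = frameTimeNanos - lastFrameNanos
            droppedFrames += ((gap / frameBudgetNanos) - 1).toInt().coerceAtLeast(0)
        }
        lastFrameNanos = frameTimeNanos
        Choreographer.getInstance().postFrameCallback(this)
    }

    fun start() = Choreographer.getInstance().postFrameCallback(this)
    fun stop() = Choreographer.getInstance().removeFrameCallback(this)
}

// Bracket the stress scenario with wall-clock and CPU-time readings, then
// run the whole thing twice: with and without the Specto SDK in the app.
fun measure(counter: FrameDropCounter, scenario: () -> Unit): Triple<Long, Long, Int> {
    val wallStart = SystemClock.elapsedRealtime()
    val cpuStart = Process.getElapsedCpuTime()
    scenario() // e.g. repeated Espresso swipeUp() calls on the list
    val wallMs = SystemClock.elapsedRealtime() - wallStart
    val cpuMs = Process.getElapsedCpuTime() - cpuStart
    return Triple(wallMs, cpuMs, counter.droppedFrames)
}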

Running the benchmark over several months, I found that when taking 300 method tracing samples per second, the overall Specto overhead—not just method tracing—is consistently between 2 and 6%. The precise number largely depends on the device specs. I feel really good about that range because it satisfies both criteria above: most users won’t notice it, and even a single small performance improvement can fully offset it.

Lightning Round

Some questions you may have, answered in brief:

  • SDK size: The Specto SDK weighs about 1MB per architecture.
  • Concurrent traces: Like the Android Profiler, Specto doesn’t support concurrent traces at the moment. However, you can use spans to segment parts of your traces and those can be concurrent.
  • Dark Mode: We have it!
  • C++ support: Unlike the Android Profiler, Specto does not currently support C++ function tracing for Android. If it’s something you’d be interested in, please leave a comment or reach out to us in some way!

In Closing

The Android Profiler is great and keeps getting better, but the data you can reasonably collect from it is naturally limited. Specto, using production profiling, is betting that more performance data will mean more performance improvements and fewer regressions, and that the profiling overhead will pay for itself many times over.

If you’ve got an Android app, Specto is free for up to 10,000 active installations and can scale to any number with our paid plans. I hope you’ll give it a try and let me know how it goes! 😊
