This is a continuation of Part 1, which covered how Flutter compiles apps and what snapshots look like internally.
As you have probably guessed by now, reverse engineering Flutter apps is not an easy task.
Calling conventions
Let's first cover some basics about Dart's type system:
void main() {
  void foo() {}
  int bar([int aaa]) {}
  Null biz({int aaa}) {}
  int baz(int aa, {int aaa}) {}
  print(foo is void Function());
  print(bar is void Function());
  print(biz is void Function());
  print(baz is void Function());
}
Which functions do you think print true?
It turns out the Dart type system is much more flexible than you might expect. As long as a function can be called with the same positional arguments and has a compatible return type, it is a valid function subtype. Because of this, all but baz print true.
Here's another experiment:
void main() {
  int foo({int a}) {}
  int bar({int a, int b}) {}
  print(foo is int Function());
  print(foo is int Function({int a}));
  print(bar is int Function({int a}));
  print(bar is int Function({int b}));
  print(bar is int Function({int b, int c}));
}
Here we check whether a function is a valid subtype of a function type that accepts only a subset of its named arguments. All but the last print true, because bar does not accept a named argument c.
For a formal description of function types, see "9.3 Type of a Function" in the Dart language specification.
Mixing and matching parameter signatures is a nice feature, but it poses some problems when implemented at a low level. For example:
void main() {
  void Function({int a, int c}) foo;
  foo = ({int a, int b, int c}) {
    print("Hi $a $b $c");
  };
  foo(a: 1, c: 2);
}
For this to work, foo needs some way of knowing that the caller provided a and c but not b. This piece of information is called an argument descriptor.
Internally, argument descriptors are defined in vm/dart_entry.h. The implementation is just an interface over a regular Array object, which the caller passes to the callee through the argument descriptor register.
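To make the idea concrete, here is a rough Dart model of building a descriptor at a call site. The flat list layout and the buildArgsDescriptor name are my own simplification for illustration, not the exact format in vm/dart_entry.h. Sorting the named entries by name is also an assumption of this sketch, since it lets a callee walk its own sorted parameter list in step.

```dart
/// Builds a simplified, hypothetical argument descriptor as a flat list:
///   [typeArgsLen, argCount, positionalCount, name0, pos0, name1, pos1, ...]
/// Each `pos` is the index of that argument in the call-site argument list.
List<Object> buildArgsDescriptor({
  int typeArgsLen = 0,
  required int positionalCount,
  required List<String> namedNames, // named arguments in call-site order
}) {
  // Named arguments are passed after the positional ones.
  final entries = <MapEntry<String, int>>[
    for (var i = 0; i < namedNames.length; i++)
      MapEntry(namedNames[i], positionalCount + i),
  ];
  // Sort by name so the callee can match against its own sorted parameters.
  entries.sort((x, y) => x.key.compareTo(y.key));
  return <Object>[
    typeArgsLen,
    positionalCount + namedNames.length, // total argument count
    positionalCount,
    for (final e in entries) ...[e.key, e.value],
  ];
}
```

For foo(a: 1, c: 2), buildArgsDescriptor(positionalCount: 0, namedNames: ['a', 'c']) yields [0, 2, 0, a, 0, c, 1].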
Rather than using Dart's built-in disassembler, I'll be using a custom one that provides proper annotations for calls, object pool entries, and other constants.
Disassembly of foo, the caller:
The argument descriptor for the call to bar is the following RawArray:
The descriptor is used in the prologue of the callee to map stack indices to their respective argument slots and verify the proper arguments were received. Here is the disassembly of the callee:
To summarize, it loops over the array, assigning slots to any matching arguments and throwing a NoSuchMethodError if any are not part of the function type. Also keep in mind that argument checking is only required for polymorphic calls; most calls (including the hello world example) are monomorphic.
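The prologue behavior summarized above can be approximated in plain Dart. This sketch assumes a simplified, hypothetical descriptor layout. It also binds arguments through a map lookup rather than through the sorted walk the generated code performs, so treat it as an illustration of the behavior rather than of the machine code.

```dart
/// Binds call-site arguments to a callee's named parameter slots.
/// `descriptor` uses a hypothetical flat layout:
///   [typeArgsLen, argCount, positionalCount, name0, pos0, name1, pos1, ...]
/// `params` maps each named parameter of the callee to its default value.
Map<String, Object?> bindNamedArgs(
  List<Object> descriptor,
  List<Object?> args,
  Map<String, Object?> params,
) {
  if (descriptor[1] != args.length) {
    throw ArgumentError('argument count does not match descriptor');
  }
  // Every parameter slot starts out holding its default value.
  final slots = Map<String, Object?>.of(params);
  // Walk the (name, position) pairs that follow the three header entries.
  for (var i = 3; i + 1 < descriptor.length; i += 2) {
    final name = descriptor[i] as String;
    final position = descriptor[i + 1] as int;
    if (!slots.containsKey(name)) {
      // The real prologue throws NoSuchMethodError at this point.
      throw ArgumentError('no named parameter "$name"');
    }
    slots[name] = args[position];
  }
  return slots;
}
```

With the descriptor for foo(a: 1, c: 2) and the callee ({int a, int b, int c}), a and c receive 1 and 2, and b keeps its default of null.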
This code is generated at a high level in PrologueBuilder::BuildOptionalParameterHandling (vm/compiler/frontend/prologue_builder.cc), which means registers and subroutines may be laid out differently depending on the types of arguments and on which optimizations the compiler decides to apply.
Integer arithmetic
The num, int, and double classes are special in the Dart type system: for performance reasons, they cannot be extended or implemented.
Because of this restriction, we never have to check whether an int's operators have been overridden before doing arithmetic. If they could be, the compiler would have to generate relatively expensive method calls instead.
All objects in Dart are pointers to RawObject. However, only pointers tagged with kHeapObjectTag are actual heap objects. Pointers without the tag are smis (small integers): signed ints shifted to the left by one.
Because of pointer tagging, you will see a lot of tst r0, #1 and similar instructions in generated code. These discriminate between smis and heap objects. You will also see a lot of loads and stores at odd-numbered offsets, which subtract the heap object tag.
Any integer that fits within the word size minus one bit (31 bits on A32) can be stored as an smi. Larger integers are stored on the heap as 64-bit mint (medium int) instances.
Smis can of course hold negative numbers too. Untagging uses an arithmetic right shift, which sign-extends the number back into place.
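Here is a small Dart model of 32-bit smi tagging that illustrates the shifts and the low-bit test. Only kHeapObjectTag is a real VM name; kSmiBits and the helper functions are hypothetical. Machine words are modeled as unsigned 32-bit values held in a regular Dart int.

```dart
// A minimal model of smi tagging on a 32-bit target (A32).
const int kHeapObjectTag = 1; // low bit set => pointer to a heap object
const int kSmiBits = 31; // word size minus the tag bit

/// Whether [value] fits in an A32 smi, i.e. lies in the range [-2^30, 2^30).
bool fitsInSmi(int value) =>
    value >= -(1 << (kSmiBits - 1)) && value < (1 << (kSmiBits - 1));

/// Tags an integer as a smi: shift left by one so the low (tag) bit is 0.
int smiTag(int value) {
  if (!fitsInSmi(value)) {
    throw RangeError('$value does not fit in a smi and would need a mint');
  }
  return (value << 1) & 0xFFFFFFFF;
}

/// Untags a smi: reinterpret the word as signed 32-bit, then shift right
/// arithmetically so negative values are sign-extended.
int smiUntag(int word) => word.toSigned(32) >> 1;

/// The equivalent of `tst r0, #1`: a set low bit means a heap object.
bool isHeapObject(int word) => (word & kHeapObjectTag) != 0;
```

For example, -3 is stored as the word 0xFFFFFFFA. Its low bit is clear, so it is an smi, and untagging it yields -3 again.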
For example, here is a simple function that adds two ints:
int hello(int x, int y) => x + y;
To start, x and y are each unboxed into pairs of registers. Dart ints are 64-bit, so each argument needs two registers on A32:
After x and y are in pairs of registers it can perform the actual 64 bit add:
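On A32, a 64-bit add across register pairs is typically done with adds on the low words, followed by adc on the high words plus the carry. Here is a rough Dart illustration of that arithmetic. The RegPair class and the helpers are hypothetical, and the code models the arithmetic rather than reproducing the disassembly.

```dart
/// A 64-bit value split across two 32-bit "registers".
class RegPair {
  const RegPair(this.lo, this.hi);
  final int lo; // unsigned 32-bit low word
  final int hi; // unsigned 32-bit high word
}

RegPair toPair(int v) => RegPair(v & 0xFFFFFFFF, (v >> 32) & 0xFFFFFFFF);

int fromPair(RegPair p) => ((p.hi << 32) | p.lo).toSigned(64);

/// Mirrors `adds lo, lo, lo` followed by `adc hi, hi, hi`.
RegPair add64(RegPair x, RegPair y) {
  final loSum = x.lo + y.lo; // adds: may produce a carry out of bit 31
  final carry = loSum >> 32; // 0 or 1
  final hiSum = x.hi + y.hi + carry; // adc: high words plus the carry
  return RegPair(loSum & 0xFFFFFFFF, hiSum & 0xFFFFFFFF);
}
```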
Before returning, the result gets re-boxed:
Boxing looks more expensive than it actually is. The value is returned immediately as an smi, and the slow code path is only hit when the result does not fit in 31 bits.
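To show why the common case is cheap, here is a hypothetical Dart model of the unbox, add, re-box sequence. The Smi and Mint classes stand in for the VM's tagged words and heap-allocated mints.

```dart
// Hypothetical model of boxed ints on A32: small values are tagged smi
// words, and anything else is a heap-allocated 64-bit "mint".
abstract class BoxedInt {}

class Smi implements BoxedInt {
  Smi(this.word);
  final int word; // tagged 32-bit word, low bit 0
}

class Mint implements BoxedInt {
  Mint(this.value);
  final int value; // full 64-bit payload
}

/// Unboxing: untag a smi, or load the payload out of a mint.
int unbox(BoxedInt b) =>
    b is Smi ? b.word.toSigned(32) >> 1 : (b as Mint).value;

/// Re-boxing: the fast path just tags the value. Only results outside the
/// 31-bit smi range take the slow path and allocate a mint.
BoxedInt box(int value) {
  if (value >= -(1 << 30) && value < (1 << 30)) {
    return Smi((value << 1) & 0xFFFFFFFF);
  }
  return Mint(value);
}

/// `int hello(int x, int y) => x + y;` spelled out: unbox, 64-bit add, re-box.
/// On the Dart VM, int addition wraps at 64 bits, matching the machine add.
BoxedInt hello(BoxedInt x, BoxedInt y) => box(unbox(x) + unbox(y));
```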
Instances
The code below creates an instance by calling an allocation stub followed by a call to the constructor:
makeFoo() => Foo<int>();
Disassembled:
Each class has a corresponding allocation stub that allocates and initializes an instance (very similar to how boxing creates an object). These stubs are generated for every class that can be constructed.
Unfortunately for us, field information is removed from the snapshot, so we can't get field names directly. You can, however, see the names of implicit getter and setter methods (assuming they haven't been inlined).
Field offsets are calculated in Class::CalculateFieldOffsets, and the rules go as follows:
Start at the end of the superclass, otherwise start at sizeof(RawInstance)
Use the parent's type arguments field if it has one, otherwise put it at the start
Lay out the remaining (non-static) fields sequentially
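The three rules can be sketched as a small recursive function. ClassInfo, Layout, and the sizes used (4-byte words, a one-word header, one word per field) are assumptions for illustration, not values taken from Class::CalculateFieldOffsets.

```dart
/// Hypothetical description of a class for layout purposes.
class ClassInfo {
  ClassInfo({this.superclass, this.isGeneric = false, this.fields = const []});
  final ClassInfo? superclass;
  final bool isGeneric; // has type parameters
  final List<String> fields; // instance fields, in declaration order
}

class Layout {
  Layout(this.size, this.typeArgsOffset, this.fieldOffsets);
  final int size; // instance size in bytes
  final int? typeArgsOffset; // null if there is no type arguments slot
  final Map<String, int> fieldOffsets;
}

Layout computeLayout(ClassInfo cls, {int wordSize = 4, int headerSize = 4}) {
  final sup = cls.superclass;
  final parent = sup == null
      ? null
      : computeLayout(sup, wordSize: wordSize, headerSize: headerSize);
  // Rule 1: start at the end of the superclass, else right after the header.
  var offset = parent?.size ?? headerSize;
  // Rule 2: reuse the parent's type arguments slot, else allocate one first.
  var typeArgsOffset = parent?.typeArgsOffset;
  if (cls.isGeneric && typeArgsOffset == null) {
    typeArgsOffset = offset;
    offset += wordSize;
  }
  // Rule 3: lay out the remaining instance fields sequentially.
  final offsets = <String, int>{...?parent?.fieldOffsets};
  for (final f in cls.fields) {
    offsets[f] = offset;
    offset += wordSize;
  }
  return Layout(offset, typeArgsOffset, offsets);
}
```

Take Bar<T> with a field b, and Foo<T> extends Bar<T> with a field x. Foo shares Bar's type arguments slot at offset 4, b lands at 8, and x at 12. With the heap tag subtracted, a load of x would show up with an odd offset such as #11.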
Because type arguments are shared with the super, instantiating the following class gives us a type arguments field containing <String, int>:
class Bar<T> {}
class Foo<T> extends Bar<String> {}
var x = Foo<int>(); // instance type arguments are <String, int>
Whereas if the type arguments are the same for parent and child, the list will only contain <int>:
class Bar<T> {}
class Foo<T> extends Bar<T> {}
var x = Foo<int>(); // instance type arguments are <int>
Another fun feature of Dart is that all field access is done through getters and setters. This may sound very slow, but in practice Dart eliminates a ton of overhead with the following optimizations:
Whole-program static analysis
Inlining calls on known types
Code de-duplication
Inline caching (via ICData)
These optimizations apply to all methods, including getters and setters. In the following example, the setter is inlined:
class Foo {
  int x;
}
Foo bar() => Foo()..x = 42;
Disassembled:
But when we call this setter through an interface:
abstract class Foo {
  set x(int x);
}
class FooImpl extends Foo {
  int x;
}
void bar(Foo foo) {
  foo.x = 42;
}
Disassembled:
Here it invokes an unlinkedCall stub, a magic bit of code that handles polymorphic method invocation. It patches its own object pool entry so that subsequent calls are quicker.
I'd love to go into more detail about how this works at runtime, but all we need to know is that it invokes the method specified in the RawUnlinkedCall. If you are interested, there is a great article on the internals of the Dart VM that explains more: https://mrale.ph/dartvm/
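To give a feel for the self-patching idea, here is a toy call site in Dart that resolves its target on the first call and caches it for the receiver's class. CallSite, Method, and the lookup callback are hypothetical. The real VM moves through several call states as it sees more receiver classes, and this sketch only models a single-entry cache.

```dart
/// A compiled method: takes the receiver plus the remaining arguments.
typedef Method = Object? Function(Object receiver, List<Object?> args);

class CallSite {
  CallSite(this.selector, this.lookup);
  final String selector; // e.g. 'set:x'
  // Slow path: full method resolution by receiver class and selector.
  final Method Function(Type cls, String selector) lookup;

  Type? _cachedClass; // null while the site is still "unlinked"
  Method? _cachedTarget;
  int slowLookups = 0;

  Object? invoke(Object receiver, List<Object?> args) {
    final cls = receiver.runtimeType;
    if (cls != _cachedClass) {
      // Miss: resolve the target and patch this site's cache entry.
      slowLookups++;
      _cachedClass = cls;
      _cachedTarget = lookup(cls, selector);
    }
    // Hit (or freshly patched): call straight through the cached target.
    return _cachedTarget!(receiver, args);
  }
}
```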
Type Checking
Type checking is a fundamental component of polymorphism, and Dart provides it through the is and as operators.
Both operators do a subtype check, except that as allows null values. Here is the is operator in action:
class FooBase {}
class Foo extends FooBase {}
class Bar extends FooBase {}
bool isFoo(FooBase x) => x is Foo;
Disassembled:
Since whole-program analysis determined that Foo has only one implementer, it can simply compare the class ID for equality. But what if Foo has a child class?
class Baz extends Foo {}
We now get:
Gah! This code is awful, so here is a basic translation:
All it does here is check whether the class ID falls within a set of ranges. In this case there is only one range to check.
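Why would a single range suffice? If class IDs are numbered in depth-first order over the class hierarchy, every class and its subclasses occupy a contiguous block of IDs. The sketch below assumes that numbering scheme; Node, assignRanges, and isSubtypeByCid are hypothetical names.

```dart
/// A class in the hierarchy, with its direct subclasses.
class Node {
  Node(this.name, [this.children = const []]);
  final String name;
  final List<Node> children;
}

/// Assigns class IDs in depth-first preorder and records, for every class,
/// the inclusive range [firstCid, lastDescendantCid] covering its subtree.
Map<String, List<int>> assignRanges(Node root) {
  final ranges = <String, List<int>>{};
  var next = 0;
  void visit(Node n) {
    final start = next++;
    for (final c in n.children) {
      visit(c);
    }
    ranges[n.name] = [start, next - 1];
  }

  visit(root);
  return ranges;
}

/// `x is T` reduced to a class-ID range check.
bool isSubtypeByCid(int cid, List<int> range) =>
    cid >= range[0] && cid <= range[1];
```

For the FooBase, Foo, Baz, and Bar hierarchy above, the IDs come out as 0 to 3 in that order. Foo then covers [1, 2], so x is Foo becomes a single check that the class ID lies between 1 and 2.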
This is definitely a place where the Dart VM could improve on ARM. It does 64-bit smi range checks for 16-bit class IDs instead of just comparing them directly.
The range checks also don't take into account the supertype being compared from. As a result, a range can be split by a type that does not implement that supertype, perhaps because of unsoundness.
Control flow
Dart uses a relatively advanced flow graph, represented in SSA (Static Single Assignment) form, similar to modern compilers like GCC and Clang. It can perform many optimizations that change the control flow structure of the program, which makes reasoning about its generated code a bit harder.
That null check is an example of a "runtime entry" dynamic call, which is the bridge from Dart code to subroutines defined in vm/runtime_entry.cc.
In this case it is a specialized entry that throws a Failed assertion: boolean expression must not be null, as you would expect if the condition of an if statement is null.
Whole-program optimization (and, in the future, sound non-nullability) allows this null check to be elided. For example, if hello is never called with a possibly-null value, it won't do the check at all:
Closures
Closures are the implementation of first-class functions under the Function type. You can acquire one by creating an anonymous function or by extracting a method.
A simple function hi that returns an anonymous function:
Pretty simple, but what if the lambda depends on a local variable from the parent function?
int Function() hi() {
  int i = 123;
  return () => ++i;
}
Disassembled:
Instead of storing the variable i in the stack frame like a regular local variable, the function will store it in a RawContext and pass that context to the closure.
When called, the closure can access that variable from the closure argument:
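Conceptually, this is closure conversion: the captured variable moves into a heap-allocated context, and the closure carries a pointer to it. Here is a hand-written Dart approximation of hi(). Context, ClosureObj, and _anonBody are hypothetical names standing in for RawContext, RawClosure, and the compiled function body.

```dart
/// Heap-allocated context holding the captured variable `i`.
class Context {
  Context(this.i);
  int i;
}

/// A closure: a code pointer plus the context it was created with.
class ClosureObj {
  ClosureObj(this.function, this.context);
  final int Function(ClosureObj self) function; // code receives the closure
  final Context context;
  int invoke() => function(this);
}

// Body of `() => ++i`: reaches `i` through the closure argument's context.
int _anonBody(ClosureObj self) => ++self.context.i;

ClosureObj hi() {
  final ctx = Context(123); // `int i = 123;` now lives in the context
  return ClosureObj(_anonBody, ctx);
}
```

Each call to hi() allocates a fresh context, so two closures returned by separate calls count independently.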
Another way to get a closure is method extraction:
class Potato {
  int _foo = 0;
  int foo() => _foo++;
}
int Function() extractFoo() => Potato().foo;
When you call get:foo on Potato, Dart will generate that getter method as follows:
get:foo invokes buildMethodExtractor, which eventually returns a RawClosure that stores the receiver (this) in its context. When the closure is called, it loads the receiver back into r0, just like a regular instance call.
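Method extraction can be approximated the same way, as a callable object whose context is the receiver. ExtractedMethod is a hypothetical stand-in for the RawClosure built by the extractor. Dart's call method makes it invokable like a function.

```dart
class Potato {
  int _foo = 0;
  int foo() => _foo++;
}

/// Hypothetical model of an extracted method: the "context" holds the
/// receiver, and calling it reloads that receiver and invokes foo on it.
class ExtractedMethod {
  ExtractedMethod(this.receiver);
  final Potato receiver; // stored in the closure context
  int call() => receiver.foo(); // receiver goes back in as `this`
}

ExtractedMethod extractFoo() => ExtractedMethod(Potato());
```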
Where the fun starts
With a good starting point for reverse engineering real-world applications, the first big Flutter app that comes to mind is Stadia.
So let's take a crack at it. The first step is to grab an APK from APKMirror, in this case version 2.2.289534823:
(I don't recommend downloading apps from third-party websites; it's just the easiest way to grab an APK file without a compatible Android device.)
The important part here is that the version information contains arm64-v8a + armeabi-v7a, which are A64 and A32 respectively.
The interesting bits are in the lib folder: libflutter.so, which is the Flutter engine, and libproduction_android_library.so, which is just a renamed libapp.so.
Before we can do anything with the snapshot, we must know the exact version of Dart that was used to build the app. A quick search of libflutter.so in a hex editor gives us a version string:
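If you don't have a hex editor handy, a strings-style scan does the same job. This is a minimal sketch using dart:io; printableRuns and grepBinary are hypothetical helpers. Which substring to search for depends on how the version string is formatted in your build of libflutter.so, so the needle is left to you.

```dart
import 'dart:io';

/// Extracts runs of printable ASCII (like the Unix `strings` tool).
List<String> printableRuns(List<int> bytes, {int minLength = 4}) {
  final runs = <String>[];
  final current = StringBuffer();
  void flush() {
    if (current.length >= minLength) runs.add(current.toString());
    current.clear();
  }

  for (final b in bytes) {
    if (b >= 0x20 && b < 0x7f) {
      current.writeCharCode(b);
    } else {
      flush(); // any non-printable byte ends the current run
    }
  }
  flush();
  return runs;
}

/// Prints every string in the file at [path] that contains [needle].
void grepBinary(String path, String needle) {
  for (final s in printableRuns(File(path).readAsBytesSync())) {
    if (s.contains(needle)) print(s);
  }
}
```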
That c547f5d933 is a commit hash in the Dart SDK, which you can view on GitHub: https://github.com/dart-lang/sdk/tree/c547f5d933, and after some digging I found that it corresponds to Flutter version v1.13.6 (Flutter commit 659dc8129d).
Knowing the exact version of Dart is important because it tells you how objects are laid out and gives you a testbed.
Once the snapshot is decoded, the next step is to search for the root library. In this version of Dart, it's located at index 66 of the root objects list:
Neat, we can see that the package name of this app is chrome.cloudcast.client.mobile.app, which, you might notice, is not actually a valid pub package name. What's going on here?
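For reference, pub package names are expected to be lowercase identifiers made of letters, digits, and underscores, so the dots give it away. A rough check (it ignores the reserved-word rule, and looksLikePubPackageName is a hypothetical helper):

```dart
/// Lowercase letters, digits and underscores, not starting with a digit.
final _pubName = RegExp(r'^[a-z_][a-z0-9_]*$');

bool looksLikePubPackageName(String name) => _pubName.hasMatch(name);
```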
The reason for the weird package name is that Google doesn't use pub for internal projects and instead uses its internal Google3 repository. You can occasionally see issues on the Flutter GitHub labelled customer: ... (g3); this is what that label refers to.
By extracting URIs from every library defined in the app, we can view the complete file structure of every package it contains.
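Grouping the recovered URIs by package only takes a few lines of Dart. This assumes your snapshot parser already gives you the library URIs as strings; groupByPackage is a hypothetical helper.

```dart
/// Groups library URIs by package, producing a sorted per-package file list.
/// Non-package URIs (dart:, file:, ...) are grouped under their scheme.
Map<String, List<String>> groupByPackage(Iterable<String> uris) {
  final result = <String, List<String>>{};
  for (final raw in uris) {
    final uri = Uri.parse(raw);
    if (uri.scheme != 'package' || uri.pathSegments.isEmpty) {
      result.putIfAbsent(uri.scheme, () => []).add(raw);
      continue;
    }
    // package:<name>/<path inside the package>
    final package = uri.pathSegments.first;
    final path = uri.pathSegments.skip(1).join('/');
    result.putIfAbsent(package, () => []).add(path);
  }
  for (final files in result.values) {
    files.sort();
  }
  return result;
}
```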
I picked a random widget to look at, SocialNotificationCard from profile/view/social_notification_card.dart.
The library containing this widget is structured as follows:
The type information on these parameters is missing, but since they are build methods we can assume they all take a BuildContext.
The full disassembly of the _buildPartyIcon method goes as follows:
This one is quite easy to turn back into code by hand since it constructs a single Image widget:
Widget _buildPartyIcon(BuildContext context) {
  return Image.asset(
    // The `name` parameter is converted into the const AssetImage we
    // saw above at compile time, by the Image.asset constructor.
    "assets/social/party_invite.png",
    fit: BoxFit.cover,
    width: 48,
    height: 48,
    // All of the other fields were assigned their default values
  );
}
Note that object construction generally happens in 3 parts:
Invoking the allocation stub, passing type arguments if needed
Evaluating parameter expressions and assigning them to fields in order
Calling the constructor body, if any
The initializer list and default parameters seem to be unconditionally inlined into the caller, which adds a bit more noise.
Finally let's disassemble the actual build method of SocialNotificationCard:
There is a bit more GC-related code this time. If you are interested, these write barriers are required to maintain the tri-color invariant. Actually hitting the slow path of a write barrier is pretty rare, so the barriers have minimal impact on performance while allowing parallel garbage collection.
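For the curious, here is a toy Dijkstra-style incremental marking barrier in Dart. It shows the invariant being protected: a black (already scanned) object must never point to a white (unvisited) one. This is not the Dart VM's actual barrier, which, as I understand it, is driven by bits in the object header. MarkColor, HeapObj, and Marker are hypothetical.

```dart
enum MarkColor { white, grey, black }

class HeapObj {
  HeapObj(this.name);
  final String name;
  MarkColor color = MarkColor.white;
  final Map<String, HeapObj?> fields = {};
}

class Marker {
  bool marking = false; // true while a concurrent/incremental mark runs
  final List<HeapObj> greyStack = []; // objects still to be scanned
  int slowPathHits = 0;

  /// A field store with a write barrier.
  void writeField(HeapObj target, String field, HeapObj? value) {
    target.fields[field] = value;
    // Fast path: not marking, or the store cannot break the invariant.
    if (!marking || value == null) return;
    if (target.color == MarkColor.black && value.color == MarkColor.white) {
      // Slow path: shade the value grey so the marker still visits it.
      slowPathHits++;
      value.color = MarkColor.grey;
      greyStack.add(value);
    }
  }
}
```

Most stores leave through the early return or fail the color test, which matches the observation that the slow path is rarely taken.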
It is a little more tedious to reverse due to the amount of code, but still relatively easy given tools that identify object pool entries and call targets.
Conclusion
This was a super fun project and I thoroughly enjoyed picking apart assembly code. I hope this series inspires others to also learn more about compilers and the internals of Dart.
Can someone steal my app?
Technically, it was always possible, given enough time and resources.
In practice, this is not something you should worry about (yet). We are far from having a full decompilation suite that would allow someone to steal an entire app.
Are my tokens and API keys safe?
Nope!
There will never be a way to fully hide secrets in any client-side application. Note that things like the google_maps_flutter API key are not actually private.
If you are currently using hard-coded credentials or tokens for third-party APIs in your app, you should switch to a real backend or cloud function ASAP.
Will obfuscation help?
Yes and no.
Obfuscation will randomize identifier names for things like classes and methods, but it won't prevent us from viewing class structure, library structure, strings, assembly code, etc.
A competent reverse engineer can still look for common patterns like HTTP API layers, state management, and widgets. It is also possible to partially symbolize code that uses publicly available packages. For example, you can build signatures for functions in package:flutter and correlate them with ones in an obfuscated snapshot.
I generally don't recommend obfuscating Flutter apps, because it makes error messages harder to read without doing much for security. You can read more about it here.