Dec 22, 2014

Hardening Android (5 of ???)

Alternative title: Let's take a look under the hood step 1.  What is that noise?

So far in this Hardening Android series we've looked at the Nexus 7 world from the view of an average user. In the first episode we looked at the Legal environment. Then we dove into the setup and found some edges we could trim. Next we explored the Settings. After that we talked a bit about how we would measure hardening and formed a goal. And lastly we tweaked some settings that were clearly in line with our goal. All these are details that your average user could have figured out without much more than a basic grasp of the technology and the English language. The question is, does this really mean our Nexus is hardened and we have achieved the goal?


This is where it gets challenging for security practitioners to justify our work and, more specifically, our time (the analysis below alone took me around 8 hours, including scripting to make it easier next time). Because the average user doesn't have the technical know-how to get under the covers and discover what is really going on, the information we gather will be particularly enlightening for them, if we can explain it successfully. It will be most enlightening to people in other technical, but non-security, roles once they see how “noisy” these mobile platforms are compared to the workstations, servers and other devices IT traditionally manages.


This episode will define a lab and process that can be used to identify what is coming out of the Nexus at layers 4 through 7 in the network stack.

The Lab

First an inventory of the lab I started with.
  • Firewall – pfSense and an Internet connection, so I can run tcpdump to capture all the network traffic coming or going
  • Access Point (AP) – Linksys E2500 configured as just a wireless access point
  • Web proxy – Charles Proxy running on my laptop. I will also install the charles.crt on the Nexus to decrypt the SSL traffic. Charles allows me to see inside the HTTP and HTTPS communications.
  • Proxy configuration on the Nexus – See Advanced tab in the Wifi configuration
  • Packet Analyzer – Wireshark

For the sake of brevity I'm not going to go into how to set all these up. See the links above for references that will get you started as they once did for me.

Gathering the data

Note: If you haven't read anything on malware analysis, you may want to read up before you continue. I am leveraging similar techniques to study the behavior of the system. The few articles I have read about the privacy implications of these devices and some apps make them sound as if they leak information or are monitored the way the bad guys use malware. We'll see whether the critics are right, or whether the traffic is mostly telemetry used to support the end-user as they go about their business. At this point in the research we should keep an open mind, or else we might miss something by leaning too far one way or the other.

The following are the steps I followed to gather an initial view of the network output of the system.
  1. Connect the already installed and updated Nexus to the Wireless AP with a static IP.
  2. Turn off the Nexus and wait several minutes to let any lingering network sessions expire. – Why?: We want to see all the traffic from boot onward
  3. On the firewall, start tcpdump to capture all traffic: tcpdump -w capture.pcap -vv host <Nexus IP>
  4. Turn on the Nexus and don't log in for one day – Why?: Most IT folks want to interact in some way with their platforms at least once in 24 hours so this will tell us what is “normal” (for a little over 24 hours anyway) without any user interaction.
  5. Log into the Nexus and do nothing. – Why?: We want to see what launches as part of our login so we know what parts are refreshed for just-in-time access by the user.
  6. Wait an hour or so. – Why?: We want to give the activities triggered by the login time to complete and the sessions time to expire, so we can be pretty sure we've seen them all and can spot anything that is different due to user interaction.
  7. Log into the Nexus and look for notifications of any activities (e.g. updates) in the upper right corner.
  8. Ctrl-C on the firewall to stop the tcpdump
  9. Transport the capture.pcap to the laptop and open it with Wireshark.
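
Before opening the capture in Wireshark, a quick tally of where the device is talking to can help focus the review. The following is a minimal Python sketch, not the scripting used for this post. It assumes the capture has been printed as text with `tcpdump -nn -r capture.pcap` (numeric addresses and ports) and that the Nexus has the static address 192.168.1.50, which is a made-up value for illustration. The function name `tally_destinations` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port:" part of a tcpdump -nn text line.
FLOW_RE = re.compile(
    r"\bIP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def tally_destinations(lines, device_ip):
    """Count packets sent by device_ip, keyed by (destination IP, port)."""
    counts = Counter()
    for line in lines:
        match = FLOW_RE.search(line)
        if match is None:
            continue  # ARP, ICMP and other lines without ports are skipped
        src_ip, _src_port, dst_ip, dst_port = match.groups()
        if src_ip == device_ip:
            counts[(dst_ip, int(dst_port))] += 1
    return counts


# Hypothetical sample lines in tcpdump's text format (not real capture data).
SAMPLE_LINES = [
    "10:00:00.000001 IP 192.168.1.50.40000 > 173.194.46.102.443: Flags [S], length 0",
    "10:00:00.050000 IP 173.194.46.102.443 > 192.168.1.50.40000: Flags [S.], length 0",
    "10:00:01.000000 IP 192.168.1.50.40001 > 64.233.181.188.5228: Flags [S], length 0",
    "10:00:02.000000 ARP, Request who-has 192.168.1.1 tell 192.168.1.50, length 28",
    "10:00:03.000000 IP 192.168.1.50.40000 > 173.194.46.102.443: Flags [.], length 0",
]

if __name__ == "__main__":
    for (ip, port), n in tally_destinations(SAMPLE_LINES, "192.168.1.50").most_common():
        print(f"{ip}:{port}  {n} packets")
```

Only outbound packets are counted, so the result lists the destinations the device contacted rather than the replies it received.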

Analysis

Besides the common activities (e.g. DNS, NTP, ARP), I found 3 ports in use (80, 443 and 5228) to the following IPs, hostnames and owners.

Target IP addresses and ports


IP | Hostname (reverse lookup using IP) | Port | IP owner per WHOIS
173.194.133.179 | not found | 443 | Google Inc.
173.194.133.230 | not found | 443 | Google Inc.
173.194.46.102 | ord08s13-in-f6.1e100.net | 443 | Google Inc.
173.194.46.106 | ord08s13-in-f10.1e100.net | 443 | Google Inc.
173.194.46.106 | ord08s13-in-f10.1e100.net | 80 | Google Inc.
173.194.46.107 | ord08s13-in-f11.1e100.net | 443 | Google Inc.
173.194.46.107 | ord08s13-in-f11.1e100.net | 80 | Google Inc.
173.194.46.108 | ord08s13-in-f12.1e100.net | 80 | Google Inc.
173.194.46.108 | ord08s13-in-f12.1e100.net | 443 | Google Inc.
173.194.46.64 | ord08s11-in-f0.1e100.net | 443 | Google Inc.
173.194.46.65 | ord08s11-in-f1.1e100.net | 443 | Google Inc.
173.194.46.66 | ord08s11-in-f2.1e100.net | 443 | Google Inc.
173.194.46.67 | ord08s11-in-f3.1e100.net | 443 | Google Inc.
173.194.46.68 | ord08s11-in-f4.1e100.net | 443 | Google Inc.
173.194.46.73 | ord08s11-in-f9.1e100.net | 443 | Google Inc.
173.194.46.74 | ord08s11-in-f10.1e100.net | 80 | Google Inc.
173.194.46.74 | ord08s11-in-f10.1e100.net | 443 | Google Inc.
173.194.46.75 | ord08s11-in-f11.1e100.net | 443 | Google Inc.
173.194.46.75 | ord08s11-in-f11.1e100.net | 80 | Google Inc.
173.194.46.76 | ord08s11-in-f12.1e100.net | 443 | Google Inc.
173.194.46.76 | ord08s11-in-f12.1e100.net | 80 | Google Inc.
173.194.46.84 | ord08s11-in-f20.1e100.net | 443 | Google Inc.
54.230.88.32 | server-54-230-88-32.ind6.r.cloudfront.net | 80 | Amazon.com, Inc. Amazon Technologies Inc.
64.233.181.188 | not found | 5228 | Google Inc.
64.233.181.95 | not found | 443 | Google Inc.
64.233.182.95 | not found | 443 | Google Inc.
74.125.142.95 | ie-in-f95.1e100.net | 443 | Google Inc.
74.125.192.100 | ib-in-f100.1e100.net | 443 | Google Inc.
74.125.192.102 | ib-in-f102.1e100.net | 80 | Google Inc.
74.125.192.136 | ib-in-f136.1e100.net | 443 | Google Inc.
74.125.192.138 | ib-in-f138.1e100.net | 80 | Google Inc.
74.125.192.139 | ib-in-f139.1e100.net | 443 | Google Inc.
74.125.192.139 | ib-in-f139.1e100.net | 80 | Google Inc.
74.125.192.94 | ib-in-f94.1e100.net | 443 | Google Inc.
74.125.192.94 | ib-in-f94.1e100.net | 80 | Google Inc.
74.125.192.95 | ib-in-f95.1e100.net | 443 | Google Inc.
74.125.193.100 | ig-in-f100.1e100.net | 80 | Google Inc.
74.125.193.101 | ig-in-f101.1e100.net | 80 | Google Inc.
74.125.193.136 | ig-in-f136.1e100.net | 443 | Google Inc.
74.125.193.188 | ig-in-f188.1e100.net | 5228 | Google Inc.
74.125.193.95 | ig-in-f95.1e100.net | 443 | Google Inc.
74.125.201.95 | not found | 443 | Google Inc.
74.125.207.101 | not found | 80 | Google Inc.
74.125.207.95 | not found | 443 | Google Inc.
74.125.225.0 | ord08s12-in-f0.1e100.net | 443 | Google Inc.
74.125.225.10 | ord08s12-in-f10.1e100.net | 80 | Google Inc.
74.125.225.10 | ord08s12-in-f10.1e100.net | 443 | Google Inc.
74.125.225.11 | ord08s12-in-f11.1e100.net | 80 | Google Inc.
74.125.225.11 | ord08s12-in-f11.1e100.net | 443 | Google Inc.
74.125.225.12 | ord08s12-in-f12.1e100.net | 443 | Google Inc.
74.125.225.12 | ord08s12-in-f12.1e100.net | 80 | Google Inc.
74.125.225.128 | ord08s09-in-f0.1e100.net | 443 | Google Inc.
74.125.225.129 | ord08s09-in-f1.1e100.net | 443 | Google Inc.
74.125.225.134 | ord08s09-in-f6.1e100.net | 443 | Google Inc.
74.125.225.137 | ord08s09-in-f9.1e100.net | 443 | Google Inc.
74.125.225.138 | ord08s09-in-f10.1e100.net | 80 | Google Inc.
74.125.225.138 | ord08s09-in-f10.1e100.net | 443 | Google Inc.
74.125.225.139 | ord08s09-in-f11.1e100.net | 443 | Google Inc.
74.125.225.139 | ord08s09-in-f11.1e100.net | 80 | Google Inc.
74.125.225.14 | ord08s12-in-f14.1e100.net | 443 | Google Inc.
74.125.225.140 | ord08s09-in-f12.1e100.net | 443 | Google Inc.
74.125.225.140 | ord08s09-in-f12.1e100.net | 80 | Google Inc.
74.125.225.142 | ord08s09-in-f14.1e100.net | 443 | Google Inc.
74.125.225.19 | ord08s12-in-f19.1e100.net | 443 | Google Inc.
74.125.225.2 | ord08s12-in-f2.1e100.net | 443 | Google Inc.
74.125.225.25 | ord08s12-in-f25.1e100.net | 443 | Google Inc.
74.125.225.3 | ord08s12-in-f3.1e100.net | 443 | Google Inc.
74.125.225.37 | ord08s06-in-f5.1e100.net | 443 | Google Inc.
74.125.225.4 | ord08s12-in-f4.1e100.net | 443 | Google Inc.
74.125.225.41 | ord08s06-in-f9.1e100.net | 443 | Google Inc.
74.125.225.42 | ord08s06-in-f10.1e100.net | 443 | Google Inc.
74.125.225.42 | ord08s06-in-f10.1e100.net | 80 | Google Inc.
74.125.225.43 | ord08s06-in-f11.1e100.net | 443 | Google Inc.
74.125.225.44 | ord08s06-in-f12.1e100.net | 443 | Google Inc.
74.125.225.44 | ord08s06-in-f12.1e100.net | 80 | Google Inc.
74.125.225.5 | ord08s12-in-f5.1e100.net | 443 | Google Inc.
74.125.225.8 | ord08s12-in-f8.1e100.net | 443 | Google Inc.
74.125.225.9 | ord08s12-in-f9.1e100.net | 443 | Google Inc.
74.125.69.95 | not found | 443 | Google Inc.
74.125.70.95 | not found | 443 | Google Inc.
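
With this many rows it helps to collapse the list before reading it. The sketch below groups observed (IP, port) pairs by port and by the /24 network each IP falls in. The /24 grouping is purely an assumption made for readability and says nothing about how the owners actually allocate their address space. The function name `group_by_port` is hypothetical, and the sample rows are a handful taken from the table above.

```python
import ipaddress
from collections import defaultdict


def group_by_port(rows):
    """Map each port to the sorted list of /24 networks seen on that port."""
    by_port = defaultdict(set)
    for ip, port in rows:
        # strict=False lets us pass a host address and get its /24 network back
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        by_port[port].add(str(network))
    return {port: sorted(networks) for port, networks in by_port.items()}


# A few (IP, port) rows copied from the table above.
SAMPLE_ROWS = [
    ("64.233.181.188", 5228),
    ("74.125.193.188", 5228),
    ("173.194.46.106", 80),
    ("173.194.46.106", 443),
    ("54.230.88.32", 80),
]

if __name__ == "__main__":
    for port, networks in sorted(group_by_port(SAMPLE_ROWS).items()):
        print(port, ", ".join(networks))
```

Grouped this way, the odd ones out (port 5228 and the single CloudFront address) stand apart from the bulk of the port 80 and 443 traffic.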

What is inside the traffic?


Since most of the communications are to ports 80 and 443, I can use Charles Proxy to examine the contents more closely. I did a little digging and found a way to proxy the 5228 traffic, but I can't do that with just Charles and the cert.

I configured the WiFi interface on the Nexus to use the identity cert Charles provides and pointed it toward my Charles Proxy. Then I booted the Nexus and let it sit for several hours to see what could be captured. I did this a few times, with one rebuild of the Nexus in between, to see what changed and to gather a lot of data for a fuller understanding. Note: During this testing Android 5 came out. I purposefully did not upgrade so I can use the same techniques after the upgrade to discover new features and information exchanges.

Where is it supposed to go?

The hostnames, IPs and ports that were called during the Charles monitoring samples were:


Hostname | IP | Port(s)
(none, called directly by IP) | 216.58.216.192, 74.125.225.99, 74.125.225.68 | 80
android.clients.google.com | 173.194.46.64, 173.194.46.71, 173.194.46.99, 173.194.46.100, 173.194.46.101, 173.194.46.102, 216.58.216.64, 216.58.216.96, 216.58.216.192, 74.125.225.3, 74.125.225.6, 74.125.225.8, 74.125.225.32, 74.125.225.38, 74.125.225.46, 74.125.225.67, 74.125.225.69, 74.125.225.96, 74.125.225.128, 74.125.225.131, 74.125.225.133, 74.125.225.135, 74.125.225.136 | 443
bks0.books.google.com | 173.194.46.96 | 80
bks1.books.google.com | 74.125.225.41 | 80
bks3.books.google.com | 173.194.46.96 | 80
bks6.books.google.com | 74.124.225.14 | 80
bks7.books.google.com | 74.125.225.69 | 80
clients1.google.com | 74.125.225.9, 173.194.46.99 | 80
gllto.glpals.com | 54.210.101.153, 54.192.4.89, 54.230.101.50 | 80
lh3.ggpht.com | 173.194.46.108, 173.194.46.76, 74.125.225.74, 74.125.225.11 | 80, 443
lh4.ggpht.com | 74.125.225.108, 74.125.225.12, 74.125.225.11 | 80, 443
lh5.ggpht.com | 173.194.46.76, 74.125.225.74 | 80, 443
lh5.googleusercontent.com | 74.125.225.42 | 443
lh6.ggpht.com | 216.58.216.65, 74.125.225.12, 74.125.225.74 | 80
play.googleapis.com | 64.233.181.95, 74.125.70.95, 74.125.69.95, 74.125.207.95, 74.125.193.95 | 443
www.google.com | 74.125.225.147, 74.125.225.48, 74.125.225.51, 216.58.216.196, 216.58.216.228, 74.125.225.80, 74.125.225.83, 74.125.225.146 | 443
www.googleadservices.com | 74.125.225.90, 173.194.46.121 | 443
www.googleapis.com | 64.233.181.95, 74.125.193.95, 74.125.69.95, 74.125.207.95, 74.125.142.95, 64.233.183.95 | 443
www.qstatic.com | 74.125.225.95, 173.194.46.87 | 443
www.youtube.com | 74.125.225.142 | 443
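
To see how the hostnames from the proxy runs line up with the raw IPs from the tcpdump capture, the two lists can be cross-referenced. This is a small illustrative sketch rather than the method used here. The function name `ips_seen_in_both` is hypothetical, and the sample data is a small subset copied from the two tables in this post.

```python
def ips_seen_in_both(proxy_hosts, capture_ips):
    """For each hostname, list its IPs that also appear in the packet capture."""
    return {
        host: sorted(set(ips) & capture_ips)
        for host, ips in proxy_hosts.items()
    }


# Subset of the hostname-to-IP table from the Charles sessions.
PROXY_HOSTS = {
    "play.googleapis.com": [
        "64.233.181.95", "74.125.70.95", "74.125.69.95",
        "74.125.207.95", "74.125.193.95",
    ],
    "www.youtube.com": ["74.125.225.142"],
    "bks0.books.google.com": ["173.194.46.96"],
}

# Subset of the destination IPs from the tcpdump capture.
CAPTURE_IPS = {
    "64.233.181.95", "74.125.70.95", "74.125.69.95",
    "74.125.207.95", "74.125.193.95", "74.125.225.142",
}

if __name__ == "__main__":
    for host, shared in ips_seen_in_both(PROXY_HOSTS, CAPTURE_IPS).items():
        print(host, "->", ", ".join(shared) or "no overlap")
```

An empty list for a hostname does not mean much on its own, since the two samples were taken at different times and the services are load balanced across many addresses.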

What is inside the http/https?

What I found in the traffic was very enlightening. It wasn't just the system checking for updates (to keep my information safe from hackers and the system stable) and gathering content to display for me to purchase or view. Uniquely identifying information about my system and me was shared in the process of acquiring content, and also routinely for purposes only Google knows. There was also some authentication and access control information to control what my device could see and do.

The following are a few notes and initial thoughts from quick Google searching. I'm not going to dig much deeper into this until I can get a more complete view of the outbound traffic over the ports.
  • The calls are to only a few hostnames which translate to several of the IPs found in the previous test. This is likely due to the need for high capacity services delivered using load balanced services.
  • Three IPs were called directly without any hostname. These were after a few calls to other sites occurred successfully. The requests to each IP were to a folder called “generate_204”, they had no identifying content besides the user agent and the only response was a 204 No Content page with no cookies or other information.
    • I am guessing this is how they validate Internet connectivity with web services but it seems a waste of cycles since it comes after other calls.
    • I wonder how it figures out which IP to use over time? It wasn't sent in any reply to any of the calls before it so I have to assume it was either delivered earlier in the install process, is an IP gathered through DNS but the URL is only called with the IP, it is hard coded in Android somewhere or ??. I'm leaning towards the first possible but the second seems just as plausible given the high traffic nature of the services involved requiring some flexibility that a hard coded list couldn't assure.
  • Android.clients.google.com seems to have a number of functions:
    • Authentication for Google Play Services ( Is it only for this though?) – A page called auth is used to identify my email and oauth token and receives an Auth cookie and Expiration, what appears to be something related to automatic updates (I'm guessing) and what appears to be an auto update/install control value (vs seeking approval every time because it is configured locally?).
    • Configuration reporting – The page called auth is also sent a file identified as protobuf format. This is streamed via the first call to android.clients.google.com using POST. The request cannot be displayed using the protobuf viewer in Charles because it lacks a reference to the desc in the Content-Type field. Since this is being posted to Google, I expect they already know the type and are able to decipher it. Sometime I will have to look around to see if anyone has figured out the File Descriptor Set to expose the content for human reading so we can know what's going on here.
    • Checkin – This is another unidentified, compressed protobuf file delivery that happens roughly every 2 hours over the course of a day. This always follows the call to www.googleapis.com/androidabuse so it would seem to be a “phone home” type event. The file provides some information that can be seen as English in the body including hardware, build version, event log status, free storage space, latest system update, battery charge, email and a list of processes and some configurables.
    • Auth/recovery – This provides some details about the device (similar to the Auth) and receives some details that would seem to be related to locating the device or activating a remote wipe feature. This was sent once in the 24 hours sampled but it received a time value and random ms value which I'm guessing is a timer value to activate another follow-up.
    • Auth/reauthsettings – This provides the oauth token. Another value called packageSignature sent here is also sent as a value called cert to the c2dm/register3 page but does not appear to be received from anywhere (a checksum of the system or some package perhaps?). The response to the reauthsettings is what appears to be a password and pin status value with an opportunity to reset the pin if needed. Note: I have created a PIN and the status value returned was CONFIGURABLE so my guess is that this is to activate a function that Google can initiate with a specific response to this call. Maybe this is the unlock process for legal warrant situations or basic support?
    • Gsync – There are several pages here that use Atom XML format for feeds. They use my email address and the androidid (also delivered during the auth and fdfe calls) as values to identify me. The following are the “services” that appear to be making the calls based on the content:
      • photosync
      • webupdates – appears to be related to google+ updates
      • Google+ events
      • reminders
      • some service called cl that appears to be related to the calendar
      • credential state
      • sj with value like track-update, later playlist-update, radio-station-update – Must be the music app
      • print – appears to be related to book feeds
      • plusupdates
      • games
      • ears
      • cp – Contacts
      • Notes
      • plusupdates
      • writely
    • c2dm/register3 – This happened at 1:30 am and only once in the 26 hours sampled. It sends a few pieces of information that seem to be unique (the X-GOOG.USER_AID value is the same as the one used by the call to get information on the games.google.com server). The response was only a 184 character token that was never used again. Hmm...encryption key or seed maybe?
    • fdfe – based on what the pages under this directory do, I am guessing this is used by the PlayStore.
      • SelfUpdate – This shares a few unique identifiers (e.g. Authorization, device id, logging id). It also adds several interesting lines to the HTTP headers. The most intriguing are the Enabled and Unsupported Experiments lines, which look like they might be identifying app configurables. The response to this call is very short and hardly meaningful. So this is likely to push some status info to Google, but further experimentation is needed to see changes over time and system configuration before I can get clarity.
      • replicateLibrary – This shares the same unique identifiers and Experiments info as above. Interestingly it also shares a 343 character nonce which appears to be sent only once in the sample. The nonce value was different for the two times this feature was called (across the 2 samples). Because of the use of “nonce”, I'm guessing some encryption is involved here. The response includes a signature so I'm guessing the reply is some value that has to be validated as accurate by the tablet using the nonce (and some pre-shared key established during initial setup?).
      • bulkDetails – This is a Protobuf formatted file that is delivered using the stored auth value. It also delivers the same Experiments information in the header, userAgent and other similar details as the SelfUpdate and replicateLibrary. The file contains a list of the applications I have installed. The reply is a large file in x-gzip containing what appear to be the privileges and pictures related to the apps so I suspect this is retrieving the most current version and information to display in the PlayStore when I review any Updates or look around for new apps, music, etc. to buy in the PlayStore.
  • Key conclusions from review of the calls to the android.clients.google.com are:
    • the lh#.ggpht.com and lh#.googleusercontent.com URLs are used for pictures displayed in various apps.
    • The features provided by android.clients.google.com deliver the core communication and control capabilities of the device.
  • The bks#.books.google.com URLs are all delivering images presumably for the book reader.
  • The clients#.google.com URL performs a function identified with the word “ping”. For those who don't know, this is a network troubleshooting tool used to confirm a system is available on the network.
    • The request header provides the brand identifier.
    • This feature occurred once after boot and was called again after I logged into the device.
    • The reply includes a value for a variable called crc32. CRC32 is a checksum (gzip uses it to verify the integrity of the decompressed data), not a compression or encryption mechanism, so this is probably an integrity check on some payload. Further investigation would be needed to see if this changes (likely) and to see if I can figure out what feature collects it and where it gets stored.
  • The gllto.glpals.com URL is sent a file called lto.dat. The header references some details containing the sequence wap and a reference to the openmobilealliance.org site. A little browsing taught me that that site is for a group of entities that define shared standards for communication of mobile data via interoperable technologies. I will need to look for more about this, as there isn't any apparent exchange of personally identifiable information but the file is binary with few ASCII details. It must contain some re-usable information though, as the max-age for the data in cache is 90 and the service is delivered by a load balancer from CloudFront.
  • Play.googleapis.com is sent a log in several small chunks. I suspect this is from logcat but I will have to connect to the device via the debug bridge to know for sure. Some details to note:
    • It is an authenticated session so this tells me that Google knows at least some of what is going on in my system.
    • It shares details about my build, hardware and software versions.
    • There are bursts of details from various apps and then repeating (on 1-2 hour cycle) deliveries of logs from GmsHttp2.
    • At 1:45am (about 12 hours after I first booted) a series of logs from differing apps (about 4) was sent.
    • At 3:57 am it settled into a pattern of sending details only for the GmsHttp2 and apps.plus2 apps that occurred roughly every 2 hours.
    • When I logged in the first time since the reboot it sent another burst of logs to apps.plus2, GmsHttp2 and gms.games.background.
  • The www.google.com URL is called by two apps.
    • Google Search (identified by the v value in the query) – This shares my timezone (via ctzn), language preference (via hl value) brand of hardware (via rlz value), model of hardware, and other software and software version information (via the UserAgent). There are also a few values that are interesting and beg for more research.
      • gcc – Maybe this is identifying the preferred language (en) of a compiler?
      • Cookie called SSID – Possibly a conversion of my SSID to assist in locating me or otherwise uniquely identifying me? It doesn't look like a hash or encoding so will be hard to confirm. I'll have to play with this to see if it changes over installs and connected SSIDs.
      • There are a number of short, unique (compared to calls to other sites), but repeating cookie values in this query that are concatenated into a single cookie called PREF. The first value is called ID that suggest some uniquely identifying information or at least session information used to maintain my identity over subsequent events.
    • www.googleapis.com seems to deliver a number of services as you could imagine. There are several folder structures used.
      • There are folders indicating media related connections
        • books – receives my authorization cookie, presumably to retrieve my purchased books.
        • android_video – Gets model and build info but uses a long sequence for authentication called “Bearer.Authentication” which may be used to identify the session but it isn't used anywhere else so it is hard to say whether it is used to uniquely identify me relative to other details sent earlier or after.
        • Youtubei – Uses a key that was communicated to a page under a directory called deviceregistration in the www.googleapis.com space. The page under the youtubei space is called config and tells them about my hardware and android version.
      • The deviceregistration folder contains a file called devices which provides the unique key used for the youtubei call along with my Device ID, so I'm guessing they would be able to know who I am when I watch YouTube videos even if I don't log in. I will have to try logging out first, though, as I'm pretty sure YouTube knows me by my email after registration.
      • The plusi folder contains a number of pages which indicate some exchanges around the experiments, settings, items, people views, user highlights and logs. Some of these are called repeatedly throughout the day.
        • All the calls use the same BearerAuthentication string which is not received from any calls. Further investigation is needed to see if this is created or delivered from google earlier in the lifecycle of the device.
        • The getmobileexperimentsbackground call sends the most information but it is only in short strings of alpha-numeric characters in protobuf format so its meaning isn't clear.
        • There is a value in the Request to getuseritemsbackground that is used across multiple calls, including logs sent to play.googleapis.com and other Requests from the plusi folder. Based on the lessons learned so far, this suggests that the same app may be involved or that this is some identifier of a version.
        • A long string delivered in the getuseritemsbackground and getuseritemsdeltabackground calls is used in a call to batch in the root of the www.googleapis.com space.
        • A call is made to getappupgradestatus at 1:30am and 1:28pm – This is the second instance of this specific time of day so I will take a closer look at the sequence of events at that time to see if this is the admin's checkin cycle.
      • Plus – The folders under here have “whitelisted” in the name so presumably this has something to do with who I engage with via the Google+, Hangouts and other apps that care about who my Google+ contacts are and who I don't want to talk to.
    • YouTube seems to be the app that uses the googleadservices.com URL. The HTTPS Request identifies a timestamp, OS version and other version information but doesn't appear to include any uniquely identifying information in isolation.
    • Youtube
      • The registerDevice page call uses a serialNumber value that isn't the same as my advertising serial # or the serial # of my device. Will have to see if this changes once I register as another userID to know if this is something related to my device or not.
      • The gen204 page is a really cool way of delivering information about me. Since it uses a GET, I wonder if that means they are parsing logs or if they just use some filter on the http listener. I also have to wonder if those cookies in the 204 No Content reply are something useful. They don't show up in any other packets but in the capture. Maybe I'll see more once I start using the features.
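
Several of the calls above carry protobuf bodies that Charles could not display without a descriptor. Even without the File Descriptor Set, the protobuf wire format can be walked generically to recover field numbers and raw values, though not field names or meanings. The following is a minimal sketch under stated assumptions: it expects the already decompressed body bytes, it handles only wire types 0, 1, 2 and 5, and the names `read_varint` and `decode_fields` are my own. The sample bytes are the well-known examples from the protobuf encoding documentation, not data from these captures.

```python
def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_fields(data):
    """Return a list of (field number, wire type, raw value) from a message."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


if __name__ == "__main__":
    print(decode_fields(bytes.fromhex("089601")))  # field 1 as a varint
    print(decode_fields(b"\x12\x07testing"))       # field 2 as a byte string
```

A length-delimited value may itself be a nested message, so calling `decode_fields` again on such a value is a reasonable next step when the bytes do not look like text.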
That was a long one so I think I'll wrap up here for today.

Conclusion

So what did we discover? Apart from the NTP call and the lto.dat exchange with gllto.glpals.com (served from Amazon CloudFront), the device only sent information to Google-owned hosts and IP ranges. The calls range in purpose from telemetry to content gathering. The use of my specifically identifying information appears to be related to sessions involving my unique configurable identity (e.g. email) and the specific device.


Next we'll want to take a look at the traffic that could come inbound without us or the device contacting anyone. Then we can dive into looking through the system from the inside.

