(All testing was performed using WhatsApp for Android v2.20.201.20 and WhatsApp Web)
Introduction
Recently, we have been looking into possible security issues in how WhatsApp parses and displays preview information for hyperlinks. When a message contains a link, WhatsApp extracts some basic information about the target page and displays it within the body of the chat. Based on our analysis, this information appears to be parsed from various elements in the page's HTML. For Google, it looks like this:


Parsing code
The preview appears to be parsed from various meta tags in the page's HTML, as per the code snippet below. If those are not present, the “title” tag is used instead. Here is some of the parsing code:


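To make the fallback behavior described above concrete, here is a minimal Python sketch of the same idea. It is not the WhatsApp code. The specific tags it reads (Open Graph `og:title` and `og:description`, plus the plain `description` meta tag) are our assumption, and the names `PreviewParser` and `build_preview` are hypothetical.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects meta tag values and the <title> text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta key (property or name) -> content
        self.title = ""         # text found inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Assumption: previews use Open Graph style keys such as og:title.
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key.lower(), content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_preview(html):
    """Return a preview dict, preferring meta tags and falling back to <title>."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    title = parser.meta.get("og:title") or parser.title.strip()
    description = (parser.meta.get("og:description")
                   or parser.meta.get("description", ""))
    return {"title": title, "description": description}
```

Calling `build_preview` on a page that has both an `og:title` meta tag and a `title` tag returns the meta value. Removing the meta tag makes it fall back to the `title` text.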
Additional Details on HTML Retrieval
From testing and a review of the logs, it appears that the request to retrieve the site is made by the Android client. There are also some additional points of interest:
- The retrieval is cached on the client
- If WhatsApp Web is used, the retrieval still happens on the mobile phone, and the parsed results are transferred to the Web version
- If a link is forwarded or posted into a chat or group, no additional retrieval happens. Instead, the parsed preview is transmitted along with the link
Here is the actual snippet of decompiled code doing the retrieval:

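The behavior observed above (client-side caching, and previews travelling with the link instead of being fetched again) can be modeled roughly as follows. This is a toy Python sketch, not the decompiled WhatsApp code. Every name in it (`Preview`, `Message`, `PreviewCache`, `compose`, `forward`) is hypothetical, and it assumes the cache is keyed by URL, which we have not confirmed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Preview:
    title: str
    description: str


@dataclass(frozen=True)
class Message:
    url: str
    preview: Optional[Preview]  # the parsed preview travels with the link


class PreviewCache:
    """Client-side cache: each URL is fetched and parsed at most once."""

    def __init__(self, fetch: Callable[[str], Preview]):
        self._fetch = fetch                    # performs the actual retrieval
        self._cache: Dict[str, Preview] = {}   # assumption: keyed by URL
        self.fetch_count = 0                   # instrumentation for this sketch

    def get(self, url: str) -> Preview:
        if url not in self._cache:
            self.fetch_count += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def compose(url: str, cache: PreviewCache) -> Message:
    """The sending client retrieves (or reuses) the preview and attaches it."""
    return Message(url, cache.get(url))


def forward(message: Message) -> Message:
    """Forwarding copies the attached preview; no retrieval takes place."""
    return Message(message.url, message.preview)
```

In this model, composing the same link twice triggers a single fetch, and a forwarded message carries whatever preview the original sender produced. That last property is why the parsing and retrieval path on the sending device is the interesting one to examine.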
Future Areas for Research
We plan to research the parsing and retrieval of the HTML, with an eye toward determining whether any of the parsing code can be manipulated to inject content into the client or the Web version. For media such as images and videos, there is also potential for exploiting the underlying native code.