How Cert Spotter Parses 255 Million Certificates
When Cert Spotter scans Certificate Transparency logs to find SSL certificates for your domain, it has one job: do not miss certificates. In a perfect world, this would be easy. Cert Spotter is written in Go, and Go's standard library has a delightfully easy-to-use certificate parser that extracts all the pertinent information about a certificate: domain names, public key, expiration date, and so on.
Unfortunately, the real world is not so easy. Although there are standards that say how a certificate should be encoded, the standards are complex and hard to follow, and even when they are crystal clear, certificate authorities still find a way to violate them. Consequently, Certificate Transparency logs are rife with nonconformant certificates. Since Go's certificate parser strictly enforces standards conformance, it cannot parse many of these certificates. If Cert Spotter used Go's certificate parser, it would miss certificates. Ideally, these certificates would be harmless because TLS clients would also reject them, but in practice many TLS clients have very permissive certificate parsers. Therefore, Cert Spotter needs to be at least as permissive as the most permissive TLS client imaginable.
After ruling out Go's standard parser, we tried using a fork of Go's parser from Google's Certificate Transparency library, which treats several of the most common encoding errors as warnings. Unfortunately, this wasn't enough and some certificates still failed to parse. We considered special-casing additional errors until every logged certificate would parse, but this seemed brittle and prone to failure under an adversarial model: an attacker who compromised a certificate authority could issue a certificate with an encoding error that Cert Spotter had not yet special-cased.
Instead, we wrote a custom, extra-lenient certificate parser using Go's excellent ASN.1 library. (ASN.1 is the syntax used to represent certificates.) Here are some of the techniques we use to make sure Cert Spotter is able to parse every single certificate ever logged to Certificate Transparency logs - all 255 million of them.
Lazy Parsing
A certificate contains many bits of information, but not all of them are equally important. The most important information for a monitor is the domain name, since the domain name tells us who should know about a certificate. Even if Cert Spotter can't understand the rest of the certificate, we want to at least alert the domain owner so they can take the appropriate action. Therefore, Cert Spotter uses lazy parsing, and parses different parts of the certificate separately and only as necessary, so that a problem in one part of the certificate doesn't affect the ability to parse the rest of the certificate.
Go's ASN.1 library makes lazy parsing easy thanks to its asn1.RawValue data type, which allows part of a data structure to remain unparsed. Cert Spotter uses asn1.RawValue when parsing the overall certificate structure, and then separately parses the individual RawValues. Below is the definition of the TBSCertificate struct used by Go's parser followed by the equivalent struct used by Cert Spotter. Both structs have the same fields, but where Go's parser uses actual types that must all be parsed successfully for the certificate to parse, Cert Spotter uses asn1.RawValues.
Go Parser Struct:
type tbsCertificate struct {
    Raw                asn1.RawContent
    Version            int `asn1:"optional,explicit,default:0,tag:0"`
    SerialNumber       *big.Int
    SignatureAlgorithm pkix.AlgorithmIdentifier
    Issuer             asn1.RawValue
    Validity           validity
    Subject            asn1.RawValue
    PublicKey          publicKeyInfo
    UniqueId           asn1.BitString   `asn1:"optional,tag:1"`
    SubjectUniqueId    asn1.BitString   `asn1:"optional,tag:2"`
    Extensions         []pkix.Extension `asn1:"optional,explicit,tag:3"`
}
Cert Spotter Struct:
type TBSCertificate struct {
    Raw                asn1.RawContent
    Version            int `asn1:"optional,explicit,default:1,tag:0"`
    SerialNumber       asn1.RawValue
    SignatureAlgorithm asn1.RawValue
    Issuer             asn1.RawValue
    Validity           asn1.RawValue
    Subject            asn1.RawValue
    PublicKey          asn1.RawValue
    UniqueId           asn1.BitString `asn1:"optional,tag:1"`
    SubjectUniqueId    asn1.BitString `asn1:"optional,tag:2"`
    Extensions         []Extension    `asn1:"optional,explicit,tag:3"`
}
Once the overall structure is parsed, Cert Spotter can parse individual RawValues by accessing their FullBytes member like so:
serialNumber := big.NewInt(0)
asn1.Unmarshal(tbs.SerialNumber.FullBytes, &serialNumber)
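To make the benefit concrete, here is a minimal sketch of the same idea on a toy two-field structure rather than a real certificate. The strictPair and lazyPair types, the sampleDER input, and the function names are all hypothetical and are not taken from Cert Spotter. The input's integer carries a redundant leading zero byte, which current versions of Go's encoding/asn1 reject as not minimally encoded, so the eager parse should fail while the lazy parse still recovers the name.

```go
package main

import (
	"encoding/asn1"
	"fmt"
	"math/big"
)

// strictPair (hypothetical) decodes both fields up front, so one bad field
// fails the whole parse.
type strictPair struct {
	Serial *big.Int
	Name   string
}

// lazyPair (hypothetical) only records the raw bytes of each field.
type lazyPair struct {
	Serial asn1.RawValue
	Name   asn1.RawValue
}

// sampleDER returns SEQUENCE { INTEGER 00 01, UTF8String "example.com" }.
// The integer has a redundant leading zero byte, which strict DER forbids.
func sampleDER() []byte {
	header := []byte{0x30, 0x11, 0x02, 0x02, 0x00, 0x01, 0x0C, 0x0B}
	return append(header, "example.com"...)
}

// parseStrict returns the error (if any) from decoding every field eagerly.
func parseStrict(der []byte) error {
	var p strictPair
	_, err := asn1.Unmarshal(der, &p)
	return err
}

// parseNameLazily parses the outer structure, then decodes only the Name
// field. The malformed Serial field is never decoded.
func parseNameLazily(der []byte) (string, error) {
	var p lazyPair
	if _, err := asn1.Unmarshal(der, &p); err != nil {
		return "", err
	}
	var name string
	if _, err := asn1.Unmarshal(p.Name.FullBytes, &name); err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	der := sampleDER()
	fmt.Println("strict parse error:", parseStrict(der))
	name, err := parseNameLazily(der)
	fmt.Println("lazy name:", name, "error:", err)
}
```

The point of the sketch is that the error is confined to the field that caused it: a caller who only needs the name never has to touch the serial number.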
Cert Spotter also uses lazy parsing when parsing the certificate subject. The subject is a sequence of attributes such as organization name, country code, and common name. Cert Spotter only cares about the common name, since it might contain a domain name (this is deprecated, but some TLS clients still support it so Cert Spotter needs to understand it). Cert Spotter uses asn1.RawValue for the attribute values and only bothers to parse the RawValue if the attribute type is common name.
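The subject handling described above might look roughly like the following sketch. The lazyAttribute, lazyAttributeSET, and lazyRDNSequence types and the commonNames and sampleSubject functions are hypothetical names chosen for illustration, not Cert Spotter's own. The sample subject deliberately contains an organization attribute holding invalid UTF-8, which a parser that decodes every value would be expected to reject. The common name here is decoded with Go's strict string handling for brevity.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

var (
	oidCommonName   = asn1.ObjectIdentifier{2, 5, 4, 3}
	oidOrganization = asn1.ObjectIdentifier{2, 5, 4, 10}
)

// lazyAttribute keeps the attribute value unparsed.
type lazyAttribute struct {
	Type  asn1.ObjectIdentifier
	Value asn1.RawValue
}

// The "SET" suffix tells encoding/asn1 to expect an ASN.1 SET here.
type lazyAttributeSET []lazyAttribute

type lazyRDNSequence []lazyAttributeSET

// commonNames decodes only the attribute values whose type is common name.
func commonNames(subjectDER []byte) ([]string, error) {
	var rdns lazyRDNSequence
	if _, err := asn1.Unmarshal(subjectDER, &rdns); err != nil {
		return nil, err
	}
	var names []string
	for _, set := range rdns {
		for _, attr := range set {
			if !attr.Type.Equal(oidCommonName) {
				continue // other attribute values are never decoded
			}
			var cn string
			if _, err := asn1.Unmarshal(attr.Value.FullBytes, &cn); err != nil {
				return nil, err
			}
			names = append(names, cn)
		}
	}
	return names, nil
}

// sampleSubject builds a subject whose organization value is a UTF8String
// containing the invalid byte 0xff, followed by a well-formed common name.
func sampleSubject() ([]byte, error) {
	return asn1.Marshal(pkix.RDNSequence{
		{{Type: oidOrganization, Value: asn1.RawValue{Tag: asn1.TagUTF8String, Bytes: []byte{0xff}}}},
		{{Type: oidCommonName, Value: "www.example.com"}},
	})
}

func main() {
	der, err := sampleSubject()
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	names, err := commonNames(der)
	fmt.Println("common names:", names, "error:", err)
}
```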
Lax String Decoding
ASN.1 defines a multitude of string types, any of which might be used to encode a certificate's common name. Two string types in particular, PrintableString and IA5String, cause problems. PrintableString is a subset of ASCII that forbids some characters, such as the asterisk. CAs routinely include these characters, particularly for wildcard certificates. IA5String is ASCII, but CAs routinely include Latin-1 characters. Therefore, Cert Spotter treats both PrintableString and IA5String as Latin-1 and doesn't care if the string contains forbidden characters.
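Decoding as Latin-1 is simple because every byte maps directly to the Unicode code point with the same value, so no input can be invalid. The following is a minimal sketch of that approach. The decodeLatin1 and laxString functions are hypothetical, and a caller would be assumed to pass in the Tag and Bytes of an asn1.RawValue.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// decodeLatin1 maps each byte to the Unicode code point with the same value.
// It cannot fail, whatever bytes the CA put in the string.
func decodeLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// laxString decodes PrintableString and IA5String contents as Latin-1 without
// checking for forbidden characters. It reports false for any other tag,
// which this sketch leaves to the caller.
func laxString(tag int, content []byte) (string, bool) {
	switch tag {
	case asn1.TagPrintableString, asn1.TagIA5String:
		return decodeLatin1(content), true
	}
	return "", false
}

func main() {
	// An asterisk is not allowed in a PrintableString, but is accepted here.
	s, ok := laxString(asn1.TagPrintableString, []byte("*.example.com"))
	fmt.Println(s, ok)

	// 0xE9 is not ASCII, but as Latin-1 it is a lowercase e with acute accent.
	s, ok = laxString(asn1.TagIA5String, []byte{'c', 'a', 'f', 0xE9})
	fmt.Println(s, ok)
}
```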
IP Addresses in DNS Names
Although there are separate Subject Alternative Name (SAN) types for DNS names and IP addresses, Windows used to lack support for IP address SANs and would instead interpret a string-encoded IP address in a DNS SAN as an IP address. Therefore, Cert Spotter checks if a DNS SAN could be interpreted as an IP address, and treats it as one if so.
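A check along these lines can be sketched with net.ParseIP from Go's standard library. This is an assumption about how such a check could be written, not a description of Cert Spotter's code, and ipFromDNSName is a hypothetical helper. Note that net.ParseIP only accepts standard textual forms, so a client that accepts more exotic IP spellings would need a more permissive check.

```go
package main

import (
	"fmt"
	"net"
)

// ipFromDNSName reports whether a DNS SAN value can be read as an IP address,
// and if so returns its canonical text form.
func ipFromDNSName(name string) (string, bool) {
	ip := net.ParseIP(name)
	if ip == nil {
		return "", false
	}
	return ip.String(), true
}

func main() {
	for _, name := range []string{"192.0.2.10", "2001:db8::1", "www.example.com"} {
		ip, ok := ipFromDNSName(name)
		fmt.Printf("%q -> ip=%q isIP=%v\n", name, ip, ok)
	}
}
```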
NUL Bytes in DNS Names
In 2009, Moxie Marlinspike was able to get a certificate for the DNS name www.paypal.com<NUL>.secureconnection.cc. Since Marlinspike owned the domain secureconnection.cc, he was able to authorize the certificate's issuance. However, due to the NUL byte in the middle of the DNS name, some TLS clients thought the certificate was for www.paypal.com instead, allowing Marlinspike to impersonate PayPal to those clients.
When Cert Spotter sees a DNS name containing a NUL byte, it treats the certificate as being valid for not only the complete DNS name, but also the DNS name before the NUL byte. For www.paypal.com<NUL>.secureconnection.cc, Cert Spotter would alert the owners of both paypal.com and secureconnection.cc.
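The rule above can be sketched in a few lines. The namesForNUL function is a hypothetical helper, and it assumes that only the portion before the first NUL byte needs to be added.

```go
package main

import (
	"fmt"
	"strings"
)

// namesForNUL returns every name the certificate should be treated as valid
// for: the complete DNS name, plus the part before the first NUL byte if the
// name contains one.
func namesForNUL(dnsName string) []string {
	names := []string{dnsName}
	if i := strings.IndexByte(dnsName, 0); i >= 0 {
		names = append(names, dnsName[:i])
	}
	return names
}

func main() {
	for _, n := range namesForNUL("www.paypal.com\x00.secureconnection.cc") {
		fmt.Printf("%q\n", n)
	}
}
```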
URLs in DNS Names
Incredibly, certificate authorities have put entire URLs in a certificate's DNS name. Although such DNS names don't work in TLS clients, the owner of the domain in the URL should still know about the certificate, as it may be evidence of a larger attack against the domain. Therefore, Cert Spotter parses URLs, extracts the domain name from them, and notifies the domain owner.
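One way to approximate this is with Go's net/url package, as in the sketch below. The hostFromURLName helper is hypothetical, and treating any value containing "://" as a candidate URL is an assumption made for this example. The real heuristics may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostFromURLName extracts the host from a DNS name that is really a URL.
// It reports false if the value does not look like a URL or has no host.
func hostFromURLName(name string) (string, bool) {
	if !strings.Contains(name, "://") {
		return "", false
	}
	u, err := url.Parse(name)
	if err != nil || u.Hostname() == "" {
		return "", false
	}
	return u.Hostname(), true
}

func main() {
	for _, name := range []string{
		"https://www.example.com/login",
		"http://www.example.com:8443/",
		"www.example.com",
	} {
		host, ok := hostFromURLName(name)
		fmt.Printf("%q -> host=%q isURL=%v\n", name, host, ok)
	}
}
```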
Duplicate Extensions
Certificate authorities have sometimes issued certificates containing more than one extension of the same type. If a certificate includes more than one subject alternative name extension, Cert Spotter extracts domain names from all of them. A naive parser might only use the first or last SAN extension, which would miss critical information.
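The sketch below illustrates collecting DNS names from every subject alternative name extension instead of stopping at the first match. All function names are hypothetical, pkix.Extension stands in for whatever extension type the parser uses, and only dNSName entries (context-specific tag 2) are handled.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

var oidSubjectAltName = asn1.ObjectIdentifier{2, 5, 29, 17}

// tagDNSName is the context-specific tag of a dNSName entry in a SAN.
const tagDNSName = 2

// dnsNamesFromAllSANs walks every extension and gathers DNS names from each
// SAN extension it finds, not just the first or last one.
func dnsNamesFromAllSANs(exts []pkix.Extension) ([]string, error) {
	var names []string
	for _, ext := range exts {
		if !ext.Id.Equal(oidSubjectAltName) {
			continue
		}
		var seq asn1.RawValue
		if _, err := asn1.Unmarshal(ext.Value, &seq); err != nil {
			return nil, err
		}
		rest := seq.Bytes
		for len(rest) > 0 {
			var gn asn1.RawValue
			var err error
			if rest, err = asn1.Unmarshal(rest, &gn); err != nil {
				return nil, err
			}
			if gn.Class == asn1.ClassContextSpecific && gn.Tag == tagDNSName {
				names = append(names, string(gn.Bytes))
			}
		}
	}
	return names, nil
}

// sanExtension builds a SAN extension containing the given DNS names.
func sanExtension(dnsNames ...string) (pkix.Extension, error) {
	var raw []asn1.RawValue
	for _, n := range dnsNames {
		raw = append(raw, asn1.RawValue{Class: asn1.ClassContextSpecific, Tag: tagDNSName, Bytes: []byte(n)})
	}
	der, err := asn1.Marshal(raw)
	return pkix.Extension{Id: oidSubjectAltName, Value: der}, err
}

// duplicateSANExample simulates a certificate with two SAN extensions.
func duplicateSANExample() ([]string, error) {
	first, err := sanExtension("a.example.com")
	if err != nil {
		return nil, err
	}
	second, err := sanExtension("b.example.com")
	if err != nil {
		return nil, err
	}
	return dnsNamesFromAllSANs([]pkix.Extension{first, second})
}

func main() {
	names, err := duplicateSANExample()
	fmt.Println(names, err)
}
```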
Ongoing Monitoring
Although Cert Spotter is able to parse all 255 million logged certificates, and has tried to anticipate all types of encoding errors, a new certificate may appear that Cert Spotter cannot parse. SSLMate continuously monitors for unparseable certificates, and should one be discovered (which hasn't happened yet), we'll fix Cert Spotter so it can parse the certificate, and then reprocess the certificate so the domain owners are notified.
Conclusion
When picking a Certificate Transparency monitor to track your domain's SSL certificates, it's important to pick a monitor that won't miss certificates, even those that are very badly encoded. A monitor that uses a standard certificate parser will miss malformed certificates, even though these certificates pose a risk to your infrastructure. Cert Spotter's custom, extra permissive certificate parser is carefully written to catch every certificate, so you'll be notified when you need to be.