How Cert Spotter Parses 255 Million Certificates

When Cert Spotter scans Certificate Transparency logs to find SSL certificates for your domain, it has one job: do not miss certificates. In a perfect world, this would be easy. Cert Spotter is written in Go, and Go's standard library has a delightfully easy-to-use certificate parser that extracts all the pertinent information about a certificate: domain names, public key, expiration date, and so on.

Unfortunately, the real world is not so easy. Although there are standards that say how a certificate should be encoded, the standards are complex and hard to follow, and even when they are crystal clear, certificate authorities still find a way to violate them. Consequently, Certificate Transparency logs are rife with nonconformant certificates. Since Go's certificate parser is strict about standards conformance, it cannot parse many of these certificates. If Cert Spotter used Go's certificate parser, it would miss certificates. Ideally, these certificates would be harmless because TLS clients would also reject them, but in practice many TLS clients have very permissive certificate parsers. Therefore, Cert Spotter needs to be at least as permissive as the most permissive TLS client imaginable.

After ruling out Go's standard parser, we tried a fork of Go's parser from Google's Certificate Transparency library, which treats several of the most common encoding errors as warnings. Unfortunately, this wasn't enough, and some certificates still failed to parse. We considered special-casing additional errors until every logged certificate would parse, but this seemed brittle and prone to failure under an adversarial model: an attacker who compromised a certificate authority could issue a certificate with an encoding error that Cert Spotter had not yet special-cased.

Instead, we wrote a custom, extra-lenient certificate parser using Go's excellent ASN.1 library. (ASN.1 is the syntax used to represent certificates.) Here are some of the techniques we use to make sure Cert Spotter is able to parse every single certificate ever logged to Certificate Transparency logs: all 255 million of them.

Lazy Parsing

A certificate contains many bits of information, but not all of them are equally important. The most important information for a monitor is the domain name, since the domain name tells us who should know about a certificate. Even if Cert Spotter can't understand the rest of the certificate, we want to at least alert the domain owner so they can take the appropriate action. Therefore, Cert Spotter uses lazy parsing: it parses different parts of the certificate separately and only as necessary, so that a problem in one part of the certificate doesn't prevent it from parsing the rest.

Go's ASN.1 library makes lazy parsing easy thanks to its asn1.RawValue data type, which allows part of a data structure to remain unparsed. Cert Spotter uses asn1.RawValue when parsing the overall certificate structure, and then separately parses the individual RawValues. Below is the definition of the TBSCertificate struct used by Go's parser followed by the equivalent struct used by Cert Spotter. Both structs have the same fields, but where Go's parser uses actual types that must all be parsed successfully for the certificate to parse, Cert Spotter uses asn1.RawValues.

Go Parser Struct:
type tbsCertificate struct {
        Raw                asn1.RawContent
        Version            int `asn1:"optional,explicit,default:0,tag:0"`
        SerialNumber       *big.Int
        SignatureAlgorithm pkix.AlgorithmIdentifier
        Issuer             asn1.RawValue
        Validity           validity
        Subject            asn1.RawValue
        PublicKey          publicKeyInfo
        UniqueId           asn1.BitString   `asn1:"optional,tag:1"`
        SubjectUniqueId    asn1.BitString   `asn1:"optional,tag:2"`
        Extensions         []pkix.Extension `asn1:"optional,explicit,tag:3"`
}

Cert Spotter Struct:
type TBSCertificate struct {
	Raw                asn1.RawContent
	Version            int `asn1:"optional,explicit,default:1,tag:0"`
	SerialNumber       asn1.RawValue
	SignatureAlgorithm asn1.RawValue
	Issuer             asn1.RawValue
	Validity           asn1.RawValue
	Subject            asn1.RawValue
	PublicKey          asn1.RawValue
	UniqueId           asn1.BitString `asn1:"optional,tag:1"`
	SubjectUniqueId    asn1.BitString `asn1:"optional,tag:2"`
	Extensions         []Extension    `asn1:"optional,explicit,tag:3"`
}

Once the overall structure is parsed, Cert Spotter can parse individual RawValues by accessing their FullBytes member like so:

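serialNumber := big.NewInt(0)
if _, err := asn1.Unmarshal(tbs.SerialNumber.FullBytes, &serialNumber); err != nil {
	// Only the serial number failed to parse; the other fields are unaffected.
}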
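To make the deferred-parsing idea concrete, here is a minimal, self-contained sketch. It does not use Cert Spotter's real types: strictRecord, lazyRecord, sampleDER, and the helper functions are hypothetical names invented for illustration. The sample input is a two-field structure whose second field has the wrong type, standing in for a malformed certificate field.

```go
package main

import (
	"encoding/asn1"
	"errors"
	"fmt"
	"math/big"
)

// strictRecord (hypothetical) requires every field to decode successfully.
type strictRecord struct {
	Serial *big.Int
	Label  string
}

// lazyRecord (hypothetical) keeps each field as undecoded bytes.
type lazyRecord struct {
	Serial asn1.RawValue
	Label  asn1.RawValue
}

// sampleDER is SEQUENCE { INTEGER 4096, BOOLEAN TRUE }. The BOOLEAN sits
// where strictRecord expects a string, standing in for a malformed field.
var sampleDER = []byte{0x30, 0x07, 0x02, 0x02, 0x10, 0x00, 0x01, 0x01, 0xff}

// parseStrict decodes every field at once and fails if any field is bad.
func parseStrict(der []byte) error {
	var rec strictRecord
	_, err := asn1.Unmarshal(der, &rec)
	return err
}

// parseLazy decodes only the outer structure; fields stay as raw bytes.
func parseLazy(der []byte) (lazyRecord, error) {
	var rec lazyRecord
	rest, err := asn1.Unmarshal(der, &rec)
	if err != nil {
		return rec, err
	}
	if len(rest) != 0 {
		return rec, errors.New("trailing data after record")
	}
	return rec, nil
}

// parseSerial decodes the Serial field on demand, independently of Label.
func parseSerial(rec lazyRecord) (*big.Int, error) {
	serial := new(big.Int)
	if _, err := asn1.Unmarshal(rec.Serial.FullBytes, &serial); err != nil {
		return nil, err
	}
	return serial, nil
}

func main() {
	fmt.Println("strict parse error:", parseStrict(sampleDER))

	rec, err := parseLazy(sampleDER)
	if err != nil {
		fmt.Println("lazy parse error:", err)
		return
	}
	serial, err := parseSerial(rec)
	fmt.Println("serial:", serial, "error:", err)
}
```

The strict parse fails as a whole because of the one bad field, while the lazy parse still recovers the serial number.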

Cert Spotter also uses lazy parsing when parsing the certificate subject. The subject is a sequence of attributes such as organization name, country code, and common name. Cert Spotter only cares about the common name, since it might contain a domain name (this is deprecated, but some TLS clients still support it so Cert Spotter needs to understand it). Cert Spotter uses asn1.RawValue for the attribute values and only bothers to parse the RawValue if the attribute type is common name.
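A rough sketch of what lazy subject parsing could look like follows. The type and function names (lazyAttribute, lazyRDNSET, commonNames, sampleSubject) are assumptions made for this example, not Cert Spotter's actual code, and the value bytes are converted to a string naively here.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

// oidCommonName is the object identifier of the commonName attribute (2.5.4.3).
var oidCommonName = asn1.ObjectIdentifier{2, 5, 4, 3}

// lazyAttribute (hypothetical) leaves the attribute value undecoded.
type lazyAttribute struct {
	Type  asn1.ObjectIdentifier
	Value asn1.RawValue
}

// lazyRDNSET is one relative distinguished name. The "SET" suffix in the
// type name tells encoding/asn1 to expect a SET rather than a SEQUENCE.
type lazyRDNSET []lazyAttribute

// commonNames returns the raw content of every commonName attribute and
// never looks inside the values of other attributes.
func commonNames(subjectDER []byte) ([]string, error) {
	var rdns []lazyRDNSET
	if _, err := asn1.Unmarshal(subjectDER, &rdns); err != nil {
		return nil, err
	}
	var names []string
	for _, rdn := range rdns {
		for _, attr := range rdn {
			if attr.Type.Equal(oidCommonName) {
				names = append(names, string(attr.Value.Bytes))
			}
		}
	}
	return names, nil
}

// sampleSubject builds a small DER-encoded subject for demonstration.
func sampleSubject() ([]byte, error) {
	name := pkix.Name{
		Organization: []string{"Example Org"},
		CommonName:   "www.example.com",
	}
	return asn1.Marshal(name.ToRDNSequence())
}

func main() {
	der, err := sampleSubject()
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(commonNames(der))
}
```

Because the organization value stays a RawValue, an oddly encoded organization name would not stop the common name from being extracted.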

Lax String Decoding

ASN.1 defines a multitude of string types, any of which might be used to encode a certificate's common name. Two string types in particular, PrintableString and IA5String, cause problems. PrintableString is a subset of ASCII that forbids some characters, such as the asterisk. CAs routinely include these characters, particularly for wildcard certificates. IA5String is ASCII, but CAs routinely include Latin-1 characters. Therefore, Cert Spotter treats both PrintableString and IA5String as Latin-1 and doesn't care if the string contains forbidden characters.
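One way to implement this lax decoding is to map every byte directly to the Unicode code point with the same value, which is exactly what Latin-1 is. The sketch below assumes hypothetical helper names (decodeLatin1, decodeLaxString); the tag numbers are the standard universal tags for PrintableString and IA5String.

```go
package main

import "fmt"

// Universal ASN.1 tag numbers for the two problematic string types.
const (
	tagPrintableString = 19
	tagIA5String       = 22
)

// decodeLatin1 maps each byte to the code point of the same value, so no
// input byte is ever rejected.
func decodeLatin1(content []byte) string {
	runes := make([]rune, len(content))
	for i, b := range content {
		runes[i] = rune(b)
	}
	return string(runes)
}

// decodeLaxString (hypothetical) accepts PrintableString and IA5String
// content without checking for forbidden characters. It reports false for
// any other tag, which this sketch does not handle.
func decodeLaxString(tag int, content []byte) (string, bool) {
	switch tag {
	case tagPrintableString, tagIA5String:
		return decodeLatin1(content), true
	}
	return "", false
}

func main() {
	// An asterisk is not allowed in PrintableString, but it is accepted here.
	fmt.Println(decodeLaxString(tagPrintableString, []byte("*.example.com")))
	// 0xE9 is outside ASCII; as Latin-1 it decodes to U+00E9.
	fmt.Println(decodeLaxString(tagIA5String, []byte{'c', 'a', 'f', 0xE9}))
}
```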

IP Addresses in DNS Names

Although there are separate Subject Alternative Name (SAN) types for DNS names and IP addresses, Windows used to lack support for IP address SANs and would instead interpret a string-encoded IP address in a DNS SAN as an IP address. Therefore, Cert Spotter checks if a DNS SAN could be interpreted as an IP address, and treats it as one if so.
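A minimal sketch of this check is shown below. Using net.ParseIP as the test for "could be interpreted as an IP address" is an assumption of this example; the parser a real client uses may accept other textual forms. The function name ipFromDNSName is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// ipFromDNSName (hypothetical) reports whether a DNS SAN value is really a
// string-encoded IP address, and returns the parsed address if so.
func ipFromDNSName(name string) (net.IP, bool) {
	ip := net.ParseIP(name)
	return ip, ip != nil
}

func main() {
	for _, name := range []string{"192.0.2.1", "2001:db8::1", "www.example.com"} {
		if ip, ok := ipFromDNSName(name); ok {
			fmt.Println(name, "is treated as IP address", ip)
		} else {
			fmt.Println(name, "is treated as a DNS name")
		}
	}
}
```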

NUL Bytes in DNS Names

In 2009, Moxie Marlinspike was able to get a certificate for the DNS name www.paypal.com<NUL>.secureconnection.cc. Since Marlinspike owned the domain secureconnection.cc, he was able to authorize the certificate's issuance. However, due to the NUL byte in the middle of the DNS name, some TLS clients thought the certificate was for www.paypal.com instead, allowing Marlinspike to impersonate PayPal to those clients.

When Cert Spotter sees a DNS name containing a NUL byte, it treats the certificate as being valid for not only the complete DNS name, but also the DNS name before the NUL byte. For www.paypal.com<NUL>.secureconnection.cc, Cert Spotter would alert the owners of both paypal.com and secureconnection.cc.
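The NUL handling described above can be sketched in a few lines. The function name namesForNUL is hypothetical, and the sketch returns raw names only; mapping each name to the domain owner to notify is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// namesForNUL (hypothetical) returns every name a certificate should be
// treated as valid for: the complete DNS name and, if the name contains a
// NUL byte, also the part before the first NUL.
func namesForNUL(dnsName string) []string {
	names := []string{dnsName}
	if i := strings.IndexByte(dnsName, 0); i >= 0 {
		names = append(names, dnsName[:i])
	}
	return names
}

func main() {
	for _, name := range namesForNUL("www.paypal.com\x00.secureconnection.cc") {
		// %q makes the embedded NUL byte visible as \x00.
		fmt.Printf("%q\n", name)
	}
}
```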

URLs in DNS Names

Incredibly, certificate authorities have put entire URLs in a certificate's DNS name. Although such DNS names don't work in TLS clients, the owner of the domain in the URL should still know about the certificate, as it may be evidence of a larger attack against the domain. Therefore, Cert Spotter parses URLs, extracts the domain name from them, and notifies the domain owner.
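As an illustration, extracting the host from a URL-shaped DNS name could look like the sketch below. The "://" test used to decide whether a value is a URL is an assumption of this example, and hostFromURL is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostFromURL (hypothetical) returns the host portion of a DNS name that is
// actually a URL. It reports false when the value does not look like a URL
// or has no host.
func hostFromURL(name string) (string, bool) {
	if !strings.Contains(name, "://") {
		return "", false
	}
	u, err := url.Parse(name)
	if err != nil || u.Hostname() == "" {
		return "", false
	}
	return u.Hostname(), true
}

func main() {
	for _, name := range []string{"https://www.example.com/login", "www.example.com"} {
		if host, ok := hostFromURL(name); ok {
			fmt.Println(name, "contains host", host)
		} else {
			fmt.Println(name, "is not a URL")
		}
	}
}
```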

Duplicate Extensions

Certificate authorities have sometimes issued certificates containing more than one extension of the same type. If a certificate includes more than one subject alternative name extension, Cert Spotter extracts domain names from all of them. A naive parser might only use the first or last SAN extension, which would miss critical information.
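A sketch of collecting names from every SAN extension, rather than just the first or last, is shown below. It uses pkix.Extension from Go's standard library instead of Cert Spotter's own Extension type, handles only DNS names, and silently skips malformed data; the function names are hypothetical.

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

// oidSubjectAltName is the subjectAltName extension identifier (2.5.29.17).
var oidSubjectAltName = asn1.ObjectIdentifier{2, 5, 29, 17}

// tagDNSName is the context-specific tag of dNSName inside GeneralName.
const tagDNSName = 2

// dnsNamesFromAllSANs (hypothetical) walks every extension and gathers DNS
// names from each SAN extension it finds, however many there are.
func dnsNamesFromAllSANs(exts []pkix.Extension) []string {
	var names []string
	for _, ext := range exts {
		if !ext.Id.Equal(oidSubjectAltName) {
			continue
		}
		var seq asn1.RawValue
		if _, err := asn1.Unmarshal(ext.Value, &seq); err != nil {
			continue // a malformed extension must not hide the others
		}
		rest := seq.Bytes
		for len(rest) > 0 {
			var general asn1.RawValue
			var err error
			if rest, err = asn1.Unmarshal(rest, &general); err != nil {
				break
			}
			if general.Class == asn1.ClassContextSpecific && general.Tag == tagDNSName {
				names = append(names, string(general.Bytes))
			}
		}
	}
	return names
}

// encodeDNSSAN builds a SAN extension value for demonstration purposes.
func encodeDNSSAN(names ...string) ([]byte, error) {
	var general []asn1.RawValue
	for _, n := range names {
		general = append(general, asn1.RawValue{
			Class: asn1.ClassContextSpecific, Tag: tagDNSName, Bytes: []byte(n),
		})
	}
	return asn1.Marshal(general)
}

// sampleExtensions returns two SAN extensions plus one unrelated extension.
func sampleExtensions() ([]pkix.Extension, error) {
	first, err := encodeDNSSAN("a.example")
	if err != nil {
		return nil, err
	}
	second, err := encodeDNSSAN("b.example")
	if err != nil {
		return nil, err
	}
	return []pkix.Extension{
		{Id: oidSubjectAltName, Value: first},
		{Id: asn1.ObjectIdentifier{2, 5, 29, 19}, Value: []byte{0x30, 0x00}},
		{Id: oidSubjectAltName, Value: second},
	}, nil
}

func main() {
	exts, err := sampleExtensions()
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(dnsNamesFromAllSANs(exts))
}
```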

Ongoing Monitoring

Although Cert Spotter is able to parse all 255 million logged certificates, and has tried to anticipate all types of encoding errors, a new certificate may appear that Cert Spotter cannot parse. SSLMate continuously monitors for unparseable certificates, and should one be discovered (which hasn't happened yet), we'll fix Cert Spotter so it can parse the certificate, and then reprocess the certificate so the domain owners are notified.

Conclusion

When picking a Certificate Transparency monitor to track your domain's SSL certificates, it's important to pick a monitor that won't miss certificates, even those that are very badly encoded. A monitor that uses a standard certificate parser will miss malformed certificates, even though these certificates pose a risk to your infrastructure. Cert Spotter's custom, extra-permissive certificate parser is carefully written to catch every certificate, so you'll be notified when you need to be.

Cert Spotter finds SSL certificates issued for your domains, so you won't be caught off-guard by an expiring or unauthorized certificate.