Address Quality

Address Quality Check

The address quality check uses a sequence of predefined test procedures to verify that an e-mail address is formally valid and actually exists. There are two versions: a fast version and a more precise, enhanced version. The interfaces of both versions differ only marginally, so we document them together and mention the differences where applicable.

The input address can include encoded Unicode characters.

To call the fast address quality check use this syntax:

Syntax …/svc/2.0/address/quality/<e-mail address>
Example …/svc/2.0/address/quality/foo@bar.com
Parameter An e-mail address as last part of the URL

To call the enhanced address quality check use this one:

Syntax …/svc/2.0/address/quality-n/<e-mail address>
Example …/svc/2.0/address/quality-n/foo@bar.com
Parameter An e-mail address as last part of the URL
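
As an illustration, the two call patterns above could be assembled like this. This is a minimal sketch: the service prefix is elided in this document, so BASE_URL is a hypothetical placeholder, and the function name quality_url is invented for the example. It also assumes that "encoded Unicode characters" means UTF-8 percent-encoding.

```python
from urllib.parse import quote

# Hypothetical placeholder: the real service prefix is not given in this document.
BASE_URL = "https://api.example.invalid"


def quality_url(address: str, enhanced: bool = False, base_url: str = BASE_URL) -> str:
    """Build the URL for the fast or the enhanced address quality check."""
    endpoint = "quality-n" if enhanced else "quality"
    # Percent-encode the address (UTF-8) but keep "@" readable, so the
    # address survives as the last segment of the URL path.
    encoded = quote(address, safe="@")
    return f"{base_url.rstrip('/')}/svc/2.0/address/{endpoint}/{encoded}"


print(quality_url("foo@bar.com"))
print(quality_url("foo@blümchen.de", enhanced=True))
```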

The main difference between the two versions is the way they handle temporary errors. Temporary errors are used by mailservers to signal either Greylisting anti-spam measures or real temporary problems, e.g. a high server load.

The fast address quality check treats temporary errors as problems that make it impossible to verify whether an e-mail address exists. It returns immediately upon encountering such a problem.

The enhanced address quality check is more thorough and starts a background check for addresses with temporary errors. The background checks repeat the tests according to predefined schedules to see whether the temporary problem, caused by Greylisting or something else, goes away. The results of these background checks can be queried by simply repeating the enhanced address quality check. See the description of the address field for details.

Result

As XML:

<qualityStatus>
    <address>0</address>
    <bounceRisk>0</bounceRisk>
    <checked>1</checked>
    <decoded>foo@xn--blmchen-o2a.de</decoded>
    <domain>1</domain>
    <extSyntax>1</extSyntax>
    <mailserver>0</mailserver>
    <mailserverDiagnosis>0</mailserverDiagnosis>
    <probability>0</probability>
    <syntax>2</syntax>
</qualityStatus>

As JSON:

   {
       "address":0,
       "bounceRisk":0,
       "checked":1,
       "decoded": "foo@xn--blmchen-o2a.de",
       "domain":1,
       "extSyntax":1,
       "mailserver":0,
       "mailserverDiagnosis":1,
       "probability":0,
       "syntax":1
   }
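
A client might map the JSON result onto a small typed structure. The following is only a sketch: the class name QualityStatus, the snake_case attribute names, and the function parse_quality_status are invented for this example, and treating decoded as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityStatus:
    syntax: int
    ext_syntax: int
    domain: int
    mailserver: int
    mailserver_diagnosis: int
    bounce_risk: int
    probability: int
    address: int
    checked: int
    decoded: Optional[str] = None  # assumption: may be absent in a response


def parse_quality_status(text: str) -> QualityStatus:
    """Parse the JSON body of an address quality check response."""
    data = json.loads(text)
    return QualityStatus(
        syntax=data["syntax"],
        ext_syntax=data["extSyntax"],
        domain=data["domain"],
        mailserver=data["mailserver"],
        mailserver_diagnosis=data["mailserverDiagnosis"],
        bounce_risk=data["bounceRisk"],
        probability=data["probability"],
        address=data["address"],
        checked=data["checked"],
        decoded=data.get("decoded"),
    )


SAMPLE = """{"address": 0, "bounceRisk": 0, "checked": 1,
 "decoded": "foo@xn--blmchen-o2a.de", "domain": 1, "extSyntax": 1,
 "mailserver": 0, "mailserverDiagnosis": 1, "probability": 0, "syntax": 1}"""

status = parse_quality_status(SAMPLE)
print(status.mailserver, status.decoded)
```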

The result contains the outcomes of a sequence of tests that depend on each other. As soon as one of these fails (result = 0), the subsequent tests are not executed and also report a result of 0.
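
Because a failed test zeroes all later fields, a client usually wants to know which test failed first. The sketch below shows one way to find it. The order in TERMINATING_TESTS is an assumption derived from the order in which this page describes the tests; probability is left out because a low probability does not stop the check. The function name first_failed_test is invented for the example.

```python
from typing import Mapping, Optional, Sequence

# Assumed execution order of the tests that stop the chain when they fail.
TERMINATING_TESTS = ("syntax", "extSyntax", "domain", "mailserver", "bounceRisk")


def first_failed_test(
    result: Mapping[str, int],
    order: Sequence[str] = TERMINATING_TESTS,
) -> Optional[str]:
    """Return the name of the first test with result 0, or None if none failed."""
    for name in order:
        if result[name] == 0:
            return name
    return None


example = {"syntax": 2, "extSyntax": 1, "domain": 1, "mailserver": 0, "bounceRisk": 0}
print(first_failed_test(example))
```

In this example the later bounceRisk value of 0 is only a consequence of the mailserver test failing, so the function reports mailserver.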

We will discuss the results in the order of execution of the respective tests:

syntax

Tests the syntax of the address against the e-mail addressing standards. Possible result values are:

  • 0: invalid syntax
  • 1: valid syntax
  • 2: probably valid syntax, Unicode problems were solved, see decoded

The test is stricter than the standards because it requires a valid domain name in the address. Localhost addresses and other exotic cases will not be accepted, because it is unlikely that these are e-mail addresses valid for business.

If the syntax result is 0 (invalid) or 2 (probably valid), the result structure also contains syntax warnings explaining the problems, see below.

decoded

If the syntax test ended with a result of 2, this field will contain the decoded ASCII address. A syntax test result of 2 means that the address contained Unicode characters (e.g., umlauts, Arabic or Chinese characters), which are invalid in an e-mail address. These characters were successfully converted, and the resulting valid ASCII address was stored in decoded. Further tests should always use this decoded address.

The decoding during the syntax test is done in two stages:

  1. the local part of the address is checked for German umlauts. If found, they are converted to their usual ASCII counterparts (ü ⟶ ue, ä ⟶ ae, ö ⟶ oe, ß ⟶ ss)
  2. the domain part of the address is transformed to Punycode, according to the standard for international domains
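
The two decoding stages can be approximated in a few lines. This is a sketch, not the service's implementation: the function name decode_address is invented, only the four lowercase characters listed above are transliterated, and Python's built-in idna codec is used for the Punycode step as a stand-in.

```python
# Transliterations listed in stage 1 above.
UMLAUTS = {"ü": "ue", "ä": "ae", "ö": "oe", "ß": "ss"}


def decode_address(address: str) -> str:
    """Return an ASCII form of the address (assumes it contains an "@")."""
    local, _, domain = address.rpartition("@")
    # Stage 1: replace German umlauts in the local part.
    local = "".join(UMLAUTS.get(ch, ch) for ch in local)
    # Stage 2: convert the domain part to its Punycode (ASCII) form.
    domain = domain.encode("idna").decode("ascii")
    return f"{local}@{domain}"


print(decode_address("foo@blümchen.de"))
print(decode_address("müller@example.com"))
```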

extSyntax

Many e-mail providers have their own rules for valid e-mail addresses in their domains. These typically include the minimum length of an address, which punctuation characters are allowed, and so on. The extended syntax check verifies addresses against these rules. Possible results are:

  • 0: invalid syntax for this domain
  • 1: valid syntax for this domain

If the extended syntax check fails, the result structure will include syntax warnings explaining the problem, see below.

syntaxWarnings

If one of the syntax checks fails, the result structure will include one or more syntaxWarnings elements:

<qualityStatus>
    ...
    <syntaxWarnings>synm002</syntaxWarnings>
    <syntaxWarnings>....
</qualityStatus>

Each element contains a message code, which can be used to identify the problem. See the page syntax warnings for codes and explanations.
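
Since syntaxWarnings can occur several times, a client has to collect all of them rather than read a single value. A minimal sketch using Python's standard XML parser follows; the function name syntax_warnings and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List


def syntax_warnings(xml_text: str) -> List[str]:
    """Collect the message codes of all syntaxWarnings elements."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("syntaxWarnings") if el.text]


# Illustrative response fragment, not a captured service response.
SAMPLE = """<qualityStatus>
    <syntax>0</syntax>
    <syntaxWarnings>synm002</syntaxWarnings>
    <syntaxWarnings>synm017</syntaxWarnings>
</qualityStatus>"""

print(syntax_warnings(SAMPLE))
```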

domain

The domain test checks the validity of the domain name in the e-mail address. The system checks this by looking at the DNS record for the domain name. Possible result values are:

  • 0: domain name does not exist
  • 1: domain name exists

If the domain name is invalid, the system assumes a typo and tries to find similar domain names, which are offered to the user for selection. These domain names will be returned in the domainScores element, see domainScores.

domainScores

The domainScores element consists of a list of similar sounding domain names ordered by a calculated score:

<qualityStatus>
  ...
  <domainScores>
    <domainScore>
      <domain>teleos-web.de</domain><score>1.0</score>
    </domainScore>
  </domainScores>
</qualityStatus>

The higher the score, the higher the probability that this domain was intended. The system calculates the score by searching for similar domain names that are popular with e-mail marketing users.
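
To offer the suggestions to a user, a client could extract them from the XML result as (domain, score) pairs. This is a sketch with invented names (domain_suggestions, SAMPLE); the second entry in the sample is made up to show the ordering. Note that the path is anchored at domainScores, so the top-level domain element of the result is not picked up by mistake.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def domain_suggestions(xml_text: str) -> List[Tuple[str, float]]:
    """Return (domain, score) pairs from domainScores, highest score first."""
    root = ET.fromstring(xml_text)
    suggestions = []
    for entry in root.findall("./domainScores/domainScore"):
        name = entry.findtext("domain")
        score = entry.findtext("score")
        if name is not None and score is not None:
            suggestions.append((name.strip(), float(score)))
    return sorted(suggestions, key=lambda item: item[1], reverse=True)


# Illustrative response; the second suggestion is hypothetical.
SAMPLE = """<qualityStatus>
  <domain>0</domain>
  <domainScores>
    <domainScore><domain>example-mail.de</domain><score>0.4</score></domainScore>
    <domainScore><domain>teleos-web.de</domain><score>1.0</score></domainScore>
  </domainScores>
</qualityStatus>"""

print(domain_suggestions(SAMPLE))
```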

mailserver

If the domain exists, the system also checks whether the domain has a mailserver defined, again by looking at the DNS record. Possible result values are:

  • 0: no mailserver found
  • 1: mailserver found

mailserverDiagnosis

Not all mailservers tell the truth about the existence of an e-mail address, mostly due to anti-spam measures. The mailserverDiagnosis element describes the response behaviour of the domain’s mailservers to SMTP requests:

  • 0: unknown
  • 1: server tells the truth
  • 2: server always answers that the address exists
  • 3: server always answers that the address does not exist
  • 4: SMTP requests ended with errors (e.g., network errors, timeouts, server errors)

bounceRisk

The aggregated risk that an e-mail to this domain/address will be rejected (bounce):

  • 0: there is a high bounce risk
  • 1: there is a normal bounce risk

probability

This test calculates the probability that the domain name is the intended one. Domain name input in e-mail address forms often results in typos, and this check tries to find such errors. Possible result values are:

  • 0: the domain name has a low probability
  • 1: the domain name has a normal or high probability

If the test ends with a low probability result, then the system found more popular domains with similar names. These are returned in a domainScores element.

Unlike the previous tests, a negative result here will not cause a termination. Since the test relies on probabilities the assessment of the results is up to the user.

address

The address is verified by SMTP requests to one of the domain’s mailservers. All address quality check versions support the following results:

  • 0: address does not exist
  • 1: address does exist
  • 2: address not verifiable

The enhanced address quality check provides one more result:

  • -1: encountered temporary error, background check initiated

As mentioned above the enhanced quality check tries to work around temporary errors and will repeat the SMTP test periodically, to see if the error will go away. To query the result of the background checks just repeat the API call until a result other than -1 is returned for the address field.
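
Repeating the call until the background check has finished could look like the following sketch. The function name wait_for_address_result, the attempt limit, and the delay are all assumptions; the document does not state how long a background check takes. The fetch argument stands for any function that performs the enhanced check and returns the parsed result.

```python
import time
from typing import Callable, Mapping


def wait_for_address_result(
    fetch: Callable[[], Mapping[str, int]],
    max_attempts: int = 10,        # assumed limit, not from the documentation
    delay_seconds: float = 60.0,   # assumed pause between calls
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, int]:
    """Repeat the enhanced check until address is no longer -1 or attempts run out."""
    result = fetch()
    for _ in range(max_attempts - 1):
        if result["address"] != -1:
            break
        sleep(delay_seconds)
        result = fetch()
    return result
```

If the attempts run out, the last result is returned with address still at -1, so the caller can decide whether to try again later.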

checked

This field provides further information about the address check. Possible values are:

  • 0: the result in address was taken from the SMTP cache
  • 1: the result in address is the result of a real SMTP check

Functionality

This section describes the execution of the address quality check in detail. The check consists of the following steps:

  1. checking the mailbox (local part) of the address for Unicode characters. Unicode is not allowed in the local part, so this routine only checks for typical, language-specific typos. Currently this includes only German umlauts. If found, they are converted to their usual ASCII counterparts: ü ⟶ ue, ä ⟶ ae, ö ⟶ oe, ß ⟶ ss. If at least one of these conversions happens, the syntax warning synm018 is added to the result, and the decoded ASCII value of the mailbox is stored in decoded.

  2. checking the domain name of the address for Unicode characters. If Unicode is found, the domain is probably an internationalized domain name (IDN), and the domain name will be transformed according to the Punycode standard (RFC 3492). In this case the syntax warning synm017 is added to the result, and the decoded ASCII value of the domain name is stored in decoded.

  3. if decoded contains a value, that address is used for all subsequent tests; otherwise the original address is used.

  4. checking the syntax according to the standards. As mentioned above, exotic cases, like localhost addresses or addresses with comments, will be rejected. The check expects real addresses usable for e-mail transfer across domains. If the syntax check fails, the test assumes an input error and the result includes a list of similar sounding, popular domain names, taken from the domain_response table, in domainScores.

  5. checking the syntax against the rulebase of extended syntax criteria. These are provider-specific syntax rules that can be changed in the rulebase, without recoding.

  6. checking the domain name. The component looks for a DNS (A) record for the domain. By default the component tries 2 times, with a timeout of 2 seconds each, before giving up. A DNS cache, located near the servers to minimize the error rate, is recommended. In case of failure the mentioned list of similar domain names is returned.

  7. looking for a mailserver. Using the DNS information from the previous step, the component looks for MX entries for the domain. By default the component tries 2 times, with a timeout of 2 seconds each, before giving up. In case of failure the mentioned list of similar domain names is returned.

  8. calculating the bounce risk. The database table domain_response contains an aggregation of previous e-mail transfers for various domains. The component searches this table for the domain name in question. If found the bounce risk is calculated. If there were more than 80% bounces then the bounce risk is considered high, in all other cases it is normal.

  9. calculating the domain name probability. This test also uses the domain_response table to check whether there are domains with similar names having higher address counts. If there are such domains, then an input error is assumed, and the list of similar domains is included in the result.

  10. check the SMTP cache for the domain’s status. The SMTP cache tracks all SMTP test results and tries to determine whether an SMTP test for an address is appropriate. Some mailservers are set up to always answer positively or negatively when asked about a mail address. Others might have technical errors or simply be too slow. In all these cases an SMTP check would not be useful, so the address check is skipped. The SMTP cache also contains lists of exceptions for domains and mailservers that provide fixed answers.

  11. check the SMTP cache for an already existing Greylisting result. This step only occurs in the enhanced quality check. Results of Greylisting background checks are temporarily stored because it could be computationally expensive to repeat the test. By default these results are stored for 24 hours. If one of the stored addresses is checked again during the storage time, the SMTP results from the cache are used. Although these results are taken from the cache, the checked flag is set to 1 (really checked), because the cache content is the result of a recent SMTP check.

  12. checking the address by contacting the mailserver. If the SMTP cache doesn’t object, an SMTP conversation with one of the mailservers for the domain is initiated using an external SSB-Bot. The bots are randomly selected from the list of available bots in the database table botstate. If no bots are available, the address test returns a technical error. The internal timeout for an answer from the SSB-Bot is 20 seconds.

  13. the response behaviour of the domain’s mailservers is diagnosed. If the address check was skipped due to the SMTP cache, the value from the cache will be returned, else the result of the SMTP check will be used.
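
The bounce risk rule from step 8 can be written as a single comparison. This is a sketch under assumptions: the columns of domain_response are not documented here, so the parameters bounced and total are hypothetical, and a domain without recorded transfers is treated as normal risk because the rule says "in all other cases it is normal".

```python
def bounce_risk(bounced: int, total: int, threshold: float = 0.8) -> int:
    """Return 0 (high bounce risk) or 1 (normal bounce risk).

    bounced and total are hypothetical aggregates of previous e-mail
    transfers for one domain.
    """
    if total <= 0:
        # Assumption: no history for the domain counts as normal risk.
        return 1
    # High risk only if strictly more than 80% of the transfers bounced.
    return 0 if bounced / total > threshold else 1


print(bounce_risk(81, 100))
print(bounce_risk(80, 100))
```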

Each execution of a normal quality check will be documented as a business event (database table business_event) with type 100, including the result. Enhanced quality checks have type 128.