Categorization API

This document describes the Categorization API and interaction with it.

Working with Categorization API

Purpose of the Categorization API

The Categorization API gives developers and third-party systems a quick and easy way to get data from the SafeDNS database of categorized sites. It is designed for integration with systems that need to verify site categories (filtering systems, advertising systems, etc.). The API uses standard JSON to process requests. It is not intended to be accessed directly by end users of the integrated system; requests must be sent from an intermediate server of the integrated system.

The API handles approximately 1,000 requests per second.

The SafeDNS database of categorized sites currently includes more than 106 million unique domains in more than 60 categories.

To speed up processing, the integrated system may cache the data returned by the Categorization API for no more than 12 hours.
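
One way to respect the 12-hour limit is a small time-to-live cache on the integrated system's side. The following is only a minimal in-memory sketch: the class name CategoryCache, its methods, and the injectable clock are hypothetical and are not part of the API.

```python
import time

MAX_TTL_SECONDS = 12 * 60 * 60  # the 12-hour caching limit stated above


class CategoryCache:
    """In-memory cache whose entries expire after at most 12 hours."""

    def __init__(self, ttl_seconds=MAX_TTL_SECONDS, clock=time.monotonic):
        # Never keep entries longer than the documented maximum.
        self._ttl = min(ttl_seconds, MAX_TTL_SECONDS)
        self._clock = clock
        self._entries = {}  # domain -> (stored_at, result)

    def get(self, domain):
        """Return the cached result, or None if it is missing or expired."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[domain]  # expired: the caller should query the API again
            return None
        return result

    def put(self, domain, result):
        """Store a result together with the time it was cached."""
        self._entries[domain] = (self._clock(), result)


cache = CategoryCache()
cache.put("safedns.com", {"category": [49], "bad": False})
print(cache.get("safedns.com"))
```

A caller would check get() first and only send a request to the API when it returns None.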


Accessing API

To access the API, use the following host: x.api.safedns.com


Authorization

Requests to x.api.safedns.com must use Basic authorization. Pass the HTTP Authorization header in each request. The header value is the string <client_id>:<client_secret> encoded with base64, with Basic specified as the authorization method.

Header example:

Authorization: Basic Ndc2MDE4N2Q4MWJjNGI3Nzk5NDc2YjQycjUxMDM3MTM6ZjI1YmViZjk5MWZmNDE5ODkzZGIyNTU3MjhlNGUxZGU=

CURL request with authorization:

curl --user <client_id>:<client_secret> https://x.api.safedns.com/domain/www.website.com
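
For clients that build the header by hand, the encoding step described above can be sketched in Python as follows. The function name basic_auth_header is hypothetical, and "user" and "pass" are placeholder credentials.

```python
from base64 import b64encode


def basic_auth_header(client_id, client_secret):
    """Build the Authorization header value for Basic authorization."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + b64encode(raw).decode("ascii")


# Placeholder credentials, not real ones.
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```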

Getting a list of site categories

Request:

GET https://x.api.safedns.com/domain/www.website.com

will return an answer in JSON format:

{
  "category": [49, 59], 
  "bad": false, 
  "category_name": ["Computers & Internet", "Business"]
}
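
A response like the one above could be turned into (number, name) pairs as sketched below. This assumes that the category and category_name lists are in the same order, which the example suggests but the API description does not state explicitly. The function name category_pairs is hypothetical.

```python
import json


def category_pairs(body):
    """Pair each category number with its name from a JSON response body."""
    data = json.loads(body)
    # Assumption: both lists are parallel, as in the example response.
    return list(zip(data["category"], data["category_name"]))


body = ('{"category": [49, 59], "bad": false, '
        '"category_name": ["Computers & Internet", "Business"]}')
print(category_pairs(body))
```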

Getting a list of URL categories

Request:

GET https://x.api.safedns.com/url/http://www.website.com/path/to?arg=val

will return an answer in JSON format:

{
  "category": [49, 59], 
  "bad": false, 
  "category_name": ["Computers & Internet", "Business"]
}
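
The two lookup requests differ only in the path prefix, which the sketch below makes explicit. The helper names are hypothetical. The page URL is appended verbatim, as in the example request; whether the API expects it to be percent-encoded is not stated here, so treat that as an open assumption.

```python
BASE_URL = "https://x.api.safedns.com"


def domain_endpoint(domain):
    """Request URL for a domain category lookup."""
    return f"{BASE_URL}/domain/{domain}"


def url_endpoint(page_url):
    """Request URL for a full-URL category lookup (page URL appended as-is)."""
    return f"{BASE_URL}/url/{page_url}"


print(domain_endpoint("www.website.com"))
print(url_endpoint("http://www.website.com/path/to?arg=val"))
```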

Getting a list of categories

Request:

GET https://x.api.safedns.com/catgroups

will return an answer in JSON format:

[
  {
    "Illegal Activity": {"65": "Child Sexual Abuse (Arachnid)",
                         "66": "Crypto Mining", "6": "Drugs",
                         "7": "Tasteless", "8": "Academic Fraud",
                         "9": "Parked Domains", "10": "Hate & Discrimination",
                         "11": "Proxies & Anonymizers",
                         "19": "Child Sexual Abuse (IWF)",
                         "31": "German Youth Protection"}
  },
  {
    "Adult Related": {"13": "Adult Sites",
                     "14": "Alcohol & Tobacco",
                     "15": "Dating",
                     "16": "Pornography & Sexuality",
                     "17": "Astrology",
                     "18": "Gambling"}
  },
  {
    "Bandwidth Hogs": {"24": "Photo Sharing",
                       "20": "Torrents & P2P",
                       "21": "File Storage",
                       "22": "Movies & Video",
                       "23": "Music & Radio"}
  },
  {
    "Time Wasters": {"5": "Online Ads",
                    "26": "Chats & Messengers",
                    "27": "Forums", "28": "Games",
                    "29": "Social Networks",
                    "30": "Entertainment"}
  },
  {
    "General Sites": {"32": "Automotive",
                     "33": "Blogs",
                     "34": "Corporate Sites",
                     "35": "E-commerce",
                     "36": "Education",
                     "37": "Finances",
                     "38": "Government",
                     "39": "Health & Fitness",
                     "40": "Humor",
                     "41": "Jobs & Career",
                     "42": "Weapons",
                     "43": "Politics, Society and Law",
                     "44": "News & Media",
                     "45": "Non-profit",
                     "46": "Portals",
                     "47": "Religious",
                     "48": "Search Engines",
                     "49": "Computers & Internet",
                     "50": "Sports",
                     "51": "Science & Technology",
                     "52": "Travel",
                     "53": "Home & Family",
                     "54": "Shopping",
                     "55": "Arts",
                     "56": "Webmail",
                     "57": "Real Estate",
                     "58": "Classifieds",
                     "59": "Business",
                     "60": "Kids",
                     "63": "Trackers & Analytics",
                     "67": "Online Libraries"}
  },
  {
    "Security": {"12": "Botnets",
                "3": "Virus Propagation",
                "4": "Phishing"}
  }
]
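
Because the response is a list of one-key objects, it is often convenient to flatten it into a single lookup table keyed by category number. The following is a minimal sketch; the function name flatten_catgroups is hypothetical, and the sample data is a small excerpt of the response above.

```python
def flatten_catgroups(catgroups):
    """Map each category number to a (category name, group name) tuple."""
    lookup = {}
    for group in catgroups:  # each list item is an object with one group name as its key
        for group_name, categories in group.items():
            for number, name in categories.items():
                lookup[int(number)] = (name, group_name)  # keys arrive as strings
    return lookup


sample = [
    {"Security": {"12": "Botnets", "3": "Virus Propagation", "4": "Phishing"}},
    {"Adult Related": {"18": "Gambling"}},
]
lookup = flatten_catgroups(sample)
print(lookup[4])  # ('Phishing', 'Security')
```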

Getting usage stats

The total_count method returns a summary of the detailed stats for the requested period.

USERNAME and TOKEN are generated separately for the stats service by SafeDNS Manager or Tech Support.

Request:

curl --location --request POST 'https://sdk.safedns.com/stats_x/total_count' \
--header 'Authorization: TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{
    "user": "USERNAME",
    "start_date": "2023-01-01",
    "end_date": "2023-05-31"
}'

will return an answer in JSON format:

{
"total_requests": 10600,
"billed_requests": 9600,
"categorized_domains": 5600,
"nx_domains": 4000,
"unknown_domains": 910,
"bad_requests": 90
}
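
The same request can be prepared in Python. The sketch below uses only the standard library and builds the request without sending it; the function name build_total_count_request is hypothetical, and USERNAME and TOKEN remain placeholders for the values issued to you.

```python
import json
from datetime import date
from urllib.request import Request

STATS_URL = "https://sdk.safedns.com/stats_x/total_count"


def build_total_count_request(user, token, start, end):
    """Prepare the POST request for total_count without sending it."""
    payload = {
        "user": user,
        "start_date": start.isoformat(),  # YYYY-MM-DD, as in the curl example
        "end_date": end.isoformat(),
    }
    return Request(
        STATS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_total_count_request("USERNAME", "TOKEN", date(2023, 1, 1), date(2023, 5, 31))
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it; that step is left out so the sketch runs offline.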

The detailed_count method returns detailed stats for each day of the requested period.
USERNAME and TOKEN are generated separately for the stats service by SafeDNS Manager or Tech Support.

Request:

curl --location --request POST 'https://sdk.safedns.com/stats_x/detailed_count' \
--header 'Authorization: TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{
    "user": "USERNAME",
    "start_date": "2023-01-01",
    "end_date": "2023-05-31"
}'

will return an answer in JSON format:

{
"2023-01-01": {"total_requests": 10600,
               "billed_requests": 9600,
               "categorized_domains": 5600,
               "nx_domains": 4000,
               "unknown_domains": 910,
               "bad_requests": 90
              },
"2023-01-02": {"total_requests": 22600,
               "billed_requests": 21600,
               "categorized_domains": 15000,
               "nx_domains": 6600,
               "unknown_domains": 130,
               "bad_requests": 870
              },
"2023-01-03": {...}
}
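
A per-day response of this shape can be summed into period totals on the client side. The following is a minimal sketch using the two complete days from the example; the function name sum_daily_stats is hypothetical.

```python
from collections import Counter


def sum_daily_stats(detailed):
    """Add up every counter across all days of a detailed_count response."""
    totals = Counter()
    for day_stats in detailed.values():
        totals.update(day_stats)  # adds each day's values to the running totals
    return dict(totals)


detailed = {
    "2023-01-01": {"total_requests": 10600, "billed_requests": 9600,
                   "categorized_domains": 5600, "nx_domains": 4000,
                   "unknown_domains": 910, "bad_requests": 90},
    "2023-01-02": {"total_requests": 22600, "billed_requests": 21600,
                   "categorized_domains": 15000, "nx_domains": 6600,
                   "unknown_domains": 130, "bad_requests": 870},
}
print(sum_daily_stats(detailed))
```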

Responses

If a domain is categorized, the API returns status code 200 with the category names and their numbers.

curl --user <client_id>:<client_secret> https://x.api.safedns.com/domain/safedns.com
StatusCode        : 200
StatusDescription : OK
Content           : {"category": [49], "bad": false, "category_name": ["Computers & Internet"]}

If a domain does not exist, the API returns status code 206 with category 0 and the category name "Non-Existing Domain".

curl --user <client_id>:<client_secret> https://x.api.safedns.com/domain/does.not.exist
StatusCode        : 206
StatusDescription : Partial Content
Content           : {"category": [0], "bad": false, "category_name": ["Non-Existing Domain"]}

If a domain is not categorized, the API returns status code 404 with no JSON body.

curl : The remote server returned an error: (404) Not Found.
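
The three cases above can be handled in one place on the client side. The sketch below maps each status code to a short label; the function name and the labels are hypothetical, and any other status code is simply reported as unexpected.

```python
def describe_lookup_status(status_code):
    """Translate a domain lookup status code into a short label."""
    if status_code == 200:
        return "categorized"
    if status_code == 206:
        return "non-existing domain"
    if status_code == 404:
        return "not categorized"  # no JSON body is returned in this case
    return "unexpected"


for code in (200, 206, 404):
    print(code, describe_lookup_status(code))
```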

Example of a simple Python project

Below is an example of a simple Python project that gets categories for a list of domains from the file domains.txt. To use the code, create a text file named domains.txt, add the domains to categorize one per line, and save it in the same folder as the code file.

To access the API, you must use the following host (line 6): x.api.safedns.com

 
#!/usr/bin/env python3

import requests
from base64 import b64encode

url_src = "https://x.api.safedns.com/domain/"

# Replace username:password with your <client_id>:<client_secret>
credentials = b64encode(b"username:password").decode("ascii")

headers = {
    'Authorization': 'Basic %s' % credentials,
    'Content-Type': 'application/json'
}

total_time = 0
# The with-block closes the file even if a request raises an exception
with open("domains.txt", "r") as domain_src:
    for line in domain_src:
        domain = line.strip()  # drop the trailing newline so it does not end up in the URL
        if not domain:
            continue  # skip blank lines
        response = requests.get(url=url_src + domain, headers=headers)
        if response.status_code == 200:
            print(domain, response.json())
            print(f'Request processing time {response.elapsed.total_seconds()} sec')
            total_time = total_time + response.elapsed.total_seconds()
        elif response.status_code == 206:
            print(f"{domain} does not exist.")
        elif response.status_code == 404:
            print(f"According to our database, {domain} doesn't belong to any category.")
        elif response.status_code == 403:
            print("Wrong username or password, access denied")
            break
        elif response.status_code == 429:
            print("You have run out of queries to x.api, wait for 1 minute")
            break
    else:
        # Runs only when the whole file was read without hitting a break
        print(f'All requests were processed for {total_time} sec')
        print('ENDofFILE')