Provider Directory Workshop Roundup

On April 5th and 6th, the Office of the National Coordinator (ONC) and the Federal Health Architecture (FHA) held a workshop at MITRE in McLean, VA on the subject of “Provider Directories”. Attendees included representatives from health plans, state organizations, health providers, and the military. A key takeaway from the two days for me is that there are myriad use cases for provider directories throughout the US healthcare ecosystem. The meeting included healthy debate around such issues as who should operate the directory, who should pay for the directory, and which technologies should be employed. What most participants did agree on, however, is that reliable and accurate provider directories represent an important tool for improving US healthcare system operations.

My software demonstration was designed to show how provider data can be updated by a third-party application using a RESTFul API, protected by oAuth2, in FHIR format. Below is a list of the various components along with brief descriptions.

Software Components

oAuth2 and Write API Gateway Server: This server controls the Authorization Server, the Protected Resources (our updated APIs), and the API administration. It is a Django application making use of Django oAuth Toolkit. Running Demo|Source code 

oAuth2 Example Client: This sample client lets a client user login and subsequently call a series of RESTFul APIs to update provider data. It is a Django application making use of python-social-auth. Running Demo|Source code

Public Provider Registry and Read APIs: This application serves up data to humans and machines. The user interface provides simple NPPES search capabilities. NPPES and other types of provider data are available via API and there are numerous APIs available from this server. (I’ll be adding a catalog with documentation soon). During my demonstration I presented a “PECOS API” that reports if a provider participates in Medicare and, if so, for which provider organizations. The URL I demonstrated is here: I plan to make some updates soon to reorganize the data to simplify things. Running Demo | Source code

Provider Data Tools:  This is a set of command line tools and libraries for manipulating provider data. It can break the NPPES data into smaller CSVs and convert the NPPES data into FHIR resource documents or Provider JSON documents. It can also help import those data into a MongoDB database. See the README in the source code for more documentation. Source code 

Select Data Sources


PECOS Provider Enrollment Data

Charlie Ornstein’s Tip Sheet on Medicare Datasets



I often get asked the question, “What is the difference between REST and RESTFul?” First off, let’s keep in mind that many words in technology are used inconsistently or change meaning in specific contexts. Often when people say REST, they actually mean RESTFul. Both terms refer to a software architectural style of Application Programming Interface (API) development.

REST stands for “Representational State Transfer” and was coined in 2000 by Roy Fielding while working on his PhD at UC Irvine. The key notion is that the HTTP verbs “POST”, “GET”, “PUT”, and “DELETE” can be used like the database operations “Create”, “Read”, “Update”, and “Delete”. These database operations are often referred to as CRUD. So in pure REST, there is a one-to-one match as outlined below.

  • POST: Create
  • GET: Read
  • PUT: Update
  • DELETE: Delete
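To make the mapping concrete, here is a minimal sketch of a dispatcher that routes each HTTP verb to its CRUD operation. The `ProviderStore` class, the in-memory dictionary, and the specific status codes (201, 204, 405) are assumptions chosen for illustration; they are not taken from any of the applications described in this post.

```python
# Hypothetical in-memory resource used only to illustrate the verb-to-CRUD mapping.
CRUD_FOR_VERB = {"POST": "create", "GET": "read", "PUT": "update", "DELETE": "delete"}


class ProviderStore:
    def __init__(self):
        self._records = {}
        self._next_id = 1

    def handle(self, verb, record_id=None, body=None):
        """Dispatch an HTTP verb to its CRUD operation; return (status, payload)."""
        if verb not in CRUD_FOR_VERB:
            return 405, None  # verb has no CRUD counterpart here
        if verb == "POST":  # Create
            record_id = self._next_id
            self._next_id += 1
            self._records[record_id] = dict(body or {})
            return 201, {"id": record_id}
        if record_id not in self._records:
            return 404, None
        if verb == "GET":  # Read
            return 200, self._records[record_id]
        if verb == "PUT":  # Update (replace the whole record)
            self._records[record_id] = dict(body or {})
            return 200, self._records[record_id]
        del self._records[record_id]  # Delete
        return 204, None
```

A client would create a record with POST, read it back with GET, replace it with PUT, and remove it with DELETE, each call touching exactly one CRUD operation.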

RESTFul is a slightly more relaxed version of the same general software architectural style. For example, we might choose not to use PUT or DELETE at all and instead use POST or GET in some manner to achieve the same update and delete functions. Many APIs out there work in this way since it’s often simpler and more practical.

REST/RESTFul approaches are not new, but they are an increasingly popular pattern for exposing machine-to-machine interfaces, i.e. APIs. REST/RESTful approaches are popular in part because they are technology agnostic in the sense that usually they can be implemented in any programming language and on a variety of HTTP (Web) servers. Also, REST/RESTFul servers and clients are easier to implement than SOAP architectures since, unlike the latter, they don’t require WebLogic, WebSphere, or similar heavy (and often expensive) software tools to implement.


The Direct Certificate Discovery RESTFul API

As part of my HHS Fellowship, I was asked by more than one stakeholder to perform some level of validation on the Direct email address. I’ve written before on how I think Direct certificates should be discoverable in HTTP as well, but I didn’t get much traction on that proposal. So I’ve built the next best thing: a RESTFul API that fetches and reads x509 certificate information via LDAP and DNS and returns this information over HTTP. In the NPPES Write API Alpha, this mechanism is used as the “gatekeeper” for Direct address inclusion in health provider records. It prevents addresses that are not backed by a discoverable certificate from being accepted. For example, “” would not be accepted, but “” would be accepted. You can check out a live demo at If you would like to install it on your own site, start by typing the following into a terminal window.

pip install django-direct

The source code can be found here on GitHub. Note there are a number of dependencies to the underlying getdc library. Please check out the getdc documentation too.
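The “gatekeeper” idea can be sketched in a few lines. This is not the django-direct code; `accept_direct_address` and the `certificate_lookup` callable are hypothetical names, with the callable standing in for whatever LDAP/DNS discovery step is actually used.

```python
def accept_direct_address(address, certificate_lookup):
    """Accept a Direct address only if a certificate can be discovered for it.

    certificate_lookup is a hypothetical callable standing in for the
    LDAP/DNS discovery step: it takes an address and returns the PEM text,
    or None when no certificate is found.
    """
    if address.count("@") != 1:
        return False
    local_part, domain = address.split("@")
    if not local_part or "." not in domain:
        return False
    return bool(certificate_lookup(address))


if __name__ == "__main__":
    # Made-up directory used in place of a real discovery service.
    known = {"fred@direct.example.org": "-----BEGIN CERTIFICATE-----..."}
    print(accept_direct_address("fred@direct.example.org", known.get))
    print(accept_direct_address("nobody@direct.example.org", known.get))
```

Passing the lookup in as a parameter keeps the acceptance rule separate from the discovery mechanism, so either one can change without the other.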


Return from Sabbatical

Alan Viars, founder and president of Videntity, has completed his appointment as an HHS External Entrepreneur at the Centers for Medicare and Medicaid Services (CMS) and has returned to Videntity full time.


President’s Sabbatical

Alan Viars, founder and president of Videntity, is on sabbatical from Videntity until Spring 2015 during his appointment as an HHS External Entrepreneur at the Centers for Medicare and Medicaid Services (CMS). He is working on an open-source redesign and modernization of the National Plan and Provider Enumeration System (NPPES). NPPES is best known as the system that issues National Provider Identifiers (NPIs). More information about this effort can be found at


Certificate Authority Management System for Direct

The management software that runs the certificate authority (CA) at, or more precisely, is now open source. “vcert” is a web-based application written in Django which relies on OpenSSL for certificate creation and management. is still free to use, but if you need to operate your own CA or a registration authority (RA), then this system provides a simple web-based interface for doing so. This software was originally designed to facilitate testing of the Direct Project.

The source code can be found here on GitHub. See the README for more information on installation and operation.


Converting Stata files to CSV using R

A simple recipe for converting Stata data into CSV. CSV stands for comma-separated values. A common use for this recipe is moving information from Stata into a spreadsheet or another database. In this example, we assume the Stata input file is named “MyStata.dta” and the resulting CSV file is named “MyStata.csv”.
Here is the process.

Install R (if it’s not already installed)

$ sudo apt-get install r-base-core

Now run R from the folder where your Stata (.dta) file lives.

$ R

You should see something like this:

R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


Import the foreign library

> library(foreign)

Import the Stata data

> MyData <- read.dta("MyStata.dta")

Now write the information out to a CSV file using write.csv()

> write.csv(MyData, file = "MyStata.csv")

Quit R

> q()

Now you have your CSV file.


Provider Directory Proposal (NPPES Redux)

A couple of weeks ago, I attended ONC’s Direct Bootcamp in Crystal City, VA. A hot topic at the two-day conference was the notion of a “Provider Directory” that incorporates Direct email addresses.

I also read that HHS/CMS intends to revamp the National Plan and Provider Enumeration System (NPPES). This is the system that manages National Provider Identifiers (NPIs). Every individual provider and provider organization has one of these numbers, sort of like a tax ID for providers. A common complaint I hear is that it contains information that is often out of date and/or incorrect.

So what, you might ask, does the NPPES have to do with the Direct Project? Having worked with the NPPES data and having some background with Direct, the idea of “killing two birds with one stone” has captured my imagination. (Nerdy and wonky, I know.) This is an opportunity for government efficiency by consolidating systems. Efficiency can only be achieved if the new system is simple, however. Too often in health information technology, consultants and vendors introduce complexity for complexity’s sake. After all, complexity is good for the bottom line for many companies because it means more billable hours and more services sold. Sadly, I see this sort of thing all the time. As an American and a taxpayer it ticks me off. (See footnote)

To illustrate what I mean by “simple”, I’ve built a prototype web service application that illustrates my vision of a combined NPPES and Direct email Provider Directory. Before I outline that technical proposal, however, I’d like to point out how adding some other data fields to NPPES could result in an empowering service for patients, providers, and payers.

Adding Other Data Points to NPPES

A full NPPES would contain more fields and would require pagination, neither of which is illustrated here. The items that I think would be most useful to add include:

1. Diagnosis and procedure codes, such as ICD9, ICD10, and CPT codes, that are typically provided by the provider.

2. Payer and health plan identifiers to make it easy to search and determine if a provider takes a particular insurance plan, for example.

3. State Medical Board license information to determine in which state(s) the provider is licensed. (An even better idea is for our nation to adopt a national medical board and do away with state medical boards altogether.)

An NPPES Web Service Proposal

This is my proposal for a simple NPPES web service in a RESTFul style. I have provided examples and defined some basic HTTP request and response expectations.

Here is an example of a simple search query looking for all doctors named “Fred Smith” that have a practice within the zip code 20004. We can simply use any web client to query the database. In this example we return the data in JSON, but we could just as easily return it in XML, CSV, or HTML.

 GET http://localhost/nppes/example.json?first_name=Fred&last_name=Smith&zip=20004

If any results are found, we get back an HTTP 200 response and a JSON document containing a non-empty list of results.

{
    "message": "OK",
    "num_results": 1,
    "results": [
        {
            "first_name": "Fred",
            "last_name": "Smith",
            "npi": "23456789",
            "address_1": "901 Pennsylvania Ave",
            "address_2": "",
            "city": "Washington",
            "state": "DC",
            "zip": "20004",
            "telephone": "202-555-5555",
            "fax": "202-777-7777",
            "provider_type": "",
            "regular_email": "",
            "direct_emails": [
                {
                    "organization": "Hope Hospital",
                    "npi": "3453456985",
                    "email": ""
                },
                {
                    "organization": "Fred Smith MD",
                    "npi": "23456789",
                    "email": ""
                }
            ]
        }
    ]
}


Note that within the results is a field called “direct_emails”. We assume each provider could have many Direct addresses, for example, if he or she works at multiple organizations. This field maps all other NPIs and Direct addresses together.
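As a rough illustration of how a client might consume this response, the sketch below builds the search URL and pulls out every Direct address with its organization and NPI. The function names are my own, the response is a trimmed-down copy of the one above, and the example.org addresses are made up because the proposal does not fix any particular addresses.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, **params):
    """Append the search parameters to the endpoint as a query string."""
    return base_url + "?" + urlencode(sorted(params.items()))


def direct_addresses(response_text):
    """List (organization, npi, email) for every Direct address returned."""
    payload = json.loads(response_text)
    found = []
    for provider in payload.get("results", []):
        for entry in provider.get("direct_emails", []):
            found.append((entry["organization"], entry["npi"], entry["email"]))
    return found


# Trimmed-down response; the example.org addresses are made up.
SAMPLE_RESPONSE = """
{"message": "OK", "num_results": 1, "results": [
  {"first_name": "Fred", "last_name": "Smith", "npi": "23456789",
   "direct_emails": [
     {"organization": "Hope Hospital", "npi": "3453456985",
      "email": "fred.smith@direct.hope-hospital.example.org"},
     {"organization": "Fred Smith MD", "npi": "23456789",
      "email": "fred@direct.fredsmithmd.example.org"}]}]}
"""

if __name__ == "__main__":
    print(build_search_url("http://localhost/nppes/example.json",
                           first_name="Fred", last_name="Smith", zip="20004"))
    for organization, npi, email in direct_addresses(SAMPLE_RESPONSE):
        print(organization, npi, email)
```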

We can also query by a Direct address…..

GET http://localhost/nppes/example.json?

…and we can also query by an NPI…..

GET http://localhost/nppes/example.json?npi=23456789

The above two examples return the same result as before. So we can query by name, address, provider type, etc., and we can also query just by a Direct email address or an NPI.

For completeness, I’d like to outline what things look like when no results are returned (i.e. an unhappy path). If no results are found or some sort of error occurs, then the NPPES web service responds with something other than an HTTP 200 status. Here are two unhappy examples: a valid query returning no results, and an invalid query.

No Results


GET http://localhost/nppes/search/?first_name=Fred&last_name=Appleseed

{
    "message": "No search result matched your query.",
    "results": [ ]
}

The HTTP response code is 404.

Invalid Query


GET http://localhost/nppes/search/?foo=bar

{
    "message": "You supplied an invalid search parameter: foo",
    "results": [ ]
}

The HTTP response code is 400.
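The happy and unhappy paths can be summarized in one small function. This is only a sketch of the response rules described above, not the prototype’s code: the `search` function, the `ALLOWED_PARAMS` set, and the exact-match comparison are assumptions made for illustration.

```python
# Assumed set of searchable fields, taken from the example queries above.
ALLOWED_PARAMS = {"first_name", "last_name", "zip", "npi"}


def search(providers, query):
    """Return (http_status, json_body) for a search against a list of dicts."""
    unknown = sorted(set(query) - ALLOWED_PARAMS)
    if unknown:
        # Invalid query: report the first unrecognized parameter.
        return 400, {
            "message": "You supplied an invalid search parameter: " + unknown[0],
            "results": [],
        }
    # Exact matching on every supplied field (an assumption of this sketch).
    results = [p for p in providers
               if all(p.get(field) == value for field, value in query.items())]
    if not results:
        return 404, {"message": "No search result matched your query.",
                     "results": []}
    return 200, {"message": "OK", "num_results": len(results),
                 "results": results}


if __name__ == "__main__":
    providers = [{"first_name": "Fred", "last_name": "Smith",
                  "zip": "20004", "npi": "23456789"}]
    print(search(providers, {"npi": "23456789"})[0])
    print(search(providers, {"first_name": "Fred", "last_name": "Appleseed"})[0])
    print(search(providers, {"foo": "bar"})[0])
```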

Next Steps

This blog post is an open letter to HHS/CMS on how to construct a new NPPES without the complexity that often accompanies health IT. Comments and feedback are welcome.

Resources and Background NPPES

Currently the NPPES data is made available as a comma separated value (CSV) file. The field headers/names are in a separate CSV file. This URL is not on a .gov domain and is somewhat hard to find. I’ve published a link to it here. Thanks Fred Trotter. The sign up and update is not an electronic process. Here is a link to the PDF sign-up/update form. Almost certainly any NPPES modernization effort would involve making this an online process.

Footnote: Jon Stewart made this point well in his segment “The Red Tape Diaries – Veterans Benefits” (VIDEO). At Videntity, we subscribe to the mantra that building things as simply and as efficiently as possible is always the best design choice, even if it means a smaller contract.


RESTFul Direct Certificate Discovery

Direct is a health information exchange framework based on Public Key Infrastructure (PKI) and email. Public x509 certificates must be discoverable for Direct to work properly. Currently Direct uses both DNS and LDAP for this purpose. This blog post outlines a proposal for a new RESTful method of Direct certificate discovery.

The Direct Applicability Statement already provides two methods to discover certificates: DNS and LDAP. Since DNS and LDAP are easy enough to query, why do we need a third method? In a word, simplicity. If the use of a technology is going to be mandated by the government, as in the case of Direct via Meaningful Use, shouldn’t it be simple, inexpensive, and as frictionless as possible? We’re living in a web world, so why not leverage this commodity? I’ll be the first to admit that it can be hard to reduce complexity by adding requirements. We should only add requirements if they are needed. I’d like to outline some of the issues with the DNS and LDAP approaches.

Issues Serving Direct Certificates via DNS and LDAP

1. DNS does not work on many large networks. On many networks including Comcast, Time Warner, and others, looking up a Direct certificate via DNS is not possible because these networks do not allow large DNS lookups. The certificates are too large and are blocked on these networks.

2. No large hosting provider (including Amazon, GoDaddy, Yahoo, DreamHost, Google, and others) provides support for certificate “CERT-type” records. Hosting your own DNS server is complicated and burdensome, so most organizations leave this to their hosting provider. If you want to serve certificates for Direct via DNS, however, you have no choice but to host your own DNS server or contract out to a third-party service.

3. Some security-minded people may frown on the idea of anonymous access to an LDAP server, especially if it also contains other resources with restricted access. In addition, LDAP is also complicated and burdensome to set up under Linux/Unix.

The first item in the list is going to cause a lot of problems. See this for yourself using the command-line tool “dig”. (You could also use “nslookup”). Try typing the following:

> dig CERT

Depending on your network, you may or may not get a response. For example, on Time Warner or Comcast you might get this instead of the certificate.

;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.1-P1 <<>> CERT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 3004
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0


;; Query time: 1 msec
;; WHEN: Thu Aug 22 13:45:31 2013
;; MSG SIZE  rcvd: 43

If you try the same command on another network (from an Amazon EC2 instance for example), you will get the certificate. DNS is working as expected, but our network is blocking the response. Since many providers throughout the United States use Time Warner, Comcast, and others as Internet Service Providers (ISPs), this presents a problem. An organization on such a network that wants to host Direct and look up certificates via DNS simply cannot.

It may require someone, such as the US government, to require all major ISPs to change their network configuration to support Direct (i.e. large DNS responses).

I assume that is possible and has already happened in many cases, but as I said before, it’s a web world. There’s got to be a better way! Below is my proposal for serving Direct certificates in a simple RESTFul fashion. I call it RCD for “RESTFul Certificate Discovery”. Here are the details.

RESTFul Certificate Discovery

RCD Syntax


                              |-- Email or Domain part --|

RCD Protocol Specification

RCD works by ensuring the URL is predictable. The RCD service MUST operate on a sub-domain named “rcd” on “your-domain”. The “your-domain” in the email/domain part MUST match in the email part. This ensures the system remains distributed among organizations just like DNS. When the certificate exists, the RCD service ALWAYS returns the public certificate in “.pem” (Privacy Enhanced Mail) format with an HTTP status code of 200. If the certificate is not found or does not exist, then the RCD response SHALL ALWAYS carry some HTTP response code other than 200 (e.g. 404, 4xx, 5xx). RCD SHALL ALWAYS operate over HTTP on port 80 or over HTTPS on port 443. An RCD client MUST ALWAYS check the HTTPS URL first and then the corresponding HTTP URL. How certificates get into RCD is explicitly left undefined and is left to the implementer.
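The client side of these rules is small enough to sketch. The code below is an illustration under stated assumptions, not a reference implementation: `discover_certificate` and `http_get` are names I made up, and the caller supplies the host and path (without a scheme) because this sketch does not assume any particular path layout.

```python
import urllib.error
import urllib.request


def http_get(url, timeout=10):
    """Return (status, body_bytes); network failures are reported as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""
    except (urllib.error.URLError, OSError):
        return 0, b""


def discover_certificate(location, fetch=http_get):
    """Try HTTPS first, then HTTP; return the PEM text on a 200, else None.

    location is the host and path without a scheme, supplied by the caller;
    this sketch does not assume any particular path layout.
    """
    for scheme in ("https", "http"):
        status, body = fetch(scheme + "://" + location)
        if status == 200:
            return body.decode("ascii")
    # Any non-200 answer on both schemes means "no certificate".
    return None
```

The `fetch` parameter exists so the HTTPS-then-HTTP ordering can be exercised without a network, for example by passing a function that returns canned status codes.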

RCD Example:

RCD is just an HTTP GET request to the URL “”

Here is another domain-bound example.

The RCD service responds with the .pem formatted certificate file or a non-200 HTTP status (404, 400 etc.).

That’s it. That’s the entire specification. New approaches like RCD are specifically allowed in the current Direct spec: “Direct Project solutions MAY obtain digital certificates through some other out-of-band and thus manual means”.

Discover a Certificate via RCD

What does this look like in action? We will use a web client called “wget” that is already installed on most Linux flavors (Mac users may need to install it first). You could also use a web browser, curl, or any number of web clients.

> wget
> cat
...removed for brevity...

We use “wget” to fetch the certificate at the predictable URL. Then we use “cat” to display the contents of the file that was downloaded.

Hosting RCD

So we see how simple RCD is to query, but what about hosting? This is really the best part to this approach – it really couldn’t be easier. Here’s why:

1. RCD can be implemented with just about any web server (Apache, Microsoft IIS, NGINX, etc).

2. RCD is simple enough to be implemented within a content delivery network (CDN) such as Amazon S3 or Rackspace Cloud Files. Take Amazon S3 for example. S3 Storage is inexpensive and redundant. What this means is that even a fairly hefty deployment would likely only run pennies a month or a few dollars per year. Setup is simple because all you need to do is place your files in a sub-directory called “read” and you’re done. No code is necessary to implement this protocol. It’s just a static file at a predictable URL.

3. RCD does not break on some networks as described above in “Issues serving Direct certificates via DNS and LDAP”.

Technical Note: To make RCD work with S3, you need to set up a “vanity URL” on the S3 bucket and create a CNAME within your DNS provider’s (e.g. GoDaddy, Yahoo, etc.) configuration. You will need the authority to make changes to DNS for the domain .

Drawbacks, Conclusions, and Next Steps

The key drawback to this approach is that it requires existing Direct implementations to add the functionality to query for certificates in this way. I’d estimate this would require about 50 lines of code in Java and would be very straightforward. Another drawback is that there isn’t a standard RFC to point to when it comes to REST/RESTFul approaches. That being said, RESTful approaches are usually much easier to use, and this is why they have largely supplanted more complicated SOAP services.

In my humble opinion, the juice is worth the squeeze. RCD will result in a solution that is orders of magnitude simpler and more cost effective than current options. I am writing this blog post as an open letter to the Direct community hoping that this approach will receive consideration. Thoughts? Opinions? Suggestions?



How to Serve Public Certificates with BIND for the Direct Project

Serving public certificates via DNS is quite an obscure endeavor. This is made quite evident by the total lack of documentation on how to serve CERT type records using BIND. The Direct Project requires that public certificates be discoverable via DNS and LDAP. BIND is by far the most widely used nameserver on the Internet, so this article describes how to get BIND to serve certificates for a Direct implementation.

Adding the certificate to BIND’s zone file is the tricky bit. You must do two things to the certificate so that BIND will serve it correctly: 1.) calculate the “key tag” and 2.) format the public key correctly. It is not just the contents of a .pem file. The cert’s zone file entry must be all on one line.
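For readers curious about what the “key tag” is, here is a sketch of the calculation as I understand it from RFC 4034, Appendix B: a 16-bit checksum over the DNSKEY-style record data (flags, protocol, algorithm, then the key bytes). The function name and argument layout are my own, and the RFC notes that the old RSA/MD5 algorithm (number 1) uses a different rule that this sketch does not cover.

```python
import struct


def key_tag(flags, protocol, algorithm, public_key):
    """Compute the RFC 4034 Appendix B key tag for DNSKEY-style record data."""
    # RDATA layout: 2-byte flags, 1-byte protocol, 1-byte algorithm, key bytes.
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + public_key
    total = 0
    for index, byte in enumerate(bytearray(rdata)):
        # Even-indexed bytes are the high octet of a 16-bit word, odd the low.
        total += byte if index & 1 else byte << 8
    total += (total >> 16) & 0xFFFF  # fold the carry back in
    return total & 0xFFFF


if __name__ == "__main__":
    # Tiny made-up key, only to show the call; real keys are much longer.
    print(key_tag(0, 0, 5, b"\x01\x02"))
```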

To make this task easier we have created a free open-source command line utility called “BIND Certificate Converter” or just “bcc” for short. bcc accepts two command line parameters: 1.) a host name and 2.) a certificate file (in .pem format). bcc outputs a suitable BIND zone file entry. You can redirect the output of this utility to append to your zone file. We will illustrate this with an example below. In this example, we assume you are using BIND 9 on Ubuntu. The utility BIND Certificate Converter “bcc” also requires that dnspython and openssl are installed. Let’s start by making sure we have everything we need installed.

sudo apt-get install python-dnspython openssl bind9

Copy the file “bcc” into “/usr/bin” and set it to executable. The source code is at the bottom of this blog entry.

sudo cp bcc /usr/bin
sudo chmod 755 /usr/bin/bcc

We have now installed BIND, bcc, and the other prerequisites. You will now have the directory /etc/bind. Let’s change (“cd”) into that directory. The file we want to change for localhost is “/etc/bind/db.local”. We are assuming you have a file called “” sitting in your $HOME directory (/home/ubuntu/

Now we can give our utility bcc (source code below) a dry run before outputting anything.

bcc examplehost

Let’s make things easier and just switch to a root shell.

sudo su -

Now let’s write the output of bcc to the end of “/etc/bind/db.local”.

cd /etc/bind
bcc examplehost /home/ubuntu/ >> /etc/bind/db.local

Now we need to restart BIND for our change to take effect.

sudo /etc/init.d/bind9 restart

Now we can test it with “nslookup”. Execute the following command:

nslookup -type=CERT myexamplehost.localhost localhost

The BIND server responds:

Server:		localhost

myexamplehost.localhost	cert = PKIX 12437 RSASHA1 MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAzXVb5YJHmV7a ljGA2KHXPgJ3KmGkMCF9AIHOGsnNN1CuuN4gCRdPVhySVsyEjWLj2xby 6i7yc8Zau2AutrFEFBibXw1YQZvbzabxpG0zZV3tG88t+03OH2VJsK2t 5adxY8wufuY353NwiCMhLtsnRMMym9BbLqQWt3v1P+s9zqq1bLQYQYJC ZexUVhBnjEEVL5oschErtoahpRlmhE1LxtmxKr75mv8RfZV17Pbn7JbP Jk36wpFKpT9SGJWC27eqUFtorOOkH6Kr+j/fGs1GWKgXjMZpeADC14Yh KrDeJtpUL3zzUtsLN9nP/MbcCzHnwdRd4Sb+5V0K1S3R/vtrDQIDAQAB

It works (hopefully)! Here is the source code to bcc.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: ai ts=4 sts=4 et sw=4
# Copyright 2013 Videntity
# Freely reuse under the terms of
# Last Updated: August 18 , 2013

# BIND Certificate Convert - Read in a host name and PEM certificate and
# write out the corresponding line of BIND's zone file.

import base64
import subprocess
import sys

import dns.dnssec
import dns.rdataclass
import dns.rdatatype
import dns.rdtypes.ANY.DNSKEY


def find_between(s, first, last):
    """Return the text between two markers, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


def bind_cert_convert(hostname, pemcert):
    # Ask openssl for the certificate's public key in PEM form.
    output, error = subprocess.Popen(["openssl", "x509", "-in",
                                      pemcert, "-pubkey", "-noout"],
                                     stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE).communicate()
    clean_pk = find_between(output.decode("ascii"),
                            "-----BEGIN PUBLIC KEY-----",
                            "-----END PUBLIC KEY-----")
    clean_pk = clean_pk.replace("\n", "")
    decoded_clean_pk = base64.b64decode(clean_pk)
    # Assumption: the algorithm is RSASHA1 (as in the zone entry) and the key
    # is the decoded public key; the key tag comes from dnspython's key_id().
    dnskey = dns.rdtypes.ANY.DNSKEY.DNSKEY(dns.rdataclass.IN,
                                           dns.rdatatype.DNSKEY, 0, 0,
                                           dns.dnssec.RSASHA1,
                                           decoded_clean_pk)
    key_tag = dns.dnssec.key_id(dnskey)
    bind_entry = "%s\tIN\tCERT\tPKIX %s RSASHA1 %s" % (hostname, key_tag,
                                                       clean_pk)
    print(bind_entry)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: bcc [HOST_NAME] [PEM_CERT_FILENAME]")
        sys.exit(1)
    bind_cert_convert(sys.argv[1], sys.argv[2])

So now there is documentation! I’d like to give a special thank you to Bob Halley, author of the “dnspython” library for helping me figure this out.