Why is User Provisioning So Hard? Doesn’t SCIM Fix It?

In a prior post, I discussed how SAML works and mentioned the challenge of user provisioning. The issue is that although a service does not manage the master user identity, it still needs to have some information about the user. Let’s imagine a multi-tenant SaaS service called cloudcrm.com; it supports SAML for user login (kudos to them for this). But it still needs the user to have an account in cloudcrm.com that contains the user’s name, email, phone number and department name. The classic way this would be done is that the admin of a cloudcrm.com customer would log in via an administrative UI and create and maintain those entries. Everything works, everybody is happy. End of post.

User Provisioning

Not so fast. Look at it from the administrator’s viewpoint. This is actually really painful. If there are 10 employees needing cloudcrm accounts, then it is doable. A hundred is not good and 10,000 is impossible. The administrator wants it all to happen magically. So all he needs to do is get some nice new software that works with his IDP to do the provisioning automatically. Ideally, he would tell it “all my sales team members get accounts in cloudcrm that look like this.” And presto! We’re done! End of post.

But not so fast. Look at it from the IDP developer’s viewpoint (which is where I sit). In order to maintain user information inside cloudcrm.com, there needs to be a programmatic way for me to access it, read it, update it, etc. So, what standards exist for doing this? The two well-known ones are:

  • SPML. This protocol is a little old now, and the industry generally agrees that it missed its chance. It predates large scale adoption of multi-tenant SaaS and hence doesn’t quite fit the current problem at hand.
  • SCIM. This is the current ‘hot’ solution with a lot of activity going on. Its main problem is that it is so new, and still changing so much, that it is not yet widely adopted.
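
To give a feel for what a SCIM-style provisioning request carries, here is a minimal sketch that builds a user record for our imaginary cloudcrm.com. It assumes the SCIM 2.0 core and enterprise user schemas as later published in RFC 7643; the helper name build_scim_user, the sample user and the endpoint URL in the comment are my own inventions for illustration, not anything a real service exposes.

```python
import json

# Schema identifiers from SCIM 2.0 (RFC 7643). Earlier SCIM drafts used
# different identifiers, so treat these as an assumption.
CORE_USER = "urn:ietf:params:scim:schemas:core:2.0:User"
ENTERPRISE_USER = "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User"


def build_scim_user(user_name, given_name, family_name, email, phone, department):
    """Build a SCIM User resource as a plain dict (hypothetical helper)."""
    return {
        "schemas": [CORE_USER, ENTERPRISE_USER],
        "userName": user_name,
        "name": {"givenName": given_name, "familyName": family_name},
        "emails": [{"value": email, "type": "work", "primary": True}],
        "phoneNumbers": [{"value": phone, "type": "work"}],
        # Department lives in the enterprise extension, keyed by its schema URN.
        ENTERPRISE_USER: {"department": department},
    }


payload = build_scim_user(
    user_name="jsmith",
    given_name="Jane",
    family_name="Smith",
    email="jsmith@example.com",
    phone="+1-555-0100",
    department="Sales",
)

# An IDP would POST this body to the service's Users endpoint, for example
# https://cloudcrm.com/scim/v2/Users (a made-up URL for this post).
body = json.dumps(payload, indent=2)
print(body)
```

The point is less the exact field names than the shape: one well-known JSON document per user, which the IDP can create, read, update and delete without knowing anything proprietary about the service.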

So what’s actually out there in the real world today? Really not a lot. What you mostly see are proprietary solutions:

  • Microsoft’s Office 365 / Azure AD uses PowerShell cmdlets, which under the hood use an undocumented API. (I wish I could give you a link.) They also have their Graph API, which is published but doesn’t have the full feature set needed.
  • Salesforce can be provisioned via the Force API for user management; they are certainly working on SCIM, but it’s not clear to me if you can use SCIM to provision Salesforce users yet.  Let me know in the comments below if you have the answer…
  • Google Apps nicely demonstrates the rapidly evolving nature of what’s going on; its provisioning API is now deprecated and replaced by a new API. Surprisingly, the new one is not SCIM.

I apologize to all websites that have gorgeous, standards-based provisioning APIs. I am sure there are many of them, but the point I was making is that many big names don’t do it (yet).

There are a couple of alternatives to using a provisioning API. The first is Just-In-Time (JIT) provisioning. In this case the application creates or updates users every time they log in. I explained in my SAML post that a SAML token can contain more than just the user name. It can contain arbitrary data. An application can therefore maintain its user database ‘on the fly’ provided that the SAML token contains enough data. In our cloudcrm.com case, the token needs email and department in addition to the user name. This is a very common mechanism. Salesforce, WebEx and Box use it. (Once more, apologies to all other websites that do JIT and are not mentioned.)
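
Here is a minimal sketch of what JIT provisioning might look like on the cloudcrm.com side. It assumes the SAML token has already been validated and its attributes parsed into a dict; the class names, attribute keys and in-memory store are all hypothetical stand-ins for a real user database.

```python
from dataclasses import dataclass


@dataclass
class Account:
    user_name: str
    email: str
    department: str


class AccountStore:
    """In-memory stand-in for the cloudcrm.com user database (hypothetical)."""

    REQUIRED = ("user_name", "email", "department")

    def __init__(self):
        self._accounts = {}

    def provision_on_login(self, attributes):
        """Create or update an account from already-validated token attributes."""
        missing = [key for key in self.REQUIRED if not attributes.get(key)]
        if missing:
            raise ValueError(f"SAML token is missing attributes: {missing}")

        user_name = attributes["user_name"]
        account = self._accounts.get(user_name)
        if account is None:
            # First login: create the account on the fly.
            account = Account(user_name, attributes["email"], attributes["department"])
            self._accounts[user_name] = account
            return account, "created"

        # Later logins: refresh whatever the IDP says has changed.
        changed = (account.email, account.department) != (
            attributes["email"],
            attributes["department"],
        )
        account.email = attributes["email"]
        account.department = attributes["department"]
        return account, "updated" if changed else "unchanged"


store = AccountStore()
token = {"user_name": "jsmith", "email": "jsmith@example.com", "department": "Sales"}
_, first = store.provision_on_login(token)
_, second = store.provision_on_login(token)
_, third = store.provision_on_login({**token, "department": "Marketing"})
print(first, second, third)
```

Note the catch with this approach: the code only runs when somebody logs in, so a user who has left the company never triggers an update or a delete.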

Sadly, the mechanisms described so far only cover a small percentage of the total number of websites in use. The other approach is (cue dramatic music…) screen scraping.

For those of us that have been working in the industry for some time (that includes me), screen scraping is a Bad Thing™. It means having code that pretends to be a user logging on to the application (originally mainframe terminal apps), reading the screen just like a user, trying to make sense of it, and then entering data into various fields just as a user would. It is notoriously fragile. If a screen changes for some reason, the whole thing breaks. UI test automation tools face a similar problem; the difference is that test is test, whereas this would be for real: creating users, deleting users, setting permissions. The opportunity for chaos is great!

Let’s hope that the industry converges on a good set of standards so we can avoid the chaos…