The ABC of Zoho AI

During ZohoDay24, Zoho gave, amongst other topics, some insight into how the company looks at AI. Raju Vegesna presented Zoho’s AI vision and progress. Additionally, I had the opportunity for a one-on-one with Zoho’s director of AI research, Ramprakash (Ram) Ramamoorthy. If you want to listen to and watch the interview, you can do this here.

Both presented a vision that is refreshingly different from the current hype, with everyone and their dog talking as if large language models (LLMs) are everything one needs.

Well, let me tell you: They aren’t. But let me come to this point later.

Not every language model is created equal and, as is typical for a hype, there is still too much talk about the technology itself. In the words of Raju and Ram, by contrast, the best AI implementation is “when the customer doesn’t know they are using AI but finds value in the output”. This resonates very well with me, as one of my beliefs is that the customer shouldn’t care about the technology that is used to achieve the desired outcome, within some constraints like legality, ethics, and efficiency, of course.

Zoho is a technology vendor with a focus on business applications. So, Zoho quite quickly realized that consumer-type AI that, e.g., helps with spell checks or, nowadays, research suffers from two fundamental flaws when it comes to business applications: a lack of privacy and security, and a lack of accuracy. Both violate some of Zoho’s core tenets, namely its pursuit of privacy and of business applications that offer a lot of value to the customer. Take the example of improving one’s writing: for some years now, Zoho Writer has offered this inside the application, instead of making the user copy and paste potentially sensitive information into an external tool with potentially questionable guardrails.

The same holds for the use of generative AI, or AI in general, in a business context. LLMs, as they are offered by vendors, regularly miss the business context. This means that their responses to a prompt are less than accurate: they hallucinate. AI works best in context, in this instance in a business context, your business’s context. This requires a process called grounding, commonly done via retrieval-augmented generation (RAG).

Or in plain words: It requires business data and analytics on it.
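To make grounding a bit more tangible, here is a minimal sketch of the retrieval half of RAG. It is not Zoho's implementation. The sample records, the bag-of-words similarity, and the function names (`retrieve`, `build_grounded_prompt`) are all assumptions chosen for illustration; a production system would typically use an embedding model and a vector index, and would pass the resulting prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical business records; real ones would come from CRM, finance, or support data.
DOCUMENTS = [
    "Q3 profit dropped 12 percent after shipping costs rose in the EMEA region.",
    "Customer churn in the SMB segment stayed flat during Q3.",
    "The travel policy caps hotel expenses at 200 USD per night.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question, documents, top_k=1):
    """Prepend the retrieved business context to the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("Why did profit drop in Q3?", DOCUMENTS))
```

The point of the sketch is the order of operations: the business data is looked up first, and the model is then asked to answer from that data rather than from whatever it absorbed during training.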

In Zoho’s words, this grounding in business data is contextual intelligence, with decision intelligence adding on to it. Decision intelligence creates recommendations and/or actions based upon the findings. Raju Vegesna used a revenue and profit timeline to make this point. Business analytics shows a drop in profit. AI is perfectly able to identify this anomaly. But the real questions are why and what to do. The contextual intelligence correlates this drop to other factors and gives a diagnosis, and the decision intelligence gives a recommendation on what can (or should?) be done, based on the diagnosis.
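The three steps in this example (spot the anomaly, correlate it with other factors, recommend an action) can be sketched in a few lines. Everything below is an assumption for illustration: the monthly figures are invented, the z-score threshold and the Pearson correlation stand in for whatever models Zoho actually uses, and the `PLAYBOOK` of recommendations is hypothetical. Correlation alone does not prove a cause, so the "diagnosis" here is only a pointer to where to look.

```python
from statistics import mean, stdev

# Hypothetical monthly figures, invented for this sketch.
PROFIT = [100, 102, 101, 103, 99, 101, 100, 70]
FACTORS = {
    "shipping_cost": [20, 21, 20, 21, 20, 21, 20, 45],
    "headcount": [50, 50, 51, 51, 51, 52, 52, 52],
}

# Hypothetical mapping from a suspected driver to a suggested action.
PLAYBOOK = {
    "shipping_cost": "Renegotiate carrier rates or adjust shipping fees.",
    "headcount": "Review the hiring plan.",
}


def find_anomalies(series, threshold=2.0):
    """Analytics step: indexes whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(series) if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def diagnose(target, candidates):
    """Contextual step: the factor most strongly correlated with the target."""
    return max(candidates, key=lambda name: abs(pearson(target, candidates[name])))


def recommend(target, candidates):
    """Decision step: suggest an action only if an anomaly was found."""
    if not find_anomalies(target):
        return None
    driver = diagnose(target, candidates)
    return driver, PLAYBOOK.get(driver, "Escalate to an analyst.")


if __name__ == "__main__":
    print(find_anomalies(PROFIT))
    print(recommend(PROFIT, FACTORS))
```

With the sample data, the last month is flagged and shipping cost comes out as the most strongly correlated factor, which is what selects the recommendation.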

Which brings us back to the topic of large language models not being a silver bullet. Language models, especially multimodal ones, can increase the accuracy of, e.g., the OCR of an expense application significantly. Ram and his team observed that, in the case of a receipt only having a merchant logo instead of its name, the accuracy of identifying the merchant “has gone up from somewhere [around] 75% to 98%” because of using multimodal models. Even a small language model already increases the OCR accuracy, e.g., by not confusing the cash given with the sales price. Scale this up. This increase in accuracy translates to a far higher efficiency of the expense process. And now look at the end-to-end expense process. It starts by taking a picture of the receipt and then extracts and infers different information. On submission, the expense report is analyzed for anomalies and policy violations. This multi-step process can be efficiently handled by a variety of specialized models, from narrow models (OCR) and small language models (text extraction and inference) through mid-sized models (anomaly detection) to LLMs (policy violation detection).
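The expense process described above can be pictured as a chain of stages, each handed to the smallest model tier that can do the job. The following sketch only illustrates that orchestration idea. The stage functions are plain Python stand-ins rather than real models, and the names (`Stage`, `PIPELINE`, `process_receipt`), the receipt layout, and the thresholds are assumptions, not Zoho's design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    model_tier: str  # label only: which size of model would handle this step
    run: Callable[[dict], dict]


def ocr(state):
    # Stand-in for a narrow OCR model: the "image" here is already text.
    state["text"] = state["image_text"]
    return state


def extract_fields(state):
    # Stand-in for a small language model that separates merchant and total
    # and does not confuse the cash given with the sales price.
    fields = {}
    for line in state["text"].splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    state["merchant"] = fields.get("merchant", "")
    state["total"] = float(fields.get("total", "0"))
    return state


def detect_anomaly(state):
    # Stand-in for a mid-sized model: flag totals far above the usual amount.
    state["anomaly"] = state["total"] > 3 * state["typical_total"]
    return state


def check_policy(state):
    # Stand-in for an LLM reading the expense policy: a simple limit check.
    state["policy_violation"] = state["total"] > state["policy_limit"]
    return state


PIPELINE = [
    Stage("ocr", "narrow", ocr),
    Stage("extraction", "small", extract_fields),
    Stage("anomaly_detection", "mid-sized", detect_anomaly),
    Stage("policy_check", "large", check_policy),
]


def process_receipt(receipt):
    """Run every stage in order and record which tier handled it."""
    state = dict(receipt)
    trace = []
    for stage in PIPELINE:
        state = stage.run(state)
        trace.append((stage.name, stage.model_tier))
    state["trace"] = trace
    return state


if __name__ == "__main__":
    sample = {
        "image_text": "Merchant: Cafe Uno\nCash: 50.00\nTotal: 42.50",
        "typical_total": 40.0,
        "policy_limit": 40.0,
    }
    print(process_receipt(sample))
```

The user only sees the finished expense entry. Which tier handled which step shows up only in the trace, and only the last stage would need a large model at all.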

Another example comes from the legal field: electronic contract signature. Again, a multi-step process is followed. It leads from phishing detection via a narrow model, possibly translation via a small language model, and summarization and anomaly detection via a medium language model, to the development of recommendations if the suggested document is not compatible with policies.

The same applies to a warranty process in customer service.

All these AI-supported processes have two things in common: first, the user is oblivious to the use of AI, and second, the orchestrated interaction of smaller models makes the process more resource efficient without endangering process efficiency.

Ram maintains that “Even though these models are trained using energy intensive [GPUs], we are able to run the inference on CPUs, and that means I’m contributing positively to the [environment]. And again, I’m not passing on the GPU tax to my customer. So, find out models that work for different levels. I mean, it’s okay, [a] three to five million [parameter] model is more than enough to identify the name of the merchant on a receipt. And a 50 billion [parameter] model is enough to rephrase legal statements, given it’s specialized in that domain. That domain and the context is what is helping us to lower the model size and thereby keep costs in check.”
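A rough back-of-the-envelope calculation shows why model size matters so much for cost. It assumes that the figures in the quote are parameter counts, and it counts only the memory needed to hold the weights, ignoring activations and runtime overhead. The precision table and the function name `weight_memory_gb` are illustrative assumptions, not figures from Zoho.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for label, params in [("5 million", 5_000_000), ("50 billion", 50_000_000_000)]:
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{label} parameters at {precision}: {gb:g} GB")
```

Under these assumptions, a five million parameter model needs about 0.02 GB even at full 32-bit precision, which fits comfortably into ordinary CPU memory. A 50 billion parameter model needs around 100 GB at 16-bit precision, four orders of magnitude more, so choosing the smallest model that is good enough for a step has a direct effect on hardware cost.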

In my opinion, Zoho is right in saying that AI models are going to be commoditized. Leveraging AI in a business application is already table stakes.

But what about AI and ethics, especially when it comes to training models? After all, Zoho takes a strong position on privacy.

Ram extends this into the realm of training models. “We strongly believe that your data is your data. And it should work just for you, meaning you cannot use your data to train a third party’s AI and in turn, you get some subsidized software. And that’s not how it should work. Because that’s your business secret, right? That’s your secret sauce. And you don’t want it to be taken over by some model. So, we have set clear policy privacy guidelines, where we have built foundational models that are independent of any of our customer data. And then they are fine-tuned to individual customers.” He insists that “if I’m going to use the model that is trained on one company, for their competitor, who has just signed up for CRM, I’m basically selling their data. We don’t do that. So, all of these privacy policies [are] intact, and we also have a strong policy towards making our AI bias free.”

On top of this, Zoho makes sure that there is no PII used for learning. Drift is controlled via regular systems audits using split systems.

Zoho promises to do it right, which is, as Ram says, the promise that every vendor should give and keep.