A recent news story that must have riveted many professionals is that ChatGPT 4, the latest edition of the artificial intelligence tool, has passed major accounting certification exams, averaging a score of 85.1% across all sections of the exams.

It should be pointed out, though, that the majority of these exams are in multiple choice format, taken from a large dataset, and attuned to the internal logic of Western languages. In comparison, when ChatGPT 4 took the equivalent Chinese exam, its score was a much less impressive 44%.

These different outcomes may be rooted in the fact that the Chinese CPA qualification does not grant many passes in the first place, while the complexity of the Chinese language itself may have stumped the artificial intelligence tool. For example, a Chinese school principal’s instruction to students to comply with school uniform rules by not pinning anything to them would be ‘bie bie bie de’, as the three words ‘not’, ‘pin’ and ‘anything else’ all share the same ‘bee-yeah’ pronunciation as ‘bie’.


Mark Ma FCCA is co-founder of Beijing consultancy Brook & Partners and a prolific author

The exam news aside, what I want to discuss more broadly here is whether AI – now qualified by the textbook standard – could become an auditor through continuous learning.

One of my clients runs a traditional manufacturing company. It’s a solid business, with a focus on making furniture for kindergartens, schools and hotels. Judged by its income statement, the business is sound. However, its cashflow needs to do some catching up, as most of its customers are state-owned or public sector, where settlement tends to be slow.

It’s a situation that would unsettle some auditors. As they see it, long-term receivables – unless proven otherwise – are more likely to end up becoming stale to the point of write-off. Some more aggressive auditors would swiftly adjust them to reflect the risks.

However, to entrepreneurs like my client, those long-term receivables carry a low risk, based on a decade of experience. As we batted the issue back and forth, the company produced its receivables record for the past three years to prove that those receivables had been collected, and in the end the auditor didn’t insist on categorising them as non-performing. That judgment was the result of effective communication, coupled with reasoning and mutual understanding. Don’t imagine that AI can overcome its own limitations and deliver that kind of performance.


Another example is a company where I conducted due diligence on behalf of a fund. The company specialised in metal coatings for computer screens and its technology made it an industry leader. However, doubts arose on my part following a pre-visit look at its financial statements.

The business was sitting on a huge inventory with low margin, while cashflow was poor and payment histories were problematic. When I paid my visit, there were only a few staff at the factory and the doors of its warehouse were locked. I was told an internal inventory was under way.

Many of my questions to the financial manager went unanswered, the only response being a brief pitying smile, as if I were asking the wrong questions. I therefore asked a colleague to look up the business’s utility bills over the past six months, and sent someone to pull the CCTV footage of their canteen during lunchtime and cross-check against the claimed attendance. I also demanded entry to the warehouse.


To fast-forward to the conclusion, my suspicions were borne out. The unfortunate company was trapped right in the middle of the supply chain, with its upstream suppliers and downstream customers alike in more powerful positions. The raw materials the business needed were rare, so its suppliers demanded payment in full upon delivery, while its computer manufacturing clients tended to have a long payment cycle. With its poor cashflow position, the company was close to shutting down its factory, with management forced to juggle debt, ‘robbing Peter to pay Paul’, and hoping for a miracle.

None of this really matters here; my point is that the Sherlock-style deduction and practice undertaken by auditors may prove too challenging for ChatGPT.

Some people jokingly compliment colleagues as being ‘machines’ or ‘killers’ in terms of their ability to focus on the facts and ignore emotions. But when emotions are entirely absent, it is impossible to sense the presence of the bizarre or exceptional in the intricacies of an audit. That is something that may turn out to be a killer flaw that prevents AI from reigning supreme in the audit space.