ChatGPT’s trial balance test

Author

Dean Hezekiah FCCA is an accountant who writes on business and professional themes

Artificial intelligence capabilities can deliver significant efficiencies in labour-intensive tasks. Despite some of the negative headlines it has recently garnered, ChatGPT is an immensely capable AI tool when used to support accounting workflows.

I set up two impromptu experiments that mimic real-world tasks in accounting roles. The first tested ChatGPT’s ability to classify lines on a trial balance, assigning a financial statement label to each line item on the list of accounts.

To set up this experiment, I brought a list of trial balance labels into a Google Sheet and installed an add-on that links to the OpenAI language model behind ChatGPT. After tinkering with a few GPT functions, I settled on one that took a text query as its first parameter and a reference to the relevant cell as its second. My query was: ‘In one phrase, what is the best IFRS classification for this?’
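For readers who prefer a script to a spreadsheet, the same idea can be sketched in Python. This is only an illustration under stated assumptions, not the add-on's actual function: the names build_prompt, classify_lines and fake_ask are hypothetical, and the model call is replaced by a stand-in so the sketch runs without any account or API key.

```python
from typing import Callable, Dict, List

# The query used in the experiment, applied to every line of the trial balance.
QUERY = "In one phrase, what is the best IFRS classification for this?"


def build_prompt(label: str, query: str = QUERY) -> str:
    """Combine the fixed query with one trial balance label."""
    return f"{query}\n\nTrial balance line: {label.strip()}"


def classify_lines(labels: List[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Send one prompt per label through `ask` and collect the answers.

    `ask` is whatever function forwards a prompt to a language model;
    it is passed in so that this sketch stays independent of any vendor API.
    """
    return {label: ask(build_prompt(label)).strip() for label in labels}


def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a fixed string."""
    return "placeholder answer"


results = classify_lines(["computer software", "dividends payable"], fake_ask)
for label, answer in results.items():
    print(f"{label}: {answer}")
```

Keeping the query fixed and varying only the label mirrors the spreadsheet set-up, where one formula is copied down the column of account names.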


Surprisingly clever

The results were impressive. The AI excelled at reading and classifying trial balance entries accurately. The tool labelled ‘advances’ as ‘financial assets – loans and receivables’, ‘computer software’ as ‘intangible assets’, and ‘dividends payable’ as a ‘current liability’. It even recognised ‘computer software – work in progress (at cost)’, labelling it ‘intangible asset – development costs (under IAS 38)’.

One seemingly trivial drawback was the tool’s inability to assign a label to the word ‘advance’, even though it had correctly classified a variation of the same word, ‘advances’, as noted above. Instead of a classification, it responded: ‘It depends on the context and nature of the advance. Without further information, it is not possible to determine the best IFRS classification.’


Judging from the location of this entry in the liabilities section of the trial balance, I knew it would have to be an advance received. I updated the name, and the AI model labelled it as a liability.

This points to the kind of limitation we can expect from a model like this. In a real-world setting, a human would make a judgment based on contextual factors that may not have been supplied to the AI. In this instance, those factors are the location of an entry relative to others and the nature of the account balance (debit or credit).
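One way to hand that context to the model is to fold it into the text being classified. The sketch below is a hypothetical illustration, not something tested in the experiment: the TrialBalanceLine class, its field names and the sign convention (negative means credit) are all assumptions, and the amount is made up.

```python
from dataclasses import dataclass


@dataclass
class TrialBalanceLine:
    label: str      # account name as it appears on the trial balance
    section: str    # where the line sits, e.g. "liabilities" (assumed field)
    balance: float  # assumed convention: positive = debit, negative = credit


def describe(line: TrialBalanceLine) -> str:
    """Append the balance side and section to the bare account name."""
    side = "debit" if line.balance >= 0 else "credit"
    return f"{line.label} ({side} balance, listed under {line.section})"


# Illustrative amount only; the article does not give the actual balance.
advance = TrialBalanceLine(label="advance", section="liabilities", balance=-1200.0)
print(describe(advance))
```

The enriched description, rather than the bare word ‘advance’, would then be the text sent for classification.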

Research associate

Continuing my experiment, I went on to ask the AI a question that would require it to source and summarise a lot of information. Working from OpenAI’s publicly accessible website, I asked ChatGPT to ‘summarise the accounting standard IFRS 17’. For context, the current standard runs to more than 30,000 words, so a summary is a good place to start.

I found its summary accurate but overly simplistic. Those familiar with the standard will know just how comprehensive it is. So I asked the tool to ‘provide a longer summary’. This time, its 394-word effort was a much clearer rendering of the guidance.

Finally, I asked the tool to ‘provide a link to this standard on the IFRS site’, and got an accurate URL, pointing me to a page containing the official guidance.


Although the tool fared reasonably well as a research companion, it helps to appreciate its limitations: it cannot rank its sources by relevance, judge how reliable those sources are, or assess the quality of its points. You therefore still need a process for validating the information you get from it.
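A first, very small validation step is to check that any link the tool returns actually points to the standard setter’s own site. The sketch below assumes a hypothetical allow-list containing only ifrs.org; it confirms where a link leads and says nothing about whether the summary itself is accurate.

```python
from urllib.parse import urlparse

# Assumption: the only domain treated as authoritative in this sketch.
TRUSTED_HOSTS = {"ifrs.org"}


def is_trusted_source(url: str) -> bool:
    """Return True when the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == trusted or host.endswith("." + trusted) for trusted in TRUSTED_HOSTS)


print(is_trusted_source("https://www.ifrs.org/"))
print(is_trusted_source("https://ifrs.org.example.com/"))
```

Matching on the full host, rather than searching for ‘ifrs.org’ anywhere in the address, avoids accepting look-alike domains.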

Lingering challenges

The real challenge is finding innovative ways to embed tools such as ChatGPT into existing workflows. Several factors play into how to negotiate this integration.

Privacy is a major concern. AI tools rely on a feedback loop to improve their responses. But feedback is not always feasible, especially given the sensitive nature of accounting data. Management will want assurances that the AI does not expose the organisation’s data to external parties.


Another concern is the level of control most professionals like to have over their work. It is difficult to commit to a tool where you don’t know precisely what happens under the hood. Importantly, an understanding of the strengths and weaknesses of the tool allows professionals to assess how dependable it would be in different contexts.

I recently conducted a LinkedIn poll to discover sentiments on AI tools in accounting. I asked accountants if AI such as ChatGPT will change how we do accounting work. Of 785 respondents:

65% said yes, it will, for the better
11% said yes, it will, for worse
24% said no, it will not.

Although skewed towards optimism, these sentiments are varied. About three-quarters of respondents (76%) believe AI will change how we do accounting work, but 11% think the change will be for the worse and almost a quarter expect no change at all.
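To put rough head counts on those shares, the arithmetic can be sketched as follows. The published percentages are rounded, so the counts are approximations and need not add up to exactly 785.

```python
RESPONDENTS = 785

# Percentages as reported in the poll results.
SHARES = {
    "yes, for the better": 65,
    "yes, for worse": 11,
    "no": 24,
}

# Approximate head count behind each rounded percentage.
counts = {answer: round(RESPONDENTS * pct / 100) for answer, pct in SHARES.items()}

# Share of respondents who expect AI to change accounting work, for better or worse.
expects_change = SHARES["yes, for the better"] + SHARES["yes, for worse"]

for answer, count in counts.items():
    print(f"{answer}: about {count} respondents")
print(f"Expect change of some kind: {expects_change}%")
```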

The most crucial battle of all for this new technology to win may well turn out to be convincing users of its worth.
