Halo Automatic Ticket Responses Using GPT and Local LLMs
At Advancery, we use a PSA software platform called Halo PSA for everything from registering new sales leads, quoting customers and generating invoices to running helpdesk tickets and more complex projects; pretty much everything, including the kitchen sink. It’s fair to say we’re big advocates of it. Over time we’ve heavily customised and configured Halo using workflows, automation rules and playbooks to create genuine efficiencies across the organisation. That work has gone far enough that we’re now regularly asked by other organisations to help them integrate and customise both Halo PSA and Halo ITSM in their own environments.
If you’re not already familiar with Halo, it’s well worth a look. It’s a modern, flexible platform that avoids a lot of the legacy thinking you still see in PSA and ITSM tools, and it’s also a British company, which is always good to see. You can find more at halopsa.com and haloitsm.com.
Halo does include a native integration with OpenAI, but in its current form this is primarily designed as an assistive tool for engineers. It generates a suggested response which an engineer can review, edit and then send, a bit like a glorified spell checker. In some situations that’s perfectly acceptable, but it doesn’t go far enough for what we were trying to achieve. It still relies on manual interaction, it doesn’t always sound like the engineer handling the ticket, and it assumes you’re happy for all engineers to have direct access to AI-generated responses, which we most certainly are not, for many reasons.
In practice, this means:
- An engineer clicks a button
- Halo generates a suggested reply
- The engineer edits it and sends it
That’s perfectly fine in some situations, but it has limitations. You don’t necessarily want:
- Every engineer having direct access to AI tools
- AI-generated text being sent verbatim without control
- Responses that don’t sound like your organisation or technician
- A manual step where the whole point is speed and efficiency
What we wanted instead was separation. Rather than AI sitting inside the engineer workflow, we wanted an independent agent that could handle certain ticket responses on its own, in a controlled and predictable way.

Aida’s ticket interface.
To achieve this, we designed a stand-alone application called Aida that communicates directly with the Halo Application Programming Interface (API). The application pulls ticket data from Halo, reads the relevant content from the customer’s initial enquiry, and then passes that information to a language model to generate an appropriate response. That response is then written as a draft (initially) for review before being posted back into Halo via the API.
At a high level, the process looks like this:
- The application pulls ticket data from Halo using the API
- It reads the customer’s initial enquiry (or the full ticket history)
- The content is passed to an LLM
- A response is generated based on predefined context and rules
- The response is written back into Halo as a draft (or later, posted automatically)
This immediately gives us far more control over how AI is used.
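The round trip above can be sketched in a few lines of Python. To be clear, the endpoint paths, field names and flags below are illustrative assumptions, not the exact Halo API contract; check the API documentation for your own Halo instance before building anything on this.

```python
"""Minimal sketch of the ticket-to-LLM-to-draft round trip.
Endpoint paths, field names and flags are illustrative assumptions,
not the actual Halo API contract."""
import json
import urllib.request

HALO_BASE = "https://example.halopsa.com"  # hypothetical instance URL
TOKEN = "..."  # bearer token from Halo's OAuth2 client-credentials flow

def halo_get(path: str) -> dict:
    """Fetch a resource from the Halo API with a bearer token."""
    req = urllib.request.Request(
        f"{HALO_BASE}{path}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def build_prompt(ticket: dict) -> str:
    """Turn the customer's initial enquiry into a prompt for the LLM."""
    return (
        "You are a helpdesk agent. Draft a reply to this enquiry:\n"
        f"Subject: {ticket.get('summary', '')}\n"
        f"Body: {ticket.get('details', '')}"
    )

def post_draft(ticket_id: int, text: str) -> None:
    """Write the generated response back as a draft action
    (the 'draft' flag here is an assumption)."""
    payload = json.dumps(
        [{"ticket_id": ticket_id, "note": text, "draft": True}]
    ).encode()
    req = urllib.request.Request(
        f"{HALO_BASE}/api/Actions",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)
```

The key design point is that the application owns the whole loop: nothing reaches the customer unless this code posts it, which is what gives the control the native integration lacks.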
For the language model itself, we’re currently using two models: a local LLM based on Llama 3.2, and OpenAI’s GPT-4.1 mini. The former is something we already had experience with, having built local LLM solutions for other business use cases, so extending this into Halo was a natural step. The application can be easily configured to use a different model depending on requirements around data residency, cost or performance.
Context
A key part of this approach is context. The application includes a configurable contextual layer that defines who the LLM is, who it represents, and how it should communicate. This allows responses to be tailored so they feel natural, consistent and aligned with the organisation’s tone of voice, rather than sounding like generic AI output. Because this context is configurable, it can be refined over time without retraining models or changing anything within Halo itself.
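As an illustration, a contextual layer like this can be as simple as a configuration object flattened into a system prompt. Every name and field below is our own sketch, not Aida’s actual configuration:

```python
# Sketch of a configurable contextual layer: a persona config that is
# flattened into a system prompt. All names and fields are hypothetical.
PERSONA = {
    "name": "Aida",
    "organisation": "Advancery",
    "tone": "friendly, professional and concise",
    "rules": [
        "Never promise a resolution time.",
        "Sign off as the service desk, not as an individual engineer.",
    ],
}

def system_prompt(persona: dict) -> str:
    """Flatten the persona config into a system prompt for the LLM."""
    rules = "\n".join(f"- {r}" for r in persona["rules"])
    return (
        f"You are {persona['name']}, a support agent for "
        f"{persona['organisation']}. Your tone is {persona['tone']}.\n"
        f"Follow these rules:\n{rules}"
    )
```

Because the persona lives in configuration rather than in a fine-tuned model, refining the tone of voice is just an edit to this object, which matches the point above about not retraining models or touching Halo.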

How we give the LLM its personality and context to shape the responses
When processing a ticket, the application can work in two different ways. It can be set to read only the customer’s initial enquiry and generate a response based on that alone, which is useful for first responses, acknowledgements or information gathering. Alternatively, it can read the entire ticket history, including previous engineer responses, and generate a reply that takes the wider conversation into account.
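The two modes boil down to how much of the ticket is passed to the model. A minimal sketch, with illustrative field names standing in for what comes back from the Halo API:

```python
# Sketch of the two processing modes: initial enquiry only, or full
# ticket history. The action dicts and their field names are illustrative.
def build_context(actions: list[dict], mode: str = "initial") -> str:
    """Build the ticket context passed to the LLM.

    mode="initial" uses only the customer's first message (good for
    acknowledgements and information gathering); mode="full" includes
    every action so the reply reflects the whole conversation.
    """
    selected = actions[:1] if mode == "initial" else actions
    return "\n\n".join(f"{a['who']}: {a['note']}" for a in selected)
```

Keeping this a simple switch means the same generation code serves both first-touch replies and mid-conversation responses.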
At the moment, no generated response is written back into Halo until a technician has checked it. This keeps a human in the loop and allows an engineer to review or tweak the wording before anything is sent to the customer. It also makes adoption far easier, as the system supports engineers rather than replacing them outright. Once approved, the response is posted back to Halo using the API exactly as if it had been written manually.

Example response to an existing customer ticket, where I had already sent an initial reply. Aida is reiterating it.
This is currently a proof of concept, but the next step is to test full automation. The plan is to have the application monitor a specific ticket queue and respond automatically to tickets with a particular status. For example, an AI queue could be created where engineers move tickets that require additional information or have a relatively low or generic skill requirement. The AI agent would respond, update the ticket status, and then move the ticket back into the engineer’s queue.
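That planned automation is essentially a polling loop over the AI queue. A minimal sketch, where the queue name, trigger status and helper functions are all hypothetical:

```python
# Sketch of the planned full-automation loop: watch an "AI queue",
# respond to tickets in a trigger status, then hand each ticket back
# to the engineer's queue. Names and statuses are hypothetical.
import time

AI_QUEUE = "AI Queue"              # hypothetical queue name
TRIGGER_STATUS = "Needs AI Reply"  # hypothetical ticket status

def run_once(fetch_queue, respond, move_back) -> int:
    """One pass over the AI queue; returns the number of tickets handled."""
    handled = 0
    for ticket in fetch_queue(AI_QUEUE):
        if ticket["status"] != TRIGGER_STATUS:
            continue
        respond(ticket)    # generate and post the AI reply
        move_back(ticket)  # return the ticket to the engineer's queue
        handled += 1
    return handled

def run_forever(fetch_queue, respond, move_back, interval: int = 60) -> None:
    """Poll the AI queue on a fixed interval."""
    while True:
        run_once(fetch_queue, respond, move_back)
        time.sleep(interval)
```

Injecting `fetch_queue`, `respond` and `move_back` as callables keeps the loop independent of any one PSA, which is the same separation argument made below about adapting the design to other platforms.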
Wider use
Because the application sits outside Halo and utilises APIs, the underlying approach isn’t limited to Halo PSA. The same design could be adapted to other PSA, ITSM, CRM or ERP platforms, provided a suitable API is available.
Ultimately, the goal here is to improve service quality, reduce response times and redistribute workload during unpredictable peak periods. Used in the right way, this kind of automation has the potential to make support teams, including our own, more effective without losing the human element that still matters.
If this is something you think could be useful in your own business, feel free to get in touch for a chat about what might be possible. Every environment is different, and the interesting part is usually in the detail.
Alex Mordey.

