AI Language Models and Business Processes: The Building Blocks

I've had the good fortune of a diverse career. Throughout, I've been drawn to transformative work that fundamentally improves the way people work and interact inside and across companies. I love the challenge of finding and developing a common capability that can be harnessed in many diverse ways. For better or worse, I’m also a “from the bottom up” thinker, so when I’m faced with a new capability I crave an understanding of the working details. I’ve become enthralled with the prospect of applying language models (LMs) to business processes, and of prompt engineering and fine-tuning existing models to support more complex, reasoning-oriented business activities. Prompt engineering is a hot topic in data analytics and technical circles, but it is less understood by the general business community. I’ll try to break down the basis for my excitement in terms of the moving parts and their potential impact on day-to-day work activities.

This area is evolving quickly: new capabilities emerge weekly and things change daily, fueled by rapid advances in language models on many dimensions. I’ll focus on my experience with the large language models, informed largely by OpenAI/ChatGPT, Google/Gemini, and Anthropic/Claude, but there is as much or more innovation occurring elsewhere, with small and medium models, open-source models, and local models rapidly developing their capabilities. These advances will further fuel adoption as they add options that are smaller, cheaper, faster, more specific, more local, more private, and more controlled to the list of commercial choices. I’m thankful that there is a very cooperative environment among “prompt engineering” practitioners, which is extremely helpful in this dynamic time.

The essence of prompt engineering is providing instructions to get an LM to do what you want it to do. The result is an agent that responds in a certain voice, in a particular format, and using specific data. In the process context, I’m creating processes powered by agents, each with a different objective. This reminds me of the early days of developing standards for communicating processes between organizations. A supplier or transportation company was “an agent,” if you will, and you agreed on what and how you would communicate. The difference, an important one I think, is that automating complex processes in the past required extensive, cumbersome rules and agreements on every dimension of the process. That made those processes brittle, and they often failed under their own weight.

By contrast, with AI agents, configuration is more intelligent. To manage the coordination of orders between companies, for instance, you provide instructions something like: “You are a helpful logistics assistant providing information to companies about current orders and future orders. Only answer these questions. You have access to three tools: validate_customer, check_order, and check_inventory.” That’s it, you’re done. Responses in French, Italian, or Portuguese, or in sonnet form? Your choice. There’s a little work to create the tools, but it’s hours to days, not weeks. So, conclusion #1: LMs and prompting technology hold the promise of dramatically improving processes while simultaneously reducing complexity and increasing flexibility.
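For concreteness, here is a minimal sketch of what such an agent configuration might look like. The field names (name, model, system_prompt, tools) are illustrative, not taken from any particular product, and the model choice is just a placeholder.

```python
# A minimal, illustrative agent configuration for the logistics example above.
# The structure and field names are assumptions, not a specific platform's API.
logistics_agent = {
    "name": "order_coordination_assistant",
    "model": "gpt-4",  # any chat-capable model could be substituted
    "system_prompt": (
        "You are a helpful logistics assistant providing information to "
        "companies about current orders and future orders. Only answer "
        "these questions. You have access to three tools: "
        "validate_customer, check_order, and check_inventory."
    ),
    "tools": ["validate_customer", "check_order", "check_inventory"],
}
```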

The next question for me is: are LMs capable and consistent enough to support complex business processes? This is a composite question, and I break it down as follows. First, what’s a reasonable structure for building processes powered by LMs? Second, what capabilities are required to support this? And third, how consistent are LMs at providing these capabilities?

Structure: The structure is responsible for the overall flow of the experience, delivering the outcomes or objectives of the process and defining the interactions with the LMs. The first requirement is to define a set of behaviors for interacting with the LM; the second is to provide the logic to route and control these behaviors to create a process.

A Process Example: The Administrative Assistant Process

Here’s the language and structure I use. Agents define the specific inputs to a call to a model. Processes define a group of agents and the means for controlling or routing which questions are sent to which agents. Finally, when a user launches a process, they create a conversation. Agents can be configured to be powered by OpenAI, Anthropic, and Google/Gemini models; other models can be added either to an existing processing class or through a new one. Most of the model configurations I use are of the chat variety (as opposed to text), meaning that with each interaction, a history of the chat is fed to the model. I have not adopted the agent containers (OpenAI Assistants, AgentExecutors, etc.) provided by the LM vendors, because I judged that I would lose control and add complexity trying to harmonize that capability across models. However, for a more specific investigation or solution, these may prove very helpful. I dove in and created a no-code platform that allows me to configure agents and processes and to invoke conversations. It’s driven by four classes. The platform is partially built on LangChain, an excellent LM management toolkit built and led by Harrison Chase.
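To make that vocabulary concrete, here is an illustrative sketch of how agents, processes, and conversations could fit together. These are not the platform's actual classes; the routing and model-call helpers are stubs standing in for real provider calls.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    model: str           # e.g. an OpenAI, Anthropic, or Gemini chat model
    system_prompt: str
    tools: list = field(default_factory=list)

@dataclass
class Process:
    name: str
    router: Agent                                 # classification agent that picks a target
    agents: dict = field(default_factory=dict)    # agent name -> Agent

@dataclass
class Conversation:
    process: Process
    history: list = field(default_factory=list)   # chat history fed to the model each turn

    def ask(self, question: str) -> str:
        target = classify(self.process, question)             # stub below
        answer = call_model(target, self.history, question)   # stub below
        self.history.append({"user": question, "assistant": answer})
        return answer

def classify(process: Process, question: str) -> Agent:
    # Placeholder: in practice the router agent decides which agent gets the question.
    return next(iter(process.agents.values()), process.router)

def call_model(agent: Agent, history: list, question: str) -> str:
    # Placeholder: in practice this calls the provider's chat API with the agent's
    # system prompt, the conversation history, and the new question.
    return f"[{agent.name} would answer here]"
```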

Capabilities Required: Here’s my distillation of the important capabilities:

The ability to classify: No surprise that classification is a strength of language models, but being able to classify questions or requests with a high degree of accuracy is critical to scaling capabilities and content. As content is added, the potential for the LM to provide incorrect responses increases. For instance, while you may be tempted to provide all corporate data to a single agent and then ask it all manner of questions, you’ll get more consistent responses by classifying questions and sending them to more task-oriented agents with access to specific information.
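As an illustration, a routing agent's prompt can be as simple as the following; the categories and wording here are hypothetical, and the platform forwards the question to whichever task-oriented agent matches the label the classifier returns.

```python
# Illustrative classification prompt for routing questions to task-specific agents.
CLASSIFIER_PROMPT = """Classify the user's question into exactly one category
and reply with the category name only:
- accounting: questions about accounts or account codes
- budgeting: questions about budgets or the budgeting process
- policy: questions about corporate policies
- other: anything else"""
```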

The ability to summarize, assess sentiment, and perform other “language math”: No surprise, language models are excellent at language analysis and manipulation. These capabilities are extremely helpful both in laying the groundwork for robust conversational flow and in dynamically routing based on interests. I can see these processes working in the background of any technical experience to provide subtle guidance.

The ability to invoke tools: You can arm LMs with tools by defining each tool with a description and a format for the response. A tool call can be thought of as a request by the LM to invoke the action required. For instance, an email tool would be defined with a description of “use this tool to send emails” and a format containing the information required to send the email (from_email, to_email, subject, content, etc.). A prompt of “send an email to Joe@cs.com about the meeting tomorrow” would invoke a response from the LM in the tool format. The LM returns the tool call; it does not send the email directly. The associated platform is responsible for sending the email itself. According to the LMs themselves, the upper limit of effective concurrent tool use by an agent is probably about three, which is consistent with my experience. In practice, there are some differences in the way models handle tools, but there is enough consistency in processing that one can interchange models. I’m very happy that Anthropic adopted JSON Schema for its beta version, as the early documentation I saw suggested XML.
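Here is a hedged sketch of the email tool described above, written in the JSON-schema style that current tool-calling APIs use. The exact wrapper each provider expects around this definition differs slightly, so treat the shape as illustrative rather than any one vendor's format.

```python
# Illustrative email tool definition. The model responds with a call in this
# format; the platform, not the model, actually sends the email.
email_tool = {
    "name": "send_email",
    "description": "Use this tool to send emails.",
    "parameters": {
        "type": "object",
        "properties": {
            "from_email": {"type": "string"},
            "to_email": {"type": "string"},
            "subject": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["to_email", "subject", "content"],
    },
}
```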

The ability to use your data: You can provide LMs with data in a number of formats, along with instructions like: “Use the information provided, and if you can’t answer the question based on that information, say ‘I don’t know.’” I created an accounting assistant by sharing a chart of accounts as a .csv file. Any specific uses or restrictions can be incorporated by adding them as comments. This accounting assistant will tell you all you need to know about which account code to use, based simply on the chart of accounts. Similarly, policy assistants can be configured in minutes simply by providing them with access to the policies. Send the travel policy agent the travel policy, and then ask a question about the per diem for meals in Indianapolis. There is no training and virtually no other prompt engineering required.
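A minimal sketch of the accounting assistant, assuming the chart of accounts lives in a local chart_of_accounts.csv file; the file name and prompt wording are illustrative.

```python
from pathlib import Path

# Read the chart of accounts and place it directly into the prompt as context.
chart_of_accounts = Path("chart_of_accounts.csv").read_text()

accounting_prompt = f"""You are an accounting assistant. Use the information
provided below to answer questions about which account code to use. If you
can't answer the question based on that information, say "I don't know".

Chart of accounts (comments note any restrictions on use):
{chart_of_accounts}
"""
```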

Directional alignment: Is there a risk that the LM market will move in a direction that de-emphasizes these capabilities in the future? Short answer: I don’t think so. I distill this to a question of the providers’ commitment to tools, as the other capabilities are foundational. For tools, I’m confident because, though the capability is rapidly evolving (it is a beta feature for all the providers listed), there is enough consensus commitment by providers and adoption by customers that providing tools is now table stakes.

Portability between LMs: I’ve found that basic capabilities are quite portable between these providers, so agents can be assigned to the model of your choosing. This entire area is dynamic; for instance, Anthropic released beta access to tools last week (early thoughts? Very capable!). Each provider and model has strengths and weaknesses, and I plan to write about my perspective on this next.

Some Examples: I’ve informed my perspective by building the following sample processes. Each pushes hard in one or two specific dimensions, which I try to describe. The objective here is to assess a.) the ability of the framework to support the complexity, and b.) the reliability and consistency of the LM responses in creating robust processes. The composite capabilities that are enabled are pretty exciting.

Global Corporation Tax Provisioning Process: This process is challenging because it brings together forecasting, logic-based allocations, and the need to perform both a high-level (top-down) calculation and a detailed (bottom-up) calculation. I thought language models would struggle with the math of allocating corporate financials down to legal-entity financials through the various allocations and across years at an assumed growth rate. Two of the three models got the math right out of the box, and the third got it with some simple prompt engineering.

The Administrative Assistant: This process performs activities like managing tracking lists, actions, communications, and scheduling. Under the covers it consists of five agents. The classification agent responsible for routing user requests to the appropriate agent is created solely through prompt engineering and works quite well. This process is also able to juggle multiple in-progress tools: for instance, it can compose (but not send) emails to a few people, also create (but not send) an appointment, and wait for the user to refine them. This works with OpenAI, but not on all models. I like being able to write emails as if I have a real assistant: “Compose an email to Joe about the thing and jokingly remind him that my sports team always beats his.” Then “make it longer,” “make it more formal,” and so on.

The Hot Tub Assistant: This process helps an owner manage the maintenance of a hot tub, and helps professional maintenance technicians keep track of maintenance activities. It is cool because the assistant itself was configured by another process, “Define an Experience.” That process improved my thinking about how the professional technicians should best interact with the owners. I thought there would be a lot of work in building a back end to keep track of maintenance activities, but as a shortcut I wrote tools that simply write any maintenance activities to a text file and feed that to the agent as context, and it sorts everything out. Text file: “The water should be changed every thirty days. The water was changed on April 1st. It is April 15th.” Question: “In how many days does the water need to be changed?” Answer: “The water needs to be changed in 15 days, on April 30th.”
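Here's a sketch of that shortcut, assuming a hypothetical hot_tub_log.txt file: one helper appends maintenance notes, another assembles the rules, the log, and today's date into context for the agent to reason over.

```python
from datetime import date

def log_maintenance(note: str, path: str = "hot_tub_log.txt") -> None:
    # Append a dated maintenance note, e.g. "the water was changed".
    with open(path, "a") as f:
        f.write(f"{date.today().isoformat()}: {note}\n")

def build_context(path: str = "hot_tub_log.txt") -> str:
    # Hand the maintenance rule, the full log, and today's date to the agent as
    # plain-text context; the model works out what is due and when.
    with open(path) as f:
        log = f.read()
    return (
        "The water should be changed every thirty days.\n"
        f"Maintenance log:\n{log}\n"
        f"Today is {date.today().isoformat()}."
    )
```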

The Department Budget Planning Process: This is a type of process I call a forum, where questions are classified and sent to topic-specific expert agents. In this process, the Financial Planning and Analysis Assistant handles budget and budget-processing questions, the Accounting Assistant handles account and account-code-level questions, and a Financial Analyst answers questions related to financial goals, competitive benchmarks, and other competitive information. I built the classification agent by fine-tuning a text model based on Google text-bison002. Fine-tuning is a process where you provide sample answers and a derivative model is created, with new weights in the later nodes of the model. As a reference, I fine-tuned this model with 1,000 sample answers. It ran overnight on Google Cloud Vertex AI, cost $100 to create, and is significantly cheaper, faster, and more accurate than a larger model that is prompt engineered. It likely makes sense to create department-level question classifiers (one for Finance, one for HR, one for Operations, and so on) to route questions to the appropriate agents. This model tested and proved that a fairly complex process can be governed by a department without coding or particularly deep AI skills. I’m confident that most corporate finance organizations could both configure this process and provide governance and oversight without the need for technical or machine learning resources. Creating a new agent, say a purchasing policy assistant to answer internal questions about the purchasing process, would simply be a matter of creating the agent and process, providing it with a prompt, and giving it access to a tool that references the purchasing policy, either in whole or in parts, based on the question being asked.
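For illustration, the fine-tuning data can be as simple as question/label pairs written to a JSONL file. The agent labels below are hypothetical, and the input_text/output_text field names reflect my understanding of the Vertex AI supervised-tuning format, so check the current documentation before relying on them.

```python
import json

# A few illustrative training records for the question classifier; roughly
# 1,000 examples of this shape were used for the actual classifier.
examples = [
    {"input_text": "What account code do I use for office supplies?",
     "output_text": "accounting_assistant"},
    {"input_text": "When is the Q3 budget submission due?",
     "output_text": "fpa_assistant"},
    {"input_text": "How does our operating margin compare to peers?",
     "output_text": "financial_analyst"},
]

with open("classifier_training.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```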

Futures: Though not required to support our core use cases, the following capabilities intrigue me:

The ability to create an approach to analysis: LMs are surprisingly good at providing an approach to analysis. Some might call this reasoning; it’s not, but it sure feels like it is. LMs do a pretty good job of determining how to go about solving multi-faceted analyses, though their internal wiring to utilize this capability varies. The key to leveraging it is to prompt the model first to provide the approach and then to use that approach; this pattern can also be built into the configuration using tools. For instance, the question “Help me find value-oriented stocks which are under-valued relative to their peers in growing industries” is best asked in two parts: first, “Develop an approach to find value-oriented stocks…”, and then “Use this approach to find ten companies ordered by the most under-valued.”
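A sketch of that two-step pattern, where ask is a hypothetical helper that sends a prompt to whichever chat model the agent is configured with and returns the text of its reply:

```python
# Step 1: ask the model for an approach. Step 2: ask it to apply the approach.
def analyze(ask):
    approach = ask(
        "Develop an approach to find value-oriented stocks which are "
        "under-valued relative to their peers in growing industries."
    )
    return ask(
        "Use this approach to find ten companies ordered by the most "
        f"under-valued:\n\n{approach}"
    )
```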

Configuring LMs as reasoning thought partners: Provide the LM with instructions that define an exploration path, and it will prompt you through that path and provide input that feels remarkably intelligent. For example: “You are an experienced business analyst. Help companies develop operating plans to optimize their future valuation through the following path: 1.) understand the business and current plans, 2.) develop a valuation model based on peer/industry standards, 3.) determine the highest-leverage opportunities to move the valuation, 4.) develop priorities, 5.) put together a plan, and 6.) make sure it is realistic.” This type of guidance in the prompt invokes a conversation that is both cooperative and additive. Once information is shared, the LM will take the lead in subsequent topics; in the conversation above, once you share the business and the means of valuation for the industry, the LM can take the lead in the later topics. To prompt the LM to take the lead once sufficient context is established, I respond with “I don’t know, what do you think?” A good general arc is 1.) define what you are trying to do, 2.) state the approach or sequence, 3.) determine and categorize outcomes, and 4.) develop a plan. You can feed this to any model as a prompt and it works pretty well. You can also feed it as a tool to some models (notably OpenAI 3.5 and 4), which allows you to add specificity about the execution of each step in the process. As an admirer of Elon Musk’s accomplishments, I’ve prompted approaches based on “first principles.” You need to finesse the steps a little, but the resulting experience is pretty interesting.
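That general arc can be captured as a reusable prompt template along these lines; the wording and the objective placeholder are illustrative, not a recipe from any particular platform.

```python
# Illustrative reusable system prompt for the "thought partner" pattern.
THOUGHT_PARTNER_TEMPLATE = """You are an experienced thought partner. Guide the
user through the following path, one step at a time, and take the lead on a
step once you have enough context:
1.) Define what we are trying to do: {objective}
2.) State the approach or sequence we will follow.
3.) Determine and categorize the possible outcomes.
4.) Develop a plan.
"""

prompt = THOUGHT_PARTNER_TEMPLATE.format(
    objective="develop an operating plan that optimizes future valuation"
)
```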


Summary: LMs, together with a process framework, can be configured to support complex business processes that both improve the user experience and reduce the complexity and brittleness of those processes. More importantly, we’ve been able to accomplish this with a no-code platform, which allows organizations to put the command of ML-based capabilities where it should go: in the hands of front-line performers. The future capabilities are exciting, and they build on the same capabilities used to develop the core processes.

I’m excited to further investigate other leading capabilities and particularly the prospect of configuring processes that start with a definitional and creative focus, but whose final product is a fully configured working capability.

I’m also excited to explore some additional dimensions, such as designing to optimize the interaction experience for the user. I recently met Shashank Gargeshwari, a technologist and game designer, and we talked about the user experience. He framed the challenge in terms of cognitive load: insights and support from language models are delivered through prose, so how do we further prompt engineer or interpret the output to create feedback that is more easily absorbed? I think of lists, action buttons, or some new codification scheme for outlines, approaches, strategies, and details. Shashank is much better at this than I am. I’m also excited to string these capabilities together end to end to create a process that creates processes. We’ll report back.




