Artificial General Intelligence (AGI) is a concept that has been around for a long time. It stands for the idea that machines, powered by AI, will ultimately outperform humans in every field. The concept has fed our scientific imagination for decades and has inspired creative sci-fi works and literature that capture humanity’s greatest hopes and deepest worries.

Fast forward to the present, and Artificial General Intelligence is slowly but surely being pulled out of fiction and rooted in reality. Enabled by advances in AI research, open-source projects, and fresh ideas centered on AI, the technology is skyrocketing with no limit on the horizon. New advances and innovations in Language/Image, Language-to-Code, Language-to-SQL, and Speech/Text models, to name just a few, are released to the AI community regularly. New paradigms are also being adopted at scale, such as the Mixture of Experts (MoE) layer. All of these components, and many more, put together are, in my opinion, the building blocks of what Artificial General Intelligence will look like in a few years.

At Astrafy, we aim to lead on AGI-inspired products in tomorrow’s AI market. To that end, I developed our first prototype of an autonomous, versatile, and universal AI Agent that we called Astragy.

  • Autonomous: It requires no human intervention. It can “think” and “decide” for itself, then act on those decisions in a fully autonomous fashion;

  • Versatile: It is designed to perform diverse tasks per the user’s requirements or needs, even when these must be inferred from the context;

  • Universal: The implementation could be role-agnostic and task-agnostic. This point has not been proven yet; I still need to run larger and more thorough tests. So far, however, the design has been able to handle the tasks of interest to us without any change to the architecture.

I would like to emphasize that these augmentations do not interfere with Astragy’s ability to carry out conversations.

Astragy’s Architecture

Before I dive into the architecture, I would like to share a few insights on the thought process that led to it.

When Astragy first receives the user’s message, it must decide what to do with it. The user might be explicitly asking to perform a particular task, as in “schedule a Google Calendar meeting tomorrow at 9am between me and bart@example.com”; the message could be a question for which it needs to search the web, as in “What is the Apple Inc. stock price now?”; or the message could even be just a “Hi!”. Different situations require different tools, and some situations can be addressed with pure generation.

Once a decision has been made about what tools are needed, if any, to address the user’s message, actions must follow. For example, if the user asks about Apple Inc.’s stock price, a web/database search must be performed. To that end, the user’s message must be “parsed” into arguments to run the right tool for the situation.
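
As an illustration, the scheduling request above might be parsed into a set of arguments along these lines (the tool name and field names here are hypothetical, just to convey the idea):

{
    "tool": "calendar_schedule",
    "date": "<tomorrow's date>",
    "time": "09:00",
    "attendees": ["me", "bart@example.com"]
}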

Once the tool’s result is obtained, a decision must be made as to whether it is satisfying with respect to the original query. To follow up on the Apple Inc. stock price example, the result would be something like “aapl stock: 188.9$”. Let us, for simplicity’s sake, assume that there is another company named Apel Inc. (stock ticker: APEL). If the stock price search tool outputs the stock price of Apel Inc. instead, the result is deemed unsatisfying and another round of enhanced decision-making and parsing should be done.

This approach assumes that the tools are deterministic and bug-free, and that the only reason we may need to run further enhanced decision-making and parsing steps is that a mistake was made in the parsing, or that the decision about which tools to use was wrong altogether.

These three steps, symbolized here in blue, create the following loop:


Astragy's Operations Loop
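
To make this loop concrete, here is a heavily simplified, self-contained sketch in Python. The decide, parse, reason, and reply functions are trivial stubs standing in for the LLM-backed modules described further down, and the toolkit is a plain dictionary; this is an illustration, not the actual implementation.

from dataclasses import dataclass

@dataclass
class Verdict:
    satisfying: bool
    enriched_context: str = ""

def decide(context):                        # decider stub: pick a tool, or None for pure generation
    return "web_search" if "stock price" in context else None

def parse(context, tool):                   # parser stub: extract the tool's arguments
    return {"query": context}

def reason(query, result):                  # reasoner stub: accept any non-empty result
    return Verdict(satisfying=result is not None)

def reply(query, result):                   # core stub: craft the final answer
    return f"Answer to '{query}': {result}"

TOOLKIT = {"web_search": lambda query: f"search results for '{query}'"}

def operations_loop(user_message, max_depth=3):
    context, result = user_message, None
    for _ in range(max_depth):              # cap on reasoning depth to avoid infinite loops
        tool = decide(context)
        if tool is None:                    # pure generation is enough
            break
        arguments = parse(context, tool)
        result = TOOLKIT[tool](**arguments)
        verdict = reason(user_message, result)
        if verdict.satisfying:
            break
        context = verdict.enriched_context  # retry with extra information
    return reply(user_message, result)

print(operations_loop("What is the Apple Inc. stock price now?"))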


With these examples and steps in mind, here is a simplified overview of Astragy’s architecture:


A diagram of Astragy’s architecture: the Core, linked to the Frontend and Backend, works together with the Decider, Parser, Tool Executor (with its Toolkit), and Reasoner modules to handle user interaction and internal decision-making.


The decider, parser, reasoner, and core modules are all independent LLM “instances”. You can think of this as a combination of experts in that each module specializes in a specific part of the “thought process”. For now, Astragy uses 4 “instances” of Google’s PaLM2.

Here are some insights on what each module in the architecture does:

  • decider: decides which action should be performed among the available actions. For example: database search, web search, scheduling, translation, calculation, etc.

  • parser: parses the message to extract the arguments that are relevant to the tools chosen by the decider, if any.

  • tool executor: executes the chosen tools with the parsed arguments and returns the result.

  • reasoner: “evaluates” whether the obtained result is satisfying with respect to the original query. If not, it tries to improve the result by triggering a new round of operations, using extra information from the previous result and from its reasoning conclusions.

  • core: this is the LLM that communicates with the user. The reason for making a separate LLM “instance” for the interaction with the user is that it can be prompted in a very different way than the reasoner.

I keep quoting the word “instance” because in truth I am using a single LLM instance, but the system is designed so that each LLM module sees only its own interactions and is unaware of the existence of the other modules.
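
To make the “instance” idea concrete, here is a minimal sketch of how a single underlying model can serve several isolated modules. The generate function is a placeholder for the shared LLM call, and the prompts are illustrative, not the actual ones:

def generate(prompt: str) -> str:
    # Placeholder for the single shared LLM call (e.g. a PaLM2 endpoint).
    return "<model output>"

class ModuleInstance:
    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.history = []                   # only this module's own interactions

    def __call__(self, message: str) -> str:
        prompt = "\n".join([self.system_prompt, *self.history, message])
        output = generate(prompt)
        self.history += [message, output]
        return output

decider = ModuleInstance("Choose the single best tool for the user's request, if any.")
parser = ModuleInstance("Extract the arguments for the chosen tool as JSON.")
reasoner = ModuleInstance("Judge whether a tool result answers the original query.")
core = ModuleInstance("You are Astragy. Reply to the user conversationally.")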

In order to avoid infinite loops, I cap the number of iterations of the Operations Loop at a maximum reasoning depth, after which the accumulated (and perhaps enhanced) information is used to generate a reply.

This architecture has some serious advantages:

  • Module Independence: One can plug in the LLM that performs best for the specific part it will play in the Operations Loop. For example, for the parsing step, one could use a model that is specifically trained to parse text into JSON, YAML, or any other format that can be used to pass arguments.

  • Safety: Astragy cannot do what it has no tool for. For example, we can develop a tool to access a bank account’s balance without exposing any means of performing transactions (see the sketch after this list). This granularity in which tools and operations are exposed to our system is a strong safety guarantee.

  • Complementarity: My experiments have shown that these modules, each taken separately, can have some serious shortfalls. However, these issues are surprisingly well mitigated by the interactions between them once they are combined.
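
To illustrate the safety point above, the toolkit could expose a read-only balance lookup while simply not registering anything that moves money. The function and tool names here are hypothetical:

def get_account_balance(account_id: str) -> float:
    # Read-only lookup against the bank's API (stubbed here with a placeholder value).
    return 0.0

TOOLKIT = {
    "get_account_balance": get_account_balance,
    # No "transfer_funds" entry: Astragy has no way to move money,
    # regardless of what it decides or how it parses a request.
}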

It also has some flaws to overcome in the future:

  • We have no ability to steer the execution of the subsequent steps of the Operations Loop, aside from the system’s own “reasoning” and conclusions; and if the end user had to intervene, the system would no longer be autonomous;

  • Recent research has shown that we can increase the context length of LLMs to a staggering 128K tokens (yarn-llama-2-13b-128k by NousResearch), but it has also shown that, for very long contexts, Transformers process the middle section less well than they process the edges (“Lost in the Middle” by Stanford, UC Berkeley, and Samaya AI). This means that there is inherently a limit to how many tools can be fully mastered by Astragy at a time.

Some Real Examples

We have tried Astragy with only a handful of real tools so far, since we are still in the development stage. Nevertheless, each time, we made it believe it had many more plausible tools than it really does, to see whether it would still choose, and properly execute, the right tools in a variety of scenarios. Here are a few examples:

Web Search Example 1

user:
when was chatgpt enterprise released?

Astragy:
ChatGPT Enterprise was released on August 28, 2023.

Web Search Example 2

user:
who is the current president of Switzerland?

Astragy:
Alain Berset is the current president of Switzerland. He was elected on December 7, 2022, for 2023. It is his second term.

user:
what party does he belong to?

Astragy:
He belongs to the Social Democratic Party of Switzerland.

Web Search Example 3

user: 
how much should a man sleep?

Astragy: 
According to the National Sleep Foundation, adults need 7–8 hours of sleep per night. However, some people may need more or less sleep, depending on their individual needs. If you are not sure how much sleep you need, talk to your doctor.

Mathematical Calculation Example

user:
what is 120293 times 292?

Astragy:
The result is 35125556.0.

Conclusion

While many technical and ethical challenges lie ahead of Artificial General Intelligence, we are hopeful that, at the current pace of development and innovation, AGI-inspired systems will grow safe, reliable, and powerful enough to usher in a new era of opportunities.

If you are looking for support on your AI journey, feel free to reach out to us at sales@astrafy.io.