
AI as a Platform

New Subject Area

AI is a hot topic. We often discuss whether AI can replace human programmers and, if so, when exactly that might happen. There are two basic polar opinions and numerous combinations between them: on one end, some believe that soon we will all be out of a job and that intellectual work is living out its final days. On the other, skeptics smirk and say there is no significant threat: AI lacks, and will never acquire, what the human brain possesses...

Many of us already have experience interacting with various neural networks, both positive and not so much. Moreover, there is what I would call a "strange" experience, a separate category that raises more questions than it answers. Tech giants announce the widespread adoption and total intelligence of future machines. Meanwhile, we find ourselves surprised at how, um... dumb and lazy this all-powerful AI can be when asked to do something genuinely useful.

In this article, I propose shifting from general considerations to a more pragmatic plane and viewing AI not as a potential threat to our future but as a new platform for development that opens up many unexplored and unexpected opportunities, along with a heap of new work for us programmers, naturally. It is this active practical application of modern AI capabilities that I refer to as a new subject area worthy of close attention and thorough examination.

Just to clarify: I suggest we not engage in debates about whether neural networks constitute true AI; I use these terms as generally accepted, for convenience and brevity. By AI, I broadly mean various neural networks in their diverse combinations.

AI Functions, Fuzzy Logic

Let's start with the simplest. We all know that access to self-hosted neural networks or public AI services is possible via an API. The most basic building block that can be conceived here is an asynchronous function that takes a text prompt as input and returns a generated text response (if we are working with a text model).
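
In JavaScript, such a function can be a thin wrapper over a single HTTP call. Here is a minimal sketch; the endpoint URL, payload shape, and the data.text field are placeholders for whatever your provider or self-hosted model actually exposes:

// A minimal sketch of an AI function (the endpoint, parameters, and
// response shape are assumptions; adapt them to your provider's API).
async function askAI(prompt, { model = 'text-model-v1', temperature = 0.7 } = {}) {
  const response = await fetch('https://api.example.com/v1/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.AI_API_KEY}`,
    },
    body: JSON.stringify({ model, temperature, prompt }),
  });
  if (!response.ok) throw new Error(`AI request failed: ${response.status}`);
  const data = await response.json();
  return data.text; // the generated text response
}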

If our codebase has such a function, plus some control over configurations and available models, we can make virtually any part of our system "intelligent." We can grow such an AI toolkit into a full-fledged general-purpose library, or into something narrower but powerful. And we can compose such functions into sequential AI pipelines that tackle the most non-trivial and unexpected tasks.
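
For instance, composing text-to-text steps into a pipeline comes down to ordinary composition of async functions (a sketch reusing the hypothetical askAI from above):

const pipeline = (...steps) => (input) =>
  steps.reduce((acc, step) => acc.then(step), Promise.resolve(input));

const summarize = (text) => askAI('Summarize the following text:\n\n' + text);
const translate = (text) => askAI('Translate the following text into Spanish:\n\n' + text);
const bulletize = (text) => askAI('Rewrite the following text as a bullet list:\n\n' + text);

// Each step is just prompt-in, text-out, so they snap together freely:
const digest = pipeline(summarize, translate, bulletize);
// const result = await digest(longReport);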

Next example. Imagine we are creating a corporate chatbot that should behave almost like a human member of your team. One of the obvious external differences between a living person and a neural network is the capacity for initiative. The neural network operates on a primitive "input/output" model, but we want our cyber-friend-expert to join the dialogue and offer assistance when it sees that the meat bags are struggling without it. Perhaps it might even spontaneously suggest ideas. We'll leave the philosophical problem of free will for later discussion in the comments; for now, let's write a function that summarizes some textual context into a true or false output, which we will use to decide whether to take the initiative. Such functions can be much more complex and include dynamic analysis of external information, such as reading the news through a headless browser. They can also trigger on a schedule, on a timer, or at "random." But those are details; I am speaking of basic principles.
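
A sketch of such an "initiative" check might look like this (the prompt wording and the reduction to a boolean are, of course, up to you; askAI is the hypothetical function from above):

// Reads the recent chat context and reduces the model's answer to a boolean.
async function shouldTakeInitiative(chatContext) {
  const prompt = [
    'Below is a fragment of a team chat.',
    'Answer with a single word, "true" or "false":',
    'should an expert assistant step in and offer help right now?',
    '',
    chatContext,
  ].join('\n');
  const answer = await askAI(prompt, { temperature: 0 });
  return /^\s*true\b/i.test(answer); // fuzzy output, primitive value
}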

AI functions yield certain results with a given probability, and even if you subsequently reduce the output to primitive values or conform it to some schema, you are dealing with that very fuzzy logic that makes your life less boring and your work more creative.

Multimodality

The author of these lines was using "multimodal AI" in his work long before major AI vendors even announced the feature. You might ask how. Quite simply: I have already revealed the secret earlier in the text, the composition of AI functions.

Again, an example. Suppose you want to get an article about cat breeds from AI. And what kind of article about cats would be complete without cute pictures? And what article with pictures could be without the markup that specifies how and where these pictures will be inserted? My answer is none.

Thus, we use three AI functions:

  • The first generates an article listing cat breeds.
  • The second inserts image placeholders, with prompts adapted for the image model we need, into the text produced by the first.
  • The third requests images from our "image" neural network and inserts them into the final result.

Voila: we end up with a document that can be proudly shown to mom right now.
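
A sketch of this three-step composition, where generateImage is a hypothetical counterpart of askAI that talks to an image model and returns a URL:

async function catArticle() {
  // 1. Generate the article itself.
  const article = await askAI('Write a short article listing popular cat breeds.');

  // 2. Ask the model to insert placeholders of the form [[IMAGE: <prompt>]].
  const withPlaceholders = await askAI(
    'Insert image placeholders of the form [[IMAGE: <prompt>]] ' +
    'into suitable places in this article:\n\n' + article
  );

  // 3. Replace each placeholder with a generated image.
  let result = withPlaceholders;
  for (const [placeholder, imagePrompt] of withPlaceholders.matchAll(/\[\[IMAGE:\s*(.+?)\]\]/g)) {
    const url = await generateImage(imagePrompt);
    result = result.replace(placeholder, `<img src="${url}" alt="${imagePrompt}" />`);
  }
  return result;
}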

In this straightforward manner, we can solve many tasks that lie beyond the reach of any single generative model. And to do this, you don't need a degree in mathematics, a pile of GPUs, or your own finely annotated dataset. A knowledge of JavaScript or Python and a bit of ingenuity are sufficient.

Prompt Engineering

As you can imagine, without the skill of crafting good prompts, you won't get far in this field. I often encounter the phrase "prompt engineer" used ironically, yet we must accept that prompts are soon to become (or have already become) a full-fledged part of our codebase.
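
For example, nothing prevents prompts from living in the repository as ordinary, versioned, reviewable modules (the names and structure below are purely illustrative):

export const prompts = {
  reviewSummary: ({ diff, language }) => `
You are a senior ${language} developer.
Summarize the following diff in three bullet points,
then list potential regressions, if any.

${diff}`.trim(),
};

// Usage: await askAI(prompts.reviewSummary({ diff, language: 'JavaScript' }));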

Control

Interacting with AI via an API gives us a much higher degree of control than any ready-made UI does. We can change parameters per request, switch model versions, automatically add the necessary information to the context, and much more. This raises the quality of the result, in the sense that it aligns far better with our expectations.
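
A sketch of what this control looks like in code, reusing the hypothetical askAI (model names and parameters are illustrative):

const askPrecise = (prompt) => askAI(prompt, { model: 'large-model-v2', temperature: 0 });
const askCreative = (prompt) => askAI(prompt, { model: 'fast-model-v1', temperature: 1.2 });

// Automatically add project-specific information to every request:
const glossary = 'Foo: our internal billing service. Bar: the legacy importer.';
const withProjectContext = (prompt) => askPrecise('Glossary:\n' + glossary + '\n\nTask:\n' + prompt);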

Context

Context is what makes our case specific. It is a corpus of data that cuts off unnecessary vectors for the AI and sets the required ones for solving your specific task. It can consist of the history of requests and responses, role descriptions, output format descriptions, uploaded documents and multimedia files, content from web pages, web search results, and much more. Context can be stored locally, passed between different models, edited, and optimized. Context can have its own formal structure (e.g., record identifiers) and even its own markup containing useful metadata. The generated output is also part of the context, and our approach to working with context largely determines the quality of the result.
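
A minimal sketch of a locally stored context: a message history with roles and record identifiers that is sent along with every request and can be edited, trimmed, or handed to another model (field names are illustrative):

let nextId = 1;
const context = {
  system: 'You are a support engineer for our project. Answer concisely.',
  messages: [], // { id, role: 'user' | 'assistant', content }
};

async function chat(userText) {
  context.messages.push({ id: nextId++, role: 'user', content: userText });
  const prompt =
    context.system + '\n\n' +
    context.messages.map((m) => m.role + ': ' + m.content).join('\n') +
    '\nassistant:';
  const reply = await askAI(prompt);
  context.messages.push({ id: nextId++, role: 'assistant', content: reply });
  return reply;
}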

Formalization, Stabilization, Validation

It's time for the next example, which follows logically from the previous one. Suppose our task is to create a complete web page using AI. A web document usually contains the following entities: text, markup, styles, images, and scripts. Almost all of this can be produced in answer to a single request, but it is unwise to hope the result will satisfy you immediately. It often happens that one part of the AI's response is perfectly acceptable while another is categorically not. We can iterate toward a better outcome, but without the ability to make precise adjustments and to cache satisfactory responses, juggling the output entities becomes very complicated. To solve this, in addition to processing the result through a chain of functions, we need a formalized output structure that allows generating and modifying individual sections and whole segments of the response. Moreover, each part must meet rigorous criteria and contain no errors, or, simply put, must work properly in a browser.

Recently, OpenAI presented a feature called Structured Outputs: a way of conforming the AI's response to your JSON schema, which lets you extract data by fixed keys and perform further manipulations. This is very similar to what we might need for the task described above. A web document can certainly be represented as a JSON description, but a separate solution will be required to generate the final HTML...
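
For illustration, a JSON description of a web document might look roughly like this (a sketch only; the actual Structured Outputs feature wraps such a schema in its own request format):

const pageSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    styles: { type: 'string' },
    sections: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          id: { type: 'string' },          // lets us regenerate one section by key
          heading: { type: 'string' },
          html: { type: 'string' },
          imagePrompt: { type: 'string' },
        },
        required: ['id', 'heading', 'html'],
      },
    },
  },
  required: ['title', 'sections'],
};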

The author of this article, in his experiments with structuring, arrived at a different format, based on HTML from the start. It turns out that AI understands XML-like syntax quite well, and this syntax helps to "negotiate" with AI effectively. You can create detailed instructions for using tags and attributes, both standard and completely custom. You can assign unique identifiers and other metadata to tags and use them in subsequent requests. From your code's perspective, HTML is convenient to process via the DOM API: searching for and modifying elements, changing structure, and so forth. This can be done in Node.js or through APIs for controlling headless browsers, such as Puppeteer or Playwright, which puts full rendering and rasterization tools in your arsenal.
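
A sketch of such post-processing in Node.js; jsdom is used here as the DOM implementation, and a headless browser would work the same way through its own page API:

import { JSDOM } from 'jsdom';

function markForRegeneration(html) {
  const dom = new JSDOM(html);
  const doc = dom.window.document;

  // Find fragments the model labeled with our custom attribute
  // and flag the empty ones so only they get regenerated.
  const broken = [...doc.querySelectorAll('[data-ai-id]')]
    .filter((el) => !el.textContent.trim());
  for (const el of broken) el.setAttribute('data-ai-status', 'regenerate');

  return { html: dom.serialize(), ids: broken.map((el) => el.dataset.aiId) };
}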

AI Tags

I will continue the topic of working with AI in the guise of HTML, but from a slightly unexpected angle. Among current HTML standards and specifications there is a mechanism known as Custom Elements. For those who are unaware: it is a way of creating your own custom tags that the browser initializes upon encountering them in the markup, executing behavior logic defined by you. There is also a way to modify the behavior of standard tags (there are nuances, but they are not important within the scope of this narrative).

Now imagine that in your HTML file, instead of:

<img src="./cat.jpg" alt="Image of a black cat" />

You write something like:

<img prompt="Image of a black cat" width="640" height="480" />

And as a result, you receive the desired image, which is also cached in CDN by the prompt hash for all subsequent visitors to your page.

Or we could do this:

<ul prompt="Countries of South America"></ul>

And receive a list of the countries.

Or:

<ai-translate lang="Spanish">Lorem ipsum... some text.</ai-translate>

And receive a block of text translated into Spanish. You can even specify the translation language using... CSS Custom Properties!

Or:

<lorem-ipsum size="A4 page" topic="Cats"></lorem-ipsum>

And receive a block of placeholder text about cats. Or some SEO fluff.

The final document, of course, can be saved and published as a static file.
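
A sketch of one such "AI tag": a custom element that translates its own text content, reading the target language from an attribute or, as mentioned above, from a CSS Custom Property. askAI is the same hypothetical function as before; caching and error handling are omitted:

class AITranslate extends HTMLElement {
  async connectedCallback() {
    const lang =
      this.getAttribute('lang') ||
      getComputedStyle(this).getPropertyValue('--ai-lang').trim() ||
      'English';
    const source = this.textContent.trim();
    this.setAttribute('aria-busy', 'true');
    this.textContent = await askAI(`Translate into ${lang}:\n\n${source}`);
    this.removeAttribute('aria-busy');
  }
}

customElements.define('ai-translate', AITranslate);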

Web Technologies

As you may have noticed, I frequently reference web technologies in my narrative. In my opinion, they will play a special role in the coming AI revolution. We (all of human civilization) need a standardized protocol for communicating with AI so that we can keep understanding and controlling it. We need simple, standard ways to visualize data. We need tools that provide a deterministic approach to typical tasks. Otherwise, singularity. It awaits us in any case (and some believe it has already arrived), but we need at least some "branch" to hold onto in the coming years.

We need entities that an average human brain can manage relatively easily: read, analyze, recreate independently, and modify to suit personal goals. We need "human-readability." We already have such a set of standards and principles: HTML, CSS, and JavaScript. With web technologies, we can create widgets and meta-applications that become part of communication with AI. We can add metadata and the necessary markup (as in the examples above). We can conveniently address complex issues of composing entities and operands. We can shield sensitive and confidential information from entering external datasets and avoid the corresponding leaks and vulnerabilities.

I know many harbor biases against web technologies. However, I have a strong argument: the primary property of the Web, and of almost everything it comprises, is transparency. We can literally see what it is made of: we can pop the hood and take a look without complicated decompilation or rare knowledge. Many, for some reason, overlook this factor when assessing technologies, yet I believe it is incredibly important.

Telegram

Recent events add a certain spice to any topic concerning the Telegram messenger. However, I do not want to engage in speculation and propose focusing on the purely technical side: Telegram is an amazing platform with powerful capabilities of its own.

With Telegram, right out of the box, we get authorization, user rights, and group management. We get file sharing and notifications. We get the ability to integrate web applications, blockchain, and much more. And, of course, the ability to create bots.

I consider Telegram a fantastic "frontend" for various AI-based solutions.
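
A sketch of how little it takes to wire a bot to an AI function; node-telegram-bot-api is just one of several libraries for this, and askAI is the same hypothetical function as before:

import TelegramBot from 'node-telegram-bot-api';

const bot = new TelegramBot(process.env.TELEGRAM_BOT_TOKEN, { polling: true });

bot.on('message', async (msg) => {
  if (!msg.text) return; // ignore stickers, photos, etc. in this sketch
  const reply = await askAI('You are our team assistant. Reply briefly.\n\nUser: ' + msg.text);
  await bot.sendMessage(msg.chat.id, reply);
});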

Hardware

Many problems associated with developing AI services can be solved by using self-hosted models. This concerns security, rate limits, and even whether external public services are available to you at all. R&D processes in AI are particularly sensitive to all of the above, including the fact that merely verifying your technical hypotheses can consume significant resources. In this context, we are especially interested in what "hardware" is optimal for working with neural networks. A good GPU and ample memory form the base recipe, and there is little new to say here.
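
Pointing the same kind of AI function at a self-hosted model is trivial; here is a sketch against an Ollama-style local HTTP endpoint (the URL, payload, and model name are examples; other local runtimes look similar):

async function askLocalAI(prompt, model = 'llama3') {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Local model request failed: ${res.status}`);
  const data = await res.json();
  return data.response;
}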

However, I personally find the new Apple Silicon M4-series processors with a built-in NPU very interesting in this regard. I eagerly await the upcoming update to the Mac mini lineup. I feel this device could become the industry standard for self-hosted models, whether corporate or home-based, for the foreseeable future. Time will tell.

Conclusion

While preparing this article, I had to cut some sections for the sake of brevity and narrative coherence. However, there remain myriad discussion topics and ideas, many worthy of separate articles.

I hope I have led you to conclude that AI does not take away our jobs; instead, it offers new ones: engaging, creative, and full of new opportunities.

Author: Alex Matiasevich [RND-PRO.com] © 2024
17.09.2024
