ProjectAI is a workflow framework written in Python that I currently use primarily in the financial sector (stock trading) to identify solid stocks. The project doesn't have a final name yet; internally I simply call it ProjectAI.
It integrates local and cloud-based AI models into existing third-party ecosystems via interfaces. A workflow defines a sequential execution of data retrievals, actions, and signals via modules. The data and results of workflows are processed and passed on to the AI using Jinja2 templates. The project is still in early development, but analysis results from test runs will be uploaded here periodically, in the hope that they might be helpful to someone.
AI news is everywhere; by now the topic gets on many people's nerves, while others see it as a fundamental threat.
I view the development calmly: AI should complement humans, not replace them. So why not use its enormous potential productively?
Looking back, it was and still is much more than just a project for me. It was an intensive excursion through the world of artificial intelligence, full of eureka moments, and it taught me a great deal about the technical background as well as the limits the technology still has. More on this in the Backstory.
At the end of 2024, I wanted to know how deeply an AI could be integrated into stock trading using home-based resources. What started as a small experiment has now developed into a serious software project.
Let's not fool ourselves: Nowhere else do you encounter as many egocentric smooth-talkers as in the capital-driven financial sector.
Their goal? To deliberately drive retail investors into hyped stocks to artificially pump up prices. By the way: The insurance industry is no different.
There are well-known big players offering such AI analysis services, but at horrendous subscription prices; for occasional private investors, this is completely uneconomical. I therefore started experimenting with local models. I wanted to get a feel for how compact, locally run open-source models compare to the large, well-known cloud models. At this point, my AI experience was still limited to the basics.
My initial attempts were based on the models BERT and BART (for text summarization), Qwen, DeepSeek, Ace Reason, and NVIDIA's Nemotron reasoning models (for logical operations), as well as Llama, Mistral, and Mixtral (for textual evaluations), all the way to the currently used Google Gemma 4, which operates here in three versions and shows enormous progress. Using Ollama, Oobabooga, and currently KoboldCpp, I systematically evaluated and benchmarked the models.
The drastic differences between model architectures quickly became apparent: large models kept track of the overall thread more or less flawlessly, but they were agonizingly slow on a pure CPU. Small models, on the other hand, responded extremely fast but remained superficial, quickly lost context, and began to hallucinate.
To push the limits, I developed a Needle-in-a-Haystack scenario. I fed the models extensive SEC financial reports from companies and pointed them specifically at the data. Using meticulously optimized prompts (AI persona and instructions), I had them search for hidden risks:
Particularly important to me was the comparison between euphoric PR statements and the bare numbers of the SEC documents.
As expected, the small models failed; they lacked the cognitive depth for such complex analyses. Larger models, however, recognized up to 80% of the hidden information. But the actual breakthrough only came when I split the overall task into smaller subtasks.
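The idea of splitting the overall task can be illustrated with a minimal sketch. It assumes a filing arrives as plain text whose sections start with headings such as "Item 1A."; the heading pattern, the function names, and the `keyword_model` stand-in for a real model call are all hypothetical and not taken from ProjectAI itself.

```python
import re
from typing import Callable, Dict, List

# Assumption: sections begin with headings like "Item 1A." at the start of a line.
SECTION_PATTERN = re.compile(r"^(Item\s+\d+[A-Z]?\.)", re.MULTILINE)


def split_into_sections(filing_text: str) -> Dict[str, str]:
    """Split a filing into {heading: body} chunks."""
    parts = SECTION_PATTERN.split(filing_text)
    # With a capturing group, split() yields [preamble, heading1, body1, heading2, body2, ...]
    return {
        heading.strip(): body.strip()
        for heading, body in zip(parts[1::2], parts[2::2])
    }


def find_risks(
    filing_text: str, ask_model: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Send one small, focused prompt per section instead of one huge prompt."""
    findings = {}
    for heading, body in split_into_sections(filing_text).items():
        prompt = f"List hidden risks in this section ({heading}):\n{body}"
        risks = ask_model(prompt)
        if risks:
            findings[heading] = risks
    return findings


def keyword_model(prompt: str) -> List[str]:
    """Toy stand-in for a real model call: flags a few risk keywords."""
    keywords = ("going concern", "dilution", "default")
    return [kw for kw in keywords if kw in prompt.lower()]


if __name__ == "__main__":
    sample = (
        "Item 1. Business\nWe sell things.\n"
        "Item 1A. Risk Factors\nSubstantial doubt about going concern.\n"
    )
    print(find_risks(sample, keyword_model))
```

Each sub-prompt stays small enough for a model with limited context, and the findings remain traceable to the section they came from.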
Two paths now lay before me:
I was fascinated by the second option, which could also be combined with a paid subscription model if the results proved unconvincing. So I started developing my own Python framework that meets the following core criteria:
Although my focus was originally on finance, the potential applications extend well beyond it. The workflows can be deeply customized. Modules can not only acquire data but also trigger workflows/AIs. This allows the AI to act proactively rather than just reactively.
In the first version of ProjectAI, I used transformers and llama.cpp to run the local models directly. The session and memory functions ensured that essential information was not lost. To overcome the hardware limitations at home, the local models were distributed across my PCs in a LAN cluster. Tasks were automatically assigned to free resources based on the suitability of the models. ProjectAI has been running successfully in this configuration on my home server as the master instance for some time now.
A modern graphics card for AI acceleration would have been ideal, but prices had skyrocketed at the time, which is why I hesitated for a long time. In the meantime, however, I managed to get a Sapphire Radeon 9600XT at a good price; the speed jump compared to pure CPU calculation is, to put it mildly, enormous.
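How tasks might be matched to free machines by model suitability can be sketched roughly as follows. This is only an illustration under assumptions: the `Node` and `Task` classes, the `SUITABILITY` table, and the `assign` function are hypothetical names, and the actual cluster logic of ProjectAI may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """One machine in the LAN and the models it can serve (hypothetical)."""
    name: str
    models: Set[str]
    busy: bool = False


@dataclass
class Task:
    kind: str      # e.g. "summarize" or "reason"
    payload: str


# Assumption: a static table of suitable models per task kind, best first.
SUITABILITY = {
    "summarize": ["bart", "mistral"],
    "reason": ["qwen", "nemotron"],
}


def assign(task: Task, nodes: List[Node]) -> Optional[Tuple[Node, str]]:
    """Pick a free node hosting the best-suited available model for the task."""
    for model in SUITABILITY.get(task.kind, []):
        for node in nodes:
            if not node.busy and model in node.models:
                node.busy = True  # reserve the node until the task finishes
                return node, model
    return None  # nothing free: the caller may queue the task and retry


if __name__ == "__main__":
    cluster = [Node("pc1", {"mistral"}), Node("pc2", {"qwen", "bart"})]
    print(assign(Task("summarize", "some filing text"), cluster))
```

The preference order means a weaker model is only used when every node with the preferred model is busy.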
ProjectAI 2.0 is a purely local MCP tool server that processes workflows: essentially an extremely stripped-down ProjectAI 1.0 that only provides tools or workflows. The data structure is identical to the previous version, but it is only suitable for purely local use with KoboldCpp, Gemini-CLI, Codex, etc.
ProjectAI 3.0 takes yet another approach and is the version I am currently developing most actively: a standalone framework, a service-oriented gateway that clients can connect to locally or remotely via an MCP interface (fastmcp). It essentially combines the features of V1 and V2 and makes the application universal; it also includes a dashboard, skills, and a watchdog (daemon) that triggers scheduled actions.
Currently, I use a mix of Gemini-CLI (Gemini Flash 3) and KoboldCpp (Gemma 4) in connection with the framework. In total, seven sub-agents running on Gemma or Gemini are orchestrated here, each with a specially tailored prompt that defines its persona and tasks.
The agents and their core tasks:
When I initiate an analysis, the respective AI (orchestrator / conductor) reads its skill, i.e., a set of instructions on how to proceed and what the final report should look like. The orchestrator AI thus sequentially issues instructions to the agents; the agents run through their prompt, gather the data, and return it to the orchestrator. Afterwards, the report is generated and saved as markdown in my framework or database, so that I can also view it in the dashboard. The AI also enters any upcoming events / dates for analysis.
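The orchestration loop described above can be sketched in a few lines. The agent names (`fundamentals`, `news`), the representation of a skill as an ordered list of agent names, and the `run_analysis` function are assumptions made for illustration; in the real setup the agents are prompted AI models rather than plain functions.

```python
from typing import Callable, Dict, List

# Hypothetical agent type: takes a ticker and returns a markdown fragment.
Agent = Callable[[str], str]


def run_analysis(ticker: str, skill: List[str], agents: Dict[str, Agent]) -> str:
    """Call the agents in the order the skill prescribes and build a markdown report."""
    sections = []
    for agent_name in skill:
        result = agents[agent_name](ticker)  # the agent gathers its data and returns it
        sections.append(f"## {agent_name}\n\n{result}")
    return f"# Analysis: {ticker}\n\n" + "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    demo_agents = {
        "fundamentals": lambda t: f"Cash runway for {t}: placeholder text.",
        "news": lambda t: f"Recent headlines for {t}: placeholder text.",
    }
    report = run_analysis("XYZ", ["fundamentals", "news"], demo_agents)
    print(report)  # in practice the report would be stored and shown in the dashboard
```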
Each of these AIs uses my framework or the workflows (workflows in YAML format) as tools. A workflow, in turn, contains a sequence of actions, through which, for example, data is retrieved from the SEC or APIs. The retrieved data is standardized and ends up in a large data pool. Once all data has been collected, statistics and pre-calculations are performed; these include technical indicators for charts, but also fundamental calculations. Finally, the calculated data is formatted as readable markdown for the AI and handed over to it.
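A workflow run of this kind (actions, data pool, pre-calculation, markdown hand-over) might look roughly like the sketch below. The `WORKFLOW` dictionary stands in for what a parsed YAML file could contain; its keys, the module names, and the fixed price list are hypothetical, and a real data module would query the SEC or an API instead.

```python
from statistics import mean

# Assumption: a workflow parsed from YAML ends up as a structure like this.
WORKFLOW = {
    "name": "price_overview",
    "actions": [
        {"module": "prices", "params": {"ticker": "XYZ"}},
        {"module": "sma", "params": {"window": 3}},
    ],
}


def fetch_prices(pool, ticker):
    """Stand-in for a data module; returns fixed demo values instead of calling an API."""
    pool["ticker"] = ticker
    pool["closes"] = [10.0, 11.0, 12.0, 13.0]


def simple_moving_average(pool, window):
    """Pre-calculation step: simple moving average over the last `window` closes."""
    pool["sma"] = mean(pool["closes"][-window:])


MODULES = {"prices": fetch_prices, "sma": simple_moving_average}


def run_workflow(workflow):
    """Execute the actions in order; every module writes into one shared data pool."""
    pool = {}
    for action in workflow["actions"]:
        MODULES[action["module"]](pool, **action["params"])
    return pool


def to_markdown(pool):
    """Format the pool as a markdown table that can be handed to the AI."""
    rows = [f"| {key} | {value} |" for key, value in pool.items()]
    return "\n".join(["| field | value |", "| --- | --- |", *rows])


if __name__ == "__main__":
    print(to_markdown(run_workflow(WORKFLOW)))
```

Keeping the modules behind a name-to-function table means a new capability only needs a new entry, while the workflow definition stays plain data.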
A workflow can therefore not only retrieve data with little effort, as just described, but also, for example:
The only real limits are the modules themselves: each new module significantly extends what a workflow can do.
The modules are extensions / capabilities that can be addressed by workflows. Currently, there are a number of modules for:
Broadly speaking, the watchdog is like a time-controlled workflow, but cannot be used by an AI and is therefore not declared as a tool. However, with an appropriate workflow, the AI can use the watchdog to set a scheduled event.
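As a rough illustration of the scheduling idea, the sketch below keeps scheduled events in a priority queue and returns the workflows that are due. The `Watchdog` class and its method names are hypothetical; the actual daemon may be implemented differently.

```python
import heapq
import itertools
from datetime import datetime, timedelta
from typing import List


class Watchdog:
    """Keeps scheduled events in a min-heap ordered by due time."""

    def __init__(self) -> None:
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for identical due times

    def schedule(self, due: datetime, workflow_name: str) -> None:
        """Register a workflow to run at `due` (could be exposed via a workflow)."""
        heapq.heappush(self._queue, (due, next(self._counter), workflow_name))

    def due_workflows(self, now: datetime) -> List[str]:
        """Pop and return every workflow whose due time has been reached."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            _, _, name = heapq.heappop(self._queue)
            ready.append(name)
        return ready


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    dog = Watchdog()
    dog.schedule(start + timedelta(days=2), "fda_decision")
    dog.schedule(start + timedelta(hours=1), "earnings_check")
    print(dog.due_workflows(start + timedelta(hours=2)))
```

A daemon loop would call `due_workflows` periodically with the current time and start whatever it returns.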
The dataflow is responsible for standardizing the data. It converts incoming data from modules, or their key-value pairs, into canonical data.
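A minimal sketch of such a standardization step, assuming each module reports values under its own key names: the alias table, the canonical field names, and the `canonicalize` function are hypothetical examples, not the framework's real schema.

```python
# Assumption: different modules use different keys for the same value.
ALIASES = {
    "close": "price_close",
    "closePrice": "price_close",
    "adj_close": "price_close",
    "vol": "volume",
    "Volume": "volume",
    "symbol": "ticker",
    "Ticker": "ticker",
}

# Assumption: each canonical field has one expected type or normal form.
CASTS = {"price_close": float, "volume": int, "ticker": str.upper}


def canonicalize(record: dict) -> dict:
    """Map module-specific key-value pairs onto canonical names and types."""
    canonical = {}
    for key, value in record.items():
        name = ALIASES.get(key, key)  # unknown keys pass through unchanged
        cast = CASTS.get(name)
        canonical[name] = cast(value) if cast else value
    return canonical


if __name__ == "__main__":
    print(canonicalize({"symbol": "xyz", "closePrice": "12.5", "vol": "1000"}))
```

With every source mapped onto the same names, the later statistics and pre-calculations only need to know one schema.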
You can always find reliable data on major technology stocks on relevant websites. Of course, there are black sheep here too, but they are comparatively easy to research. A biotech / pharma company is much trickier. Apart from the medical aspect, which can be complicated enough if you don't have a doctorate, granular statements from management, management behavior, and financial management are extremely important. A bio / pharma company does not generate revenue during clinical phases; it depends on financial backers, mostly lenders and / or investors who believe in the product and its success. Toxic lenders, high cash burn, and not least the dilution of shares that directly affects the investor: all this must be taken into account to get a clear picture.
It starts with management compensation; see the TRAX & ANAB analysis, for example: one CEO runs two companies, where the subsidiary is used only for clinical trials and its success (or lack of it) is attributed to the parent company, while the spin-off is promoted heavily across all media. On top of that, the CEO receives double compensation (from both companies). These are points where I keep my distance. In my view, the basic idea of fighting a disease is completely lost here.