“I'm not saying women are better. I've never said that. I'm saying we deserve some respect.”
— Billie Jean King, Battle of the Sexes
Euro 2024
The important stuff first: Germany achieved the result they needed to finish top of their group at Euro 2024. The Azzurri (Italy) scraped through in the seventh of eight minutes of stoppage time, so our value-trap theory on Italy almost proved correct. We’re betting with our heads here, not our hearts. Unlike England, Italy fans have seen their team win, and finish runners-up in, a number of World Cups and Euros during their lifetimes, so expectations are realistic. So far, it has been a tournament dominated by master tacticians rather than individual star players.
Here are the latest betting odds:

By the time this update is published, the odds might have changed. The GPU (Gupta Processing Unit) has spotted some good value in backing Belgium; he also runs a successful systematic futures trading fund when not sports betting. On the ‘easier' side of the bracket, Italy might be a decent bet as well. The order should be Germany as favorites with Spain slightly behind, followed by France and England in 3rd and 4th place, respectively. An outside team, such as Belgium or Austria, might be a good candidate to rise into 3rd place.
Eurozone
The best chance our readers have of making money from our ideas near-term is mentioned above. A Eurozone sovereign debt crisis is more like a 5/1 fair-odds bet with a 9/1 payoff, something to reassess if or when the two align. Time and again, the “there are no easy trades or nothing to do” mindset in markets has proved otherwise. We hear this a lot from the credit and macro hedge fund community. The better way to frame it is that there’s always plenty to do; we’re just unable to see it or don’t have the risk appetite for it.
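To put numbers on that framing: fractional odds convert to an implied probability, and a gap between fair odds and offered odds gives the expected value per unit staked. A quick sketch (function names are ours, purely illustrative):

```python
def implied_prob(fractional_odds):
    """Implied win probability from fractional odds, e.g. 5/1 -> 1/6."""
    return 1.0 / (fractional_odds + 1.0)

def expected_value(fair_odds, offered_odds, stake=1.0):
    """Expected profit per bet when our fair odds differ from the payoff on offer."""
    p = implied_prob(fair_odds)  # our estimate of the true probability
    return p * offered_odds * stake - (1 - p) * stake
```

At fair odds of 5/1 (implied probability of one in six) and an offered payoff of 9/1, the expected value works out to roughly +0.67 per unit staked, which is exactly why the two aligning is worth reassessing.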
UK Election Insider Betting
In the corridors of Westminster, the Tory Party's ‘chief data officer' has been suspended under suspicion of insider trading on the timing of the UK general election, along with a few others. A Desmond (2:2) from Durham in PPE and a Google Analytics online certificate will probably clinch the role of Tory Chief Data Scientist if anyone is interested in a temp gig for a couple of weeks. The bookies probably snitched on them given the Financial Times has a copy of the trade (bet) blotter. That’s like your drug dealer telling you they’ve had enough, and they’ve also had enough of you. Their leader managed to get duped into a £1000 bet with Piers Morgan, on television, on the timing of the first deportation of asylum seekers to Rwanda.
Mrs Sunak is probably hoping he quickly lands a job after politics rather than sitting at home investing their own money. Tell your old man this time next year we'll be millionaires…
Part 3: AGI, Superhuman Intelligence & Natural Stupidity
In Part 2, we covered the definitions of AGI and SI; Part 1 opened by saying that the success of pre-transformer LLMs in domain-specific applications took us (like many in NLP) by surprise. The AI landscape can be quite tribal. That is to say, it's better to let the real experts at the forefront of AI, such as Yann LeCun at Meta, pursue their endeavors while we enjoy the benefits of the fast-growing set of tools available to application developers.
Picking the right lens for the job
Nevertheless, to use a photography analogy: while we should not doubt the ability of talented engineers to develop more powerful multi-purpose lenses, we ought to be wary of claims that defy the laws of optics. Like lenses spanning many focal lengths, LLMs can be very expensive and clunky to carry around. A portrait photographer needs a high-quality lens with a narrow range of optics to earn a living.
Regarding language models, there is still a lot more technological innovation than viable products coming to market. As previously discussed, most of the interesting commercial applications of generative AI are taking place in computer graphics and the video gaming industry.
Data analysis before AI
There's a lot that can be achieved with old-fashioned data and statistical analysis without resorting to complex deep neural networks. It's always better to start from the raw data to scope out the applicability of machine learning, perhaps saving a lot of time and money on electricity or server bills. Is the training data stationary or non-stationary? For example, vocabulary and the rules of grammar change much more slowly than the volume of text produced for training language models grows. What aspects and features do we want to capture, and what outputs do we want to optimize? Is the available data a good representation, or must we harvest our own? There is also such a thing as too much data, not just for the purposes of training machine learning models, but also concerning users' personal data. If data is the new oil powering AI technologies, then it must also be refined to be useful.
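The stationarity question is often answerable with a few lines of plain statistics before any model is trained. Below is a deliberately crude heuristic, comparing the two halves of a series against its overall spread; it is illustrative only, not a formal test (for real work one would reach for something like an augmented Dickey-Fuller test):

```python
def looks_stationary(series, tol=0.5):
    """Crude check: do the means of the first and second halves of the
    series agree to within tol * (overall standard deviation)?
    A strong trend will fail this; a stable oscillation will pass."""
    n = len(series)
    first, second = series[:n // 2], series[n // 2:]
    mean = lambda xs: sum(xs) / len(xs)
    overall = mean(series)
    spread = (sum((x - overall) ** 2 for x in series) / n) ** 0.5
    return abs(mean(first) - mean(second)) <= tol * spread
```

A trending series (say, cumulative text volume) fails immediately, while a stable one (say, grammar rules encoded as features) passes, which is exactly the kind of cheap scoping exercise worth doing before paying for GPUs.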
Finally, it’s also helpful to have an idea or benchmark for what a perfect set of AI model outputs looks like. For some, a subject matter expert (SME) is the ultimate desired goal, while others will settle for no less than superhuman level intelligence. Both expectations may be unrealistic. To the untrained eye, a simulation of SME knowledge can be readily mistaken for the real thing.
For the rest of this update, we'll go through a series of examples in sports and finance to illustrate our points.
Designing a superhuman Tennis player
Over his long and highly successful tennis career, Roger Federer won 103 ATP singles titles, including 20 Grand Slam titles, 28 ATP Masters 1000 titles, 6 ATP Finals titles, an Olympic gold medal in doubles, and a silver medal in singles. He played a total of 1,526 singles matches, winning approximately 82% of them and 54% of the points played. Yes, 54% sounds a bit low.
We chose Federer as our reference not because many regard him as the greatest of all time (GOAT) in tennis, but because he mentioned those statistics in his Dartmouth College graduation ceremony speech, which makes it useful for cross-referencing our data sources. According to our stat sources, Federer won 2% more of his singles matches (82%) over his competitive career than he stated in his speech (80%). His speech focused on the percentage of points won.
If Federer had won 55% of the points instead of 54%, and the exact same number of titles, would he be considered a better or worse tennis player? Similarly, if he had won 53% of the points with the same match results? The answer is not immediately obvious. We just know that if our super clone of Federer wins close to 100% of the points, he would certainly win every match and tournament. Perhaps not as fun for spectators.
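The link between points won and matches won is worth making concrete. Under the simplifying assumption that every point is won independently with the same probability (ignoring serve, surface, and opponent), tennis's nested scoring amplifies a small edge in points into a large edge in matches. A minimal sketch:

```python
def game_win(p):
    """Probability of winning a game if each point is won i.i.d. with prob p."""
    q = 1 - p
    deuce = p * p / (1 - 2 * p * q)  # win from deuce: p^2 / (p^2 + q^2)
    def f(a, b):  # a, b = points won so far
        if a == 3 and b == 3:
            return deuce
        if a == 4:
            return 1.0
        if b == 4:
            return 0.0
        return p * f(a + 1, b) + q * f(a, b + 1)
    return f(0, 0)

def tiebreak_win(p):
    """First to 7 by 2; i.i.d. points, so serve alternation drops out."""
    q = 1 - p
    deuce = p * p / (1 - 2 * p * q)
    def f(a, b):
        if a == 6 and b == 6:
            return deuce
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        return p * f(a + 1, b) + q * f(a, b + 1)
    return f(0, 0)

def set_win(p):
    """First to 6 games by 2, tiebreak at 6-6."""
    g, tb = game_win(p), tiebreak_win(p)
    def f(a, b):  # a, b = games won so far
        if a == 6 and b == 6:
            return tb
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        return g * f(a + 1, b) + (1 - g) * f(a, b + 1)
    return f(0, 0)

def match_win(p, best_of=3):
    """Best-of-3 or best-of-5 sets, via the binomial closed forms."""
    s = set_win(p)
    if best_of == 3:
        return s * s * (3 - 2 * s)
    return s ** 3 * (10 - 15 * s + 6 * s * s)
```

In this toy model, a 54% point-win rate already produces a match-win rate in the region of 80%, broadly consistent with Federer's actual record, which is the point: 54% of points is a huge edge, not a low number.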
Let’s look at other famous players’ stats over the course of their professional careers competing in singles tournaments for comparison…


Looking at Grand Slam titles won, the metric most people have in mind when ranking top tennis players (note: the percentage of points won covers all competitive tournaments, not just Grand Slams):

The linear-regression least-squares fits are included, for men and women separately (the lines are getting blurred these days), to show the potential for large errors and uncertainties in models. At best, we can conclude that the most successful tennis players of all time, by the number of major trophy victories, have a higher-than-average percentage of points won. Also, Margaret Court, the Iron Lady of Oz, is a seriously underrated champion in tennis history.
Side note: “Battle of the Sexes” is a fitting movie on this topic. It tells the story of Bobby Riggs, a retired male tennis pro with a gambling addiction, who challenges then women's world champion Billie Jean King to a duel. According to Riggs, a gambling addiction is only a problem if you suck at it.
Who is the tennis GOAT?
It should be clear to the reader that statistics alone are not useful for determining the GOAT tennis player. A player's career success also depends on the competition that lies in their way, as do their statistics. Federer would say that had he not faced stiff competition from Rafael Nadal and Novak Djokovic over his career, he would perhaps have won more titles but not developed his game as well.
In short, if we cannot decide on which metric defines the GOAT tennis player, we stand little chance of designing superhuman AI tennis players using player statistics as inputs. By targeting the percentage of point wins as our key metric, we may end up with a player that struggles to make it into the professional ranks of tennis. The chart below illustrates our point using stats from another set of top-ranked tennis players:

The format of tennis tournaments, rather than physical limits, is more likely the constraint that prevents players from winning much more than 55% of points.
As a guide, researchers at major AI labs, such as DeepMind, use machine vision technology for sports projects—physically tracking and analyzing players' movement on the screen. The resulting AI cloned tennis player must be within the range of physical abilities of a human to be relevant. As far as we can tell, these projects make better news headlines than teaching sports professionals new modes of playing.
RAG, convenience and pitfalls
The charts above were generated exclusively with ChatGPT-4o. As discussed in Part 2, AI efficiencies tend to be front-loaded. One must experiment with various data retrieval, plotting, and formatting prompts to get close to the desired result. Even then, the final tweaks are tedious, and we have no idea whether the original data sources are good or reliable. The data might also be subject to selection bias, skewed toward the most prominent players.
Retrieval-Augmented Generation (RAG) and semantic search have been around longer than foundational language models and are most commonly used by Google to answer simple questions without users clicking through to the linked pages. The pitfalls of RAG technology are separate from, and sometimes indistinguishable from, language model hallucinations. Convenience has both benefits and costs, especially on nuanced topics.
In our example, the initial stats ChatGPT retrieved on Djokovic were an outlier (around 56% of points won). Andy Murray's figures are also prone to error, occasionally showing 55% career point wins based on a series of online articles about his form during his 2016 Grand Slam run. In summary, it would have been far quicker in the end to just download the raw data off the ATP website and analyze it directly. The Matplotlib Python script generated by ChatGPT to produce the charts could still shave off a little time.
As a one-off exercise on tennis, had we just used Excel from the very beginning, it would have been much quicker. It's not often that we write about tennis. As someone put it, the technology is using me rather than the other way around. I knew it beforehand, and still managed to fall into the trap. You start out thinking you’ll breeze through gracefully like Roger Federer, instead, you’re smashing rackets like John McEnroe.
Linear Regression and Recommendation Engines
Linear regression models are the foundation of early-generation content recommendation algorithms, like the one first rolled out by Netflix. User-generated data enables platforms to make better recommendations using solely viewership records and collaborative filters, unlike content-based filters which rely on metadata (e.g. movie genre, director, cast, run-time, etc.).
The results are best when the two methods (collaborative and content filters) are used in conjunction. Linear regression models are also conducive to forming echo chambers, which works well for ad-revenue businesses but not so well for customers wishing to optimize for time and knowledge. These types of algorithms are both prevalent and unavoidable in our day-to-day online activities. A similar principle can be applied to generating leads for potential customers; a few hedge funds occupy unique clusters of their own in the painful-to-deal-with versus value-contribution space.
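To make the collaborative-versus-content distinction concrete, here is a toy user-based collaborative filter. The users, titles, and ratings are invented for illustration, and real systems (including Netflix's) are far more sophisticated; the key property is that no metadata (genre, cast, run-time) is used, only who rated what:

```python
from math import sqrt

# Toy ratings matrix: user -> {item: rating}. All data is made up.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 5, "inception": 5, "interstellar": 4},
    "carol": {"matrix": 1, "titanic": 5, "notebook": 4},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(user):
    """Score each unseen item by similarity-weighted ratings of other users
    and return the top pick."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        sim = cosine_sim(user, other)
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return max(scores, key=scores.get)
```

Here alice is recommended interstellar, because her ratings align closely with bob's and poorly with carol's; the echo-chamber tendency mentioned above falls straight out of this mechanism, since recommendations only ever come from users who already agree with you.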
What does it take to build soccer league champions?
To win a major European domestic soccer league, a team plays 36-38 matches and must win 70-80% of them while losing no more than three. This roughly equates to winning 80% of the total points available. Last season, Bayer Leverkusen won the German Bundesliga with an impressive undefeated record; that is the exception rather than the norm.
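The arithmetic behind the 80%-of-points figure is easy to verify. With illustrative numbers for a 38-game season (28 wins, 7 draws, 3 losses are our example, not any club's actual record):

```python
# Standard scoring: 3 points for a win, 1 for a draw, 0 for a loss.
wins, draws, losses = 28, 7, 3
matches = wins + draws + losses        # 38
points = 3 * wins + 1 * draws          # 91
available = 3 * matches                # 114
win_rate = wins / matches              # ~0.74, inside the 70-80% band
points_share = points / available      # ~0.80, i.e. ~80% of available points
```

So a 74% match-win rate with a handful of draws already lands on roughly 80% of the points available, matching the rule of thumb above.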
The most successful traders can be right less than 50% of the time and still make money; their edge is more akin to market discipline and risk management skills than to polished crystal balls. Statistical arbitrage desks would aim for a success rate of 52-54% on high-frequency trades while minimizing execution costs. We don't know a lot about the current approach and key metrics for these strategies.
AI Bond Pricing Model
Here's an example we've mentioned in the past, based on a real situation, that's also relevant here:
A computer science student from a top university lands a summer internship at a major Wall Street investment bank. She proposes to her superiors an AI project with a novel proprietary approach for calculating bond prices and yields using deep learning. The idea is to calculate the theoretical fair price for any bond, and the corresponding yield, using only historical market data. This would then allow her team to identify assets that are trading too cheap or rich, and the syndicate desk to price deals for first-time bond issuers.
Enthused by the prospect of entering the AI space, the team authorizes the intern to purchase lots of expensive data on every bond on Bloomberg, in addition to cloud compute capacity. The basic idea is to use hundreds of characteristics for every fixed-rate bullet bond to train the model, including daily close prices, yields, maturity, credit rating, currency, coupon rate, coupon frequency, country, sector, seniority, stock price (if listed), total short- and long-term liabilities, market capitalization, equity option volatility term structure, moving averages and historical volatilities for various timeframes, etc. In other words, throwing everything at the wall as inputs, hoping the AI model will be able to identify unfamiliar patterns in the data.
The intern's AI bond-yield model output will probably look as follows:

A market specialist should immediately realize something's not right if the model can't replicate a relatively trivial bond price-to-yield relationship. What's more annoying is that as the AI model receives more training data, the outputs keep changing; it lacks analytical traceability and backward compatibility. Before Bloomberg Terminals and Excel spreadsheets, Texas Instruments used to make calculators for bond traders that could convert bond prices to yields, and vice versa. The solution is calculated by a simple numerical iteration method. So what went wrong with the AI model?
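Before answering, note that the price-to-yield conversion those calculators performed is only a few lines of numerical iteration. A minimal sketch under simplified conventions (whole coupon periods, flat discounting, no day counts or settlement; face value of 100):

```python
def bond_price(ytm, coupon_rate, years, freq=2, face=100.0):
    """Price of a fixed-rate bullet bond by discounting its cash flows."""
    n = int(years * freq)                 # number of coupon periods
    c = face * coupon_rate / freq         # coupon per period
    y = ytm / freq                        # per-period yield
    return sum(c / (1 + y) ** k for k in range(1, n + 1)) + face / (1 + y) ** n

def yield_from_price(price, coupon_rate, years, freq=2, lo=0.0, hi=1.0):
    """Invert price -> yield by bisection, the kind of simple numerical
    iteration a handheld bond calculator runs. Price falls as yield rises,
    so if the model price is too high, the trial yield is too low."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if bond_price(mid, coupon_rate, years, freq) > price:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A 5% coupon bond priced at par yields exactly 5%; if a deep-learning model trained on hundreds of features cannot reproduce this mapping, something has gone wrong.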
By including too much data as inputs, the deep neural network eventually converges to a statistical model that minimizes the average output errors over the entire training dataset (and it does not always converge). In this example, the model should eventually learn that the yield is primarily a function of the price, coupon, and maturity of the bond. Bond math tells us that these three factors are all that's required, yet the AI model will place a (statistical) weight on every input parameter, albeit a small one. Readers familiar with the Merton model may recognize the contribution of stock price, leverage, and volatility in driving bond yields, especially for high-yield borrowers. The credit ratings will also impact the results.
In essence, by including too many parameters in the hope of achieving a comprehensive bond pricing model, we end up jumbling together different pricing frameworks in a way that does not meet our basic requirements. It might be useful for other (more advanced) applications; however, we embarked on the journey without clear goals and expectations. The likely end result is a very expensive substitute for a multiple-regression model that would not have required nearly as much computational power or data. Darrell Duffie and Rohan Douglas wrote a paper in 2018 on a similar topic, though unrelated to machine learning.
Most finance professionals will already be familiar with bond math and the split between the risk factors, such as interest rates, credit spreads, and funding costs, that drive yields. The Duffie-Singleton model (published in the late 1990s) uses a similar approach, eventually laying the foundation for what is today the market standard for pricing credit risk and credit default swaps (CDS). This Duffie guy is always several steps ahead of us, and we borrow a lot from his latest ideas. The Duffie-Singleton framework likely prevailed over a couple of rivals during the credit derivatives boom because of its simplicity and the way the market grew to resolve the chicken-and-egg problem. An example of the market favoring models with practical convenience.
We shouldn’t feel sorry for the other two. One was awarded a Nobel Prize in Economics for options pricing. The other, Dimitri Kaavathas, went on to enjoy a very successful post-academia career at Goldman Sachs running credit sales, and is now pursuing other interesting ventures outside banking.
AI project outcome
After the initial excitement and spending a fortune, everyone is disappointed, most of all the intern, whose project failed and who was not offered a permanent role. She also didn’t get much time to learn the basics. As an AI specialist, she did nothing wrong and correctly identified a real problem and a potential use case for the technology. The failures came from above. And sadly, those responsible are unlikely to be equipped with the tools to realize it, even if they are the sort to admit to mistakes. The intern will be fine; she will graduate from Stanford with a degree in computer science and receive several job offers from Big Tech, which she was already contemplating.
Fortunately, it’s possible to avoid some costly mistakes by tapping into the wisdom of friends with experience in the field. At the same time, new breakthroughs come about when fresh eyes and minds are allowed to do things their own way. For someone leading a project, it's a fine line between offering guidance and assistance and becoming the obstacle to success. So we can’t say for sure that the AI bond pricing model we outlined won’t work, since we haven’t tried it. Please feel free to have a crack at it and let us know how you get along either way.
The 4th and final part will follow…
Updates
The Euro 2024 odds have already changed after some final group stage matches:

The odds on Austria have repriced significantly after they qualified for the knockout stage. GPU's Belgium call is doing well. France not winning their group put them in the tougher side of the bracket with Spain, Portugal, and Germany, which doesn't help our backing of Germany to win the tournament. We still favor Germany ahead of Spain, though it's a close call and the latter has a couple of very talented young players. How do England and Italy compare in terms of relative value? Both are on the ‘easier' side of the bracket to the final.