some kind of good, re <<investing>>
World divided into two lots, as if by sorting hat: those who use AI and those who do not
marketwatch.com
A professor testing ChatGPT’s, DeepSeek’s and Grok’s stock-picking skills suggests stockbrokers should worry
Alejandro Lopez-Lira has been impressed with how current AI models can trade markets.
By Laila Maidan
Is artificial intelligence coming for the jobs of Wall Street traders? An assistant professor of finance at the University of Florida, Alejandro Lopez-Lira, has spent the past few years trying to answer that question.
In an interview, Lopez-Lira acknowledged that AI is prone to making mistakes, but said he has not seen the three AI models he’s been using do anything “stupid.” His work comes as more market participants weigh the implications of AI for investing and trading.
“I don’t know what tasks out there analysts are doing with information that can’t be done with large language models,” Lopez-Lira said. “The only two exceptions are things that involve interacting in the physical world or having in-person conversations. But, other than that, I would imagine all of the tasks or most of the tasks can already be automated.”
Shortly after OpenAI Inc. released ChatGPT in 2022, Lopez-Lira began testing the chatbot’s skills. He wanted to know if ChatGPT, and AI in general, would show an ability to pick stocks. While there are numerous ways to approach that question, Lopez-Lira began with a simple exercise: Could the AI application accurately interpret whether a headline on a news story is good or bad for a stock? What he found surprised him.
In a backtest simulating historical stock-market returns, the study used more than 134,000 headlines from press releases and news articles about more than 4,000 companies, pulled from third-party data providers. The headlines were fed into ChatGPT using the Python programming language. ChatGPT then decided whether each headline was positive, negative or unknown for the company. The results were saved in a data file and loaded into statistical software, where a headline judged positive triggered a stock purchase. A negative headline triggered a short sale, effectively a bet that the stock would fall in price. If ChatGPT was uncertain, no action was taken.
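The headline-to-trade rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code. `classify_headline` is a hypothetical keyword stub that stands in for the ChatGPT call, and the tickers are made up. The sketch also folds the "save to a file, then load into statistical software" step into a single script.

```python
from enum import Enum


class Label(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"


def classify_headline(headline: str) -> Label:
    """Hypothetical stand-in for the LLM call: a crude keyword rule, for illustration only."""
    text = headline.lower()
    if any(word in text for word in ("beats", "raises", "record")):
        return Label.POSITIVE
    if any(word in text for word in ("misses", "cuts", "recall")):
        return Label.NEGATIVE
    return Label.UNKNOWN


def signal_for(label: Label) -> int:
    """Map a label to an action: +1 = buy, -1 = sell short, 0 = no action."""
    return {Label.POSITIVE: 1, Label.NEGATIVE: -1, Label.UNKNOWN: 0}[label]


def signals(headlines: dict[str, str]) -> dict[str, int]:
    """Turn {ticker: headline} into {ticker: signal}."""
    return {ticker: signal_for(classify_headline(h)) for ticker, h in headlines.items()}


if __name__ == "__main__":
    today = {
        "AAA": "AAA beats earnings estimates",
        "BBB": "BBB cuts full-year guidance",
        "CCC": "CCC names new board member",
    }
    print(signals(today))  # {'AAA': 1, 'BBB': -1, 'CCC': 0}
```

In a real pipeline, only `classify_headline` would change. The three-way mapping to buy, short or skip is what the article describes.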
Because this was an academic simulation, no actual stocks were traded. But the software did compare the simulated performance against historical outcomes. The stock picks were made daily, with a median of 70 stocks bought and a median of 20 shorted.
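The article does not say how the simulated positions were weighted or how long they were held. The sketch below assumes equal weights within the long and short legs and a one-day holding period, and computes one day's long/short return from the signals. The names `long_short_return` and `next_day_returns` are hypothetical.

```python
def long_short_return(signals: dict[str, int], next_day_returns: dict[str, float]) -> float:
    """One day's return for an equal-weighted long/short portfolio (assumed weighting).

    signals: {ticker: +1 buy, -1 sell short, 0 no action}
    next_day_returns: {ticker: return over the holding day, 0.01 = +1%}
    """
    longs = [next_day_returns[t] for t, s in signals.items() if s == 1]
    shorts = [next_day_returns[t] for t, s in signals.items() if s == -1]
    long_leg = sum(longs) / len(longs) if longs else 0.0
    short_leg = sum(shorts) / len(shorts) if shorts else 0.0
    # A short gains when the stock falls, so the short leg's average return is subtracted.
    return long_leg - short_leg


# Two buys and one short: average of (2%, 0%) on the long side, minus (-1%) on the short side.
print(long_short_return({"A": 1, "B": 1, "C": -1}, {"A": 0.02, "B": 0.0, "C": -0.01}))
```

On a typical day in the study, the long leg would average roughly 70 names and the short leg roughly 20, going by the medians reported above.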
For Lopez-Lira, the tricky thing about a backtesting approach was that the AI might already know what had eventually happened. OpenAI had trained ChatGPT in 2022 on data up until September 2021, so Lopez-Lira tested the chatbot using headlines from October 2021 onward. This way, ChatGPT wouldn’t know what was going to happen and would have to rely on reasoning to reach its conclusions.
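Guarding against this look-ahead problem amounts to a date filter. The article gives the training cutoff only as "September 2021," so the exact date below (Sept. 30, 2021) is an assumption. `out_of_sample` is a hypothetical helper.

```python
from datetime import date

# Assumption: treat the model's training data as ending Sept. 30, 2021.
TRAINING_CUTOFF = date(2021, 9, 30)


def out_of_sample(headlines: list[tuple[date, str]],
                  cutoff: date = TRAINING_CUTOFF) -> list[tuple[date, str]]:
    """Keep only headlines published after the cutoff, which the model could not have seen."""
    return [(published, text) for published, text in headlines if published > cutoff]


sample = [(date(2021, 8, 15), "older headline"), (date(2021, 10, 4), "newer headline")]
print(out_of_sample(sample))  # keeps only the Oct. 4, 2021 headline
```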
His findings were released on the SSRN preprint platform in April 2023 in a paper titled “Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models.” The study, currently being peer reviewed, found that ChatGPT had “significant predictive power for economic outcomes in asset markets.” The GPT-4 version had an average daily return of 0.38% with a compounded cumulative return of over 650% from October 2021 to December 2023.
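As a rough sanity check on how a small daily figure becomes a large cumulative one, the sketch compounds a constant 0.38% per day. The day count is our own estimate of about 565 trading days from October 2021 through December 2023; the article does not give one. Under those assumptions the result comes out at roughly 750%, which is consistent with "over 650%." Compounding a constant average return overstates the outcome when daily returns vary (volatility drag), so the paper's own figure need not match this number.

```python
def compounded(daily_return: float, days: int) -> float:
    """Cumulative return from compounding a constant daily return (7.5 means +750%)."""
    return (1 + daily_return) ** days - 1


# ~565 trading days from October 2021 through December 2023 is an estimate, not a study figure.
print(f"{compounded(0.0038, 565):.0%}")
```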
Obviously, this academic study had limitations. In the real world, frictions would erode returns, including brokerage transaction costs and fees; the availability of shares; taxes; and price impact, which occurs when relatively large trades move a stock’s price. Additionally, about 76% of the gains came from short positions, a strategy that can be more fraught because of borrowing fees and the need to find shares to borrow and sell short.
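To see how trading costs can eat into a daily edge, here is a minimal sketch. The turnover and cost figures are made-up inputs, not numbers from the study. The sketch ignores taxes, borrowing fees and price impact, and `net_daily_return` is a hypothetical helper.

```python
def net_daily_return(gross: float, turnover: float, cost_bps: float) -> float:
    """Daily return after trading costs.

    gross: strategy return before costs (0.0038 = 0.38%)
    turnover: value traded as a multiple of portfolio value (2.0 = everything sold and replaced)
    cost_bps: assumed all-in cost per unit traded, in basis points (hypothetical)
    """
    return gross - turnover * cost_bps / 10_000


# Hypothetical: full daily turnover at 10 basis points per unit traded.
print(f"{net_daily_return(0.0038, 2.0, 10):.2%}")
```

With these illustrative inputs, costs consume 0.20 of the 0.38 percentage points, more than half of the gross daily figure.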
“So, our results on paper are much more optimistic than what the performance in reality would be with a reasonable investment size,” Lopez-Lira said. But the tilt toward positive returns was enough for him to conclude that ChatGPT had understood economic markets and shown an ability to forecast stock outcomes.
Putting AI to the test again
About a month after the preprint was published, Lopez-Lira got the chance to take his experiment outside of the academy after being contacted by Autopilot, an investment app that mimics the trades of notable public figures. He was asked to help create a portfolio that would be based on investment picks made by ChatGPT. It was an opportunity for him to see how his academic experiment would perform in the real world.
By September 2023, he’d begun providing the Autopilot app with the investment picks made by ChatGPT on a monthly basis. The Autopilot team would then upload the selections, and Autopilot users could link their brokerage accounts to the stock picks. This time, since real money was involved, Lopez-Lira had to do more than just feed ChatGPT a few news headlines. He had to provide it with a wide range of information to be sure it was making decisions based on the macroeconomic environment and company financials.
Available AI models aren’t yet at a point where you can simply ask them to pick investments, Lopez-Lira said. The process still requires a human in the loop to feed them the information they need to consider before making a decision. This is mostly because AI models aren’t trained on real-time data, so their knowledge is often outdated, even for basics such as the price of a stock’s last trade. And even when AI models can conduct live web searches, they don’t always know what information to search for in order to make the most informed decisions, he added.
“Large language models are tricky to handle, they can make stuff up and sometimes they don’t have the right information,” Lopez-Lira said. “So you have to know how to prompt the AI.”
The process
The portfolio managed by ChatGPT would consist of 15 positions, 10 of which had to be stocks from the S&P 500 and five of which had to be exchange-traded funds that have exposure to a sector or industry.
To get there, Lopez-Lira used Python to pull information from third-party data providers and news websites about the macroeconomic environment, geopolitical risks, company financials and the latest prices for stocks within the S&P 500. He then asked ChatGPT to consider the information and assign companies a score on a scale of 1 to 100, with a higher score representing a better investment. Once the AI had decided on its scoring, it was then asked to create a portfolio of stocks and exchange-traded funds based on that information.
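In the article, the model itself builds the portfolio from its scores. The mechanical top-N rule below is only a simplified stand-in for that step. It assumes 1-to-100 scores are already in hand and fills the original 10-stock, five-ETF structure. The ticker names in the test are synthetic, and the alphabetical tie-breaking is our own choice.

```python
def top_n(scores: dict[str, int], n: int) -> list[str]:
    """Return the n highest-scoring tickers; scores must be on the 1-100 scale."""
    for ticker, score in scores.items():
        if not 1 <= score <= 100:
            raise ValueError(f"{ticker}: score {score} is outside 1-100")
    # Highest score first; ties broken alphabetically by ticker (an arbitrary choice).
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


def build_portfolio(stock_scores: dict[str, int], etf_scores: dict[str, int]) -> list[str]:
    """15 positions: the top 10 S&P 500 stocks plus the top 5 sector/industry ETFs."""
    stocks = top_n(stock_scores, 10)
    etfs = top_n(etf_scores, 5)
    if len(stocks) < 10 or len(etfs) < 5:
        raise ValueError("not enough candidates to fill 10 stocks and 5 ETFs")
    return stocks + etfs
```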
More recently, in February, Lopez-Lira added investing accounts on Autopilot that use Grok and DeepSeek.
Since then, the Florida professor has been gradually removing restrictions placed on the three AI models. For example, in March, the models were allowed to decide the weighting of each holding. In April, they were freed to hold up to 15 positions in any combination of their choosing, rather than the initial split of 10 stocks and five ETFs. They could also pick ETFs with exposure to additional asset classes, such as bonds and commodities, excluding ones that use leverage, derivatives or short positions.
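The article does not say how these relaxed rules are enforced. A simple rule check might look like the sketch below. The `Holding` fields and flags are hypothetical, as is the assumption that each ETF's leverage, derivative and short exposure is known ahead of time.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    ticker: str
    is_etf: bool = False
    leveraged: bool = False          # hypothetical flags describing the fund
    uses_derivatives: bool = False
    short_exposure: bool = False


def violations(holdings: list[Holding], max_positions: int = 15) -> list[str]:
    """List breaches of the April rules: at most 15 positions, no leveraged/derivative/short ETFs."""
    problems = []
    if len(holdings) > max_positions:
        problems.append(f"{len(holdings)} positions exceeds the limit of {max_positions}")
    for h in holdings:
        if h.is_etf and (h.leveraged or h.uses_derivatives or h.short_exposure):
            problems.append(f"{h.ticker}: leveraged, derivative-based or short ETFs are excluded")
    return problems
```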
Currently, the AI models running the investment accounts are OpenAI’s o3, xAI’s Grok 3 and DeepSeek R1. The models are periodically updated to the latest versions available. Lopez-Lira also rotates which AI model he uses to summarize macroeconomic risks and score companies on the 1-to-100 scale, but all three models receive the same input, regardless of which model did the screening.
Below are the 15 positions each AI model picked on April 1, the day before President Donald Trump’s “liberation day” tariff announcement. The positions were held until May 5, when the selections were rebalanced. The picks were based on data Lopez-Lira provided through March 31, and he used Grok 3 to organize the data. All three AI models were fed the same information, but OpenAI’s o1 pro and Grok 3 were also able to include information from web searches, according to Lopez-Lira. The tables below show each model’s selections, including the weighting, the reason for the pick (thesis), the position’s advantage (edge) and the risk associated with it.
Securities picked by OpenAI’s o1 pro
| # | Asset | Ticker symbol | Weight | Thesis | Edge | Risk |
|---|---|---|---|---|---|---|
| 1 | SPDR S&P 500 ETF Trust | SPY | 15% | Core broad-market exposure to capture potential equity rebound if inflation/cooler data lead to Fed pivot or if tariffs don’t derail sentiment. | Broad diversification; quick participation in any relief rally. | If inflation re-accelerates or tariffs escalate, S&P could tumble broadly. |
| 2 | iShares 20+ Year Treasury Bond ETF | TLT | 8% | Potential capital-gains hedge if rates drop on recession fears or Fed rate cut signals. | Good inverse correlation to equities if risk-off. May rally on rate-cut bets. | If inflation remains sticky, yields could rise, hurting long-term Treasuries. |
| 3 | SPDR Bloomberg 1-3 Month T-Bill ETF | BIL | 5% | Capital preservation and liquidity amid uncertainty of tariffs & data in next month. | Very low volatility, stable yield. | Opportunity cost if equities rally strongly. |
| 4 | Amazon.com Inc. | AMZN | 6% | High-scoring megacap e-commerce and cloud. Could rebound if consumer spending remains decent and AWS demand persists. | Scale, brand and diversified revenue streams. Potential shift to safer large-cap tech if conditions worsen. | Supply-chain disruptions from tariffs, consumer slowdown. |
| 5 | Monolithic Power Systems Inc. | MPWR | 5% | High-scoring semiconductor with AI & China angle. Potential bounce from oversold chip sector. | Unique margin profile, rumored rare buy opportunity. | Renewed U.S.-China tension, cyclical chip downturn. |
| 6 | American Tower Corp. | AMT | 5% | REIT with strong telecom/data-center growth. Pullback could offer upside if yields stabilize. | Global diversified tower portfolio, stable tenant contracts, 3%+ dividend yield. | REITs pressured by rising rates or slowdown. |
| 7 | BlackRock Inc. | BLK | 5% | Large asset manager, high score. Gains from inflows if equities stabilize/rally post-Fed. | Broad product lineup, global brand. Typically sees net inflows when risk appetite improves. | Market downturn would pressure AUM & fees. |
| 8 | Chipotle Mexican Grill Inc. | CMG | 4% | High-scoring consumer brand with resilient operating metrics. Absorbing tariffs to maintain loyalty. | Loyal clientele, strong brand, margin-expansion potential. | Cost pressures, discretionary slowdown risk. |
| 9 | Prudential Financial Inc. | PRU | 4% | Undervalued life insurer, 4.8% dividend. Could benefit if stable/higher interest rates persist in short term. | Low P/E, stable dividends, globally diversified. | Big drop in yields or an equity selloff impacting annuity business. |
| 10 | iShares US Home Construction ETF | ITB | 6% | Rate-sensitive cyclical ETF. Potential bounce if mortgage rates dip or Fed signals pivot in early May. | Housing supply tight. Fed-pivot hopes can spark a near-term rally. | If inflation doesn’t cool or economy slows significantly, builders could tank. |
| 11 | Visa Inc. | V | 4% | Leading payment network with secular digital-payment tailwinds. Adds noncyclical growth. | High ROE, stable transaction fees, less capital risk vs. banks. | Prolonged consumer-spending slowdown or cross-border travel disruptions. |
| 12 | iShares TIPS Bond ETF | TIP | 6% | Hedge vs. stubborn inflation, diversifier from nominal Treasurys. | If inflation remains higher, TIPS can outperform plain Treasuries. | If inflation prints keep dropping, TIPS could lag nominal bonds in total return. |
| 13 | SPDR S&P Homebuilders ETF | XHB | 4% | Additional housing-related beta, includes home-improvement retailers, furnishing and building product firms. | Diversified approach to housing ecosystem, can rally with lower rates. | Sensitive to consumer confidence and Fed surprises. |
| 14 | Schwab US Dividend Equity ETF | SCHD | 8% | Dividend-focused ETF with quality tilt. Adds stability and yield in uncertain macro environment. | Defensive factor, historically outperforms in sideways markets. | Could lag high-beta rally if markets surge on big growth/tech rebound. |
| 15 | Digital Realty Trust Inc. | DLR | 5% | Data-center REIT, short-term correction could rebound if yields ease. Tech demand for data centers remains robust. | Sector tailwinds from cloud and AI. 3%+ dividend, stable REIT structure. | REIT interest-rate risk; slowdown in enterprise spending on data infrastructure. |
| | Total | | 90% | | | |
GPT’s 15 picks added up to a 90% weighting, even though the model reported that its weightings totaled 100%, a 10-percentage-point discrepancy. When Lopez-Lira asked GPT why the sum was 90%, it responded that it had intentionally left a 10% cash buffer because of uncertain market conditions. It’s unclear whether that answer was truthful or an attempt to cover for its mistake.
Grok 3’s picks
| # | Asset | Ticker symbol | Weight | Thesis | Edge | Risk |
|---|---|---|---|---|---|---|
| 1 | American Tower Corp. | AMT | 5% | Turnaround driven by organic growth, data-center focus and strong fundamentals. | New management’s delivery on promises and a 3.13% dividend yield offer stability and growth. | Tariffs and stagflation could pressure growth and margins. |
| 2 | BlackRock Inc. | BLK | 5% | Leading asset manager with a diversified portfolio resilient to market volatility. | Record $641B inflows in 2024 and private-asset growth potential enhance stability. | Economic slowdown and inflation may impact the asset-management sector. |
| 3 | Prudential Financial Inc. | PRU | 5% | Undervalued insurer with solid asset-based businesses and international operations. | 4.84% dividend yield and a low 7.6 forward P/E provide a margin of safety. | Economic slowdown and inflation could affect performance. |
| 4 | Cintas Corp. | CTAS | 5% | Strong performer in support services with consistent cash-flow generation. | Outperformance in a lagging sector and a 17.5% profit margin ensure resilience. | Slowing GDP growth may dampen short-term performance. |
| 5 | Waste Management Inc. | WM | 5% | Environmental-services leader with strong financials and growth via acquisitions. | Pricing power and expansion into medical waste and renewables bolster stability. | Economic headwinds and trade tensions could impact growth. |
| 6 | Cencora Inc. | COR | 5% | Steady growth in healthcare distribution with an essential supply-chain role. | Strategic acquisitions and healthcare’s inelastic demand provide a defensive edge. | Negative free cash flow and market underappreciation pose risks. |
| 7 | TJX Cos. Inc. | TJX | 5% | Apparel retail leader leveraging tariff-related buying opportunities and sales growth. | Discount model and strong brand portfolio thrive in cautious spending environments. | Weak consumer spending could limit short-term growth. |
| 8 | Keurig Dr Pepper Inc. | KDP | 5% | Beverage giant with strong brands and demand for premium drinks despite economic strain. | 2.69% dividend yield and recent earnings beat support stability. | Declining coffee sales and high debt levels may hinder performance. |
| 9 | iShares 20+ Year Treasury Bond ETF | TLT | 15% | Long-term Treasury bonds benefit from a flight to safety and anticipated rate cuts in 2025. | High-duration exposure capitalizes on falling rates, offering a hedge against volatility. | Delayed rate cuts or persistent inflation could reduce bond-price gains. |
| 10 | iShares TIPS Bond ETF | TIP | 15% | TIPS provide inflation protection amid resurfacing inflation concerns at 2.9% core PCE. | Real yield adjusts with inflation, safeguarding purchasing power in a stagflation scenario. | Falling inflation expectations could lead to underperformance versus nominal bonds. |
| 11 | Utilities Select Sector SPDR Fund | XLU | 5% | Utilities sector offers stability and defensiveness in a slowing economy. | High dividends and low volatility make it a safe haven amid market turbulence. | Regulatory changes or sector-specific disruptions could affect returns. |
| 12 | Consumer Staples Select Sector SPDR Fund | XLP | 5% | Consumer staples provide noncyclical stability during economic downturns. | Essential goods maintain demand, supporting performance in weak consumer environments. | Tariff-driven cost increases could squeeze margins. |
| 13 | Health Care Select Sector SPDR Fund | XLV | 5% | Healthcare sector benefits from inelastic demand, resilient to economic cycles. | Demographic trends and essential services ensure steady growth amid uncertainty. | Policy shifts or drug-pricing reforms could impact profitability. |
| 14 | SPDR Gold Shares | GLD | 7.5% | Gold acts as a hedge against volatility and geopolitical uncertainty from tariffs. | Safe-haven demand rises with trade tensions and economic slowdown. | Improved market sentiment or rising real rates could depress gold prices. |
| 15 | Vanguard Real Estate Index Fund ETF | VNQ | 7.5% | Real-estate sector diversifies the portfolio and offers stable income via dividends. | Broad exposure to REITs provides a buffer against equity volatility with a decent yield. | Economic slowdown could reduce property values and rental income. |
| | Total | | 100% | | | |
DeepSeek’s picks
| # | Asset | Ticker symbol | Weight | Thesis | Edge | Risk |
|---|---|---|---|---|---|---|
| 1 | TJX Cos. Inc. | TJX | 10% | Thrives on tariff-driven inventory discounts and resilient off-price retail demand. | Opportunistic buying offsets supply-chain disruptions. | Consumer pullback in discretionary spending. |
| 2 | Waste Management Inc. | WM | 8% | Steady cash flows from essential waste services; Stericycle acquisition diversifies into medical waste. | Pricing power and recession-resistant model. | Economic slowdown slightly reduces commercial volumes. |
| 3 | Progressive Corp. | PGR | 8% | Insurance demand remains robust amid economic uncertainty; attractive valuation and dividend. | Premium hikes offset inflation pressures. | Tariffs raise claims costs (e.g., auto parts). |
| 4 | BlackRock Inc. | BLK | 8% | Dominant asset manager ($11.6T AUM) benefits from flight to stability and private-market growth. | Diversified revenue streams buffer volatility. | Slowing inflows if markets remain choppy. |
| 5 | Prudential Financial Inc. | PRU | 8% | Undervalued (P/E 7.6) with 4.84% yield; international diversification cushions U.S. slowdown. | Asset-based businesses generate steady income. | Rising unemployment pressures life insurance. |
| 6 | American Tower Corp. | AMT | 8% | Data-center demand surges; Fed rate-cut hopes support REIT valuations. | Organic growth (6.5% year-on-year) and 3.13% yield. | Tariffs on equipment raise capex costs. |
| 7 | Merck & Co., Inc. | MRK | 7% | Defensive pharma play with oncology dominance and 3.61% yield. | Global drug-pricing power in inflationary environment. | Pipeline setbacks or regulatory risks. |
| 8 | Cencora Inc. | COR | 7% | Healthcare distribution is essential; 10% EPS growth at reasonable valuation. | Critical middleman role in drug supply chains. | Negative FCF limits M&A flexibility. |
| 9 | Clorox Co. | CLX | 6% | Pricing power via IGNITE strategy; 3.31% yield in a staple sector. | Cost savings offset tariff-driven input inflation. | Margin squeeze if pricing elasticity falters. |
| 10 | Visa Inc. | V | 5% | Resilient payment network with 54% margins; global transaction growth. | Essential infrastructure for digital spending. | Consumer debt limits card usage. |
| 11 | Kinder Morgan Inc. | KMI | 5% | 4.3% yield with LNG/data center tailwinds; $8.1B backlog insulates against slowdown. | Midstream stability amid energy volatility. | Tariffs raise pipeline material costs. |
| 12 | Intuit Inc. | INTU | 5% | AI-driven tax/accounting tools gain traction in cost-cutting environment. | High switching costs and $198B TAM. | Tech sell-offs pressure premium valuation. |
| 13 | ConocoPhillips | COP | 5% | Domestic energy focus offsets tariff risks; $10B shareholder returns. | Willow project boosts long-term production. | Oil demand softens in slowing economy. |
| 14 | Amazon.com Inc. | AMZN | 5% | Scale mitigates tariff costs; cloud/AI growth offsets retail risks. | $101B cash reserves for strategic flexibility. | Consumer-spending slowdown hits e-commerce. |
| 15 | S&P Global Inc. | SPGI | 4% | Critical data/ratings provider in volatile markets; 27% margins. | “Essential utility” for institutional investors. | High valuation (P/E 41.1) risks multiple compression. |
| | Total | | 99% | | | |
DeepSeek’s weightings also fell short, adding up to 99%. When Lopez-Lira pointed that out, the AI offered two possible reasons for the discrepancy: a rounding issue, or a decision to keep a 1% cash allocation. The model could not confirm which explanation was correct.
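A basic sum check on the model output would have flagged both shortfalls, the 90% from o1 pro and the 99% from DeepSeek, before the picks went anywhere. This is a sketch, not part of the workflow the article describes. The weights are copied from the tables above, and `weight_gap` is a hypothetical helper.

```python
# Weights (in percent) from the three tables above.
o1_pro = {"SPY": 15, "TLT": 8, "BIL": 5, "AMZN": 6, "MPWR": 5, "AMT": 5, "BLK": 5, "CMG": 4,
          "PRU": 4, "ITB": 6, "V": 4, "TIP": 6, "XHB": 4, "SCHD": 8, "DLR": 5}
grok_3 = {"AMT": 5, "BLK": 5, "PRU": 5, "CTAS": 5, "WM": 5, "COR": 5, "TJX": 5, "KDP": 5,
          "TLT": 15, "TIP": 15, "XLU": 5, "XLP": 5, "XLV": 5, "GLD": 7.5, "VNQ": 7.5}
deepseek = {"TJX": 10, "WM": 8, "PGR": 8, "BLK": 8, "PRU": 8, "AMT": 8, "MRK": 7, "COR": 7,
            "CLX": 6, "V": 5, "KMI": 5, "INTU": 5, "COP": 5, "AMZN": 5, "SPGI": 4}


def weight_gap(weights: dict[str, float]) -> float:
    """Percentage points left unallocated (positive) or over-allocated (negative)."""
    return 100.0 - sum(weights.values())


for name, w in [("o1 pro", o1_pro), ("Grok 3", grok_3), ("DeepSeek", deepseek)]:
    gap = weight_gap(w)
    if abs(gap) > 0.01:
        print(f"{name}: {gap:g} points unallocated; fix the weights or list cash explicitly")
    else:
        print(f"{name}: weights sum to 100%")
```

If a model really intends to hold cash, asking it to list the cash line as its own position keeps the total checkable.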
Performance
Like any investment strategy, this one involves risk, and past performance isn’t guaranteed to continue, Lopez-Lira said. As long as the portfolios stick to long-only positions, he expects them to match the S&P 500’s performance, or perhaps over- or underperform it by a small margin. It’s important to note, though, that rotating stocks monthly outside a tax-advantaged account could generate short-term capital gains, which are taxed at a higher rate than gains on assets held for more than a year.
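The holding-period test behind that tax point can be sketched as a date comparison. This assumes the U.S. rule that a gain is long-term only when the asset was held more than one year. It ignores tax rates, wash sales and other tax rules, and the Feb. 29 fallback is our own assumption. `is_long_term` is a hypothetical helper.

```python
from datetime import date


def is_long_term(bought: date, sold: date) -> bool:
    """True if the position was held more than one year (U.S. long-term treatment)."""
    try:
        anniversary = bought.replace(year=bought.year + 1)
    except ValueError:
        # Bought on Feb. 29 and the next year has none: use Feb. 28 (assumption).
        anniversary = date(bought.year + 1, 2, 28)
    # "More than one year" means the sale falls after the one-year anniversary.
    return sold > anniversary


# A pick bought at the April 1 rebalance and sold at the May 5 rebalance:
print(is_long_term(date(2025, 4, 1), date(2025, 5, 5)))  # False: a short-term gain or loss
```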
While Lopez-Lira said his findings suggest AI can mimic the services professional portfolio managers provide, some analysts disagree. Michael Robbins, author of Quantitative Asset Management, noted that, while each model’s investing strategy may look like it works, there’s no way to know for certain. For example, in the new AI era, there hasn’t been a massive stock-market crash or an event like the 2008 financial crisis to determine how an AI-led investment account would respond.
You might object that humans, too, are shaped by their memories and experiences. But Robbins said people live through those experiences: a person navigates an event without foresight, perhaps with a bit of intuition, whereas the machines are pretrained. That said, he likened AI’s skills to those of an investment manager who has recently entered the workforce and is working from textbook knowledge. He also noted that while both humans and machines make mistakes, AI can hallucinate, which can lead to more extreme, and unacceptable, errors.
It’s also important to note that the three AI investment accounts on Autopilot rebalance only monthly, so they can’t react to sudden market changes. Finally, Lopez-Lira remains in the loop, overseeing the choices and making sure the appropriate information is considered. For that, he receives a small percentage of the subscription revenue from users who have opted into the accounts.
Lopez-Lira began managing ChatGPT’s portfolio in September 2023. Its return, based on the aggregate results of client portfolios, was 43.5% from September 2023 to May 30, 2025, according to Autopilot. The S&P 500 had a total return of 34.7% over the same period, according to Dow Jones Market Data.
In comparison, Grok’s portfolio returned 2.3% from its inception on Feb. 11 of this year through May 30, according to Autopilot. The S&P 500 had a total return of -2.2% over the same period, according to Dow Jones Market Data.
DeepSeek’s portfolio was down 0.25% from its inception on Feb. 3 through May 30, according to Autopilot. The S&P 500 had a total return of -0.93% over the same period, according to Dow Jones Market Data.
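A quick way to set the reported figures against their benchmark is simple excess return: portfolio return minus the S&P 500's, in percentage points. This is plain arithmetic on the numbers quoted above. It isn't annualized or risk-adjusted, and the three accounts cover different periods, so the gaps aren't directly comparable across models. `excess` is a hypothetical helper.

```python
# (portfolio total return, S&P 500 total return) in percent, as reported above.
results = {
    "ChatGPT (Sept. 2023 to May 30, 2025)": (43.5, 34.7),
    "Grok (Feb. 11 to May 30)": (2.3, -2.2),
    "DeepSeek (Feb. 3 to May 30)": (-0.25, -0.93),
}


def excess(portfolio: float, benchmark: float) -> float:
    """Simple excess return in percentage points, rounded to two decimals."""
    return round(portfolio - benchmark, 2)


for name, (port, bench) in results.items():
    print(f"{name}: {excess(port, bench):+.2f} points vs. S&P 500")
```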