It is 7:14 a.m. on a Tuesday. An e-commerce operations manager in Vilnius opens her laptop and starts opening tabs. Shopify admin. Google Merchant Center. Meta Commerce Manager. The Allegro seller panel for the Polish market. The Pigu.lt account dashboard. Klaviyo. GA4. Search Console. Trustpilot. A spreadsheet from a supplier with new wholesale prices, sent at 11 p.m. Slack. Two inboxes. The 39 customer service tickets that came in overnight.
She counts because she is the kind of person who counts. Twenty-six tabs.
Two ad sets are pacing past budget. A supplier missed a delivery window. A product feed has 84 disapprovals she has not opened yet. A competitor on Pigu dropped its price on the store's bestselling SKU some time over the weekend, and nobody knows when. Her CEO wants the monthly performance review by lunch.
She has about ninety minutes before her 9 a.m. stand-up. She opens ChatGPT in a fresh tab and starts typing.
That is the visible version of her morning. The invisible version matters more.
While she opens tabs, eight things that should run on a schedule are not running. A daily competitor price diff across the top 40 SKUs. A feed health audit across Google Shopping, Meta Catalog, and the four marketplaces the store sells on. A schema and pixel check on the 200 highest-traffic product pages. A review sweep across Google, Trustpilot, and each marketplace, flagging anything below three stars from the last 24 hours. A stockout-versus-ad-spend cross-check showing which campaigns are still spending on SKUs that went out of stock on Friday. A returns reason-code tally for last week. A cart-abandonment pattern check across the previous seven days. A monthly customer-cohort retention number nobody has looked at since February.
None of these jobs is hard. Each one would pay for itself many times over inside a quarter. None is running.
This is what people inside an e-commerce business mean when they say "we know we should be doing more." They are right. There is a backlog of work the team has quietly given up on, because there are not enough hours in the day for one person to do it by hand.
Codex App, the desktop application from OpenAI launched in February 2026 and given a major upgrade in April, is what makes that backlog affordable.
EcomExpo's Agentic Commerce Playbook covers one half of the picture: how AI agents are starting to shop, and what your store needs (feeds, JSON-LD, robots.txt, llms.txt) to be picked when they do. This article is the other half. Agents are also starting to run your operations. The operators who understand both halves earlier than their competitors will compound for two years before anyone else catches up.
What This Article Argues
Chat-based AI made the work your team was already doing faster. Codex App does the work your team was never going to do at all. That second category is bigger than the first.
The Shape of E-Commerce Work
E-commerce work has three properties that a chat window cannot match.
It is plural. There is never one task. Twenty SKUs need price checks, six campaigns need creative refreshes, four marketplaces need listing audits, three suppliers need stock updates, eighteen customer tickets need triage, and the CEO wants a number by lunch. The work fans out.
It is persistent. Last week's stockout is this week's lost ad spend. The pricing change you made on Allegro on Tuesday needs to stay in sync with Pigu and your Shopify storefront on Wednesday. The brand voice you settled on in March is the one you defend in October. An operation without memory is one where you explain the catalogue to your own helper every Monday.
It is parallel. While ad copy is drafted, a feed audit can be running. While the feed audit runs, a competitor scan can read public marketplace pages. While the scan reads, the weekly review assembles itself for the 10 a.m. call. None has to wait for the others.
A chat window is the opposite. One conversation at a time. Memory that lasts a session, not a week. You ask, it answers, you ask again. You are the clock.
A single chat thread can be made to look plural by opening fifteen tabs. It can be made to look persistent by pasting your catalogue at the top of every conversation. It can be made to look parallel by having three browser windows open. What this gets you is the operations manager from the opening scene, doing the visible work twice as fast and the invisible work not at all.
The Four Ideas Worth Understanding
Codex App ships with four primitives. Learn the words. The vocabulary changes how the work feels before any configuration does.
Skills
A skill is a packaged, reusable workflow with a name. Your competitor-price-diff skill encodes how your team thinks about competitor pricing. Which 40 SKUs matter most. Which sources to check. How to flag a meaningful move versus a daily wobble. What to ignore. Write it once, with examples. Anyone on the team — and any agent — runs the same skill by name. No more pasting the same six instructions every Monday morning.
Automations
An automation is a scheduled or event-triggered run of a skill. Hourly, daily, weekly, or "wake up when this happens." A nightly competitor price diff that drops a one-page note into Slack at 7 a.m. A feed health audit that fires every time Google Merchant Center pushes a disapproval. A weekly review skeleton that builds itself at 4 p.m. on Friday and waits for the operations manager to add commentary on Monday. The work runs whether anyone is at the desk.
Computer use
Since the April 2026 update, the agent can see a screen, click, and type in any application a human can use. This matters in e-commerce because the long tail of seller platforms, regional marketplaces, and supplier portals does not have clean APIs. The agent can log into the Allegro seller panel, pull last week's order data, screenshot a report, and paste the number into the right cell of the right sheet. The Monday check nobody had time to do gets done every Monday.
One caveat. As of May 2026, OpenAI's documentation indicated computer use was not generally available in the EEA, the UK, or Switzerland at launch. If you operate inside those regions, confirm the current status against your account before designing a workflow around the capability.
Parallel threads
Multiple agent conversations running at once, each with its own context. Three campaigns, three threads, no contamination. While one thread waits on a supplier reply, two others keep moving. The bottleneck used to be that a person can only do one thing at a time. Now the bottleneck is the operator's review queue. That is a different problem and a better one to have.
Two more capabilities ride on top. The first, added in the April update, is a plugin ecosystem with more than 90 connectors covering most of the typical e-commerce stack, plus an in-app browser for public pages. The second is a memory layer that holds what is true about your team, products, and clients across sessions. You stop explaining the catalogue every Monday.
Five Automations to Ship in Your First Month
Most teams that start with Codex App build the same five jobs first because the payback on each is obvious. The list below is operator-tested. Build them in roughly this order.
Nightly competitor price diff across your top SKUs
The agent checks each tracked competitor URL or marketplace listing once a day, compares against yesterday's snapshot, and posts a one-screen Slack message at 7 a.m. listing only what moved. Quiet on most days. Loud on the days that matter. The price check that used to happen quarterly now happens every night.
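The comparison step can be sketched in a few lines of Python. This is a minimal illustration, not Codex App's own API: the snapshot format (a dictionary of SKU to price), the function names, and the 2% "wobble" threshold are all assumptions you would replace with your own rules.

```python
from dataclasses import dataclass


@dataclass
class PriceMove:
    sku: str
    old_price: float
    new_price: float

    @property
    def pct_change(self) -> float:
        # Signed percentage change relative to yesterday's price.
        return (self.new_price - self.old_price) / self.old_price * 100


def diff_snapshots(yesterday, today, min_pct=2.0):
    """Return only the SKUs whose price moved by at least min_pct percent.

    min_pct is a hypothetical threshold separating a meaningful move
    from a daily wobble.
    """
    moves = []
    for sku, new_price in today.items():
        old_price = yesterday.get(sku)
        if old_price is None or old_price <= 0:
            continue  # no usable baseline yet, so nothing to compare
        move = PriceMove(sku, old_price, new_price)
        if abs(move.pct_change) >= min_pct:
            moves.append(move)
    # Largest moves first, so the message leads with what matters.
    return sorted(moves, key=lambda m: abs(m.pct_change), reverse=True)


def format_digest(moves):
    """Build the one-screen message: quiet when nothing moved."""
    if not moves:
        return "No competitor price moves above threshold."
    return "\n".join(
        f"{m.sku}: {m.old_price:.2f} -> {m.new_price:.2f} ({m.pct_change:+.1f}%)"
        for m in moves
    )
```

Collecting the prices and posting the message to Slack are left out on purpose. The part worth writing down first is the rule for what counts as a move.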
Daily product feed health audit
Google Merchant Center disapprovals. Meta Catalog rejections. Marketplace listing errors. The agent pulls each feed status, groups errors by severity, and posts a digest with the worst offenders linked. A hero SKU dark on Google Shopping for two weeks usually does not announce itself; it shows up as a missing number in the next monthly report. This automation catches it on day one.
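The grouping step might look like the sketch below. It assumes the feed issues have already been collected into plain dictionaries. The field names (`channel`, `sku`, `severity`, `reason`) and the severity labels are hypothetical, not the format any particular platform returns.

```python
from collections import defaultdict

# Hypothetical severity labels, most urgent first.
SEVERITY_ORDER = ["critical", "warning", "info"]


def build_digest(issues, top_n=3):
    """Group feed issues by severity and list the first top_n in each group."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[issue["severity"]].append(issue)

    # Known severities first, then any unexpected labels alphabetically,
    # so nothing is silently dropped from the digest.
    extras = sorted(s for s in grouped if s not in SEVERITY_ORDER)
    lines = []
    for severity in SEVERITY_ORDER + extras:
        items = grouped.get(severity, [])
        if not items:
            continue
        lines.append(f"{severity.upper()}: {len(items)}")
        for issue in items[:top_n]:
            lines.append(f"  {issue['channel']} / {issue['sku']}: {issue['reason']}")
    return "\n".join(lines) if lines else "All feeds clean."
```

Keeping unknown severity labels in the output is a deliberate choice here: a digest that hides what it does not recognise is how a hero SKU stays dark for two weeks.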
Weekly review sweep across rating sources
Google Business, Trustpilot, Allegro reviews, Pigu reviews, Amazon if you sell there. The agent surfaces every rating under three stars in the past seven days, with the review text, the product, and a suggested response drafted in your store's voice. The reply still gets a human pair of eyes. The queue arrives sorted.
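The filter behind that queue is simple enough to sketch. The review records below are assumed to be dictionaries with `rating`, `posted_at` (a timezone-aware datetime), `source`, and `text` fields. Those names are illustrative, and fetching the reviews from each source is out of scope.

```python
from datetime import datetime, timedelta, timezone


def low_rated_reviews(reviews, now, window_days=7):
    """Return reviews under three stars posted in the last window_days.

    The result is sorted with the lowest rating first, and the newest
    review first within the same rating.
    """
    cutoff = now - timedelta(days=window_days)
    hits = [
        r for r in reviews
        if r["rating"] < 3 and cutoff <= r["posted_at"] <= now
    ]
    return sorted(hits, key=lambda r: (r["rating"], -r["posted_at"].timestamp()))
```

Drafting the suggested reply is a separate step, and under the rules later in this article it stays a draft until a person approves it.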
Schema, pixel, and llms.txt audit on your top pages
Tracking pixels stop firing about once a quarter at any store that ships landing pages. You usually find out when a number goes missing in a report a client cares about. The agent visits each of your top 50 pages weekly, checks GA4, GTM, Meta Pixel, and the schema markup for product, review, and offer types, and flags anything broken. In 2026, that audit should also check for an up-to-date llms.txt and the structured data that shopping agents read, which the EcomExpo playbook covered in April.
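The schema part of that audit can be illustrated with the Python standard library alone. The sketch below reads a page's HTML, extracts its JSON-LD blocks, and reports which of the expected schema.org types are missing. It assumes the markup is embedded as JSON-LD rather than microdata, and it does not cover pixel checks, which need a real browser session to observe network requests. The function and class names are made up for this example.

```python
import json
from html.parser import HTMLParser

REQUIRED_TYPES = {"Product", "Offer", "Review"}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def collect_types(node, found):
    """Walk nested JSON-LD and record every @type value."""
    if isinstance(node, dict):
        value = node.get("@type")
        if isinstance(value, str):
            found.add(value)
        elif isinstance(value, list):
            found.update(v for v in value if isinstance(v, str))
        for child in node.values():
            collect_types(child, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)


def missing_schema_types(html, required=REQUIRED_TYPES):
    """Return the required schema types that the page does not declare."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found = set()
    for block in parser.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is treated as absent
    return set(required) - found
```

An empty result means the page declares all three types. It does not prove the markup is valid, only that it is present.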
Continuous anomaly alert on conversion and revenue
Every fifteen minutes, the agent watches conversion rate and revenue against a rolling 28-day baseline. If a metric breaks the band by 25% or more, you get a Slack message with the account, the metric, and a one-line suspected cause. The agent does not pause anything. It does not change a budget. It tells a human what just happened, and the human decides. A broken purchase pixel found at 03:14 on a Saturday and fixed at 09:00 on Sunday is the kind of save that pays for the entire stack for a quarter.
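The band check itself is a few lines of arithmetic. The sketch below assumes the baseline is the plain mean of the last 28 daily values, which is one reasonable reading of "rolling 28-day baseline" and not the only one. The function name and the returned fields are illustrative.

```python
from statistics import mean


def check_anomaly(history, current, band=0.25, window=28):
    """Compare the current value with the mean of the last `window` values.

    Returns a small report dict when the deviation is at least `band`
    (0.25 means 25%), otherwise None. It never takes any action itself.
    """
    recent = history[-window:]
    if not recent:
        return None  # no baseline yet
    baseline = mean(recent)
    if baseline == 0:
        return None  # a relative deviation is undefined against zero
    deviation = (current - baseline) / baseline
    if abs(deviation) >= band:
        return {
            "baseline": baseline,
            "current": current,
            "deviation_pct": round(deviation * 100, 1),
        }
    return None
```

A mean-based band ignores weekday patterns, so a store with quiet Sundays may want to compare against the same weekday instead. The function only reports, and what happens next is a human's call.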
None of these jobs publishes, spends, sends, or posts anything externally. Each produces information for a human who then decides. That is by design.
The Line Worth Holding: Reversible vs. Irreversible
Every action an agent can take is either reversible or irreversible. The line is not high-stakes versus low-stakes. A wrong screenshot is reversible: take a new one. A wrong send is not.
Three categories belong on the human side of the line until a specific, written reason moves them.
- Publishing anything that leaves your control once it leaves your machine — a social post, a price change on a public listing, a marketplace listing update.
- Spending — increasing a daily ad budget past a threshold, switching a bid strategy, launching a new campaign.
- Sharing externally — an email to a supplier, a reply to a customer review, a report to an investor.
The control is not in the prompt. Telling an agent "do not update the wrong listing" is guidance, not a guardrail. The control is in how the skill is built (one isolated working directory per marketplace, one explicit account ID per run) and in an approval gate that holds before any external action. It is the cheapest insurance your operation will buy this year.
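What "the control is in how the skill is built" can mean in code is sketched below. This is a generic pattern, not Codex App's own mechanism: the `Action` type, the `Kind` categories, and `run_action` are hypothetical names. The point is that the account check and the approval check live in the code path, so no wording in a prompt can skip them.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    READ = "read"        # reversible: fetching, screenshotting, drafting
    PUBLISH = "publish"  # listings, prices, social posts
    SPEND = "spend"      # budgets, bids, new campaigns
    SHARE = "share"      # emails, review replies, external reports


# The three categories the article keeps on the human side of the line.
IRREVERSIBLE = {Kind.PUBLISH, Kind.SPEND, Kind.SHARE}


@dataclass(frozen=True)
class Action:
    kind: Kind
    account_id: str
    description: str


class ApprovalRequired(Exception):
    """Raised when an irreversible action has no named approver."""


def run_action(action, allowed_account_id, approved_by=None):
    """Execute an action only if it passes both structural checks."""
    # One explicit account ID per run: anything else is refused outright.
    if action.account_id != allowed_account_id:
        raise PermissionError(
            f"run is scoped to {allowed_account_id}, not {action.account_id}"
        )
    # Irreversible actions hold at the gate until a person signs off.
    if action.kind in IRREVERSIBLE and not approved_by:
        raise ApprovalRequired(f"needs human approval: {action.description}")
    return f"executed: {action.description}"
```

A read on the scoped account runs straight through. A price change on the same account raises `ApprovalRequired` until someone's name is attached to it, and any action aimed at a different account fails regardless of approval.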
What This Looks Like After Three Months
The operations manager from the opening scene is at the same desk on a Tuesday morning. She does not open tabs. She opens Codex App, which looks less like a chat product and more like the back office of a small kitchen at 6 a.m. Prep done. Mise en place. The day staged.
She reads the overnight competitor price report. Two SKU moves worth a price response, one needs a meeting. She reads the feed health digest. Eleven disapprovals, all addressable today. She reads the review sweep. One angry one-star on Trustpilot deserves a personal reply within the hour, drafted in her store's voice and waiting for her edit. She reads the schema and pixel audit. Nothing broken this week. She reads the anomaly log. Conversion rate dipped on one campaign Saturday night, caught at 02:11, paused by her at 09:00 Sunday, recovered.
It is 7:55. She has read five short reports. Four of them are work the store never used to do. One is work the team used to do badly, once a quarter, when somebody remembered.
She spawns three threads for the morning. One drafts the monthly performance review in the store's voice using last month's data. One uses the in-app browser to pull comparison screenshots from two competitor product pages for the 10 a.m. category review. One queues responses to seven customer service tickets, each of which the agent has already triaged and matched to a single template.
While those run, she works on the work that requires her. The strategic question the CEO wanted answered before lunch. The supplier negotiation. The conversation with the marketing lead about which 200 SKUs the next ad-creative refresh should focus on. Three agents are doing the work that used to swallow her morning. Five more are doing the work she never had time for at all.
She leaves at six.
What to Do This Week
If your team already pays for ChatGPT Plus, Pro, Business, or Enterprise, you already have Codex App at no extra cost. Download it from the OpenAI Codex page, sign in with the same account, and you are at the start line.
Three short moves for this week, regardless of when you start building.
- Write down the five operational checks you wish someone on your team ran every week and nobody does. Be specific. Not "monitor competitors" but "check Allegro and Pigu listings on the top 30 SKUs against ours and flag any price gap over 8%." The list of five is your build order for the first month.
- Write down the brand voice rules a new junior should follow when answering a one-star review on Trustpilot. Three short paragraphs. Examples of good and bad. That document is the seed of your first skill.
- Decide where the line between reversible and irreversible lives in your operation. Anything past it stays human. Write the list down. Print it. Show it to the person who will hold the line when the agent gets clever.
The 2027 e-commerce stack will have an agent layer underneath every operations job. The question is not whether. The question is who builds the layer first inside your company. EcomExpo on October 1 in Vilnius is one of the rooms where Baltic operators will compare notes on what is working. Be one of the operators with notes to share.
SCALE or FAIL — the next move is yours
EcomExpo 2026 on October 1 in Vilnius covers agentic commerce end-to-end: the agents that shop, the agents that run operations, and the operator moves that decide who wins the next 24 months. 600–800 attendees, two stages, an expo hall, and the first EcomExpo Awards across 10 categories.
Claim your ticket
October 1, Samsung Conference Center, Tech Zity, Vilnius