AI Agent Cost Models Explained: Per-Task vs Capability vs Flat

AI agent platforms separate from one another on cost model more than on technology. Two platforms can run the same model on the same task at roughly the same reliability and present radically different bills, because the cost model is upstream of unit economics on both sides. This piece walks through the three dominant models, the buyer and vendor incentives each creates, and where each breaks.

On the vendor side, the underlying cost driver is model tokens. As of late 2025, McKinsey estimated that AI agent unit costs at scale are dominated by inference cost, with infrastructure overhead a distant second (McKinsey QuantumBlack research, retrieved 2026-05-07). The translation from token cost to buyer-facing price is what differentiates the cost models.

The three cost models

The naming varies; the structure does not. Every AI agent vendor uses one of three models or a hybrid, and the choice has implications for which buyers the vendor attracts and how the vendor's economics work.

Per-task wins on revenue alignment for the vendor; flat wins on predictability for the buyer; capability sits in the middle and rewards efficient use.

Per-task pricing

Per-task pricing meters every agent execution. The buyer is charged per run, often with the run priced as a function of model tokens consumed. The model maps cleanly to the vendor's underlying cost: every task the buyer runs adds inference cost, and the buyer pays for that cost plus a margin.
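The cost-plus-margin mapping can be sketched in a few lines. This is a minimal illustration, not any vendor's actual formula: the token rates, the 40% markup, and the names TokenRates, run_cost and run_price are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenRates:
    # Hypothetical USD rates per million tokens; not taken from any real price list.
    input_per_million: float
    output_per_million: float


def run_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Vendor's inference cost for a single agent run."""
    total = (input_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000


def run_price(input_tokens: int, output_tokens: int, rates: TokenRates,
              markup: float = 0.40) -> float:
    """Buyer-facing price: inference cost plus a markup on cost (assumed 40%)."""
    return run_cost(input_tokens, output_tokens, rates) * (1 + markup)


rates = TokenRates(input_per_million=3.0, output_per_million=15.0)
cost = run_cost(20_000, 2_000, rates)    # what the vendor pays for this run
price = run_price(20_000, 2_000, rates)  # what the buyer is charged for it
print(f"cost ${cost:.3f}, price ${price:.3f}")
```

Every additional run adds one more cost and one more price, which is why the bill tracks usage so directly under this model.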

The advantage is alignment. The vendor and the buyer both pay for usage; if the agent runs nothing, nobody owes anything. Per-task is also the natural model for developer-facing API tiers, where the buyer expects metered pricing and is comfortable estimating usage.

The disadvantage is predictability. Token use varies by task: a sales follow-up that requires three tool calls costs less than one that requires twelve. Buyers who cannot predict their usage end up either overpaying for capacity they did not use or, worse, rate-limiting their own automation to avoid overruns. The economics show up sharply at the operator tier, covered in "economics of bootstrapped AI agents": a buyer who self-limits their agent's runs is getting less value than the vendor's pricing implies, but they cannot see that until the bill arrives.

Per-task pricing also has a reputation problem in 2026. Buyers who came from cloud-bill horror stories of the late 2010s have an allergic reaction to anything that smells like consumption-based pricing without a hard cap. Vendors who use per-task usually pair it with budget alerts, hard caps, or both.
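A budget alert paired with a hard cap might look like the sketch below. The BudgetGuard class, its method names, and the 80% alert threshold are hypothetical and chosen for illustration; real platforms expose these controls in their own ways.

```python
class BudgetGuard:
    """Tracks spend against a hard cap and fires a one-time alert before it."""

    def __init__(self, hard_cap_usd: float, alert_fraction: float = 0.8):
        self.hard_cap_usd = hard_cap_usd
        # Assumed: the alert fires at a fixed fraction of the hard cap.
        self.alert_at_usd = hard_cap_usd * alert_fraction
        self.spent_usd = 0.0
        self.alerted = False

    def authorize(self, estimated_cost_usd: float) -> bool:
        """Return True only if the run would keep spend at or under the cap."""
        return self.spent_usd + estimated_cost_usd <= self.hard_cap_usd

    def record(self, actual_cost_usd: float) -> bool:
        """Add actual spend; return True the first time the alert line is crossed."""
        self.spent_usd += actual_cost_usd
        if not self.alerted and self.spent_usd >= self.alert_at_usd:
            self.alerted = True
            return True
        return False


guard = BudgetGuard(hard_cap_usd=100.0)
first_alert = guard.record(50.0)    # below the alert line, no alert
second_alert = guard.record(35.0)   # crosses the alert line, alert fires once
blocked = not guard.authorize(20.0) # this run would exceed the hard cap
allowed = guard.authorize(10.0)     # this run still fits under the cap
```

The alert tells the buyer the bill is drifting; the cap is what actually bounds it.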

Capability-based pricing

Capability-based pricing charges a fixed price per capability the buyer turns on, irrespective of how many times that capability runs. A sales-follow-up agent is one capability; a lead-enrichment agent is another; a meeting-scheduling agent is a third. Use within each capability is unlimited or capped at a level most buyers do not hit.

The model has an unusual property: it rewards the vendor for shipping new capabilities, not for buyers running existing capabilities harder. That changes vendor incentives in a productive direction. Per-task vendors are tempted to make agents chatty (more tool calls, more tokens, more revenue per run); capability-based vendors are tempted to make agents efficient (cheaper to run inside a fixed price means higher margin) and to invest in building more capabilities (each new capability is a price expansion).
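The incentive difference is easy to see with toy numbers. In this sketch the run counts, per-run costs, markup, and fixed capability price are invented for illustration, and the two functions are hypothetical helpers rather than anyone's billing logic.

```python
def per_task_margin(runs: int, cost_per_run: float, markup: float) -> float:
    """Vendor margin when every run is billed at cost plus a markup."""
    revenue = runs * cost_per_run * (1 + markup)
    return revenue - runs * cost_per_run


def capability_margin(runs: int, cost_per_run: float, fixed_price: float) -> float:
    """Vendor margin when the capability is billed at a fixed price."""
    return fixed_price - runs * cost_per_run


runs = 1_000
efficient, chatty = 0.05, 0.10  # assumed USD cost per run; chatty uses twice the tokens

# Per-task: a chattier agent doubles the vendor's margin.
pt_efficient = per_task_margin(runs, efficient, markup=0.40)
pt_chatty = per_task_margin(runs, chatty, markup=0.40)

# Capability-based: the same chattiness eats into the vendor's margin.
cap_efficient = capability_margin(runs, efficient, fixed_price=200.0)
cap_chatty = capability_margin(runs, chatty, fixed_price=200.0)

print(pt_efficient, pt_chatty, cap_efficient, cap_chatty)
```

Under these assumptions, the same change in agent behaviour moves vendor margin in opposite directions depending on the cost model.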

The disadvantage is fit. Some workloads do not fit cleanly into "capabilities". A research-and-write agent that does many different things is hard to slot into one capability; trying to split it into multiple capabilities feels artificial. Capability-based pricing works well when the agent's value is tied to a recognisable use case and less well when the agent is a generalist.

Flat subscription

Flat subscription is what most B2B SaaS buyers expect from any tool that is not pure infrastructure. A monthly or annual fee covers use up to a fair-use cap, and the buyer's budget line is the same every month.

Flat works for the buyer because it removes the variable that procurement teams hate most: surprise bills. It works for the vendor only when the vendor understands its cost distribution well enough to price the cap correctly. Vendors that price flat without knowing their cost distribution either lose money on the heaviest users or have to introduce unannounced caps, both of which produce churn.

The flat-subscription option for AI agents typically sits at the operator tier: an individual or small team pays a fixed monthly amount, gets agents that do a defined set of jobs, and does not think about usage. The vendor handles the cost distribution by pricing the cap at the 80th percentile of usage and accepting that the top 20% of users are loss-making but valuable for product feedback.
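Finding an 80th-percentile cap from observed usage can be sketched as follows. The monthly run counts are made-up sample data, and nearest-rank is only one of several percentile definitions; a vendor with real usage data might choose another.

```python
def nearest_rank_percentile(values: list[int], percentile: int) -> int:
    """Smallest value with at least `percentile` percent of the data at or below it."""
    if not values or not 0 < percentile <= 100:
        raise ValueError("need data and a percentile in 1..100")
    ordered = sorted(values)
    # Integer ceiling of percentile * n / 100, avoiding float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical monthly agent runs for ten subscribers.
monthly_runs = [120, 80, 300, 150, 95, 2200, 60, 400, 175, 900]

cap = nearest_rank_percentile(monthly_runs, 80)
over_cap = sorted(r for r in monthly_runs if r > cap)

print(f"fair-use cap: {cap} runs; subscribers above it: {over_cap}")
```

In this sample, two of ten subscribers sit above the cap, and those are the heavy users the vendor has to decide whether to subsidise.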

The pricing approach also interacts with deployment model, covered in "AI agent deployment models explained": self-hosted deployments are usually flat (the vendor licenses the software, the buyer pays for their own infrastructure), while cloud-hosted deployments more often run per-task or capability-based.

How to pick: a buyer-side rule

The choice depends on which constraint binds.

If your usage is variable and unpredictable, flat subscription removes the variance and lets you plan.
If your usage is steady and you want price transparency, per-task gives you a clean unit cost.
If your usage maps to clear use cases, capability-based gives you predictable cost per use case and rewards you for not running needless tasks.
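The three rules above can be turned into a rough heuristic. This is only a sketch: measuring variability with the coefficient of variation, the 0.5 threshold, and the order in which the rules are checked are all assumptions of the example, since the rule itself does not rank the three constraints.

```python
from statistics import mean, pstdev


def suggest_model(monthly_runs: list[int], maps_to_clear_use_cases: bool,
                  cv_threshold: float = 0.5) -> str:
    """Suggest a cost model from usage history (illustrative heuristic only)."""
    average = mean(monthly_runs)
    if average == 0:
        raise ValueError("no usage recorded")
    # Coefficient of variation: standard deviation relative to the mean.
    variability = pstdev(monthly_runs) / average
    if variability > cv_threshold:
        return "flat"        # unpredictable usage: remove the variance
    if maps_to_clear_use_cases:
        return "capability"  # recognisable use cases: fixed cost per use case
    return "per-task"        # steady usage: clean unit cost


steady = [1000, 1050, 980, 1020]
spiky = [200, 3000, 150, 2500]

print(suggest_model(steady, maps_to_clear_use_cases=False))
print(suggest_model(steady, maps_to_clear_use_cases=True))
print(suggest_model(spiky, maps_to_clear_use_cases=True))
```

Treat the output as a starting point for a conversation with the vendor, not a verdict; the threshold in particular should be tuned to how much bill variance your budget can absorb.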

The vendor side mirrors the buyer side. Vendors that target operators (small teams, individual buyers) lean flat. Vendors that target platform builders lean capability-based. Vendors that target developers or API consumers lean per-task. A vendor that picks the wrong model for its target buyer creates friction that shows up as long sales cycles and high churn.

Gravity is positioned at the operator tier. The pricing approach has to match the buyer profile: someone who wants an autonomous agent running for them in 60 seconds (the same buyer described in "describe outcome, not workflow") does not want to be metered. The cost-model decision is downstream of who you are selling to, not upstream of it. The framework I learned the hard way across three startups is that pricing follows positioning; reversing the order is how you end up with high CAC and unhappy buyers.

Frequently asked questions

What are the main AI agent cost models?

Three primary models: per-task (you pay for each agent execution, often metered by token use), capability-based (you pay per capability or per agent template, regardless of execution count), and flat subscription (you pay a fixed monthly fee for unlimited or capped use). Each aligns vendor and buyer incentives differently.

Which AI agent pricing model is most predictable?

Flat subscription is most predictable for the buyer; capability-based is second. Per-task pricing is least predictable because token use varies by task complexity. Buyers planning budgets quarter-by-quarter usually prefer flat or capability-based; some buyers with highly variable workloads still pick per-task to avoid paying for capacity they do not use, accepting a less predictable bill in exchange.

Why do some AI agent vendors charge per task?

Per-task pricing aligns vendor revenue with buyer usage and matches the underlying token-based cost the vendor pays to model providers. It works well when usage scales smoothly. It works less well when buyers cannot predict usage, because budget overruns become common and the buyer ends up rate-limiting their own automation.

What is capability-based AI agent pricing?

Capability-based pricing charges a fixed price per capability the buyer turns on, often per agent template or per use case. Sales-follow-up agent: one price. Lead-enrichment agent: another price. Use within a capability is unlimited or generously capped. The model rewards the vendor for shipping useful capabilities, not for buyers running tasks needlessly.

Which AI agent cost model wins in 2026?

Likely capability-based at the platform tier and flat at the operator tier, with per-task surviving at the API tier where buyers want explicit metering. The trend reflects buyer fatigue with surprise bills and vendor desire for predictable revenue. Pure per-task pricing is hard to defend as the headline model when buyer budgeting cycles are quarterly or annual.

Three takeaways before you close this tab

Per-task aligns revenue with usage; flat aligns the bill with the budget. Capability-based sits between.
Vendors who pick the wrong model for their buyer end up with high CAC and high churn. Pricing follows positioning.
The likely dominant pattern in 2026 is hybrid: flat or capability-based for end-buyer plans, per-task at the API tier.

Sources

McKinsey QuantumBlack, "AI agent unit economics research", retrieved 2026-05-07, mckinsey.com/capabilities/quantumblack
Bessemer Venture Partners, "State of the Cloud 2025", retrieved 2026-05-07, bvp.com/atlas/state-of-the-cloud-2025
a16z, "AI agent business models", retrieved 2026-05-07, a16z.com/ai
Gravity team, "Gravity pricing-model analysis", internal v1, May 2026, About