This startup is betting tokenmaxxing will create the next compute giant

Apr 15, 2026 | Technology

“Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now.”

That’s the mantra for developers building software on generative AI models, or at least what Parasail CEO Mike Henry hears. Parasail provides a cloud computing service to companies running AI models for inference, and Henry told TechCrunch it generates 500 billion tokens a day. How’s that for tokenmaxxing?

Henry was previously an executive at Groq, the LLM-focused chipmaker, where he built the company’s cloud offering. That work reflected an early recognition that developers building software on AI models would want cloud processing specialized to their needs. Now, after coming out of stealth a year ago, Parasail has raised a $32 million Series A to do that at scale.

Henry has a background in physical chip design, but Parasail isn’t committed to owning its own chips. While the company owns some of its GPUs, it mainly rents processing time at 40 data centers in 15 countries around the globe and buys more from liquidity markets, orchestrating all of it behind the scenes to drive down the cost of inference requests.

By allocating workloads cleverly and avoiding demand peaks, the company aims to compete with firms that own their own silicon and might be constrained by existing customer commitments and workloads.

The company’s potential relies on the continued proliferation of open-source models and agents outside of frontier labs. Parasail’s executives and investors say this is driven by the growing cost and friction of using offerings from companies like Anthropic and OpenAI.

Instead, a hybrid architecture is emerging, according to Andreas Stuhlmüller, the CEO of Elicit, a startup that has raised a $22 million Series A to develop a research assistant for scientific literature. His customers at top pharmaceutical companies use the LLM-based tool to review and ana …
