DeepSeek Just Added Eyes. It's Free, It's Fast, and Open-Source Multimodal AI Just Got Real

DeepSeek quietly turned on Vision — multimodal image understanding — on chat.deepseek.com sometime around June 18, 2026. No blog post, no press release, no X thread from a founder. Just a feature that appeared, and a Hacker News thread that hit 63 upvotes and 26 comments in under two hours.

🔍 THE BOTTOM LINE

DeepSeek has been text-only since launch. Adding vision puts it head-to-head with the image capabilities of GPT-4o and Claude 3.5 Sonnet, but free on the chat interface. Given the company’s track record of open-weighting its models, this could become the first genuinely competitive open-source multimodal model. There is no API yet. When one arrives, DeepSeek’s pricing history suggests it will likely undercut OpenAI’s vision API by 75% or more.

What Changed

The Vision feature is live on chat.deepseek.com right now. Users can upload images and the model understands them — identifying objects, reading scenes, answering questions about photos. One HN commenter who tested it wrote: “It’s really good and fast. Have tested with bunch of odd photos on what is happening. Overall the training set seems large enough to know what’s what and where.”

Another commenter noted the feature has been in A/B testing in China “for a while” — meaning this isn’t a fresh build but a quiet public rollout of something that’s been cooking internally. The global deployment is the news, not the capability existing.

The critical gap: there’s no API yet. Multiple HN users are asking for it in the comments, and the reason matters. One developer pointed out that if DeepSeek adds vision to its API, it could fully drive the Claude Agents SDK and Claude Code — which currently require a vision-enabled model. Right now, that means paying for Claude or GPT-4o vision. A cheap DeepSeek vision API would change that equation entirely.

Context

DeepSeek’s trajectory makes this less surprising than it looks. The company’s V4 model matched GPT-5.5 on benchmarks at 86% less cost and was fully open-source. Their Reasonix coding agent hit 8K GitHub stars by engineering around prefix-cache mechanics. And they’ve been running a permanent 75% discount on API pricing compared to OpenAI — a price war that’s already forced US labs to respond.

Adding vision is the next logical gap to close. GPT-4o, Claude 3.5 Sonnet, and Gemini all have multimodal capabilities. DeepSeek didn’t. Now it does. The pattern is consistent: DeepSeek ships the capability, makes it free on the chat interface, then opens the API at a fraction of the competition’s price.

The geopolitical backdrop is unavoidable. US labs have been uniting against Chinese AI competition, and the Trump administration’s export controls on AI chips are partly aimed at slowing China’s frontier model development. A free, fast, potentially open-source vision model from a Chinese lab landing without warning is exactly the kind of move those controls are supposed to prevent — and clearly haven’t.

NZ Angle

Any Kiwi with a browser can use this right now, for free. No API key, no paywall, no waitlist. For NZ developers scoping apps that need image understanding (document scanning, visual search, accessibility tools), this is a free way to test the model by hand, whereas OpenAI’s vision API charges per request.

When the API drops, the economics get sharper. DeepSeek’s API pricing has historically been 75% below OpenAI’s. If vision API pricing follows the same pattern, a Kiwi startup processing 100K images a month could see its bill drop from roughly $2,000 (about 2 cents per image) to roughly $500 (about half a cent per image). That’s the difference between “we can’t afford vision features” and “we ship them by default.”
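For readers who want to plug in their own volumes, here is a minimal Python sketch of that arithmetic. The per-image baseline of $0.02 is simply what the $2,000 figure implies at 100K images, not a published price, and the function name and constants are illustrative assumptions.

```python
def monthly_vision_bill(images_per_month: int,
                        cost_per_image: float,
                        discount: float = 0.0) -> float:
    """Monthly bill after applying a fractional discount (0.75 means 75% off)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return images_per_month * cost_per_image * (1.0 - discount)


# Assumed inputs, back-derived from the article's example (not real price lists).
IMAGES_PER_MONTH = 100_000
BASELINE_COST_PER_IMAGE = 0.02   # implied by roughly $2,000 / 100K images
ASSUMED_DISCOUNT = 0.75          # DeepSeek's historical gap to OpenAI, per the article

baseline = monthly_vision_bill(IMAGES_PER_MONTH, BASELINE_COST_PER_IMAGE)
discounted = monthly_vision_bill(IMAGES_PER_MONTH, BASELINE_COST_PER_IMAGE,
                                 ASSUMED_DISCOUNT)

print(f"baseline:   ${baseline:,.0f}")
print(f"discounted: ${discounted:,.0f}")
print(f"saved:      ${baseline - discounted:,.0f}")
```

Real vision pricing is usually billed per token or per image tile rather than per image, so treat the flat per-image rate as a rough stand-in until DeepSeek publishes actual numbers.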

The policy angle: NZ’s AI strategy has to navigate between US export controls (which pressure allies to restrict Chinese AI tools) and the reality that Chinese open-source models are increasingly the best value option. Pretending DeepSeek doesn’t exist doesn’t help Kiwi developers. Acknowledging it and building data sovereignty frameworks around it does.

The Other Side

Three honest caveats. First, this is a chat interface feature, not an API. Until there’s programmatic access, it’s a consumer tool, not a developer platform. Second, “really good and fast” from HN commenters is not a benchmark — we don’t have MMBench, MMMU, or DocVQA scores yet. DeepSeek’s text models have been benchmarked rigorously; vision needs the same. Third, DeepSeek is subject to China’s AI regulations, which require content moderation and alignment with “core socialist values.” How that manifests in a vision model interpreting images — particularly political or sensitive content — is an open question that HN commenters haven’t tested yet.

The Bigger Picture

The multimodal race has been a three-horse game: OpenAI, Anthropic, and Google’s Gemini. DeepSeek entering with a free chat interface and a likely-cheap API changes the dynamic. If the model gets open-weighted (and DeepSeek’s history says it probably will), the open-source community could get its first genuinely competitive vision model. That matters more than any single product launch.

The quiet rollout strategy is also worth noting. No announcement, no press cycle, no founder tweetstorm. Just ship it and let the community discover it. For a company that’s built its brand on capability over marketing, it’s a consistent play, and it worked. Sixty-three upvotes in under two hours from a community that does its own quality checking is worth more than a press release.

❓ FAQ

Can I use DeepSeek Vision right now? Yes. Go to chat.deepseek.com, sign in, and upload an image. It’s free. No API key needed. The feature has been live since around June 18, 2026.

Is there an API? Not yet. HN commenters are actively requesting it. Based on DeepSeek’s pattern with V4 and their coding agent, an API release within weeks is likely. Pricing has historically been 75% below OpenAI.

How does it compare to GPT-4o vision? Anecdotally, HN users say it’s “really good and fast.” There are no published benchmark scores yet (MMBench, MMMU, DocVQA). Until independent benchmarks land, treat the quality claims as promising but unverified.

Could this drive Claude Code or other agentic tools? Potentially, once the API is available. According to one HN commenter, Claude Code and the Claude Agents SDK require a vision-enabled model. A cheap DeepSeek vision API would let developers swap in DeepSeek for the vision component while keeping Claude for reasoning, a hybrid that could cut costs significantly.
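As a rough illustration of that hybrid, the Python sketch below routes the image through one backend and the reasoning through another. Everything here is hypothetical: `VisionBackend`, `ReasoningBackend`, `HybridPipeline`, and the stub functions are made-up names, and no DeepSeek vision API exists yet to call. The stubs stand in for real clients so the routing logic can run on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical backend signatures; neither corresponds to a real SDK call.
VisionBackend = Callable[[bytes, str], str]   # (image bytes, instruction) -> description
ReasoningBackend = Callable[[str], str]       # prompt -> answer


@dataclass
class HybridPipeline:
    """Send the image to a cheap vision model, then reason over its text output."""
    describe: VisionBackend
    reason: ReasoningBackend

    def answer(self, image: bytes, question: str) -> str:
        # Step 1: the vision backend turns pixels into text.
        description = self.describe(image, "Describe this image in detail.")
        # Step 2: the reasoning backend only ever sees text.
        prompt = f"Image description:\n{description}\n\nQuestion: {question}"
        return self.reason(prompt)


# Stub backends standing in for real API clients (assumptions for illustration).
def fake_vision(image: bytes, instruction: str) -> str:
    return f"an image of {len(image)} bytes"


def fake_reasoner(prompt: str) -> str:
    return f"ANSWER based on -> {prompt}"


pipeline = HybridPipeline(describe=fake_vision, reason=fake_reasoner)
result = pipeline.answer(b"\x89PNG", "What is shown?")
print(result)
```

The trade-off in this design is that the reasoning model never sees the pixels, so anything the vision model leaves out of its description is lost. Whether that is acceptable depends on the task.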

Does this work in New Zealand? Yes, the chat interface is accessible globally. No geo-restrictions observed. The API, when it launches, should follow the same pattern as DeepSeek’s text API, which is available worldwide.

🔍 THE BOTTOM LINE

DeepSeek didn’t announce Vision. They just shipped it. That’s the story — a Chinese AI lab closing the multimodal gap with US frontier models, for free, with zero fanfare, and the community noticed anyway. When the API lands and the weights drop, the open-source multimodal landscape changes. Until then, it’s a free tool worth testing and a signal worth watching.