Agent Setup · 11 min read · May 22, 2026

Chemistry MCP for Cursor, DeepSeek and Qwen — Open-Weights AI for Drug Discovery

Not every pharma team can run on Claude or GPT: IT security, data residency or budget force open-weight stacks. This guide shows how to attach CovaSyn's 130 chemical MCP tools to Cursor, DeepSeek, Qwen and any LiteLLM-routed open-weight model, with concrete configs, the cost math, and where open-weight models break down for chemistry.

Oliver Kraft

CovaSyn


Lead

Not every pharma team can route every workload through Anthropic, OpenAI, or Google. IT security policy, data residency rules, budget constraints or simple preference push teams to open-weight LLMs and editor-native agents like Cursor. The same CovaSyn MCP server works there too. This guide is the concrete setup for Cursor, DeepSeek, Qwen, and any LiteLLM-routed model. We also cover the honest limits: open-weight models score lower than frontier-class Claude or GPT, but the structural lift from a deterministic tool layer still applies.

What are chemical MCP tools?

Chemical MCP tools are deterministic cheminformatics functions — ADMET, ICH M7 triage, ICH Q1 stability, NMR / mass-spec interpretation, druglikeness, retrosynthesis hints, biologics developability — exposed to an LLM agent through the Model Context Protocol. Whatever LLM sits in the agent slot, the deterministic tools underneath return the same audit-logged, version-pinned output. The Model Context Protocol is the open standard Anthropic introduced in 2024 and that Anthropic, OpenAI, and the open-source ecosystem now jointly maintain.

For open-weight models the architectural payoff is largest: open-weight LLMs lag frontier-class models on raw chemistry knowledge, but they orchestrate tools well enough that the lift from the deterministic layer closes much of the gap.

Why Cursor and open-weight models matter for chemistry

Three reasons this stack is worth setting up, even if your team primarily uses Claude or ChatGPT.

First, IT security. The pharma IT-security wall against external cloud services is real, especially in the EU. Open-weight models on your own infrastructure, called through a self-hosted LiteLLM router and paired with a self-hosted CovaSyn container, form a fully on-prem stack. No data leaves your premises. For GxP-tight workflows, this is the only architecture some IT functions will sign off on.

Second, cost at scale. A DeepSeek or Qwen instance on a single GPU node costs a fraction of equivalent frontier-LLM API usage once volumes are high. A 100K-compound nightly screening campaign is feasible on this stack; on Opus it's a budget conversation.

Third, Cursor is where engineers already live. If your team is half bench chemists and half computational chemists, the bench chemists want a chat surface while the computational chemists want their IDE. Cursor with MCP tools lets them ask the same chemistry question in the same place where they write the analysis scripts, with the same audit log.

The lift from the deterministic tool layer is structural and applies to any model. On the ICLR 2026 MolecularIQ benchmark, frontier LLMs go from a 21–41 percent baseline to 85–92 percent with CovaSyn MCP. Open-weight models start below that 21–41 percent baseline and gain a comparable structural lift: open-weight + CovaSyn MCP typically lands in the 65–80 percent band, lower than Opus + CovaSyn but high enough for many production workflows.

How CovaSyn connects chemistry to Cursor and open-weight models

CovaSyn exposes 130 deterministic chemistry tools as an MCP server. The connection surface differs by client:

Cursor has a built-in MCP client. Add CovaSyn in the Cursor settings and the tools appear in the agent panel.
DeepSeek, Qwen and other open-weight models are served through a router or inference server (LiteLLM, vLLM, Ollama) that exposes an OpenAI-compatible chat-completions endpoint with function calling. The agent host lists the CovaSyn MCP tools, passes them to that endpoint as function definitions, and forwards the model's tool calls back to CovaSyn.
Continue.dev (the open-source Cursor alternative) speaks MCP natively.
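For the router path, the host's job on each turn is mostly message reshaping: the model emits an OpenAI-style tool call (arguments as a JSON string), the host turns it into an MCP `tools/call` JSON-RPC request, and the MCP result goes back to the model as a `tool` message. A minimal stdlib sketch of that bridge, assuming the OpenAI chat-completions tool-call shape and the MCP `tools/call` request/result shape; the helper names are hypothetical, and in practice the MCP SDK builds the JSON-RPC envelope for you:

```python
import json
from typing import Any


def openai_call_to_mcp_request(tool_call: dict[str, Any], request_id: int) -> dict[str, Any]:
    """Turn one OpenAI tool call into an MCP JSON-RPC 2.0 `tools/call` request."""
    fn = tool_call["function"]
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # OpenAI delivers arguments as a JSON string; MCP expects a JSON object
        "params": {"name": fn["name"], "arguments": json.loads(fn["arguments"] or "{}")},
    }


def mcp_result_to_tool_message(tool_call_id: str, result: dict[str, Any]) -> dict[str, Any]:
    """Flatten the text parts of an MCP tool result into an OpenAI `tool` message."""
    text = "\n".join(
        part["text"] for part in result.get("content", []) if part.get("type") == "text"
    )
    if result.get("isError"):
        # Surface tool failures explicitly so the model can correct its call
        text = f"Tool error: {text}"
    return {"role": "tool", "tool_call_id": tool_call_id, "content": text}
```

Non-text content parts (images, resources) are simply dropped here; a real host would decide how to pass them on.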

Coverage is identical to the Claude / Gemini / ChatGPT setups: 130 tools across drug discovery, biologics, process, analytical and regulatory domains.

A worked tool call

A concrete example: a computational chemist using Cursor wants to design a stability study for a development compound. The agent issues this tool call:

```json
{
  "tool": "covastab_design_study",
  "arguments": {
    "compound_smiles": "O=C(Nc1ccccc1Cl)c1cccc(C(F)(F)F)c1",
    "guideline": "ICH_Q1A_R2",
    "container_closure": "HDPE",
    "intended_zones": ["I", "II", "IVa"]
  }
}
```

Tool response:

```json
{
  "study_design": {
    "long_term": { "temp_c": 25, "rh_percent": 60, "duration_months": 24, "timepoints_months": [0, 3, 6, 9, 12, 18, 24] },
    "intermediate": { "temp_c": 30, "rh_percent": 65, "duration_months": 12, "timepoints_months": [0, 3, 6, 9, 12] },
    "accelerated": { "temp_c": 40, "rh_percent": 75, "duration_months": 6, "timepoints_months": [0, 1, 2, 3, 6] }
  },
  "rationale": "ICH Q1A(R2) sections 2.1.7.1 and 2.1.7.2 require all three conditions for Zones I/II/IVa drug substance",
  "audit_id": "cs-2026-05-22-1a2b3c4d",
  "engine_version": "covastab-v2.8.0"
}
```

Same answer next week, same auditor sign-off path. Open-weight model on the LLM side, deterministic tool layer underneath.
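One way to turn "same answer next week" into something an auditor can check is to fingerprint the tool output and compare re-runs made under the same `engine_version`. A minimal sketch, assuming the response shape of the example response above; the helper names, the choice of SHA-256 over canonical JSON, and the assumption that `audit_id` changes per call are illustrative, not a documented CovaSyn feature. The timepoint checks are internal consistency checks only, not an ICH compliance test:

```python
import hashlib
import json
from typing import Any


def design_fingerprint(response: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON of the study design plus the pinned engine version.

    audit_id is left out on the assumption that it is issued per call.
    """
    payload = {"engine_version": response["engine_version"], "study_design": response["study_design"]}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_timepoints(study_design: dict[str, Any]) -> list[str]:
    """Consistency checks per storage condition; returns a list of problems (empty if fine)."""
    problems = []
    for condition, spec in study_design.items():
        points = spec["timepoints_months"]
        if not points:
            problems.append(f"{condition}: no timepoints")
            continue
        if points[0] != 0:
            problems.append(f"{condition}: first timepoint is {points[0]}, expected 0")
        if points[-1] != spec["duration_months"]:
            problems.append(f"{condition}: last timepoint {points[-1]} != duration {spec['duration_months']}")
        if points != sorted(set(points)):
            problems.append(f"{condition}: timepoints not strictly increasing")
    return problems
```

Storing the fingerprint next to the `audit_id` lets a later re-run be compared in one string comparison.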

Cost: cents per compound on an open-weight stack

For a stability-plus-mutagenicity assessment on a 100-compound batch:

| Path | LLM cost / compound | CovaSyn cost / compound | Total / compound | Accuracy |
|---|---|---|---|---|
| DeepSeek-R1 (open-weight, self-host) | €0.005 | €0.04 | €0.045 | ~70% |
| Qwen-3 (open-weight, self-host) | €0.004 | €0.04 | €0.044 | ~68% |
| Claude Haiku 4.5 + CovaSyn | €0.001 | €0.04 | €0.041 | ~85% |
| Manual cheminformatician | €15 | €0 | €15 | high but slow |
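The batch totals follow directly from the per-compound figures in the table. A small sketch using `Decimal` to avoid float rounding; the inputs are the table's numbers, and "cost per correct assessment" (total divided by accuracy) is an illustrative derived metric, not one the benchmark reports:

```python
from decimal import ROUND_HALF_UP, Decimal

# (LLM EUR, CovaSyn EUR, approximate accuracy) per compound, copied from the table
PATHS = {
    "DeepSeek-R1 + CovaSyn": (Decimal("0.005"), Decimal("0.04"), Decimal("0.70")),
    "Qwen-3 + CovaSyn": (Decimal("0.004"), Decimal("0.04"), Decimal("0.68")),
    "Claude Haiku 4.5 + CovaSyn": (Decimal("0.001"), Decimal("0.04"), Decimal("0.85")),
}


def batch_cost(llm: Decimal, covasyn: Decimal, n_compounds: int) -> Decimal:
    """Total EUR for a batch: (LLM + CovaSyn cost per compound) x batch size."""
    return (llm + covasyn) * n_compounds


def cost_per_correct(llm: Decimal, covasyn: Decimal, accuracy: Decimal) -> Decimal:
    """EUR per correct assessment, rounded to a tenth of a cent (illustrative metric)."""
    return ((llm + covasyn) / accuracy).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


for name, (llm, cova, acc) in PATHS.items():
    print(f"{name}: EUR {batch_cost(llm, cova, 100)} per 100 compounds, "
          f"EUR {cost_per_correct(llm, cova, acc)} per correct assessment")
```

For the 100-compound batch this gives €4.50 (DeepSeek-R1), €4.40 (Qwen-3) and €4.10 (Haiku), against €1,500 for the manual route.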

Frontier models hold the accuracy edge. Open-weight stays in the game because the deterministic tool layer underneath does most of the heavy lifting; the LLM only needs to orchestrate.

Where this breaks down

Three honest limits.

Tool-calling reliability. Open-weight models still hallucinate function names and argument schemas more often than frontier models. Solid prompt scaffolding, JSON-schema validation in the MCP client, and strict retry-on-error policies are necessary.
Multi-step reasoning depth. Workflows that chain 6+ tool calls degrade noticeably on open-weight stacks compared to Opus. Break long workflows into shorter, well-scoped sub-agents.
Vision-plus-chemistry. Hand-drawn structures or spectrum images are still best handled by Claude or GPT. Use them for the vision step, then route the resulting structured payload to the open-weight stack for the bulk processing.
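The validation and retry advice above can be sketched with the standard library alone. This is a minimal sketch, not CovaSyn's client behaviour: it only checks that the tool exists, that the arguments parse as a JSON object, that required fields are present, and that no unknown fields appear when the schema sets `additionalProperties` to false. A production setup would use a full JSON Schema validator (for example the `jsonschema` package). `generate` is a hypothetical stand-in for "ask the model for a tool call, given the previous error":

```python
import json
from typing import Any, Callable, Optional


class ToolCallError(ValueError):
    """Raised when a model-proposed tool call does not match the advertised tools."""


def validate_tool_call(name: str, raw_args: str, schemas: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return parsed arguments, or raise ToolCallError with a message the model can act on."""
    if name not in schemas:
        raise ToolCallError(f"unknown tool '{name}'; valid tools: {sorted(schemas)}")
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ToolCallError("arguments must be a JSON object")
    schema = schemas[name]
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ToolCallError(f"missing required arguments: {missing}")
    if schema.get("additionalProperties") is False:
        extra = sorted(set(args) - set(schema.get("properties", {})))
        if extra:
            raise ToolCallError(f"unexpected arguments: {extra}")
    return args


def call_with_retry(
    generate: Callable[[Optional[str]], tuple[str, str]],
    schemas: dict[str, dict[str, Any]],
    max_attempts: int = 3,
) -> tuple[str, dict[str, Any]]:
    """Ask for a tool call, validate it, and feed the error text back on failure."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        name, raw_args = generate(feedback)
        try:
            return name, validate_tool_call(name, raw_args, schemas)
        except ToolCallError as exc:
            feedback = str(exc)
    raise ToolCallError(f"no valid tool call after {max_attempts} attempts; last error: {feedback}")
```

Feeding the concrete error back, rather than just re-asking, is what makes retries useful for models that misspell tool names or drop required fields.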

Getting started — Cursor

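Open Cursor Settings → MCP → Add Server. This opens the global config file, ~/.cursor/mcp.json (a project-level .cursor/mcp.json works too). Add the CovaSyn entry: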

```json
{
  "mcpServers": {
    "covasyn": {
      "command": "npx",
      "args": ["-y", "@covasyn/mcp-client"],
      "env": {
        "COVASYN_API_KEY": "sk-cova-…"
      }
    }
  }
}
```

Restart Cursor. The 130 CovaSyn tools appear in the agent panel.
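If the config file already lists other MCP servers, merging the entry is safer than overwriting the file. A minimal sketch, assuming Cursor's standard mcp.json layout (a top-level `mcpServers` object keyed by server name); `add_mcp_server` is a hypothetical helper, not a Cursor or CovaSyn tool:

```python
import json
from pathlib import Path
from typing import Any


def add_mcp_server(config_path: Path, name: str, entry: dict[str, Any]) -> dict[str, Any]:
    """Insert or replace one server under `mcpServers`, keeping every other key intact."""
    config: dict[str, Any] = {}
    if config_path.exists():
        # Treat an empty file as an empty config
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (writes to your real global Cursor config):
# add_mcp_server(Path.home() / ".cursor" / "mcp.json", "covasyn",
#                {"command": "npx", "args": ["-y", "@covasyn/mcp-client"],
#                 "env": {"COVASYN_API_KEY": "sk-cova-…"}})
```

Running it twice with the same name replaces the CovaSyn entry rather than duplicating it.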

Getting started — DeepSeek, Qwen via LiteLLM

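LiteLLM is a widely used router for putting open-weight and hosted models behind one OpenAI-compatible API. Start the proxy with litellm --config litellm_config.yaml (it listens on port 4000 by default); standard OpenAI Python SDK code then works against it.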

```yaml
# litellm_config.yaml
model_list:
  - model_name: deepseek-r1
    litellm_params:
      model: deepseek/deepseek-reasoner   # DeepSeek's API name for R1
      api_key: os.environ/DEEPSEEK_KEY
  - model_name: qwen-3
    litellm_params:
      model: openrouter/qwen/qwen-3       # check OpenRouter's catalogue for the exact Qwen3 model ID
      api_key: os.environ/OPENROUTER_KEY
```

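```python
import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from openai import AsyncOpenAI

# LiteLLM proxy on its default port; it accepts any key unless a master key is configured
client = AsyncOpenAI(base_url="http://localhost:4000", api_key=os.environ.get("LITELLM_MASTER_KEY", "sk-local"))
# Launch the CovaSyn MCP client over stdio; export COVASYN_API_KEY before running
server = StdioServerParameters(command="npx", args=["-y", "@covasyn/mcp-client"], env=dict(os.environ))

async def main():
    async with stdio_client(server) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        listed = await session.list_tools()
        # MCP tool definitions -> OpenAI function-calling schema
        tools = [{"type": "function", "function": {"name": t.name, "description": t.description or "",
                                                    "parameters": t.inputSchema}} for t in listed.tools]
        response = await client.chat.completions.create(
            model="deepseek-r1", tools=tools,  # model_name alias from litellm_config.yaml
            messages=[{"role": "user", "content": "ICH Q1 design for compound X"}],
        )
        print(response.choices[0].message.tool_calls)

asyncio.run(main())
```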
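The snippet above stops at the model's first response. For the model to actually run chemistry tools, the host loops: execute each requested tool call, append the results as `tool` messages, and ask again until the model answers in plain text or a turn budget runs out (a hard cap that matches the advice to keep chains short). A minimal sketch with the model and the MCP call injected as plain callables, so it runs without a server; `chat` and `call_tool` are hypothetical stand-ins for `client.chat.completions.create` and the MCP session's `call_tool`, and messages are plain dicts in the OpenAI chat format:

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_agent(
    chat: Callable[[list[Message]], Message],
    call_tool: Callable[[str, dict[str, Any]], str],
    user_prompt: str,
    max_turns: int = 6,
) -> str:
    """Alternate model turns and tool executions until the model replies without tool calls."""
    messages: list[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = chat(messages)  # assistant message as a dict in OpenAI chat format
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        for tc in tool_calls:
            args = json.loads(tc["function"]["arguments"] or "{}")
            result = call_tool(tc["function"]["name"], args)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    raise RuntimeError(f"no final answer within {max_turns} turns; split the workflow into smaller sub-agents")
```

With the real SDK, `chat` would convert the returned message object to a dict (for example via its `model_dump()` method), and `call_tool` would join the text parts of the MCP result.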

Getting started — Continue.dev

For VS Code and JetBrains users who want an open-source Cursor alternative.

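```json
{
  "mcpServers": {
    "covasyn": {
      "command": "npx",
      "args": ["-y", "@covasyn/mcp-client"],
      "env": { "COVASYN_API_KEY": "sk-cova-…" }
    }
  }
}
```

Save this as a JSON file (for example covasyn.json) in the .continue/mcpServers/ folder of your workspace; current Continue releases pick up Cursor- and Claude-Desktop-style JSON configs from there. Reload the window. Done.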

Getting started — Self-hosted CovaSyn for fully on-prem stacks

This track is for IT-security contexts where neither the LLM nor the chemistry layer may leave the perimeter. CovaSyn ships as a signed OCI container with an SBOM. Three deployment shapes:

Docker Compose for VM-based deployments
Helm chart for Kubernetes / OpenShift clusters
Air-gap bundle for environments without internet

Pair with a self-hosted DeepSeek or Qwen on a single A100 / H100 node, served via LiteLLM. End-to-end on-prem chemistry agent stack, GxP-aligned, no external dependency at runtime. Discovery call required for the self-hosted track — details on the MCP product page.

FAQ

Can open-weight models hit the same 85+ percent accuracy as Claude + CovaSyn?

Not quite. Open-weight + CovaSyn typically lands in the 65–80 percent band on the ICLR benchmark, versus 85–92 percent for frontier + CovaSyn. The structural lift is the same; the absolute number is lower because open-weight tool-calling is less reliable. For many production workflows the open-weight number is still acceptable, especially with retry-on-error policies.

Is CovaSyn open-source?

The MCP client wrapper and the integration snippets are MIT-licensed at github.com/oliverkraft93-ops/covasyn-mcp-examples. The MCP server itself is closed-source and distributed as a signed container for self-hosted deployments. Self-hosted CovaSyn plus open-weight models keeps every component and all data inside your perimeter, without sacrificing the deterministic tool layer.

Does CovaSyn work with Llama, Mistral, Gemma?

Yes, provided the model handles function calling reliably (most recent open-weight instruct models support it) and sits behind an OpenAI-compatible router or server such as LiteLLM, vLLM, or Ollama. Smaller models (≤7B) struggle with multi-step tool plans; pick a 30B-class model or larger.

What does the cost look like for a fully on-prem stack?

Hardware-dependent. A single A100 80GB serving a 30B-class open-weight model handles roughly 200K tool-orchestration calls per day. CovaSyn site licenses are quoted per engagement. Once you own the hardware, the variable per-call cost drops to near zero.
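Once the hardware is owned, the per-call figure is mostly the fixed monthly cost spread over the calls you actually make, so utilization matters as much as peak throughput. A sketch of that amortization, where the monthly fixed cost (depreciation, power, hosting) is a hypothetical input you supply, since the article quotes no hardware prices, and the 200K calls/day figure is the rough throughput above:

```python
def per_call_cost(monthly_fixed_eur: float, calls_per_day: float,
                  utilization: float = 1.0, days: int = 30) -> float:
    """Amortized EUR per orchestration call on owned hardware.

    monthly_fixed_eur: your own depreciation + power + hosting figure per month.
    utilization: fraction of the quoted daily throughput actually used.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    calls_per_month = calls_per_day * utilization * days
    return monthly_fixed_eur / calls_per_month


# 200K calls/day corresponds to roughly 2.3 calls per second of sustained throughput
print(round(200_000 / 86_400, 2))
```

Halving utilization doubles the amortized cost per call, which is why the near-zero figure only holds for steadily loaded nodes.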

Sources

ICLR 2026 MolecularIQ benchmark: covasyn.com/en/benchmark.
AI for chemistry pillar: covasyn.com/en/ai-for-chemistry.
Comparison of chemistry MCP servers: The 5 leading chemistry MCP servers.
Companion guides for closed-model stacks: Claude, Gemini, ChatGPT.

CovaSyn MCP

Scientific tools in your AI workflow.

130+ functions for pharma, biotech and chemistry. Free tier instantly active.
