Technical Architecture and Model Selection

A demo running on a free API key looks the same as a production system in a sales deck. These questions force the partner to show the engineering decisions underneath.

1. Which deployment model do you recommend for our data sensitivity: direct API, AWS Bedrock, or Azure, and why?

The answer should map directly to your data classification, not default to whatever the consultant used last. An evasive answer treats every client the same regardless of what the data is.

2. How do you decide between RAG and fine-tuning for a given use case?

RAG and fine-tuning solve different problems, and a partner who reaches for fine-tuning by default usually hasn't priced out the retraining cost. A good answer names the specific signal that tips the decision, such as whether the knowledge changes weekly or stays fixed for years.

3. Which foundation model fits our workload, and what tradeoffs come with picking it over alternatives?

Cost per token, context window size, and latency all pull in different directions, and the right model changes by task. Watch for a partner who recommends one model across every use case in the proposal.

4. What's your approach to prompt versioning and regression testing when a model provider ships an update?

Model providers push updates on their own schedule, and a prompt that worked last month can silently degrade. A partner with a real practice here can describe a test suite that runs before any model swap goes live.
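As a rough illustration of what such a test suite can look like, here is a minimal sketch. The case format, the two stand-in model functions, and the substring check are all assumptions made for this example; a real suite would call the provider's API and would likely use richer scoring than substring matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RegressionCase:
    name: str
    prompt: str
    must_contain: list[str]  # substrings the output has to include


def run_regression(model: Callable[[str], str], cases: list[RegressionCase]) -> list[str]:
    """Return the names of cases whose output is missing a required substring."""
    failures = []
    for case in cases:
        output = model(case.prompt).lower()
        if not all(token.lower() in output for token in case.must_contain):
            failures.append(case.name)
    return failures


# Hypothetical stand-ins for the current and candidate model versions.
def current_model(prompt: str) -> str:
    return "Refunds are accepted within 30 days of purchase."


def candidate_model(prompt: str) -> str:
    return "Refunds may be available. Please contact support."


CASES = [
    RegressionCase("refund_window", "What is the refund window?", ["30 days"]),
]


def safe_to_promote(model: Callable[[str], str]) -> bool:
    """Gate a model swap on an empty failure list."""
    return not run_regression(model, CASES)
```

In this sketch the candidate model drops the "30 days" detail, so `safe_to_promote(candidate_model)` returns False and the swap would be blocked.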

5. How do you handle model hallucination in a workflow that touches customer-facing output?

Every model hallucinates under some conditions, so the question isn't whether it happens but what catches it before a customer sees it. A weak answer waves at "prompt engineering" with no verification step behind it.

6. What latency and throughput numbers should we expect at production volume, and how were they measured?

Demo latency and production latency under concurrent load are different numbers entirely. A credible partner has load-tested figures to cite, not a guess based on a single test call.
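To make "how were they measured" concrete, the sketch below times calls under concurrent load and reports percentiles instead of a single number. The `fake_endpoint` function is a placeholder assumption that only sleeps; a real measurement would call the deployed system with realistic payloads and far more requests.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def timed_call(call, payload):
    """Return the wall-clock seconds one call takes."""
    start = time.perf_counter()
    call(payload)
    return time.perf_counter() - start


def load_test(call, payloads, concurrency):
    """Run every payload through `call` with a fixed number of concurrent workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda p: timed_call(call, p), payloads))
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "max": max(latencies),
    }


def fake_endpoint(payload):
    """Placeholder for the real model call (assumption for this sketch)."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(load_test(fake_endpoint, ["sample prompt"] * 50, concurrency=10))
```

A partner with real figures should be able to state the concurrency level, the request count, and which percentile they are quoting.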

7. How do you structure context windows and retrieval for documents that exceed a single prompt?

Chunking strategy determines whether retrieval actually surfaces the right passage or buries it in noise. Ask for the chunk size and overlap they use and why, since "we let the framework handle it" means nobody tuned it.
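For reference, chunk size and overlap are just two parameters of a sliding window. The sketch below splits on whitespace-separated words for simplicity, which is an assumption; production pipelines usually count model tokens and often respect sentence or section boundaries.

```python
def chunk_words(words, chunk_size, overlap):
    """Split a list of words into windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    for chunk in chunk_words(text.split(), chunk_size=4, overlap=1):
        print(" ".join(chunk))
```

The overlap exists so that a passage straddling a chunk boundary still appears whole in at least one chunk. A partner who tuned these values can say which documents drove the choice.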

Security and Data Handling

This is where most AI vendors get vague, because the honest answer often requires admitting a limitation. Push past the marketing language and ask for the mechanism.

8. Where does our data live during inference, and can you guarantee it never trains a third-party model?

Some API tiers retain prompts for model training by default unless you opt out in the contract. A trustworthy partner points to the specific provider terms or VPC configuration that prevents it, not a verbal assurance.

9. What's your approach to PII redaction before data reaches the model?

Redaction has to happen before the prompt leaves your network, not after the fact. An evasive answer describes a policy with no pipeline step that actually enforces it.
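A pipeline step that enforces redaction can be as simple in shape as the sketch below. The three regular expressions are illustrative assumptions covering only email addresses and US-style phone and Social Security numbers; real deployments typically combine patterns like these with a dedicated PII detection service, since regexes alone miss names and addresses.

```python
import re

# Illustrative patterns only; not an exhaustive PII definition.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each match with a placeholder and return the text plus per-label counts."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, found = pattern.subn(f"[{label}]", text)
        counts[label] = found
    return text, counts


if __name__ == "__main__":
    raw = "Email jane.doe@example.com or call 555-123-4567. SSN 123-45-6789."
    clean, counts = redact(raw)
    print(clean)
    print(counts)
```

The point to verify with a partner is where this step runs: it has to sit inside your network boundary, ahead of the outbound API call.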

10. How do you isolate our tenant from other clients' data and prompts?

A partner running multiple clients through a shared vector database or shared API key has a tenant isolation problem waiting to surface. Ask for the architecture diagram, not a description.

11. What encryption applies to data at rest and in transit through your pipeline?

This should be a one-line answer naming the standard, such as TLS 1.2 or higher in transit and AES-256 at rest. If the answer turns into a longer explanation, the encryption probably isn't consistent across every hop.

12. Who has access to our prompts and outputs, and how is that access logged?

Engineers debugging a production issue often need to see real prompts, and that access needs an audit trail. A good answer names the logging system; an evasive one says "only the team that needs it" without defining who that is.

13. What happens to our data if we terminate the engagement?

Data deletion timelines and confirmation processes belong in the contract, not in a verbal promise made during the sales call. Ask to see the offboarding clause before you sign.

14. How do you test for prompt injection and jailbreak attempts before launch?

Any system that accepts user input is a target for prompt injection, and the test has to happen before launch, not after an incident. A partner who hasn't thought about this hasn't shipped a customer-facing AI system before.

Governance and Compliance

AI regulation moved from theoretical to enforceable fast, and a partner who treats governance as paperwork after the fact will leave you exposed when an auditor asks for documentation.

15. Do you have a documented framework for EU AI Act or NIST AI RMF alignment, or is this your first regulated client?

A documented framework means you can request it and review it before the project starts. If they're building the framework on your engagement, you're paying tuition for their learning curve.

16. How do you classify AI systems by risk tier, and what controls apply at each tier?

A high-risk use case, such as one that affects hiring or credit decisions, needs different controls than an internal drafting tool. The answer should distinguish between tiers with specific examples from past work.

17. Can you produce an audit trail showing why a model made a specific decision?

Regulators and internal auditors both eventually ask this question, so the system needs to log the inputs, the retrieved context, and the model version behind every output. A partner who says "the model decided" without that trail has built something you can't defend.
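One possible shape for that trail is a structured record written for every output. The field names below and the added SHA-256 digest are assumptions for illustration, not a regulatory format; the point is that inputs, retrieved context, and model version are captured together at the moment of the call.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    model_version: str
    prompt: str
    retrieved_context: tuple
    output: str
    logged_at: str


def build_record(request_id, model_version, prompt, retrieved_context, output):
    """Capture everything needed to explain one output later."""
    return AuditRecord(
        request_id=request_id,
        model_version=model_version,
        prompt=prompt,
        retrieved_context=tuple(retrieved_context),
        output=output,
        logged_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record):
    """Serialize the record as one JSON line with a digest of its contents."""
    body = asdict(record)
    payload = json.dumps(body, sort_keys=True)
    body["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    record = build_record(
        "req-001", "example-model-v1", "Is this applicant eligible?",
        ["Policy section 4.2 text"], "Eligible under section 4.2.",
    )
    print(to_log_line(record))
```

The digest lets a later reviewer detect whether a stored record was altered, assuming the log lines themselves are kept in append-only storage.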

18. What's your process for model cards and documentation that a regulator could review?

Model cards document intended use, known limitations, and testing results in a format an outside reviewer can read without a technical briefing. If none exist for past projects, none will exist for yours either.

19. How do you handle bias testing across protected categories before deployment?

Bias testing requires a defined test set and a defined threshold for acceptable variance, not a one-time eyeball check. Ask what categories they test and what they do when a model fails the test.
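A defined threshold can be expressed as a simple pass/fail check. The sketch below compares favorable-outcome rates across groups and fails when the largest gap exceeds a limit. The group labels, the sample data, and the 0.1 threshold are hypothetical, and rate gap is only one of several fairness metrics a partner might use.

```python
def selection_rates(outcomes):
    """Map each group to the share of favorable (True) outcomes in its test set."""
    return {group: sum(results) / len(results) for group, results in outcomes.items()}


def max_rate_gap(rates):
    """Largest difference in favorable-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values())


def passes_bias_check(outcomes, max_gap):
    """Pass only if no two groups differ by more than max_gap."""
    return max_rate_gap(selection_rates(outcomes)) <= max_gap


if __name__ == "__main__":
    # Hypothetical model decisions on a fixed test set, split by group.
    outcomes = {
        "group_a": [True] * 8 + [False] * 2,
        "group_b": [True] * 6 + [False] * 4,
    }
    print(selection_rates(outcomes))
    print(passes_bias_check(outcomes, max_gap=0.1))
```

Whatever metric is used, the threshold and the test set should be fixed before the model is evaluated, so a failure cannot be argued away afterward.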

20. Who signs off on a new AI use case before it touches production data?

A named approval step, whether it's a governance committee or a designated risk owner, shows the partner builds review into the process. "We move fast and fix issues later" is a fine pitch for a prototype and a bad answer for anything touching real customer data.

Integration and Existing Systems

A standalone chatbot demo is easy. Wiring an AI workflow into a system your business already depends on, without breaking it, is the actual job.

21. Have you connected an LLM to a Salesforce or ERP instance in production, or only built standalone demos?

Ask for the name of the system and a description of the data flow, not just a yes. A partner who's only built demos will struggle the moment your CRM's field validation rejects a malformed write.

22. How do you handle authentication between the AI layer and our existing identity provider?

The AI layer shouldn't carry its own separate set of credentials floating outside your identity provider. A good answer references your existing SSO or OAuth setup directly.

23. What happens when the AI workflow needs to write back to a system of record?

Writing back introduces a new failure mode: a confident, wrong AI output corrupting real data. The answer should describe a validation or human-approval step before any write commits.
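A validation gate in front of the write can be sketched as follows. The field name, the allowed values, and the 0.9 confidence cutoff are assumptions for this example; the actual rules would come from your system of record and your own risk tolerance.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    NEEDS_REVIEW = "needs_review"
    REJECT = "reject"


@dataclass
class ProposedUpdate:
    record_id: str
    field_name: str
    value: str
    confidence: float  # score attached by the AI workflow (assumed to exist)


# Hypothetical picklist values the system of record accepts.
ALLOWED_VALUES = {"stage": {"Prospecting", "Negotiation", "Closed Won", "Closed Lost"}}


def gate(update, allowed_values, min_confidence=0.9):
    """Decide whether a proposed write commits, waits for a human, or is dropped."""
    if update.value not in allowed_values.get(update.field_name, set()):
        return Decision.REJECT  # would fail field validation anyway
    if update.confidence < min_confidence:
        return Decision.NEEDS_REVIEW  # route to a human approver
    return Decision.COMMIT


if __name__ == "__main__":
    update = ProposedUpdate("opp-42", "stage", "Closed Won", confidence=0.72)
    print(gate(update, ALLOWED_VALUES))
```

Only updates that pass both checks reach the system of record without a person looking at them first.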

24. How do you orchestrate multiple agents without one agent's error cascading into another's task?

Multi-agent systems fail in ways single-model systems don't, because a bad output from one agent becomes bad input to the next. Ask how they isolate failures and where they put checkpoints between agent handoffs.

25. What's your rollback plan if an integration breaks a downstream system?

Every integration ships with risk, and the rollback plan needs to exist before go-live, not get improvised during an outage. A partner without one is asking you to be the test environment.

26. How do you handle version drift between our API and the connectors you build?

Your internal systems will change after the project ships, and a connector built without monitoring for upstream API changes breaks quietly. Ask what alerts them when your endpoint changes shape.
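One way to catch an endpoint changing shape is a contract check that runs on a schedule against a sample response. The expected fields below are hypothetical, and a real connector would more likely validate against a published schema such as OpenAPI; this sketch only shows the idea.

```python
# Hypothetical contract the connector was built against.
EXPECTED_FIELDS = {"id": str, "amount": float, "status": str}


def detect_drift(expected, payload):
    """Compare a sample response with the expected field names and types."""
    issues = []
    for name, expected_type in expected.items():
        if name not in payload:
            issues.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            issues.append(f"type changed: {name} expected {expected_type.__name__}, got {actual}")
    for name in sorted(payload.keys() - expected.keys()):
        issues.append(f"unexpected field: {name}")
    return issues


if __name__ == "__main__":
    sample = {"id": "a1", "amount": "12.50", "state": "open"}
    for issue in detect_drift(EXPECTED_FIELDS, sample):
        print(issue)
```

Any non-empty result would feed an alert, so the connector's owner hears about the change before your users do.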

Pricing, Delivery, and SLAs

Pilots are cheap to sell and easy to demo. The real cost and the real commitment show up at the production handoff.

27. What does the pilot-to-production handoff look like, and what are the acceptance criteria?

Acceptance criteria written down before the pilot starts prevent the goalposts from moving once you're invested. If the criteria are vague or undefined, the pilot can drag on indefinitely without ever reaching a decision point.

28. How is pricing structured: per token, per seat, fixed fee, or some mix?

Token-based pricing scales with usage in ways that are hard to forecast without a usage model. A good partner walks you through a sample month's cost at your expected volume instead of quoting an abstract rate.
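A sample month's cost is straightforward arithmetic once the volume assumptions are written down. Every number in the sketch below (request volume, tokens per request, and per-million-token prices) is a made-up placeholder; substitute your own forecast and the provider's current published rates.

```python
def monthly_token_cost(requests_per_day, input_tokens, output_tokens,
                       input_price_per_million, output_price_per_million, days=30):
    """Estimate one month of model API spend from average per-request token counts."""
    total_input = requests_per_day * input_tokens * days
    total_output = requests_per_day * output_tokens * days
    input_cost = total_input / 1_000_000 * input_price_per_million
    output_cost = total_output / 1_000_000 * output_price_per_million
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


if __name__ == "__main__":
    # Placeholder volume and prices, not any provider's actual rates.
    estimate = monthly_token_cost(
        requests_per_day=10_000,
        input_tokens=1_500,
        output_tokens=300,
        input_price_per_million=3.0,
        output_price_per_million=15.0,
    )
    print(estimate)
```

With these placeholder inputs, input and output each come to 1,350 and the total to 2,700 per month in whatever currency the prices are quoted in. Output tokens are a fifth of the volume here but half of the bill, which is the kind of detail an abstract rate hides.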

29. What's included in the quoted price, and what gets billed separately?

Model API costs, hosting, monitoring tools, and support hours often sit outside the headline number. Get the full list in writing before you compare quotes across partners.

30. What SLA applies once we're in production, and what's the penalty if you miss it?

An SLA without a consequence attached is a suggestion, not a commitment. Ask for the specific response time and resolution time, and what credit or remedy applies if they're missed.

31. How long does a typical pilot run before we decide whether to scale it?

A partner with real delivery experience can cite a typical timeline from past engagements. "However long it takes" signals they haven't run enough of these to know.

32. What does support look like after go-live, and is it the same team that built it?

Handing a finished system to a separate support team that didn't build it usually means slower diagnosis when something breaks. Ask whether the engineer who wrote the integration is still reachable after launch.

33. How do you handle scope changes once requirements shift mid-project?

Requirements shift on every real project once stakeholders see the system working. The answer should describe a change-request process with transparent pricing, not an open-ended "we'll figure it out."

Team and Accountability

Anyone can write a confident pitch deck. These questions find out who actually does the work and what happens when something goes wrong.

34. Who is the named engineer on our account, and what happens if they leave mid-project?

A named person with documented continuity plans beats a vague reference to "our team." If they can't name someone, no one is actually assigned yet.

35. How many other clients is this team supporting at the same time?

An engineer spread across eight concurrent accounts won't have the bandwidth to debug your production incident at 2am. Ask for the number directly.

36. What's your escalation path when something breaks at 2am?

A named on-call rotation with a phone number beats "email support and we'll get back to you." Find out what actually happens between the page going off and a human responding.

37. Can we speak with a reference client running a similar workload in production today?

A reference running in production today, not a logo on a website, tells you whether the system holds up past launch week. Hesitation here is itself an answer.

38. How do you transfer knowledge to our internal team so we aren't locked into your support contract?

Documentation, code walkthroughs, and admin training during the engagement determine whether your team can operate the system independently later. A partner who avoids this question profits from your dependency.

39. What happens to the code, prompts, and documentation if we part ways?

Ownership of prompts, fine-tuned weights, and integration code needs to sit in the contract in plain language. Get this settled before the relationship starts, not after a disagreement.

40. Who owns the outcome if the AI system produces a wrong answer that costs us money?

If the contract places all liability for a bad output on your side, the partner hasn't priced in their own confidence in the system. A partner willing to share accountability has tested the system enough to stand behind it.

How Mindcat Answers These Questions

Ask Mindcat any question on this list and the answer points to a specific deployment option, not a hypothetical. Claude runs through direct API, AWS Bedrock, or Azure depending on where your data needs to stay. Gemini runs through Google Workspace or Vertex AI depending on whether the use case sits inside your existing Workspace tenant or needs custom model tuning. AWS Bedrock handles workloads that need VPC isolation for regulated industries. MuleSoft Agent Fabric governs multi-agent workflows that need an audit trail across every agent handoff, not just the final output.

On governance, our AI Governance service builds the documented framework this list asks about directly: EU AI Act risk classification, NIST AI RMF controls, ISO 42001 management system alignment, and DIFC Regulation 10 for clients operating in the Dubai International Financial Centre. None of it is theoretical. It's the same framework we'd hand you to review before a project starts, because a partner who can't survive their own question list shouldn't be answering yours.
