How to conduct performance reviews when AI did part of the work

Image Credits: Unsplash

The awkward truth in many startups is that managers still review people as if the last year looked like the year before. AI produced drafts, wrote code scaffolds, summarized data, generated test cases, and even suggested design variations. Some employees disclosed this and others did not. If your review process assumes every artifact maps cleanly to a person, you will reward the best presenters rather than the best decision makers. The fix is not harsher rules. The fix is a clearer design for how work is credited, evidenced, and coached when intelligent tools sit in the loop.

The first design choice is to decide what you are actually rating. If the goal is to reward impact and repeatable judgment, you must separate outcome ownership from content production. A product spec can be ninety percent AI words and still represent a strong human decision. Likewise, a beautiful slide deck can hide weak thinking if it was pasted from a tool with no critique. Your system should measure how people frame problems, constrain tools, validate outputs, and integrate the result into team delivery. That requires explicit language in the rubric, and it requires every manager to stop equating word count with contribution.

Now set a calm expectation of disclosure. Employees fear that declaring tool use will get them labeled as lazy. Managers fear that not declaring will create unfair advantage. Solve both fears with one standard. Require a short line of attribution on major artifacts that answers three questions in plain language. What part was tool assisted. What risks were considered. What verification was performed. Keep it short so it is practical. Make it routine so it does not feel like a confession. When disclosure is normalized, coaching becomes easier because the conversation pivots from suspicion to technique.
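To make the three questions concrete, here is a minimal sketch of how the attribution line could be captured and rendered as a short footer on a deliverable. The field names and the Attribution class are illustrative assumptions, not a prescribed format.

```python
# A minimal sketch of the three-question attribution line, assuming it is
# attached as a short footer to major artifacts. Field names are illustrative.
from dataclasses import dataclass


@dataclass
class Attribution:
    tool_assisted: str   # what part was tool assisted
    risks: str           # what risks were considered
    verification: str    # what verification was performed

    def as_footer(self) -> str:
        return (
            f"Tool assisted: {self.tool_assisted}. "
            f"Risks considered: {self.risks}. "
            f"Verification: {self.verification}."
        )


print(Attribution(
    tool_assisted="first draft of the spec and the generated test case list",
    risks="outdated pricing data, tone drift in customer-facing copy",
    verification="fact checked pricing against the Q3 sheet, peer review by two engineers",
).as_footer())
```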

Once disclosure exists, you can finally rewrite the rubric. A workable rubric for AI assisted performance reviews has four anchors. The first is problem framing. Did the person define constraints, success criteria, and acceptance tests that made the tool useful. The second is prompt and guardrail quality. Did they structure inputs that reflect domain knowledge, ethics, and company policy. The third is verification. Did they test, fact check, peer review, or run code against real data rather than trusting a glossy draft. The fourth is integration. Did they land the final outcome inside the team system with the right documentation, handoffs, and learning captured. If your current rubric talks about communication, velocity, and quality without these anchors, you are grading the surface, not the system.
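One way to make the four anchors usable in calibration is to encode them as a small scoring template. The sketch below assumes a simple one-to-four scale per anchor and a plain average; both are assumptions for illustration, not a standard.

```python
# A sketch of the four-anchor rubric as a scoring template, assuming a
# simple 1-4 scale per anchor. The guiding questions mirror the anchors above.
ANCHORS = {
    "problem_framing": "Did the person define constraints, success criteria, and acceptance tests?",
    "prompt_and_guardrails": "Did inputs reflect domain knowledge, ethics, and company policy?",
    "verification": "Were outputs tested, fact checked, peer reviewed, or run against real data?",
    "integration": "Did the outcome land in the team system with documentation, handoffs, and captured learning?",
}


def score_review(scores: dict[str, int]) -> float:
    """Average the per-anchor scores; a missing anchor counts as zero to flag the gap."""
    return sum(scores.get(name, 0) for name in ANCHORS) / len(ANCHORS)


print(score_review({"problem_framing": 3, "prompt_and_guardrails": 2,
                    "verification": 4, "integration": 3}))
```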

Calibration breaks when managers guess contribution from the artifact alone. Fix this with a light attribution map that travels with key deliverables during the review window. The map is a simple table that lists the deliverable, human owner of the outcome, collaborators, tools invoked, and verification steps. If your teams already track tickets or pull requests, fold the map into that workflow rather than inventing a new tool. The point is not surveillance. The point is preserving context so that reviews do not rely on memory or storytelling talent.
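For teams that want a starting point, here is a minimal sketch of one row of the attribution map, written out as CSV so it can be folded into whatever tracker already exists. The column names and the sample row are illustrative, not a required schema.

```python
# A minimal sketch of the attribution map as rows folded into existing tracking,
# assuming plain CSV is enough. Column names are illustrative, not a required schema.
import csv
import io

COLUMNS = ["deliverable", "outcome_owner", "collaborators", "tools_invoked", "verification_steps"]

rows = [
    {
        "deliverable": "Q4 pricing spec",
        "outcome_owner": "A. Tan",
        "collaborators": "B. Lee (data), C. Ng (design)",
        "tools_invoked": "LLM draft of spec sections 2-4",
        "verification_steps": "pricing checked against finance sheet; legal review of terms",
    },
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```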

Evidence should change too. In a pre AI world, an elegant spec could stand alone as proof of skill. In an AI world, proof looks like iterations and decisions. Keep snapshots of key drafts, the prompt evolution where it matters, the rejection notes that explain why a generated idea was dropped, and the checks that caught an error before it shipped. Do not hoard every prompt. Curate the moments that show thinking. You are training evaluators to reward discernment, not volume.

Compensation decisions become cleaner when you draw a bright line between outcome ownership and production support. Pay bands rise on the ability to frame, decide, and integrate. Tool operation is table stakes at most levels. Rewarding typing speed or cosmetic polish is a mistake. The leader who selected the right approach, decomposed the problem, and orchestrated design, data, and engineering to deliver the outcome should see that reflected in their rating. A junior who used a model to accelerate code but relied on others for debugging can still earn a solid rating if they disclosed their method and improved testing discipline through the cycle. The conversation should always end with what the person can do next cycle to move up one level of judgment.

Fairness requires boundaries. Prohibit pasting proprietary or sensitive data into public tools unless your security team has enabled a safe path. Prohibit using generated text that impersonates a colleague or misrepresents a credential. Require citations when external facts are included. When violations occur, treat them as process breaches rather than quiet marks against someone's performance record. Educate first, then escalate if intent or repeated disregard is clear. Your goal is to keep the learning curve safe without turning your culture into an audit machine.

Manager readiness is the quiet risk in this shift. Many managers do not feel qualified to critique prompts or evaluate verification. Give them a short, repeatable reviewer protocol. Ask them to start each review with two questions. Where did intelligent tools accelerate or shape this work. How did you ensure the output was correct, safe, and on brand. Teach them to request one concrete example where a generated suggestion was rejected and why. Teach them to scan for over reliance. If a person cannot explain failure modes or alternative approaches, they are outsourcing thinking. That is a coaching conversation before it becomes a rating penalty.
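If it helps managers run the protocol consistently, the opening questions and follow-ups can live as a simple reusable checklist, as in this sketch. The question wording comes from the paragraph above; bundling it this way, and the print_protocol helper, are assumptions for illustration.

```python
# A sketch of the reviewer protocol as a reusable checklist. The questions are
# the ones described above; keeping them in one list is just one option.
REVIEWER_PROTOCOL = [
    "Where did intelligent tools accelerate or shape this work?",
    "How did you ensure the output was correct, safe, and on brand?",
    "Give one concrete example where a generated suggestion was rejected, and why.",
    "What failure modes or alternative approaches did you consider?",
]


def print_protocol(deliverable: str) -> None:
    """Print the checklist for a given deliverable so it can open the review conversation."""
    print(f"Review protocol for: {deliverable}")
    for i, question in enumerate(REVIEWER_PROTOCOL, start=1):
        print(f"  {i}. {question}")


print_protocol("Q4 pricing spec")
```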

Peer review gains new weight. Encourage teams to run small critique sessions where a contributor walks through the prompt strategy, the failure cases discovered, and the checks performed. Keep the tone curious and technical rather than moral. The question in the room is not whether AI is allowed. The question is whether the way it was used made the work better and safer. Over time, these sessions become the best training ground for both juniors and managers because they surface real examples and remove the mystery around good practice.

Role clarity matters more than ever. If you do not define what the owner is responsible for and what the reviewer is responsible for, the two roles will drift into opposition. Owners will hide tool use. Reviewers will hunt for it. Write it down. Owners must disclose tool involvement, capture key decision points, and deliver outcomes that meet criteria. Reviewers must evaluate judgment, not just prose, and must separate their personal tool preferences from the rubric. Operations must provide a safe path for model use and a light way to retain evidence. Leadership must protect people who disclose mistakes early and must never reward superficial perfection over honest iteration.

Training should target three gaps. People who have never built a good prompt need a short playbook that treats prompting like briefing a smart intern. People who have never verified outputs need a checklist that turns verification into muscle memory. People who have never written acceptance criteria need examples that show what testable looks like. Run these sessions with your own work as the case study. Avoid vendor theater. The fastest way to raise the floor is to practice on your real deliverables.

Global and cross cultural teams need an extra layer. In some markets, direct disclosure feels risky. In others, tool use is seen as initiative. If you operate across Singapore, the UAE, and Taiwan, expect different comfort levels. Normalize the language by modeling it at the leadership level. Write your own attributions on your strategy memos. Say what the tool did and what you did. People copy what leaders do more than what leaders say. If you make disclosure prestigious, your review data gets cleaner without a crackdown.

You will face edge cases. A salesperson used a model to write outreach and closed a key account. A researcher used a model to summarize interviews and missed a cultural nuance. A designer leaned on generated variations and shipped something that felt off brand. Do not litigate these cases with broad rules. Return to the anchors. Was the problem framed clearly. Were the risks named and managed. Was verification real. Was the final integration strong. Then coach for the missing piece. In each case, ask what signal the team missed earlier in the cycle and how to catch it next time.

If you are about to run reviews next quarter, start with a small pilot on one team. Introduce the attribution map, update the rubric with the four anchors, and train reviewers on the protocol. Observe friction points and adjust language before rolling wider. Announce the shift with calm framing. Say that the goal is to recognize thinking, protect fairness, and reduce rework. Leaders can make this sound like a policing move, or they can make it sound like an upgrade in professional standards. Choose the latter, then act like it.

There is one final behavioral cue that will tell you if your design is working. When a contributor says, "I used a model for the first draft, here is what I kept, here is what I rejected, and here is how I tested it," a good manager should feel relieved rather than threatened. That relief comes from clarity. You can see the thinking. You can see the safeguards. You can coach the next step. If you want a culture that learns faster than the technology changes, build reviews that reward the craft of judgment, not just the shine of output.

The shift to AI assisted performance reviews is not a trend piece. It is a design problem. You are designing for accurate credit, useful coaching, and responsible acceleration. The tools will keep changing. The anchors of good work will not. Frame the problem well. Constrain the tool with context. Verify like a professional. Integrate the result into the system so others can use it. When your review process reflects that sequence, your team will get faster without losing its standards, and your people will feel seen for the hard skill that matters in this era. They will be measured for how they think, not only for what the tool typed.

The change you are making here is subtle and powerful. It protects fairness today and teaches judgment that compounds over every future review cycle.

