First presented at The Developing Group 2 Aug 2014

Calibration and Evaluation – 3 years on

James Lawley and Penny Tompkins

1. Introduction
2. Research methodology
3. Findings
4. Conclusions

Evaluation is a kind of ‘behind the scenes’ process we all do regularly. We are constantly informally evaluating people's behaviour against our own internal (often subconscious) standards. What makes a formal assessment 'formal' is that the standards and process of assessing are known, and hopefully well-defined.

A different kind of evaluation happens when we make an assessment of another person's evaluation. To do so we need to take into account their means of evaluating, which may not be the same as ours. How accurate – or not – are we at calibrating another person's evaluation?

We devoted the December 2011 Developing Group to Clean Evaluative Interviewing. The aim on that day was to learn how to use Clean Language as a research interview method when the topic being researched was how people evaluate an experience.

Dr. Susie Linder-Pelz and James have recently concluded an academic research project in which six coaching sessions were evaluated from three perspectives: by the coach, the client and an expert-assessor.*

At the 2nd August Developing Group James updated the group on the findings of the research, and we explored how we can individually and collectively make use of the conclusions. In particular we experientially investigated:

  • As a coach, how aware are you of how your client and an expert would evaluate a coaching session?
  • Does knowing your client and an expert's opinion affect your own evaluation?

Calibration and Evaluation

Over the years we have approached the topic of calibrating in different ways.

We have long noticed that when people on a training course are asked to evaluate a practice coaching session they often give an answer which differs wildly from the opinion of the client and/or us as expert observers.

For example, one coach said a session was “catastrophic”, while the client said “I got some useful insights and lots to think about”. James who was observing said to the coach, “You did what the activity called for. The client got what they asked for with their desired outcome. A more direct approach might have got to the meat earlier, and even so, you and they now have a lot more of a landscape to work with and a good basis for the next session.”

When the coach was asked what their evaluation of the session was now, having heard the opinions of the client and expert, they said, “Well, I’m pleased the client got something out of it and I still think it was catastrophic”. We wonder what scale the coach was using to evaluate their effectiveness, and what they would have labelled a much worse session! (See The Importance of Scale)

Our modelling of excellent facilitators (not only those who use Clean Language) showed that a key skill was the ability to calibrate the experience of the client and to notice when it changed and in what direction. (See Systemic Outcome Orientation)

There are lots of ways to calibrate, and what seems more important than the method of calibrating is that (a) the facilitator is actively calibrating moment-by-moment; (b) there is a correspondence between the facilitator’s calibration and the client’s experience; and (c) the facilitator can quickly change in response to the results of their calibration. This led us to make the “First Principle of Symbolic Modelling” (See REPROCess and Modelling Attention):
Know what kind of experience the client is having (i.e. what you are modelling).
While calibrating is a matter of efficacy, we have pointed out that it is also an ethical matter. If you do not calibrate the kind of experience the client is having, how do you know whether what you are doing is, or is not, working for the client? (See Calibrating Whether What You Are Doing is Working – Or Not)

James and Susie’s research into coaching sessions shows that even experienced coaches and experts can give ratings that differ widely from the client’s and from each other’s. While this may be surprising at first, once it is appreciated that each party tends to use different criteria in coming to their evaluations, the variation makes more sense.

In our opinion, a bigger issue is the difficulty there appears to be in managing multiple perspectives when they diverge. Many certification and evaluation processes use one perspective: experts decide if a coach is competent to be certified or suitable for a job, or clients decide if they are satisfied with the service. Rarely are both taken into account. Even more rarely does the coach’s ability to calibrate both the client and the expert perspective become part of the assessment.

One reason for this may be the difficulty in comparing apples, oranges and bananas. This is compounded if the aim is to find a single composite score. The result is likely to be an arbitrary weighting of the contribution of each perspective. Rather than trying to reduce the perspectives to a single rating, an alternative is to live with the complexity of three perspectives and set acceptable levels in all three.**
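The contrast between the two approaches can be sketched in a few lines. All the numbers below – the ratings, the weights and the threshold levels – are invented for illustration, not taken from the study:

```python
# Hypothetical ratings out of 10 from the three perspectives
# (invented for illustration, not data from the study).
ratings = {"client": 8.0, "coach": 6.5, "expert": 7.0}

# A single composite score forces an arbitrary weighting of each perspective.
weights = {"client": 0.4, "coach": 0.2, "expert": 0.4}  # arbitrary choice
composite = sum(ratings[p] * weights[p] for p in ratings)

# The alternative: keep all three perspectives and require an
# acceptable level in each (the threshold values are also assumptions).
thresholds = {"client": 7.0, "coach": 6.0, "expert": 7.0}
acceptable = all(ratings[p] >= thresholds[p] for p in ratings)

print(round(composite, 2), acceptable)
```

Note that the composite hides which perspective fell short, while the three-threshold check preserves that information.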

By bringing our own evaluations out from ‘behind the scenes’ and making them 'centre stage' we can play with our own patterns of assuming, and get a ‘reality check’ on how and what we are unconsciously calibrating.

— — — — — —

* The first part of the study was published as: Linder-Pelz, S. & Lawley, J. (2015). Using Clean Language to explore the subjectivity of coachees' experience and outcomes. International Coaching Psychology Review, 10(2):161-174.
Download a free preprint version: Linder-Pelz_Lawley-ICPR_preprint_15_Jun_2015.pdf

The second part of the study was published as: Lawley, J. & Linder-Pelz, S. (2016). Evidence of competency: exploring coach, coachee and expert evaluations of coaching, Coaching: An International Journal of Theory, Research and Practice.
Download a free preprint version: Lawley&Linder-Pelz_CIJTRP_preprint_03_May_2016.pdf

** We are grateful to Michelle Duval who helped us to get clear on this point.

Research Methodology used at Developing Group, 2 Aug 2014.

1. A Goal-focused Coaching Skills Questionnaire (GCSQ) was emailed to participants in advance with a request to complete it and bring it on the day.* The instruction given was:
Circle the number that most reflects your assessment of your current clean coaching competency.
2. Twelve questionnaires were completed. The average scores for each person were converted to the equivalent scores out of ten:
The scores ranged from 6 to 9 out of 10, with an average of 7.3.
3. Ten of the participants were paired up and allocated an expert-observer (a recognised assessor of Clean Facilitator competencies). The participants in each dyad took turns to be the coach and the client for an observed 30 minute session.

4. At the end of each session the client, coach and observer completed in private a sheet designed specifically for that role.** The sheets were collected without other participants seeing them. The sheets contained requests for:
(i) numerical evaluations out-of-ten from various perspectives, and
(ii) a textual list of the key criteria used in the evaluation of the session (see below).
5. After both coaching sessions had finished and the figures had been entered into a computer, the sheets were handed back to each triad for reflection and discussion.

6. An anonymised summary of the results was shown to the whole group for more reflection and discussion.

7. Lastly, the group was split in half and two of the expert-observers conducted a 30 minute coaching session observed by the other four participants and another expert-observer. Evaluation sheets were completed as in step 4 and compared within the group.
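The conversion in step 2 – average questionnaire scores expressed as equivalents out of ten – can be sketched as follows. The 1–5 response scale is an assumption for illustration; the actual GCSQ scale may differ:

```python
def score_out_of_ten(item_scores, scale_max=5):
    """Convert the average questionnaire item score to an equivalent
    score out of ten. scale_max is an assumption, not necessarily the
    GCSQ's actual response scale."""
    average = sum(item_scores) / len(item_scores)
    return average / scale_max * 10

# A participant rating most items 4 on the assumed 1-5 scale:
print(score_out_of_ten([4, 4, 3, 4]))  # 7.5 out of 10
```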


* Questionnaire used with permission. See: Goal-focused Coaching Skills Questionnaire (GCSQ), Anthony M. Grant & Michael J. Cavanagh. Social Behavior and Personality, 2007, 35 (6), 751-760.

The GCSQ sent out was slightly modified from the original. One word in each of questions f, g, h and j was changed to make the questions compatible with clean coaching. Download: Clean_modified_GFCSQ_questionnaire.pdf

** The three sheets asked for the following evaluations and information. The letters (a)-(f) refer to Table 1 in Findings:
Completed by CLIENT:

On a scale of 1 to 10, the value of the session to me was .......................... (a)

Please list the key criteria you used to assess the value of the session to you:

Completed by COACH:
On a scale of 1 to 10, I evaluate the quality of the session as ..................... (f)

Please list the key criteria you used to assess the quality of  the session:

I estimate the CLIENT rated the value of the session to him/her as ............. (b)

I estimate the OBSERVER rated my clean coaching skills as ....................... (e)

Completed by OBSERVER:
On a scale of 1 to 10, I evaluate the coach’s clean coaching skills as ........... (d)

Please list the key criteria you used to assess the clean coaching you observed:

I estimate the CLIENT rated the value of the session to him/her as.............. (c)


Table 1 shows the evaluations for the 5 triads involving 10 observed coaching sessions.

a = Client's rating for value of session to them.
b = Coach's estimate of client's rating (a)
c = Observer's estimate of client's rating (a)

d = Observer's rating of coach's clean coaching skills
e = Coach's estimate of observer's rating (d)

f = Coach's rating for quality of the session

Table 1: Ratings by Clients, Coaches and Expert-Observers

[Table of results]


Table 2: Clients' criteria used to assess the value of the session

Criteria collated into three categories:
Effect on self
Relationship with coach
Other coaching skills
Note: Italics have been added by the authors to indicate why each category was chosen.

  • New insights.
  • Some ‘aha’ moments.
  • I like to feel [I've] have had insights.
  • New information came out.

  • Do I feel I have a clearer idea of what I want and what the current state is?
  • I now have a clearer idea of what is happening and how I can go forward.
  • Did I feel clearer?
  • Had clarification of what I needed to do and a check on whether I would actually do it.
  • Helped me clarify and develop my outcome.
  • The importance of the situation is now more obvious.
  • The things that are combining to perpetuate the present situation are more developed and understandable.

  • Changes in the metaphorical representation.
  • Changes in my inner response.
  • Paper mapping provided different perspective.
  • I “renovated” - reframed two [of my] coaching programs into a new offer for 2015.
  • Disconnected a big value criterion.

  • I like to feel like I’ve made progress
  • The movement towards what I want out of the session.
  • Learned something about ‘system’ outcome.

  • Support in identifying actions that feel appealing and potentially useful.
  • This is also something I can develop with [name].
  • I feel I have something I can take away of use that will make a difference
  • Confidence in following thru on identified actions.
  • Able to take action on my outcome.
  • Had actions to do what felt correct.

  • Resonates with things which have come up before.
  • My resources from the past are memorable now and can be used in the future.

  • The importance of the topic that I was working on (the value to me).

  • Sense of permission to explore.
  • Sense of acceptance that I could explore whatever I wanted to explore.
  • How safe I felt in the session to say what I wanted to say.

  • Attention of coach on my words and actions.
  • Level of presence of the coach.
  • Would have liked to feel more presence from the coach.

  • Rapport with coach.
  • Did facilitator feel sympathetic

  • Coach was a valuable asset in this.
  • The fluidity/flow of the session.
  • Did not write so session flowed.
  • Necessary conditions to ‘feel’
  • [Getting to a] Decision point, gut-mind together.
  • Clean Space modelling of 1st step provided valuable information.
  • The extent that the coach could distinguish between problem and desired outcome
  • Responsiveness and useful directing of attention.
  • When had to answer questions that did not feel relevant/or clean that detracted from the session


Numerical ratings

Table 1 shows that six of the ten clients rated their session as 7.5 or 8 out of 10. Two rated it higher at 9 or 9.5, one at 6 and one much lower at 3.

Table 3 compares the figures in Table 1 in pairs. The codes a to f refer to the key above Table 1.

Table 3: Comparison of Client, Coach and Observer ratings.

Table 3 shows that, generally, the coaches' and clients' ratings were close, with eight being within one point of each other (f–a). The two exceptions with larger variations occurred when the client rated the session highest (E at 9.5) or lowest (J at 3).

It was a similar picture for the coaches' estimates of their clients' ratings (b–a).

Together, this suggests that the coaches were able to calibrate their clients when the session was 'close to the norm', but they were less able to do so when the client's evaluations were at the extremities.

A comparison of observer and client ratings (d–a) and observer estimates of the clients' ratings (c–a) showed a similar pattern to the coach-client comparisons. However, in two instances, the observer's rating of the coach’s skills and the client's rating were at variance by three or more points (Observer 1 and Observer 3). Interestingly, these were also the sessions where observers most misjudged the client’s rating (c–a). This shows that even experts can, on occasion, seriously misjudge the value perceived by the client. (This is in line with the Linder-Pelz & Lawley research.)

The coaches seemed to have more difficulty estimating the observer's rating (d–e) than they did the client's rating (b–a). Only four of the coaches' estimates were within one point of the observer's rating of them, suggesting some coaches either lack awareness of what the observer is looking for, or are unable to take the observer's perspective while (or immediately after) they are coaching.
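The pairwise comparisons above (f–a, b–a, d–e and so on) amount to counting the sessions where two ratings fall within one point of each other. A minimal sketch, using invented ratings rather than the actual Table 1 data:

```python
# Illustrative ratings for the measures a-f described above, one dict
# per observed session (invented numbers, not the actual study data).
sessions = [
    {"a": 8.0, "b": 7.5, "d": 7.0, "e": 8.5, "f": 8.0},
    {"a": 3.0, "b": 6.0, "d": 6.5, "e": 7.0, "f": 5.0},
]

def within_one_point(sessions, x, y):
    """Count the sessions where ratings x and y differ by at most one point."""
    return sum(1 for s in sessions if abs(s[x] - s[y]) <= 1)

# e.g. coach's self-rating vs client's rating (f-a), and coach's
# estimate of the observer's rating vs the actual rating (e-d):
print(within_one_point(sessions, "f", "a"))
print(within_one_point(sessions, "e", "d"))
```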

Client-Rating Criteria

The 45 criteria mentioned by the ten clients and listed in Table 2 were clustered into three groups:

  • 28 (62%) – the effect on the client
  • 8 (18%) – relationship with the coach
  • 9 (20%) – other coaching skills
Much research suggests that the coaching relationship (or "alliance") is the primary factor in the outcome of coaching. While this may be so, for these clients at least it was mentioned relatively few times compared to the effect (outcome) on the client.
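The percentages above follow directly from dividing each cluster count by the 45 criteria in total:

```python
# Cluster counts from Table 2 (45 client criteria in all).
clusters = {
    "effect on self": 28,
    "relationship with coach": 8,
    "other coaching skills": 9,
}

total = sum(clusters.values())
percentages = {name: round(count / total * 100) for name, count in clusters.items()}
print(total, percentages)
```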


Penny and James are supervising neurolinguistic psychotherapists – registered with the United Kingdom Council for Psychotherapy since 1993 – coaches in business, certified NLP trainers, and founders of The Developing Company.

They have provided consultancy to organisations as diverse as GlaxoSmithKline, Yale University Child Study Center, NASA Goddard Space Center and the Findhorn Spiritual Community in Northern Scotland.

Their book, Metaphors in Mind, was the first comprehensive guide to Symbolic Modelling using the Clean Language of David Grove. An annotated training DVD, A Strange and Strong Sensation, demonstrates their work in a live session. They have published over 200 articles and blogs freely available on their website.
