For those of you curious to learn which trees scored best in the recent Kokufu scoring exercise (see part 1 and part 2 for details), here they are!
Large conifer – shimpaku juniper
Large deciduous – Japanese beech
Medium display – trident maple and shimpaku juniper
Shohin display – white pine, black pine, gardenia, privet, chojubai, kinzu
I was happy to see that readers identified the above shohin display as the best of the bunch, since it received the Kokufu prize (the gold-colored card on the table).
Likewise with the other selections. There were several great trees in each category, and I think the above trees represented some of the best on the list.
For those curious about lessons learned from the exercise, read on.
First, the number of responses.
- Deciduous and Conifer survey: 258
- Medium and Shohin survey: 107
I calculated the winners in each category using average scores and z-scores. Both approaches yielded the same results for the top trees.
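For anyone curious what that looks like in practice, here's a minimal sketch of the two approaches in Python. The scores and judge names below are made up for illustration; this isn't the actual survey data or the exact tooling used.

```python
# Sketch of ranking trees by raw averages vs. per-judge z-scores
# (hypothetical data, not the real survey responses).
import statistics

# Rows are judges, columns are trees, scored on a 1-5 scale.
scores = {
    "judge_a": [4, 5, 3, 4],
    "judge_b": [2, 3, 1, 2],   # a more skeptical judge
    "judge_c": [5, 5, 4, 5],   # a more generous judge
}
tree_names = ["tree_1", "tree_2", "tree_3", "tree_4"]

# Approach 1: rank trees by their raw average score across judges.
raw_averages = {
    tree: statistics.mean(judge[i] for judge in scores.values())
    for i, tree in enumerate(tree_names)
}

# Approach 2: convert each judge's scores to z-scores (subtract that
# judge's mean, divide by that judge's standard deviation), then average
# the z-scores per tree. This puts skeptical and generous judges on the
# same footing. (Assumes no judge gave every tree the identical score,
# which would make the standard deviation zero.)
def to_z_scores(values):
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

z_by_judge = {judge: to_z_scores(vals) for judge, vals in scores.items()}
z_averages = {
    tree: statistics.mean(z_by_judge[judge][i] for judge in scores)
    for i, tree in enumerate(tree_names)
}

print(sorted(raw_averages, key=raw_averages.get, reverse=True))
print(sorted(z_averages, key=z_averages.get, reverse=True))
```

With the survey data, both rankings happened to agree at the top; the z-score version mainly matters when some judges score much higher or lower than others across the board.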
I also looked at how people used the rating scales. For the deciduous/conifer survey, very few trees received a score of “1” (about 2% of all scores) on the five-point scale. The most common score was “4” (37% of all scores).
The same was true of the medium/shohin survey. On the seven-point scale, very few trees received scores of “1” or “2” (0.8% and 3.1%, respectively). The most popular score was “5” (30% of all scores).
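Tallying a distribution like this is straightforward. Here's a quick sketch with a made-up list of scores; the percentages above come from the actual survey responses.

```python
# Tally how often each score was awarded (scores here are made up).
from collections import Counter

all_scores = [4, 4, 5, 3, 4, 2, 5, 4, 3, 4, 1, 5, 4, 3, 4, 5, 2, 4, 3, 4]

counts = Counter(all_scores)
total = len(all_scores)
for score in sorted(counts):
    print(f"{score}: {counts[score] / total:.1%} of scores")
```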
The reluctance to award low scores to Kokufu trees suggests that respondents took a somewhat absolute approach to scoring (in other words, no “Kokufu tree” deserves a “1”).
On this note, average scores for individual judges varied dramatically. The most skeptical judge gave an average score of 2.5 (out of 7) to the trees in the medium/shohin survey. The scores of the most generous judge averaged 6.4 out of 7.
In light of these observations, we have a few takeaways that we can apply to our scoring system for the Pacific Bonsai Expo.
- To encourage the use of all five points in the scoring system (the Expo trees will be rated on a scale of 1-5), we’ll provide guides for awarding different scores. (For example, “3” will be appropriate for trees of average quality based on the trees in the exhibit.) The idea is to encourage broader distribution of scores so the overall results can better rank all of the trees in the exhibit.
- To accommodate judges of varying skepticism and ensure that all judges make relatively equal contributions to the overall results, we’ll use z-scores.
- We’re also planning to measure as much as possible to better inform future events. For example, we’ll look at rater reliability (how far individual judges’ scores vary from the mean), recency and latency effects (whether raters award higher or lower scores at the beginning, middle, or end of the evaluation), and how judges evaluate their own displays (see the sketch below for the first two checks).
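Here's a rough sketch of those first two checks, again with made-up ballots rather than real data: per-judge variability, and a simple first-half/second-half comparison of scores in the order each judge evaluated the displays.

```python
# Per-judge variability and a simple recency check (hypothetical ballots).
import statistics

# Each ballot lists scores in the order the judge evaluated the displays.
ballots = {
    "judge_a": [3, 4, 4, 5, 4, 3, 5, 4],
    "judge_b": [2, 2, 3, 1, 2, 3, 2, 2],
    "judge_c": [5, 4, 5, 5, 4, 5, 5, 4],
}

# Variability: a judge whose scores barely vary tells us little about
# relative quality, even if their average looks reasonable.
for judge, vals in ballots.items():
    print(judge, "mean:", round(statistics.mean(vals), 2),
          "stdev:", round(statistics.stdev(vals), 2))

# Recency/latency: compare average scores from the first and second half
# of each judge's evaluation order to see if scores drift over time.
for judge, vals in ballots.items():
    half = len(vals) // 2
    first, second = vals[:half], vals[half:]
    print(judge, "first half:", round(statistics.mean(first), 2),
          "second half:", round(statistics.mean(second), 2))
```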
Most importantly, we’re going to publish the raw scores anonymously so the community can identify any interesting patterns or opportunities for improvement that have yet to come to light.
That’s enough for now about scoring mechanics. When I revisit the topic, the focus will be on how we recognize bonsai and display quality.
Hank says
Thank you for the update, Jonas, I was really looking forward to hearing how the scores panned out. Just a general musing about quantitative scoring, whether it’s bonsai or movie reviews or Olympic competition: some believe it forgoes consideration of the qualitative. While I was scoring, for example, my constant thought was: “I must use the entire scoring spectrum, because otherwise the grades become meaningless.” This caused me to go back over my work and re-evaluate some trees and displays relative to others, which may or may not be a built-in aspect of the scoring process. This is all just my interpretation of the scoring system, but you can see how it can become more of an exercise in evaluating the scoring instead of the trees. Perhaps a future alternative would be to offer categories for each display that judges would score quantitatively, further guiding judges on what aspects are “valuable” and what are not based on their own judgement. I really liked this exercise, and I look forward to seeing the detailed results!
Jonas Dupuich says
Thanks, Hank – those are great points! For the exercise of evaluating an exhibit (within a limited amount of time) we’re going with a simple, single score. For the purpose of evaluating the quality of the tree, there are lots of great approaches that take into consideration a variety of characteristics. I’ll get more into that in future posts when I look at scoring trees in greater detail.
Thanks!