Materials education as a discipline, whether in materials science or engineering, is concerned with the development of both knowledge and skills. It may be concerned also with the development of attitudes, but assessment in that area is difficult (see Assessment of Attitudes below). In all of this it is no different from many other disciplines, but because of its position between the pure and applied sciences and its comparative novelty as a discipline, it may be easier to meet current needs and be innovative than might be the case in more established disciplines. In line with the well known dictum that 'assessment drives the curriculum', assessment in materials education may therefore provide an unusually favourable opportunity to the innovator and iconoclast.
Assessment of knowledge may in principle be well understood, but much practice falls short of what is known as good practice, particularly in relation to the application to knowledge of the higher order academic skills, such as analysis, synthesis and evaluation. A brief account of aspects of good traditional practice is therefore given in the next section, which is adapted from the Appendix of Elton and Johnston 2002.
Assessment is of two kinds, usually referred to as 'formative' and 'summative': the purpose of the former is to help with learning, while that of the latter is to judge what has been learned. The conventionally accepted view is that the two must be kept separate, because they fulfil very different purposes and usually occur at very different times in a student's course - although coursework assessment can have both formative and summative aspects. In principle, formative assessment is much the more important, since it aims to improve learning, while summative assessment merely verifies the learning that has been achieved. In practice, as students' careers and future lives may depend on the outcome, students treat summative assessment more seriously, and it is often very difficult for teachers to get them to put effort into formative assessment. Recently Knight (2002) has questioned the need for the complete separation of the two kinds of assessment, in a way that will be seen as highly relevant to the matters treated later in this guide.
Another, and very fundamental, dichotomy is that between two very different forms of assessment. One, for which the technical term is positivist, makes the assumption that 'truth' (the inverted commas are to indicate that the concept of truth is not simple) is absolute, and that assessment matches student performance against a previously established model response which exists in the minds of examiners. The other, for which the technical term is interpretivist, explicitly accepts that 'truth' is a social construct, ie 'a matter of consensus among informed and sophisticated constructors' (Guba and Lincoln 1989, p44), a consequence of which is that experienced examiners can agree on a student's performance only after the performance has taken place. The word 'experienced' here is designed to guard against the 'anything goes' phenomenon. In the past, most examining has taken place in a firmly positivist mode, but there has always been a notable exception to this rule - assessment in architecture and in art and design, where the famous 'crit' is firmly interpretivist. Materials Science and Engineering, with their strong claims to be creative subjects (see eg Dewulf and Baillie 1999), in which creativity is however reined in by the laws of science, may call for an assessment system that is in part positivist and in part interpretivist. This booklet will propose such a case. A similar case can be made for mathematics, which is inherently positivist, but where mathematical modelling - particularly in applied disciplines - might constitute an ideal topic for an interpretivist treatment.

A final dichotomy is that between process and product. There has been a strong demand recently from powerful bodies, such as the Quality Assurance Agency, that assessment should verify specified outcomes (ie products), and furthermore that these should be specified in advance. Educationists argue against this view, saying that:
Knowledge is primarily assessed in formal examination papers in which students are asked to select a portion of the questions set, which are ostensibly of the same difficulty. As this type of paper can test knowledge only selectively, it should only be used if the knowledge being tested is of very minor importance and if the academic skills tested by the different questions are of comparable difficulty. A broadly speaking hierarchical scale of such skills (developed from the work of B S Bloom et al 1956) might be as follows:
Clearly, basic knowledge should be treated comprehensively and not selectively, while the higher skills, which are concerned with academic skills applicable to the basic knowledge, can be exhibited with any basic knowledge. It is therefore reasonable that students should have a choice of knowledge base in which they demonstrate their academic skills. This leads to the idea of a structured examination paper, with progressively more choice (or of course to different types of papers for the different levels of the hierarchy). A structured paper might look as follows:
Formulating questions in this way implicitly raises the question of the learning objectives which are being tested. All too often, questions are formulated in terms of content, but questions of similar content may be asked at different skill levels in the above hierarchy, in which case the same type of learning is not tested. It is therefore vital to be clear about the learning objectives being tested, in terms of both content and skill level. And not only must the examiners be clear on this matter; the students, too, must have been made explicitly aware of the importance of these levels.
Many examination questions are of the essay type and these present another difficulty. Essay questions normally contain an 'operative' word, such as:
Such operative words give an indication of how the question is to be answered, and they should have definite meanings which are shared, through previous preparation, by teachers/examiners and students. Unfortunately this is rarely the case, the understanding being generally tacit for the teacher or examiner and for that reason not shared with the students. There is often no way of knowing whether this tacit understanding is the same for all examiners, and so only those students who happen to share the tacit understanding of a particular examiner are likely to do well. What is needed for a fair assessment is for that understanding to become explicit between teachers and students, and for examiners to agree to it. Furthermore, it may be noted that perhaps the most common operative word, ie 'discuss', is not in the list, as it has too many different meanings in different circumstances, each one of which is better expressed by one of the words in the above list. Could it be that the popularity of this word in examination questions reflects the not uncommon tendency of examiners to be deliberately vague in a question, in order to see how students tackle it and then match their responses to the examiner's preconceived perceptions, which are of course unknown to the examinees? And never, but never, should an essay question literally take the form of a question, unless a straight 'yes' or 'no' is considered an adequate answer.
Finally, it is worth raising the issue of using computers in essay examinations. Students are now strongly encouraged to word process their coursework. Is it then right that they should have to return to pen and paper for their examinations? And if not, can the problems associated with bringing computers into examinations be overcome? There are at present no answers to these questions, but they are worth asking.
Questions which are in several parts, and in which information in the earlier parts helps in responding to the later ones, are common in the sciences. In such composite questions, the first part is often a purely memory question, the second part uses the information contained in the first part in solving a problem, and the third part may similarly use it for a more theoretical question; or, put differently, the level of learning objectives goes up as the question progresses. However, in such cases the wording of the first part gives information on the later parts (eg what might be a good starting point for solving a particular problem) and thereby makes the later parts easier than they would have been if the first part had been omitted. Since a most important aspect of problem solving lies in the identification of a suitable starting point, this may devalue the question as a test of problem solving skills. Furthermore, if such a question is now set in an open book examination, and the first part is omitted as being purely memory work and therefore unsuitable for an open book question, then the problem part may have been made considerably harder by this omission - a fact that is often not appreciated.
A form of question which is at a deceptively high level of learning objective is one where the information provided may be too much, too little, or just right to solve the problem set in the question. Such a question introduces an element of uncertainty, which is normal in everyday life and therefore is well designed to test certain life skills.
The perception that MCQs can test only for knowledge recall is doubly wrong: all MCQs test for recognition rather than recall, but they can test at all levels of the hierarchy discussed above. However, the higher the level to be tested, the more difficult it is to set good questions.
'Quite generally, because multiple choice questions give no information concerning students' thought processes, it is particularly important that they should be of a high professional standard'.
Examiners should not use MCQs, unless they have been trained in their use, and this is true even if the MCQs are taken from professionally generated question banks.
This may be a good point to refer to equal opportunities issues. The following ought to be considered in connection with any questions set in an examination:
Finally, one's concern may be no more sophisticated than: 'How can one write an exam question without just using the same one as last year and changing the numbers - one which doesn't cause too much work in marking and which tests what is wanted?' The answer lies in an analysis of last year's question in terms of its learning objectives, followed by a new question which satisfies the same objectives but for a different content. That should not be too difficult, and is not to be despised.
The most basic features of traditional assessment are:
These two features of assessment are not independent of each other, and in all but the most basic assessments (eg testing memory) there must be a trade off between them. It is often assumed that multiple choice questions are 100% reliable, but this is not so. They are indeed 100% reliable in marking, but not in setting, where ostensibly equivalent questions may be of very different difficulty for particular students. In consequence, while the best and the worst candidates may well be equally identified by equivalent tests, the ranking order in the middle may be very different. What must never be done is to sacrifice validity in order to increase reliability, eg by expecting students to learn at the higher skill levels, which can be assessed only moderately reliably, and then testing at the lower levels, which can be assessed much more reliably.
'There must always be a trade off between validity and reliability with validity being dominant'.
This is because students' learning is guided by the assessment to come - the objectives being assessed largely become the students' learning objectives - the so-called backwash effect.
The unreliability of assessment is generally considered more acceptable in disciplines with a strong creative and/or aesthetic component, and it is there that the concept of 'connoisseurship' originally arose. However, if it is accepted that all good assessment is to some degree unreliable, then perhaps it becomes acceptable to use the concept of connoisseurship also qualitatively in forms of assessment in other disciplines (Eisner 1985), eg in portfolio assessment (see below). Such assessment is often referred to as 'impression marking', but the important point to bear in mind is that the impression must be agreed by the examiners in advance - and these examiners should be expert, not only in their discipline, but also in examining. This is what is meant by connoisseurship. It is however often wise in such situations to confine oneself to pass/fail assessment, since reliability there is usually greater than in more finely graded assessment and - if faulty - affects far fewer candidates.
Increasingly, the grading of degree work along a single dimension is being called into question. Is it really meaningful to describe three or four years' work in terms of a single number? If not, then it is time that the reporting of degree results was done in a way that more meaningfully reflects what students have done, and produced, during their course. This can be done most readily through a 'profile', which replaces the degree certificate. A profile may be no more than a 'transcript', in which the individual assessments are allowed to speak for themselves and are not conflated into a single degree class. However, a genuine profile ought to assess also aspects of student learning which are difficult or even impossible to grade, and hence may include some assessments done on the traditional classified basis, some on the basis of pass/fail - perhaps through connoisseurship assessment, some brief verbal reports or reflections and, in some instances, perhaps no more than a certification of attendance. One advantage of such a compilation is that it also offers students the opportunity to include specific items of which they are proud, and to have them acknowledged in the crucial assessment arena. Such variety more appropriately meets the needs of different learning objectives and goes well beyond the American 'transcript', in which all judgments are expressed in grades. Universities might still want to provide degree certificates, but these would simply state that, on the basis of the attached profile, a student has been considered worthy of a degree.
Students have always been expected to develop certain higher order mental skills, but it has often been assumed in the past that they would acquire those skills in the process of learning and that these skills could not be explicitly tested. However, as the development of these higher order skills is often the most important outcome of learning at degree level and is certainly something much valued by employers, the explicit assessment of high level skills has become of great importance. The kind of skills that will be considered below go beyond the rather trivial ones, like numeracy, and extend the meaning of 'skills' to include mental abilities (often referred to as 'higher order skills') such as those associated with problem solving, criticality and creativity, as well as social skills, such as communication and team work - all against a background of materials education. Attempts to treat such abilities and skills as generic, and hence transferable, should be avoided; the very concept of transferability is one that lacks an adequate theoretical, as well as evidential, basis.
As has been argued in the last section, skill learning does not lead to outcomes in the way that knowledge learning does, and can generally be assessed best through the assessment of the processes that lead to the development of the skills. In principle, a well-known way to achieve this is through coursework assessment, as this is particularly suitable for assessing process objectives. Even if there are product objectives of skills, eg a finished artifact or a solved problem, which respectively assess creativity and genuine problem solving abilities, these cannot be assessed under the stresses and time pressures inevitable in formal examinations. In these cases, it is essential to use a method which is not strictly time limited, such as coursework assessment. It is then largely meaningless - though it is done all the time - to lump such coursework assessment in with a formal examination assessment into some aggregated mark. However, the reporting of process learning through coursework is not straightforward, and much of it is either too simplistic in its reporting - mostly through products such as essays and reports - or unnecessarily unreliable through the lack of a formalised reporting process.
A formalised reporting process, which has existed for a long time in architecture and in art and design courses, is the portfolio. This usually consists of a number of products, eg artifacts, and/or accounts of learning processes. The guiding principle is that a portfolio documents students' work, largely or entirely selected by them, and normally including 'evidence drawn from practice and, crucially, usually containing reflective commentary' (Baume and Yorke 2002, p7). Here we are on largely unexplored ground, as far as materials education is concerned - Baume and Yorke were concerned with teacher education! To what extent there should be such a form of assessment in materials education - and indeed what form such portfolios might take - remains to be found out. At a recent materials science assessment workshop the following were suggested, as linking process evidence with skills:
Reflective writing (criticality)
Case Study (criticality)
Computer programmes (creativity and communication skills)
Learning experiences (life skills)
Project work examples (problem solving and group skills)
Web pages (communication skills)
Extracurricular activities (initiatives)
Publications (communication skills)
Patents (management skills)
An important feature of portfolios is that they treat students' achievements positively, ie in terms of what students have achieved - all too many examinations are designed to discover what students have failed to achieve. Portfolios are particularly suited to the assessment of high-level skills, such as those contributing to problem solving, criticality and creativity, since students present their learning development over time. Such development is very individual and makes it impossible to treat all students in the same way, which in turn leads to a very different interpretation of what is 'fair'. Instead of being based on everyone being treated identically, fairness in portfolio assessment is based on every student being given an equal opportunity to show their best work. Incidentally, this makes the most common form of plagiarism, copying from each other, almost impossible. Also, as Yorke (1998, p181) argues in a seminal article, 'where the detail of performance is concerned, the emphasis is switched from prior expectation of outcomes to post hoc recognition that what has been achieved is consistent with the general expectations of the reward'. This appears to shift assessment firmly into the interpretivist mode. Going further, Knight (2002) argues that:
This last point would require some education of employers, to get them to base their judgment of a student's work (as expressed in a portfolio, in which students have made their own claims through self-assessment) on the basis of the formative feedback which the student had received. These self assessments would of course have been commented on within the portfolios by teachers, and employers could then judge the evidence from these portfolios in relation to their requirements of a future employee. The essence of the argument is the need for portfolios, based on formative feedback and in the first place often self-assessed, for curricular aspects that relate largely to processes.
Portfolio learning requires students to be independent learners, with their teachers acting as facilitators rather than as sources of knowledge; a teacher ceases to be in authority, although they remain an authority in their field of knowledge. Two other features are that much of the learning takes place in groups, and that students have to learn how to reflect on their learning. However, the pay-off is that students are then able to demonstrate the kind of skill development which traditional teaching does not foster, and which traditional examining does not assess.
The essence of portfolio assessment is that it is based on the documentation of ongoing learning experiences, as well as outcomes, and it may include self-assessment by the student. It is difficult to see how the assessment of ongoing learning experiences could be achieved in any other way, but it certainly makes reliability harder to attain and, for that reason, this report will suggest (in the next section) that the assessment should be holistic and purely on a pass/fail basis, ie not graded. (The very applicability of the concept of reliability to such assessment has been critiqued by Johnston 2002.) Furthermore, if assessment is to be based on portfolios, then both students and staff have to be made familiar with the concepts and execution underlying this method of reporting learning achievements. This is very obvious, but it still needs saying, since all too often innovations in teaching, learning and assessment fail because students and staff have been inadequately prepared for them. The introduction of portfolios necessitates substantial educational development for both teachers and students. Not only do portfolios constitute a very different form of reporting on the achieved learning, but they also take a very different approach from the traditional one to the learning process. In general, this will also include the development of skills such as those on which group learning and reflection are dependent - this in addition to the skills that are already a focus for development.
The style of working in relation to portfolios is alien at present for both students AND tutors, so it is necessary to encourage some shifts in learning practices such that everyone learns why and how. Advice to be given in connection with portfolios can be summarised as follows:
Advice to tutors:
Advice to students:
Advice to students and tutors, to include information on and help with:
Advice to assessors after reading a portfolio, but before its assessment:
Advice to assessors after assessment:
An example of the use of portfolio assessment in Engineering is provided by Payne et al (1993). As summarised by Johnston (2002) it includes:
For an inspiring article on portfolios and their use, see C Rust (2000).
The difficulty with designing attitude assessments is that in traditional forms of assessment, eg essays, it is almost impossible to distinguish a genuine from a pretended report. This has been the case for a long time, eg Whyte (1956) had a section on 'How to cheat on personality tests'. However, the development of a student, as documented in a portfolio, together with its self assessment, has the potential of assessing attitudinal development and change.
We can now compare positivist and interpretivist approaches to assessment:
| Positivist | Interpretivist |
| --- | --- |
| Objective reality exists | Reality socially constructed |
| Objective standards can be set | Assessment by interpretive community |
| Assessment does not involve students | Assessment involves students |
| Importance of reliability | Dependability replaces reliability |
| Conflict between reliability and validity | Transferability replaces validity |
| Check on quality in outcomes | Check on quality within process |
| Fairness in uniformity | Fairness in diversity |
The concepts of dependability and transferability need further explanation. The former is a measure of the stability of data over the process of assessment (eg portfolio assessment can cover a substantial time period); the latter is concerned with the extent to which salient conditions match in supposedly similar assessments. These are not easy concepts and for a more complete description see Guba and Lincoln (1989), pp241-42.
By now it should be clear that the conflation of grades or percentage marks of different parts of a degree assessment into a single final degree class not only obscures most of the valuable information obtained, but has aspects that are so arbitrary (what weight to give to different parts of the assessment, how to reconcile different means and different spreads of marks, etc.) that the abolition of the final degree class must be seriously considered. Its main justification is its supposed predictive value, particularly for prospective employers, but that is patently untrue. Its main value to employers would appear to be that it provides a first sift to cut down the number of applications for a post to manageable proportions, but this is not a sufficient justification for an essentially meaningless procedure which is expensive in resources and can have lasting psychological consequences for students who fail to get a particular class of degree. The degree certificate should therefore be replaced by a profile that states in some detail the extent to which students have succeeded in the different parts of their courses and how these different parts were assessed. If this is done, then portfolios might well be graded merely as pass/fail, holistically and on the basis of connoisseurship, which would significantly reduce the problem of the reliability of their assessment. Most employers spend very large sums on the recruitment and first appointment of graduates; what has now to be added is a short training booklet or course that will enable them to make the most of the large amount of information provided by a profile or portfolio. The proposal to abolish degree classification may well be the most contentious one in this guide, although it received a very favourable hearing in what I believe to be the only conference which addressed that subject (Winter 1993), and it has now emerged as part of a possible government agenda for higher education.
It has already been suggested that much plagiarism can be avoided, if students are treated as individuals, rather than being all treated the same. This may not of course always be possible or indeed desirable. Thus, in a large first year class, all students do the same laboratory experiments and many are likely to copy from each other. This can be kept in check by adding an oral assessment of one or two laboratory reports, which have been chosen by the examiner. Finally, as it is important at times for students to work in groups, it is vital to discuss with them where collaboration ends and copying starts.
This guide has attempted to provide an argument for assessment which covers fairly both knowledge and skills; how the assessment of skills - which is much the less well explored - might be done through 'connoisseurship' and by means of portfolios; how the use of portfolios might soften the sharp divide between summative and formative assessment; and finally whether this approach to assessment may logically have to lead to the abolition of the classified degree.
My thanks are due to Dr Brenda Johnston for much valuable criticism.
D Baume and M Yorke (2002), 'The Reliability of Assessment by Portfolio on a Course to Develop and Accredit Teachers in Higher Education', Studies in Higher Education 27, pp7-25.
S Dewulf and C Baillie (1999), 'Creativity in Art, Science and Engineering: how to foster creativity', London: Department for Education and Employment.
E W Eisner (1985), 'The Art of Educational Evaluation', London: Falmer.
L Elton and B Johnston (2002), 'Assessment in Universities: a critical review of research', LTSN Generic Centre.
E G Guba and Y S Lincoln (1989), 'Fourth Generation Evaluation', London: Sage.
B Johnston (2002), 'Summative Assessment of Portfolios: An Examination of Positivist and Interpretivist Approaches to Agreement over Outcomes', in preparation.
P Knight (2002), 'Summative Assessment in Higher Education: practices in disarray', Studies in Higher Education 27, pp275-286.
R N Payne et al (1993), 'Portfolio Assessment in Practice in Engineering', International Journal of Technology and Design Education 3, pp37-42.
C Rust (2000), 'An opinion piece: A possible student-centred assessment solution to some of the current problems of modular degree programmes', Active Learning in Higher Education 1, pp126-131.
W H Whyte (1956), 'The Organization Man', Simon and Schuster.
R Winter (ed) (1993), 'The Future of the Classified Honours Degree', Cambridge: Anglia Polytechnic University.
M Yorke (1998), 'Assessing Capability', in J Stephenson and M Yorke (eds), 'Capability and Quality in Higher Education', London: Kogan Page.