Assessment and development center

enhancing project managers’ growth


At the National Aeronautics and Space Administration (NASA), career development is a shared responsibility among employees, managers and supervisors, and the organization. This shared responsibility is all the more critical given the nature of projects in a “Faster, Better, Cheaper” environment. The Project Management Development Process (PMDP) ensures appropriate competency and knowledge in advance of placement into key NASA project positions.

Background and Project Understanding

The basic process for PMDP involves the following steps, in which NASA employees:

•  Envision what they want their career to look like today, five years and ten years in the future, taking into account NASA's vision, missions, and Enterprise goals

•  Assess their aptitudes, strengths, and development needs with their mentor(s) and supervisor using the PMDP competency model and Center-specified competencies

•  Seek mentor and supervisor input and prepare an IDP that supports both the current job and longer-term professional goals within the work environment

•  Work with their supervisor to schedule appropriate on-the-job training, required complementary formal training, and development activities

•  Complete a Record of Accomplishments (RoA) and an Individual Development Plan (IDP) for their current level of PMDP, and forward to their Manager and to the Center PMDP Point of Contact.

In today's fast-paced environment, project managers need to receive as much feedback as possible in realistic project scenarios in order to develop themselves to their maximum capability. Additionally, this process needs to be integrated as part of a systematic developmental model that involves managers and supervisors as well as the employee, since NASA is moving closer to a project manager certification process.

The Project Mirror is designed to be a voluntary development activity that accomplishes the following goals:

•  Provides developmental information in simulated project environments to individuals who aspire to become NASA project managers

•  Provides developmental information in a simulated project environment to individuals who want to identify additional developmental needs

•  Suggests areas where additional education, training or developmental assignments may be beneficial in increasing the capability of NASA project managers

•  Involves the Center management in the development process.

The design of the developmental center will be guided by the following criteria:

•  The Project Mirror activities are grounded in practical content and processes that are directly applicable to the NASA project environment

•  The design of Project Mirror incorporates managers and supervisors within NASA in order to forge a closer connection and appreciation of the problems that project managers face in the field

•  The components of Project Mirror are off-the-shelf components to the maximum extent possible in order to be fielded in an expeditious manner

•  Reliability and validity of the process were developed concurrently and are periodically updated

•  Evaluative components of Project Mirror are based on current competencies identified in the latest version of the NASA PMDP competency model

•  All areas of individual and team development are evaluated for inclusion in Project Mirror, and the highest priority areas are included.

This paper presents the supporting theories used to create Project Mirror. The presentation focuses on the practical side and on the design and development challenges faced by the project team, as well as the most recent outcomes of the event.

Brief History and Theory of Assessment Centers

There are a multitude of instruments and techniques available for organizations to assess their personnel for managerial potential; cognitive ability, personality styles, leadership, 360-degree evaluations and work samples of managerial behavior are examples. With this wide variety of tools available for data collection purposes, organizations continuously search for ways to employ multiple assessment procedures in one unified setting in order to achieve a “total person” evaluation. Currently, a methodology called assessment centers has again gained widespread popularity among business consultants as well as academics. However, are assessment centers really effective in predicting actual job performance, or are they just a measure of how well subjects follow the perceived rules of their particular organization?

The assessment center method allows for a set of instruments and evaluative techniques to be employed in one location and iteration. Typically, an assessment center refers to a standardized set of simulated activities that optimally reflect as closely as possible managerial situations to which the employee reacts in some measurable behavioral fashion. Assessment center activities, if properly constructed, are a result of a comprehensive needs analysis of the jobs that exist within an organization. This needs analysis should take into account the organizational, job and individual employee factors that exist within that organization.

German military psychologists first used multiple assessment procedures during World War II, largely as an effort to achieve an integrated picture of candidates for specific wartime missions and responsibilities. Other wartime offices such as the British War Office Selection Board and the U.S. Office of Strategic Services followed these early efforts. Industry first adopted these practices in 1956 with the AT&T Management Progress Study, with subjects (N = 422; college = 274, non-college = 148) participating over a period of several summers from 1956 to 1960 (Bray & Grant, 1966). In 1965, the career progression of these subjects was evaluated against the original assessments, and another assessment was made eight years following this first evaluation. The predictive validities for the original assessments (college = .44, non-college = .71) were so impressive that use of the assessment center methodology spread rapidly throughout organizations in the private and public sectors.

The results generated since this first application of the assessment center methodology in an industrial setting have been generally supportive in terms of predictive validity, but not of construct validity. There have also been efforts to apply the assessment center method to women and minorities, with respectable predictive validity results in each category (Klimoski & Strickland, 1977). The independence from prior experience and educational attainment that was a characteristic of the AT&T study continues as a key element of assessment center methodology, and goes to the heart of the definition of the activity (Task Force on Assessment Center Guidelines, 1989): An assessment center consists of a standardized evaluation of behavior based on multiple inputs.

These inputs are provided by trained assessors observing the performance of participants engaged in specifically developed assessment simulations. The idea is that individuals can be brought into an environment that simulates real-life work activities as closely as possible in terms of tasks and activities, and that the performance in this simulated environment will yield predictive assumptions about future performance.

Many organizations today use the assessment center methodology for a wide variety of purposes. In a major survey of assessment center practice in the United States (N = 291, response rate 48%), the results indicate that selection (50%), promotion (45.8%), and development (39.2%) are the most popular objectives (Spychalski et al., 1997). The populations of interest have also expanded beyond managers to include sales personnel, engineers, military personnel, and other public and private sector individuals, as well as students and production personnel. Another current and popular expansion of the methodology has been into the area of teams, where these techniques are applied in a group setting.

The survey researchers applied The Guidelines and Ethical Considerations for Assessment Center Operations (1989) in determining the design quality of assessment center practices and the level of rigor present in their implementation. The percentage of centers that adhered to each design element follows that element's description below:

Job analysis—The dimensions and exercises of an assessment center are based on a job analysis that defines the necessary components of effective job performance, and specifies those that can be measured in the assessment center (93.3%).

Behavioral elicitation and classification—The techniques applied in an assessment center are geared toward providing data on selected dimensions of behavior that are meaningful and relevant to the desired analysis goals (81.4%).

Assessment techniques—Multiple and valid assessment techniques are employed in the gathering of behavioral data that are appropriate to the defined dimensions. Survey results were:

•  In-basket exercise (81.7%)

•  Leaderless group discussions (43.6% with assigned role, 59.4% without assigned role)

•  Interviews (57.1%)

•  Simulations (53.5%)

•  Analysis problems (49.3%)

•  Presentations (46.2%)

•  Fact-finding exercises (37.6%)

•  Skills and abilities tests (31%)

•  Self-evaluations (31.3%)

•  Peer evaluation (22%).

Multiple assessments—Each identified dimension is measured using multiple observation techniques that are reliable, objective, and efficient. Survey results indicate compliance:

•  Taking notes (95.2%)

•  Checklists (41.3%)

•  Behavioral observation scales (24.2%).

Simulation—Multiple job-related simulations are employed for each behavioral dimension, and they are as accurate to the job definitions as possible (53.5%).

Assessors—Multiple assessors are used for each assessee in order to increase reliability of the behavioral observations. Based on the results of the aggregation questions, the authors assume that multiple assessors were used. In terms of selection of assessors, survey results indicate that assessors were selected from:

•  Supervisor recommendation (53.3%)

•  Self-nomination (43.95%)

•  Performance on other selection devices (37.7%)

•  Line positions (48.5%)

•  Staff positions (25.6%)

•  Psychologists (5.7%).

Assessor training—Assessors are trained thoroughly and held to the standards of the Assessment Center Guidelines. In assessor training, the survey indicated that they received preparation in the following areas:

•  Assessment center method and procedure (98%)

•  Demonstration of procedure (87.7%)

•  Explanation of exercises (99%)

•  Typical rating errors such as halo and leniency (87.7%)

•  Data integration (67.5%)

•  Practice and feedback in classifying and evaluating behavior (82.8%)

•  Simulations of assessors acting as assessees (68%).

Recording behavior—Behavioral observations are systematically and accurately recorded at the time of their occurrence. The comments made on the use of multiple assessment methods seem to indicate at the very least note taking at the time of behavioral occurrence.

Reports—A consolidation of the observations for each assessor is recorded in a report in preparation for an integration discussion. In terms of the survey:

•  Oral feedback to assessees (70.5%)

•  Written performance reports to assessees (60.5%)

•  No performance feedback to assessees (8.1%).

Data integration—Data is pooled through a meeting of the assessors or through a statistical integration process that meets professionally accepted standards. The survey indicated that data was stored on average for 4.5 years, and that data integration occurred through:

•  Assessor consensus (84.1%)

•  Statistical aggregation (14%)

•  Voting (2.4%).
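The three integration routes reported in the survey can be illustrated with a minimal sketch. The ratings, the 1 to 5 scale, the spread threshold, and the function names are hypothetical and are not taken from the survey. Assessor consensus is a discussion process, so the sketch only flags the dimensions on which assessors disagree enough to warrant that discussion.

```python
from collections import Counter
from statistics import mean


def aggregate_mean(ratings):
    """Statistical aggregation: average the assessors' ratings."""
    return mean(ratings)


def aggregate_vote(ratings):
    """Voting: return the most common rating, or None when the top ratings tie."""
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def needs_consensus_discussion(ratings, max_spread=1):
    """Flag a dimension for assessor discussion when ratings differ widely.

    max_spread is an assumed threshold, not a published standard.
    """
    return max(ratings) - min(ratings) > max_spread


# Hypothetical ratings from three assessors on a 1-5 scale
ratings_by_dimension = {
    "leadership": [4, 4, 2],
    "planning": [3, 4, 3],
}

for dimension, ratings in ratings_by_dimension.items():
    print(
        dimension,
        round(aggregate_mean(ratings), 2),
        aggregate_vote(ratings),
        needs_consensus_discussion(ratings),
    )
```

In this illustration the averaged and voted results can differ for the same dimension, which is one reason a center would need to decide on its integration rule before ratings are collected.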

In another survey, this time conducted on assessment centers in the public sector, Lowry (1996) found serious flaws in the adherence to The Guidelines and Ethical Considerations for Assessment Center Operations (1989). The survey drew 105 usable responses from a random sample of personnel managers in state and local government and 440 police and fire chiefs in cities with a population of 50,000 or more, a 19% return rate. Survey results indicate that 80% of the organizations used job analysis.

Validation was reported as lacking or inappropriate. Of the sample of 105, five reported using a validation strategy, with 33 using a content validation strategy rather than construct validation strategies. Assessors were not always properly trained, with 29 respondents reporting fewer than four hours of assessor training and a reported median of four to seven hours.

Feedback to and from participants was not always provided, in that fewer than 50% of the centers provided final reports.

In recent years, assessment centers have transformed into a hybrid function, or what is termed “second-generation centers” (Griffiths & Goodge, 1994), serving both the assessment and development function for the employees of an organization. Griffiths and Goodge label another set of centers as “third-generation centers” that focus only on the development of employees. This certainly seems to be the direction that NASA wants to pursue. These centers differ primarily in how the collected data is used in an organization, in that a development-focused center's data is not used in any way to make decisions about the individual. Engelbrecht and Fisher (1995) examined whether participation in a developmental assessment center and subsequent on-the-job development improves managerial performance at the level of behavior. They found statistically significant differences between experimental and control groups, and found that the results persisted after a three-month period.

Woodruffe (1997) has found that several public and private sector companies are now embracing the “second-generation” hybrid centers as the model to emulate simply because of economic necessity. These organizations feel that the data generated on each employee is too valuable not to apply toward career and development decisions. Blaney and her colleagues (1993) adapted assessment center methodology to the nursing profession to help unit directors critique their managerial strengths and weaknesses, and used the center as a developmental tool as well as an evaluation tool. The researchers go as far as to describe their center as “a positive learning environment,” which certainly suggests a developmental mind-set.

Reviews and studies have indicated that assessment centers possess predictive validity for a variety of criteria. Byham (1992) specifies that the most important feature of the assessment center methodology is that it relates to future job performance rather than current job performance. In an optimal setting, the predictor constructs and predictor measures of the assessment center are meant to show a relationship to the criterion construct (usually job performance) through the criterion measures. Klimoski and Strickland (1977) question the very premises of assessment centers; we know that they seem to predict success, but what is it that we are measuring? Their question continues to reflect the status of assessment centers today: “are assessment centers valid, or merely prescient?” Their paper suggests that the superior ability of assessment centers in predicting managerial potential over performance criteria could be due to the assessor staff's intuitive grasp of the organization's norms and values, and to the assessors' proficiency in predicting who will most likely be promoted in the organization.

Other researchers have suggested that the popularity of assessment centers may be due to the content-oriented validation strategies that have been employed to justify the approach. Tenopyr (1977) argues that this content validity strategy suggests and eventually supports construct validity in the long run. This viewpoint certainly agrees with the Guidelines (1989), which specify job analysis as an important element of assessment center development. The problem occurs in the construction of exercises that attempt to accurately reflect the identified behaviors that are tied to the job description.

Sackett and Dreher (1982) criticized the content-oriented validation strategies in terms of exercise design as inadequate for proving assessment center validation. They studied the relationships among 15 dimensions (oral communications, oral presentation, written communications, stress tolerance, leadership, sensitivity, tenacity, risk, initiative, planning and organizing, management control, problem analysis, decision-making, decisiveness, responsiveness) across four exercises (analysis problem, in-basket, two separate group exercises). In three separate organizations, they found no convergence among the various dimensional measures across exercises, a high correlation between within-exercise ratings, and high correlations among different dimensions measured in different exercises. Sackett and Dreher (1982) conclude that the study does not bode well for the assumption that assessment centers generate dimensional scores that can be interpreted as representing complex constructs such as leadership, decision-making, or organizational acumen. The factor analysis of assessment center ratings reveals solutions that point to exercise factors, not dimension factors, suggesting that assessors are capturing exercise performance as opposed to stable personal characteristics (Joyce et al., 1994).

A significant issue in assessment center validation is that centers are designed in many organizations to measure constructs that are assumed to represent conceptually distinct job-related activities (Bycio et al., 1987). The problem occurs when the situational context of the activity influences the demonstration of the behavior. In these types of situations, convergent validity would not be demonstrated, since the behaviors elicited would be assumed to represent distinct job-related activities that tie to distinct constructs.

Bycio et al. (1987) observed the assessment center ratings of eight abilities (organizing and planning, analyzing, decision-making, controlling, oral communications, interpersonal relations, influencing, and flexibility) from each of five situational exercises (problem-solving group, in-basket, role-play, human relations group, interview) to determine their cross-situational consistency and discriminant validity. In constructing an intercorrelation matrix of the eight abilities and five situations, Bycio and his colleagues found that the assessment center ratings were largely situation-specific. Additionally, there were near-perfect correlations among many of the ability factors. These results suggest there is an issue at the very foundation of assessment center validity, since there is no possible way to represent all situational aspects of a particular job. Add to this the complexity of a cross-functional and cross-situational set of abilities such as managerial abilities, and we have reason for concern in terms of validity and purpose of assessment centers. There seems to be a need for careful assessment of construct validity as applied to assessment center methodology.

Carless and Allwood (1997) investigated the construct validity of managerial assessment center competencies (critical thinking, achievement and action, influencing, interpersonal and communications, self-management and learning) against the three psychological correlates of verbal ability, personality type, and dominant career interests. These psychological dimensions were based on strong research indicating that general intellectual ability (measured by the ACER High Test PL-PQ); high MBTI extraversion scores (measured by the Myers-Briggs Type Indicator Form G); and organizational role-congruency (measured by the Australian version of the Self-Directed Search) were important variables in overall managerial ability. The results of the study (N = 875) are consistent with previous research indicating a lack of support for the construct validity of multidimensional assessment center ratings. Carless and Allwood discovered through exploratory and confirmatory factor analyses that assessors did not differentiate among the specified managerial competencies. Verbal ability, interpersonal skills, and the career interest of Enterprising were significant predictors of managerial competencies.

Given the previous discussion on assessment center validity concerns, it is fair to say that researchers cannot completely explain why assessment centers are successful at generating ratings that are predictive of successful job performance. In a meta-analysis across research studies, the overall predictive validity of assessment centers (corrected r = .37) is acceptable, but researchers have found that there are many moderator variables that influence these validity coefficients (Gaugler et al., 1987). In a meta-analysis of 50 assessment center studies containing 107 validity coefficients, Gaugler discovered that the predictive validity of assessment centers increased with a concurrent increase in the number of female assessees, psychologist assessors, number of evaluation devices used, and use of peer evaluations in the assessment center process. She also found that predictive validity varied with the intended or stated purpose of the assessment center. This validity decreased when promotion was used as a criterion and increased when ratings of potential were used as a criterion.
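To make the idea of a pooled, corrected validity coefficient concrete, the following is a minimal sketch of two common meta-analytic steps: a sample-size-weighted mean of observed correlations, followed by the classical correction for unreliability in the criterion measure. The text above does not state which corrections Gaugler et al. applied, so both steps are assumptions for illustration, and the study values and the reliability figure are hypothetical.

```python
from math import sqrt


def weighted_mean_r(studies):
    """Sample-size-weighted mean of observed validity coefficients.

    studies is a list of (sample_size, observed_r) pairs.
    """
    total_n = sum(n for n, _ in studies)
    return sum(n * r for n, r in studies) / total_n


def correct_for_criterion_unreliability(r, criterion_reliability):
    """Classical disattenuation: divide r by the square root of the
    reliability of the criterion measure (for example, supervisor ratings)."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r / sqrt(criterion_reliability)


# Hypothetical studies: (sample size, observed validity coefficient)
studies = [(100, 0.30), (200, 0.36)]

observed = weighted_mean_r(studies)
corrected = correct_for_criterion_unreliability(observed, criterion_reliability=0.64)
print(round(observed, 3), round(corrected, 3))
```

Because the correction divides by a number no greater than one, a corrected coefficient is never smaller than the observed one, which is worth keeping in mind when comparing corrected and uncorrected validities across studies.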

The construct validity issue is particularly relevant in light of the use of assessment center methodology that serves developmental purposes. The feedback that is provided at these types of centers is directly linked to the construction of learning objectives and a plan for developmental activities to address the gaps uncovered in the exercises. Unless the center can accurately represent these constructs, the results of the feedback are erroneous and developmental plans and action could be useless or detrimental (Joyce et al., 1994).

In terms of the issue that there are no good explanations as to why or how assessment center ratings predict later performance, several researchers have focused on mechanisms that may mediate the performance of assessors in the assessment process. In particular, they have focused on the complexity of the assessor's job, the definitions and relationship of the behavioral dimensions to the assessment center exercises, and the consistency of the exercises in eliciting specific behaviors (Reilly et al., 1990).

In one study, Reilly and his fellow researchers used a quasi-experimental design in which 10 assessors were trained in assessment center techniques and tools, covering eight work samples over a two-day period. There were eight dimensions: teamwork/interpersonal skills, leadership, problem solving, work orientation, attention-to-detail, comprehending/following instructions, planning/organizing, and safety awareness. The focus of the study concerned the structured and unstructured group exercises (assembly procedure for a flashlight or an electrical device that operated a light and fan, and a group construction approach for building a robot). Following the first set of assessments, the assessors were asked to provide behavioral examples corresponding to each rating category. A total of 384 clarified and non-redundant behaviors were collected and used to create checklists ranging from 4 to 32 items across the categories. A new sample of assessees received their assessments using the checklists as a rating aid.

The behavioral checklists increased average convergent validity (same dimension across exercise) from .24 to .43. The average discriminant validity (different dimension within exercise) dropped from .47 to .41. The behavioral checklist sums had moderate correlation (average of .47) with corresponding dimension ratings. The study suggests there may be an issue with construct and content validity, and that there may be an unnecessarily narrow focus on the exercises and dimensions of an assessment center. The construct may have several different aspects, and may need to take into account peer ratings, supervisory ratings, records of accomplishment, and other indicators. The authors recommend that dimensions need to be better defined in the beginning, through better job analysis using subject matter experts. Better construct definitions, better operational definitions, and less cognitive demand on assessors will result in a better assessment center.
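The two averages reported above can be computed from a table of ratings indexed by dimension and exercise. The sketch below shows one way to do this, assuming Pearson correlations averaged without weighting; the study's exact procedure is not described here, and the dimension names, exercise names, and ratings are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def convergent_validity(ratings):
    """Mean correlation for the same dimension rated in different exercises."""
    rs = []
    for by_exercise in ratings.values():
        for ex1, ex2 in combinations(by_exercise, 2):
            rs.append(pearson(by_exercise[ex1], by_exercise[ex2]))
    return mean(rs)


def within_exercise_discriminant(ratings):
    """Mean correlation for different dimensions rated in the same exercise.

    Lower values indicate that assessors separate the dimensions better.
    """
    rs = []
    for dim1, dim2 in combinations(ratings, 2):
        for exercise in ratings[dim1].keys() & ratings[dim2].keys():
            rs.append(pearson(ratings[dim1][exercise], ratings[dim2][exercise]))
    return mean(rs)


# Hypothetical ratings: ratings[dimension][exercise] lists one score per participant
ratings = {
    "leadership": {"in_basket": [3, 4, 2, 5, 3], "group_task": [2, 4, 3, 5, 2]},
    "planning": {"in_basket": [3, 5, 2, 4, 3], "group_task": [4, 2, 3, 3, 5]},
}

print(round(convergent_validity(ratings), 2))
print(round(within_exercise_discriminant(ratings), 2))
```

A center with good construct validity would show a high first number and a low second number; the pattern the literature describes is the reverse.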

Joyce and his colleagues (1994) addressed the construct validity of assessment centers by using an alternative set of constructs that were based on the functional structure of managerial work as opposed to the traditional person-oriented dimensions. The researchers studied 75 state government middle managers in a centralized training program that contained two assessment centers and used the same exercises. One of the centers, however, rated the behaviors against personal attributes, while the other rated the behaviors against a set of task-oriented dimensions. Data was collected over a four-year period, with managers attending both assessment centers at different points in their career. The exercises were the in-basket, case analysis, written situational test, and performance counseling. The center that measured personal attributes rated oral communication, sensitivity, persuasion, organization/planning, initiative, and problem detection. The other center rated structuring and staffing, structuring jobs, recruitment/selection, establishing effective work group relationships, performance management, internal contracts, and handling information and daily problems.

Current trends in assessment center development in organizations point to a movement toward a developmental mission in addition to the traditional assessment mission. The literature indicates that the data generated by a hybrid assessment/development center and the cost of such an activity are forcing organizations to use these centers in as broad a context as possible. The issues that these studies raised for NASA and Project Mirror are the following:

•  Behavioral checklists can relieve raters of an overload of data, and help in the organization of information. More specific dimensional definitions based on strong theories may improve the ties to operational behavioral definitions in the center's exercises.

•  Targeting a smaller subset of behaviors across a broader set of exercises may allow for more accurate descriptions of elicited behaviors.
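A behavioral checklist of the kind discussed above can be represented as a small data structure in which the assessor ticks off observed behaviors and the dimension score is the checklist sum. This is a sketch only: the class name, the example behaviors, and the scoring by simple count are assumptions made for illustration and do not describe the Project Mirror instruments.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionChecklist:
    """Behaviors an assessor can tick off for one dimension in one exercise."""

    dimension: str
    behaviors: list[str]
    observed: set[int] = field(default_factory=set)

    def tick(self, index: int) -> None:
        """Record that the behavior at the given index was observed."""
        if not 0 <= index < len(self.behaviors):
            raise IndexError(f"no behavior at index {index}")
        self.observed.add(index)

    def score(self) -> int:
        """Checklist sum: the number of distinct behaviors observed."""
        return len(self.observed)

    def proportion(self) -> float:
        """Share of the listed behaviors that were observed."""
        return self.score() / len(self.behaviors)


# Hypothetical checklist items for one task dimension
checklist = DimensionChecklist(
    dimension="Managing Teams",
    behaviors=[
        "Asks quieter team members for input",
        "Summarizes agreed actions",
        "Assigns owners to open issues",
    ],
)
checklist.tick(0)
checklist.tick(2)
checklist.tick(2)  # ticking the same behavior twice does not inflate the score
print(checklist.score(), round(checklist.proportion(), 2))
```

Storing observed behaviors as a set keeps the assessor's task to recognition rather than recall, which is the reduction in cognitive load that the checklist approach is meant to provide.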

NASA Project Tasks and Methodology

There was certainly solid research evidence and justification for the assessment/development center approach. It was also important that NASA situate Project Mirror in the larger context of PMDP. The dimensions defined in the design of a center needed to be synergistic with prerequisites, follow-on activities, coaching, mentoring, and developmental assignments that follow attendance at the center. Project management development needed to occur within the context of a total organizational system that emphasized consistency across its activities and performance dimensions, both in the center and in job performance.

It was decided early on in the development of Project Mirror that it incorporate the relationship between personal dimensional constructs (such as oral communication) and functional/task-oriented dimensional constructs (such as establishing effective team relationships). The personal dimensions of project management were to be made operational in terms of task dimensions, and these task dimensions needed to be cross-functional when defined against broader competency descriptions under PMDP. (For example, oral communications may manifest itself in particular ways that are more effective under the task-oriented construct of effective team relationships.)

Simulations were rapidly becoming an important part of assessment/development center methodology. As technology makes it possible to recreate more accurate exercises and richer data situations, this approach needed to have an impact on the ability of exercises to elicit behaviors more closely associated with the construct dimensions due to the realism of simulations.

The trends that were evident in the assessment center literature indicate a willingness to entertain the possibility that the methodology might not lend itself to traditional definitions of construct validity, and that the methodology needed to be evaluated using other types of analysis that were more congruent with the interpretive requirements concerning each observed behavior within the context of each exercise. Simulations and intact teams were also considered as a way to clarify the validity issue, in that the exercises would gain fidelity and realism as technology allowed us to improve on the delivery mechanisms.

Based on the above background, history, and criteria, Project Mirror was originally organized under the following task dimensions:

•  Managing Tasks

•  Managing Teams

•  Managing Situations

•  Managing Self.

The critical tasks listed below were required to be complete before the first pilot run of Project Mirror.

Task 1

For each dimension, the NASA PMDP competency model was scaled for more definition with behavioral checklists and anchors involving:

•  Validated current PMDP competency categories and content/skill components

•  Developed behaviorally-anchored rating scales (BARS)

•  Developed critical incidents from the simulation protocol tied to prioritized competencies

•  Developed prerequisite exercises and preparation

•  Integrated a dichotomized personal and task-related component

•  Developed cognitive components

•  Integrated with PMI's global project management standards.

Task 2

For the individual development session:

•  Developed individual career development protocol to include work experience, education, training, developmental assignments, PMDP level, and a professional SWOT analysis

•  Selected and integrated personality/attitude instruments

•  Selected and integrated work habit and process instrument.

Task 3

For the development center cycles:

•  Developed a timed script integrating BARS checklists, simulation elements, and external critical incidents

•  Developed assessor protocol, rating instruments, training protocol

•  Defined simulation project success criteria for individuals and teams for simulation in-briefing, to include deliverables, resources, customer satisfaction.

Task 4

For the feedback cycle:

•  Developed between-session feedback protocol

•  Developed individual format and process

•  Developed management report for developmental issues

•  Developed process for Agency and Center input.

References


Blaney, D., Hobson, C., Meade, M., & Scodro, J. 1993. The Assessment Center: Evaluating Managerial Potential. Nursing Management, 24, pp. 54–58.

Bray, D., & Grant, D. 1966. The Assessment Center in the Measurement of Potential for Business Management. Psychological Monographs, 80, (17, Whole No. 625).

Bycio, P., Alvares, K., & Hahn, J. 1987. Situational Specificity in Assessment Center Ratings: A Confirmatory Factor Analysis. Journal of Applied Psychology, 72, pp. 463–474.

Engelbrecht, A., & Fisher, H. 1995. The Managerial Performance Implications of a Developmental Assessment Center Process. Human Relations, 48, pp. 387–404.

Carless, S., & Allwood, V. 1997. Managerial Assessment Centers: What Is Being Rated? Australian Psychologist, 32, pp. 101–105.

Gaugler, B., Rosenthal, D., Thornton III, G., & Bentson, C. 1987. Meta-analysis of Assessment Center Validity. Journal of Applied Psychology, 72, pp. 493–511.

Griffiths, P., & Goodge, P. 1994. Developmental Centers: The Third Generation. Personnel Management, 25, pp. 307–319.

Joyce, L., Thayer, P., & Pond III, S. 1994. Managerial Functions: An Alternative to Traditional Assessment Center Dimensions? Personnel Psychology, 47, pp. 109–121.

Klimoski, R., & Strickland, W. 1977. Assessment Centers: Valid or Merely Prescient. Personnel Psychology, 30, pp. 307–319.

Reilly, R., Henry, S., & Smither, J. 1990. An Examination of the Effects of Using Behavior Checklists on the Construct Validity of Assessment Center Dimensions. Personnel Psychology, 43, pp. 71–84.

Sackett, P., & Dreher, G. 1982. Constructs and Assessment Center Dimensions: Some Troubling Empirical Findings. Journal of Applied Psychology, 67, pp. 401–410.

Spychalski, A., Quinones, M., & Gaugler, B. 1997. A Survey of Assessment Center Practices in Organizations in the United States. Personnel Psychology, 50, pp. 71–90.

Task Force on Assessment Center Guidelines. 1989. Guidelines and ethical considerations for assessment center operations. Bridgeville, PA: Development Dimensions International.

Tenopyr, M. 1977. Content-Construct Confusion. Personnel Psychology, 30, pp. 47–54.

Woodruffe, C. 1997, February 20. Going Back a Generation [Article posted on Web site Dow Jones Interactive Publications Library]. Retrieved December 30, 1998, from the World Wide Web: …LL&noSnips=&highlight=on&Doc-Type=Text

Proceedings of the Project Management Institute Annual Seminars & Symposium
October 3–10, 2002 • San Antonio, Texas, USA


