The scoring systems use complex algorithms that are initially trained on human-rated samples.
Thousands of marks and examples of performance are used to train these automated scoring systems until sufficiently high levels of reliability are achieved. The field testing of PTE involved assessing responses from over 10,000 candidates from more than 120 different language groups.
For the speaking component, nearly 400,000 responses were collected and marked by human raters. The correlation between the human scores and the machine scores for an overall measure of speaking was 0.96, demonstrating the reliability of the measure of speaking in PTE Academic. The statistic used to assess how accurately a language test measures an individual's ability is known as the 'Standard Error of Measurement' or SEM.
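The two statistics above can be illustrated with a short sketch. The Pearson correlation measures agreement between human and machine scores, and the SEM is conventionally computed as the score standard deviation times the square root of one minus the reliability coefficient. The score lists, standard deviation, and reliability value below are hypothetical illustrations, not PTE field-test data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def sem(sd, reliability):
    """Standard Error of Measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical human vs. machine scores on the 10-90 PTE scale.
human = [55, 62, 71, 48, 80, 66, 59, 74]
machine = [54, 63, 70, 50, 79, 67, 58, 73]
print(round(pearson_r(human, machine), 3))

# Hypothetical SD of 10 score points with reliability 0.96:
print(sem(10, 0.96))  # -> 2.0
```

Note the inverse relationship: the higher a test's reliability coefficient, the smaller its SEM, which is why comparing SEM values across tests is a way of comparing their measurement precision.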
Comparing SEM data published for the other major English tests recognized by government bodies and higher education institutions, PTE Academic has the highest reliability estimates of all the major academic English tests, for both the overall score and the communicative skills scores.
For more information on Standard Error of Measurement (SEM) and the scoring concordance of PTE Academic with other major English language assessments, please refer to pages 42-50 of the PTE Academic Score Guide.