FCEM Preparation: Critical Appraisal


Sunday, 1 February 2015

Critical Appraisal Practice Paper 4 (Diagnostic)

Total marks: 23
Time allowed: 90 mins

You might wish to download the paper. Do it in 90 minutes and then compare with the answers provided here.

Paper: Mallampati test as a predictor of laryngoscopic view

Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub. 2010 Dec;154(4):339-43.

Download (PDF, 65KB)

1. Provide a summary / abstract for the paper. (Up to 5 marks)

Answer

This should include all or some of the following points:

Aim. To determine the accuracy of the modified Mallampati test for predicting the difficulty of subsequent tracheal intubation.

Design. A cross-sectional, clinical, observational, non-blinded study. A quality analysis of anaesthetic care.

Setting. Operating theatres and department of anaesthesia in a university hospital in the Czech Republic.

Material and methods. Following local ethics committee approval and patients’ informed consent to anaesthesia, all adult patients (> 18 yrs) presenting for any type of non-emergency surgical procedures under general anaesthesia requiring endotracheal intubation were enrolled.

Prior to anaesthesia, Samsoon and Young's modification of the Mallampati test (modified Mallampati test) was performed.

Following induction, the anaesthetist described the laryngoscopic view using the Cormack-Lehane scale. Classes 3 or 4 of the modified Mallampati test were considered a predictor of difficult intubation. Grades 3 or 4 of the Cormack-Lehane classification of the laryngoscopic view were defined as impaired glottic exposure.

The sensitivity, specificity, positive and negative predictive value, relative risk, likelihood ratio and accuracy of the modified Mallampati test were calculated on 2x2 contingency tables.

Results. Of the total 1,518 patients enrolled, 48 had difficult intubation (3.2%).

We failed to detect as many as 35.4% of patients in whom glottis exposure during direct laryngoscopy was inadequate (sensitivity 64.6%).

Compared to the original article by Mallampati, we found lower specificity (82.4% vs. 99.5%), lower positive predictive value (0.107 vs. 0.933), higher negative predictive value (0.986 vs. 0.928), lower likelihood ratio (3.68 vs. 91.0) and accuracy (0.819 vs. 0.929).

Conclusion. When used as a single examination, the modified Mallampati test is of limited value in predicting difficult intubation in elective surgery patients.

2. Give three weaknesses of the study design and suggest improvements for these (up to 3 marks)

Answer

Strengths - there aren’t that many!

They did get ethical approval

The sample size was large

Sample recruited from a good cross section of patients undergoing elective surgery

Weaknesses – there are lots of these!

Patients studied are elective not emergency patients, so use in emergency situations cannot be inferred.

It’s unclear who assessed the Mallampati Score but it’s likely to be the same anaesthetist who also assessed the outcome (laryngoscopic grade).

It’s unclear what grade of anaesthetist assessed Mallampati or laryngoscopic grade.

It is unclear whether only one person assessed the Mallampati Score.

The person assigning the Cormack-Lehane grade was not blinded to the previously assigned Mallampati Score. In diagnostic studies it is important that the outcome is assigned without knowledge of the index test result.

The study could have been improved (for an EM readership) by:

Studying a group of patients needing emergency airway control.

The authors could have described in more detail who exactly was assigning the Mallampati Score (e.g. grade, level of training etc.) and how this was done.

More than one assessor could have done this. The authors could have got two people to do this to ensure consistency and assessed agreement using a Kappa statistic or similar. This would help the reader to assess if the Mallampati Score was reproducible enough to make it worth doing amongst a wider range of clinicians.

Someone else (who didn’t do the Mallampati Score) should have assigned the Cormack & Lehane Grade. The C&L grade given could have been influenced by knowledge of the Mallampati Score.
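The Kappa statistic suggested above compares the agreement two assessors actually achieve with the agreement expected by chance. The following is a minimal sketch of Cohen's kappa, assuming two assessors each label the same patients as "difficult" or "easy"; the ratings are invented for illustration and do not come from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: proportion of patients given the same label.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical Mallampati predictions from two assessors (not study data).
assessor_1 = ["difficult", "easy", "easy", "easy", "difficult",
              "easy", "easy", "easy", "easy", "difficult"]
assessor_2 = ["difficult", "easy", "easy", "difficult", "difficult",
              "easy", "easy", "easy", "easy", "easy"]

kappa = cohen_kappa(assessor_1, assessor_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 is perfect agreement and 0 is agreement no better than chance, so the raw 80% agreement in this made-up example shrinks considerably once chance is accounted for.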

3. Name one checklist which is useful in evaluating the quality of diagnostic papers such as this. Give two further questions / points within this checklist not covered by the weaknesses you have mentioned in question 2. above. (up to 3 marks)

Answer

Common checklists include QUADAS and STARD (see below)

The STARD statement can be found here (it is similar to QUADAS):

http://www.stard-statement.org/

In this study the “index test” is the Mallampati Score and the “reference standard” is the Cormack & Lehane Grade. You should go through this checklist with the study and see how many weaknesses you can now identify!

4. Table 3 below is taken from the results section.

Summarise the results in the table in one sentence. What is the Mann-Whitney U test? (Up to 2 marks)

Answer

Men are taller and heavier than women!

The Mann-Whitney U test is used to compare continuous (or ordinal) data in two independent groups when the data are not normally distributed (i.e. a non-parametric test is needed). It is analogous to a t-test, which does the same thing but for normally distributed data.
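To show what the test actually counts, here is a small sketch that computes only the U statistic (not the p-value) by comparing every value in one group with every value in the other. The heights are invented for illustration and are not taken from Table 3.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): U_a counts pairs where group_a beats group_b."""
    u_a = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5  # ties count as half a win for each group
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical heights in cm (not study data).
men = [180, 175, 183]
women = [160, 165, 178]

u_men, u_women = mann_whitney_u(men, women)
print(u_men, u_women)
```

The more lopsided the two U values, the stronger the evidence that one group tends to have larger values than the other; a statistics package would then convert U into a p-value.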

5. Construct a 2 x 2 table illustrating the main data from the current study (not Mallampati’s original). (Up to 2 marks)

Answer

	Actual difficulty of intubation (Cormack-Lehane grade)
Predicted difficulty of intubation (Mallampati class)	Difficult (3/4)	Easy (1/2)	Total
Difficult (3/4)	a = 31	b = 258	289
Easy (1/2)	c = 17	d = 1212	1229
Total	48	1470	1518

6. Use your table in 5. above to demonstrate how the positive likelihood ratio and the accuracy were calculated. Explain how you would interpret the positive likelihood ratio in this study. (4 marks)

Answer

Positive LR = Sensitivity / (1 – Specificity)

Sensitivity = a / (a + c) = 31 / 48 = 0.646

Specificity = d / (b + d) = 1212 / 1470 = 0.824

Positive LR = 0.646 / 0.176 = 3.67

Accuracy = total number (%) of “correct” predictions = (a + d) / (a + b + c + d)

Accuracy = 1243 / 1518 = 81.9%

LR+ above 10 means that a positive test (i.e. a higher Mallampati Score) will significantly increase the post test probability (of a difficult intubation) enough to make the test worth doing. Figures below 10 (like 3.67) mean that a positive test doesn’t really alter your chances of predicting the outcome enough to make it worth doing.
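If you want to check the arithmetic, the calculations above can be sketched in a few lines. The cell counts are those from the 2 x 2 table in question 5; the function and variable names are my own.

```python
def diagnostic_metrics(a, b, c, d):
    """a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Counts from the 2 x 2 table for the current study.
metrics = diagnostic_metrics(a=31, b=258, c=17, d=1212)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Working from unrounded sensitivity and specificity gives a positive LR of 3.68 (the figure in the abstract); the 3.67 above comes from dividing the rounded values.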

7. The authors used Fisher's Exact Test to statistically compare their results with the results of Mallampati. Describe the indications for using this test as opposed to a Chi Squared Test. (Up to 2 marks)

Answer

Fisher's Exact Test is used for 2 x 2 data when the expected count in any of the 4 boxes is “low”. The rule of thumb is that if the expected number is less than 10 in any box then Fisher's Exact Test should be used. The expected number in any box can be calculated by multiplying the column total by the row total and dividing by the overall number of patients / data points.

The authors appear to have used 2 x 2 tables for each of the possible outcomes. An example for “true negative” is illustrated below.

	Adamus et al	Mallampati et al	Total
True negative	1212	181	1393
Not true negative	306	29	335
Total	1518	210	1728

Thus for box d (value 29), the expected value is 210 (the total column value) x 335 (the total row value) / 1728 (the overall total). This is 40.7. In this case all the expected values are greater than 10 and so Chi Squared could have been used. However, for some of the others (e.g. false positives), the expected values will be low so I suppose the authors went for Fisher's Exact Test for consistency!
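The expected counts for all four boxes can be checked with a short sketch like the one below. It uses the “true negative” table above and the rule of thumb of 10 quoted in this answer; the function name is my own.

```python
def expected_counts(table):
    """Expected count per cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: true negative, not true negative.
# Columns: Adamus et al, Mallampati et al.
observed = [[1212, 181],
            [306, 29]]

expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
print(f"expected value for box d: {expected[1][1]:.1f}")
print("Chi Squared acceptable" if smallest >= 10 else "use Fisher's Exact Test")
```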

8. What are your conclusions overall? Is this paper going to influence your practice? Briefly suggest any ideas for future research in this area? (Up to 2 Marks)

Answer

Overall this paper is not great and is unlikely to influence your practice. There are multiple weaknesses and potential areas of bias. In addition the results are very different from the original Mallampati Study.

The authors hint at the other factors which allow a good assessment of the airway (e.g. weight of patients etc.). A better study might look at the whole LEMON acronym, which you are probably familiar with from ATLS. It could be done in an ED setting with independent assessment of LEMON and the final C&L grade / ease of intubation.

Finally given that the study results are so different from the original Mallampati Study you could propose some secondary research (a systematic review, Best BET or even CTR!) to answer the question posed.

Mallampati test as a predictor of laryngoscopic view. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub. 2010 Dec;154(4):339-43.

A clinical sign to predict difficult tracheal intubation: a prospective study. Can Anaesth Soc J. 1985 Jul;32(4):429-34.


Tuesday, 6 January 2015

Critical Appraisal Practice Paper 2

Total marks: 23
Time allowed: 90 mins

Paper: High-sensitivity versus conventional troponin in the emergency department for the diagnosis of acute myocardial infarction

Download (PDF, 446KB)

1. Provide a summary / abstract for the paper. (Up to 5 marks)

Answer

This should include all or some of the following points:

Background: Recently, newer assays for cardiac troponin (cTn) have been developed which are able to detect changes in concentration of the biomarker at or below the 99th percentile for a normal population.

Objective: The objective of this study was to compare the diagnostic performance of a new high-sensitivity troponin T (HsTnT) assay to that of conventional cTnI for the diagnosis of acute myocardial infarction (AMI) according to pretest probability (PTP).

Design: A prospective observational study of consecutive patients who presented to emergency departments in France with chest pain suggestive of AMI.

Setting: Three French Emergency Departments.

Participants: Adult patients presenting with chest pain suggestive of acute myocardial infarction with onset within the previous 6 hours.

Outcome measures: Levels of HsTnT were measured at presentation, blinded to the emergency physicians, who were asked to estimate the empirical pretest probability (PTP) of AMI.

The discharge diagnosis (of AMI or not) was adjudicated by two independent experts (both ED physicians) on the basis of all available data up to 30 days post presentation.

Results: A total of 317 patients were included, comprising 149 (47%) who were considered to have low PTP, 109 (34%) who were considered to have moderate PTP and 59 (19%) who were considered to have high PTP.

AMI was confirmed in 45 patients (14%), 22 (9%) of whom were considered to have low to moderate PTP and 23 (39%) of whom were considered to have high PTP (P < 0.001).

In the low to moderate PTP group, HsTnT levels ≥ 0.014 μg/L identified AMI with a higher sensitivity than cTnI (91% (95% CI 79 to 100) vs. 77% (95% CI 60 to 95); P = 0.001), but the negative predictive value was not different (99% (95% CI 98 to 100) vs. 98% (95% CI 96 to 100)).

There was no difference in area under the receiver operating characteristic (ROC) curve between HsTnT and cTnI (0.93 (95% CI 0.90 to 0.98) vs. 0.94 (95% CI 0.88 to 0.97), respectively).

Conclusion: In patients with low to moderate PTP of AMI, HsTnT is slightly more useful than cTnI. Our results confirm that the use of HsTnT has a higher sensitivity than conventional cTnI.

2. Give three strengths and three weaknesses of the study design (up to 3 marks)

Answer

Strengths:

Prospective study; consecutive patients

Done in a European ED setting

Pragmatic approach to other aspects of patient care (very real world)

Outcome measures clearly defined (AMI, unstable angina, other)*. At least two independent people decided upon the outcome.

Treating and enrolling physicians blind to the results of HsTnT assay.

Biologists assessing the level of HsTnT blind to patient data.

Weaknesses:

Small numbers of patients

Different analysers measured the HsTnT levels

*ED physicians (not cardiologists) diagnosed AMI and other outcomes.

Not all patients were admitted – it's unclear whether active attempts were made to contact discharged patients to find out what happened to them at 30 days, or whether the authors simply used what information they had to decide upon outcome. I suspect the latter.

Different timings for the HsTnT level based only on when the patient arrived.

Gestalt used to risk stratify patients (rather than a standardised scoring system)

3. Explain why this study may have been subject to “incorporation” or “work up” bias. Suggest two ways in which this could have been avoided in this study (up to 4 marks)

Answer

It appears to me that the result of the conventional troponin test (cTnI) was used in some cases to determine the need for admission (or not). Since all patients had a conventional troponin test, those with a low pre-test probability of AMI and a negative cTnI were probably discharged. This seems to amount to 39% of the patients (61% were admitted).

Thus the cTnI result helps to determine whether or not further investigations are carried out. It is not too great a leap to assume that the results of cTnI and those of HsTnT are highly correlated.

Ideally the diagnosis of the reference standard (which in this case is based on a review of notes and subsequent investigations in hospital) should be made entirely independently of the interpretation of the diagnostic test under evaluation.

The test under evaluation is the troponin. However, some of the investigations in hospital were only arranged if the troponin was positive. Other patients were discharged. Hence some of the patients may have been “deprived” of the opportunity to be diagnosed with AMI.

Incorporation bias thus results in an overestimation of the diagnostic accuracy of a test.

Further explanation of incorporation bias / work-up bias can be found here: http://www.cjem-online.ca/v10/n2/p174

Ways to reduce the influence of incorporation bias in this study include:

Changing the primary outcome measure to, for example, death by 30 days and obtaining the answer for all patients (but probably fewer deaths and hence a bigger study would be needed)

Admitting all patients and performing a standardised set of investigations regardless of the troponin result.

4. The following is an excerpt from the methods section:

“We followed most of the recommendations concerning the reporting of diagnostic studies set forth by the Standards for Reporting of Diagnostic Accuracy initiative”

Give 4 elements of the STARD guidelines which should be reported in a diagnostic study such as this. (Up to 4 marks)

Answer

STARD checklist for reporting of studies of diagnostic accuracy (version January 2003)

Section and Topic	Item #	Description	On page #
TITLE/ABSTRACT/KEYWORDS	1	Identify the article as a study of diagnostic accuracy (recommend MeSH heading "sensitivity and specificity").
INTRODUCTION	2	State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups.
METHODS
Participants	3	The study population: The inclusion and exclusion criteria, setting and locations where data were collected.
	4	Participant recruitment: Was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard?
	5	Participant sampling: Was the study population a consecutive series of participants defined by the selection criteria in item 3 and 4? If not, specify how participants were further selected.
	6	Data collection: Was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?
Test methods	7	The reference standard and its rationale.
	8	Technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard.
	9	Definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard.
	10	The number, training and expertise of the persons executing and reading the index tests and the reference standard.
	11	Whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test and describe any other clinical information available to the readers.
Statistical methods	12	Methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).
	13	Methods for calculating test reproducibility, if done.
RESULTS
Participants	14	When study was performed, including beginning and end dates of recruitment.
	15	Clinical and demographic characteristics of the study population (at least information on age, gender, spectrum of presenting symptoms).
	16	The number of participants satisfying the criteria for inclusion who did or did not undergo the index tests and/or the reference standard; describe why participants failed to undergo either test (a flow diagram is strongly recommended).
Test results	17	Time-interval between the index tests and the reference standard, and any treatment administered in between.
	18	Distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition.
	19	A cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard.
	20	Any adverse events from performing the index tests or the reference standard.
Estimates	21	Estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals).
	22	How indeterminate results, missing data and outliers of the index tests were handled.
	23	Estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done.
	24	Estimates of test reproducibility, if done.
DISCUSSION	25	Discuss the clinical applicability of the study findings.

5. The following figure is taken from the results section of the paper:

Briefly describe how the ROC curve is generated. Broadly what does the “area under the curve” (AUC) tell you about the test? What value for AUC would be given for a perfect test and for a completely useless test? (Up to 3 marks)

Answer

ROC curves can be generated by using different cut off values to represent “positive” and “negative” for tests with continuous data points (e.g. quantitative D-Dimer, troponin I). Several cut off points are chosen and the sensitivity and specificity of the test at each one is calculated. The ROC curve is simply a plot of the results with 1 – specificity on the x-axis and sensitivity on the y-axis (as above).

The ROC curve gives an assessment of the overall performance of the test at different cut off points for positive and negative. The larger the area under the curve then the better the test. Good tests will have curves which tend towards the top left of the graph.

A perfect test will have an AUC of 1.0 and a useless test will have an AUC of 0.5 (a straight diagonal line at 45 degrees to the origin, representing a 50:50 chance or a test no better than tossing a coin).

You will note that the 95% confidence interval for the AUC of HsTnT ranges from 0.881 to 0.971. i.e. the test appears to have good utility in the diagnosis of AMI.
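The procedure described in this answer can be sketched as follows: try each observed value as a cut off, record (1 – specificity, sensitivity) at each, and estimate the area under the resulting points with the trapezium rule. The troponin values and diagnoses below are invented for illustration; they are not data from the paper.

```python
def roc_points(values, has_disease):
    """Return (1 - specificity, sensitivity) for each candidate cut off."""
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    # A cut off above every value calls nobody positive: the (0, 0) corner.
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, has_disease) if v >= cutoff and d)
        fp = sum(1 for v, d in zip(values, has_disease) if v >= cutoff and not d)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezium rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical troponin levels and final diagnoses (True = AMI).
troponin = [0.005, 0.010, 0.012, 0.020, 0.030, 0.050]
ami = [False, False, True, False, True, True]

auc = area_under_curve(roc_points(troponin, ami))
print(f"AUC = {auc:.3f}")
```

A test that separates the two groups completely gives an area of 1.0 with this sketch, and a test whose result carries no information gives 0.5, matching the values quoted above.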

6. The following is a portion of one of the results tables:

a. Write one sentence explaining the results in each of the 4 columns pertaining to all patients with a positive cTnI. (4 marks)

Answer

71% of patients who are diagnosed with AMI will have a positive cTnI at presentation in the ED. Thus a negative test is not very good at ruling out likely development of AMI (SnOUT). It is not a very sensitive test when taken this early.

97% of patients who are not diagnosed with AMI will have a negative cTnI at presentation in the ED. cTnI is quite a specific test i.e. a positive test tends to rule AMI in (SpIN).

Only 78% of patients with a positive cTnI at presentation will be diagnosed with AMI (the PPV).

95% of patients with a negative test will not be diagnosed with AMI (the NPV).

The 95% confidence intervals around these levels indicate the range of plausible results (i.e. the range within which we are 95% certain that the real result lies).

b. Briefly explain how the prevalence of the target condition in the population influences the sensitivity of a test and its negative predictive value. (2 marks)

Answer

The sensitivity of a test is unaffected by the prevalence of the disease as it only relates to patients who actually have the condition. Similarly the specificity of a test is unrelated to the prevalence of the disease as it only relates to those without the condition.

The negative predictive and positive predictive value, however, are both influenced by the prevalence of the disease in the population. If the prevalence of a condition in a population is small then the reported NPV will be higher than when using the same test in populations with a greater prevalence of the disease. This is one reason why it is important to ensure that the study patients are similar to your own before implementing the results of a study into your practice.

Put another way, if there are hardly any patients with the condition in any case, then even a useless test such as tossing a coin might have quite a good NPV (because there aren’t that many cases to miss anyway). If, however, half the patients had the disease, a “negative” coin toss would be wrong about half the time and the NPV would fall.

Conversely, PPV for the same test is higher in populations with a high prevalence of disease and lower if the disease is uncommon.
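The effect of prevalence on predictive values can be illustrated with a short sketch. It takes a fixed sensitivity and specificity and recomputes PPV and NPV at different prevalences; the coin-toss test (sensitivity and specificity both 50%) and the two prevalences are illustrative assumptions, not figures from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# A coin toss as the "test": sensitivity = specificity = 0.5.
ppv_rare, npv_rare = predictive_values(0.5, 0.5, prevalence=0.02)
ppv_common, npv_common = predictive_values(0.5, 0.5, prevalence=0.50)

print(f"prevalence  2%: PPV {ppv_rare:.2f}, NPV {npv_rare:.2f}")
print(f"prevalence 50%: PPV {ppv_common:.2f}, NPV {npv_common:.2f}")
```

Sensitivity and specificity are inputs here and never change; only the predictive values move with prevalence, which is the point made above.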

7. The following is another section from the same results table:

Write down the formulae for calculating the positive and negative likelihood ratios.

What do the positive and negative likelihood ratio results given in the top row mean? (Up to 3 marks)

Answer

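Positive likelihood ratio: LR+ = sensitivity / (1 – specificity)

Negative likelihood ratio: LR- = (1 – sensitivity) / specificity

LR+ = 0.71 / 0.03, approximately 21.5 (the rounded figures give about 23.7; the difference is presumably due to rounding of the sensitivity and specificity)

LR- = 0.29 / 0.97, approximately 0.32 (the rounded figures give about 0.30)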

An LR+ above 10 means that a positive test raises the probability of disease (from pre-test to post-test) enough to make the test worth doing. Figures like 21.5 mean that a positive test (cTnI) significantly increases the chance of predicting AMI (and are thus helpful in decision making).

An LR- of less than 0.1 means that a negative test lowers the probability of disease (from pre-test to post-test) enough to make the test worth doing. Thus, the figure of 0.32 means that a negative test (cTnI) is not very helpful in excluding subsequent AMI (mainly because it is done at arrival in the ED and not at the usual 12 hours).

Fagan's nomogram is used to convert pre-test to post-test probabilities based on LRs.

Fagan's nomogram
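The nomogram is a graphical shortcut for a simple calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. Here is a minimal sketch, assuming the pre-test probability is the 14% prevalence of AMI in this study and using the likelihood ratios quoted above; the function name is my own.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


pre_test = 0.14  # assumed: overall prevalence of AMI in the study

after_positive = post_test_probability(pre_test, 21.5)  # LR+ for cTnI
after_negative = post_test_probability(pre_test, 0.32)  # LR- for cTnI

print(f"after a positive cTnI: {after_positive:.0%}")
print(f"after a negative cTnI: {after_negative:.0%}")
```

Under these assumptions a positive result takes the probability of AMI from 14% to roughly 78%, whereas a negative result still leaves it at roughly 5%, which is why the negative test is less useful for ruling AMI out.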

8. What are your conclusions overall? Is this paper going to influence your practice? (Up to 2 Marks)

Answer

The HsTnT does appear to have a better sensitivity (93%) than the traditional troponin when taken on arrival in ED. This may be helpful in improving time to discharge following attendance with chest pain and may reduce admissions.

However, the 95% CIs for the sensitivity are 89 – 100, and the possible sensitivities are thus too low to allow it to be confidently used to rule out MI based on this study.

Further studies are required.

Note: The answers were not written by me; they were given to me when I was preparing for my exam. If anyone has any questions, just drop a line below and we can all discuss.
