Research Article

20 September 2017

Comparison of therapies in lumbar degenerative disc disease: a network meta-analysis of randomized controlled trials

Authors: Jack Zigler, Nicole Ferko, Chris Cameron, and Leena Patel

Publication: J. Comp. Eff. Res.

Volume 7, Number 3

https://doi.org/10.2217/cer-2017-0047

Abstract

Aim: To compare the efficacy and safety of total disc replacement, lumbar fusion, and conservative care in the treatment of single-level lumbar degenerative disc disease (DDD). Materials & methods: A network meta-analysis was conducted to determine the relative impact of lumbar DDD therapies on Oswestry Disability Index (ODI) success, back pain score, patient satisfaction, employment status, and reoperation. Odds ratios or mean differences and 95% credible intervals were reported. Results: Six studies were included (1417 participants). Overall, the activL total disc replacement device had the most favorable results for ODI success, back pain, and patient satisfaction. Results for employment status and reoperation were similar across therapies. Conclusion: activL substantially improves ODI success, back pain, and patient satisfaction compared with other therapies for single-level lumbar DDD.

First draft submitted: 20 June 2017; Accepted for publication: 25 August 2017; Published online: 20 September 2017

Degenerative disc disease (DDD) in the lumbar spine is a leading cause of pain and disability in adults and is the reason for over 90% of spinal surgeries performed [1,2]. Treatment for patients with symptomatic lumbar DDD always begins with rehabilitation and pain management [3]. For patients with functionally disabling discogenic low back pain that does not improve after conservative care, surgical options such as lumbar spine fusion and arthroplasty, which is the surgical reconstruction or replacement of a joint, may be considered. Unlike lumbar fusion, which is broadly indicated for several spinal disorders [4], arthroplasty is generally only indicated in patients with discogenic low back pain due to DDD between the fourth and fifth lumbar vertebrae (L4/L5) or fifth lumbar and first sacral vertebrae (L5/S1) that has been unresponsive to conservative care [5,6].

Surgical approaches to the treatment of discogenic low back pain are associated with risks of procedure- and device-related complications and additional stress on other spinal segments, potentially leading to repeat surgeries or adjacent segment disease [7–10]. Since its commercial introduction over 10 years ago, arthroplasty has remained an effective alternative to current therapies. In patients with lumbar DDD, evidence from randomized controlled trials (RCTs) and meta-analyses demonstrates greater success with arthroplasty than with lumbar fusion or conservative care and lower risk of adjacent segment disorder with arthroplasty than with lumbar fusion [11–21]. Advancements in the design and technology of total disc replacement (TDR) devices may further improve effectiveness and safety outcomes in arthroplasty [22]. First-generation TDR devices such as Charité and ProDisc-L offered a degree of motion preservation to protect against the increased mechanical stress imposed by lumbar fusion. The latest-generation device, the activL® Artificial Disc (Aesculap, Tuttlingen, Germany), advances motion-preserving technology to more closely align with the natural motion of the healthy human spine, thereby potentially further reducing wear on facet joints and adjacent segments [22].

Despite the evolution of treatments for lumbar DDD and the proven benefits of arthroplasty over lumbar fusion, the design of recent RCTs has so far precluded comparison of newer TDR devices such as activL with fusion or conservative care, because the comparators in these trials consisted of two other TDR devices [23,24]. Thus, there is a need for a comprehensive analysis that allows comparison across all relevant treatments for lumbar DDD. A network meta-analysis (NMA) enables the indirect comparison of two treatments that have not been directly compared in clinical trials but have a direct comparator in common. Therefore, the objective of this study was to conduct an NMA of RCTs to evaluate the efficacy and safety of the activL Artificial Disc compared with currently available alternative therapies, including other TDR devices for arthroplasty, lumbar fusion approaches and conservative care modalities, in patients with single-level lumbar DDD.
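The idea of an indirect comparison through a common comparator can be illustrated with a minimal sketch. It uses the simpler frequentist adjusted indirect comparison (often attributed to Bucher), not the Bayesian model used in this study, and all input values are hypothetical rather than taken from the included trials.

```python
import math

def indirect_log_or(log_or_ab, se_ab, log_or_cb, se_cb):
    """Adjusted indirect comparison of A vs C through common comparator B.

    On the log odds ratio scale the indirect estimate is the difference of the
    two direct estimates, and the variances of the direct estimates add.
    """
    log_or_ac = log_or_ab - log_or_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    return log_or_ac, se_ac

def or_with_interval(log_or, se, z=1.96):
    """Back-transform a log odds ratio and its approximate 95% interval."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical direct estimates (assumptions for illustration only):
# treatment A vs B with OR 1.5, and treatment C vs B with OR 0.75.
log_or_ab, se_ab = math.log(1.5), 0.30
log_or_cb, se_cb = math.log(0.75), 0.40

log_or_ac, se_ac = indirect_log_or(log_or_ab, se_ab, log_or_cb, se_cb)
or_ac, lower, upper = or_with_interval(log_or_ac, se_ac)
print(f"Indirect OR (A vs C): {or_ac:.2f} ({lower:.2f} to {upper:.2f})")
```

The indirect estimate is less precise than either direct estimate because the two variances add, which is one reason intervals from networks built on single-study connections tend to be wide.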

Materials & methods

A Bayesian statistical model was used to conduct the NMAs. Reporting of the analysis follows the Preferred Reporting Items of Systematic Reviews and Meta-analyses (PRISMA) Extension Statement for Reporting of Systematic Reviews Incorporating Network Meta-analyses of Health Care Interventions [25].

Data sources & search strategy

A comprehensive strategy was developed to search the PubMed and Cochrane Central Register of Controlled Trials (CENTRAL) databases to identify relevant RCTs published until December 2015 (Supplementary Appendix 1). Bibliographic and grey literature searches were also conducted. Only English language articles were reviewed.

Eligibility criteria & study selection

RCTs that included patients with discogenic low back pain due to single-level lumbar DDD, who were unresponsive to conservative therapy, were considered if they compared a TDR device (Charité, ProDisc-L, Maverick, Kineflex-L, Flexicore, activL) with other TDR devices, fusion (anterior, posterior, or circumferential) or conservative care (rehabilitation, exercise). Studies must have reported at least one of the following outcomes at 2-year follow-up: Oswestry Disability Index (ODI) success (≥15-point improvement), mean back pain score (Visual Analog Scale [VAS] or Numeric Rating Scale [NRS]), patient satisfaction, employment status, reoperations (device failures resulting in removal, revision, reoperation or supplemental fixation) and device-related serious adverse events (SAEs). Studies were excluded if they had >50% loss-to-follow-up, were single-arm or observational studies, reported outcomes for a subset of the population captured in the full study (i.e., double reporting) or presented data for treatment arms in an aggregated form. Studies were selected by two independent reviewers, with differences resolved by discussion resulting in consensus or by a third party.

Data extraction

Data on study and patient baseline characteristics, treatments, loss to follow-up and key outcomes pertinent to this analysis were extracted by a single reviewer and checked for accuracy by a second reviewer; disagreements were resolved by consensus or by a third party.

Some data were extracted by digitization (TechDig v2.0 digitizing software), and mean differences were calculated from the baseline and follow-up data provided. For continuous outcomes, a normal distribution was assumed. No data imputation was required.
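As a worked illustration of deriving a mean difference from baseline and follow-up data, the sketch below computes the change from baseline in each arm and then the between-arm difference in change. The function names and the back pain values are hypothetical and are not taken from any included study.

```python
def change_from_baseline(baseline_mean, followup_mean):
    """Mean change from baseline; negative values indicate a reduction."""
    return followup_mean - baseline_mean

def mean_difference(change_treatment, change_comparator):
    """Between-arm difference in mean change (treatment minus comparator)."""
    return change_treatment - change_comparator

# Hypothetical mean back pain scores on a 0-100 mm VAS (assumed values).
change_a = change_from_baseline(baseline_mean=80.0, followup_mean=20.0)
change_b = change_from_baseline(baseline_mean=78.0, followup_mean=30.0)

md = mean_difference(change_a, change_b)
print(f"Change A: {change_a}, change B: {change_b}, MD: {md}")
```

With these assumed values, a negative mean difference means the first arm had the larger reduction in back pain score.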

Quality assessment

The Cochrane Collaboration's tool for assessing risk of bias was used to evaluate study quality. The tool assesses the risks of selection bias (sequence generation, allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessment), attrition bias (incomplete outcome data), reporting bias (selective outcome reporting) and other bias [26]. Two independent reviewers scored studies for low, unclear or high risk of bias, with discrepancies resolved by a third party.

Data synthesis & analysis

An NMA permits comparison of treatments across a network of trials, where different treatment strategies are employed within the same or very similar patient population. The NMA approach allows any two treatments within the network of evidence to be compared, even when a direct comparison from a trial is not available. Each treatment was considered as an individual node in the network, with networks constructed using NetMetaXL (Figure 2) [27] and analyses performed using WinBUGS Software (version 1.4.3, MRC Biostatistics Unit, Cambridge, UK). For outcomes such as device-related SAEs where a network could not be constructed because of insufficient data, an NMA was not conducted. Analyses were conducted using Bayesian fixed effects models that were fit to the data with a binomial likelihood. The models and priors used were consistent with recommendations made by the NICE Decision Support Unit [28]. Results of pairwise comparisons were reported as odds ratios (ORs) and mean differences (MDs). ORs were summarized using the median and 95% credible interval (CrI). ORs less than or greater than 1 favor one of the compared lumbar DDD therapies over the other, whereas ORs equal to 1 indicate equivalency. Mean differences were summarized using the mean and 95% CrI; MDs less than or greater than 0 favor one of the compared lumbar DDD treatments over the other, whereas MDs equal to 0 indicate equivalency. In brief, CrIs can be interpreted as the Bayesian equivalent to confidence intervals. Credible intervals excluding 1 for ORs or 0 for MDs indicate statistical significance. The probability that each treatment was the most efficacious was calculated by counting the number of times each treatment had the highest OR or MD in the model [29]. A single numerical summary of the probability that a treatment is best, second best and so on, known as the surface under the cumulative ranking curve area (SUCRA), was also calculated for each treatment. More detailed information regarding the statistical methods is included in Supplementary Appendix 3.
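The ranking summaries described above can be sketched as follows. This is an illustrative calculation from a small set of hypothetical posterior draws, not the WinBUGS code used in the analysis; the function names and the draws are assumptions. SUCRA is computed here as the average of the cumulative rank probabilities over the first a − 1 ranks, where a is the number of treatments.

```python
def rank_probabilities(draws, higher_is_better=True):
    """Return p[t][r]: share of draws in which treatment t holds rank r (0 = best).

    `draws` is a list of posterior draws; each draw lists one effect value
    per treatment. Ties are broken by treatment order.
    """
    n_treat = len(draws[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for draw in draws:
        order = sorted(range(n_treat), key=lambda t: draw[t],
                       reverse=higher_is_better)
        for rank, treatment in enumerate(order):
            counts[treatment][rank] += 1
    return [[c / len(draws) for c in row] for row in counts]

def sucra(rank_probs):
    """Average cumulative rank probability over the first a - 1 ranks."""
    a = len(rank_probs)
    values = []
    for row in rank_probs:
        cumulative, total = 0.0, 0.0
        for r in range(a - 1):
            cumulative += row[r]
            total += cumulative
        values.append(total / (a - 1))
    return values

# Hypothetical posterior draws for three treatments (assumed values).
draws = [
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.5],
    [0.4, 0.6, 0.1],
    [0.7, 0.1, 0.3],
]

probs = rank_probabilities(draws)
p_best = [row[0] for row in probs]   # probability of holding the top rank
sucra_values = sucra(probs)
print("P(best):", p_best)
print("SUCRA:", sucra_values)
```

For an outcome where lower values are better, such as a mean difference in pain score, `higher_is_better=False` would be passed instead. A SUCRA of 1 corresponds to a treatment that is always ranked best and 0 to one that is always ranked worst.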

**Figure 1.** PRISMA flowchart.
NMA: Network meta-analysis.

Because an NMA requires that studies are sufficiently similar in order to pool results, available study and patient characteristics were assessed to ensure similarity and to investigate the potential effect of clinical heterogeneity on ORs or MDs and 95% CrIs. Another key assumption behind NMA is that the analyzed network is consistent; that is, there is no conflict between direct and indirect evidence within a closed loop in the network, where the closed loop is composed of different studies [30]. Evidence networks in our NMA consisted of single-study connections and contained a single closed loop based on data from one trial; therefore, we were unable to perform an inconsistency analysis [30]. Additionally, the results from our NMA were qualitatively compared with the results from direct comparisons generated from traditional meta-analyses.
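For readers unfamiliar with the consistency assumption, the sketch below shows the kind of check that becomes possible when a closed loop is informed by different studies: the direct estimate for a comparison is contrasted with the indirect estimate for the same comparison. No such analysis was performed in this study; the function name and all values are hypothetical.

```python
import math

def inconsistency(direct, se_direct, indirect, se_indirect):
    """Compare direct and indirect estimates on the log odds ratio scale.

    Returns the difference, its standard error, the z statistic and a
    two-sided p-value based on the standard normal distribution.
    """
    diff = direct - indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, z, p_value

# Hypothetical log odds ratios for one comparison (assumed values).
diff, se, z, p_value = inconsistency(
    direct=math.log(2.0), se_direct=0.30,
    indirect=math.log(1.6), se_indirect=0.40,
)
print(f"difference={diff:.3f}, SE={se:.3f}, z={z:.2f}, p={p_value:.2f}")
```

A large difference relative to its standard error would suggest a conflict between direct and indirect evidence; a small one is compatible with consistency but does not prove it.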

Sensitivity analyses

Sensitivity analyses were conducted for the following: variations in efficacy of anterior lumbar interbody fusion (ALIF) instrumentation [31], where efficacy was adjusted using a simulated trial of 100 patients to consider assumptions regarding the difference between using recombinant human bone morphogenetic protein 2 (rhBMP-2) and iliac crest bone graft in ALIF (i.e., 10% increase in efficacy in favor of rhBMP-2) [14,17]; outcome definitions, where studies that used different definitions or measures for outcomes of interest were excluded [15,17]; follow-up rate, where studies with a ≥20% loss to follow-up were excluded [12]; exclusion of studies that included patients with 1 or 2 affected spinal segments [12]; and exclusion of studies with a high risk of bias [12,17].

Results

Study inclusion & baseline characteristics

From the literature search, 2458 potentially relevant records were identified, of which 2278 records were excluded during title and abstract review. Of the 180 eligible, full-text articles assessed for inclusion, 6 publications were included in the analysis [12,14,15,17,23,24] (Figure 1). Three trials compared a TDR device with fusion [14,15,17], two compared a TDR device with another TDR device [23,24] and one compared a TDR device with conservative care [12] (Table 1). Only one study included three treatment arms [23]. Charité and ProDisc-L were included in three trials each (each as an investigational device in one and as a control device in two others), whereas Maverick, Kineflex-L and activL were included in one trial each. ALIF was included as a comparator in two trials, circumferential fusion in one trial and rehabilitation in one trial. Five studies, which were corporately sponsored, were regulated under the US FDA Investigational Device Exemption (IDE) program; one study was not involved in the FDA's IDE program and was not corporately sponsored.

Table 1. Summary of study characteristics and baseline patient demographics.

| Study (year) | Blumenthal (2005) | Gornet (2011) | Zigler (2007) | Garcia (2015) | Guyer (2014) | Hellum (2011) |
|---|---|---|---|---|---|---|
| Comparison | TDR vs fusion | TDR vs fusion | TDR vs fusion | TDR vs TDR | TDR vs TDR | TDR vs CC |
| Treatment | Charité | Maverick | ProDisc-L | activL | Kineflex-L | ProDisc-L |
| Comparator #1 | ALIF | ALIF | Circumferential fusion | Charité | Charité | Rehabilitation |
| Comparator #2 | – | – | – | ProDisc-L | – | – |
| n | 304 | 577 | 292 | 324 | 394§ | 173 |
| Blinding | OL | OL | SB† | SB | SB‡ | – |
| Analysis population | ITT | – | ITT | ITT | – | ITT |
| Lost to follow-up at 2 years (%) | 9% | 9% | 1.8% | 9% | – | 20% |
| Mean age (years) | 39.6 | 40 | 39 | 39 | 40 | 41 |
| Prior spinal surgery | 34% | 28% | 34% | 25% | 27% | 28% |
| Mean BMI (kg/m²) | 26 | – | 27 | 27 | 27 | 26 |
| Number of vertebral levels affected | 1 | 1 | 1 | 1 | 1 | 1 & 2 |
| L4/L5 level treated (%) | 30% | 24% | 32% | 30% | 24% | 22%¶ |
| L5/S1 level treated (%) | 69% | 75% | 65% | 70% | 76% | 46%¶ |
| Work status (% working) | 53% | 60% | – | – | – | 27% |
| Mean baseline ODI score | 51% | 54% | 63% | 58% | 60% | 42% |
| Mean baseline back pain score# | 72 | 72 | 76 | 79 | 80 | 69 |

#Back pain score assessed using VAS (in mm) in Blumenthal 2005, Zigler 2007, Garcia 2015 and Hellum 2011; NRS used in Gornet 2011.

†Patients were unblinded after the surgery.

§Represents total number of patients randomized to either treatment group and does not include nonrandomized patients in the treatment groups.

‡Patients were blinded until after surgery.

¶Represents surgery group only. Not reported for nonsurgical group.

BMI: Body mass index; CC: Conservative care; ITT: Intention to treat; L4/L5: Fourth and Fifth Lumbar Vertebral Segment; L5/S1: Fifth Lumbar and First Sacral Vertebral Segment; NRS: Numeric Rating Scale; ODI: Oswestry Disability Index; OL: Open label; SB: Single blind; TDR: Total disc replacement; VAS: Visual Analog Scale.

Sample sizes for included trials ranged from 173 to 577 participants (Table 1). Follow-up for all studies was 2 years; 1.8–20% loss to follow-up was observed at 2 years. Only one study included patients with lumbar DDD at 1 and 2 levels [12]. Most patients were treated at the L5/S1 level. In all studies, patients were unresponsive to non-surgical therapy or had insufficient improvement with it. At baseline, the ODI scores were 42–63% and mean back pain scores were 69–80 across studies.

Risk of bias assessment

The risk of bias was similar across studies (Supplementary Appendix 2). Methods for randomized sequence generation were reported in all studies: all studies used block randomization, with three studies reporting use of block sizes of six [14,15,17]. Among patients randomized to the control arm in one study, surgeons were given the choice of implanting ProDisc-L or Charité [23]. Central allocation was reported as the method used in five studies [12,14,15,17,24], with sites notified of allocation using sealed envelopes in three studies [14,17,24], by telephone in one study [15] and through website access in one study [12]. Surgeons and/or staff were not blinded for preparatory purposes in three studies [14,15,23], were blinded until informed consent was received in one study [17] and were blinded until shortly after randomization in one study [12]. Participants were blinded to their randomization group until after surgery in four studies [14,15,23,24]. Intention-to-treat (ITT) analyses were conducted in five studies [12,14,15,23,24]. Analyses in one study were not conducted by ITT; however, this study was deemed to have low risk for selection bias because analyses were approached conservatively: the one study participant who was randomized to receive the investigational device received the control treatment and had successful outcomes postoperatively; therefore, this patient was included within the control group [17].

ODI success

All studies were included in the evidence network for ODI success, with all eight treatment comparators considered (Figure 2A). Outcomes data from the individual studies are presented in Supplementary Appendix 4A. When compared with all treatment comparators, activL was associated with the highest ORs of ODI success (Supplementary Appendix 5A). The ORs of achieving ODI success were significantly in favor of activL when compared with circumferential fusion (OR: 2.58; 95% CrI: 1.13–5.83), ALIF (OR: 2.57; 95% CrI: 1.08–6.05) and rehabilitation (OR: 3.87; 95% CrI: 1.64–9.13) (Figure 3A, Supplementary Appendix 5A). Overall, activL had the highest probability of being the best treatment among all therapies (72%; Figure 4) and had the highest overall SUCRA (94%; Supplementary Appendix 6).

**Figure 3.** Forest plots presenting effect estimates and 95% credible intervals of activL versus comparator for (A) ODI success, (B) back pain score, (C) patient satisfaction, (D) employment status and (E) reoperation.
Comparisons of activL versus comparator are ordered in relative ranking of success for activL. For ODI success, back pain score, patient satisfaction and employment status, ORs >1 favor activL, whereas for reoperations, ORs <1 favor activL.
ALIF: Anterior lumbar interbody fusion; CrI: Credible interval; MD: Mean difference; ODI: Oswestry Disability Index; OR: Odds ratio.

**Figure 4.** Probability of being the best treatment (%) across outcomes.
Bayesian network meta-analysis allows estimates for the probability that a treatment is best, second best and so on, for a particular outcome. The figure presents results for the probability that a treatment is best for the given outcome, where the darker green indicates the highest probability of being the best treatment for a given outcome.
ALIF: Anterior lumbar interbody fusion; ODI: Oswestry Disability Index.

The OR estimates for ODI success from direct comparisons aligned well with the estimates obtained from the NMA (Supplementary Appendix 8). Model fit statistics indicated reasonable model fit.

Back pain score

The analysis for back pain score included all studies and comparators (Figure 2B). Outcomes data from the individual studies are presented in Supplementary Appendix 4B. When compared with all treatment comparators, activL showed the greatest mean difference from baseline in back pain score (Supplementary Appendix 5B). The mean change for back pain score, from baseline to 2 years, was significantly greater for activL than for Charité (MD -10.42; 95% CrI: -20.07, -0.82), Kineflex-L (MD -11.60; 95% CrI: -22.98, -0.33), and ALIF (MD -16.84; 95% CrI: -29.22, -4.39) (Figure 3B, Supplementary Appendix 5B). Overall, activL had the highest probability of being the best treatment among all therapies (probability best, 61%; Figure 4) and had the highest SUCRA (91%; Supplementary Appendix 6).

The MD estimates for back pain score from direct comparisons aligned well with the estimates obtained from the NMA (Supplementary Appendix 8). Model fit statistics indicated reasonable model fit.

Patient satisfaction

For patient satisfaction, the evidence network included five studies and seven comparators (Figure 2C). Outcomes data from the individual studies are presented in Supplementary Appendix 4A. Overall, activL had the highest ORs for patient satisfaction from all comparisons (Supplementary Appendix 5C). The ORs for patient satisfaction significantly favored activL compared with rehabilitation (OR 3.30; 95% CrI: 1.39–7.84) and ALIF (OR 3.75; 95% CrI: 1.56–8.77) (Figure 3C, Supplementary Appendix 5C). Overall, activL had the highest probability of being the best treatment among all therapies (probability best, 46%; Figure 4) and had the highest SUCRA (86%; Supplementary Appendix 6).

The OR estimates for patient satisfaction from direct comparisons aligned well with the estimates obtained from the NMA (Supplementary Appendix 8). Model fit statistics indicated reasonable model fit.

Employment status

Four studies were included in the evidence network for employment status and seven comparators were considered (Figure 2D). Outcomes data from the individual studies are presented in Supplementary Appendix 4A. When compared with patients who received other treatments, activL patients had the highest ORs of being employed (Supplementary Appendix 5D). Overall, there were no statistically significant differences between the comparators (Figure 3D; Supplementary Appendix 5D).

The OR estimates for employment status from direct comparisons aligned well with the estimates obtained from the NMA (Supplementary Appendix 8). Model fit statistics indicated reasonable model fit.

Reoperation

The evidence network for reoperation, defined as device failures requiring revision, removal, reoperation, or supplemental fixation, included all studies but one [12] (Figure 2E); seven treatment comparators were considered. Outcomes data from the individual studies are presented in Supplementary Appendix 4A. Overall, ORs for reoperation were in favor of Kineflex-L (Supplementary Appendix 5E). However, there were no statistically significant differences between comparators.

The OR estimates for reoperations from direct comparisons aligned well with the estimates obtained from the NMA (Supplementary Appendix 8). Model fit statistics indicated reasonable model fit.

Sensitivity analyses

Results of the primary analysis were robust for the variables tested in the sensitivity analyses (Supplementary Appendix 9). When adjusted for differences in efficacy associated with ALIF based on the technique used, results for ODI success were slightly sensitive to the assumption of better efficacy with rhBMP-2 than with iliac crest bone graft. Results remained in favor of activL when the analysis excluded studies using a modified version of the ODI questionnaire, studies with ≥20% loss to follow-up, studies that included 1- and 2-level surgeries and studies with a high risk of bias. Results for back pain score were sensitive to the exclusion of studies with a high loss to follow-up and studies that included 1- and 2-level surgeries; with these excluded, MDs were significantly in favor of activL. Back pain score results were robust to the exclusion of studies using the NRS pain scale and those with a high risk of bias. Patient satisfaction results were robust to excluding studies with ≥20% loss to follow-up and studies with a high risk of bias. Results from the primary analysis for employment status were robust to exclusion of studies with ≥20% loss to follow-up and those that included 1- and 2-level surgeries. Excluding studies with a high risk of bias resulted in numerically higher ORs for employment status that continued to favor activL; however, statistical significance was not reached. Results for reoperations were slightly sensitive to the use of the FDA's criteria for reoperation success in the Kineflex-L IDE study and to the exclusion of studies with a high risk of bias.

Discussion

Several treatment options are available for symptomatic, single-level lumbar DDD, including conservative care, fusion and arthroplasty. Evidence from several studies shows that arthroplasty is superior to fusion and conservative care [18–20,32–34]. The activL Artificial Disc is the latest evolution of TDR devices for arthroplasty that further advances mechanical motion-preservation in the lumbar spine. Clinical trial data show that activL is noninferior to Charité and ProDisc-L in terms of efficacy and safety, with range of motion outcomes being superior, but do not directly evaluate the impact of activL compared with fusion and conservative care [23]. Thus, the current study sought to conduct a comprehensive NMA of data from all available RCTs of lumbar TDR devices to evaluate the efficacy and safety of activL compared with not only other devices, but now also with lumbar fusion and conservative care.

Overall results from this analysis demonstrate that activL is associated with improved benefits in efficacy and with similar benefits in safety compared with other treatments for single-level lumbar DDD. Benefits include having the most favorable effect estimates for ODI success, back pain score and patient satisfaction compared with all other treatments. Additionally, for ODI success, back pain score and patient satisfaction, activL is associated with the highest probability of being the best treatment not only among all surgical options but also among nonsurgical options. The results from the current analysis, given similar study designs and patient populations, are unlikely to arise from technical differences between studies; rather, the findings are more likely due to the advanced design attributes and improved outcomes associated with the activL Artificial Disc. As reported in this NMA, activL has shown better outcomes, albeit not statistically significant, than ProDisc-L in 2-year patient outcomes for ODI improvement, back pain score, patient satisfaction and employment status. Charité, the other approved TDR comparator, is no longer commercially available.

Currently, there is no consensus on what constitutes a meaningful improvement in ODI [35]. Several different thresholds have been used to interpret the importance of ODI changes; however, meaningful score changes are likely different for each patient and a single cut-off value may not appropriately capture change for all [35]. One study defined a 10-point minimum clinically important difference between treatment groups, which was not achieved despite results being statistically significant [12]. Alternatively, the FDA recommends the use of a 15-point minimum improvement for each patient; thus, FDA IDE studies have assessed the proportion of patients who have achieved this threshold [14,15,17,23,24]. Because of the lack of agreement on a single threshold and difficulties in interpreting clinical meaningfulness, the current analysis did not assess the minimum clinically important difference for ODI, and instead adopted the FDA's recommendation for interpreting ODI success.

This analysis includes the most recent evidence from all available RCTs evaluating TDR devices at 2-year follow-up, including the activL FDA IDE trial [23]. All studies included had relatively low loss-to-follow-up. Interestingly, the highest loss-to-follow-up occurred in the study comparing ProDisc-L with rehabilitation, with greater loss-to-follow-up observed in the rehabilitation group (24%) than in the TDR group (15%) [12]. When this particular study was omitted in the sensitivity analysis for back pain score, results were significantly in favor of activL across all treatment comparisons, in contrast to those from the primary analysis. Sensitivity analysis results for ODI success, patient satisfaction and employment status remained in favor of activL.

Meta-analyses of therapies for lumbar DDD are prevalent in the literature and are consistent with our findings. Most of these focused on comparing TDR with fusion and results demonstrated that TDR is safe and effective for the treatment of lumbar DDD. Across several outcomes such as disability, pain and patient satisfaction, improvements were statistically significantly in favor of TDR compared with fusion [18–20,32–34]. Furthermore, TDR was demonstrated to have either similar or improved safety compared with fusion, as typically measured by total complications, reoperation rates and adjacent segment degeneration [18–20,32–34,36,37]. When compared with conservative care, TDR showed greater improvements in disability, pain and quality of life [32]. Typically, meta-analyses of therapies for lumbar DDD included studies of single-level lumbar DDD with a 2-year follow-up period; only one of these studies compared TDR with conservative care. The latest TDR devices such as activL have not been included in published meta-analyses thus far.

One limitation of this study was the use of different outcome measures among included studies. In particular, a modified version of the ODI questionnaire was used in one study [15] and the NRS was used to assess pain in one study [17]. Results of sensitivity analyses removing these studies were comparable to the primary analyses of ODI success and back pain score, respectively. In another study, results for a different definition for reoperation were also reported [24], and inclusion of these data had little impact on the primary analysis results. Furthermore, there were limited data to inform analyses; evidence networks consisted of single-study connections between the multiple treatments. Data for specific outcomes were further limited due to differences in reporting; for instance, device-related SAEs were reported in only a small number of studies that lacked a common comparator by which to conduct indirect comparisons. Another limiting factor is that the results of this study are relevant for a 2-year follow-up period; however, evidence from RCTs and recently pooled analyses on long-term outcomes support these results, showing that the benefits with TDR devices over fusion are maintained over 5 years [Zigler et al., Unpublished data] [38–42].

The lack of direct comparative data between alternative treatments creates variations across health technology assessments and challenges for decision-makers. Network meta-analyses have not been previously published in the field of lumbar DDD, but have the potential to significantly aid in the clinical management and decision-making around numerous potential therapies. NMA, incorporating relevant clinical data while maintaining rigorous and credible methods, provides a feasible alternative to direct comparisons of all therapeutic options through RCTs.

Our study allows for the indirect comparison of activL with all relevant treatment comparators, including competitive TDR devices, lumbar fusion, and conservative care, such that decision-makers can make informed and relevant decisions in this continuously growing therapeutic area. Enhanced outcomes with TDR have been shown to reduce the economic burden associated with lumbar DDD. Health economic studies using 2-year data clearly demonstrate that reduced operating time and length of hospital stay with TDR compared with fusion translate into lower direct costs [43–47]. Further, evaluation of cost-effectiveness using RCT data indicates that TDR may be cost-effective compared with conservative care at 2-year follow-up [48].

Future research using different materials to improve axial load transmission, wear and interface with the native spine should only further improve outcomes. Even at this relatively early stage in the evolution of lumbar arthroplasty, clinical results are very encouraging. Using the broader view afforded by this NMA allows us to clearly see that in appropriately selected and screened patients, arthroplasty offers better results than fusion or continued conservative care.

Conclusion

This is the first highly rigorous NMA that compares TDR devices with each other and with other forms of standard care, including conservative care, for the treatment of single-level lumbar DDD at 2-year follow-up. The analysis includes the most recent evidence from all available RCTs evaluating lumbar TDR devices, including data from the FDA IDE trial of the most recently approved device, activL.

This NMA clearly demonstrates that, compared with other TDR devices, surgical fusion approaches, and even conservative care, the activL Artificial Disc substantially improves ODI success, back pain score and patient satisfaction without significantly impacting reoperations in patients with single-level lumbar DDD. When choosing arthroplasty, comparison of outcomes shows that patients who received activL did better than those who received ProDisc-L, currently its only competitor in the USA.

Summary points

Network meta-analyses allow pooling of data from multiple studies and indirect comparison of treatments that have not been compared in direct head-to-head trials.

This is the first highly rigorous network meta-analysis that compares total disc replacement (TDR) devices, lumbar fusion and conservative care for the treatment of single-level lumbar degenerative disc disease (DDD) at 2-year follow-up.

Methods

Randomized controlled trials comparing a TDR device with another TDR device, fusion or conservative care for single-level lumbar DDD were included. Studies reported at least one of the following outcomes: Oswestry Disability Index (ODI) success, back pain score, patient satisfaction, employment status, need for reoperation and device-related serious adverse events.

Network meta-analyses were conducted using a fixed-effects model that reported odds ratios (ORs) or mean differences (MDs) and 95% credible intervals (95% CrIs). Other measures reported were the probability of being the best treatment and the surface under the cumulative ranking curve area.

Results

activL had the highest ORs for achieving ODI success overall; statistically significant results were observed when activL was compared with circumferential fusion (OR 2.58; 95% CrI: 1.13–5.83), anterior lumbar interbody fusion (ALIF; OR 2.57; 95% CrI: 1.08–6.05), and rehabilitation (OR 3.87; 95% CrI: 1.64–9.13).

activL had the greatest MDs for back pain score overall, with significantly greater reductions than Charité (MD -10.42; 95% CrI: -20.07, -0.82), Kineflex-L (MD -11.60; 95% CrI: -22.98, -0.33) and ALIF (MD -16.84; 95% CrI: -29.22, -4.39).

Results for patient satisfaction were highest for activL; ORs significantly favored activL over rehabilitation (OR 3.30; 95% CrI: 1.39–7.84) and ALIF (OR 3.75; 95% CrI: 1.56–8.77).

Results for employment status and reoperations were similar across comparators.

Conclusion

Compared with other TDR devices, surgical fusion approaches and even conservative care, the activL Artificial Disc substantially improves ODI success, back pain score and patient satisfaction without significantly impacting reoperations in patients with single-level lumbar DDD.

When choosing arthroplasty, comparison of outcomes shows that patients who received activL did better than those who received ProDisc-L, currently its only competitor in the USA.

Acknowledgements

The authors would like to thank David Banko of B Braun (PA, USA) and Katie Kleinschuster of Aesculap (PA, USA) for constructive discussions on study design.

Financial & competing interests disclosure

This study was sponsored by Aesculap Implant Systems, LLC. J Zigler received no compensation for work on this manuscript; he has received consultancy fees from Aesculap and DePuy Synthes outside the submitted work. Cornerstone Research Group, Inc., was contracted both to conduct the analysis and develop the manuscript. N Ferko, C Cameron and L Patel are employees of Cornerstone Research Group, Inc. Cornerstone Research Group, Inc., receives consultancy fees from major pharmaceutical and device companies, including Aesculap. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

Supplementary Material

File (supplementary material_patel.docx)

References

Papers of special note have been highlighted as: • of interest; •• of considerable interest

Cihangiroglu M, Yildirim H, Bozgeyik Z et al. Observer variability based on the strength of MR scanners in the assessment of lumbar degenerative disc disease. Eur. J. Radiol. 51(3), 202–208 (2004).