Bug Tracker

Opened 6 years ago

Closed 5 years ago

#14536 closed bug (wontfix)

Dealing with flaky tests

Reported by: jzaefferer
Owned by:
Priority: undecided
Milestone: None
Component: unfiled
Version: 1.10.2
Keywords:
Cc:
Blocked by:
Blocking:

Description

For example, testing anything related to focus is difficult and unreliable. Tests that are flaky have very questionable value in the first place.

Suggestion: Add a checkbox for “Run flaky tests” via QUnit.config.urlConfig. When explicitly enabled, it will run those flaky tests; by default, and on TestSwarm, it won't.

(from the Amsterdam meeting)
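For illustration, a minimal sketch of how such a checkbox could be wired up through QUnit's urlConfig option (the "flaky" option id and the flakyTest helper are assumptions made up for this example, not existing APIs):

    // Adds a "Run flaky tests" checkbox to the QUnit toolbar; the checked
    // state is also exposed on QUnit.config.flaky.
    QUnit.config.urlConfig.push({
        id: "flaky",
        label: "Run flaky tests",
        tooltip: "Also run tests known to fail intermittently, e.g. focus-related tests."
    });

    // Hypothetical helper: only register a test when the checkbox is checked,
    // otherwise leave a reminder that something was skipped.
    function flakyTest( name, callback ) {
        if ( QUnit.config.flaky ) {
            QUnit.test( name, callback );
        } else if ( window.console ) {
            console.warn( "Skipped flaky test: " + name );
        }
    }

On TestSwarm the checkbox would simply stay unchecked, so the flaky tests would not run there.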

Change History (10)

comment:1 Changed 6 years ago by scottgonzalez

When the checkbox isn't checked (default), there should be a warning saying that some tests were skipped, so that devs don't forget there are other tests when running locally.

comment:2 Changed 6 years ago by timmywil

I really don't like the idea that tests would be skipped in TestSwarm. If they are skippable, they are removable. If they are not removable, but they are flaky, they should be fixed.

comment:3 Changed 6 years ago by scottgonzalez

Well, obviously nobody has a fix for them, and nobody wants to actually delete them. If either of those were the case, this wouldn't have been discussed several times over the past year or so ;-)

comment:4 Changed 6 years ago by dmethvin

I feel like we should keep them in there as a sign that we wish they *would* work reliably, but have some facility in QUnit to indicate when they've failed without getting a fail for the whole test. Does that sound wishy-washy enough?

Also, I want to have my cake and eat it too. And a pony.

comment:5 Changed 6 years ago by scottgonzalez

Well, a skip feature in QUnit has been turned down several times already. Besides, you want to be able to easily opt in to running these tests, without going and deleting the skipped marker for each individual test.

comment:6 Changed 6 years ago by timmywil

Replying to scottgonzalez:

Well, obviously nobody has a fix for them, and nobody wants to actually delete them.

I don't think either of these things are true. In my experience, it's not that the flaky tests are not fixable. It is about the lack of time to spend stabilizing these tests. Also, I'm sure we could find some tests that we are willing to delete if, again, we took the time to do so. Nonetheless, neither lack of time nor the desire to see green on TestSwarm is a good enough reason to have a config flag to skip any tests. The problem remains at the test level. No excuse suffices to make it otherwise.

On a related note, I would also not mind a pony.

comment:7 Changed 6 years ago by jzaefferer

Replying to dmethvin:

some facility in QUnit to indicate when they've failed without getting a fail for the whole test

That sounds like a great plugin.
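For illustration only, one rough shape such a plugin could take (QUnit.flakyTest is a made-up name, and real flaky tests such as focus tests are usually asynchronous, which this sketch glosses over):

    // Hypothetical plugin sketch: run the flaky test body against a "soft"
    // ok() that collects failures instead of failing, report them as console
    // warnings, and record a single passing assertion so the whole run is
    // not marked as failed.
    QUnit.flakyTest = function( name, callback ) {
        QUnit.test( name + " (flaky)", function( assert ) {
            var failures = [];
            var soft = {
                ok: function( result, message ) {
                    if ( !result ) {
                        failures.push( message );
                    }
                }
            };
            callback( soft );
            if ( failures.length && window.console ) {
                console.warn( "Flaky test '" + name + "' had soft failures:", failures );
            }
            assert.ok( true, failures.length + " soft failure(s), reported as warnings" );
        });
    };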

comment:8 in reply to:  6 Changed 5 years ago by m_gol

Replying to timmywil:

Well, obviously nobody has a fix for them, and nobody wants to actually delete them.

I don't think either of these things are true. In my experience, it's not that the flaky tests are not fixable. It is about the lack of time to spend stabilizing these tests.

I don't fully agree. Some tests tend to fail when run on TestSwarm but not necessarily locally, and some of them pass if run separately. If I test locally, it's acceptable for me to see one or two failed tests if I can then click the "rerun" button for them separately and see if they really fail. It would be better to have them marked as flaky so that they don't cause the page to go red but yellow, and so that I know those are the tests I have to re-run to make sure they're not broken by the patch. Integrating such a solution into our test setup would be difficult and it would seriously bump the overall test time.

Of course, it would be more difficult to maintain such a situation if we had too many of those tests, so you have a point.

comment:9 Changed 5 years ago by timmywil

Resolution: wontfix
Status: new → closed

Some tests would inevitably end up in the flaky category undeservedly and we'd end up with inconspicuous regressions. The core team will deal with flaky tests at the test level on a case-by-case basis.
