---
title: Tcid
emoji: 👁
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
short_description: A dashboard
---

# TCID

This space displays the state of the `transformers` CI on two hardware platforms, for a subset of **important models**. The CI is run daily, on both AMD MI325 and Nvidia A10.

The CI runs a different number of tests for each model. When a test finishes, it is assigned a status depending on its outcome:
- passed: the test finished and the expected output (or outputs) were retrieved;
- failed: the test either did not finish or its output differed from the expected output;
- skipped: the test was not run, and was not expected to run. More details on this at the end of the README;
- error: the test did not finish and Python crashed.

The dashboard is divided into two main parts:

## Summary page

On the summary page, you can see a snapshot of the mix of tests passed, failed, and skipped for each model. The summary page also features an "Overall failures rate" for AMD and NVIDIA, which is computed this way:

```
overall_failure_rate = (failed + error) / (passed + failed + error)
```

We do not account for skipped tests in this overall failure rate, because a skipped test can neither pass nor fail. A short sketch of this computation is included at the end of this README.

We only consider the tests for a **subset of models** out of all the models supported in `transformers`. This subset is named important models, and is mainly defined by model usage.

## Models page

From the sidebar, you can access a detailed view of each model. In it, you will find the breakdown of test statuses and the names of the tests that failed for single-GPU and multi-GPU runs.

## Skipped tests

You will probably see many skipped tests in the `transformers` CI, which can be perplexing. When a test is skipped, it is usually for one of three reasons:
- the test requires a package that is not included in the default `transformers` Docker image used by the CI, like flash attention 3 or deepspeed;
- the hardware is not the correct one; for instance, there are a number of MPS (Apple hardware) tests that are of course not run on the AMD or Nvidia CI;
- the model is incompatible with what the test exercises, say torch.fx or flash-attention, which are incompatible with some model architectures.

Skipping tests rather than not collecting them has the advantage of keeping test counts similar across CIs that do not run on the same hardware. Thus, if the total test count differs between two CIs, one can immediately tell that one of the two only ran partially. This would not be the case if some skipped tests were not collected at all.
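
## Failure rate sketch

Below is a minimal sketch of the overall failure rate computation described in the Summary page section. The function name and the example counts are illustrative only and are not taken from the dashboard's actual code.

```python
# Minimal sketch of the overall failure rate described above.
# Skipped tests are deliberately left out of the denominator:
# a skipped test can neither pass nor fail.

def overall_failure_rate(passed: int, failed: int, error: int) -> float:
    ran = passed + failed + error
    return (failed + error) / ran if ran else 0.0

# Hypothetical counts, aggregated over all important models for one hardware:
print(f"{overall_failure_rate(passed=940, failed=42, error=8):.2%}")  # 5.05%
```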