ror (HF Staff) committed
Commit e539bb0 · 1 Parent(s): 1b39b38

Test count display, important models, typo fix

Files changed (4)
  1. README.md +13 -3
  2. app.py +1 -1
  3. data.py +1 -1
  4. summary_page.py +41 -25
README.md CHANGED
@@ -12,11 +12,11 @@ short_description: A dashboard
 
 # TCID
 
-This space displays the state of the `transformers` CI on two hardware platforms, for a subset of models. The CI is run daily on both AMD MI325 and Nvidia A10. The CI runs a different number of tests for each model. When a test finishes, it is assigned a status depending on its outcome:
+This space displays the state of the `transformers` CI on two hardware platforms, for a **subset of models**. The CI is run daily on both AMD MI325 and Nvidia A10. The CI runs a different number of tests for each model. When a test finishes, it is assigned a status depending on its outcome:
 
 - passed: the test finished and the expected output (or outputs) was retrieved;
 - failed: the test either did not finish or the output was different from the expected output;
-- skipped: the test was not run, which usually happens when a test is incompatible with a model. For instance, some models skip `flash-attention`-related tests because they are incompatible with `flash-attention`;
+- skipped: the test was not run, but it was not expected to run. More details on this at the end of the README;
 - error: the test did not finish and python crashed;
 
 The dashboard is divided in two main parts:
@@ -26,8 +26,18 @@ The dashboard is divided in two main parts:
 On the summary page, you can see a snapshot of the mix of tests passed, failed and skipped for each model. The summary page also features an "Overall failures rate" for AMD and NVIDIA, which is computed this way:
 ```overall_failure_rate = (failed + error) / (passed + failed + error)```
 
-We do not account for skipped tests in this overall failure rate, because a skipped test can neither pass nor fail.
+We do not account for skipped tests in this overall failure rate, because a skipped test can neither pass nor fail.
+We only consider the tests for a **subset of models** out of all the models supported in `transformers`. This subset is named important models, and is mainly defined by model usage.
 
 ## Models page
 
 From the sidebar, you can access a detailed view of each model. In it, you will find the breakdown of test statuses and the names of the tests that failed for single- and multi-gpu runs.
+
+## Skipped tests
+
+You will probably see many skipped tests in the `transformers` CI, which can be perplexing. When a test is skipped, it is usually for one of three reasons:
+- the test requires a package that is not included in the default transformers docker image that the CI uses, like flash attention 3 or deepspeed;
+- the hardware is not the correct one, for instance there are a number of MPS (Apple hardware) tests that are of course not run on the AMD or Nvidia CI;
+- the model is incompatible with what the test covers, say torch.fx or flash-attention, which are incompatible with some model architectures;
+
+Skipping tests rather than not collecting them has the advantage of keeping test counts similar across CIs that do not run on the same hardware. Thus, if the total test count differs between two CIs, one can immediately tell that one of the two only ran partially. This would not be the case if the skipped tests were not collected at all.
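As a quick illustration of the failure-rate formula described in the README above (the counts below are made up and the names are purely illustrative):

```python
# Illustrative sketch of the README's overall failure rate, with hypothetical status
# counts for one hardware target. Skipped tests appear in neither the numerator nor
# the denominator, so they cannot move the rate either way.
passed, failed, error, skipped = 1200, 30, 5, 400  # made-up totals

overall_failure_rate = (failed + error) / (passed + failed + error)
print(f"{overall_failure_rate:.2%}")  # ~2.83%, regardless of the 400 skipped tests
```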
app.py CHANGED
@@ -61,7 +61,7 @@ def get_description_text():
     ]
     msg = ["**" + x + "**" for x in msg] + [""]
     if Ci_results.latest_update_msg:
-        msg.append(f"*({Ci_results.latest_update_msg})*")
+        msg.append(f"*This dashboard only tracks important models*<br>*({Ci_results.latest_update_msg})*")
     else:
         msg.append("*(loading...)*")
     return "<br>".join(msg)
data.py CHANGED
@@ -19,7 +19,7 @@ IMPORTANT_MODELS = [
     "vit", # old (vision) - fixed comma
     "clip", # old but dominant (vision)
     "detr", # object detection, segmentation (vision)
-    "table-transformer", # object detection (vision) - maybe just detr?
+    "table_transformer", # object detection (vision) - maybe just detr?
     "got_ocr2", # ocr (vision)
     "whisper", # old but dominant (audio)
     "wav2vec2", # old (audio)
summary_page.py CHANGED
@@ -38,15 +38,17 @@ LABEL_OFFSET = 1 # Distance of label from bar
 FAILURE_RATE_FONT_SIZE = 28
 
 
-def calculate_overall_failure_rates(df: pd.DataFrame, available_models: list[str]) -> tuple[float, float]:
+def get_overall_stats(df: pd.DataFrame, available_models: list[str]) -> tuple[list[int], list[int]]:
     """Calculate overall failure rates for AMD and NVIDIA across all models."""
     if df.empty or not available_models:
         return 0.0, 0.0
 
-    total_amd_tests = 0
-    total_amd_failures = 0
-    total_nvidia_tests = 0
-    total_nvidia_failures = 0
+    total_amd_passed = 0
+    total_amd_failed = 0
+    total_amd_skipped = 0
+    total_nvidia_passed = 0
+    total_nvidia_failed = 0
+    total_nvidia_skipped = 0
 
     for model_name in available_models:
         if model_name not in df.index:
@@ -56,21 +58,16 @@ def calculate_overall_failure_rates(df: pd.DataFrame, available_models: list[str
         amd_stats, nvidia_stats = extract_model_data(row)[:2]
 
         # AMD totals
-        amd_total = amd_stats['passed'] + amd_stats['failed'] + amd_stats['error']
-        if amd_total > 0:
-            total_amd_tests += amd_total
-            total_amd_failures += amd_stats['failed'] + amd_stats['error']
-
+        total_amd_passed += amd_stats['passed']
+        total_amd_failed += amd_stats['failed'] + amd_stats['error']
+        total_amd_skipped += amd_stats['skipped']
+
         # NVIDIA totals
-        nvidia_total = nvidia_stats['passed'] + nvidia_stats['failed'] + nvidia_stats['error']
-        if nvidia_total > 0:
-            total_nvidia_tests += nvidia_total
-            total_nvidia_failures += nvidia_stats['failed'] + nvidia_stats['error']
-
-    amd_failure_rate = (total_amd_failures / total_amd_tests * 100) if total_amd_tests > 0 else 0.0
-    nvidia_failure_rate = (total_nvidia_failures / total_nvidia_tests * 100) if total_nvidia_tests > 0 else 0.0
+        total_nvidia_passed += nvidia_stats['passed']
+        total_nvidia_failed += nvidia_stats['failed'] + nvidia_stats['error']
+        total_nvidia_skipped += nvidia_stats['skipped']
 
-    return amd_failure_rate, nvidia_failure_rate
+    return [total_amd_passed, total_amd_failed, total_amd_skipped], [total_nvidia_passed, total_nvidia_failed, total_nvidia_skipped]
 
 
 def draw_text_and_bar(
@@ -118,7 +115,12 @@ def create_summary_page(df: pd.DataFrame, available_models: list[str]) -> plt.Fi
         return fig
 
     # Calculate overall failure rates
-    amd_failure_rate, nvidia_failure_rate = calculate_overall_failure_rates(df, available_models)
+    amd_counts, nvidia_counts = get_overall_stats(df, available_models)
+
+    amd_failure_rate = (amd_counts[1] / sum(amd_counts)) if sum(amd_counts) > 0 else 0.0
+    amd_failure_rate *= 100
+    nvidia_failure_rate = (nvidia_counts[1] / sum(nvidia_counts)) if sum(nvidia_counts) > 0 else 0.0
+    nvidia_failure_rate *= 100
 
     # Calculate dimensions for N-column layout
     model_count = len(available_models)
@@ -181,6 +183,26 @@ def create_summary_page(df: pd.DataFrame, available_models: list[str]) -> plt.Fi
         visible_model_count += 1
 
 
+    # Add AMD and NVIDIA test totals in the bottom left
+    # Calculate line spacing to align middle with legend
+    line_height = 0.4  # Height between lines
+    legend_y = max_y + 1
+
+    # Position the two lines so their middle aligns with legend_y
+    amd_y = legend_y - line_height / 2
+    nvidia_y = legend_y + line_height / 2
+
+    amd_totals_text = f"AMD Tests - Passed: {amd_counts[0]}, Failed: {amd_counts[1]}, Skipped: {amd_counts[2]}"
+    nvidia_totals_text = f"NVIDIA Tests - Passed: {nvidia_counts[0]}, Failed: {nvidia_counts[1]}, Skipped: {nvidia_counts[2]}"
+
+    ax.text(0, amd_y, amd_totals_text,
+            ha='left', va='bottom', color='#CCCCCC',
+            fontsize=14, fontfamily='monospace')
+
+    ax.text(0, nvidia_y, nvidia_totals_text,
+            ha='left', va='bottom', color='#CCCCCC',
+            fontsize=14, fontfamily='monospace')
+
     # Add legend horizontally in bottom right corner
     patch_height = 0.3
     patch_width = 3
@@ -190,12 +212,6 @@ def create_summary_page(df: pd.DataFrame, available_models: list[str]) -> plt.Fi
     legend_spacing = 10
     legend_font_size = 15
 
-    # Add failure rate explanation text on the left
-    # explanation_text = "Failure rate = failed / (passed + failed)"
-    # ax.text(0, legend_y, explanation_text,
-    # ha='left', va='bottom', color='#CCCCCC',
-    # fontsize=legend_font_size, fontfamily='monospace', style='italic')
-
     # Legend entries
     legend_items = [
         ('passed', 'Passed'),
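To make the "Skipped tests" rationale from the README above more concrete, here is a minimal, generic pytest sketch (not taken from the `transformers` test suite; the helper and the `deepspeed` example are hypothetical) of how a test can be skipped when an optional package is missing while still being collected and counted:

```python
# Minimal sketch, assuming plain pytest: the decorated test is still collected and is
# reported as "skipped" when the optional dependency is absent, so the total test
# count stays comparable across CI machines with different software stacks.
import importlib.util

import pytest


def require_package(name: str):
    """Skip the decorated test when the given package is not installed."""
    return pytest.mark.skipif(
        importlib.util.find_spec(name) is None,
        reason=f"{name} is not installed in this CI image",
    )


@require_package("deepspeed")  # hypothetical optional dependency, as in the README's example
def test_model_with_deepspeed():
    # Runs only when deepspeed is available; otherwise it is reported as skipped, not dropped.
    assert True
```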