Great job

by Trilogix1 - opened 4 days ago

This one is very well done. It is the first one that does not loop and actually finishes its jobs (most of them correctly) at decent speed. I am impressed: so few tokens and such an accurate result.
This is one of my many tests, and usually a hard one:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Gene expression under 12 stress conditions</title>
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<style>
  body{font-family:Arial,Helvetica,sans-serif;margin:0;padding:0;}
  #plot{width:100%;height:90vh;}
</style>
</head>
<body>
<div id="plot"></div>

<script>
// ---------------------------------------------------------------------
// Raw data – 6 genes, 12 conditions, 3 replicates each
// ---------------------------------------------------------------------
const raw = `Gene,As_1,As_2,As_3,Bs_1,Bs_2,Bs_3,Ctrl_1,Ctrl_2,Ctrl_3,Li_1,Li_2,Li_3,Mig_1,Mig_2,Mig_3,Nd_1,Nd_2,Nd_3,Ns_1,Ns_2,Ns_3,Oss_1,Oss_2,Oss_3,Oxs_1,Oxs_2,Oxs_3,Sp_1,Sp_2,Sp_3,Tm_1,Tm_2,Tm_3,Vic_1,Vic_2,Vic_3
SL1344_RS09220,2.920039927,2.088597766,2.351852594,0.395701907,-0.037878549,0.002366695,0.61958973,0.529263057,0.406385383,0.238016771,-0.088636689,0.034388695,-0.612358556,-0.607858916,-0.697820107,-0.906114583,-0.962514778,-0.850801496,-0.711272314,-0.787974085,-0.722047149,-0.26694706,-0.362615123,-0.451280425,-0.854071922,-0.996363752,-0.964899556,-0.972183742,-0.832297051,-0.969806712,0.23462426,0.57133122,0.512196046,1.752142222,0.096479678,0.902766615
SL1344_RS13225,2.049882023,1.907231059,2.363275326,-0.21236263,0.057910802,0.038901715,-0.111562582,-0.246630265,-0.23983352,-0.528688003,-0.526762519,-0.528922038,-0.478567568,-0.537458449,-0.536981066,1.575686015,0.67915306,1.502125186,-1.146987066,-1.264593397,-1.097411643,-0.742750196,-0.607401591,-0.806717834,-1.177445741,-1.267363424,-1.327737523,-0.509010218,-0.262973746,0.088542395,0.426950249,0.37049552,0.773172796,1.397371223,-0.02055947,0.94802312
SL1344_RS19175,1.596388576,1.718135269,1.79428816,0.74699178,0.211998795,0.134529092,0.532929636,0.224503524,-0.284021453,0.401650539,0.382925614,0.857906692,-1.09956742,-0.982035268,-1.157369203,-1.134115236,-0.962057223,-0.872334067,0.043217401,0.094328966,0.287967287,-0.390282146,-0.479151247,-0.915539924,-0.915539924,-0.928562834,-1.159715538,-0.928562834,-0.988290034,-0.647906101,-0.660859576,0.850265106,0.591251538,0.818029397,2.755340798,0.27007314,-1.717309988
SL1344_RS19100,1.488545323,2.126717051,2.86715515,0.18914478,-0.366788889,0.097087392,0.397591846,0.346262629,0.423539624,-0.627348924,-0.572813586,0.05343984,-0.073278619,-0.292082825,-0.402875456,-0.618025858,-0.865920437,-0.529925891,-0.871990993,-0.799999588,-0.459452728,0.256993853,0.276465344,-0.286431404,-0.71002051,-0.730631642,-0.832485693,-0.191472225,-0.444055503,0.152447845,0.162054112,-0.013786033,-0.432829239,3.306386447,-1.010807595,-1.010807595
SL1344_RS10730,1.776681289,1.729793187,1.475857904,0.565026444,0.970272533,0.746381895,1.30857376,0.743106199,0.614629122,-0.349504762,-0.296774814,-0.339240323,-0.755446959,-0.622827201,-0.545635387,-1.039522568,-1.366719386,-1.240086696,-0.248257759,-0.028027792,-0.067603286,0.969263532,0.789001114,0.549772586,0.842689503,0.563745477,0.367439317,-1.290066611,-1.433199676,-0.917437598,0.791959161,0.20594673,0.334857241,-1.582023687,-1.693683437,-1.528939056
SL1344_RS22030,1.761847164,1.636198222,1.80107548,-0.482996176,0.003545357,-0.301613182,-0.2153358,-0.24394108,-0.538189375,-0.144701461,-0.530659092,-0.692769269,-1.218559769,-1.08924836,-1.3169253,-0.620084536,-0.262413545,-0.105524104,0.098177479,0.303504557,0.125278841,0.123379819,0.548767104,-0.861998811,-0.706624624,-0.969983753,-0.73780992,0.371805109,0.539382823,0.819347609,-1.099007983,-0.890716735,-0.837143282,2.192809466,0.887287232,2.653839897`;

// ---------------------------------------------------------------------
// Helper functions
// ---------------------------------------------------------------------
function parseCSV(text) {
  const rows = text.trim().split('\n').map(r=>r.trim());
  const header = rows[0].split(',');
  const data = rows.slice(1).map(line=>{
    const vals = line.split(',');
    const obj = {Gene: vals[0]};
    header.slice(1).forEach((h,i)=>{obj[h]=parseFloat(vals[i+1]);});
    return obj;
  });
  return {header, data};
}

function meanStd(values) {
  const n = values.length;
  const mean = values.reduce((s,v)=>s+v,0)/n;
  const varSum = values.reduce((s,v)=>s+(v-mean)*(v-mean),0);
  const std = Math.sqrt(varSum/(n-1));
  return [mean,std];
}

// ---------------------------------------------------------------------
// Process data
// ---------------------------------------------------------------------
const {header, data} = parseCSV(raw);
// conditions we want to plot (12)
const conds = ['As','Bs','Ctrl','Li','Mig','Nd','Ns','Oss','Oxs','Sp','Tm','Vic'];

// Build a data structure: {gene:{cond:{mean,std}}}
const plotData = {};
data.forEach(rec=>{
  const g = rec.Gene;
  plotData[g] = {};
  conds.forEach(c=>{
    const vals = [rec[`${c}_1`],rec[`${c}_2`],rec[`${c}_3`]];
    const [m,s] = meanStd(vals);
    plotData[g][c] = {mean:m, std:s};
  });
});

// ---------------------------------------------------------------------
// Create Plotly traces
// ---------------------------------------------------------------------
const colors = ['#1f77b4','#ff7f0e','#2ca02c','#d62728','#9467bd','#8c564b'];
const traces = [];
let i=0;
Object.keys(plotData).forEach(g=>{
  const x = conds;
  const y = conds.map(c=>plotData[g][c].mean);
  const err = conds.map(c=>plotData[g][c].std);
  traces.push({
    x, y,
    name: g,
    mode:'lines+markers',
    line:{color:colors[i%colors.length]},
    error_y:{type:'data', array:err, visible:true},
    hovertemplate: `${g}<br>%{x} : %{y:.3f} ± %{error_y.array:.3f}<extra></extra>`
  });
  i++;
});

// ---------------------------------------------------------------------
// Plot
// ---------------------------------------------------------------------
const layout = {
  title:'Mean expression (± SD) of 6 genes in 12 stress conditions',
  xaxis:{title:'Condition'},
  yaxis:{title:'log₂ fold‑change'},
  hovermode:'closest',
  legend:{orientation:'h', y:-0.2},
  margin:{l:60,r:20,t:60,b:80}
};
Plotly.newPlot('plot',traces,layout);
</script>
</body>
</html>

What the file does

Data – the table you supplied is embedded as a multiline string.
Parsing – the string is split into rows and columns; the first column is the gene ID, the remaining 36 columns are the 3 replicates of each of the 12 conditions.
Statistics – for every condition and every gene the mean and standard deviation of the 3 replicates are computed.
Plot – a single Plotly line chart is created.
- Each gene gets its own colour.
- Error bars show ± SD.
- Hovering over a point displays the gene name, condition, mean value and the SD.
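
As a quick check of the statistics step, here is a minimal sketch that recomputes the mean and SD for one gene and one condition. It mirrors the `meanStd` helper from the file (sample standard deviation, so the sum of squared deviations is divided by n - 1 = 2 for three replicates); the variable names are only illustrative.

```javascript
// Sample mean and sample standard deviation (n - 1 in the denominator),
// the same estimator the page's meanStd helper uses.
function meanStd(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const squaredDev = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return [mean, Math.sqrt(squaredDev / (n - 1))];
}

// The three As replicates of SL1344_RS09220, copied from the table above.
const asReplicates = [2.920039927, 2.088597766, 2.351852594];
const [asMean, asSd] = meanStd(asReplicates);
console.log(`As: ${asMean.toFixed(3)} ± ${asSd.toFixed(3)}`);
```

For this point the values come out at roughly 2.45 ± 0.42, which should match the first error bar on that gene's trace.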

Open the file in a browser and you’ll see the interactive SVG plot you requested. If you want to change colours, add more genes, or adjust the axes, you can edit the JavaScript section accordingly.
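
One thing worth checking before adding more genes: `parseCSV` reads values by header position, so a row with extra or missing fields is accepted silently. The SL1344_RS19175 row above appears to carry 37 values rather than 36. A minimal sketch of a row-length check follows; `findRaggedRows` is a hypothetical helper, not part of the generated file.

```javascript
// Hypothetical helper (not in the posted file): lists data rows whose
// field count differs from the header's field count.
function findRaggedRows(text) {
  const rows = text.trim().split('\n').map(r => r.trim());
  const expected = rows[0].split(',').length;
  return rows.slice(1)
    .map(line => line.split(','))
    .filter(fields => fields.length !== expected)
    .map(fields => ({ gene: fields[0], found: fields.length, expected }));
}

// Tiny made-up table: g2 has one field too many.
const sample = 'Gene,A_1,A_2\ng1,1,2\ng2,1,2,3';
console.log(findRaggedRows(sample));
```

Calling `findRaggedRows(raw)` in the page's script, before the data is processed, would list any such rows so they can be fixed at the source instead of being truncated.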

Result:

Now we have real competition for the Qwen and DeepSeek open-weight models, with the same precision (in many tasks) but better speed. Qwen Coder is still better in general, but it is not as fast, and it has a larger context. In my opinion gpt-oss has real potential if its context size were increased to half a million tokens. Still very happy with your distill.
Thank you :)

armand0e

Owner 3 days ago

Thanks a bunch! Means a lot, will keep pumping these out. Huge thanks to @Liontix for such a high quality dataset.

I think I messed up the dataset formatting functions, which causes <|message|> to appear at the start of each response. Still, I was impressed overall with this model's efficiency as well. Thanks for sharing this data.
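
Until the formatting is fixed, the stray marker can be removed on the client side. This is only a sketch of one possible workaround, assuming the marker always shows up as a literal prefix of the response text; `stripStrayPrefix` is a hypothetical helper name, not something from the model's tooling.

```javascript
// Hypothetical helper: removes one leading "<|message|>" marker, if present.
// Text where the marker appears anywhere else is returned unchanged.
const STRAY_PREFIX = '<|message|>';

function stripStrayPrefix(response) {
  const trimmed = response.trimStart();
  return trimmed.startsWith(STRAY_PREFIX)
    ? trimmed.slice(STRAY_PREFIX.length).trimStart()
    : response;
}

console.log(stripStrayPrefix('<|message|>Hello there'));
```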

Trilogix1

2 days ago

Can you apply CoT to it on top of your distill: https://huggingface.co/xl-zhao?

armand0e

Owner about 10 hours ago

Can you apply CoT to it on top of your distill: https://huggingface.co/xl-zhao?

Hello @Trilogix1, I'm not sure exactly what you mean by this. The point of this distill was to mimic the output and behavior of GLM-4.6.

armand0e

Owner about 10 hours ago

This one is very well done. It is the first one that does not loop and actually finishes its jobs (most of them correctly) at decent speed. I am impressed: so few tokens and such an accurate result.
This is one of my many tests, and usually a hard one:

I've redone the distill. I was curious exactly which prompt you gave it and through which framework (llama.cpp, LM Studio, Jan, etc.). I can use these to evaluate improvements.

armand0e changed discussion status to closed about 10 hours ago

armand0e changed discussion status to open about 10 hours ago
