Just because I don't care doesn't mean I don't understand.
589 stories · 3 followers

Loop targets

I posted a Python tidbit about how for loops can assign to things other than simple variables, and many people were surprised or even concerned:

import itertools
import requests

params = {
    "query": QUERY,
    "page_size": 100,
}

# Get page=0, page=1, page=2, ...
for params["page"] in itertools.count():
    data = requests.get(SEARCH_URL, params).json()
    if not data["results"]:
        break
    ...

This code makes successive GET requests to a URL, sending the params dict as query parameters. Each request uses the same parameters, except that the “page” item is 0, then 1, 2, and so on. It has the same effect as if we had written:

for page_num in itertools.count():
    params["page"] = page_num
    data = requests.get(SEARCH_URL, params).json()
    if not data["results"]:
        break
    ...

One reply asked if there was a new params dict in each iteration. No, loops in Python do not create a scope, and never make new variables. The loop target is assigned to exactly as if it were an assignment statement.
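A minimal offline sketch of the same pattern, assuming a hypothetical `fake_search()` function and invented page data in place of `requests.get` and the real search API. It checks that every iteration sees the very same `params` dict, and that the last assigned page number is still in the dict after the loop ends.

```python
import itertools

# Hypothetical stand-in for the search API: three pages of results, then an empty page.
FAKE_PAGES = {0: ["a", "b"], 1: ["c", "d"], 2: ["e"]}


def fake_search(params):
    """Return a response-like dict for params["page"] (invented data, no network)."""
    return {"results": FAKE_PAGES.get(params["page"], [])}


params = {"query": "python", "page_size": 100}
original_id = id(params)
seen_ids = set()
collected = []

# The loop target is a dict item, so each iteration assigns into the same dict.
for params["page"] in itertools.count():
    seen_ids.add(id(params))
    data = fake_search(params)
    if not data["results"]:
        break
    collected.extend(data["results"])

print(collected)                  # ['a', 'b', 'c', 'd', 'e']
print(seen_ids == {original_id})  # True: one dict object throughout
print(params["page"])             # 3: the last assigned value is still there
```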

As a Python Discord helper once described it,

While loops are “if” on repeat. For loops are assignment on repeat.

A loop like for <ANYTHING> in <ITER>: will take successive values from <ITER> and do an assignment exactly as this statement would: <ANYTHING> = <VAL>. If the assignment statement is ok, then the for loop is ok.
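To make the “assignment on repeat” idea concrete, here is a rough hand-expansion of `for target[key] in iterable:` using `iter()` and `next()`. It is a simplified sketch (it ignores `else` clauses, `continue`, and other details of the real loop), and the helper names `real_loop` and `assign_loop` are made up for the comparison.

```python
# Hypothetical helper names, just for this comparison.
def real_loop(target, key, iterable):
    """Run an actual for loop with a subscript target, snapshotting target each time."""
    snapshots = []
    for target[key] in iterable:
        snapshots.append(dict(target))
    return snapshots


def assign_loop(target, key, iterable):
    """Hand-expanded equivalent: fetch the next value, then do a plain assignment."""
    snapshots = []
    iterator = iter(iterable)
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            break
        target[key] = value  # the statement `<ANYTHING> = <VAL>`
        snapshots.append(dict(target))
    return snapshots


print(real_loop({"query": "x"}, "page", range(2)))
# [{'query': 'x', 'page': 0}, {'query': 'x', 'page': 1}]
print(real_loop({"query": "x"}, "page", range(2)) == assign_loop({"query": "x"}, "page", range(2)))
# True
```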

We’re used to seeing for loops that do more than a simple assignment:

for i, thing in enumerate(things):
    ...

for x, y, z in zip(xs, ys, zs):
    ...

These work because Python can assign to a number of variables at once:

i, thing = 0, "hello"
x, y, z = 1, 2, 3
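The same holds for fancier targets: anything that works on the left side of an assignment, including starred and nested unpacking, works as a loop target. The data here is made up for illustration.

```python
rows = [("alice", 90, 85, 77), ("bob", 60, 70)]

# Same target as the assignment statement:  name, *scores = row
averages = {}
for name, *scores in rows:
    averages[name] = sum(scores) / len(scores)

# Nested target, same as:  i, (x, y) = 0, (1, 2)
points = []
for i, (x, y) in enumerate([(1, 2), (3, 4)]):
    points.append((i, x + y))

print(averages)  # {'alice': 84.0, 'bob': 65.0}
print(points)    # [(0, 3), (1, 7)]
```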

Assigning to a dict key (or an attribute, or a property setter, and so on) in a for loop is an example of Python having a few independent mechanisms that combine in uniform ways. We aren’t used to seeing exotic combinations, but you can reason through how they would behave, and you would be right.
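A small sketch of the attribute and property-setter case, using a hypothetical `Pager` class whose `page` setter records each value it receives, plus a list element as a target. The setter runs once per iteration, just as it would for repeated `pager.page = value` statements.

```python
class Pager:
    """Hypothetical class whose `page` property logs every assignment."""

    def __init__(self):
        self._page = None
        self.log = []

    @property
    def page(self):
        return self._page

    @page.setter
    def page(self, value):
        self.log.append(value)  # called once per loop iteration
        self._page = value


pager = Pager()
for pager.page in range(3):  # same as pager.page = 0, then 1, then 2
    pass

slots = [None, None]
for slots[1] in "xy":  # a list element works too
    pass

print(pager.log)   # [0, 1, 2]
print(pager.page)  # 2
print(slots)       # [None, 'y']
```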

You can assign to a dict key in an assignment statement, so you can assign to it in a for loop. You might decide it’s too unusual to use, but it is possible and it works.

jgbishop
1 day ago
I had no idea you could do this. This is definitely something I'll use going forward!
Durham, NC

The Big Wait

The Big Wait is a lovely short documentary about a couple who live alone in the middle of nowhere in Western Australia, managing an emergency airport and a small row of guest cottages that are rarely occupied. I got this from Colossal, which calls the film “poetic and dryly humorous”; I cannot improve upon that.

Tags: video

jgbishop
12 days ago
Pretty good little documentary!
Durham, NC

Ephesians 1:9-10

“God has now revealed to us his mysterious will regarding Christ—which is to fulfill his own good plan. And this is the plan: At the right time he will bring everything together under the authority of Christ—everything in heaven and on earth.”
jgbishop
14 days ago
Can the right time be today?
Durham, NC
GaryBIshop
14 days ago
I'm for it.

Scientists glue two proteins together, driving cancer cells to self-destruct

Stanford researchers hope new technique will flip lymphoma protein’s normal action — from preventing cell death to triggering it.

October 22, 2024 - By Rachel Tompa

A new molecule developed by Stanford Medicine researchers (turquoise and yellow) tethers two proteins (purple and red) that together switch on self-destruction genes in cancer cells.
Ella Maru Studio

Our bodies divest themselves of 60 billion cells every day through a natural process of cell culling and turnover called apoptosis.

These cells — mainly blood and gut cells — are all replaced with new ones, but the way our bodies rid themselves of material could have profound implications for cancer therapies in a new approach developed by Stanford Medicine researchers.

They aim to use this natural method of cell death to trick cancer cells into disposing of themselves. Their method accomplishes this by artificially bringing together two proteins in such a way that the new compound switches on a set of cell death genes, ultimately driving tumor cells to turn on themselves. The researchers describe their latest such compound in a paper published Oct. 4 in Science.

The idea came to Gerald Crabtree, MD, a professor of developmental biology, during a pandemic stroll through the forests of Kings Mountain, west of Palo Alto, California. As he walked, Crabtree, a longtime cancer biologist, was thinking about major milestones in biology.

One of the milestones he pondered was the 1970s-era discovery that cells trigger their own deaths for the greater good of the organism. Apoptosis turns out to be critical for many biological processes, including proper development of all organs and the fine-tuning of our immune systems. That system retains pathogen-recognizing cells but kills off self-recognizing ones, thus preventing autoimmune disease.

“It occurred to me, Well gee, this is the way we want to treat cancer,” said Crabtree, a co-senior author on the study who is the David Korn, MD, Professor in Pathology. “We essentially want to have the same kind of specificity that can eliminate 60 billion cells with no bystanders, so no cell is killed that is not the proper object of the killing mechanism.”

 

Gerald Crabtree

Traditional treatments for cancer — namely chemotherapy and radiation — often kill large numbers of healthy cells alongside the cancerous ones. To harness cells’ natural and highly specific self-destruction abilities, the team developed a kind of molecular glue that sticks together two proteins that normally would have nothing to do with one another.

Flipping the cancer script

One of these proteins, BCL6, when mutated, drives the blood cancer known as diffuse large B-cell lymphoma. This kind of cancer-driving protein is also referred to as an oncogene. In lymphoma, the mutated BCL6 sits on DNA near apoptosis-promoting genes and keeps them switched off, helping the cancer cells retain their signature immortality.

The researchers developed a molecule that tethers BCL6 to a protein known as CDK9, which acts as an enzyme that catalyzes gene activation, in this case, switching on the set of apoptosis genes that BCL6 normally keeps off.

“The idea is, Can you turn a cancer dependency into a cancer-killing signal?” asked Nathanael Gray, PhD, co-senior author with Crabtree, the Krishnan-Shah Family Professor and a chemical and systems biology professor. “You take something that the cancer is addicted to for its survival and you flip the script and make that be the very thing that kills it.”

This approach — switching something on that is off in cancer cells — stands in contrast to many other kinds of targeted cancer therapies that inhibit specific drivers of cancer, switching off something that is normally on.

“Since oncogenes were discovered, people have been trying to shut them down in cancer,” said Roman Sarott, PhD, a postdoctoral scholar at Stanford Medicine and co-first author on the study. “Instead, we’re trying to use them to turn signaling on that, we hope, will prove beneficial for treatment.”

Nathanael Gray

When the team tested the molecule in diffuse large B-cell lymphoma cells in the lab, they found that it indeed killed the cancer cells with high potency. They also tested the molecule in healthy mice and found no obvious toxic side effects, even though the molecule killed off a specific category of the animals’ healthy B cells, a kind of immune cell, which also depend on BCL6. They’re now testing the compound in mice with diffuse large B-cell lymphoma to gauge its ability to kill cancer in a living animal.

Because the technique relies on the cells’ natural supply of BCL6 and CDK9 proteins, it seems to be very specific for the lymphoma cells — the BCL6 protein is found only in this kind of lymphoma cell and in one specific kind of B cell. The researchers tested the molecule in 859 different kinds of cancer cells in the lab; the chimeric compound killed only diffuse large B-cell lymphoma cells.

And because BCL6 normally acts on 13 different apoptosis-promoting genes, the researchers hope their strategy will avoid the treatment resistance that seems so common in cancer. Cancer is often able to rapidly adapt to therapies that target only one of the disease’s weak spots, and some of these therapies may stop cancer from growing without killing the cells entirely. The research team hopes that by blasting the cells with multiple different cell death signals at once, the cancer will not be able to survive long enough to evolve resistance, although this idea remains to be tested.

“It’s sort of cell death by committee,” said Sai Gourisankar, PhD, a postdoctoral scholar and co-first author on the study. “And once a cancer cell is dead, that’s a terminal state.”

Crabtree and Gray, both members of the Stanford Cancer Institute, are co-founders of a biotech startup, Shenandoah Therapeutics, that aims to further test this molecule and a similar, previously developed molecule in hopes of gathering enough pre-clinical data to support launching clinical trials of the compounds. They also plan to build similar molecules that could target other cancer-driving proteins, including the oncogene Ras, which is a driver of several different kinds of cancer.

The study was funded by the Howard Hughes Medical Institute, the National Institutes of Health (grants CA276167, CA163915, MH126720-01 and 5F31HD103339-03), the Mary Kay Foundation, the Schweitzer Family Fund, the SPARK Translational Research Program at Stanford University and Bio-X at Stanford University.

About Stanford Medicine

Stanford Medicine is an integrated academic health system comprising the Stanford School of Medicine and adult and pediatric health care delivery systems. Together, they harness the full potential of biomedicine through collaborative research, education and clinical care for patients. For more information, please visit med.stanford.edu.

jgbishop
16 days ago
Clever!
Durham, NC

We shrank our JavaScript monorepo's Git size by 94%

This isn't clickbait. We really did this! We work in a very large JavaScript monorepo at Microsoft that we colloquially call 1JS. It's large not only in terms of GB, but also in terms of sheer volume of code and contributions. We recently crossed the 1,000 monthly active users mark, about 2,500 packages, and ~20 million lines of code! The most recent clone I did of the repo clocked in at an astonishing 178GB.

For many reasons, that's just too big; we have folks in Europe who can't even clone the repo due to its size.

The question is, how did this even happen?!

Lesson #1

When I first joined the repo a few years ago, I noticed after a few months that it was growing. When I first cloned it, it was a gig or two, but after a few months it was already at around 4 GB. It was hard to know exactly why.

Back then I ran a tool called git-sizer, and it told me about some large blobs. Large blobs happen when someone accidentally checks in a binary, so there's not much you can do there other than enforce size limits on check-ins, which is a feature of Azure DevOps. Retroactively, though, once the file is there, it's semi-stuck in history.

Secondly, it flagged our Beachball change files, which we weren't deleting. We use them the same way Changesets uses its change files, accomplishing goals similar to semantic-release: we want to tell the packages how to automatically bump their semver ranges.

At times we'd get to 40k of them in a single folder, which we found out causes a large tree object to be created every time you add a new file into that folder.
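A back-of-the-envelope sketch of why this hurts: a Git tree object stores one entry per file, in the form `<mode> <name>\0<binary object id>`, so adding a single file writes a new tree that lists every file in the folder. The file names below are hypothetical, the sketch assumes one new file per commit and 20-byte SHA-1 ids, and it counts raw bytes only, ignoring zlib and delta compression (which shrink these numbers a lot in practice).

```python
def tree_entry_size(name: str, mode: str = "100644", oid_len: int = 20) -> int:
    """Raw bytes of one tree entry: '<mode> <name>' + NUL + binary object id (SHA-1 assumed)."""
    return len(mode) + 1 + len(name.encode("utf-8")) + 1 + oid_len


def tree_size(names) -> int:
    """Raw bytes of a tree listing all `names` (object header not counted)."""
    return sum(tree_entry_size(n) for n in names)


def bytes_written_adding_one_by_one(names) -> int:
    """Total raw tree bytes if each file is added in its own commit.

    Every commit writes a new tree for the folder containing all files so far,
    so the total grows quadratically with the number of files.
    """
    total = 0
    current = 0
    for name in names:
        current += tree_entry_size(name)
        total += current
    return total


# Hypothetical change-file names, 40,000 of them in one folder.
names = [f"change-{i:05d}.json" for i in range(40_000)]
print(tree_size(names))                        # 1800000 (about 1.8 MB per new tree)
print(bytes_written_adding_one_by_one(names))  # 36000900000 (about 36 GB raw)
```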

So, lesson #1 we learned was...

Don't keep thousands of things in a single folder.

We ended up implementing two things to help here. One was a pull request to Beachball that records several changes in a single change file instead of one file per package.

Second, we wrote a pipeline which runs and automatically cleans up that change folder periodically to stop it from getting so large.

Huzzah! We fixed git bloat!

Lesson #2

we fixed git bloat! no we didn't

Our versioning flow at scale maintains a mirror of main called versioned, which stores the actual versions of packages so we can keep main free of git conflicts and have an accurate view of which git commits correspond to which semver versions we release as npm packages. (This needs another blog post, but I digress...)

I noticed that the versioned branch seemed to be getting harder and harder to clone because it kept getting so huge. But we'd dealt with the change file issue, and the only commits going into that versioned branch were appends to CHANGELOG.md and CHANGELOG.json files.

Time passed, and our repo, while growing slightly slower, still grew and grew. However, it was difficult to know whether this growth was now simply due to scale or something else altogether. We had been adding hundreds of thousands of lines of code and hundreds of developers every year since 2021, so a case could be made that this was natural growth. However, once we realized that we had surpassed the growth rate of one of the biggest monorepos at Microsoft, the Office one, we knew something else must be wrong!

That's when we called for backup...

The author of such git features as git shallow checkout, git sparse index, and all kinds of other features created because of the size of our monorepos in Office had just rejoined our organization after a stint at GitHub bringing those features to the world.

He took a look, and immediately realized something was definitely not right with this growth rate. When we pulled our versioned branches, those branches that only change CHANGELOG.md and CHANGELOG.json, we were fetching 125GB of extra git data?! HOW THO??

Welp, after some super deep git digging, it turned out that some old packing code checked in by Linus Torvalds (ever heard of him 🤷‍♂️) was only looking at the last 16 characters of a file path when deciding which files to compare for delta compression before a push. For context, git usually just pushes the diffs (deltas) of changed files; however, because of this packing issue, git was comparing CHANGELOG.md files from two different packages!

For example, if you changed repo/packages/foo/CHANGELOG.md, when git was getting ready to do the push, it was generating a diff against repo/packages/bar/CHANGELOG.md! This meant that on many occasions we were just pushing the entire file again and again, which could be tens of MBs per file in some cases, and you can imagine how that adds up in a repo this size.
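The 16-character behavior comes from the name hash Git uses to order objects before searching for delta candidates. The sketch below is modeled on the classic `pack_name_hash()` function in Git's `pack-objects.h` as we understand it; it handles ASCII paths only and is an illustration, not Git's exact code. Each character shifts the running 32-bit hash right by 2 bits and lands in the top byte, so characters more than 16 positions from the end have essentially no effect, and `CHANGELOG.md` files in different packages get identical or nearly identical hashes.

```python
# Sketch of Git's classic path "name hash" used to group delta candidates.
# Assumption: ASCII paths only; the real C code works on raw chars.
WHITESPACE = b" \t\n\r\x0b\x0c"


def pack_name_hash(path: str) -> int:
    h = 0
    for c in path.encode("ascii"):
        if c in WHITESPACE:
            continue  # whitespace is skipped
        # Shift earlier characters right by 2 bits and put this one in the top byte.
        h = ((h >> 2) + (c << 24)) & 0xFFFFFFFF
    return h


paths = [
    "pkgs/foo/CHANGELOG.md",
    "pkgs/foo/index.ts",
    "pkgs/bar/CHANGELOG.md",
    "pkgs/bar/index.ts",
]

# Roughly speaking, Git orders objects by type, then by this hash, then by size
# before the delta search, so both CHANGELOG.md files land side by side and
# become delta candidates for each other.
for p in sorted(paths, key=pack_name_hash):
    print(f"{pack_name_hash(p):#010x}  {p}")
```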

We were then able to try repacking our repo with a larger window (git repack -adf --window=250) to have git do a better job compressing the pack files. This definitely reduced the size of the repo significantly; however, we can do even better!

This PR https://github.com/git-for-windows/git/pull/5171 added a new way to pack the repo based upon walking git paths as opposed to the default of walking commits.

The results are staggering...

I ran a new git clone on my machine yesterday to try the new version of git in Microsoft's git fork (git version 2.47.0.vfs.0.2)...

And after running the new git repack -adf --path-walk ...

Crazy. It went from 178GB to 5GB. 😱

The other new configuration option being added will further ensure that the right types of deltas are generated at git push time...

git config --global pack.usePathWalk true

That will make sure your git push commands are performing the correct compression.

Any developer on the git version 2.47.0.vfs.0.2 can now repack the repo once cloned locally, as well as use the new git push path walk algorithm to stop the growth rate.

On GitHub, repacking and git garbage collection happen periodically, but again, the type of packing GitHub does will not correctly compute the deltas of these CHANGELOG.md and CHANGELOG.json files, or potentially of any files that share the same last 16+ characters of their names and change a lot over time. Think large i18n string files and such.

Azure DevOps, which we're on, doesn't do any such repacking yet. So we're working on getting that done too, so we can also reduce the size of the repo on the server side.

Those changes will all make their way into upstream Git as well! Hurray for OSS.

Wrap Up

If you work in a large-ish scale monorepo, and you have CHANGELOG.md or really any file that has a relatively long-ish name (>16 characters) which repeatedly gets updated, you may want to keep your eyes on this path walk stuff.

You can also try out the new git survey command to see all kinds of new heuristics such as Top Files By Disk Size, Top Directories By Inflated Size, or Top Files By Inflated Size.

These heuristics will help give you a sense of whether the path walk work will affect your repo size too.

Overall, I am so impressed and excited about our commitment to producing solutions that help us scale repositories at Microsoft, and to taking those solutions to the rest of the world.

jgbishop
24 days ago
Posts like this reinforce my gratitude that I don't work for Microsoft; there's clearly no discipline. From the sound of it, this repo is the wild west.
Durham, NC

Pickles by Brian Crane for Fri, 25 Oct 2024

Pickles by Brian Crane on Fri, 25 Oct 2024

Source - Patreon

jgbishop
26 days ago
Hahaha!
Durham, NC