You have to install their MDM device-management software on your computer. Basically that computer is theirs now, so don't plan on just handing over your laptop temporarily unless you don't mind some company completely owning your box. There might still be a valid use for people with slightly old laptops lying around, but beware of sharing this computer with your daily activities, e.g. if you regularly use a bank in a browser on it. MDM means "they can swap out your SSL certs" levels of computer access; please correct me if I'm wrong.
MDMs on macOS are permissioned via AccessRights, and you can verify that their permission set is fairly minimal and does not allow what you've described here (bits 0, 4, 10).
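To make that auditable: you can decode the AccessRights bitmask yourself and check that nothing outside an allow-list is granted. A minimal sketch; the specific allowed bits are taken from the comment above, not from Apple's documentation:

```python
# Toy check of an MDM AccessRights bitmask against an allow-list.
# ALLOWED_BITS here mirrors the "bits 0, 4, 10" claim above; verify
# the actual bit meanings against Apple's MDM protocol reference.
ALLOWED_BITS = {0, 4, 10}

def extra_bits(access_rights: int) -> set[int]:
    """Return any set bits outside the allow-list."""
    set_bits = {i for i in range(access_rights.bit_length()) if access_rights >> i & 1}
    return set_bits - ALLOWED_BITS

minimal = (1 << 0) | (1 << 4) | (1 << 10)
print(extra_bits(minimal))            # set() -> nothing beyond the allow-list
print(extra_bits(minimal | (1 << 2))) # {2}   -> an unexpected permission
```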
That said, the privacy posture at the cornerstone of their claims is snake oil with gaping holes in it, so I still wouldn't trust it. But it's worth being accurate about how exactly they're messing up.
Which part is snake oil? I'm reading the paper now and they seem to have covered all the bases.
The trick is that the SEP can remotely attest to Apple:
• SIP status
• OS disk image root hash
• Ownership of a non-exportable HSM key
and all the other cryptographic goop needed to make a remote attestation hang together.
If you can prove a public key is generated by the SEP of a machine running with all Apple's security systems enabled, then you can trivially extend that to confidential computing because the macOS security architecture allows apps to block external inspection even by the root user.
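Concretely, the verifier's side of that argument reduces to a checklist over an already-signature-verified attestation. A toy illustration only; the field names are invented here, and real attestation formats differ:

```python
# Toy illustration of the trust decision above: accept a SEP-backed key only
# if the (already signature-verified) attestation says every security system
# is on. Field names are hypothetical, for illustration.
EXPECTED_ROOT_HASH = "a3f1..."  # known-good OS disk image root hash (placeholder)

def accept_key(att: dict) -> bool:
    return (
        att.get("sip_enabled") is True
        and att.get("os_root_hash") == EXPECTED_ROOT_HASH
        and att.get("key_non_exportable") is True
    )

good = {"sip_enabled": True, "os_root_hash": "a3f1...", "key_non_exportable": True}
bad = dict(good, sip_enabled=False)   # SIP off -> reject
print(accept_key(good), accept_key(bad))  # True False
```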
Gajesh, if you read this, this whole infrastructure with the TDX server and such could really use being refactored out of Darkbloom. There's lots of uses for this and no reason it has to be tightly tied to AI inference.
Interesting to see an offering with this heritage [1] proposing flat earnings rates for inference operators here, rather than trying to sell a dynamic marketplace where operators compete on price in real-time.
Right now the dashboards show 78 providers online, but someone in-thread here said that they spun one up and got no requests. Surely someone would be willing to beat the posted rate and swallow up the demand?
[1] Layer Labs, formerly EigenLayer, is a company built around a protocol to abstract and recycle economic security guarantees from Ethereum proof of stake.
I installed this so you don't have to. It feels a bit quirky and not super polished: it fails to download the image model, and the audio/TTS model fails to load.
In 15 minutes of serving Gemma, I got precisely zero actual inference requests, and a bunch of health checks and two attestations.
At the moment they don't have enough sustained demand to justify the earning estimates.
I kind of see your point, but I also kind of don't.
Sure, it would be great if you'd immediately get hammered with hundreds of requests and start making money quickly. It would also be great if it were a bit more transparent and you could see more stats (what counts as "idle"? Is my machine currently eligible to serve models?). But it's still very new; I'd say give it some time and let's see how it goes.
If you have it running and you get zero requests, it uses close to zero power above what your computer uses anyway. It doesn't cost you anything to have it running, and if you get requests, you make money. Seems like an easy decision to me.
Weird to learn that they don't generate inference requests to their network themselves, at least to motivate early adopters to host their inference software.
And I don't think they ever will unless they're highly competitive (hopefully the price they have now stays, at least for users).
I was thinking of building this exact thing a year ago but my main stopper was economics: it would never make sense for someone to use the API, thus nobody can make money off of zero demand.
I guess we just have to look at how Uber and Airbnb bootstrapped themselves. Another issue with my original idea was that it was for compute in general, when the main, best use-case, is long(er)-running software like AI training (but I guess inference is long running enough).
But there's already software out there that lets you rent out your GPU, so...
People underestimate how efficient cost/token is for beefy GPUs if you are able to batch. It's unlikely that a one-off consumer unit will be able to compete long term.
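The batching effect is easy to see on paper: the machine's hourly cost is fixed, so every extra concurrent stream divides the per-token price. All numbers below are made-up placeholders purely to illustrate the scaling:

```python
# A fixed hourly machine cost spread over concurrent streams is what makes
# batched datacenter inference cheap per token. Throughput and price are
# illustrative assumptions, not measured figures.
def usd_per_mtok(tok_per_s: float, usd_per_hour: float) -> float:
    return usd_per_hour / (tok_per_s * 3600 / 1e6)

GPU_USD_PER_HOUR = 3.00  # assumed rental cost of one big GPU
single = usd_per_mtok(40, GPU_USD_PER_HOUR)        # one stream
batched = usd_per_mtok(40 * 64, GPU_USD_PER_HOUR)  # 64 concurrent streams
print(f"${single:.2f}/Mtok unbatched vs ${batched:.3f}/Mtok batched")
```

With these placeholder numbers the per-token price drops by exactly the batch factor, 64x.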
I have a hard time believing their numbers. If you can pay off a mac mini in 2-4 months, and make $1-2k profit every month after that, why wouldn’t their business model just be buying mac minis?
Solid q. I think part of it is that it's really easy to attract some "mass" (capital) of users, as there are definitely quite a few idle Macs in the world.
Non-VC play (not required until you can raise on your own terms!) and clear differentiation.
If you want to go full-business-evaluation, I would be more worried about someone else implementing same thing with more commission (imo 95% and first to market is good enough).
I think the point they’re making though is that the numbers seem too good to be true.
ie. Does anyone know the payback time for a B100 used just for inference? I assume it’s more than a couple of months? Or is it just training that costs so much?
Because they don't have that much initial money in their pocket, while the idle computer is already there, and the biggest friction point is convincing people to install some software. Producing rhetoric and software are both several orders of magnitude cheaper than directly owning and maintaining a large fleet of hardware, with a guaranteed stable electrical supply and a safe place to store it.
Assuming that getting a large chunk of initial investment is just a formality is out of touch with 99% of people's reality out there, when it's actually the biggest friction point in any socio-economic endeavour.
The numbers are obviously high, because if this takes off then the price for inference will also drop. But I still think it’s a solid economic model that benefits low income countries the most. In Ukraine, for example, I know people who live on $200/month. A couple Mac Minis could feed a family in many places.
As a business owner, I can think of multiple reasons why a decentralized network is better for me as a business than relying on a hyperscaler inference provider. 1. No dependency on a BigTech provider who can cut me off or change prices at any time. I’m willing to pay a premium for that. 2. I get a residential IP proxy network built-in. AI scrapers pay big money for that. 3. No censorship. 4. Lower latency if inference nodes are located close to me.
On the latency point - your requests are still going through the coordinator of the system here. So on average strictly worse than a large provider.
You - Darkbloom - Operator - Darkbloom - you, vs
You - Provider - you
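Back-of-envelope for why the coordinator path can't win: every request transits the coordinator in both directions. The millisecond figures are arbitrary examples:

```python
# Round trip through the coordinator vs direct to a provider.
# One-way leg latencies (ms) are arbitrary illustrative values.
def round_trip(*one_way_legs_ms: float) -> float:
    return 2 * sum(one_way_legs_ms)

via_coordinator = round_trip(20, 15)  # you->coordinator, coordinator->operator
direct = round_trip(25)               # you->provider
print(via_coordinator, direct)        # 70 50
```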
---
On the censorship point - this is an interesting risk surface for operators. If people are drawn to my decentralized model provisioning for its lax censorship, I'm pretty sure they're using it to generate things that I don't want to be liable for.
If anything, I could imagine dumber and stricter brand-safety style censorship on operator machines.
> These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.
There are many people who do not have ready access to a million dollars to purchase said Mac minis, much less the operating capital to rack & operate them.
Very smart play to build a platform, get scale, and prove out the software. Then either add a small network fee (this could be on money movement on/off platform), add a higher tier of service for money, and/or just use the proof points to go get access to capital and become an operator in your own pool.
If those numbers are true, they could start with one Mac and double every few months. But I guess there are also many people who do not have ready access to whatever a Mac mini costs either...
I guess if it only works at scale, capital is maybe the answer. Enough cash to buy 5 or 10 or even 100 minis seems doable - but if the idea only works well when you have 10,000 running, that makes some sense.
They use the TEE to check that the model and code are untampered with. That's a good, valid approach and should work (I've done similar things on AWS with their TEE).
The key question here is how they avoid the outside computer being able to view the memory of the internal process:
> An in-process inference design that embeds the inference engine directly in a hardened process, eliminating all inter-process communication channels that could be observed, with optional hypervisor memory isolation that extends protection from software-enforced to hardware-enforced via ARM Stage 2 page tables at zero performance cost.[1]
I was under the impression this wasn't possible if you are using the GPU. I could be misled on this though.
This entire paper smells of LLM, I'm sure even the most distinguished academic would refrain from using notation to prove that the SIP status cannot change during operation.
While they do make this argument, realistically anyone sending their prompt/data to an external server should assume there will be some level of retention.
And more so in particular, anyone using Darkbloom with commercial intent should only really send non-sensitive data (no tokens, customer data, ...). I'd say only classification tasks, image generation, etc.
> PT_DENY_ATTACH (ptrace constant 31): Invoked at process startup before any sensitive data is loaded. Instructs the macOS kernel to permanently deny all ptrace requests against this process, including from root. This blocks lldb, dtrace, and Instruments.
> Hardened Runtime: The binary is code-signed with hardened runtime options and explicitly without the com.apple.security.get-task-allow entitlement. The kernel denies task_for_pid() and mach_vm_read() from any external process.
> System Integrity Protection (SIP): Enforces both of the above at the kernel level. With SIP enabled, root cannot circumvent Hardened Runtime protections, load unsigned kernel extensions, or modify protected system binaries. Section 5.1 proves that SIP, once verified, is immutable for the process lifetime.
Looking at their paper at [1], there's a gaping hole: there's no actual way to verify the contents of the running binaries. The binary hash they include in their signatures is self-reported, and can be modified. That's simply game over.
A note, as others have posted on this thread: I mention this as a concrete and trivial flaw in their whole strategy, but the issue is fundamental: there's no hardware enclave for third-party code available to do the type of attestation that would be necessary. Any software approach they develop will ultimately fall to that hole.
Apple is perfectly capable of doing remote attestation properly. iOS has DCAppAttest which does everything needed. Unfortunately, it's never been brought to macOS, as far as I know. Maybe this MDM hack is a back door to get RA capabilities, if so it'd certainly be intriguing, but if not as far as I know there's no way to get a Mac to cough up a cryptographic assertion that it's running a genuine macOS kernel/boot firmware/disk image/kernel args, etc.
It's a pity because there's a lot of unique and interesting apps that'd become possible if Apple did this. Darkbloom is just one example of what's possible. It'd be a huge boon to decentralization efforts if Apple activated this, and all the pipework is laid already so it's really a pity they don't go the extra mile here.
Well. Running your machine to do inference will utilize more than 50W sustained load, I'd say more than double that. Plus electricity is more expensive here (but granted, I do have solar panels). Plus don't forget to factor in that your hardware will age faster.
Your hardware will age slower if you have consistent load.
Thermal stress from bursty workloads is much more of a wearing problem than electromigration. If you can consistently keep the SoC at a specific temperature, it'll last much longer.
This is also why it was very ironic that crypto miner GPUs would get sold at massive discounts. Everyone assumed that they had been run to shit, but a proper miner would have undervolted the card and run it at consistent utilization, meaning the card would be in better condition than a second-hand gamer GPU that had constantly been shifting between 1% and 80% utilization.
Their estimate is based on significantly lower consumption when under load. E.g. 25W for an M4 Pro mac mini. I have no idea if that’s realistic - but the m4s are supposedly pretty efficient (https://www.jeffgeerling.com/blog/2024/m4-mac-minis-efficien...)
Their example big earner models are FLUX.2 Klein 4B and FLUX.2 Klein 9B, which I imagine could generate a lot more tokens/s than a 26B model on your machine.
I have no idea if that is a good estimate of how much an M5 Pro can generate - but that’s what it says on their site.
They do a bit of a sneaky thing with the power calculation: they subtract 12 W of idle power, because they are assuming your machine is idling 24/7, so the only cost is the extra 18 W they estimate you'll use doing inference. Idk about you, but I do turn my machine off when I am not using it.
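The gap between the two accounting methods is easy to quantify. Using the 12 W idle and 18 W incremental figures from the comment above, and an assumed $0.25/kWh rate:

```python
# Monthly power cost: incremental-over-idle accounting (theirs) vs billing
# the full draw under load (fair if the machine only runs for this).
# 12 W idle / 18 W incremental are from the comment; $0.25/kWh is assumed.
HOURS_PER_MONTH = 720
USD_PER_KWH = 0.25

def monthly_cost(watts: float) -> float:
    return watts * HOURS_PER_MONTH / 1000 * USD_PER_KWH

incremental = monthly_cost(18)       # their accounting: extra power only
total_load = monthly_cost(18 + 12)   # idle draw charged to this workload too
print(f"${incremental:.2f}/mo vs ${total_load:.2f}/mo")
```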
OpenAI has only about 5% paying customers, how does it generate revenue?
I don’t think this is a sustainable business model. For example, Cubbit tried to build decentralised storage, but I backed out because better alternatives now exist, and hardware continues to improve and become cheaper over time.
Your electricity and ownership costs will see a lower return, and it does not actually reduce CO2.
Yeah, the only way to get there is assuming they're not giving prompt-caching discounts while my laptop is getting prompt-caching benefits, with very many large prompts. So yes, I am skeptical of their numbers.
Any idea what makes for such a difference between your numbers and theirs? Batching? Or could they do a crazy prefix caching across all nodes to reduce the actual processing?
Unfortunately, verifiable privacy is not physically possible on MacBooks of today. Don't let a nice presentation fool you.
Apple Silicon has a Secure Enclave, but not a public SGX/TDX/SEV-style enclave for arbitrary code, so these claims are about OS hardening, not verifiable confidential execution.
It would be nice if it were possible. There's a lot of cool innovations possible beyond privacy.
I wrote a whole SDK for using SGX, it's cool tech. But in theory on Apple platforms you can get a long way without it. iOS already offers this capability and it works OK.
macOS has a strong enough security architecture that something like Darkbloom would have at least some credibility if there was a way to remotely attest a Mac's boot sequence and TCC configuration. The OS sandbox can keep apps properly separated if the kernel is correct and unhacked. And Apple's systems are full of mitigations and roadblocks to simple exploitation. Would it be as good as a consumer SGX enclave? Not architecturally, but the usability is higher.
As if you get privacy with the inference providers available today? I have more trust in a randomly selected machine on a decentralized network not being compromised than in a centralized provider like OpenAI pinky promising not to read your chats.
I'd love a way to do this locally -- pool all the PCs in our office into an in-office pool of compute. Any suggestions from anyone? We currently run ollama but manage the pools manually.
Seems like so much more work than "just" paying https://huggingface.co or whichever other neocloud that already did all the setup for you and just waits for your credit card per minute/second/token.
If you set CPUSchedulingPolicy=idle Nice=19 IOSchedulingClass=idle in the ollama server configuration it should run in the background with lowest priority.
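For ollama installed as a systemd service, that would be a drop-in override along these lines (the unit name `ollama.service` is an assumption; adjust to your install):

```ini
# /etc/systemd/system/ollama.service.d/background.conf
# Apply with: systemctl daemon-reload && systemctl restart ollama
[Service]
CPUSchedulingPolicy=idle
Nice=19
IOSchedulingClass=idle
```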
I'm not sure how the economics works out. Pricing for AI inference is based on supply/demand/scarcity. If your hardware is scarce, that means low supply; combine with high demand, it's now valuable. But what happens if you enable every spare Mac on the planet to join the game? Now your supply is high, which means now it's less valuable. So if this becomes really popular, you don't make much money. But if it doesn't become somewhat popular, you don't get any requests, and don't make money. The only way they could ensure a good return would be to first make it popular, then artificially lower the number of hosts.
"These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.
When your Mac is idle (no inference requests), it consumes minimal power — you don't lose significant money waiting for requests. The electricity costs shown only apply during active inference.
Text models typically see the highest and most consistent demand. Image generation and transcription requests are bursty — high volume during peaks, quiet otherwise."
They are almost claiming FHE. Isn't it just a matter of creating the right tool to get the generated tokens from RAM before they get encrypted for transfer? How is it fundamentally different from chutes?
That solution actually makes great sense. So Apple won in some strange way again?
Guess there are limitations on the size of the models, but if top-tier models are getting democratized I don't see a reason not to use this API. The only thing that comes to mind is data privacy concerns.
I think batch-evals for non-sensitive data has great PMF here.
Heh, what did they win exactly? This is just a way for another company to extract value out of the single region of the world where Apple is a relevant vendor, and it happens to be the one where it's the easiest to pull people into schemes.
> Apple’s attestation servers will only generate the FreshnessCode for a genuine device that checks in via APNs. A software-only adversary cannot forge the MDA certificate chain (Assumption 3). Combined with SIP enforcement (preventing binary replacement) and Secure Boot (preventing bootloader tampering), this provides strong evidence that the signing key resides in genuine Apple hardware.
No no, wait they (he) clearly does understand it. This paper is great! It's written by someone with a good understanding of the space. We should take it very seriously.
The trick seems to be that Apple's corporate Mac management system has remote attestation as a part of it under the name MDA, it's just not exposed to consumers via DCAppAttest. This is fascinating. Mind blown. I had so many ideas over the years that didn't work because there was no robust consumer desktop platform with remote attestation, and now it turns out Apple has quietly built one and not even really told people about it....
Epyc has that VM encrypted memory thing, which comes pretty close. It does raise an interesting question, though: would a PCIe card passed through to a VM be able to DMA access the memory of neighboring devices?
Anyway, this looks like a great idea and might have a chance at solving the economic issue with running nodes for cheap inference and getting paid for it.
Still, absolute zero is an unacceptable number. Had this running for more than an hour.
Prolly gonna make $50 a year tops.
Others are reporting low demand, e.g.: https://news.ycombinator.com/item?id=47789171
- Elon Musk during Tesla's Autonomy Day in April 2019.
Macs have secure enclaves.
But they argue that:
> PT_DENY_ATTACH (ptrace constant 31): Invoked at process startup before any sensitive data is loaded. Instructs the macOS kernel to permanently deny all ptrace requests against this process, including from root. This blocks lldb, dtrace, and Instruments.
> Hardened Runtime: The binary is code-signed with hardened runtime options and explicitly without the com.apple.security.get-task-allow entitlement. The kernel denies task_for_pid() and mach_vm_read() from any external process.
> System Integrity Protection (SIP): Enforces both of the above at the kernel level. With SIP enabled, root cannot circumvent Hardened Runtime protections, load unsigned kernel extensions, or modify protected system binaries. Section 5.1 proves that SIP, once verified, is immutable for the process lifetime.
gives them memory protection.
To me that is surprising.
[1] https://github.com/Layr-Labs/d-inference/blob/master/papers/...
If it's not running fully end to end in some secure enclave, then it's always just a best effort thing. Good marketing though.
Protection here is conditional and best-effort. There are no real guarantees, and no actual verifiability.
My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B. Darkbloom's pricing is $0.20 per Mtok output.
That's about $2.24/day or $67/mo revenue if it's fully utilized 24/7.
Now assuming 50W sustained load, that's about 36 kWh/mo, at ~$.25/kWh approx. $9/mo in costs.
Could be good for lunch money every once in a while! Around $700/yr.
I'd say it's not worth it. But the idea is cool.
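The arithmetic above checks out (up to rounding). Reproducing it with the commenter's own assumptions (130 tok/s aggregate, $0.20/Mtok, 50 W sustained, an assumed $0.25/kWh):

```python
# Verifying the comment's figures: revenue, power cost, and yearly profit.
# All inputs are the commenter's assumptions, not measured data.
TOK_PER_S, USD_PER_MTOK = 130, 0.20
WATTS, USD_PER_KWH = 50, 0.25

rev_day = TOK_PER_S * 86400 / 1e6 * USD_PER_MTOK   # tokens/day -> dollars
rev_month = rev_day * 30                           # ~$67/mo
cost_month = WATTS * 720 / 1000 * USD_PER_KWH      # kWh/mo -> $9/mo
profit_year = (rev_month - cost_month) * 12        # ~$700/yr
print(f"${rev_day:.2f}/day, ${rev_month:.0f}/mo revenue, "
      f"${cost_month:.2f}/mo power, ${profit_year:.0f}/yr profit")
```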
For Gemma 4 26B their math is:
single_tok/s = (307 GB/s / 4 GB) * 0.60 = 46.0 tok/s
batched_tok/s = 46.0 * 10 * 0.9 = 414.4 tok/s
tok/hr = 414.4 * 3600 = 1,492,020
revenue/hr = (1,492,020 / 1M) * $0.200000 = $0.2984
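Their quoted formula, run without the intermediate rounding (which is why 46.0 * 10 * 0.9 appears as 414.4 rather than 414.0 on the site):

```python
# The site's estimate formula as quoted above, with their stated inputs
# (307 GB/s memory bandwidth, 4 GB, 0.60 efficiency, batch 10 at 0.9,
# $0.20/Mtok). Just arithmetic, no claims about its realism.
mem_bw_gbs, model_gb = 307, 4
single = mem_bw_gbs / model_gb * 0.60       # 46.05 tok/s per stream
batched = single * 10 * 0.9                 # 414.45 tok/s
tok_per_hr = batched * 3600                 # 1,492,020 tok/hr
revenue_per_hr = tok_per_hr / 1e6 * 0.20    # ~$0.2984/hr
print(f"{single:.2f} tok/s -> {tok_per_hr:,.0f} tok/hr -> ${revenue_per_hr:.4f}/hr")
```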
This seems high. At which quantization? Using LM Studio or something else?
Note: Darkbloom seems to run everything on Q8 MLX.
I’d imagine 1 year of heavy usage would somehow affect its quality.
You have no guarantees over any random laptop connected across the world.
https://help.kagi.com/kagi/ai/llms-privacy.html
Interesting idea, but needs some work.
;P
They lost me with just one piece of microcopy - "start earning". Huge red flag.
What could possibly go wrong?
Got the latest v0.3.8 version from the list here: https://api.darkbloom.dev/v1/releases/latest
Three binaries and a Python file:
• darkbloom (Rust)
• eigeninference-enclave (Swift)
• ffmpeg (from Homebrew, lol)
• stt_server.py (a simple FastAPI speech-to-text server using mlx_audio)
The good parts: all three binaries are signed with a valid Apple Developer ID and have Hardened Runtime enabled.
Bad parts: binaries aren't notarized; it enrolls the device for remote MDM using micromdm; it downloads and installs a complete Python runtime from Cloudflare R2 (supply-chain risk); PT_DENY_ATTACH to make debugging harder; it collects device serial numbers.
TL;DR: No, not touching that.
Because they were already at the finish line with Apple Silicon.
> I don’t see a reason not to use this API. The only thing that comes to me is data privacy concerns.
The whole inference is end-to-end encrypted so none of the nodes can see the prompts or the messages.
That would finally be a crypto thing which is backed by value I believe in.
NVidia data center GPUs have a similar path, but not their consumer ones. Not sure about the NVidia Spark.
It's possible AMD Strix Halo can do this, but unlikely for any other PC based GPU environments.