Is it possible to start multiple instances of VMs at the same time? #3

Open
antranapp opened this issue Feb 6, 2023 · 22 comments

@antranapp

antranapp commented Feb 6, 2023

We have a bunch of Mac Studios for CI in our office at the moment, but we can only use each of them to run a single job at a time because Xcode is flaky with parallel executions.

Can Cilicon start multiple VMs at the same time that run independently of each other? This would help us utilise the Mac Studios' resources better by parallelising multiple jobs across multiple VMs.

@Marcocanc
Member

Hi @antranapp, that's definitely something we're considering implementing; however, I can't give a clear timeline. In theory it should be possible to run up to 2 VMs in parallel. While I haven't tried it myself, I've heard from several people that 2 VMs is a limitation of the Virtualization framework.

@AaronBurchfield
Contributor

While it may not be intentional, this works already by starting a second instance of Cilicon. It would be awesome if a single instance were able to manage multiple vms with different configurations to support building on different macos versions.

@Sherlouk

"I've heard from several people that 2 VMs is a limitation of the Virtualization framework"

It's not a limitation of the framework or the hardware, but rather the legal agreement. You can find it here and the important part is under 2.b.iii:

"to run up to two (2) additional copies or instances of the Apple Software within virtual operating system environments on each Apple-branded computer" (shortened for brevity)

If you try to start 3 or more, you will get a VZErrorDomain error 6, which tells you it's not allowed. This is a very old part of the agreement, though, so if enough people ask Apple to amend it, they may hopefully open it up on Apple Silicon to allow more; the new chips are clearly powerful enough to support it.
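
A minimal sketch of how a caller might recognise this failure, assuming it surfaces as an NSError with the domain and code quoted above. The helper name isVMLimitError is hypothetical and is not part of Cilicon or the Virtualization framework.

```swift
import Foundation

// Hypothetical helper: returns true when an error matches the
// "VZErrorDomain error 6" described above. The domain string and code
// are taken from this thread, not looked up in a framework header.
func isVMLimitError(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == "VZErrorDomain" && nsError.code == 6
}

// Stand-in error for illustration; a real one would come from starting a third VM.
let simulated = NSError(domain: "VZErrorDomain", code: 6)
if isVMLimitError(simulated) {
    print("VM limit reached: stop an existing VM before starting another")
}
```

A check like this would let an app tell a limit violation apart from other start-up failures and wait for a slot instead of giving up.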


With that said, I'm adding +1 to the ticket as I'd love to see support for two added for the reason above of wanting to support multiple CI test jobs in parallel without having them impact one another.

@ivan-gaydamakin

Any updates?

@ast3150
Contributor

ast3150 commented Aug 18, 2023

This would be very interesting for us. What's needed in terms of implementation to support this feature?

@ast3150
Contributor

ast3150 commented Aug 18, 2023

Fyi opening Cilicon multiple times (using open -n -a Cilicon) results in the network connection being dropped for both instances of Cilicon

@Halle

Halle commented Oct 18, 2023

Hiya @ast3150 and @Marcocanc , I am running Cilicon right now with two VMs and it seems to be working very well (Cilicon is great full-stop, thank you @Marcocanc for publishing and supporting this project – it really hits the right balance for me between believably ephemeral instances and low-fussiness systems complexity, given a moderate need). I am doing multiple VMs via a pretty easy workaround, but it is my hope to find some discretionary time so I can submit a real PR for this feature (and possibly a couple of other itch-scratches I'm doing locally). For now I will just document the workaround, which is having two running app instances of Cilicon where the second one looks for a config entitled cilicon2.yml (this requires building your own Cilicon.app for the second app/instance).

  1. Open Cilicon.xcodeproj and change every reference to the build target Cilicon in the project and its build settings to Cilicon2 so you can export an archive build to /Applications to run alongside Cilicon.app. I also changed the PRODUCT_BUNDLE_IDENTIFIER and used my own codesigning in order to separate other potentially shared resources I don't know about that use the bundle domain. It is entirely possible that this is unnecessary and it's enough to simply rename the built app, but I decided to do it this way in order to avoid wondering about edge cases when I had other mysteries to debug.

  2. Change the ConfigManager.swift line static let configPaths = ["/cilicon.yml", "/.cilicon.yml"] to static let configPaths = ["/cilicon2.yml", "/.cilicon2.yml"].

  3. Build app, put Cilicon2.app product wherever you put Cilicon.app.

That's actually it. You can then launch both apps and, at least for me, they both get network access and can each launch a different, correctly configured VM that appears to work correctly so far.

@Halle

Halle commented Oct 18, 2023

I'm guessing that the design of this inside a single app could be something as low-key as "if a cilicon2.yml exists, open a second window, do all the stuff" and you could probably leave it to the enduser to deal with the problem of providing a good cilicon2.yml, at least to start.
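
Roughly, that idea could look like the sketch below. It assumes the config files sit in one directory and follow the cilicon.yml / cilicon2.yml naming from the workaround above; existingConfigs is a hypothetical helper, not something that exists in Cilicon.

```swift
import Foundation

// Hypothetical helper: returns the config files that exist in `directory`,
// in the order given by `names`. Each hit would map to one VM window.
func existingConfigs(in directory: URL,
                     names: [String] = ["cilicon.yml", "cilicon2.yml"]) -> [URL] {
    names
        .map { directory.appendingPathComponent($0) }
        .filter { FileManager.default.fileExists(atPath: $0.path) }
}

let home = URL(fileURLWithPath: NSHomeDirectory())
for config in existingConfigs(in: home) {
    print("Would open a VM window for \(config.lastPathComponent)")
}
```

Validating the contents of each file is left out on purpose, in line with leaving a good cilicon2.yml to the end user.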

@Marcocanc
Member

Hi Halle, I will soon start working on a (much-needed) refactor of Cilicon with a much cleaner architecture and support for multiple VMs. Let me know if you'd like to contribute and we can try to find a way to collaborate on it.

@Halle

Halle commented Oct 24, 2023

I'd enjoy that!

@Halle

Halle commented Oct 25, 2023

BTW, there is one other change needed in code to support this currently in the form I described above. If you follow my instructions above, you will eventually encounter this issue where a complete run leads to a permanent shutdown of one or the other instance. I believe the reason for this is that in the current logic, there is a brief period in which there would be three instances of virtualized macOS, which isn't allowed. I changed this in setupAndRunVirtualMachine():

        Task { @MainActor in
            vmState = .running(virtualMachine)
            try await virtualMachine.start()
        }

to first check whether there is a VM in the .running state in the instance and, if so, to stop it before starting the new one, and also to set vmState to the .running state after virtualMachine.start() is done waiting instead of before. This results in a slow restart after there has been a runner run (maybe there is a race condition?), but it is successful.
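
The ordering described above can be sketched with simplified stand-ins. This is a synchronous model under stated assumptions: MachineLike, Runner, CallLog and FakeMachine are hypothetical names, and the real start() and stop() calls on VZVirtualMachine are async.

```swift
// Simplified, synchronous stand-in for a VM; the real calls are async.
protocol MachineLike: AnyObject {
    func start() throws
    func stop() throws
}

enum VMState {
    case idle
    case running(MachineLike)
}

final class Runner {
    private(set) var state: VMState = .idle

    func run(_ next: MachineLike) throws {
        // Stop the previous VM first so this instance never holds two at once.
        if case let .running(previous) = state {
            try previous.stop()
            state = .idle
        }
        try next.start()
        // Only mark the VM as running once start() has returned.
        state = .running(next)
    }
}

// Records the order of calls so the sequence can be inspected.
final class CallLog {
    var entries: [String] = []
}

final class FakeMachine: MachineLike {
    let name: String
    let log: CallLog
    init(name: String, log: CallLog) {
        self.name = name
        self.log = log
    }
    func start() throws { log.entries.append("start \(name)") }
    func stop() throws { log.entries.append("stop \(name)") }
}

let callLog = CallLog()
let runner = Runner()
try runner.run(FakeMachine(name: "a", log: callLog))
try runner.run(FakeMachine(name: "b", log: callLog))
print(callLog.entries)
```

Because the old VM is stopped before the new one starts, a host running two such instances should never see a third VM, at the cost of the slower restart mentioned above.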

@FabianBartels
Contributor

FabianBartels commented Aug 22, 2024

Hi dear colleagues,

I really like Cilicon, so I wanted to share my current state of running multiple instances.

I was also able to run multiple instances of Cilicon.
The only difference is that I used a more convenient approach: I added a launch argument and start each instance in the following way:

open /Applications/Cilicon.app -n --args -config-path /Users//cilicon.yml

Running this will start two Cilicon instances using the provided configs:
open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon.yml
open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon2.yml

@main
struct CiliconApp: App {
    /// If no launch argument is found. The default fallback config is used
    static let fallbackConfig = "\(NSHomeDirectory())/cilicon.yml"

    /// Launch argument to config e.g. ~/cilicon.yml
    static let configPath = UserDefaults.standard.string(forKey: "config-path") ?? fallbackConfig

   ...

}

@ccorneliu

Hi @FabianBartels ,

Thanks for sharing; this is an interesting approach. Have you encountered the issue described by @Halle above, where at some point Cilicon tries to have 3 VMs started, which is not allowed?

Cilicon instance 1 -> running 1 VM and waiting for workflows to be picked up
Cilicon instance 2 -> running 1 VM, but when restarting/recreating it, 2 VMs will attempt to run at the same time. I suspect one is the VM being closed and the other is the new one. During this process, Cilicon instance 2 would fail.

If yes, what was the fix for that issue?

@FabianBartels
Contributor

FabianBartels commented Aug 22, 2024

Hi @ccorneliu,

I checked the fork of @Halle, and the only thing that was missing compared to the current main branch of the official Cilicon was a sleep.

So far what works for me is:

  1. run "open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon.yml"
  2. (Optional) wait until it's successfully connected to GitHub
  3. run "open /Applications/Cilicon.app -n --args -config-path /Users/user1/cilicon2.yml"

I ran multiple tests and all of them worked without any issues.

See sleep in this excerpt:

        logger.log(string: "---------- Starting Up ----------\n")
        try await virtualMachine.start()
        vmState = .running(virtualMachine)

        try await Task.sleep(for: .seconds(5))

Full method:

    @MainActor
    private func setupAndRunVirtualMachine() async throws {
        try await cloneBundle()
        let vmHelper = VMConfigHelper(vmBundle: activeBundle)
        let vmConfig = try vmHelper.computeRunConfiguration(config: config)
        let virtualMachine = VZVirtualMachine(configuration: vmConfig)
        virtualMachine.delegate = self

        logger.log(string: "---------- Starting Up ----------\n")
        try await virtualMachine.start()
        vmState = .running(virtualMachine)
        
        try await Task.sleep(for: .seconds(5))

        self.ip = try await fetchIP(macAddress: clonedBundle.configuration.macAddress.string)
        
        let client = try await SSHClient.connect(
            host: ip,
            authenticationMethod: .passwordBased(username: config.sshCredentials.username, password: config.sshCredentials.password),
            hostKeyValidator: .acceptAnything(),
            reconnect: .always
        )

        if let preRun = config.preRun {
            let streamOutput = try await client.executeCommandStream(preRun, inShell: true)
            for try await blob in streamOutput {
                switch blob {
                case let .stdout(stdout):
                    logger.log(string: String(buffer: stdout))
                case let .stderr(stderr):
                    logger.log(string: String(buffer: stderr))
                }
            }
        }

        if let provisioner {
            do {
                try await provisioner.provision(bundle: activeBundle, sshClient: client)
            } catch {
                logger.log(string: error.localizedDescription + "\n")
            }
        }

        if let postRun = config.postRun {
            let streamOutput = try await client.executeCommandStream(postRun, inShell: true)
            for try await blob in streamOutput {
                switch blob {
                case let .stdout(stdout):
                    logger.log(string: String(buffer: stdout))
                case let .stderr(stderr):
                    logger.log(string: String(buffer: stderr))
                }
            }
        }
        try await client.close()
        logger.log(string: "---------- Shutting Down ----------\n")
        Task { @MainActor in
            try await virtualMachine.stop()
            try await handleStop()
        }
    }




In addition to that (and unrelated to the issue you asked about), I changed the GitHub provisioner:

  • This makes the ephemeral runner unique, which was missing in my opinion.
  • It also solves an auth issue where the runner cannot be configured after the Cilicon app is restarted, because the token is lost at that point.
  • Be aware that each start of Cilicon creates a new ephemeral runner in GitHub. GitHub nukes unused ephemeral runners on a daily basis, so it's not a big issue :) I just wanted to mention that.

    func provision(bundle: VMBundle, sshClient: SSHClient) async throws {
        ...
        let runnerName = self.runnerName + "-" + UUID().uuidString.lowercased()
        ...

@Marcocanc
Member

Hey all, sorry, I missed the activity on this issue.
I've been (on-and-off) working on a new version of Cilicon that supports multiple VMs.

[Screenshot: 2024-08-26 at 10 19 30]

It definitely still needs quite some work. Specifically:

  • The download logic is still not working well. The idea is to have a download manager that will queue OCI downloads. If both VMs want the same image, they should both wait for the same download to finish. If one Runner has the image it needs, it should run while the other is getting its image downloaded.
  • There's a pesky bug that has been haunting the current versions of Cilicon, but is more noticeable when running two VMs at the same time. Sometimes when stopping the VM, try await vm.stop() gets stuck. No Error, no success. I'm leaning towards it being an internal issue in Virtualization.framework, but maybe I'm doing something wrong.
  • We've been testing this version on one of our machines (although only with a single VM), and on rare occasions it crashes with a SwiftUI bug. Cilicon is my only SwiftUI experience. If anyone who's experienced with SwiftUI spots any red flags, please do point them out.
  • Restart Scheduling not fully implemented yet
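
To make the download-queue idea in the first bullet concrete, here is a small bookkeeping sketch. It is a synchronous model under stated assumptions, not the code on the branch: DownloadCoordinator and its methods are hypothetical names, and a real download manager would be async.

```swift
// Synchronous bookkeeping model of the queue idea above.
// All names here are hypothetical.
struct DownloadCoordinator {
    // Image reference -> runners waiting for that image.
    private var waiters: [String: [String]] = [:]

    // Returns true if the caller should start a new download,
    // false if it joins one that is already in flight.
    mutating func request(image: String, runner: String) -> Bool {
        if waiters[image] != nil {
            waiters[image]?.append(runner)
            return false
        }
        waiters[image] = [runner]
        return true
    }

    // Marks the download as done and returns every runner that can now proceed.
    mutating func finish(image: String) -> [String] {
        waiters.removeValue(forKey: image) ?? []
    }
}

var coordinator = DownloadCoordinator()
let image = "oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4"
let firstStarts = coordinator.request(image: image, runner: "runner-1")
let secondStarts = coordinator.request(image: image, runner: "runner-2")
let released = coordinator.finish(image: image)
print(firstStarts, secondStarts, released)
```

A runner whose image is already on disk would simply never call request, so it can start while the other runner waits.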

Happy to accept contributions on the branch if anyone wants to contribute.
Will publish the branch and a bleeding edge build in this thread today.

The new version includes breaking changes for the config file. Here's an example of the new structure:

machines: 
  - id: runner-1
    source: oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4
    provisioner:
      type: script
      config:
        run: echo Hello World
    hardware:
      ramGigabytes: 8
      cpuCores: 4
  - id: runner-2
    source: oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4
    provisioner:
      type: script
      config:
        run: echo Hello World
    hardware:
      ramGigabytes: 8
      cpuCores: 4
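
For illustration, the structure above could map onto Decodable types along these lines. The type names are hypothetical and the real Cilicon 3 types may be shaped differently; JSON stands in for YAML only because Foundation ships no YAML decoder.

```swift
import Foundation

// Hypothetical types mirroring the example config above.
struct Hardware: Decodable {
    let ramGigabytes: Int
    let cpuCores: Int
}

struct Provisioner: Decodable {
    let type: String
    let config: [String: String]
}

struct Machine: Decodable {
    let id: String
    let source: String
    let provisioner: Provisioner
    let hardware: Hardware
}

struct Config: Decodable {
    let machines: [Machine]
}

// Same shape as one entry of the YAML example, written as JSON.
let json = """
{"machines": [{
  "id": "runner-1",
  "source": "oci://ghcr.io/cirruslabs/macos-sonoma-xcode:15.4",
  "provisioner": {"type": "script", "config": {"run": "echo Hello World"}},
  "hardware": {"ramGigabytes": 8, "cpuCores": 4}
}]}
"""

let config = try JSONDecoder().decode(Config.self, from: Data(json.utf8))
print(config.machines.map(\.id))
```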

@Marcocanc
Member

Here's the branch: https://github.com/traderepublic/Cilicon/tree/cilicon-3.0
And the build: Cilicon 3.zip

@ElectricCookie

ElectricCookie commented Sep 3, 2024

Just tried out the v3. I am able to get two runners (GitHub) set up, but it seems they show up as the same runner in GitHub. Running jobs causes both runners to restart once a job finishes, and they never run two things at the same time :D

I'm running them both with source pointing to the same .tart VM, so maybe that's the issue?

Edit: Was able to resolve this by adding a runnerName to the github provisioner config

Looks very promising!

@Marcocanc
Member

@ElectricCookie The configuration validation definitely still needs a bit of work. Unfortunately multi-runner Cilicon is very low priority for us at the moment, as we don't need it internally.
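
One check that could catch the duplicate-runner situation described above is sketched below. It is an assumption about what such validation might look like, not existing Cilicon code; duplicateNames is a hypothetical helper.

```swift
// Hypothetical validation step: report names used by more than one runner.
func duplicateNames(in names: [String]) -> [String] {
    var seen = Set<String>()
    var duplicates = Set<String>()
    for name in names {
        // insert(_:) reports whether the name was new to the set.
        if !seen.insert(name).inserted {
            duplicates.insert(name)
        }
    }
    return duplicates.sorted()
}

let clashes = duplicateNames(in: ["runner", "runner"])
if !clashes.isEmpty {
    print("Duplicate runner names: \(clashes.joined(separator: ", "))")
}
```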

@ElectricCookie

@Marcocanc would you be open to a PR implementing Apple's Pkl (https://pkl-lang.org/) as the config file format? Since v3 will have a breaking change anyway, this might be a good moment to switch to a typed and validated config file 😄

@Marcocanc
Member

I'd be open to that. We could also try to make it yml compatible (fall back to yml if no pkl is found). The generated code conforms to Decodable anyway. When deploying Cilicon + pkl config, would we have to ship the schema along with it, or is that only required if you want the editor to warn you about a bad config?

@ElectricCookie

I think you would amend the template file, which you can import via a URL (https://pkl-lang.org/main/current/language-reference/index.html#module-uris). I'm unsure whether you could simply point to the raw.github URL of the file in the repo, which would have the benefit of being versioned. I'll have a look at this once I get around to tinkering :)

@tonyarnold

The preview of Cilicon 3.0 is great! Took me ~10 minutes to setup and get running, and it's working really well. Thanks for putting the effort into making multiple VMs work 🌻
