-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add language support for Simula #7025
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 your samples aren't real world examples… they're the kind of thing you write when learning the language. Your search query also wouldn't find them which brings me to my next point… there are a lot of .sim
files on GitHub and I don't think they're all Simula. Merging this PR as-is will result in all .sim
files being classified as Simula.
To prevent this, please identify at least one other language that uses this extension and add support for it as part of this PR. A heuristic should also be implemented to differentiate the two.
Sorry about my long answer - I tried to shorten it as much as I could! Code examples
Yup, you're absolutely right! I followed the example of COBOL, Fortran and (to an extent Scala) which also have toy examples. The issue is licensing - Simula companies have ceased to exist, code authors may already have passed away. And some of the code is copylefted. I wrote the sample code for the purpose of this PR, to make sure to have something MIT-licensed. Existing .sim files on GitHub
I had a look, it seems like the stuff people put in .sim files is very diverse:
I couldn't really find any sort of language here that is more prevalent than the others, is there any in particular? My suggestion is to detect Simula with a heuristic - otherwise just leave the files as plaintext like today? Heuristic
There are a couple of heuristics we could choose, what would be the better option? 1) Simple
|
We accept copyleft licensed samples (we have plenty GPL samples for example) as we don't ship them as-is... we ship them as tokens which are used to train the classifier.
The only preference is for the other language to also meet our usage requirements, so go for the other language with the most files. We can't simply implement a heuristic as it needs two or more languages to apply to it hence the need for identifying at least another language.
Lets start simple. It's good enough. It can be refined as people add support for the other languages. |
Ok, I added the simple begin/end heuristic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comment. We also need a test for your heuristic.
@@ -8245,6 +8259,7 @@ YAML: | |||
- ".mir" | |||
- ".reek" | |||
- ".rviz" | |||
- ".sim" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a sample for this language if you're going to add this extension here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a sample for this language if you're going to add this extension here.
I'm not 100 % what you meant - was this about the Simula language, or about the YAML fallback?
I made 2 updates to the pull request:
- Pushed 5 more Simula language code samples
- Added an example of an existing
.sim
file on GitHub which would make sense to code-highlight as YAML (it's JSON, but all JSON is valid YAML).
It would be nice if linguist supported a "noop fallback" configuration, where - if we can't positively identify a .sim
file as valid Simula code - it would just be treated as unknown text format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the YAML entry. You added it but didn't add a sample.
I see you've now added samples but you've now also added samples that are too big. Please remove any that are suppressed in the diff and any that aren't real world uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I removed the samples!
… YAML highlighter (JSON is also valid YAML)
Oh yes and we need a link to the source of each sample and the licence for each in the initial PR template. |
- language: Simula | ||
pattern: '(?i)\bbegin\b.*?\bend\b' | ||
- language: YAML | ||
pattern: '^[\{\[]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need a test for this heuristic
Simula is a language of historic and academic interest. It sprung out of ALGOL, and went on to inspire next-generation languages like Java, C++ and C#.
It was invented by Ole-Johan Dahl (1931-2002) and Kristen Nygaard (1926 – 2002) at the Norwegian Computing Center, for which they both received the Turing Award - "for ideas fundamental to the emergence of object oriented programming". It's considered one of the most impactful Norwegian contributions to the field of Computer Science.
Personally, I think these historical languages are incredibly exciting, and learning about their design decisions teaches us a lot about designs of newer programming languages! Modern Simula tools, like syntax highlighters and compilers for recent hardware, enable new generations of software developers to take a first-hand look at how programming languages worked in the past.
Checklist:
There is also a grammar based on tree-sitter: https://github.com/eirslett/tree-sitter-simula, but I couldn't make it work with the Linguist script
script/add-grammar
- the grammar was not detected.#B22F2F