Moving (parts of) the Cling REPL in Clang

Motivation
===

Over the last decade we have developed an interactive, interpretative 
C++ environment (aka REPL) as part of the high-energy physics (HEP) data 
analysis project -- ROOT [1-2]. We invested a significant effort to 
replace the CINT C++ interpreter with a newly implemented REPL based on 
llvm -- cling [3]. The cling infrastructure is a core component of the 
data analysis framework of ROOT and has been running in production for 
approximately 5 years.

Cling is also a standalone tool, which has a growing community outside 
of our field. Cling's user community includes users in finance, biology 
and in a few companies with proprietary software. For example, there is 
a xeus-cling jupyter kernel [4]. One of the major challenges we face in 
fostering that community is our cling-related patches in the llvm and 
clang forks. The benefits of using the LLVM community standards for code 
reviews, release cycles and integration have been mentioned a number of 
times by our "external" users.

Last year we were awarded an NSF grant to improve cling's sustainability 
and make it a standalone tool. We thank the LLVM Foundation Board for 
supporting us with a non-binding letter of collaboration which was 
essential for getting this grant.


Background
===

Cling is a C++ interpreter built on top of clang and llvm. In a 
nutshell, it uses clang's incremental compilation facilities to process 
code chunk-by-chunk by assuming an ever-growing translation unit [5]. 
The code is then lowered into llvm IR and run by the llvm jit. Cling has 
implemented some language "extensions" such as executing statements at 
the global scope and error recovery. Cling is at the core of HEP -- it 
is heavily used during data analysis of exabytes of particle physics 
data coming from the Large Hadron Collider (LHC) and other particle 
physics experiments.
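
To make the "ever-growing translation unit" idea concrete, here is a minimal toy model in plain C++. It is only a sketch: the class `GrowingTranslationUnit`, its methods and the `__toy_wrapper_` prefix are hypothetical names invented for illustration and are not cling or clang APIs. It assumes that a statement entered at the global scope can be supported by wrapping it in a uniquely named function, which is one way such an extension can be modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not cling or clang APIs.
class GrowingTranslationUnit {
public:
  // Append a declaration chunk as-is; returns the index of the new chunk.
  std::size_t addDeclaration(const std::string &code) {
    chunks_.push_back(code);
    return chunks_.size() - 1;
  }

  // C++ does not allow statements at global scope, so wrap the statement
  // in a uniquely named function to turn the chunk into a declaration.
  // Returns the wrapper's name, which a JIT could later look up and call.
  std::string addStatement(const std::string &stmt) {
    assert(!stmt.empty() && "nothing to wrap");
    std::string name = "__toy_wrapper_" + std::to_string(nextWrapperId_++);
    chunks_.push_back("void " + name + "() { " + stmt + " }");
    return name;
  }

  // The translation unit seen so far: all chunks in input order.
  std::string source() const {
    std::string out;
    for (const std::string &chunk : chunks_) {
      out += chunk;
      out += '\n';
    }
    return out;
  }

  std::size_t size() const { return chunks_.size(); }

private:
  std::vector<std::string> chunks_;
  unsigned nextWrapperId_ = 0;
};
```

Feeding it `int x = 5;` as a declaration and then `x++;` as a statement yields a two-chunk translation unit in which the second chunk is an ordinary function that a later stage could compile and invoke.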


Plans
===

The project foresees three main directions -- move parts of cling 
upstream along with the clang and llvm features that enable them; extend 
and generalize the language interoperability layer around cling; and 
extend and generalize the OpenCL/CUDA support in cling. We are at the 
early stages of the project and this email intends to be an RFC for the 
first part -- upstreaming parts of cling. Please do share your thoughts 
on the rest, too.


Moving Parts of Cling Upstream
---

Over the years we have slowly moved some patches upstream. However, we 
still have around 100 patches in the clang fork. Most of them are in the 
context of extending the incremental compilation support for clang. 
Incremental compilation poses some challenges in the clang 
infrastructure. For example, we need to tune CodeGen to work with 
multiple llvm::Module instances, and to finalize at each 
end-of-translation-unit (we have multiple of them). Other changes 
include small adjustments in the FileManager's caching mechanism, and 
bug fixes in the SourceManager (code which can be reached mostly from 
within our setup). One conclusion we can draw from our research is that 
the clang infrastructure fits amazingly well to something which was not 
its main use case. The grand total of our diffs against clang-9 is: `62 
files changed, 1294 insertions(+), 231 deletions(-)`. Cling is currently 
being upgraded from llvm-5 to llvm-9.
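
The CodeGen requirement above (one module per increment, finalized at each end-of-translation-unit) can be pictured with the following toy model. This is a sketch under stated assumptions, not the actual patch set: `ToyModule` is a hypothetical stand-in for llvm::Module that only records symbol names, and `ToyIncrementalCodeGen`, `emit` and `finalizeIncrement` are names invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Module; it only records symbol names.
struct ToyModule {
  std::string name;
  std::vector<std::string> symbols;
};

// Toy model only: not clang's CodeGen and not a cling API.
class ToyIncrementalCodeGen {
public:
  ToyIncrementalCodeGen() { startModule(); }

  // Record a definition emitted while processing the current chunk.
  void emit(const std::string &symbol) {
    assert(!symbol.empty() && "symbol needs a name");
    current_->symbols.push_back(symbol);
  }

  // Called at each end-of-translation-unit: hand the finished module to
  // the caller (for example a JIT) and open a fresh one for the next chunk.
  std::unique_ptr<ToyModule> finalizeIncrement() {
    std::unique_ptr<ToyModule> done = std::move(current_);
    startModule();
    return done;
  }

private:
  void startModule() {
    current_ = std::make_unique<ToyModule>();
    current_->name = "incr_module_" + std::to_string(counter_++);
  }

  std::unique_ptr<ToyModule> current_;
  unsigned counter_ = 0;
};
```

The design point the sketch tries to capture is ownership: once an increment is finalized, its module leaves the generator, so later chunks can never modify code that may already have been handed to the JIT.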

A major weakness of cling's infrastructure is that it does not work with 
the clang Action infrastructure due to the lack of an 
IncrementalAction. A possible way forward would be to implement a 
clang::IncrementalAction as a starting point. This way we should be able 
to reduce the amount of setup necessary to use the incremental 
infrastructure in clang. However, this will be a bit of a testing 
challenge -- cling lives downstream and some of the new code may be 
impossible to pick up straight away and use. Building a mainline example 
tool such as clang-repl, which gives us a way to test the incremental 
case, or repurposing the already existing clang-interpreter may address 
the issue. The major risk of the task is ending up with code in the 
clang mainline which is untested by its HEP production environment.

There are several other types of patches to the ROOT fork of Clang, 
including ones in the context of performance, C++ modules support 
(D41416), and storage (does not have a patch yet but has an open 
projects entry and somebody working on it). These patches can be 
considered in parallel, independently of the rest.

Extend and Generalize the Language Interoperability Layer Around Cling
---

HEP has extensive experience with on-demand python interoperability 
using cppyy [6], which is built around the type information provided by 
cling. Unlike tools with custom parsers such as swig and sip and tools 
built on top of C-APIs such as boost.python and pybind11, cling can 
provide information about memory management patterns (e.g. refcounting) 
and instantiate templates on the fly. We feel that functionality may not 
be of general interest to the llvm community but we will prepare another 
RFC and send it here later on to gather feedback.


Extend and Generalize the OpenCL/CUDA Support in Cling
---

Cling can incrementally compile CUDA code [7-8] allowing easier set up 
and enabling some interesting use cases. There are a number of planned 
improvements including talking to HIP [9] and SYCL to support more 
hardware architectures.



The primary focus of our work is upstreaming the functionality required 
to build an incremental compiler and reworking cling to build against 
vanilla clang and llvm. The last two points are to give the scope of the 
work which we will be doing over the next 2-3 years. We will send RFCs 
for both of them here to trigger technical discussion if there is 
interest in pursuing this direction.


Collaboration
===

Open source development nowadays relies on reviewers. LLVM is no 
different and we will probably disturb a good number of people in the 
community ;) We would like to invite anybody interested in joining our 
incremental C++ activities to our open calls, held every second week. 
Announcements will be made via the google group compiler-research-announce 
(https://groups.google.com/g/compiler-research-announce).



Many thanks!


David & Vassil
