Skip to content

Integrate Squid #369

@ggevay

Description

@ggevay

Edit: We have resolved some more issues!

It would be really good to integrate Squid into our compiler pipeline, because its quasiquote-based API would probably make it easier to write tree transformations. I tried to do it [1], but unfortunately there were a number of issues. You can see the details of the issues in comments in the code [1], but here is a list:

Resolved issues:

  • Squid expects the tpt of ValDefs to be filled, but we don't fill them [3] (resolved by an extra transform pass (addValAndVarDefTpts) before passing the tree to Squid)
  • Implicit modifiers were falling off from ValDefs, but Lionel fixed this [6].
  • Generic inner classes were not working, but Lionel fixed this [7].
  • Squid expects implicits to be already inferred [4], but we can't really infer them before passing code to Squid. The solution suggested in [4] works (see addImplicitPlaceholders).
  • Path-dependent types were not supported by Squid [5]. This is an issue, because, e.g., TypeTag is path-dependent on the universe. Lionel fixed this in Squid.
  • The tests in emma-examples-* and emma-lib-* (e.g. TransitiveClosureFlinkSpec) rely on accesses to local variables, which is not supported by Squid by default. However, the solution suggested by Lionel in the below comment solved this issue.
  • DefDefs are not supported in Squid, which is a problem, because we represent control flow with DefDefs (DSCF). I resolved this by doing a dscfInv/dscf before/after Squid.
  • this references to enclosing classes or objects in LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, StatsFlinkSpec (and many other unit tests). Resolved by the workaround in Integrate Squid #369 (comment) (2bd6cbc)

Not yet resolved issues:

Probably Emma bugs:

  • TransitiveClosureFlinkSpec fails with squid.quasi.EmbeddingException$Unsupported: Unsupported feature: Reference to local method 'doWhile$m2'. But dscfInv should have removed this, so this is probably a bug in dscfInv.

Probably Squid missing features:

  • Existential types in NaiveBayesFlinkSpec, KMeansFlinkSpec, TokenizeFlinkSpec. The most prevalent is Array[_], which appears due to implicitly resolving CanBuildFrom in code such as line.split(",").map(_.toDouble).
    [2] says that "using type aliases may circumvent this limitation".
    We also have x$1.label.type forSome { val x$1: org.emmalanguage.lib.ml.LDPoint[Long,Double] }, but I'm not really sure how this type appears.
  • [minor] squid.quasi.EmbeddingException$Unsupported: Unsupported feature: org.emmalanguage.api.FlinkDataSet.memoizeTypeInfo[String](implicitly[scala.reflect.runtime.universe.TypeTag[String]], org.apache.flink.api.scala.`package`.createTypeInformation[String])
    This issue is not really a big problem, as these calls come in only near the end of our pipelines, so we can just avoid using Squid at that stage.

Probably Squid bugs (or maybe we give Squid some invalid trees):

  • EvalFlinkSpec, LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, BaseCodegenIntegrationSpec.'Updated tmp sink' fail with an undefined variable issue. In EvalFlinkSpec, in the tree that we give to Squid we have a var named anf$m25, and a reference to it somewhere. Squid seems to "rename" this var to anf$m25$m1 but doesn't change the reference. In the other test, the variable get renamed from p$p$r2 to p$p$r2$r1.
  • [minor] NGramsFlinkSpec fails with
java.lang.Error: bad constant value: String of class class scala.reflect.internal.Types$ClassNoArgsTypeRef
	at scala.reflect.internal.Constants$Constant.<init>(Constants.scala:52)
	at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:34)
	at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:272)
	at squid.ir.ScalaTyping$class.constType(ScalaTyping.scala:107)
	at squid.ir.SimpleAST.constType(SimpleAST.scala:20)
<omitted many more lines>
  • [minor] BaseCodegenIntegrationSpec.CSV fails with
(List(),Stream(java.lang.String*)) (of class scala.Tuple2)
scala.MatchError: (List(),Stream(java.lang.String*)) (of class scala.Tuple2)
	at squid.quasi.ModularEmbedding.squid$quasi$ModularEmbedding$$processArgs$1(ModularEmbedding.scala:337)
<omitted many more lines>

TODO:

  • We should consider Lionel's suggestion to implement the Base interface:
    "Note that I'm still of the opinion (as expressed earlier) that in your case, it would be more appropriate to not convert Emma's IR into a Squid IR, but instead to provide a new implementation of the Base interface so Squid can use Emma's own IR behind the scenes. Have you considered that? It represent a tighter coupling with Squid, but I could help with making it work and evolve as needed."
  • An even more extensive test of the integration would be to call Squid in compileSpecPipelines. (Or just comment in the testSquid line in Compiler.scala)

Other notes:

  • There is a list of Scala features that Squid supports [2].

[1] https://github.com/ggevay/emma-1/tree/squid
[2] https://github.com/epfldata/squid/blob/master/doc/reference/Quasiquotes.md#supported-scala-features
[3] #234
[4] epfldata/squid#55 (comment)
[5] epfldata/squid#57 (comment)
[6] epfldata/squid#55
[7] epfldata/squid#56

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions