
presto can't insert data #1122

Open
woodliu opened this issue Mar 6, 2020 · 13 comments

@woodliu

woodliu commented Mar 6, 2020

After installing metering, I checked the logs of the reporting-operator, and it shows this error: io.prestosql.spi.PrestoException: Failed checking path: file:/user/hive/warehouse/metering.db/datasource_metering_persistentvolumeclaim_request_bytes
Then I tried a test: in Hive I created a table and inserted data into it successfully. I can see this table from Presto, but when I insert data into it from Presto, it returns the same error:

presto:metering> INSERT INTO kwang_test VALUES (1, 'San Francisco');
Query 20200306_063214_01963_k6i59 failed: Failed checking path: file:/user/hive/warehouse/metering.db/kwang_test
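
For reference, the Hive-side test that succeeded was roughly the following (the column names here are just illustrative; any simple schema reproduces it):

hive> CREATE TABLE kwang_test (id INT, city STRING);
hive> INSERT INTO TABLE kwang_test VALUES (1, 'San Francisco');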

This is my configuration:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  tls:
    enabled: false
  presto:
    spec:
      coordinator:
        resources:
          limits:
            cpu: 6
            memory: 6Gi
          requests:
            cpu: 4
            memory: 4Gi
  storage:
    type: "hive"
    hive:
      type: "sharedPVC"
      sharedPVC:
        claimName: "reporting-operator-pvc"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reporting-operator-pvc
  namespace: metering
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 95Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: operator-metering-pv
  labels:
    name: operator-metering
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  hostPath:
    path: "/mnt/metering/hive-metastore"
  persistentVolumeReclaimPolicy: Delete
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1

I use a local filesystem (hostPath) PV to back the PVC that serves as the Hive storage.
Can anyone help? Thanks a lot!

@timflannagan
Contributor

I haven't played around with using a ReadWriteMany PVC for metastore storage in a while, but you may want to try updating your MeteringConfig custom resource and explicitly specifying the mount path for that volume (as the default is /user/hive/warehouse/metering.db):

hive:
  storage:
    type: "sharedPVC"
    sharedPVC:
      claimName: "reporting-operator-pvc"
      mountPath: "/mnt/metering/hive-metastore"

I can try re-creating your setup later and report back.

@woodliu
Author

woodliu commented Mar 6, 2020

I haven't played around with using a ReadWriteMany PVC for metastore storage in a while, but you may want to try updating your MeteringConfig custom resource and explicitly specifying the mount path for that volume (as the default is /user/hive/warehouse/metering.db):

hive:
  storage:
    type: "sharedPVC"
    sharedPVC:
      claimName: "reporting-operator-pvc"
      mountPath: "/mnt/metering/hive-metastore"

I can try re-creating your setup later and report back.

Thank you for replying. I have tried that, but it still doesn't work...
I installed a standalone Presto of that version, and it works when I use Presto to insert data into an existing table.

@timflannagan
Contributor

Hmm okay, you may need to uninstall and then reinstall again as changes made post-installation to any storage configuration typically don't propagate.
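
For completeness, the uninstall/reinstall cycle would look roughly like the sketch below (assuming you installed via the repo's hack scripts; the uninstall script name may differ in your checkout, and the PV/PVC names are the ones from your config above):

./hack/openshift-uninstall.sh
# remove the leftover storage so the reinstall starts clean
oc delete pvc reporting-operator-pvc -n metering
oc delete pv operator-metering-pv
./hack/openshift-install.sh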

@woodliu
Author

woodliu commented Mar 6, 2020

Hmm okay, you may need to uninstall and then reinstall again as changes made post-installation to any storage configuration typically don't propagate.

Yeah, when I uninstall, I always clean all the data, including all PVs and PVCs, and the data in the PVs.

@woodliu
Author

woodliu commented Mar 6, 2020

I think the issue may be related to Presto itself...
I don't know the relationship between operator-framework/presto and prestosql/presto.

@timflannagan
Contributor

timflannagan commented Mar 6, 2020

The metering-operator pulls container images that get built from the operator-framework/presto repository, which is just a fork of the upstream prestosql/presto repository. I believe the fork is based on prestosql version 322.

@woodliu
Author

woodliu commented Mar 8, 2020

@timflannagan1 Could you tell me how to build a new Presto image with a newer version of prestosql/presto? Thank you.

@timflannagan
Contributor

timflannagan commented Mar 8, 2020

I recently built and pushed a more up-to-date Presto image using the 328 version (330 is the most recent released version) under the quay.io/tflannag/presto:release-328 tag, and you would need to override the metering-operator image to point to quay.io/tflannag/origin-metering-ansible-operator:release-328 too:

export METERING_OPERATOR_IMAGE_REPO=quay.io/tflannag/origin-metering-ansible-operator
export METERING_OPERATOR_IMAGE_TAG=release-328

If you're interested in using that image, you would need to update the MeteringConfig custom resource file to specify that repository and tag, e.g.:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
...
spec:
...
  presto:
    spec:
      image:
        repository: quay.io/tflannag/presto
        tag: release-328

I haven't really played around with that Presto image yet and I had to remove one of the Presto plugin/connectors (presto-prometheus) to make it build properly, but I assume it should work out-of-the-box.

If that doesn't suffice, then in order to emulate this process, I had to pull down the 328 release tag (or whatever release tag is applicable) from the upstream prestosql/presto repository. I then had to update Dockerfile.okd, adding any of the new directories in the repository that were highlighted as errors when attempting to build this new image via docker build -f Dockerfile.okd -t <image>:<tag> ..
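
Roughly, that process is the following (the upstream URL and the <image>:<tag> placeholder are just illustrative):

# from a clone of the operator-framework/presto fork
git remote add upstream https://github.com/prestosql/presto.git
git fetch upstream --tags
git merge 328                                      # bring in the upstream 328 release tag
# fix up Dockerfile.okd as described, then rebuild
docker build -f Dockerfile.okd -t <image>:<tag> .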

Here is the full diff of that Dockerfile (with some lazy workarounds like hardcoding the Presto version instead of using the $PRESTO_VERSION environment variable):

tflannag@localhost presto [cherry-pick-hive-metastore-s3-fix]  git diff Dockerfile.okd
diff --git a/Dockerfile.okd b/Dockerfile.okd
index a35f9a4554..d63571804e 100644
--- a/Dockerfile.okd
+++ b/Dockerfile.okd
@@ -44,7 +44,6 @@ COPY presto-record-decoder /build/presto-record-decoder
 COPY presto-tpcds /build/presto-tpcds
 COPY presto-plugin-toolkit /build/presto-plugin-toolkit
 COPY presto-spi /build/presto-spi
-COPY presto-prometheus /build/presto-prometheus
 COPY presto-thrift-testing-server /build/presto-thrift-testing-server
 COPY presto-cli /build/presto-cli
 COPY presto-hive /build/presto-hive
@@ -75,7 +74,10 @@ COPY presto-kudu /build/presto-kudu
 COPY presto-main /build/presto-main
 COPY presto-raptor-legacy /build/presto-raptor-legacy
 COPY presto-password-authenticators /build/presto-password-authenticators
+COPY presto-memsql /build/presto-memsql
+COPY presto-testing /build/presto-testing
 COPY src /build/src
+COPY src/modernizer/violations.xml /build/src/modernized/violations.xml
 COPY pom.xml /build/pom.xml
 
 # build presto
@@ -103,7 +105,7 @@ RUN chmod +x /usr/bin/tini
 
 RUN mkdir -p /opt/presto
 
-ENV PRESTO_VERSION 322
+ENV PRESTO_VERSION 328
 ENV PRESTO_HOME /opt/presto/presto-server
 ENV PRESTO_CLI /opt/presto/presto-cli
 ENV PROMETHEUS_JMX_EXPORTER /opt/jmx_exporter/jmx_exporter.jar
@@ -113,8 +115,8 @@ ENV JAVA_HOME=/etc/alternatives/jre
 
 RUN mkdir -p $PRESTO_HOME
 
-COPY --from=build /build/presto-server/target/presto-server-$PRESTO_VERSION $PRESTO_HOME
-COPY --from=build /build/presto-cli/target/presto-cli-$PRESTO_VERSION-executable.jar $PRESTO_CLI
+COPY --from=build /build/presto-server/target/presto-server-328 $PRESTO_HOME
+COPY --from=build /build/presto-cli/target/presto-cli-328-executable.jar $PRESTO_CLI
 COPY --from=build /build/jmx_prometheus_javaagent.jar $PROMETHEUS_JMX_EXPORTER
tflannag@localhost presto [cherry-pick-hive-metastore-s3-fix]  

Here's a link to a local branch that highlights the changes I made: timflannagan/presto@5b6e1c2

@woodliu
Author

woodliu commented Mar 9, 2020

@timflannagan1 Hi, I have used the new image you pushed, but because there is no prometheus connector, it shows the error below:
2020-03-09T02:08:00.879Z ERROR main io.prestosql.server.PrestoServer No factory for connector 'prometheus'. Available factories: [memory, kudu, blackhole, kinesis, redis, accumulo, gsheets, raptor-legacy, jmx, postgresql, elasticsearch, redshift, sqlserver, localfile, tpch, iceberg, mysql, mongodb, example-http, tpcds, phoenix, system, cassandra, kafka, atop, hive-hadoop2, presto-thrift]
java.lang.IllegalArgumentException: No factory for connector 'prometheus'. Available factories: [memory, kudu, blackhole, kinesis, redis, accumulo, gsheets, raptor-legacy, jmx, postgresql, elasticsearch, redshift, sqlserver, localfile, tpch, iceberg, mysql, mongodb, example-http, tpcds, phoenix, system, cassandra, kafka, atop, hive-hadoop2, presto-thrift]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
    at io.prestosql.connector.ConnectorManager.createCatalog(ConnectorManager.java:180)
    at io.prestosql.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:88)
    at io.prestosql.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
    at io.prestosql.server.PrestoServer.run(PrestoServer.java:129)
    at io.prestosql.$gen.Presto_328_106_g2c7b27a_dirty____20200309_020751_1.run(Unknown Source)
    at io.prestosql.server.PrestoServer.main(PrestoServer.java:72)

And I want to ask two questions about building Presto:

  • If I just change PRESTO_VERSION from 322 to 328, will it pull the 328 version of the code from prestosql/presto?

  • How can I clean up and fix the errors when the docker build -f Dockerfile.okd . command fails?

Thank you!

@timflannagan
Contributor

timflannagan commented Mar 9, 2020

You would also need to override the metering-operator image to use a custom one I also pushed that removed the presto-prometheus connector catalog from being loaded:

export METERING_OPERATOR_IMAGE_REPO=quay.io/tflannag/origin-metering-ansible-operator
export METERING_OPERATOR_IMAGE_TAG=release-328
./hack/openshift-install.sh

Like I said, I haven't tested that version yet and I don't know how many changes there are between 322 and 328 in terms of the Hive connector catalog configuration. At a glance, it looks like there are some changes to the TLS-related properties which may break some things. If that's the case, you may need to disable TLS entirely in the MeteringConfig custom resource:

...
spec:
  tls:
    enabled: false
  ...

And I want to ask two questions about building Presto:

* If I just change `PRESTO_VERSION` from 322 to 328, will it pull the 328 version of the code from `prestosql/presto`?

Nope, that environment variable only controls copying some versioned files from a previous container layer. You would need to pull down the upstream (prestosql/presto) release tag and attempt to merge it (e.g. git pull <whatever remote points to the prestosql/presto repo> 328).

  • How can I clean up and fix the errors when the docker build -f Dockerfile.okd . command fails?

It's difficult to say; the main thing should be copying all of the new directories from the upstream release tag (COPY <new-dir> /build/<new-dir>), which get highlighted when attempting to build that Dockerfile.

After that, it can be a bit tricky as the maven build can take quite a while and I've found that those error messages aren't as obvious as the docker-related build ones.
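
For the cleanup part specifically, the usual Docker housekeeping commands are enough (these are generic Docker commands, nothing specific to this repository):

docker image prune -f      # remove dangling intermediate images left behind by failed builds
docker builder prune -f    # clear the build cache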

@woodliu
Author

woodliu commented Mar 9, 2020

Sorry, I missed the operator image. I am trying it now, thanks a lot!

@woodliu
Author

woodliu commented Mar 9, 2020

@timflannagan1 Hi, if I use the 328 version of the operator, it returns this error:

Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/tmp/ansible-operator/runner/metering.openshift.io/v1/MeteringConfig/metering/operator-metering/artifacts/6129484611666145821//stdout
Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 63, in <module>
    from ansible.utils.display import Display
  File "/usr/lib/python2.7/site-packages/ansible/utils/display.py", line 60, in <module>
    class FilterUserInjector(logging.Filter):
  File "/usr/lib/python2.7/site-packages/ansible/utils/display.py", line 65, in FilterUserInjector
    username = getpass.getuser()
  File "/usr/lib64/python2.7/getpass.py", line 158, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 1000300000'

So I think maybe I should use another version of operator-metering earlier than 4.5, or build a new image myself. I will try those approaches later.

@woodliu
Author

woodliu commented Mar 11, 2020

I have tried the new version of Presto (330), using Presto to connect to the Hive server, but I got the same error...
Using Presto, I can see the catalog, schemas and tables, but I can't read data from the tables or insert data into them.
Using Hive, I can both read and write data. Even a command like hadoop fs -ls file:/mnt/metering/hive-metastore/metering.db/kwang_test gives the right results.
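
To make the symptom concrete, these are the checks described above (the table name is from my earlier test; the failing Presto queries return the same "Failed checking path" error):

presto:metering> SHOW TABLES;               -- works, kwang_test is listed
presto:metering> SELECT * FROM kwang_test;  -- fails: Failed checking path: ...
hive> SELECT * FROM kwang_test;             -- works
hadoop fs -ls file:/mnt/metering/hive-metastore/metering.db/kwang_test   # works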

We don't have object storage like Amazon S3 or Azure, only an OBS from another manufacturer that implements the S3 standard. Presto returned the error when I used that kind of OBS, so I switched to a filesystem volume to see what happens.
I see there are some people with the same issue in Presto, but no answer that solves the problem. It is taking too much time to solve; maybe I should just wait for an answer...
