[INFRA] Remove incubator/incubating for graduation

### What changes were proposed in this pull request?

Remove incubator/incubating for graduation including:

- Remove `incubator`/`Incubating`.
- Remove `DISCLAIMER` and corresponding link.
- Update Release scripts and template.

Fix apache#2415.

### Why are the changes needed?

The ASF board has approved a resolution to graduate Celeborn into a full Top Level Project. To transition from the Apache Incubator to a new TLP, there's a few action items we need to do to complete the transition.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes apache#2421 from SteNicholas/infra-graduation.

Authored-by: SteNicholas <[email protected]>
Signed-off-by: mingji <[email protected]>
(cherry picked from commit c9b878a)
Signed-off-by: SteNicholas <[email protected]>
SteNicholas · May 7, 2024 · 641a802
1 parent 15de4e5
commit 641a802
Showing 19 changed files with 71 additions and 92 deletions.
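
The commit message reports no testing. One way to check a rename like this is to scan the working tree for leftover `incubator`/`Incubating` strings. The following is only a sketch under stated assumptions: the function name `find_incubator_refs`, the regular expression, and the decision to skip `.git` and non-UTF-8 files are all hypothetical and not part of this commit.

```python
import re
from pathlib import Path

# Assumed pattern: "incubator" or "incubating" in any letter case.
INCUBATOR_PATTERN = re.compile(r"incubat(?:or|ing)", re.IGNORECASE)


def find_incubator_refs(root):
    """Return (relative path, line number, stripped line) for each matching line under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git metadata directory.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if INCUBATOR_PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Not every hit is a leftover: this commit itself introduces links to `apache/incubator-gluten` in `docs/developers/glutensupport.md`, so results from a scan like this would still need manual review.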
diff --git a/DISCLAIMER b/DISCLAIMER
diff --git a/NOTICE b/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/NOTICE-binary b/NOTICE-binary
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
-# Apache Celeborn (Incubating)
+# Apache Celeborn
 
-[![Celeborn CI](https://github.com/apache/incubator-celeborn/actions/workflows/maven.yml/badge.svg)](https://github.com/apache/incubator-celeborn/actions/workflows/maven.yml)  
-Celeborn is dedicated to improving the efficiency and elasticity of
+[![Celeborn CI](https://github.com/apache/celeborn/actions/workflows/maven.yml/badge.svg)](https://github.com/apache/celeborn/actions/workflows/maven.yml)  
+Celeborn (/ˈkeləbɔ:n/) is dedicated to improving the efficiency and elasticity of
 different map-reduce engines and provides an elastic, high-efficient 
 management service for intermediate data including shuffle data, spilled data, result data, etc. Currently, Celeborn is focusing on shuffle data.
 
@@ -44,12 +44,12 @@ Celeborn worker's slot count decreases when a partition is allocated and increme
 1. Celeborn supports Spark 2.4/3.0/3.1/3.2/3.3/3.4/3.5, Flink 1.14/1.15/1.17/1.18 and Hadoop MapReduce 2/3.
 2. Celeborn tested under Scala 2.11/2.12/2.13 and Java 8/11/17 environment.
 
-Build Celeborn
+Build Celeborn via `make-distribution.sh`:
 ```shell
 ./build/make-distribution.sh -Pspark-2.4/-Pspark-3.0/-Pspark-3.1/-Pspark-3.2/-Pspark-3.3/-Pspark-3.4/-Pflink-1.14/-Pflink-1.15/-Pflink-1.17/-Pflink-1.18/-Pmr
 ```
 
-package apache-celeborn-${project.version}-bin.tgz will be generated.
+Package `apache-celeborn-${project.version}-bin.tgz` will be generated.
 
 > **_NOTE:_** The following table indicates the compatibility of Celeborn Spark and Flink clients with different versions of Spark and Flink for various Java and Scala versions.
 
@@ -67,7 +67,7 @@ package apache-celeborn-${project.version}-bin.tgz will be generated.
 | Flink 1.17 | &#x274C;          | &#10004;          | &#10004;           | &#x274C;           | &#x274C;          | &#x274C;           | &#x274C;           |
 | Flink 1.18 | &#x274C;          | &#10004;          | &#10004;           | &#x274C;           | &#x274C;          | &#x274C;           | &#x274C;           |
 
-To compile the client for Spark 2.4 with Scala 2.12, please use the following command
+To compile the client for Spark 2.4 with Scala 2.12, please use the following command:
 
 - Scala 2.12.8/2.12.9/2.12.10
 ```shell
@@ -107,8 +107,8 @@ Celeborn cluster composes of Master and Worker nodes, the Master supports both s
 
 ### Deploy Celeborn
 #### Deploy on host
-1. Unzip the tarball to `$CELEBORN_HOME`
-2. Modify environment variables in `$CELEBORN_HOME/conf/celeborn-env.sh`
+1. Unzip the tarball to `$CELEBORN_HOME`.
+2. Modify environment variables in `$CELEBORN_HOME/conf/celeborn-env.sh`.
 
 EXAMPLE:
 ```properties
@@ -117,7 +117,7 @@ CELEBORN_MASTER_MEMORY=4g
 CELEBORN_WORKER_MEMORY=2g
 CELEBORN_WORKER_OFFHEAP_MEMORY=4g
 ```
-3. Modify configurations in `$CELEBORN_HOME/conf/celeborn-defaults.conf`
+3. Modify configurations in `$CELEBORN_HOME/conf/celeborn-defaults.conf`.
 
 EXAMPLE: single master cluster
 ```properties
@@ -151,7 +151,7 @@ celeborn.worker.replicate.fastFail.duration 240s
 celeborn.storage.hdfs.kerberos.principal user@REALM
 celeborn.storage.hdfs.kerberos.keytab /path/to/user.keytab
 
-# If your hosts have disk raid or use lvm, set celeborn.worker.monitor.disk.enabled to false
+# If your hosts have disk raid or use lvm, set `celeborn.worker.monitor.disk.enabled` to false
 celeborn.worker.monitor.disk.enabled false
 ```   
 
@@ -198,26 +198,24 @@ celeborn.worker.flusher.hdfs.buffer.size 4m
 celeborn.storage.hdfs.dir hdfs://<namenode>/celeborn
 celeborn.worker.replicate.fastFail.duration 240s
 
-# If your hosts have disk raid or use lvm, set celeborn.worker.monitor.disk.enabled to false
+# If your hosts have disk raid or use lvm, set `celeborn.worker.monitor.disk.enabled` to false
 celeborn.worker.monitor.disk.enabled false
 ```
 
 Flink engine related configurations:
 ```properties
-# if you are using Celeborn for flink, these settings will be needed
+# If you are using Celeborn for flink, these settings will be needed.
 celeborn.worker.directMemoryRatioForReadBuffer 0.4
-celeborn.worker.directMemoryRatioToResume 0.6
-# these setting will affect performance. 
+celeborn.worker.directMemoryRatioToResume 0.5
+# These setting will affect performance. 
 # If there is enough off-heap memory, you can try to increase read buffers.
 # Read buffer max memory usage for a data partition is `taskmanager.memory.segment-size * readBuffersMax`
 celeborn.worker.partition.initial.readBuffersMin 512
 celeborn.worker.partition.initial.readBuffersMax 1024
 celeborn.worker.readBuffer.allocationWait 10ms
-# Currently, shuffle partitionSplit is not supported, so you should disable split in celeborn worker side or set `celeborn.client.shuffle.partitionSplit.threshold` to a high value in flink client side.
-celeborn.worker.shuffle.partitionSplit.enabled false
 ```
 
-4. Copy Celeborn and configurations to all nodes
+4. Copy Celeborn and configurations to all nodes.
 5. Start all services. If you install Celeborn distribution in the same path on every node and your
    cluster can perform SSH login then you can fill `$CELEBORN_HOME/conf/hosts` and
    use `$CELEBORN_HOME/sbin/start-all.sh` to start all
@@ -250,14 +248,14 @@ WorkerRef: null
 Please refer to our [website](https://celeborn.apache.org/docs/latest/deploy_on_k8s/)
 
 ### Deploy Spark client
-Copy $CELEBORN_HOME/spark/*.jar to $SPARK_HOME/jars/
+Copy `$CELEBORN_HOME/spark/*.jar` to `$SPARK_HOME/jars/`.
 
 #### Spark Configuration
-To use Celeborn,the following spark configurations should be added.
+To use Celeborn, the following spark configurations should be added.
 ```properties
 # Shuffle manager class name changed in 0.3.0:
-#    before 0.3.0: org.apache.spark.shuffle.celeborn.RssShuffleManager
-#    since 0.3.0: org.apache.spark.shuffle.celeborn.SparkShuffleManager
+#    before 0.3.0: `org.apache.spark.shuffle.celeborn.RssShuffleManager`
+#    since 0.3.0: `org.apache.spark.shuffle.celeborn.SparkShuffleManager`
 spark.shuffle.manager org.apache.spark.shuffle.celeborn.SparkShuffleManager
 # must use kryo serializer because java serializer do not support relocation
 spark.serializer org.apache.spark.serializer.KryoSerializer
@@ -272,13 +270,13 @@ spark.shuffle.service.enabled false
 # Sort shuffle writer uses less memory than hash shuffle writer, if your shuffle partition count is large, try to use sort hash writer.  
 spark.celeborn.client.spark.shuffle.writer hash
 
-# We recommend setting spark.celeborn.client.push.replicate.enabled to true to enable server-side data replication
+# We recommend setting `spark.celeborn.client.push.replicate.enabled` to true to enable server-side data replication
 # If you have only one worker, this setting must be false 
 # If your Celeborn is using HDFS, it's recommended to set this setting to false
 spark.celeborn.client.push.replicate.enabled true
 
 # Support for Spark AQE only tested under Spark 3
-# we recommend setting localShuffleReader to false to get better performance of Celeborn
+# we recommend setting localShuffleReader to false for getting better performance of Celeborn
 spark.sql.adaptive.localShuffleReader.enabled false
 
 # If Celeborn is using HDFS
@@ -296,7 +294,7 @@ spark.dynamicAllocation.shuffleTracking.enabled false
 ```
 
 ### Deploy Flink client
-Copy $CELEBORN_HOME/flink/*.jar to $FLINK_HOME/lib/
+Copy `$CELEBORN_HOME/flink/*.jar` to `$FLINK_HOME/lib/`.
 
 #### Flink Configuration
 To use Celeborn, the following flink configurations should be added.
@@ -322,9 +320,9 @@ taskmanager.memory.task.off-heap.size: 512m
 ```
 **Note**: The config option `execution.batch-shuffle-mode` should configure as `ALL_EXCHANGES_BLOCKING`.
 
-### Deploy mapreduce client 
-Add $CELEBORN_HOME/mr/*.jar to to `mapreduce.application.classpath` and `yarn.application.classpath`.
-And setting the following settings in YARN and MapReduce config.
+### Deploy MapReduce client 
+Copy `$CELEBORN_HOME/mr/*.jar` into `mapreduce.application.classpath` and `yarn.application.classpath`.
+Meanwhile, configure the following settings in YARN and MapReduce config.
 ```bash
 -Dyarn.app.mapreduce.am.job.recovery.enable=false
 -Dmapreduce.job.reduce.slowstart.completedmaps=1
@@ -334,7 +332,6 @@ And setting the following settings in YARN and MapReduce config.
 -Dmapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.CelebornShuffleConsumer
 ```
 
-
 ### Best Practice
 If you want to set up a production-ready Celeborn cluster, your cluster should have at least 3 masters and at least 4 workers.
 Masters and works can be deployed on the same node but should not deploy multiple masters or workers on the same node.
@@ -371,7 +368,7 @@ Contact us through the following mailing list.
 
 ### Report Issues or Submit Pull Request
 
-If you meet any questions, feel free to file a 🔗[Jira Ticket](https://issues.apache.org/jira/projects/CELEBORN/issues) or connect us and fix it by submitting a 🔗[Pull Request](https://github.com/apache/incubator-celeborn/pulls).
+If you meet any questions, feel free to file a 🔗[Jira Ticket](https://issues.apache.org/jira/projects/CELEBORN/issues) or connect us and fix it by submitting a 🔗[Pull Request](https://github.com/apache/celeborn/pulls).
 
 | IM       | Contact Info                                                                                                                              | 
 |:---------|:------------------------------------------------------------------------------------------------------------------------------------------|

diff --git a/build/make-distribution.sh b/build/make-distribution.sh
@@ -390,7 +390,6 @@ cp "$PROJECT_DIR/docker/Dockerfile" "$DIST_DIR/docker"
 cp -r "$PROJECT_DIR/charts" "$DIST_DIR"
 
 # Copy license files
-cp "$PROJECT_DIR/DISCLAIMER" "$DIST_DIR/DISCLAIMER"
 if [[ -f $"$PROJECT_DIR/LICENSE-binary" ]]; then
   cp "$PROJECT_DIR/LICENSE-binary" "$DIST_DIR/LICENSE"
   cp -r "$PROJECT_DIR/licenses-binary" "$DIST_DIR/licenses"

diff --git a/build/release/release.sh b/build/release/release.sh
@@ -56,8 +56,8 @@ fi
 
 RELEASE_TAG="v${RELEASE_VERSION}-rc${RELEASE_RC_NO}"
 
-SVN_STAGING_REPO="https://dist.apache.org/repos/dist/dev/incubator/celeborn"
-SVN_RELEASE_REPO="https://dist.apache.org/repos/dist/release/incubator/celeborn"
+SVN_STAGING_REPO="https://dist.apache.org/repos/dist/dev/celeborn"
+SVN_RELEASE_REPO="https://dist.apache.org/repos/dist/release/celeborn"
 
 RELEASE_DIR="${PROJECT_DIR}/tmp"
 SVN_STAGING_DIR="${PROJECT_DIR}/tmp/svn-dev"
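
The two SVN URLs above differ from their predecessors only by the dropped `incubator` path segment. A minimal sketch of that rewrite, assuming a hypothetical helper `drop_incubator_segment` that is not part of `release.sh`:

```python
from urllib.parse import urlsplit, urlunsplit


def drop_incubator_segment(url):
    """Remove path segments that are exactly 'incubator' from a URL."""
    parts = urlsplit(url)
    # Keep every path segment except a standalone "incubator".
    segments = [s for s in parts.path.split("/") if s != "incubator"]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Applied to the old staging URL `https://dist.apache.org/repos/dist/dev/incubator/celeborn`, this returns the new `SVN_STAGING_REPO` value. Names such as `incubator-celeborn` are left untouched because only whole segments are compared.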

diff --git a/client-flink/flink-1.14-shaded/src/main/resources/META-INF/NOTICE b/client-flink/flink-1.14-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-flink/flink-1.15-shaded/src/main/resources/META-INF/NOTICE b/client-flink/flink-1.15-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-flink/flink-1.17-shaded/src/main/resources/META-INF/NOTICE b/client-flink/flink-1.17-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-flink/flink-1.18-shaded/src/main/resources/META-INF/NOTICE b/client-flink/flink-1.18-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-mr/mr-shaded/src/main/resources/META-INF/NOTICE b/client-mr/mr-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-spark/spark-2-shaded/src/main/resources/META-INF/NOTICE b/client-spark/spark-2-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/client-spark/spark-3-shaded/src/main/resources/META-INF/NOTICE b/client-spark/spark-3-shaded/src/main/resources/META-INF/NOTICE
@@ -1,5 +1,5 @@
 
-Apache Celeborn (Incubating)
+Apache Celeborn
 Copyright 2022-2024 The Apache Software Foundation.
 
 This product includes software developed at

diff --git a/dev/merge_pr.py b/dev/merge_pr.py
@@ -64,8 +64,8 @@
 GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
 
 
-GITHUB_BASE = "https://github.com/apache/incubator-celeborn/pull"
-GITHUB_API_BASE = "https://api.github.com/repos/apache/incubator-celeborn"
+GITHUB_BASE = "https://github.com/apache/celeborn/pull"
+GITHUB_API_BASE = "https://api.github.com/repos/apache/celeborn"
 JIRA_BASE = "https://issues.apache.org/jira/browse"
 JIRA_API_BASE = "https://issues.apache.org/jira"
 # Prefix added to temporary branches
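
The hunk above does not show how `merge_pr.py` uses these constants. As a sketch only, the following shows how per-PR URLs could be derived from the new values. The helper `pr_urls` is hypothetical; the `/pulls/{number}` path follows the GitHub REST API layout for a single pull request.

```python
# New values taken from the diff above.
GITHUB_BASE = "https://github.com/apache/celeborn/pull"
GITHUB_API_BASE = "https://api.github.com/repos/apache/celeborn"


def pr_urls(pr_number):
    """Return the web page URL and the REST API URL for one pull request."""
    return {
        "html": f"{GITHUB_BASE}/{pr_number}",
        "api": f"{GITHUB_API_BASE}/pulls/{pr_number}",
    }
```

For example, `pr_urls(2421)["html"]` evaluates to `https://github.com/apache/celeborn/pull/2421`, the pull request that this commit closes.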

diff --git a/docs/README.md b/docs/README.md
@@ -20,11 +20,11 @@ license: |
 ---
 Quick Start
 ===
-This documentation gives a quick start guide for running Apache Spark/Flink with Apache Celeborn™(Incubating).
+This documentation gives a quick start guide for running Spark/Flink/MapReduce with Apache Celeborn™.
 
 ### Download Celeborn
 Download the latest Celeborn binary from the [Downloading Page](https://celeborn.apache.org/download/).
-Decompress the binary and set `$CELEBORN_HOME`
+Decompress the binary and set `$CELEBORN_HOME`.
 ```shell
 tar -C <DST_DIR> -zxvf apache-celeborn-<VERSION>-bin.tgz
 export CELEBORN_HOME=<Decompressed path>
@@ -37,7 +37,7 @@ cd $CELEBORN_HOME/conf
 cp log4j2.xml.template log4j2.xml
 ```
 #### Configure Storage
-Configure the directory to store shuffle data, for example `$CELEBORN_HOME/shuffle`
+Configure the directory to store shuffle data, for example `$CELEBORN_HOME/shuffle`.
 ```shell
 cd $CELEBORN_HOME/conf
 echo "celeborn.worker.storage.dirs=$CELEBORN_HOME/shuffle" > celeborn-defaults.conf
@@ -154,11 +154,15 @@ INFO [async-reply] Controller: CommitFiles for local-1690000152711-0 success wit
 ```
 
 ## Start MapReduce With Celeborn
-### Add Celeborn client jar to MapReduce's classpath
-1.Add $CELEBORN_HOME/mr/*.jar to `mapreduce.application.classpath` and `yarn.application.classpath`.
-2.Restart your yarn cluster.
-### Add Celeborn configurations to MapReduce's conf
-Modify `${HADOOP_CONF_DIR}/yarn-site.xml`
+### Copy Celeborn Client to MapReduce's classpath
+1. Copy `$CELEBORN_HOME/mr/*.jar` into `mapreduce.application.classpath` and `yarn.application.classpath`.
+```shell
+cp $CELEBORN_HOME/mr/<Celeborn Client Jar> <mapreduce.application.classpath>
+cp $CELEBORN_HOME/mr/<Celeborn Client Jar> <yarn.application.classpath>
+```
+2. Restart your yarn cluster.
+### Add Celeborn configuration to MapReduce's conf
+- Modify configurations in `${HADOOP_CONF_DIR}/yarn-site.xml`.
 ```xml
 <configuration>
     <property>
@@ -173,7 +177,7 @@ Modify `${HADOOP_CONF_DIR}/yarn-site.xml`
     </property>
 </configuration>
 ```
-Modify `${HADOOP_CONF_DIR}/mapred-site.xml`
+- Modify configurations in `${HADOOP_CONF_DIR}/mapred-site.xml`.
 ```xml
 <configuration>
     <property>
@@ -195,10 +199,11 @@ Modify `${HADOOP_CONF_DIR}/mapred-site.xml`
     </property>
 </configuration>
 ```
-Then you can run a word count to check whether your configs are correct.
+Then deploy the example word count to the running cluster for verifying whether above configurations are correct.
 ```shell
 cd $HADOOP_HOME
-hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /sometext /someoutput
+
+./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /someinput /someoutput
 ```
 During the MapReduce Job, you should see the following message in Celeborn Master's log:
 ```log

diff --git a/docs/developers/glutensupport.md b/docs/developers/glutensupport.md
@@ -19,9 +19,9 @@ license: |
 # Gluten Support
 ## Velox Backend
 
-[Gluten](https://github.com/oap-project/gluten) with velox backend supports Celeborn as remote shuffle service. Below introduction is used to enable this feature
+[Gluten](https://github.com/apache/incubator-gluten) with velox backend supports Celeborn as remote shuffle service. Below introduction is used to enable this feature.
 
-First refer to this URL(https://github.com/oap-project/gluten/blob/main/docs/get-started/Velox.md) to build Gluten with velox backend.
+First refer to [Get Started With Velox](https://github.com/apache/incubator-gluten/blob/main/docs/get-started/Velox.md) to build Gluten with velox backend.
 
 When compiling the Gluten Java module, it's required to enable `rss` profile, as follows:
 
@@ -31,18 +31,18 @@ mvn clean package -Pbackends-velox -Pspark-3.3 -Prss -DskipTests
 
 Then add the Gluten and Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
 
-- Celeborn: celeborn-client-spark-3-shaded_2.12-0.3.0-incubating.jar
-- Gluten: gluten-velox-bundle-spark3.x_2.12-xx-xx-SNAPSHOT.jar, gluten-thirdparty-lib-xx.jar
+- Celeborn: `celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar`
+- Gluten: `gluten-velox-bundle-spark3.x_2.12-xx-xx-SNAPSHOT.jar`, `gluten-thirdparty-lib-xx.jar`
 
-Currently to use Gluten following configurations are required in `spark-defaults.conf`
+Currently, to use Gluten following configurations are required in `spark-defaults.conf`.
 
 ```
 spark.shuffle.manager org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
 
 # celeborn master
 spark.celeborn.master.endpoints clb-master:9097
 
-# we recommend set spark.celeborn.push.replicate.enabled to true to enable server-side data replication
+# we recommend set `spark.celeborn.push.replicate.enabled` to true to enable server-side data replication
 # If you have only one worker, this setting must be false 
 spark.celeborn.client.push.replicate.enabled true
 
@@ -52,7 +52,7 @@ spark.shuffle.service.enabled false
 spark.sql.adaptive.localShuffleReader.enabled false
 
 # If you want to use dynamic resource allocation,
-# please refer to this URL (https://github.com/apache/incubator-celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
+# please refer to this URL (https://github.com/apache/celeborn/tree/main/assets/spark-patch) to apply the patch into your own Spark.
 spark.dynamicAllocation.enabled false
 ```