A sample, taken from monitoring data reported by environmental management services, is first classified by ideal gray close function cluster analysis. The sample level is then determined by gray correlation analysis, and a comprehensive assessment is drawn from the degree of correlation between the sample classification and the levels specified in *GB3095-2012*.

### Classification of the sample to be evaluated

#### Establishment of the evaluation index sequence matrix for the selected sample

Let *S* be the sequence of objects to be clustered, that is, *S* = {*s*_{1}, *s*_{2}, …, *s*_{m}}; let *X* be the sequence of variables influencing air quality, i.e. *X* = {*x*_{1}, *x*_{2}, …, *x*_{n}}; and let *x*_{ik} be the original monitoring data for *s*_{i} (*i* = 1, 2, …, *m*) and *x*_{k} (*k* = 1, 2, …, *n*). Here *i* indexes the objects considered in clustering and *m* is their number; *k* indexes the influence indices, which are the pollutants mentioned above, and *n* is their number. The following matrix can then be established (Eq. 1).

$$ S = \begin{array}{c} s_{1} \\ s_{2} \\ \vdots \\ s_{m} \end{array} \left[ \begin{array}{cccc} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{array} \right] $$

(1)

#### Establishment of the gray close function cluster matrix with ideal values

Let *X*_{0} = {*x*_{01}, *x*_{02}, …, *x*_{0n}} be the sequence of ideal values corresponding to each influence index. The ideal value is determined according to the following principle (Eqs. 2, 3, 4).

First situation: the larger the influence index (*x*_{k}) is, the better the air quality; in this case, the ideal value is

$$ x_{0k} = \max \left\{ x_{ik},\ i = 1,2,\ldots,m \right\}, \quad k = 1,2,\ldots,n. $$

(2)

Second situation: the smaller the influence index (*x*_{k}) is, the better the air quality; in this case, the ideal value is

$$ x_{0k} = \min \left\{ x_{ik},\ i = 1,2,\ldots,m \right\}, \quad k = 1,2,\ldots,n. $$

(3)

Third situation: air quality is best when the influence index (*x*_{k}) takes a moderate value M; in this case, the ideal value is

$$ x_{0k} = \text{M}. $$

(4)

According to the ideal value *x*_{0k} (Eq. 2, 3 or 4) and the original monitoring data (*x*_{ik}), the value of the gray close function *y*_{ik} is calculated using Eq. 5.

$$ y_{ik} = \frac{x_{0k}}{x_{ik}} \quad \left( i = 1,2,\ldots,m;\ k = 1,2,\ldots,n \right) $$

(5)

where *x*_{ik} are the original monitoring data and *x*_{0k} is the ideal value corresponding to the k-th influence index. The function value *y*_{ik} is dimensionless, and *y*_{ik} ∈ [0,1]. *y*_{ik} denotes the degree of correlation between *s*_{i} and *s*_{0} for the k-th index. Specifically, the larger *y*_{ik} is, the closer *s*_{i} is to the ideal *s*_{0}, and the smaller *y*_{ik} is, the farther *s*_{i} is from *s*_{0}.

Thus, the following gray close matrix *Y* can be established (Eq. 6).

$$ Y = \left[ \begin{array}{cccc} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \\ y_{01} & y_{02} & \ldots & y_{0n} \end{array} \right] $$

(6)

Here, *Y* collects the values of the gray close function. Moreover, (*y*_{01}, *y*_{02}, …, *y*_{0n}) = (1, 1, …, 1)_{1×n} is the ideal sequence; the larger *y*_{ik} is, the better *s*_{i} is, and the maximum value of *y*_{ik} is 1.
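The construction of the gray close matrix can be illustrated with a minimal Python sketch. It assumes the second situation (smaller is better, Eq. 3), which is the case where Eq. 5 yields values in (0, 1] for positive data. The function name `gray_close_matrix` and the numbers in `x` are hypothetical and serve only as an illustration; they are not monitoring data.

```python
def gray_close_matrix(x):
    """Return (ideal, y) for an m-by-n data matrix in which smaller values are better."""
    n = len(x[0])
    # Eq. 3: the ideal value of each index is the minimum of its column.
    ideal = [min(row[k] for row in x) for k in range(n)]
    # Eq. 5: y_ik = x_0k / x_ik (assumes every x_ik is strictly positive).
    y = [[ideal[k] / row[k] for k in range(n)] for row in x]
    # Eq. 6: append the ideal row y_0k = 1 as the last row of Y.
    y.append([1.0] * n)
    return ideal, y


# Made-up values: rows are objects s_i, columns are influence indices x_k.
x = [[40.0, 20.0], [80.0, 10.0], [50.0, 40.0]]
ideal, y = gray_close_matrix(x)
for row in y:
    print(row)
```

For an index where larger values are better (Eq. 2), the same ratio would exceed 1, so this sketch should not be applied to that case without adjusting the ratio.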

#### Classification of the sample to be evaluated

Since each influence index has a different effect, the weight of each index must be taken into account. Let *P*_{i} be the comprehensive analysis value of *s*_{i}, which can be expressed as follows (Eq. 7).

$$ P_{i} = \sum\limits_{k = 1}^{n} W_{k}\, y_{ik} \quad \left( i = 1,2,\ldots,m \right) $$

(7)

where *W*_{k} is the weight of the k-th influence index; since there are *n* indices, there are also *n* weights (*W*_{1}, *W*_{2}, …, *W*_{n}). Correspondingly, the weights are given by the following equation (Eq. 8).

$$ W_{k} = \frac{\sum\limits_{i = 1}^{m} x_{ik}}{\sum\limits_{i = 1}^{m} \sum\limits_{k = 1}^{n} x_{ik}} \quad \left( k = 1,2,\ldots,n \right) $$

(8)
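As a rough illustration of Eqs. 7 and 8, the sketch below computes the index weights from the raw data and then the weighted comprehensive value of each object. The function names and the sample numbers are hypothetical; the `y` values are those Eq. 5 would give for this `x` when the column minimum is taken as the ideal value.

```python
def index_weights(x):
    """Eq. 8: weight of index k = column sum of x divided by the grand total."""
    total = sum(sum(row) for row in x)
    n = len(x[0])
    return [sum(row[k] for row in x) / total for k in range(n)]


def comprehensive_values(y, w):
    """Eq. 7: P_i = sum over k of W_k * y_ik, one value per object."""
    return [sum(wk * yik for wk, yik in zip(w, row)) for row in y]


# Made-up data: 3 objects, 2 influence indices.
x = [[40.0, 20.0], [80.0, 10.0], [50.0, 40.0]]
y = [[1.0, 0.5], [0.5, 1.0], [0.8, 0.25]]
w = index_weights(x)
p = comprehensive_values(y, w)
print(w)
print(p)
```

Because the weights sum to 1 and each *y*_{ik} lies in (0, 1], each *P*_{i} also lies in (0, 1] under these assumptions.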

Based on the comprehensive analysis values, the vector (*P*_{1}, *P*_{2}, …, *P*_{m})^{T} is formed. The following equation (Eq. 9) can then be used to calculate the gray close value *P*_{ij} of *P*_{i} relative to *P*_{j}.

$$ P_{ij} = \frac{\min (P_{i}, P_{j})}{\max (P_{i}, P_{j})} \quad \left( i, j = 1,2,\ldots,m \right) $$

(9)

Then,

$$ P = \left( P_{ij} \right)_{m \times m}. $$

(10)
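A short sketch of Eq. 9 follows, assuming strictly positive comprehensive values. The function name `gray_close_values` and the input list are hypothetical. The ratio of the smaller to the larger value makes the diagonal equal to 1, the matrix symmetric, and every entry fall in (0, 1].

```python
def gray_close_values(p):
    """Eq. 9: P_ij = min(P_i, P_j) / max(P_i, P_j) for all pairs (i, j)."""
    return [[min(pi, pj) / max(pi, pj) for pj in p] for pi in p]


# Made-up comprehensive analysis values for three objects.
p = [0.8, 0.4, 0.5]
P = gray_close_values(p)
for row in P:
    print(row)
```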

If *P* (Eq. 10) satisfies the following three conditions: (1) reflexivity, where *P*_{ij} = 1 (*i* = *j*); (2) symmetry, where *P*_{ij} = *P*_{ji}; and (3) normativity, where *P*_{ij} ∈ [0,1], we can select an appropriate threshold value *λ* from the *P* matrix, cut the branches whose weight values are less than *λ*, which is the similarity coefficient^{4,5}, and establish the classification \(S_{t}^{\prime}\) (*t* = 1, 2, …, *c*) when the level *λ* meets the relevant requirement. \(S_{t}^{\prime}\) represents each classification of air in a given region. The following equations (Eqs. 11, 12) can be established.

$$ S_{t}^{\prime} = \left( S_{1}^{\prime}, S_{2}^{\prime}, \ldots, S_{c}^{\prime} \right)^{\text{T}} $$

(11)

$$ S_{tk}^{\prime} = \left( S_{t1}^{\prime}, S_{t2}^{\prime}, \ldots, S_{tn}^{\prime} \right) $$

(12)

where \(S_{t}^{\prime}\) is the t-th classification, \(S_{tk}^{\prime}\) is the k-th index of the t-th classification, *t* is the classification number (*t* = 1, 2, …, *c*), and *k* is the influence index number (*k* = 1, 2, …, *n*).

\(S_{tk}^{\prime}\) can be expressed in the following matrix form (Eq. 13).

$$ S_{tk}^{\prime} = \left[ \begin{array}{cccc} s_{11}^{\prime} & s_{12}^{\prime} & \ldots & s_{1n}^{\prime} \\ s_{21}^{\prime} & s_{22}^{\prime} & \ldots & s_{2n}^{\prime} \\ \ldots & \ldots & \ldots & \ldots \\ s_{c1}^{\prime} & s_{c2}^{\prime} & \ldots & s_{cn}^{\prime} \end{array} \right] $$

(13)
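The text does not spell out the cutting procedure in detail. One plausible reading, sketched below as an assumption, is a threshold cut: two objects fall into the same class when they are linked, directly or through other objects, by gray close values of at least *λ*. This yields the same groups as cutting the branches of a maximum spanning tree whose weights are below *λ*. The function name `lambda_cut_classes`, the matrix values, and the chosen thresholds are all hypothetical.

```python
def lambda_cut_classes(P, lam):
    """Group object indices into connected components of the graph P[i][j] >= lam."""
    m = len(P)
    assigned = [False] * m
    classes = []
    for start in range(m):
        if assigned[start]:
            continue
        assigned[start] = True
        members, stack = [], [start]
        # Depth-first search over links whose gray close value reaches the threshold.
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(m):
                if not assigned[j] and P[i][j] >= lam:
                    assigned[j] = True
                    stack.append(j)
        classes.append(sorted(members))
    return classes


# Made-up symmetric gray close matrix for three objects.
P = [[1.0, 0.5, 0.625],
     [0.5, 1.0, 0.8],
     [0.625, 0.8, 1.0]]
print(lambda_cut_classes(P, 0.7))
print(lambda_cut_classes(P, 0.6))
```

A higher threshold gives more, smaller classes; a lower threshold merges them. How the index values \(S_{tk}^{\prime}\) of each class are derived from its members is not specified here, so the sketch stops at the grouping.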

### Analysis of the degree of correlation of the sample to be evaluated

Let \(S_{t}^{\prime}\) be the sample to be evaluated, and let *X* = (*x*_{1}, *x*_{2}, …, *x*_{n}) be the set of influence indices mentioned above, which serves as the evaluation index for \(S_{t}^{\prime}\). Let \(\text{S}_{0}^{\prime}\) be the air quality classification specified in *GB3095-2012*. The correlation coefficient is then given by the following equation (Eq. 14)^{14}.

$$ \zeta_{t}(k) = \frac{\min\limits_{t \in c} \min\limits_{k \in n} \left| S_{t}^{\prime}(k) - \text{S}_{0}^{\prime}(k) \right| + \epsilon \max\limits_{t \in c} \max\limits_{k \in n} \left| S_{t}^{\prime}(k) - \text{S}_{0}^{\prime}(k) \right|}{\left| S_{t}^{\prime}(k) - \text{S}_{0}^{\prime}(k) \right| + \epsilon \max\limits_{t \in c} \max\limits_{k \in n} \left| S_{t}^{\prime}(k) - \text{S}_{0}^{\prime}(k) \right|} $$

(14)

where *ζ*_{t}(*k*) is the correlation coefficient and *ε* is the resolution coefficient, generally taken as 0.5^{4,5}.
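The correlation coefficient of Eq. 14 can be sketched as follows. The function name `correlation_coefficients`, the class values in `samples`, and the reference sequence in `standard` are hypothetical placeholders; in particular, `standard` does not reproduce any limit from *GB3095-2012*. The sketch assumes that at least one absolute difference is nonzero, since otherwise the denominator vanishes.

```python
def correlation_coefficients(samples, standard, eps=0.5):
    """Eq. 14: coefficient of each class t and index k against a reference sequence."""
    # Absolute differences |S'_t(k) - S'_0(k)| for every class t and index k.
    diffs = [[abs(s - s0) for s, s0 in zip(row, standard)] for row in samples]
    # Global minimum and maximum over all classes and indices.
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    return [[(d_min + eps * d_max) / (d + eps * d_max) for d in row] for row in diffs]


# Made-up class index values (rows: classes, columns: indices) and reference level.
samples = [[30.0, 50.0], [60.0, 90.0]]
standard = [40.0, 50.0]
zeta = correlation_coefficients(samples, standard)
for row in zeta:
    print(row)
```

A coefficient equals 1 only where the difference matches the global minimum, and it decreases as the class value moves away from the reference.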

In addition, the degree of correlation (*R*_{t}) is given by the following equation (Eq. 15).

$$ R_{t} = \frac{1}{n} \sum\limits_{k = 1}^{n} \zeta_{t}(k) $$

(15)

The value of *R*_{t} is calculated using Eq. 15. The maximum value of *R*_{t} indicates that the sample to be evaluated has the highest degree of correlation with the air quality level considered. Therefore, the sample is classified accordingly.
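The final step can be sketched as averaging the correlation coefficients of each row (Eq. 15) and selecting the row with the largest degree of correlation. The function name `correlation_degrees` and the coefficient values are hypothetical and chosen only for illustration.

```python
def correlation_degrees(zeta):
    """Eq. 15: R_t is the mean of the n correlation coefficients of row t."""
    return [sum(row) / len(row) for row in zeta]


# Made-up correlation coefficients: one row per candidate t, one column per index k.
zeta = [[0.5, 1.0], [0.25, 0.5]]
r = correlation_degrees(zeta)
# Index of the row with the highest degree of correlation.
best = max(range(len(r)), key=lambda t: r[t])
print(r, best)
```

If two rows share the maximum, `max` returns the first one, so ties would need a separate rule in practice.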