Software Reliability and Content Relevance as Systems' Potential Reliability
Yuri Arkhipkin
aryur@yandex.ru
Abstract. This paper argues for a unified quantitative approach to software reliability and content relevance validated by the systems' potential reliability law.
Presented are estimations of the minimum needed software test coverage to assure the achievement of Enough and Six Sigma software reliability at minimum and maximum fault flow throughout the test process. These estimations are based on the evaluation results yielded by the Agile Software Reliability Monitoring Model (ASRMM), approaching the reliability problem in software engineering. This model integrates qualitative subject matter data and quantitative software reliability metrics. The ASRMM is viewed as a foundation of the Testerbot software reliability engineering that provides continuous reliability evaluations throughout the software development process, thus accounting and tracing quantitative software reliability requirements from customer to product.
Also presented are estimations of the minimum needed content semantic coverage to assure the achievement of Any Sigma content relevance for a given example content. These estimations are based on the evaluation results yielded by the Content Relevance Quantification Model (CRQM), approaching the relevance problem in content engineering. This model integrates qualitative subject matter data and quantitative content relevance metrics, providing continuous relevance evaluations throughout the content engineering process, thus making it possible to trace content relevance requirements from customer to product. The CRQM is viewed as a foundation of the Sequantic content relevance engineering.
1. Introduction
Much research has been done on models approaching software reliability quantification. The results seem to be of poor satisfaction despite the increasing number of these models. The lack of explicit evaluations of software elements' failure probability may be considered one of the main problems in software reliability quantification. Software elements' failure data is yielded while testing (executing) software for a vast field of subject applications. This data may be considered ad hoc data, much depending on the developer's skill and software testing skill in particular. Software testing in general may be considered a trial failure process of sensitizing software elements (sites) to define whether the yielded results are true or faulty.
Much research has also been done on models approaching content relevance quantification. The results seem to be of not enough satisfaction despite the increasing number of these models. Content's grammar variety may be considered one of the main problems in content relevance quantification. Content irrelevance (failure) data is yielded due to query occurrence (generation) through content searching, thus providing data for query terms' refinements and/or search engine improvements. So this data generally may be considered ad hoc data, much depending on the search engine developer's skill and query generation (testing) skill in particular. Content searching in general may be considered a trial failure process of sensitizing content terms to define whether this content is relevant to the terms of the query or not.
This paper proposes to break through the quantification problems of software reliability and content relevance engineering by approaching any digital content as a trial failure system regardless of its grammar.
Chapter 2 briefly introduces some mathematics of the systems' potential reliability law proved by B. S. Fleishman [1]. This law validates the systems' failure intensity quantitative ranges depending on the known potential operating elements' number as a part of the total system elements' number and their mean operating probability.
Chapter 3 presents the quantitative approach to software reliability engineering validated by the systems' potential reliability law. A software site's operating probability is considered to be equal to the site's occurrence or potential occurrence probability. This probability may be evaluated at any cycle of the software development. In general the developer needs no external statistical data to monitor the achieved quantitative reliability level of the software project.
Chapter 4 presents the quantitative approach to content relevance engineering validated by the systems' potential reliability law. A content element's operating (sense-sensitizing) probability is considered to be equal to the term's content frequency. This frequency may be evaluated at any cycle of the content development. In general the developer needs no external statistical data to index the quantitative relevance level of the content project.
The pragmatics of the presented approach may be defined by its validity and verifiability, which may be explicitly quantified for the vast field of subject applications in systems engineering.
2. Potential Reliability of a Trial Failure System
At every time moment the system's elements belong to either the operating or the failure state. Moreover, the conversion from the operating to the failure state occurs instantly, while reverse conversions are impossible.
It is natural in general to consider a system as an operating one at the given moment if there exist at least some operating elements comprising a previously stated minimal part of the total system elements' number. Many uncontrollable causes influencing elements' failures make it possible to consider the failures' occurrence as random events.
Let the system AR at a given moment t consist of n elements {a1,…,av,…,an} with arbitrary interactions. Any element is associated with two mutually exclusive events A1v and A0v. The event A1v is associated with the operating element av and the event A0v is associated with its failure. Let the probabilities of the events A1v and A0v be equal to pv and 1-pv correspondingly.
Consider the set Rn of all possible 2^n states r = (i1,…,iv,…,in) of the system AR. This set depicts the operating and failure states of the system AR (iv = 1 if av is in the state A1v and iv = 0 if av is in the state A0v).
Let us divide the set Rn into two parts, E1 and E0 = Rn\E1. The set E1 is the operating set of the system AR and E0 is the failure set of the system AR. Consider by definition that the system AR operates at the given moment only if r ∈ E1.
It is considered that the system's state r is a sequence of independent trials with outcome probabilities pv = P(iv=1), 1-pv = P(iv=0) (v=1,2,…,n) of every v-th trial. Then the probability Pv of the system AR operating at the given moment may be defined [1] as:

Pv = P(r ∈ E1) = Σ_{r∈E1} Π_{v=1..n} pv^iv · (1-pv)^(1-iv).    (2.1)
Consider systems comprised of n elements whose operating set E1 consists of states, each including more than s operating elements. So the set E1 includes all system states r = (i1,…,iv,…,in) for which sum(iv) > s (v=1,2,…,n). Such systems are named symmetric systems of s-th degree, and formula (2.1) becomes:

Pv = Σ_{sum(iv)>s} Π_{v=1..n} pv^iv · (1-pv)^(1-iv).    (2.2)
To define the possibility of operating for the symmetric system of s-th degree, it is necessary to study the asymptotic behavior of (2.2) as n → ∞.
Restricting the study to operating systems with a large but constant elements' number n, these systems are said to have instant operating probability Pv(t) at the given moment t. The probability R(t) that the system AR will operate until some moment t (inclusive) depends on whether the system operates at all moments τ (τ=1,…,t). The independent trials schema with operating probability Pv(τ) provides the reliability R(t) defined as [1]:

R(t) = Π_{τ=1..t} Pv(τ)    (2.3)

and further

1 - Σ_{τ=1..t} [1-Pv(τ)] ≤ R(t) ≤ exp(- Σ_{τ=1..t} [1-Pv(τ)]).    (2.4)
Taking into consideration that Pv(t) → 1 as n(t) increases, different extreme values of R(t) are possible as t increases. To refine this point, consider the ideal system AR with the following postulated features [1]:
1. Operating capability. The system is capable of operating at any time moment t. However, if it fails at a given time moment, then nothing can bring it back into the operating state.
2. Unlimited extension. If the system is operating at a given time moment t, then at the next moment t+1 it may be enhanced by any number of elements. One time unit is a conditional one for a given system.
3. Physical restriction of reaction time. The system becomes aware of its state only at the next time moment t+1.
4. Math restrictions. A symmetric system with independent success and failure trial results at every given time moment t is under consideration.
The reliability R(t) limit of the symmetric system of s-th degree with n = const elements that are pairwise independent and uniformly distributed is defined by the equation

R(t) = exp(-λ·t),    (2.5)

where λ is the system failure intensity measured in faults per element, evaluated according to the systems' potential reliability law (see math proof in [1]) as follows:

-ln[1-exp(-kL·n) + O(ln n)] ≤ λ ≤ -ln[1-exp(-kU·n)],    (2.6)

where

kL = c·ln(c/pM) + (1-c)·ln((1-c)/(1-pM));    (2.7)
kU = c·ln(c/pS) + (1-c)·ln((1-c)/(1-pS));    (2.8)
c = s/n;
pM = pL/(1+pL-pU);
pL ≤ pv ≤ pU < 0.5;
pL = min(pv) (v=1,2,…,n);
pU = max(pv) (v=1,2,…,n);
pS = (1/n)·Σ_{v=1..n} pv.    (2.9)
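As a worked illustration, the bounds (2.6)–(2.9) can be computed directly. The following Python sketch is illustrative only: the element probabilities, the degree s, and the choice of a zero O(ln n) constant are assumptions of this example, not values from the text.

```python
import math

def k_coeff(c, p):
    # Coefficient of (2.7)/(2.8); requires 0 < c < 1 and 0 < p < 0.5.
    return c * math.log(c / p) + (1 - c) * math.log((1 - c) / (1 - p))

def intensity(k, n, big_o=0.0):
    # -ln[1 - exp(-k*n) + O(ln n)] from (2.6); O(ln n) is taken as a constant.
    x = big_o - math.exp(-k * n)
    return math.inf if x <= -1.0 else -math.log1p(x)

def intensity_bounds(pv, s):
    # Failure intensity range (2.6) for a symmetric system of s-th degree.
    n = len(pv)
    c = s / n                          # c = s/n
    pL, pU = min(pv), max(pv)
    pM = pL / (1 + pL - pU)            # pM = pL/(1 + pL - pU)
    pS = sum(pv) / n                   # mean operating probability (2.9)
    return intensity(k_coeff(c, pM), n), intensity(k_coeff(c, pS), n)

# Illustrative system: 1000 elements with operating probabilities in [0.01, 0.05].
pv = [0.01 + 0.04 * v / 999 for v in range(1000)]
lam_min, lam_max = intensity_bounds(pv, s=100)   # c = 0.1, well above pS = 0.03
lam_min2, lam_max2 = intensity_bounds(pv, s=30)  # c = 0.03 = pS: kU vanishes
print(lam_min, lam_max, lam_max2)
```

The run shows the behavior discussed below: when the operating fraction c clearly exceeds the mean operating probability pS, both intensity bounds are vanishingly small, while at c = pS the coefficient kU is zero and the upper-bound intensity diverges.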
3. Software Reliability Engineering
The postulated features above fit a systems approach to the software development process, including testing and debugging in particular. Applying the systems' potential reliability law to quantify software reliability thus appears to be a fruitful approach.
Software reliability is understood as the probability of failure-free software operation in a defined environment for a specified period of time. In general a failure is a deviation of operation results from customer requirements. The deviation is defined by the correspondence between an algorithm's specification and its software implementation. The quality of the algorithm's subject matter specification influences software reliability throughout the development process and the lifecycle of the software product. To achieve continuous improvement of the software engineering process, reliability requirements must be defined in an integrated manner for prediction, evaluation, validation, verification, and certification at the specification, coding, testing, maintenance, and correction cycles. Thus reliability monitoring needs to be implemented as an online automated process throughout the software lifecycle.
At the beginning we know nothing in general about the algorithm to be implemented except some ideas concerning input data and results. Consider an algorithm's specification based on a formal definition of input data. The usefulness of a software reliability model depends on the method of defining the input data to be tested for exhaustive fault detection. Input data may be considered as a set of requests (sites), so their total number and variety are sufficient for reliability evaluation. Sites are viewed as structural and/or functional software elements including subject matter data, input variables, decisions, restrictions, memory structures, and the like. All sites are potentially fault inherent and may cause a failure. The software input data set may be viewed as a site set. A software site is somewhat like a software path.
Let the site set structure define the total sites' number n(t) at different lifecycle times t and the occurrence probability pv(t) (v=1,2,…,n(t)) of the v-th site to be processed, which defines the failure probability 1-pv(t) while processing this site. Failure probability is greater for sites that occur more rarely, because it is generally more difficult to sensitize faults by such exotic sites while testing.
It is most improbable that all n(t) sites will yield specified results, because it is impossible to implement software without faults. But even if all n(t) sites were to yield specified results, this fact would be undetectable, because in this case all n(t) sites would need to be tested. Practically this is impossible because of the great values of n(t) for almost any software product. Not all sites are to be processed even throughout the software lifecycle. To yield specified results at the required reliability level, in practice, it is enough for a software product to have only s(t) assured fault-free sites. The number s(t) of potentially faulty (sensitive) sites, sensitized throughout testing by time t, defines the software test coverage as c(t) = s(t)/n(t) (0<c(t)<1) and is a known parameter of software reliability models.
It is natural in general to consider software to be operating at a given time moment t if there exists at least some fixed, previously stated, minimal part c(t) of operating (fault-free) sites of the total sites' number n(t).
The features (see Chapter 2) provide insight into the test process in general, including fault correction and regression testing procedures, thus refining the software lifecycle process according to the postulated features above.
Any software site may sensitize fault(s) and is a potentially fault inherent one.
Consider software as a system comprised of n(t) sites (elements); the operating set of this system consists of states, each including more than s(t) operating sites. Consider the sites pairwise independent and uniformly distributed over all possible sites' number n(t) = const. Software reliability R(t) is evaluated by the known equation (2.5) as

R(t) = exp(-λ(t)·t),    (3.1)

where λ(t) is the software failure intensity measured in faults per site, evaluated (see formulas (2.6),…,(2.9) and math proof in [1]) as follows:

-ln[1-exp(-kL(t)·n(t)) + O(ln n(t))] ≤ λ(t) ≤ -ln[1-exp(-kU(t)·n(t))],    (3.2)

where

kL(t) = c(t)·ln(c(t)/pM(t)) + (1-c(t))·ln((1-c(t))/(1-pM(t)));    (3.3)
kU(t) = c(t)·ln(c(t)/pS(t)) + (1-c(t))·ln((1-c(t))/(1-pS(t)));    (3.4)
c(t) = s(t)/n(t), (0 ≤ c(t) ≤ 1);
pM(t) = pL(t)/(1+pL(t)-pU(t));
pL(t) ≤ pv(t) ≤ pU(t) < 0.5;
pL(t) = min(pv(t)) (v=1,2,…,n(t));
pU(t) = max(pv(t)) (v=1,2,…,n(t));
pS(t) = (1/n(t))·Σ_{v=1..n(t)} pv(t).    (3.5)

We consider O(ln n(t)) to have some constant value that must be defined at large values of n(t); thus the minimum and maximum failure intensity according to (3.2), (3.3), and (3.4), correspondingly, are as follows:

λmin(t) = -ln[1-exp(-kL(t)·n(t)) + O(ln n(t))],    (3.6)
λmax(t) = -ln[1-exp(-kU(t)·n(t))].    (3.7)
The Agile Software Reliability Monitoring Model (ASRMM) considers software as a system comprised of n(t) sites. The operating set of this system consists of states, each including more than s(t) operating sites. The software test process is defined in general according to the postulated features (Chapter 2), including fault correction and regression testing, thus refining the software lifecycle process with regard to achieving and supporting reliability requirements according to the math interrelations (3.1), (3.4), (3.7) of the total sites' number n(t), test coverage c(t), mean site's occurrence probability pS(t), and failure intensity λ(t). Figure 1 displays these interrelations. Time flow 0≤t≤1 in the ASRMM is a math one, and the t-unit is based on every site rv (v=1,2,…,n(t)) sensitized while testing.
To define the initial mean occurrence probability pS(t), imagine a software site rv (v=1,…,n(t)) as a set of W(t) pairwise independent parameters xi (i=1,…,W(t)). Each parameter may accept value(s) xij (j=1,…,Ji(t)) with sij(t) values of each. These parameters define semantically sufficient parameter values that are sensitive with regard to yielding specified results, or, otherwise, values that may sensitize faults in the software under test. These values are supposed to yield semantically typical results. The software sites' set structure may be viewed as a software sensitive sites' semantics matrix (SSSSM).
Semantics of the parameter values is defined by the subject matter and the particularities of the algorithm's implementation. There may be defined a number Ji(t) of semantic types having sij(t) (si(t) ≥ sij(t) ≥ 1) values of each type. So each parameter xi has si(t) = Σ_{j=1..Ji(t)} sij(t) values; thus the total sites' number is n(t) = Π_{i=1..W(t)} si(t) and the number of sensitive sites is s0(t) = Π_{i=1..W(t)} Ji(t) for a given project. Then at the specification cycle t=0, according to (3.5), we have pS(0) = 1/s0(0). Let us name pS(0) the initial semantic mean of a software project under development. Values denoted by t time (0 ≤ t ≤ 1; one unit of t time is ts = 1/n(t)) vary throughout the development process due to refinements brought by customers, programmers, testers, developers, users, and the like. These refinements lead to changes of the total sites' number n(t) and the sensitive sites' number s0(t) because of detected faults, thus changing the semantic mean pS(t) value.
While testing, the test coverage value c(t) increases, so the difference c(t) - pS(0) < 0 tends to 0, and when (see Figure 1)

c(t) - pS(0) = 0,    (3.8)

according to (3.4) we have kU(t) = 0, which is the starting point of failure intensity (3.7) decrease and software reliability (3.1) growth (see Figure 1). When the sensitive sites' number s0(0) is not defined, then according to (3.5) and (3.8) we have s0(0)/n(0) = 1/s0(0) and thus s0(0) = sqrt(n(0)).
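The default s0(0) = sqrt(n(0)) can be checked directly: it is exactly the choice that makes the initial coverage target equal the semantic mean, so that kU from (3.4) vanishes. A small sketch (the value of n(0) here is illustrative):

```python
import math

n0 = 10**6                    # illustrative total sites' number n(0)
s0 = math.isqrt(n0)           # default sensitive sites' number: s0(0) = sqrt(n(0)) = 1000
c = s0 / n0                   # initial coverage target c(0) = s0(0)/n(0)
pS = 1 / s0                   # initial semantic mean pS(0) = 1/s0(0), per (3.5)
assert c == pS                # condition (3.8): coverage equals semantic mean

# At this point kU from (3.4) vanishes, marking the start of reliability growth:
kU = c * math.log(c / pS) + (1 - c) * math.log((1 - c) / (1 - pS))
print(kU)
```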
Here is a general description of the software reliability monitoring algorithm based on the ASRMM.
The software site set structure is defined during specification as an SSSSM. Semantics of the site's parameter values is defined by the software subject matter and the algorithm's implementation particularities. The SSSSM is viewed as a database of the ASRMM engine (TESTERBOT) for generating tests, refining the semantic mean and semantic shift of input data flow, thus providing monitoring of the software engineering process.
During software concept definition and input data specification at t=0, before testing, we refine s0(0) sensitive sites selected from the total sites' number n(0). Sites are ranged according to their occurrence probabilities pv(0) (v=1,2,…,n(0)), thus defining the initial semantic mean pS(0) of input data flow.
Potential reliability metrics, evaluated according to (3.7), if compared with the required one λrq(1), define the appropriate test coverage value to be achieved to meet the requirements. The difference λrq(1) > λ(0) means that it is necessary to define an additional number of sensitive sites for testing to assure the achievement of the required reliability. The additional sensitive sites are defined and may be selected by the TESTERBOT either in exhaustive or extreme (random or semantic extrapolation) mode of the test process.
The test process must assure the achievement of the required test coverage value. While testing, fault(s) are detected and corrected, thus changing the sensitive sites' number s0(t+ts) := s0(t)+1 for each fault and in general changing the number of values si(t) ≥ sij(t) ≥ 1 of the sites' parameters. The sensitive sites' parameter values are either predefined or randomly selected according to semantic ranges. If the total sites' number n(t) changes, then the t time unit ts is recalculated. If no fault is detected, the number of tested sites must be changed, s(t+ts) := s(t)+1 for each tested site, thus increasing the test coverage c(t).
The value of t time is calculated as t := t + ts after each tested fault-free site.
Changes are calculated for n(t), s(t), s0(t), si(t), Ji(t), W(t), thus refining the semantic shift pS(0) - pS(t). If the total sites' number n(t) is changed, then the time is to be refined as t := t + ts = t + 1/n(t) only after each tested faulty site has been corrected and retested.
The reliability metric λ(t<1), being continuously evaluated as (3.7) throughout the testing process and compared with the required one, provides a possibility of making decisions on the testing process. The achieved reliability level, being continuously monitored, depicts changes either in subject matter requirements or software implementation particularities, thus providing possibilities for optimization throughout the engineering process of software product development.
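The monitoring algorithm above can be sketched as a small state machine. This is an illustrative reconstruction, not the TESTERBOT implementation; the class name and the choice of pS(t) = 1/s0(t), extending the initial semantic mean of (3.5) to later cycles, are assumptions of this sketch.

```python
import math

class ReliabilityMonitor:
    """Illustrative ASRMM-style tracker of coverage c(t) and failure intensity (3.7)."""

    def __init__(self, n, s0):
        self.n = n        # total sites' number n(t)
        self.s0 = s0      # sensitive sites' number s0(t)
        self.s = 0        # tested fault-free sites s(t)
        self.t = 0.0      # math time, one unit per tested site: ts = 1/n(t)

    def record_test(self, fault_detected):
        if fault_detected:
            self.s0 += 1  # corrected fault adds a sensitive site: s0 := s0 + 1
        else:
            self.s += 1   # fault-free site increases test coverage: s := s + 1
        self.t += 1 / self.n

    def failure_intensity(self):
        # lambda_max(t) from (3.4) and (3.7), with semantic mean taken as 1/s0(t).
        c, p = self.s / self.n, 1 / self.s0
        if c <= p:
            return math.inf      # coverage below semantic mean: no reliability growth yet
        k = c * math.log(c / p) + (1 - c) * math.log((1 - c) / (1 - p))
        e = math.exp(-k * self.n)
        return math.inf if e >= 1.0 else -math.log1p(-e)

# Simulated test run: a fault detected at every 50th tested site.
mon = ReliabilityMonitor(n=10_000, s0=100)
for i in range(200):
    mon.record_test(fault_detected=(i % 50 == 0))
print(mon.s, mon.s0, mon.failure_intensity())
```

Once the coverage passes the semantic mean, the evaluated intensity drops sharply, mirroring the threshold behavior of (3.8).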
The ASRMM (see also [2]) as a foundation of the Testerbot engineering yields validated and verifiable reliability evaluations, integrating qualitative subject matter data and quantitative software reliability metrics. Testerbot engineering provides continuous reliability evaluations throughout the software engineering process, thus accounting and tracing quantitative software reliability requirements from customer to product.
Consider the customer's four sigma reliability requirements for a software project with a total of a trillion (10^12) software sites of input data set elements. At the beginning of the concept definition, the TESTERBOT preliminary evaluations yield the minimum needed software test coverage as 0.0000010032, or 1,003,200 tests, to assure the achievement of the required four sigma software reliability (0.00621 faults per site). To improve the reliability up to six sigma (2.0·10^-9 faults per site), the extra needed schedule-budget spending on testing and, if needed, on reengineering must be increased 1.003 times. The minimum failure intensity for the given n(t) is λmin(t) = 1/n(t), thus defining Enough sigma test coverage estimations to achieve the maximum possible reliability for the given software project. To achieve maximum (Enough sigma) reliability, the engineering efforts must be 1.0042-fold. It seems that Six Sigma reliability may not be enough for large-scale, semantically sophisticated, and critical software projects.
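These preliminary estimates can be reproduced numerically from (3.4) and (3.7) by solving for the coverage c that attains a target failure intensity. The following is a hedged reconstruction, not TESTERBOT output: the bisection solver is this sketch's own, and pS(0) = 1/sqrt(n(0)) = 10^-6 follows from the default of (3.8).

```python
import math

def k_u(c, p):
    # kU(t) from (3.4)
    return c * math.log(c / p) + (1 - c) * math.log((1 - c) / (1 - p))

def failure_intensity(c, p, n):
    # lambda_max(t) from (3.7): -ln[1 - exp(-kU * n)]
    e = math.exp(-k_u(c, p) * n)
    return math.inf if e >= 1.0 else -math.log1p(-e)

def coverage_for(lam_target, p, n):
    # Smallest coverage c > pS reaching the target intensity (intensity falls as c grows).
    lo, hi = p * (1 + 1e-6), p * 1.1
    for _ in range(200):
        mid = (lo + hi) / 2
        if failure_intensity(mid, p, n) > lam_target:
            lo = mid
        else:
            hi = mid
    return hi

n = 10**12                          # a trillion software sites
pS = 1 / math.isqrt(n)              # initial semantic mean: 1/sqrt(n) = 1e-6
c4 = coverage_for(0.00621, pS, n)   # four sigma: 0.00621 faults per site
c6 = coverage_for(2.0e-9, pS, n)    # six sigma: 2.0e-9 faults per site
cE = coverage_for(1 / n, pS, n)     # "Enough sigma": lambda_min = 1/n
print(round(c4 * n), c6 / c4, cE / c4)
```

The run yields roughly 1.003 million four sigma tests, a roughly 1.003-fold effort increase for six sigma, and a roughly 1.0042-fold increase for Enough sigma, in line with the figures quoted above.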
During further software input data set structure refinement and specification at t=0, before testing, the TESTERBOT refines the initial number s0(0) of sensitive sites selected from the total sites' number n(0). Sites are ranged by the TESTERBOT according to their occurrence probabilities, thus refining the initial semantic mean pS(0) of input data flow. So the TESTERBOT evaluations yield the refined minimum needed software test coverage as 0.0000010016, or 1,001,600 tests, to assure the achievement of the required four sigma software reliability. To improve the reliability up to six sigma, the extra needed schedule-budget spending on testing and, if needed, on reengineering must be increased 1.00156 times. To achieve Enough sigma reliability, the engineering efforts must be 1.00211-fold.
The TESTERBOT application provides 0.37% schedule-budget savings, or a 3,720-test decrease, in achieving Enough sigma reliability requirements for the given project.
The ASRMM results featuring the Testerbot optimization capabilities (see Figure 2) give an idea of schedule-budget-reliability estimations of the needed test coverage changes due to refinements of the total software sites' number. These refinements are contributed to the project by the customer, developers, testers, and the like throughout the software engineering process.
Testerbot engineering assures the achievement of the required software test coverage value, thus defining the most accurate semantic mean compliance with the semantic target. Any detected and corrected fault decreases the semantic shift. While testing, fault(s) are detected and corrected, thus changing the initial sensitive sites' number s0(0) and in general changing the total sites' number n(0), thus refining the needed test coverage c(t) to achieve the required failure intensity λ(t) for any sigma software reliability. Fault flow may vary from minimum, when no faults are detected throughout the test process, to maximum, when every tested site sensitizes fault(s). The ASRMM estimations of the minimum needed test coverage at minimum and maximum fault flow are shown in Figure 2. These online reliability evaluations provide a flexible decision-making Testerbot tool for software product lifecycle management.
The Testerbot software reliability engineering:
depicts the real software test process either in a distributed or a concurrent environment and may be viewed as an online TESTERBOT software development tool (PLM solution) to be applied for any test phase and strategy, including extreme (agile) and exhaustive ones;
needs no empiric fault data and may be viewed as the TESTERBOT tool institutionalizing all CMMI-SW/SE Maturity levels;
provides protective features, giving impact in safety programming and the data security approach;
may be viewed as a test bed and/or a cradle for many existing or upcoming software reliability models.
Each product, released under the TESTERBOT tool, may be equipped with a reliability e-certificate needed for acquisition, remedial, and continual improvement processes, thus suggesting the quantifiable improvement approach to system development standards.
4. Content Relevance Engineering
Any content sensitizes sense or results while being processed by a brain or a computer. Content search aims to provide access to the results of content processing. These results must be of enough quality to rely on, so the content search must be reliable enough to provide relevant content for processing.
Reliability of content search may be viewed as the probability of relevant content detection or coverage by applying a query. Irrelevance may be considered as a deviation of content coverage results from customer requirements. The deviation may be defined as the compliance of query and content specifications with the search engine implementation. Quality of the subject matter query specification influences content search relevance.
In general we know nothing about the content to be searched except some ideas concerning queries and content search results.
The Content Relevance Quantification Model (CRQM) considers any content as a set of n queries with a queries' variety number, or sensitive queries' number, s ≤ n. Any content query qv (v=1,2,…,s) may sensitize sense and is a potentially sense inherent one. An input query, applied for sense detection (sensitization), may discover or recover content with some relevance to the search results.
Queries are viewed as structural and/or functional content (document, collection, corpus) elements, including subject matter data. Let the query set structure define the queries' variety number s and an occurrence probability pv (v=1,2,…,s) of the v-th query, which defines the probability of sensitizing sense in response to this query.
It is most improbable that an input query includes all s queries of the variety number specifying the content under search, because of the great values of s for almost any content. To yield specified search results at the required relevance level, in practice, it is enough for a content to have only sv sensitive queries. The number sv of potentially sensitive queries, sensitized throughout the content search, defines the content's semantic coverage as cv = sv/n and is a known parameter of search models.
It is natural to consider any content under search to be relevant enough if there exists at least some fixed, previously stated number sv of sensitive queries as a minimal part cv of the total content queries' number n.
Consider any content as a system comprised of n queries; the sensitive set of this system consists of states, each including more than sv sensitive queries. Consider the queries pairwise independent and uniformly distributed over all possible queries' number n = const. Content relevance R, or the probability of sensitizing sense, is quantified according to (2.5) by the equation R = exp(-λ), or else according to (2.6),…,(2.9) as

1-exp(-kL·n) + O(ln n) ≥ R ≥ 1-exp(-kU·n),    (4.1)

where λ is the content's irrelevance intensity, defined by (2.6),…,(2.9) and measured as irrelevance (the probability of not sensitizing sense) per content query.
The content query set structure is defined while indexing as a content query semantics matrix (CQSM). Semantics of the query's parameter values is defined by the content subject matter and the search engine's algorithm implementation particularities. The CQSM may be considered as a semantic quantifier's (SEQUANTIC tool) database for queries' indexing (quantification), refining the content's semantic mean pS, semantic coverage cv, and semantic shift cv - pS (Figure 1), thus providing content relevance quantification. Content search assures the achievement of the required semantic coverage value.
Consider a content structured as a total queries' number n with semantic mean pS = 1/n. The most relevant query for any content is the input query yielding Rv = 1, λv = 0. The most irrelevant input query, Rv → 0, in general is defined at cv = pS = pv and λv → ∞, so that the semantic coverage is equal to the semantic mean and occurrence probability of the query qv. The query discovers a content with the higher relevance, the greater the inequality cv < pS. The query recovers a content with the higher relevance, the greater the inequality cv > pS (Figure 1).
Content relevance R, being continuously evaluated throughout the search process and compared with the required one, provides a possibility for making decisions on searching. The achieved relevance level depicts changes either in subject matter requirements (input queries) or search engine implementation particularities, thus providing possibilities for optimization throughout the content search engineering process.
The SEQUANTIC tool as a content product lifecycle management solution may be viewed as a test bed and/or a cradle for many existing or upcoming content relevance models and search engine implementations.
Each content product, released under the SEQUANTIC tool, may be equipped with a relevance e-certificate needed for acquisition, remedial, and continual improvement processes, thus suggesting the quantifiable improvement approach to content development standards.
The CRQM-based Sequantic engineering yields validated and verifiable relevance estimations, integrating qualitative subject matter queries and quantitative relevance metrics of content. Sequantic engineering provides continuous relevance evaluations throughout the content engineering process, thus accounting and tracing quantitative content relevance requirements from customer to the content product.
The CRQM improves latent semantic indexing, especially for unknown and (or) heterogeneous collections, by increasing the relevance, precision, and recall of content search, including full-text search. The CRQM may be used for data exploration and data integration tasks (due to its potential accuracy in quantifying content semantics), to solve heterogeneity problems, and to provide varied levels of querying services, which facilitates knowledge discovery at different levels of granularity.
The content above, as an example, may be structured as a total of n = 203 one-term queries with a variety of s = 23 queries and semantic mean ps = 1/s = 0.043 (see Figure 3). According to the (4.1), (2.7), …, (2.9) quantification, the most relevant recovery, λ1 = 1.66533E-15, is performed by the "content" one-term query, and the most irrelevant recovery, λ4 = 0.836127403, is performed by the "sensitize" query at the 0.433385606 relevance level. The query "term" may be considered a four-sigma (λ ≤ 0.00621) relevant discovery query for the content above; "term" discovers the content example at the 0.99659028 relevance level.
Consider the query "relevant deviation results" as an example. Its semantic coverage (see Figure 3) is evaluated as c = c3 + c19 + c16 = 0.064 + 0.010 + 0.035 = 0.109, and this query's recovery relevance according to (4.1), (2.7), …, (2.9) is not less than 0.999526985. Another example of content quantification is shown in [3].
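The coverage arithmetic for a multi-term query can be reproduced in a few lines. A minimal sketch, using the per-term semantic coverage values cv taken directly from Figure 3 and summing them for the composite query, as in the example above:

```python
# Semantic coverage values c_v from Figure 3 for the terms of the
# example query "relevant deviation results"
# (rows v = 3 "relevant", v = 19 "deviation", v = 16 "results").
coverage = {"relevant": 0.064, "deviation": 0.010, "results": 0.035}

# The composite query's semantic coverage is the sum of its terms' coverages.
c_query = round(sum(coverage.values()), 3)
print(c_query)  # 0.109

# Semantic mean over the variety of s = 23 one-term queries.
p_s = round(1 / 23, 3)
print(p_s)  # 0.043

# c_query > p_s, so "relevant deviation results" is a recovery query.
print(c_query > p_s)  # True
```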
The presented content relevance quantification approach may be applied to human-grammar content once the known language ambiguities are resolved.
5. Conclusion
Many research and development directions may be initiated by the presented quantitative system engineering approach. This approach implies any affordable quantitative accuracy in the refinement of system elements, suitable for both the customer and the developer.
References
1. Fleishman, B.S. Theory Elements of Complex Systems' Potential Effectiveness. Soviet Radio, Moscow, 1971, 224 pp. (in Russian).
2. Arkhipkin, Y. Enough Sigma Software Test Coverage Approach. http://aryur.narod.ru/
3. Arkhipkin, Y. Content Relevance Quantification Model. http://sequantic.narod.ru/
Figure 1. General interrelation of the test/semantic coverage c(t) (c), software/content semantic mean ps(t) (ps), and reliability/relevance metrics R(t) (R), λ(t) (λ).
Figure 2. Testerbot reliability estimations yielded by the ASRMM for exhaustive testing

| Total software sites' variety number n(0) | Sigma software reliability requirements | Minimal test coverage at min fault flow (tests #) | Minimal test coverage at max fault flow (tests #) | Schedule-budget savings, % (tests # decrease) |
|---|---|---|---|---|
| 1,024 | 4σ | 0.049 (51) | 0.04101562 (42) | 17.65 (9) |
| | 6σ | 0.071 (73) | 0.051757813 (54) | 35.65 (19) |
| | Enough sigma | 0.0537 (55) | 0.04296875 (44) | 20.00 (11) |
| 89,362 | 4σ | 0.00397 (355) | 0.00367046 (328) | 7.60 (27) |
| | 6σ | 0.004644032 (415) | 0.003994987 (358) | 13.73 (57) |
| | Enough sigma | 0.00430831 (385) | 0.00383832 (344) | 10.65 (41) |
| 17,343,286 | 4σ | 0.000252028 (4,371) | 0.000246147 (4,269) | 2.33 (102) |
| | 6σ | 0.000264079 (4,580) | 0.000252144 (4,375) | 4.47 (205) |
| | Enough sigma | 0.00026206 (4,545) | 0.00025104 (4,354) | 4.20 (191) |
| 1,000,000,000 | 4σ | 0.000032182 (32,182) | 0.0000319 (31,900) | 0.87 (282) |
| | 6σ | 0.000032745 (32,745) | 0.000032189 (32,189) | 1.70 (556) |
| | Enough sigma | 0.00003277 (32,770) | 0.00003219 (32,190) | 1.77 (580) |
| 102,500,700,000 | 4σ | 0.0000031411 (321,965) | 0.0000031322 (321,054) | 0.28 (911) |
| | 6σ | 0.0000031584 (323,739) | 0.000003140 (321,856) | 0.59 (1,883) |
| | Enough sigma | 0.00000316286 (324,196) | 0.00000314316 (322,176) | 0.62 (2,020) |
| 1,000,000,000,000 | 4σ | 0.0000010032 (1,003,200) | 0.0000010016 (1,001,600) | 0.16 (1,600) |
| | 6σ | 0.0000010063 (1,006,300) | 0.00000100317 (1,003,170) | 0.31 (3,130) |
| | Enough sigma | 0.00000100744 (1,007,440) | 0.00000100372 (1,003,720) | 0.37 (3,720) |
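The savings column of Figure 2 can be recomputed from the two test counts. A minimal sketch, assuming savings % = 100 × (tests at min fault flow − tests at max fault flow) / tests at min fault flow, which reproduces e.g. the 89,362 / 6σ row:

```python
def schedule_budget_savings(tests_min_flow: int, tests_max_flow: int) -> float:
    """Schedule-budget savings, %, from the decrease in the number of
    tests between the min and max fault flow coverage estimations."""
    decrease = tests_min_flow - tests_max_flow
    return round(100.0 * decrease / tests_min_flow, 2)

# Figure 2, n(0) = 89,362, 6-sigma row: 415 vs 358 tests.
print(schedule_budget_savings(415, 358))  # 13.73 (a 57-test decrease)
```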
Figure 3. The CRQM relevance quantification example for the Chapter 4 content

| v | Query set structure qv | Occurrences nv | Occurrence probability pv = nv/n | Semantic coverage cv | Irrelevance intensity λv | Relevance Rv |
|---|---|---|---|---|---|---|
| 1 | content | 41 | 0.202 | 0.202 | 1.66533E-15 | ~1.0 |
| 2 | search | 22 | 0.108 | 0.108 | 0.000579228 | 0.99942094 |
| 3 | relevant | 13 | 0.064 | 0.064 | 0.488453851 | 0.613574339 |
| 4 | sensitize | 12 | 0.059 | 0.059 | 0.836127403 | 0.433385606 |
| 5 | coverage | 5 | 0.025 | 0.025 | 0.497992614 | 0.607749424 |
| 6 | query | 32 | 0.158 | 0.158 | 2.37465E-09 | 0.999999998 |
| 7 | reliable | 4 | 0.020 | 0.020 | 0.221295993 | 0.801479413 |
| 8 | discover | 3 | 0.015 | 0.015 | 0.080497264 | 0.922657428 |
| 9 | recover | 2 | 0.010 | 0.010 | 0.021461497 | 0.978767162 |
| 10 | irrelevance | 3 | 0.015 | 0.015 | 0.080497264 | 0.922657428 |
| 11 | sense | 6 | 0.030 | 0.030 | 0.990193344 | 0.371504856 |
| 12 | semantics | 8 | 0.040 | 0.040 | 3.796293118 | 0.022453852 |
| 13 | engine | 8 | 0.040 | 0.040 | 3.796293118 | 0.022453852 |
| 14 | quantify | 7 | 0.035 | 0.035 | 1.865576521 | 0.154806935 |
| 15 | processing | 3 | 0.015 | 0.015 | 0.080497264 | 0.922657428 |
| 16 | results | 7 | 0.035 | 0.035 | 1.865576521 | 0.154806935 |
| 17 | probability | 4 | 0.020 | 0.020 | 0.221295993 | 0.801479413 |
| 18 | requirements | 4 | 0.020 | 0.020 | 0.221295993 | 0.801479413 |
| 19 | deviation | 2 | 0.010 | 0.010 | 0.021461497 | 0.978767162 |
| 20 | implementation | 4 | 0.020 | 0.020 | 0.221295993 | 0.801479413 |
| 21 | specification | 4 | 0.020 | 0.020 | 0.221295993 | 0.801479413 |
| 22 | collection | 2 | 0.010 | 0.010 | 0.021461497 | 0.978767162 |
| 23 | term | 1 | 0.005 | 0.005 | 0.003415546 | 0.99659028 |
Copyright © 2007 by Y. Arkhipkin